This work is licensed under a Creative Commons Attribution 4.0 International License.
Real health data is complex, often unstructured, and at times inaccurate or inconsistent. It contains missing values and is organized for clinical care rather than for analysis. Learning from health data requires a solid grasp of data operations, data visualization, statistics, and machine learning, as well as an understanding of the ethical and legal frameworks that govern health data privacy and security. Students in this course will learn foundational topics in data science focused on health data and will apply this knowledge to real health datasets through hands-on labs integrated into the lectures. The course is built around two broad themes: (a) understanding health data, and (b) making inferences from data. Students will develop a systematic working understanding of R, one of the most widely used languages for data science, and an introductory understanding of several packages useful for analyzing health data. They will also participate in a group project focused on answering a health-related question. After completing this course, students should be able to securely store a health dataset, summarize its structure, merge tables, visualize relationships, reshape and subset the data to meet analytic needs, deal with missing values, apply statistical and machine learning methods to build prediction models, and evaluate the performance of these models.
Tutorial page: http://rcode.run/lhs_610_tutorial
If the tutorial webpage is down, please let me know!
namcs08.RData -- National Ambulatory Medical Care Survey (NAMCS)
Slides
Videos
1-1 What is exploratory data analysis? (28 mins)
1-2 Is LHS 610 the right course for you? (21 mins)
1-3 Primary versus secondary use of health data (18 mins)
1-4 Basics of R and RStudio (5 mins)
1-5 Anatomy of an R Notebook (2 mins)
<< Note: Need to record a new video introducing students to the key functions listed on the last slide above. >>
Slides
Videos
3-1 Overview of Content (1 min)
3-2 Refresher of Data Frame Verbs (2 mins)
3-3 R Tips of the Day - Dealing with Dates (6 mins)
3-4 Combining mutate() with if_else() and case_when() (6 mins)
3-5 Joining Data Frames (15 mins)
3-6 Reshaping Data with spread() and gather() (21 mins)
3-7 Separating and Uniting Columns (6 mins)
3-8 A Challenging Case of Tidying Office Hours Data (13 mins)
3-9 Reviewing the Old and New Verbs of Data Science (3 mins)
Slides
Videos
4-1 What Makes a Health-Related Question Important? (10 mins)
4-2 What Makes a Health-Related Question Answerable? (15 mins)
4-3 Dealing with Confounders (8 mins)
4-4 Bradford Hill's Criteria of Causation (12 mins)
4-5 Study Designs and Bias in Observational Studies (4 mins)
Slides
Videos
5-2 R Tips of the Day - Saving Your Workspace and the Plus Sign (7 mins)
5-3 Mini-Lab - Anscombe's Quartet (4 mins)
5-4 Principles of Visualization (14 mins)
5-5 Tell the Right Story (6 mins)
5-6 Graphics with Grammar (17 mins)
5-7 ggplot - Geometric Objects and Mappings (13 mins)
5-8 ggplot - Position, Labels, and Facets (12 mins)
5-9 ggplot - Coordinates, Scales, and Themes (14 mins)
5-10 Exploring the Relationship Between Weight and Blood Pressure (9 mins)
Slides
Videos
6-1 Overview of Content (2 mins)
6-2 R Tips of the Day - Replacing and Assigning Missing Values (22 mins)
6-3 What is Hypothesis Testing? (10 mins)
6-4 Why a Null Hypothesis? And Interpreting a P-Value (10 mins)
6-5 What Common Statistical Tests Should I Know? (6 mins)
6-6 Which Test, Which Plot? (31 mins)
6-7 Multiple Hypothesis Testing (5 mins)
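As a rough illustration of the multiple hypothesis testing topic in video 6-7, the sketch below adjusts a set of p-values with base R's `p.adjust()`. The p-values are hypothetical and are not taken from the course materials, and the two methods shown (Bonferroni and Benjamini-Hochberg) are common choices rather than necessarily the ones covered in the video.

```r
# Hypothetical raw p-values from five separate tests (illustrative only).
p_raw <- c(0.01, 0.02, 0.03, 0.04, 0.20)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate instead of the
# family-wise error rate, so it is usually less conservative.
p_bh <- p.adjust(p_raw, method = "BH")

# Compare how many tests remain significant at the 0.05 level.
n_sig_raw <- sum(p_raw <= 0.05)
n_sig_bonferroni <- sum(p_bonferroni <= 0.05)
n_sig_bh <- sum(p_bh <= 0.05)
```

With these made-up values, four tests pass the 0.05 threshold before adjustment. Far fewer survive the Bonferroni correction, which is the kind of trade-off worth inspecting on real data.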
Slides
Videos
7-1 Overview of Content (4 mins)
7-2 R Tips of the Day - The magic of !!parse_expr() (9 mins)
7-3 Converting R Notebooks into R Markdown Documents (26 mins)
7-4 Live Coding - Converting an R Notebook into an R Markdown Document (11 mins)
7-5 Converting R Markdown Documents into Interactive Shiny Documents (36 mins)
7-6 Live Coding - Converting an R Markdown Document into an Interactive Document (20 mins)
Slides
Videos
8-1 Unsupervised and Supervised Learning (13 mins)
8-2 Reinforcement Learning and ML vs Stats Terminology (9 mins)
8-3 Is a Predictive Model Needed and Should You Develop One? (15 mins)
8-4 Supervised Learning is a Curve-Fitting Exercise (23 mins)
8-5 Common Problems with Fitting and Applying Models (25 mins)
8-6 Step-by-Step Process for Training and Evaluating Models Using Tidymodels (29 mins)
Slides and content in this Module were developed by V.G. Vinod Vydiswaran, PhD. They are shared here with his permission.
Slides
Videos
9-1 What is Supervised Learning? (20 mins)
9-2 The Majority Baseline (8 mins)
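The majority baseline named in video 9-2 can be sketched in a few lines of base R. The outcome vector below is a hypothetical toy example, not a course dataset, and this is only one simple way to express the idea: always predict the most frequent class and see how accurate that is.

```r
# Hypothetical toy outcome labels (illustrative only).
outcome <- factor(c("no", "no", "yes", "no", "yes", "no", "no", "no"))

# The majority class is the most frequent label.
class_counts <- table(outcome)
majority_class <- names(class_counts)[which.max(class_counts)]

# A majority-baseline "model" predicts that class for every observation.
baseline_pred <- factor(rep(majority_class, length(outcome)),
                        levels = levels(outcome))

# Its accuracy equals the prevalence of the majority class.
baseline_accuracy <- mean(baseline_pred == outcome)
```

Any trained classifier should be compared against this number. On an imbalanced outcome, a model with high accuracy may be doing no better than the baseline.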
Slides
Videos
10-1 Review of Key ML Concepts and Performance Measures (15 mins)
10-2 The Missing Data Problem in a Nutshell (5 mins)
Slides and content in this Module were developed by V.G. Vinod Vydiswaran, PhD. They are shared here with his permission.
Slides
Videos
11-1 Support Vector Machines (22 mins)
11-2 Perceptrons and Neural Networks (14 mins)
Slides
None
Videos
None
Papers
Coming soon.
Slides
Videos
13-1 Reading in Text Data and Calculating Term Frequencies (27 mins)
13-2 Why Common Words are Not Useful (and How tf-idf Can Help) (25 mins)
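To make the term frequency and tf-idf ideas from videos 13-1 and 13-2 concrete, here is a minimal base R sketch. The three short "notes" are hypothetical, the whitespace tokenizer is deliberately naive, and the formulas use one common definition (tf as a within-document proportion, idf as the natural log of the number of documents divided by the number of documents containing the term). The videos may use a package and slightly different conventions.

```r
# Hypothetical toy documents (illustrative only).
docs <- c(
  note1 = "the patient reports chest pain",
  note2 = "the patient denies chest pain",
  note3 = "the patient reports a mild headache"
)

# Naive tokenization: lowercase, then split on whitespace.
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Term frequency: count of each term divided by the document's length.
# Rows are documents, columns are terms.
tf <- t(sapply(tokens, function(words) {
  table(factor(words, levels = vocab)) / length(words)
}))

# Inverse document frequency: log(total docs / docs containing the term).
doc_freq <- colSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# tf-idf: scale each term's column of tf by that term's idf.
tf_idf <- sweep(tf, 2, idf, `*`)
```

Words that appear in every document, such as "the" and "patient" here, get an idf of zero and drop out. A word that appears in only one note, such as "headache", receives the largest weight.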