LHS 610: Exploratory Data Analysis for Health

This work is licensed under a Creative Commons Attribution 4.0 International License.

Welcome to the course page for LHS 610: Exploratory Data Analysis for Health.

Real health data is complex and often unstructured. It can be inaccurate or inconsistent, contains missing values, and is organized for clinical care rather than to meet analytic needs. Learning from health data requires a solid grasp of data operations, data visualization, statistics, and machine learning, as well as an understanding of the ethical and legal frameworks that govern health data privacy and security. Students in this course will learn foundational topics in data science as applied to health data and will apply this knowledge to real health datasets through hands-on labs integrated into the lectures. The course is organized around two broad themes: (a) understanding health data, and (b) making inferences from data. Students will develop a systematic working knowledge of R, one of the most widely used languages for data science, and an introductory understanding of several packages useful for analyzing health data. They will also participate in a group project focused on answering a health-related question. After completing this course, students should be able to securely store a health dataset, summarize its structure, merge tables, visualize relationships, reshape and subset it to meet analytic needs, handle missing values, apply statistical and machine learning methods to build prediction models, and evaluate the performance of these models.

Course Materials

Tutorial page: http://rcode.run/lhs_610_tutorial

If the tutorial webpage is down, please let me know!

namcs08.RData -- National Ambulatory Medical Care Survey (NAMCS)

Module 11: Supervised Learning Algorithms 2

Slides and content in this Module were developed by V.G. Vinod Vydiswaran, PhD. They are shared here with his permission.

Module 11 Slides (Part 1)

Module 11 Slides (Part 2)

11-1 Support Vector Machines (22 mins)

11-2 Perceptrons and Neural Networks (14 mins)

11-3 Naive Bayes (28 mins)

11-4 Review of Supervised Learning Algorithms (8 mins)

11-5 Combining Classifiers (21 mins)

Module 12: Machine Learning in Clinical Practice

Coming soon.