Follow these Khan Academy modules to learn about variable relationships, distributions, and study design. Check out our Getting Started page for a review of categorical and quantitative variables.
- Modeling data distributions: Calculate percentiles and z-scores to describe where a value falls within a distribution.
- Exploring bivariate numerical data: Use scatter plots to explore the relationship between two quantitative variables.
- Statistical study design: Learn about different types of studies and their sampling and data collection methods.
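As a quick illustration of the percentile and z-score ideas from the first module, here is a minimal Python sketch. It assumes the data follow a normal model. The function names and the example numbers (mean 100, standard deviation 15, observed value 130) are made up for illustration and do not come from the Khan Academy material.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def normal_percentile(x, mean, sd):
    """Percentage of a normal(mean, sd) distribution that falls at or below x."""
    return NormalDist(mean, sd).cdf(x) * 100


# Hypothetical example values, chosen only for illustration.
z = z_score(130, mean=100, sd=15)
pct = normal_percentile(130, mean=100, sd=15)
print(f"z-score: {z:.2f}, percentile: {pct:.1f}")
```

A z-score of 2 means the value sits two standard deviations above the mean, which under a normal model places it near the 98th percentile.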
Probability and Sampling
These modules focus on probability topics and sampling distributions.
Learn how to test if your data support your hypotheses.
- Confidence intervals: Understand the purpose of confidence intervals for means and proportions.
- Hypothesis testing: Learn how to test for statistical significance.
- Confidence intervals and hypothesis testing for bivariates: Apply confidence intervals and significance tests to two-sample comparisons.
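To make the confidence interval and significance test ideas above concrete, here is a minimal Python sketch for a single proportion. It uses the large-sample normal approximation, which is one common approach and not necessarily the one each module teaches. The function names and the survey numbers (224 successes out of 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z_star = STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def proportion_z_test(successes, n, p_null):
    """Two-sided one-sample z test of H0: p = p_null."""
    p_hat = successes / n
    se_null = sqrt(p_null * (1 - p_null) / n)  # standard error assuming H0 is true
    z_stat = (p_hat - p_null) / se_null
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z_stat)))
    return z_stat, p_value


# Hypothetical survey: 224 of 400 respondents answered "yes".
low, high = proportion_ci(224, 400)
z_stat, p_value = proportion_z_test(224, 400, p_null=0.5)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
```

In this made-up example the interval excludes 0.5 and the p-value is below 0.05, so both approaches point to the same conclusion at the 5% significance level.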
Intermediate Data Science
Follow these IBM OpenDS4All modules to get started with data representation and machine learning. Each module contains lecture slides, sample code in Jupyter Notebook, and homework problems.
- Data Acquisition and Wrangling: Slides and Jupyter Notebook
- Data Representation and Modeling: Slides and Jupyter Notebook
- Unsupervised Machine Learning: Slides and Jupyter Notebook
- Supervised Machine Learning: Slides and Jupyter Notebook
Also check out the UC Berkeley course Data 8: Foundations of Data Science, which covers computational and programming skills, inferential thinking, and privacy and study design.