COVID Info Commons Data Science Project – Data Essentials & Hypothesis Building



Data Essentials & Hypothesis Building Project

Project Description

It is the first day of your new job at Acme Insurance, an American medical insurance company. You have been asked to do some preliminary research on the impact of Long COVID on healthcare in the U.S. This project has 10 Milestones, with several Tasks to complete along the way. It should take you no more than 40 hours to finish (about 3 hours per week during a 12-week semester).

Together, we’ll clean a Long COVID dataset, learn about the ethics of data science research, do some preliminary analysis, and develop a working hypothesis about the impact of Long COVID. We’ll build some simple data visualizations, test our hypotheses, write up a short conclusion, and prepare a brief presentation. Your project should prepare you to share your research with your colleagues (Milestone 10, optional). At the end of the project, you will submit a completed version of this document (Tasks are highlighted in green, Milestones in purple), a spreadsheet of your work in Excel or Google Sheets, and any visualization materials you develop. Refer to the CIC Slack Channel for project updates and to request support at any time.

Learn more about the COVID Information Commons (CIC) Student Working Group, including future projects and examples of submitted student projects, on the CIC website.


Dataset

CDC Long COVID Household Pulse Survey


Relevant Skills You May Apply

No prerequisites are needed to begin this data science project!


Skills You May Gain

Data cleaning, data analysis, data science ethics, data visualization, scientific communications and presentations, Excel (pivot tables, formulas, etc.), scientific research


Total Time

Approximately 40 hours


Milestones

Milestone 1: Preliminary Research
Milestone 2: Finding & Sourcing Quality Data
Milestone 3: Diving into the Documentation & Identifying Bias
Milestone 4: The Basics of Data Prep & Cleaning
Milestone 5: Exploratory Data Analysis
Milestone 6: Continued Analysis
Milestone 7: Testing our Hypothesis
Milestone 8: Summarizing our Conclusions
Milestone 9: Visualizing our Findings
Milestone 10: Sharing our Insights


Deliverables

Deliverables include a final data visualization, a written summary of your research insights, and an optional presentation of your findings.