About
The Data Science Resource Repository (DSRR) is a curated set of resources for learners and educators that promotes data science literacy. Data science literacy skills are crucial to the successful development of our economy and provide a strong knowledge base for the future of our workforce. The DSRR is made openly available on the Northeast Big Data Innovation Hub’s website with the aim of leveraging these best practices and resources to broadly increase data science capacity, with a specific focus on underserved communities.
Resources from the Big Data Hub’s Education + Data Literacy projects, as well as data science education materials available from across the big data ecosystem (including academia, government, not-for-profits, and industry), are included. Resources from the National Student Data Corps (NSDC), Data Science for All, Big Data for Education, and other projects from the Hub community, such as CUAHSI, are provided here.
The DSRR’s target ‘Learner’ audience is students of all ages, from high school through higher ed and adult learners. Similarly, our target ‘Educator’ audience is K-12 teachers and caregivers, undergraduate and community college educators, library professionals teaching data science to their community, and not-for-profits that teach STEM.
Please let us know of any openly available data science resources (English or Spanish) that should be added to the DSRR by filling out our Data Science Resources Submission form, which is also available in Spanish (complete este formulario en Español).
Data Science Resources For Educators
The resources for educators listed here include sample data science curricula and program materials for teaching data science to students of all ages, from K-12 through college and adult learners.
Data Science programs are increasingly common, yet many students do not have access to data science education, particularly in underserved communities. Community leaders in K-12, not-for-profit STEM academies, libraries, community colleges, and schools without data science programs can create curricula and content for their student population.
Data science education can begin early with K-12 learners. For students at this level, data science demonstrates how decisions are made and explains statistics and trends. Data science programs can raise young students’ awareness of what data science is and how to learn about the field, and can prompt interest in jobs in this area.
Elementary School Educator Data Science Resources
An NSF-funded project with the New York Hall of Science (NYSCI), Big Data for Little Kids teaches data science concepts to children ages 4 to 8. NYSCI developed this program around its Design, Make, Play approach to learning. This activity guide follows NYSCI’s Big Data for Little Kids workshop series “Museum Makers: Designing With Data,” and includes a glossary of the recurrent terms and resources used throughout the program.
To assist educators in supporting students’ data science learning, Data4Kids created five data stories educators can freely access and modify for their own uses and students’ experiences. Each story is a starter kit for educators at different levels: grades 3–5 (band 1), grades 6–8 (band 2), and grades 9–12 (band 3).
This is a great tool to teach the basics of machine learning to kids via hands-on experiences. Check out this easy-to-use guided environment for training machine learning models to recognize text, numbers, images, or sounds.
Teach data science to kids from Pre-K to 8th grade!
High School Educator Data Science Resources
International Data Science in Schools Project (IDSSP)
The focus of the IDSSP project is to provide teachers with methodologies for students’ data science education. Phase One of the curriculum has been released and approved by the IDSSP’s Advisory Group.
Data Science for High School Computer Science
A group of 41 data science experts gathered to identify successes, challenges, and solutions to improving the teaching of data science. Special focus was paid to high school data science education. Learn more about the challenges in data science education and find resources and solutions for your classroom by reading the findings of this workshop.
Use a high school-level resource for teaching data science!
Institute for Data-Driven Dynamical Design (ID4)
NSF Institute for Data-Driven Dynamical Design (ID4) aims to transform how scientists and engineers harness data when designing materials and structures.
College Educator Data Science Resources
City University of New York Bachelor of Science in Data Science
Design a data science program for undergraduates with the City University of New York’s Bachelor of Science in Data Science curriculum.
Data 8: Foundations of Data Science
Learn the core concepts of inference and computing while working with real data, including economic, geographic and social networking statistics. This course by University of California, Berkeley was designed for students who have not taken any Statistics or Computer Science courses.
Data 100: Principles and Techniques of Data Science
Explore key data science concepts, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. This class from the University of California, Berkeley bridges the space between the Data 8 course noted above and upper division computer science and statistics courses.
The Missing Semester of Your Computer Science Education
Learn about advanced computer science topics, from operating systems to machine learning. Master the command-line, use a powerful text editor, use fancy features of version control systems, and much more, with this series of classes developed by MIT.
Learn about the Unix shell, version control with Git, and programming in Python or R.
Learn about data organization, cleanup, analysis, and visualization. These data science lessons provide information about genomics, ecology, social science and more, including tools such as Python and R.
Learn how to apply software development and data science concepts to library contexts.
Undergraduate and Graduate Responsible Data Science Courses at NYU CDS
Explore responsible data science lectures with this technical course which tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.
This Columbia course integrates the teaching of algorithms and data manipulation with the political whirlwinds and ethical controversies from which those techniques emerged.
DSC 101: Introduction to Data Science
Created and taught by NSDC Founding Committee member Ajay Anand, this introductory course was adapted from the Data 8 course.
Institute for Data-Driven Dynamical Design (ID4)
NSF Institute for Data-Driven Dynamical Design (ID4) aims to transform how scientists and engineers harness data when designing materials and structures.
Data Science Educator Resources for All Levels
Assessing Quantitative Reasoning
Assessing student learning is a vital component of any initiative to improve students’ QR/QL skills. The purpose of assessment is to gather information on the effectiveness of pedagogy to inform and/or improve one’s teaching.
Watch a collection of lectures on the ethical implications of data and artificial intelligence.
Best Practices for Quantitative Reasoning Instruction
Explore several pedagogical approaches that are especially important for teaching quantitative reasoning.
Learn how to think through and teach the ethics surrounding privacy, data sharing, and algorithmic decision-making.
Internet Exercises and Modules for Teaching Quantitative Reasoning Skills
There are a variety of collections of exercises and modules publicly available on the Internet that users may borrow or adapt for use in their own instruction (please obtain permission when necessary and/or provide credit to the original author(s) for use of these materials).
Internet Resources for Data Analysis
There are a variety of excellent resources on the Internet for active learning with data analysis.
Investing in America’s Data Science and Analytics Talent
This joint report from the Business-Higher Education Forum and PwC proposes eight actions for change to bring the supply of skills into balance with demand.
Microsoft Tools and Resources for Teachers and Educators
Create learning environments that empower students to be independent and creative learners and build skills in reading, language, and STEM.
Explore IBM’s Open DS4All to access over 15 modules on GitHub with instructor notes and slide decks that can be used to build an academic program focused on data science.
Quantitative Reasoning Learning Goals
Articulating learning goals or outcomes is an important step in effective QR instruction.
Splunk for Good Product Donation Program
Splunk provides a one-year, 10GB license for Splunk Enterprise to qualifying nonprofits at no cost with free support to get you up and running.
Get all the benefits of Splunk, deployed in a secure, reliable and scalable service.
Teaching Materials for Mathematical, QR and Statistical Skills
There is a wide array of resources on the Internet for strengthening fundamental quantitative, mathematical, statistical, and computer skills.
Binghamton University MS in Data Analytics
This program is a collaboration across STEM, business, and engineering departments. Students learn how to analyze data and work on projects tackling real-world problems.
CUNY City Tech BS in Data Science
This program synthesizes applied mathematics, high-performance computing, and data management and analysis for a well-rounded data science education.
This program enables students to learn about global challenges and collaborate on projects to address them, which can be data- and technology-driven.
Queensborough Community College Data Science Lab for Undergraduate Research
This lab engages undergraduate students in data science-related training and research activities.
Data Science Resources For Learners
If you are a Learner (a student, researcher, or someone interested in learning data science), these resources are for you. From lessons to datasets, challenges, and fellowships, this list can get you started on your data science journey or give you the skill boost you are looking for.
Learn Data Science
These lessons cover a wide range of data science topics and are presented by top universities and organizations. From beginners to experts, there is material for everyone to explore data science and take their skills to the next level.
Data 8: Foundations of Data Science
Learn the core concepts of inference and computing while working with real data, including economic, geographic and social networking statistics. This course by University of California, Berkeley was designed for students who have not taken any Statistics or Computer Science courses.
Data 100: Principles and Techniques of Data Science
Explore key data science concepts, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. This class from the University of California, Berkeley bridges the space between the Data 8 course noted above and upper division computer science and statistics courses.
The Missing Semester of Your Computer Science Education
Learn about advanced computer science topics, from operating systems to machine learning. Master the command-line, use a powerful text editor, use fancy features of version control systems, and much more, with this series of classes developed by MIT.
Institute for Data-Driven Dynamical Design (ID4)
NSF Institute for Data-Driven Dynamical Design (ID4) aims to transform how scientists and engineers harness data when designing materials and structures.
Learn about the Unix shell, version control with Git, and programming in Python or R.
Learn about data organization, cleanup, analysis, and visualization. These data science lessons provide information about genomics, ecology, social science and more, including tools such as Python and R.
Learn how to apply software development and data science concepts to library contexts.
Cognitive Class with Data Science and AI
This is a great resource for learning about the latest developments in data science, AI, cloud, blockchain, Docker, Kubernetes, quantum computing, and more. Earn certificates and badges and learn industry skills. Learning paths cover big data fundamentals, data science fundamentals, data science with Python, blockchain for developers, containers, and reactive architecture foundations.
Learn about AI, cloud, cybersecurity, quantum, and more, while earning industry-recognized digital badges.
Get started with easy, step-by-step instructions and pre-written recipes to bring your own AI robot to life. TJBot is a great resource for learning, experimenting, and exploring AI using IBM Watson services.
Check out user guides and interactive demos about quantum principles. You can also create and run algorithms on real quantum computing hardware.
Undergraduate and Graduate Responsible Data Science Courses at NYU CDS
Explore responsible data science lectures with this technical course which tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.
Explore a multifaceted data science website that contains many resources, from competitions and datasets to courses.
The Open Source Data Science Masters
Explore a data science study guide!
Introduction to Computational Thinking and Data Science at MIT OCW
Take an undergraduate data science course offered at MIT!
Take a free machine learning course offered by Stanford!
Explore a free collection of data science courses!
Full Data Science Course for Beginners by freeCodeCamp.org
Take this beginner data science course at freeCodeCamp.org!
Scientific Computing with Python by freeCodeCamp.org
This beginner-level course by freeCodeCamp starts by teaching Python fundamentals like variables, loops, conditionals, and functions. Later in the course, you will delve into complex data structures, networking, relational databases, and data visualization, all of which are necessary data science skills.
Data Analysis with Python by freeCodeCamp.org
Learn the fundamentals of data analysis with Python. This certification course by freeCodeCamp covers how to read data from sources like CSV files and SQL databases, and how to use libraries like NumPy, pandas, Matplotlib, and Seaborn to process and visualize data.
Machine Learning with Python by freeCodeCamp.org
The Machine Learning course by freeCodeCamp teaches the use of the TensorFlow framework to build machine learning models, including neural networks and exploration of advanced techniques like natural language processing and reinforcement learning.
Learn the methods and strategies for using large-scale educational data to improve education and make discoveries about learning.
Data Science Study Guides
GitHub Data Science Best Resources
Become a well-rounded data scientist using this data science study guide!
Chris Engelhardt’s GitHub Data Sci Guide
Learn more about data science using this GitHub study guide.
Mathematics
Learn the fundamentals of the mathematics necessary for data science!
Differential Calculus | Khan Academy
Statistics and Probability | Khan Academy
Learn about types of data from a video produced by Ahmed Gomaa, Associate Professor at University of Scranton and NSDC member.
Python
These resources are for learning Python, an accessible and popular programming language used often for data science!
Learn Python from scratch with this YouTube series by CS Dojo!
Follow along and run Python code examples in your web browser!
Learn beginner and advanced Python topics with coding examples!
Plotting and programming in Python by Software Carpentry
Learn how to plot data and improve your programs!
Learn data structures and algorithms, data analysis, and machine learning in Python!
Become an expert at Data Science using Python!
Access a free, in-depth Python tutorial for data science.
Data Science Workshop Series – Introduction to Python
Are you interested in learning data science and modern computational tools for your research? If so, come join our Data Science Workshop Series on topics including: Python Programming, Machine Learning, Distributed Machine Learning for Big Data, and Deep Learning.
Python for Big Data – OARC Workshop
This workshop from Rutgers University covers techniques for working with big data in Python, including vectorization, parallelization, just-in-time compilation, and distributed task execution. Hands-on exercises that reinforce these techniques are provided.
This workshop from Rutgers University will review the basics of unsupervised and supervised methods, discuss some of the popular models such as Support Vector Machines, Decision Trees, and Random Forest, and show how these models work for classification and regression problems.
Deep Learning with Python and Keras
This workshop from Rutgers University focuses on learning the basics of Deep Learning with Python and Keras including data preparation, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN).
R
R is a programming language for statistical computing and graphics used widely in data science!
Watch this introductory video on R and RStudio!
Get started programming in R using RStudio!
This series breaks down R into 3 levels of difficulty!
Learn the basics and venture into machine learning!
R for Reproducible Scientific Analysis by Software Carpentry
Learn best practices for data wrangling, graphing, summarizing your results, and more!
SQL
SQL is used for performing operations on data stored in databases and is a must-learn for data scientists.
Watch these introductory videos created by NSDC volunteer Jingnan Qi on SQL (Structured Query Language), widely used for database management!
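As a small illustration of the kind of operation SQL performs on stored data, the sketch below uses Python's built-in sqlite3 module to create a table, insert a few rows, and compute a grouped average. The table name scores, its columns, and the sample rows are invented for this example and do not come from any of the resources listed here.

```python
import sqlite3


def average_score_by_course(rows):
    """Load (student, course, score) rows into an in-memory SQLite database
    and return the average score per course, ordered by course name."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
    try:
        # Hypothetical table and column names, chosen only for illustration.
        conn.execute("CREATE TABLE scores (student TEXT, course TEXT, score REAL)")
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT course, AVG(score) FROM scores GROUP BY course ORDER BY course"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data.
sample_rows = [
    ("Ana", "Python", 90.0),
    ("Ben", "Python", 80.0),
    ("Ana", "SQL", 70.0),
]
result = average_score_by_course(sample_rows)
print(result)
```

The same SELECT ... GROUP BY pattern carries over to other database systems, although details such as data types and connection setup differ from one system to another.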
Industry Skills
Use the resources below to polish your resume writing and interview skills. Explore these tools to gain relevant industry certifications and badges.
Microsoft Certifications and Exams
Earn certifications that show you are keeping pace with today’s technical roles and requirements. Select from nine job roles’ certification paths and master core concepts at your speed and on your schedule.
Azure Video Tutorials
Explore video tutorials to learn about Microsoft Azure cloud computing, machine learning and working with data in the Azure cloud environment.
This course discusses how to use the search and navigation functions in Splunk, use fields, get statistics from your data, and create reports, dashboards, lookups, and alerts. Scenario-based examples and hands-on challenges will enable you to create robust searches, reports, and charts. You will also be introduced to Splunk’s datasets features and Pivot interface.
Splunk Infrastructure Overview
This self-paced course gives users an overview of the Splunk Enterprise infrastructure. Users get a high-level look at how to grow a Splunk deployment from a single instance to a distributed environment. The course provides tips and best practices for deploying, extending and integrating Splunk and highlights what is happening behind the scenes.
Splunk User Behavior Analytics
Learn at your own pace through this free eLearning course that covers Splunk’s User Behavior Analytics interface. Explore how the User Behavior Analytics team defines threats and discuss efficient steps to take when responding to these potential threats. Learn and practice accurately triaging false positives.
Introduction to Splunk IM and Splunk APM
This eLearning course targets Ops, SREs, and observability teams. It provides practical applications for Splunk Infrastructure Monitoring and Splunk APM. Learn to navigate the user interface and monitor your infrastructure using out-of-the-box functionality.
Deliver apps and integrations that bring new kinds of data into the Splunk platform and deliver data-based insights which enable users to investigate, monitor, analyze, and make better decisions. Use Splunk Enterprise free for six months while you develop your app with their powerful Software Development Kits and helpful online documentation.
Get all the benefits of Splunk, deployed in a secure, reliable and scalable service.
Easily aggregate, analyze, and get answers from your data with Splunk Enterprise.
IBM Global University Program Awards
Learn about award programs designed to support a spectrum of university needs, including the highly competitive IBM PhD Fellowship Awards, which provide funding for PhD students in the final years of their program; the IBM Masters Fellowship Awards, which provide funding for master’s students; and the IBM Academic Awards program, which provides broad support for faculty in emerging science and technology fields.
Challenges are a great way for students to apply their data science knowledge to large-scale problems, connect with like-minded students, and push themselves to explore additional materials. Some challenges are ongoing. You can also create your own challenges, basing your project on past examples.
To let you match wits with some of the best minds in IBM Research, IBM posts monthly programming problems along with hints and solutions. Competitions cover a wide range of topics, from rock paper scissors to vaccinating robots and machine learning. Participate in this month’s challenge or check out results from previous challenges, beginning from 1998.
ASSISTments Longitudinal Data Mining Competition
For an example of a competition where data miners tried to predict an important longitudinal outcome using real-world educational data, refer to this competition. This can be useful in organizing your own competition, class, or workshop exercise. The findings of this study show how choices at a young age affect students’ career trajectories.
Use the data sets below to jump-start or enhance your research.
IEEE Dataport provides freely available Open Access datasets; subscribe for full access.
Learn how open data can generate more value and better outcomes; access tools and resources to advance data collaboration, explore projects using over 30 open data repositories and data sharing models for societal benefit, and find links to resources that support open data sharing.
Explore free public data published by New York City agencies and other partners.
Explore publicly available data products from the Boston Area Research Initiative (BARI) projects, published to foster policy and research collaborations.
Explore datasets on issues of finance, city services, public safety, the environment, and more from Boston’s open data hub!
UCI Machine Learning Repository
Explore the University of California, Irvine’s repository of over 550 datasets on diverse topics!
Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI), Inc.
Access thousands of hydrologic, biogeochemical, and geographical data sets from US federal agencies, university projects, and community science monitoring.
Explore an online hub for developers and data scientists to find free and open data sets under open data licenses on weather, airlines, finance and more.
Stanford Large Network Dataset Collection
Explore free datasets offered by Stanford!
Explore a large collection of free datasets offered by The World Bank!
Access awesome public datasets!
The People’s Speech Dataset is among the world’s largest English speech recognition corpora licensed for academic and commercial usage under CC-BY-SA and CC-BY 4.0.
Use Cases can give researchers ideas about how to use data sets in their own work.
Access more than 100 open-source programs, a library of knowledge resources, developer advocates and supporters, and a global community of developers. The use cases cover many topics, including AI, analytics, Node.js, blockchain, and Java.
Explore IBM’s key open-source projects covering the cloud, data science and analytics, containers, blockchain, quantum, cognitive and artificial intelligence, security, IoT and more.
Multimedia
Learn data science and its real-world applications as you listen to a popular data science podcast!
Learn complex topics easily and quickly by listening to this data science podcast.
Grasp machine learning concepts through a podcast!
Listen to an industry professional talk about data science!
Learn about data science news in the industry through a podcast!
The Data, Responsibly Comic Series
Learn about data ethics and fairness in this sophisticated virtual comic book series from the Framework for Integrative Equity Systems Institute. Available in English, Spanish and French, these virtual comic books explore equity issues in data science systems.
A Gentle Introduction to Data Science
Begin your Data Science journey by listening to a talk by Marc Garcia!
Explore professional, visually appealing data science articles.
COVID-19 Resources
While COVID-19 has been an overwhelming challenge for the whole world, advances in data science and the work of dedicated researchers help us understand aspects of the pandemic, navigate this new reality, and alleviate its effects.
COVID Information Commons
For COVID-19 resources, visit the COVID Information Commons (CIC), funded by NSF COVID RAPID Award #2028999, to search 990 NSF COVID awards, all featuring data and science.
- COVID-19-related datasets
- COVID-19 research funding opportunities
- COVID-19 organizations and networks
Boston COVID Data Stories
Read the Living in Boston During COVID reports, each covering a different topic: Inequities in Navigating a Pandemic; Fear and Ambivalence; Economic Impact; Lifestyle, Ideology, Context Drive Attitudes; Vaccination Planning and Hesitancy; Physical and Mental Health; and The Inequitable Consequences of Vaccination Intentions.
Daily United States COVID-19 Testing and Outcomes Data By State
The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Its dataset was used by national and local news organizations across the United States and by research projects and agencies worldwide.