Data Science Resource Repository


About

The Data Science Resource Repository (DSRR) is a curated set of resources for learners and educators that promotes data science literacy. Data science literacy skills are crucial to the development of our economy and provide a strong knowledge base for our future workforce. The DSRR is openly available on the Northeast Big Data Innovation Hub’s website. Its aim is to leverage these resources and best practices to increase data science capacity broadly, with a specific focus on underserved communities.

The DSRR includes resources from the Big Data Hub’s Education + Data Literacy projects. It also includes data science education materials from across the big data ecosystem, including academia, government, not-for-profits, and industry. Resources from the National Student Data Corps, Data Science for All, Big Data for Education, and other Hub community projects such as CUAHSI are provided here.

The DSRR’s target ‘Learner’ audience is students of all ages, from high school through higher education, as well as adult learners. Similarly, our target ‘Educator’ audience is K-12 teachers and caregivers, undergraduate and community college educators, library professionals teaching data science to their communities, and not-for-profits that teach STEM.

Please let us know if you are aware of any openly available data science resources that should be added to the DSRR. Email us at contact@nebigdatahub.org, or fill out our online submission form.

Data Science Resources For Educators

The Resources for Educators listed here include sample data science curricula and program materials for teaching data science to students of all ages, from K-12 through college and adult learners.

Data science programs are increasingly common, yet many students do not have access to data science education, particularly in underserved communities. Community leaders in K-12 schools, not-for-profit STEM academies, libraries, community colleges, and schools without data science programs can use these resources to create curricula and content for their student populations.

Data science education can begin early with K-12 learners. At this level, data science can show students how data informs decisions and help them understand statistics and trends. Data science programs can raise young students’ awareness of what data science is and how to learn about the field, and can spark interest in careers in this area.

Elementary School Educator Data Science Resources

Big Data for Little Kids

An NSF-funded project with the New York Hall of Science (NYSCI), Big Data for Little Kids teaches data science concepts to children ages 4 to 8. NYSCI developed this program around its Design, Make, Play approach to learning. This activity guide follows NYSCI’s Big Data for Little Kids workshop series, “Museum Makers: Designing With Data,” and includes a glossary of recurring terms and the resources used throughout the program.

Machine Learning for Kids

This is a great tool to teach the basics of machine learning to kids via hands-on experiences. Check out this easy-to-use guided environment for training machine learning models to recognize text, numbers, images, or sounds.

CodeMonkey

Teach data science to kids from Pre-K to 8th grade!

High School Educator Data Science Resources

International Data Science in Schools Project (IDSSP)

The IDSSP aims to provide teachers with methodologies for teaching data science to students. Phase One of the curriculum has been approved by the IDSSP’s Advisory Group and released.

Data Science for High School Computer Science

A group of 41 data science experts gathered to identify successes, challenges, and solutions for improving the teaching of data science, with special focus on high school data science education. Read the findings of this workshop to learn more about the challenges in data science education and to find resources and solutions for your classroom.

CheckiO

Use a high school-level resource for teaching data science! 

College Educator Data Science Resources

City University of New York Bachelor of Science in Data Science

Design a data science program for undergraduates with the City University of New York’s Bachelor of Science in Data Science curriculum.

Data 8: Foundations of Data Science

Learn the core concepts of inference and computing while working with real data, including economic, geographic, and social networking statistics. This course from the University of California, Berkeley was designed for students who have not taken any statistics or computer science courses.

Data 100: Principles and Techniques of Data Science

Explore key data science concepts, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. This class from the University of California, Berkeley bridges the gap between the Data 8 course noted above and upper-division computer science and statistics courses.

The Missing Semester of Your Computer Science Education

Classes teach advanced computer science topics, from operating systems to machine learning, but rarely teach proficiency with everyday tools. This series of classes developed at MIT fills that gap: master the command line, use a powerful text editor, use advanced features of version control systems, and much more.

Software Carpentry

Learn about the Unix shell, version control with Git, and programming in Python or R.

Data Carpentry

Learn about data organization, cleanup, analysis, and visualization. These lessons cover domains such as genomics, ecology, and social science, and use tools such as Python and R.

Library Carpentry

Learn how to apply software development and data science concepts to library contexts.

Undergraduate and Graduate Responsible Data Science Courses at NYU CDS

Explore lectures from these technical courses on responsible data science, which tackle ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.

Data: Past, Present, and Future

This Columbia course integrates the teaching of algorithms and data manipulation with the political whirlwinds and ethical controversies from which those techniques emerged.

DSC 101: Introduction to Data Science

Created and taught by NSDC Founding Committee member Ajay Anand, this introductory course was adapted from the Data 8 course.

Data Science Educator Resources for All Levels

Open DS4All

Explore IBM’s Open DS4All to access over 15 modules on GitHub, complete with instructor notes and slide decks, for building an academic program focused on data science.

Microsoft Tools and Resources for Teachers and Educators

Create learning environments that empower students to be independent and creative learners and build skills in reading, language, and STEM.

Data Science Ethics

Learn how to think through and teach the ethics surrounding privacy, data sharing, and algorithmic decision-making.

AI Ethics

Watch a collection of lectures on the ethical implications of data and artificial intelligence.

Splunk for Good Product Donation Program

Splunk provides a one-year, 10 GB license for Splunk Enterprise to qualifying nonprofits at no cost, with free support to get you up and running.

  1. Nonprofits
  2. Workforce Development
  3. Academic Instruction
  4. Research Institutes

Splunk Cloud Trial

Get all the benefits of Splunk, deployed in a secure, reliable and scalable service.

Binghamton University MS in Data Analytics

This program is a collaboration across STEM, business, and engineering departments. Students learn how to analyze data and work on projects tackling real-world problems.

CUNY City Tech BS in Data Science

This program synthesizes applied mathematics, high-performance computing, and data management and analysis for a well-rounded data science education.

Global Columbia Collaboratory

This program enables students to learn about global challenges and to collaborate on projects that address them; these projects can be data- and technology-driven.

Queensborough Community College Data Science Lab for Undergraduate Research

This lab engages undergraduate students in data science-related training and research activities.

Data Science Resources For Learners

If you are a Learner (a student, researcher, or someone interested in learning data science), these resources are for you. From lessons to datasets, challenges, and fellowships, this list can get you started on your data science journey or give you the skill boost you are looking for.

Learn Data Science

These lessons cover a wide range of data science topics and are offered by leading universities and organizations. Whether you are a beginner or an expert, there is material here to help you explore data science and take your skills to the next level.

Data 8: Foundations of Data Science

Learn the core concepts of inference and computing while working with real data, including economic, geographic, and social networking statistics. This course from the University of California, Berkeley was designed for students who have not taken any statistics or computer science courses.

Data 100: Principles and Techniques of Data Science

Explore key data science concepts, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. This class from the University of California, Berkeley bridges the gap between the Data 8 course noted above and upper-division computer science and statistics courses.

The Missing Semester of Your Computer Science Education

Classes teach advanced computer science topics, from operating systems to machine learning, but rarely teach proficiency with everyday tools. This series of classes developed at MIT fills that gap: master the command line, use a powerful text editor, use advanced features of version control systems, and much more.

Software Carpentry

Learn about the Unix shell, version control with Git, and programming in Python or R.

Data Carpentry

Learn about data organization, cleanup, analysis, and visualization. These lessons cover domains such as genomics, ecology, and social science, and use tools such as Python and R.

Library Carpentry

Learn how to apply software development and data science concepts to library contexts.

Cognitive Class with Data Science and AI

This is a great resource for learning about the latest developments in data science, AI, cloud, blockchain, Docker, Kubernetes, quantum computing, and more. Earn certificates and badges while you learn industry skills. Learning paths cover big data fundamentals, data science fundamentals, data science with Python, blockchain for developers, containers, and reactive architecture foundations.

Open P-TECH

Learn about AI, cloud, cybersecurity, quantum, and more, while earning industry-recognized digital badges.

AI Robot – TJBot

Get started with easy, step-by-step instructions and pre-written recipes to bring your own AI robot to life. TJBot is a great resource for learning, experimenting, and exploring AI using IBM Watson services.

Quantum Computing

Check out user guides and interactive demos about quantum principles. You can also create and run algorithms on real quantum computing hardware.

Undergraduate and Graduate Responsible Data Science Courses at NYU CDS

Explore lectures from these technical courses on responsible data science, which tackle ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.

Kaggle

Explore a multifaceted data science website with resources ranging from competitions and datasets to courses.

The Open Source Data Science Masters

Explore a data science study guide!

Introduction to Computational Thinking and Data Science at MIT OCW

Take an undergraduate data science course offered at MIT!

Machine Learning by Andrew Ng

Take a free machine learning course offered by Stanford!

Analytics Vidhya

Explore a free collection of data science courses!

Full Data Science Course for Beginners by freeCodeCamp.org

Take this beginner data science course at freeCodeCamp.org!

Data Science Study Guides

GitHub Data Science Best Resources

Become a well-rounded data scientist using this data science study guide!

Chris Engelhardt’s GitHub Data Sci Guide

Learn more about data science using this GitHub study guide.

Mathematics

Learn the fundamentals of the mathematics necessary for data science!

Arithmetic | Khan Academy

Algebra Basics | Khan Academy

Precalculus | Khan Academy

Differential Calculus | Khan Academy 

Linear Algebra | Khan Academy

Statistics and Probability | Khan Academy

Types of Data

Learn about types of data from a video produced by Ahmed Gomaa, Associate Professor at the University of Scranton and NSDC member.

Python

These resources are for learning Python, an accessible and popular programming language often used for data science!

Python for Beginners

Learn Python from scratch with this YouTube series by CS Dojo!

learnpython.org

Follow along and run Python code examples in your web browser!

Jobtensor Python Introduction

Learn beginner and advanced Python topics with coding examples!

Plotting and programming in Python by Software Carpentry 

Learn how to plot data and improve your programs!

Jovian.ai 

Learn data structures and algorithms, data analysis, and machine learning in Python!

Chris Albon

Become an expert at Data Science using Python!

Learn Data Science

Access a free, in-depth Python tutorial for data science.

R

R is a programming language for statistical computing and graphics used widely in data science!

R programming for beginners

Watch this introductory video on R and RStudio!

Install RStudio

Get started programming in R using RStudio!

DataFlair R tutorials

This series breaks down R into three levels of difficulty!

Guru99 R tutorials

Learn the basics and venture into machine learning!

R for Reproducible Scientific Analysis by Software Carpentry

Learn best practices for data wrangling, graphing, summarizing your results, and more!

SQL

SQL is used to query and manipulate data stored in databases and is a must-learn for data scientists.

NSDC Video Library

Watch these introductory videos, created by NSDC volunteer Jingnan Qi, on SQL (Structured Query Language), which is widely used for database management!

Industry Skills

Use the resources below to polish your resume writing and interview skills, and explore tools for earning relevant industry certifications and badges.

Microsoft Certifications and Exams

Earn certifications that show you are keeping pace with today’s technical roles and requirements. Choose from certification paths for nine job roles and master core concepts at your own pace and on your own schedule.

Azure Video Tutorials

Explore video tutorials to learn about Microsoft Azure cloud computing, machine learning and working with data in the Azure cloud environment.

Splunk Fundamentals 1

This course covers how to search and navigate in Splunk, use fields, get statistics from your data, and create reports, dashboards, lookups, and alerts. Scenario-based examples and hands-on challenges will enable you to create robust searches, reports, and charts. You will also be introduced to Splunk datasets and the Pivot interface.

Splunk Infrastructure Overview

This self-paced course gives users an overview of the Splunk Enterprise infrastructure. Users get a high-level look at how to grow a Splunk deployment from a single instance to a distributed environment. The course provides tips and best practices for deploying, extending and integrating Splunk and highlights what is happening behind the scenes.

Splunk User Behavior Analytics

Learn at your own pace through this free eLearning course that covers Splunk’s User Behavior Analytics interface. Explore how the User Behavior Analytics team defines threats and discuss efficient steps to take when responding to these potential threats. Learn and practice accurately triaging false positives.

Introduction to Splunk IM and Splunk APM

This eLearning course targets Ops teams, SREs, and observability teams. It provides practical applications of Splunk Infrastructure Monitoring and Splunk APM. Learn to navigate the user interface and monitor your infrastructure using out-of-the-box functionality.

Splunk Developer Program

Deliver apps and integrations that bring new kinds of data into the Splunk platform and deliver data-based insights that enable users to investigate, monitor, analyze, and make better decisions. Use Splunk Enterprise free for six months while you develop your app with Splunk’s Software Development Kits (SDKs) and online documentation.

Splunk Cloud Trial

Get all the benefits of Splunk, deployed in a secure, reliable and scalable service.

Splunk Enterprise Trial

Easily aggregate, analyze, and get answers from your data with Splunk Enterprise.

IBM Global University Program Awards

Learn about award programs designed to support a spectrum of university needs, including the highly competitive IBM PhD Fellowship Awards, which provide funding for PhD students in the final years of their programs; the IBM Masters Fellowship Awards, which provide funding for master’s students; and the IBM Academic Awards program, which provides broad support for faculty in emerging science and technology fields.

Challenges

Challenges are a great way for students to apply their data science knowledge to large-scale problems, connect with like-minded students, and push themselves to explore additional materials. Some challenges are ongoing. You can also create your own challenges based on past examples.

Ponder This

IBM Research posts monthly programming problems, along with hints and solutions, inviting participants to match wits with some of the best minds at IBM Research. The problems cover a wide range of topics, from rock-paper-scissors to vaccinating robots and machine learning. Participate in this month’s challenge or browse results from previous challenges dating back to 1998.

ASSISTments Longitudinal Data Mining Competition

In this competition, data miners tried to predict an important longitudinal outcome using real-world educational data. It can serve as a model for organizing your own competition, class, or workshop exercise. The findings of this study show how choices made at a young age affect students’ career trajectories.

Data Sets

Use the data sets below to jump-start or enhance your research.

IEEE DataPort Open Data Sets

IEEE DataPort provides freely available Open Access datasets; subscribe for full access.

Microsoft Open Data Campaign

Learn how open data can generate more value and better outcomes; access tools and resources to advance data collaboration, explore projects using over 30 open data repositories and data sharing models for societal benefit, and find links to resources that support open data sharing.

NYC Open Data

Explore free public data published by New York City agencies and other partners.

Boston Data Portal

Explore publicly available data products from the Boston Area Research Initiative (BARI) projects, published to foster policy and research collaborations.

Analyze Boston

Explore datasets on issues of finance, city services, public safety, the environment, and more from Boston’s open data hub!

UCI Machine Learning Repository

Explore the University of California, Irvine’s repository of over 550 datasets on diverse topics!

Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI), Inc.

Access thousands of hydrologic, biogeochemical, and geographical data sets from US federal agencies, university projects, and community science monitoring.

Data Asset eXchange

Explore an online hub for developers and data scientists to find free and open data sets under open data licenses on weather, airlines, finance and more.

Stanford Large Network Dataset Collection

Explore free large network datasets offered by Stanford!

DataBank

Explore a large collection of free datasets offered by the World Bank!

Awesome Public Datasets

Access a curated, topic-organized list of high-quality public datasets!

Use Cases

Use cases can give researchers ideas about how to incorporate data sets into their own work.

IBM Use Cases

Access more than 100 open-source programs, a library of knowledge resources, developer advocates and supporters, and a global community of developers. The use cases cover many topics, including AI, analytics, Node.js, blockchain, and Java.

World of Open Source

Explore IBM’s key open-source projects covering the cloud, data science and analytics, containers, blockchain, quantum, cognitive and artificial intelligence, security, IoT and more.

Multimedia

Data Skeptic

Learn data science and its real-world applications as you listen to a popular data science podcast!

Linear Digressions

Learn complex topics easily and quickly by listening to this data science podcast.

Talking Machines

Grasp machine learning concepts through a podcast!

O’Reilly Data Show

Listen to an industry professional talk about data science!

Not So Standard Deviations

Learn about data science news in the industry through a podcast!

The Data, Responsibly Comic Series

Learn about data ethics and fairness in this sophisticated virtual comic book series from the Framework for Integrative Equity Systems Institute. Available in English, Spanish and French, these virtual comic books explore equity issues in data science systems.

CSS Diner

Learn CSS by playing a game!

 

CodinGame

Learn any programming language by playing a coding game!

A Gentle Introduction to Data Science

Begin your Data Science journey by listening to a talk by Marc Garcia!

Distill

Explore professional, visually appealing data science articles.

COVID-19 Resources

COVID-19 has been an overwhelming challenge for the whole world, but advances in data science and the work of dedicated researchers help us understand aspects of the pandemic, navigate this new reality, and work to alleviate its effects.

COVID Information Commons

For COVID-19 resources, visit the COVID Information Commons (CIC), funded by NSF COVID RAPID Award #2028999, to search 990 NSF COVID awards, all featuring data and science.

Boston COVID Data Stories

Read the Living in Boston During COVID reports, each covering a different topic: Inequities in Navigating a Pandemic; Fear and Ambivalence; Economic Impact; Lifestyle, Ideology, Context Drive Attitudes; Vaccination Planning and Hesitancy; Physical and Mental Health; and The Inequitable Consequences of Vaccination Intentions.

Daily United States COVID-19 Testing and Outcomes Data By State

The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Its dataset was used by national and local news organizations across the United States and by research projects and agencies worldwide.