Data Science Resource Repository


About

The Data Science Resource Repository (DSRR) provides a curated, open set of resources that help learners and educators build data science literacy, an important skill for the future economy and a strong base for workforce development. The DSRR is openly available on the Northeast Hub website so that these best practices and resources can be used to increase data science capacity more broadly, with a specific focus on underserved communities.

The DSRR includes resources from the Big Data Hub’s Education + Data Literacy projects, as well as data science education materials from across the big data ecosystem, including academia, government, not-for-profits, and industry. These include resources from the Northeast Student Data Corps, Data Science for All, and Big Data for Education projects, and from other Hub community projects such as CUAHSI, which provides data sets and resources for learning data science for hydrologic analysis.

The DSRR target audience for learners includes students from high school through higher ed, as well as adult learners. The DSRR target audience for educators includes K-12 teachers and caregivers, undergraduate and community college educators, library professionals who wish to teach data science to their community members, and not-for-profits that teach STEM and other skills and want to add data science awareness and literacy to their offerings.

Please let us know if you are aware of openly available data science resources for students and educators that we could add to the Northeast Hub DSRR. Email us at contact@nebigdatahub.org, or click here.

Data Science Resources For Educators

Resources for educators listed here include sample data science curricula and artifacts from workshops and programs that teach data science to all ages, from K-12 through college and adult learners. If you have data science educational resource examples to share on the Hub website, please email us at contact@nebigdatahub.org, or click here.

Data science programs are growing, yet many students still lack access to data science education, especially in underserved communities. Community leaders in K-12, not-for-profit STEM academies, libraries, community colleges, and schools without data science programs can begin to create curriculum, content, and pedagogy to increase data science literacy in their student population and community.

Data science awareness can start with K-12 activities that teach learners about data and how it helps us understand more and make better decisions, from interpreting COVID-19 statistics and trends to analyzing sports data to determine the characteristics of the best baseball teams in history. There is more opportunity to create data science programs, beginning with awareness of questions such as “What is data science?”, “How do I learn about data science?”, and “Why would I want to learn data science, and what type of job could I get in the future?”

The Northeast Hub is working with the higher ed, not-for-profit, industry, government, and K-12 communities to enable the development of data science programs for students of all ages. In the DSRR, we share examples of how to develop a data science curriculum.

Learn about the Northeast Student Data Corps, including content and pedagogy for undergraduate and high school programs and for adult learners. Draw on lessons learned from student challenges for high schoolers. Learn about the Big Data for Education MOOC (Massive Open Online Course). Participate in the COVID Information Commons Student Paper Challenge, with the first challenge in Q1 2021.


 

Elementary School Educator Data Science Resources

Big Data for Little Kids

In 2017, as part of a National Science Foundation-funded project, the New York Hall of Science (NYSCI) set out to teach Big Data concepts to children ages 4 to 8. NYSCI developed and piloted an after-school program for families that used the data cycle as a method of informed decision-making and embodied NYSCI’s Design, Make, Play approach to learning. This “Big Data for Little Kids” activity guide describes what took place during NYSCI’s Big Data for Little Kids workshop series, Museum Makers: Designing With Data. In addition to detailed outlines of the activities implemented during the program, this guide includes a glossary of recurrent terms and resources used throughout the workshops.

Use this guide as an example of how to develop programs that teach young students data science concepts through practical, fun activities.

 

Machine Learning for Kids

For a great way to teach the basics of machine learning to kids through hands-on experiences, check out this easy-to-use guided environment for training machine learning models to recognize text, numbers, images, or sounds.

 

High School Educator Data Science Resources

International Data Science in Schools Project (IDSSP)

The IDSSP gathered a team of computer scientists and statisticians to develop curriculum frameworks that prepare teachers from a variety of backgrounds to teach data science to students. Phase one of the curriculum has been released and approved by its Advisory Group.

 

Data Science for High School Computer Science

A group of 41 data science experts, practitioners, and researchers gathered to identify successes, challenges, and solutions for improving the use of data science in education, specifically in high school. Refer to the findings of this workshop to read about gaps in data science education, learn validated practices from researchers, and explore resources, challenges, solutions, and ideas for paving the data science path in high school settings.

 

College Educator Data Science Resources

City University of New York Bachelor of Science in Data Science

If you are looking to design a data science program for a bachelor’s degree or to teach your students college-level data science, refer to the City University of New York’s Bachelor of Science in Data Science curriculum.

 

Data 8: Foundations of Data Science

To learn the core concepts of inference and computing while working hands-on with real data, including economic data, geographic data, and social networks, check out this course from the University of California, Berkeley, designed specifically for students who have not previously taken any statistics or computer science courses.

 

Data 100: Principles and Techniques of Data Science

To explore key areas of data science, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making, check out this intermediate class developed by the University of California, Berkeley. It bridges Data 8 and upper-division computer science and statistics courses, as well as methods courses in other fields.

 

The Missing Semester of Your Computer Science Education

To master the command line, use a powerful text editor, use fancy features of version control systems, and much more, check out this series of classes developed by MIT. It covers the practical computing tools that classes on advanced CS topics, from operating systems to machine learning, rarely teach. The Data Wrangling class is particularly pertinent to data science.

 

Software Carpentry

Learn about the Unix shell, version control with Git, and programming in Python or R.

 

Data Carpentry

Learn about data organization, cleanup, analysis, and visualization in data science lessons for genomics, ecology, social science, and more, using tools such as Python and R.

 

Library Carpentry

Learn how to apply concepts of software development and data science to library contexts.

 

Undergraduate and Graduate Responsible Data Science Courses at NYU CDS

Explore lecture material on responsible data science through a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection. 

 

Data Science Educator Resources for All Levels

Open DS4All

Explore IBM’s Open DS4All to access over 15 modules on GitHub with instructor notes and slide decks that can be used to build an academic program focused on data science.

 

Microsoft Tools and Resources for Teachers and Educators

Create learning environments that empower students to be independent and creative learners; build skills in reading, language, and STEM; and prepare them for their futures.

 

Data Science Ethics

To learn how to think through the ethics surrounding privacy, data sharing, and algorithmic decision-making, and how to teach these topics to students, check out this course on data science ethics.

 

AI Ethics

Watch a collection of lectures on the ethical implications of data and artificial intelligence from different perspectives.

 

The Data, Responsibly Comic Series

Learn about data ethics and fairness in this sophisticated virtual comic book series from the Framework for Integrative Equity Systems Institute. Available in English, Spanish, and French, these virtual comic books, titled “Mirror, Mirror” and “Fairness and Friends”, dive into how to handle equity issues in data science systems.

Challenges are a great way for students to apply their data science knowledge to large-scale problems, connect with like-minded students, and push themselves to explore material beyond their curricula. Some of the challenges below are ongoing; others are useful as samples if you want to create your own challenge, prepare for an upcoming one, or learn from past data science challenges and their results.

 

Ponder This

To let you match wits with some of the best minds in IBM Research, IBM posts monthly programming problems along with hints and solutions. Challenges cover a wide range of topics, from rock paper scissors to vaccinating robots, and touch on different aspects of programming, including machine learning. Explore this link to participate in this month’s challenge or check out results from previous challenges, which have run since 1998.

 

ASSISTments Longitudinal Data Mining Competition

This competition is an example in which data miners tried to predict an important longitudinal outcome using real-world educational data. It can be useful as a model for organizing your own competition or as a class or workshop exercise. Beyond the application of data science, the findings of this study also reveal how students’ choices at a young age affect their career choices and their tendency to pursue data science careers.

While COVID-19 has been an overwhelming challenge for the whole world, advances in data science and the work of dedicated researchers help us understand aspects of the pandemic, navigate this new reality, and alleviate its effects.

For COVID-19 resources, go to the COVID Information Commons (CIC), funded by NSF COVID RAPID Award #2028999, to search 990 NSF COVID awards, all featuring data and science. Browse the research, global datasets, research funding opportunities, and organizations and networks using data and data science to unlock mysteries of COVID and address the global pandemic.

The COVID Info Commons – NSF Awards and PI database lets you search by keyword for research you wish to learn from and for researchers you might collaborate with. For instance, you can search for “machine learning” to find the 96 NSF awards mentioning machine learning, and learn about the research and how to apply machine learning. Clear the filter and enter “artificial intelligence” to find the 38 NSF-funded COVID research projects involving artificial intelligence. Other resources include:

Data Science Resources For Learners

If you are a learner, whether a student, a researcher, or someone interested in learning data science from scratch or exploring certain aspects of it, these resources are for you. From lessons to datasets, challenges, and fellowships, this list will get you started on your data science journey or give you the boost you are looking for to take your skills to the next level. If you have data science educational resource examples to share on the Hub website, please email us at contact@nebigdatahub.org, or click here.

Learn Data Science

These lessons cover a wide range of data science topics curated from top universities and organizations. From beginners to experts, there is material for everyone to explore data science and take their skills to the next level. 

 

Data 8: Foundations of Data Science

To learn the core concepts of inference and computing while working hands-on with real data, including economic data, geographic data, and social networks, check out this course from the University of California, Berkeley, designed specifically for students who have not previously taken any statistics or computer science courses.

 

Data 100: Principles and Techniques of Data Science

To explore key areas of data science, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making, check out this intermediate class developed by the University of California, Berkeley. It bridges Data 8 and upper-division computer science and statistics courses, as well as methods courses in other fields.

 

The Missing Semester of Your Computer Science Education

To master the command line, use a powerful text editor, use fancy features of version control systems, and much more, check out this series of classes developed by MIT. It covers the practical computing tools that classes on advanced CS topics, from operating systems to machine learning, rarely teach. The Data Wrangling class is particularly pertinent to data science.

 

Software Carpentry

Learn about the Unix shell, version control with Git, and programming in Python or R.

 

Data Carpentry

Learn about data organization, cleanup, analysis, and visualization in data science lessons for genomics, ecology, social science, and more, using tools such as Python and R.

 

Library Carpentry

Learn how to apply concepts of software development and data science to library contexts.

 

Cognitive Class with Data Science and AI

To learn leading-edge technologies for data science, AI, cloud, blockchain, Docker, Kubernetes, quantum computing, and more, earn certificates and badges, and build skills that employers seek, check out IBM’s Cognitive Class. Learning paths cover big data fundamentals, data science fundamentals, data science with Python, blockchain for developers, containers, and reactive architecture foundations. After choosing your path, complete your courses and earn badges to show off to your network.

 

Open P-TECH

Learn about AI, cloud, cybersecurity, quantum, and more, and earn industry-recognized digital badges.

 

AI Robot – TJBot

Get started with easy step-by-step instructions and pre-written recipes to bring your own AI robot to life. TJBot is a great resource to create your own template to learn, experiment with, and explore AI using IBM Watson services.

 

Quantum Computing

Check out user guides and interactive demos about quantum principles or create and run algorithms on real quantum computing hardware.

 

Undergraduate and Graduate Responsible Data Science Courses at NYU CDS

Explore lecture material on responsible data science through a technical course that tackles the issues of ethics, legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.

 

The Data, Responsibly Comic Series

Learn about data ethics and fairness in this sophisticated virtual comic book series from the Framework for Integrative Equity Systems Institute. Available in English, Spanish, and French, these virtual comic books, titled “Mirror, Mirror” and “Fairness and Friends”, dive into how to handle equity issues in data science systems.

 

Industry Skills

Microsoft Certifications and Exams

Earn certifications that show you are keeping pace with today’s technical roles and requirements. Select from 9 job roles to discover their certification paths. Each certification offers a free learning path to master core concepts at your own speed and on your own schedule.

 

Azure Video Tutorials

Explore more than 15 video tutorials to learn about Microsoft Azure cloud computing, machine learning, and working with data in the Azure cloud environment.

 

Splunk Fundamentals 1

This course covers how to search and navigate in Splunk, use fields, get statistics from your data, and create reports, dashboards, lookups, and alerts. Scenario-based examples and hands-on challenges will enable you to create robust searches, reports, and charts. It will also introduce you to Splunk’s datasets features and Pivot interface.

 

Splunk Infrastructure Overview

This self-paced course gives users an overview of the Splunk Enterprise infrastructure. Users get a high-level look at how to grow a Splunk deployment from a single instance to a distributed environment. The course offers tips and best practices for deploying, extending, and integrating Splunk, and shows what is happening behind the scenes.

 

Splunk User Behavior Analytics

Learn at your own pace through this free eLearning course that covers Splunk’s User Behavior Analytics interface. Explore how the User Behavior Analytics team defines threats and learn efficient steps to take when responding to these potential threats. Finally, learn and practice how to accurately triage false positives.

 

Introduction to Splunk IM and Splunk APM

This eLearning course targets Ops, SREs, and observability teams. It provides practical applications of Splunk Infrastructure Monitoring and Splunk APM. Learn to navigate the user interface and monitor your infrastructure using out-of-the-box functionality.

 

Splunk Developer Program

Deliver apps and integrations that bring new kinds of data into the Splunk platform and provide data-based insights, enabling users to investigate, monitor, analyze, and act to make better and smarter decisions. Use Splunk Enterprise free for six months while you develop your app with Splunk’s Software Development Kits and online documentation.

IBM Global University Program Awards

Learn about the award programs designed to support a spectrum of university needs. The highly competitive IBM PhD Fellowship Awards provide funding for PhD students in the final years of their program; the IBM Masters Fellowship Awards provide funding for master’s students; and the IBM Academic Awards program provides support ranging from individual faculty to broader areas of emerging science and technology that are of significant interest to the university and IBM.

 

COVID Research Funding

Learn about U.S. and international COVID-19 research funding opportunities.

Challenges are a great way for students to apply their data science knowledge to large-scale problems, connect with like-minded students, and push themselves to explore material beyond their curricula. Some of the challenges below are ongoing; others are useful as samples if you want to create your own challenge, prepare for an upcoming one, or learn from past data science challenges and their results.

 

Ponder This

To let you match wits with some of the best minds in IBM Research, IBM posts monthly programming problems along with hints and solutions. Challenges cover a wide range of topics, from rock paper scissors to vaccinating robots, and touch on different aspects of programming, including machine learning. Explore this link to participate in this month’s challenge or check out results from previous challenges, which have run since 1998.

 

ASSISTments Longitudinal Data Mining Competition

This competition is an example in which data miners tried to predict an important longitudinal outcome using real-world educational data. It can be useful as a model for organizing your own competition or as a class or workshop exercise. Beyond the application of data science, the findings of this study also reveal how students’ choices at a young age affect their career choices and their tendency to pursue data science careers.

For Data Science Education + Data Literacy

Microsoft Open Data Campaign

Learn how open data can generate more value and better outcomes; access tools and resources to advance data collaboration, explore projects using over 30 open data repositories and data sharing models for societal benefit, and find links to resources that support open data sharing.

 

For Health

COVID-19-related datasets

Explore global datasets related to COVID-19.

 

For Urban to Rural Communities

NYC Open Data

Explore free public data published by New York City agencies and other partners.

 

Boston Data Portal

Explore publicly available data products from the Boston Area Research Initiative (BARI) projects published to foster policy and research collaborations.

 

Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI)

Access thousands of hydrologic, biogeochemical, and geographical data sets from US federal agencies, university projects, and community science monitoring.

 

Data Asset eXchange

Explore an online hub for developers and data scientists to find free and open data sets under open data licenses on weather, airlines, finance and more.

IBM Use Cases

Access more than 100 open-source programs, a library of knowledge resources, developer advocates ready to help, and a global community of developers covering topics such as AI, analytics, Node.js, blockchain, and Java.

 

World of Open Source

Explore IBM’s key open-source projects covering the cloud, data science and analytics, containers, blockchain, quantum, cognitive and artificial intelligence, security, IoT and more.

 

Boston COVID Data Stories

Read the reports in the Living in Boston During COVID series, each covering a different topic: Inequities in Navigating a Pandemic, Fear and Ambivalence, Economic Impact, Lifestyle, Ideology, Context Drive Attitudes, Vaccination Planning and Hesitancy, Physical and Mental Health, The Inequitable Consequences of Vaccination Intentions. Check out the COVID in Boston page for more data stories.

While COVID-19 has been an overwhelming challenge for the whole world, advances in data science and the work of dedicated researchers help us understand aspects of the pandemic, navigate this new reality, and alleviate its effects.

For COVID-19 resources, visit the COVID Information Commons (CIC), funded by NSF COVID RAPID Award #2028999, to search 990 NSF COVID awards, all featuring data and science. Browse the research, global datasets, research funding opportunities, and organizations and networks using data and data science to unlock mysteries of COVID and address the global pandemic.

The COVID Info Commons – NSF Awards and PI database lets you search by keyword for research you wish to learn from and for researchers you might collaborate with. For instance, you can search for “machine learning” to find the 96 NSF awards mentioning machine learning, and learn about the research and how to apply machine learning. Clear the filter and enter “artificial intelligence” to find the 38 NSF-funded COVID research projects involving artificial intelligence. Other resources at the CIC include: