Big Data Spoke Projects


What’s a Hub without Spokes? Each Big Data Hub supports its constituent members (drawn from academia, industry, non-profit organizations, and/or government) to work in concert to achieve common Big Data goals that would not be possible for the independent members to achieve alone. Big Data Spokes are multi-sector projects addressing regionally-defined priority areas, convening stakeholders whose work is guided by the following themes:

  • Accelerating progress towards addressing societal grand challenges relevant to the regional and national priority areas defined by the BD Hubs;
  • Helping automate the Big Data lifecycle; and
  • Enabling access to and spurring the use of important and valuable available data assets, including international data sets where relevant.

In FY2016, the Northeast Big Data Innovation Hub received $3.3 million in funding for seven projects: three full Spoke projects and four planning projects (the latter defined as seed funding to assist with the planning of future BD Spokes proposals). Details on these projects are included below.


FY2016 Spokes

A Licensing Model and Ecosystem for Data Sharing (Data Sharing Spoke Project)

Principal Investigators: Jane Greenberg (Drexel), Tim Kraska (Brown), Samuel Madden (MIT)
Co-Principal Investigators: Carsten Binnig (Brown), Daniel Weitzner (MIT)

As a community, we seek to address key data sharing challenges relating to policy and privacy, platforms and formats, software and costs, and ethics and education about data sharing benefits.

The NSF Big Data Spoke project, “A Licensing Model and Ecosystem for Data Sharing,” is addressing some of these challenges. Our team is developing a safe and secure platform for sharing data, whether or not that data is open or free, between different organizations (industry, academia, government).

Website for “A Licensing Model and Ecosystem for Data Sharing” project

Workshop on “Enabling Seamless Data Sharing in Industry and Academia,” September 29-30, 2016

The Northeast Big Data Innovation Hub: “Enabling Seamless Data Sharing in Industry and Academia” Workshop Report


Integration of Environmental Factors and Causal Reasoning Approaches for Large-Scale Observational Health Research (Health Spoke Project)

Principal Investigators: Gregory Cooper (U. Pittsburgh), Noemie Elhadad (Columbia), Vasant Honavar (Penn State), Chirag Patel (Harvard)

Our Health Spoke project is assembling a first-ever data warehouse to house numerous health/clinical, environmental, behavioral, and economic data streams. By breaking current data silos and bringing together multiple large environmental and clinical data streams, this project will enhance health research, allowing causal discovery between these data sources. The ultimate goal of the project is to facilitate community-led and collaborative causal discovery through dissemination of integrated and open big data and analytics tools.


Grand Challenges for Data-Driven Education (Education Spoke Project)

Principal Investigators: Ivon Arroyo (Worcester Polytechnic Institute), Ryan Baker (U Penn), Beverly Woolf (U. Mass, Amherst)
Co-Principal Investigator: Neil Heffernan (Worcester Polytechnic Institute)

The Northeast is a center of gravity for innovations in education, anchored by universities and publishers who drive K-12 education in America. This project will improve capacity in data-driven education by sharing educational databases, managing yearly data competitions, and conducting educational data science workshops and hackathons. The team intends to improve classroom learning and leverage the unique types of data available from digital education to better understand students, groups and the settings in which they learn.


Building Capacity for Regional Collaboration in Closing the Big Data Divide (Data Literacy Planning Project)

Principal Investigator: Stephen Uzzo (New York Hall of Science)

The Big Data Literacy initiative, led by members of the Northeast Big Data Innovation Hub, is actively identifying knowledge and resource gaps, with the goal of helping lifelong learners of all ages become data literate throughout the Northeast and beyond.

Big Data Literacy Workshop at the New York Hall of Science, April 13-14, 2017


Planning for Privacy and Security in Big Data (Privacy & Security Planning Project)

Principal Investigators: Adam Smith (Penn State), Rebecca Wright (Rutgers)

This planning project is bringing together stakeholder communities to:

  • Understand how privacy currently limits data sharing.
  • Develop standards and best practices to enable new information flows.
  • Highlight privacy and security issues associated with our priority areas.

Workshop on Privacy and Security for Big Data, Rutgers University, April 24-25, 2017


Cross-organization Big Data Cyber Attack Awareness – CROSSBAR (Privacy & Security Planning Project)

Principal Investigator: John Yen (Penn State)
Co-Principal Investigators: Vijayalakshmi Atluri (Rutgers), George Cybenko (Dartmouth), Peng Liu (Penn State), Andrew Sears (Penn State)

With a focus on protected sharing of cybersecurity data for countering attacks on digital infrastructure, the CROSSBAR project is developing a platform to enhance collaborative cybersecurity operations through cross-organization sharing of relevant cybersecurity data.

A Workshop on Cross-Organization Big Data Cyber Attack Awareness was convened in Washington, D.C., November 11, 2016.


Partnerships for Energy Cycle Innovation through Big Data (Energy Planning Project)

Principal Investigator: Abani Patra (U. Buffalo)

The initial planning project explores how to use a brownfield redevelopment and associated energy infrastructure reinvention in Buffalo, NY as a case study to frame the energy sector’s big data innovation needs.