2018 Annual Summit of the Northeast Big Data Innovation Hub


Join us on March 27th and learn how the Hub has grown over the past year, including updates on new cross-sector initiatives, lightning talks from our Big Data Spokes, and opportunities to collaborate with our stakeholders in breakout sessions on data literacy, ethics, and health. Keynote speaker Corinna Cortes (Google Research, New York) will highlight her team’s data-driven approach to fighting fake news. A panel of leaders drawn from academia and the private sector will discuss how they address the challenges of rapidly advancing digital media – both in maximizing its benefits and minimizing its potential drawbacks.


Register Now at Eventbrite.

AGENDA – Tuesday, March 27

8:00 – 9:00 Check-in and breakfast
9:00 – 9:05

Welcome Address

Kathleen McKeown, Henry and Gertrude Rothschild Professor of Computer Science, Columbia University; and Chair, Northeast Big Data Innovation Hub

9:05 – 9:30

Opening Address: Evolution of the Hub

Rene Baston, Executive Director, Northeast Big Data Innovation Hub

9:30 – 10:15

Keynote Speaker: 

Corinna Cortes, Head of Research, Google New York

Corinna Cortes is the Head of Google Research, NY, where she works on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs – Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna’s research is particularly well known for her contributions to the theoretical foundations of support vector machines (SVMs), for which she and Vladimir Vapnik jointly received the 2008 Paris Kanellakis Theory and Practice Award, and for her work on data mining in very large data sets, for which she was awarded the AT&T Science and Technology Medal in 2000.

Corinna is also a competitive runner and a mother of two.

10:15 – 10:45 Coffee Break
10:45 – 12:00

Panel Discussion: Industry and Academia

  • Guru Banavar, Chief Technology Officer, Viome
  • Alfred Spector, Chief Technology Officer, Two Sigma
  • Jeannette Wing, Avanessians Director, Data Science Institute, Columbia University
  • Andrew McCallum, Director, Center for Data Science; and Professor, University of Massachusetts Amherst
  • Henry Kautz, Robin & Tim Wentworth Director, Goergen Institute for Data Science, University of Rochester
  • Moderator:  Kathleen McKeown
11:55 – 12:00

Announcement: NSF collaborations – Next round of Azure awards to Northeast Hub 
Vani Mandava, Director, Data Science Outreach, Microsoft Research

12:00 – 1:15 Lunch
1:15 – 2:15

Data Sharing Lightning Talk: Jane Greenberg, Alice B. Kroeger Professor at Drexel University and Director of the Metadata Research Center

Sharing data between sectors can provide tremendous mutual benefits. However, legal and policy issues, privacy concerns, and other challenges make data sharing agreements difficult to finalize. Our Data Sharing Spoke PI Jane Greenberg (Alice B. Kroeger Professor, Drexel University, and Director of the Metadata Research Center) will highlight progress toward addressing these challenges with a licensing model and ecosystem for data sharing.

Big Data for Education Lightning Talk: Jaclyn Ocumpaugh, Associate Director, Penn Center for Learning Analytics

Big Data can have a transformative impact on education practice and outcomes. Jaclyn Ocumpaugh (Associate Director, Penn Center for Learning Analytics) will provide an overview of progress made by our Big Data for Education Spoke project, which builds capacity in data-driven education by sharing educational databases, managing yearly data competitions, and conducting educational data science workshops and hackathons.

Health Lightning Talk: Chirag Lakhani, Research Fellow, Harvard Medical School Department of Biomedical Informatics

Learn about progress made by our Health Spoke from Chirag Lakhani (Research Fellow, Harvard Medical School Department of Biomedical Informatics), as he provides an overview of ExposomeDW: “a search engine to find environmental and phenotypic factors associated with disease and health.” (Join Chirag and others from the Spoke team in the afternoon breakouts for a hands-on demo!)

2:15 – 2:30 Closing Remarks: Rene Baston

Transition to Breakout Sessions: Katie Naum

2:30 – 3:00 Break

Breakout Sessions

3:00 – 5:00

Breakout Session – Health

Join our Health Spoke team for a deeper dive into their project. First, Vasant Honavar (Professor and Edward Frymoyer Chair of Information Sciences and Technology, Penn State University) will give a talk on creating secure infrastructure for integrative analyses of clinical, biomedical, environmental, and socio-demographic data that comply with data access and use policies. Then, Chirag Lakhani and Shreyas Bhave will lead a hands-on demo of the ExposomeDW web interface and API, collecting user feedback and ideas on how this resource might be used. Bring your laptop!

3:00 – 5:00

Breakout Session – Data Literacy

What is Data Literacy? We know that everyone needs to be data literate – here’s your chance to help figure out what that means! Join our ongoing effort to draft a comprehensive framework of Data Literacy Essential Concepts, in a session led by Stephen Uzzo and Catherine Cramer (New York Hall of Science). Bring your laptop and come prepared to work in small groups, reviewing and adding to the draft that emerged from last year’s meeting of our Data Literacy inquiry group. This is a collaborative process – add your voice!

3:00 – 5:00

Breakout Session – Ethics

How might we advance the development and adoption of ethical principles and standards in data science? Join us for a discussion and working session aimed at igniting collaborative effort in this critical area. The breakout will open with lightning talks from:

  • Natalie Evans Harris (Chief Operations Officer, BrightHive)
  • Jennifer Stromer-Galley (Professor, Syracuse University)
  • Maria Palombini (Director, Emerging Communities & Initiatives Development, Global Business Strategy & Intelligence, IEEE Standards Association)
  • Meredith Lee (Executive Director, West Big Data Innovation Hub)
  • Norma A. Padrón (Assistant Professor, Thomas Jefferson University)

We’ll then break into small groups oriented around best practices, technical standards, data literacy, and tools, to brainstorm ideas for an ethics-driven project.

Justin Hendrix (Executive Director, NYC Media Lab) will then lead a group discussion to identify our community’s priorities for data science ethics and discuss next steps, including identifying resources to make our ideas happen.

5:00 Attendees depart



Columbia Graduate School of Journalism
Pulitzer Hall
Joseph D. Jamail Lecture Hall, 3rd Floor
2950 Broadway
New York, NY 10027





