Big Data in Education: News and Competition

How can big data help predict student outcomes? Ryan Baker (U Penn) and Neil Heffernan (Worcester Polytechnic Institute) of our Big Data for Education Spoke hope to do just that via the Longitudinal Educational Big Data Competition. Using carefully de-identified, real-world educational data, participants will predict whether each of the 172 students in the validation and test sets pursues a career in a STEM field.

This competition uses data from a longitudinal study, now over a decade long, led by Professor Ryan Baker and Professor Neil Heffernan. This study, funded by multiple grants from the National Science Foundation, tracks students from their use of the ASSISTments blended learning platform in middle school in 2004-2007, to their high school course-taking, college enrollment, and first job out of college. Several papers have shown that behavior in ASSISTments in middle school can predict high school and college outcomes. In this competition, you will receive access to extensive (but carefully de-identified) click-stream data from middle school ASSISTments use, as well as carefully curated, brand-new outcome data on first job out of college that has never before been used in published research.

The competition is now open and runs through October 1st at noon EST. Successful entries will be invited to submit both to a conference workshop (at EDM2018, in Buffalo, NY) and to a special issue of the Journal of Educational Data Mining.


Further details, including registration information and dataset access, are available at

In the video below, Prof. Heffernan provides an overview of the competition.

In addition, the next meeting of the Big Data for Education Spoke will be held on 8/28, 9 am – 4:30 pm, at Teachers College, Columbia University, in New York, NY. From 9 am – noon, we will hold a general meeting for the Spoke. Lunch will be provided noon – 1 pm. From 1 pm – 4:30 pm, participants will have their choice of a workshop on the Longitudinal Educational Big Data Competition or two tutorials on educational data mining methods.

To RSVP, or for more details, please contact