We live in the age of the data deluge: research group IDC forecasts that humankind will create and replicate 163 zettabytes of data by 2025. For perspective, only about 2.7 zettabytes were created and replicated globally in 2012, roughly one-sixtieth of that figure. How might we most effectively manage our abundant data resources to advance scientific research?
The Open Storage Network (OSN), a new initiative funded by the National Science Foundation, will support the initial development of a data storage network described in its abstract as “a pilot for a potential national-scale storage infrastructure for open scientific data,” which at full scale could serve hundreds of petabytes. OSN aims to support data-driven discovery by allowing researchers to work with and share data more effectively, with ease of use highlighted as a key priority. As noted in the grant, “Many of the technologies associated with such a distributed system already exist; the key challenge in this project is social engineering: how can one design a simple enough yet robust storage node that can be easily replicated, is attractive for universities and research projects to adopt, is easy to manage and can support the various patterns for large scale scientific analyses?”
The project emphasizes that broad community buy-in is critical for the success of a network at this scale. Members drawn from each of the four Big Data (BD) Hubs will collaborate on OSN’s development with project lead Alex Szalay (Johns Hopkins University) and the National Data Service. The participating hubs are the West BD Hub at the San Diego Supercomputer Center (SDSC), the Midwest BD Hub at the National Center for Supercomputing Applications (NCSA), the Southern BD Hub at the Renaissance Computing Institute (RENCI), and the Northeast BD Hub at the Massachusetts Green High Performance Computing Center (MGHPCC) and Pittsburgh Supercomputing Center (PSC).
Szalay, an astrophysicist and OSN’s project lead, spoke with Johns Hopkins University about the project, noting that it “could completely change the academic big data landscape.”