Using machine learning to optimize big data workflows for collaborative computational steering

Guest post by Chase Wu, Associate Chair and Professor in the Department of Computer Science at NJIT.

Research Background

Model-based simulations have become an essential component of next-generation scientific applications and are generating big data on the order of terabytes today, with petabytes or exabytes expected in the foreseeable future. In many such applications, the data must be processed and analyzed by a team of scientists working in different geographic locations, using various computing techniques for knowledge discovery and scientific innovation. Often, simulation results are compared with, or calibrated against, new observational or experimental data to validate and refine the model. But when individual researchers change the model's parameters in this practice of computational steering, the work of other researchers involved in the study can be affected. Big-data technologies deployed on clouds, such as Apache Hadoop, have become increasingly popular for managing workflow changes in these simulation-computing processes.

Research on Steering

Researchers at NJIT are developing a cross-layer coupled design framework that uses machine learning to enable and optimize big data workflows for collaborative computational steering. This steering service allows scientists to upload simulation codes based on the model so that the workflow can update automatically, or with limited intervention, as model parameters are tuned for exploration, evaluation, and selection through visualization. This machine-learning-guided approach is anticipated to streamline research processes and replace today's offline, manual steering methods, accelerating scientific discoveries across the broad science community.
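To make the idea of a steering loop concrete, here is a minimal, purely illustrative Python sketch (all names and the toy "model" are hypothetical, not part of the NJIT framework): a simulation is re-run as a simple data-driven rule suggests the next parameter value to try, with each result fed back into the history that guides the next suggestion.

```python
def simulate(theta):
    # Stand-in for an expensive model-based simulation; returns a
    # misfit score to be minimized (lower = better agreement with data).
    return (theta - 3.0) ** 2 + 1.0

def steer(history, candidates):
    # Crude stand-in for an ML-guided suggestion: propose the untried
    # candidate closest to the best-scoring parameter seen so far.
    tried = {theta for theta, _ in history}
    untried = [c for c in candidates if c not in tried]
    best_theta, _ = min(history, key=lambda pair: pair[1])
    return min(untried, key=lambda c: abs(c - best_theta))

def steering_loop(initial, candidates, steps):
    # Run the suggest-simulate-record cycle for a fixed number of steps
    # and return the best (parameter, score) pair found.
    history = [(initial, simulate(initial))]
    for _ in range(steps):
        theta = steer(history, candidates)
        history.append((theta, simulate(theta)))
    return min(history, key=lambda pair: pair[1])
```

In a real deployment the `simulate` call would dispatch a workflow on a cluster (e.g. a Hadoop job) and `steer` would be a trained model, but the feedback structure of the loop is the same.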

Please contact Chase Wu if you are interested in collaborating or learning more about how this steering approach can be adapted for different research applications.