Cybersecurity as Big Data Science Interactive Workshop

April 12, 2021, 3:00-4:30pm EDT (UTC-4)

Join Us

We welcome you to join us for an interactive workshop to discuss and design data science techniques to address current and emerging cybersecurity challenges.  

The workshop objectives are to: 

  • Understand common and unique data science challenges for cybersecurity data.
  • Share best data science practices, technical and non-technical, for cybersecurity challenges.

Cybersecurity challenges we will discuss and determine how to address with data science tools and techniques include:

  • Ever-evolving data from sources such as log files, intrusion alerts, vulnerability databases, dark web, and GitHub.
  • Data quality issues such as susceptibility to adversarial attacks, lack of labeled data, low signal-to-noise ratios, and data fragmentation.

Cybersecurity experts, as well as experts from any application area in data science, databases, visualization, statistics, feature engineering, modeling, reproducibility, streaming data, heterogeneous data, irregular data, explainable AI, and human-centered computing, are encouraged to join the workshop.


All times are in ET on April 12, 2021. Please check back for updates.

3:00pm: Overview: Cybersecurity as Big Data Science — Findings and Challenges

Speakers: S. Jay Yang, Rochester Institute of Technology, and Sagar Samtani, Indiana University

3:15pm: Panel Q&A: Framing Cybersecurity as a Data Science Problem

Moderator: S. Jay Yang

Panelists: Sagar Samtani; Kathy Benninger, Pittsburgh Supercomputing Center; Amarnath Gupta, San Diego Supercomputer Center; and Jelena Mirkovic, University of Southern California Information Sciences Institute

Audience participation through open discussions and live polls.

Sample questions:

  • What are the common and not-so-common data science challenges for cybersecurity data?
  • What are the blind spots for the cybersecurity community when treating data problems and vice versa?
  • What are the data science tools and techniques we can leverage for cybersecurity data science?
  • How to deal with and model multiple types of data (sources) simultaneously? Tips and tricks?
  • What are the barriers to entry for a relatively new data science domain? How to manage those?
  • How to influence practitioners who are not data scientists to use the output of your work? 
  • How to help students who are interested in AI become familiar with the cyber domain and vice versa?

4:15pm: Discussion: Follow-up Actions with the Community

  • Community forums, quarterly webinars, opportunities for collaboration (research and transition-to-practice), industry engagement, outreach for underrepresented groups, etc.


This workshop is hosted by Dr. S. Jay Yang, Rochester Institute of Technology, and Dr. Sagar Samtani, Indiana University, and supported by the NSF Big Data Innovation Hubs and Cyberinfrastructure and Data Sharing Working Group.

For more background on big data science for cybersecurity, read “Trailblazing the Artificial Intelligence for Cybersecurity Discipline: A Multi-Disciplinary Research Roadmap” by Sagar Samtani, Indiana University; Murat Kantarcioglu, University of Texas at Dallas; and Hsinchun Chen, University of Arizona.