Teaching Responsible Data Science through Cybersecurity Analytics


Guest post by Dr. S. Jay Yang, Rochester Institute of Technology (RIT)

This Success Story is a report on the results of one of the awards in the Northeast Big Data Innovation Hub’s 2021 Seed Fund program.


There were three primary objectives for this project: 

1) To compile cybersecurity datasets from the open domain for education and learning;

2) To develop educational content for responsible data science from the cybersecurity perspective; and

3) To engage a broad community of students and professionals, with a particular emphasis on underrepresented minorities.

By the end of this study, the project had produced a continuously updated list of cybersecurity datasets. The list covers network traffic data for cybersecurity analytics and was used in PI Yang’s classes. The datasets collected vary in quality and usability. Among the most commonly used open-domain repositories are Canadian Institute for Cybersecurity Datasets, Czech Technical University, Stratosphere Research Laboratory Datasets, USB (University of Sannio, Benevento) Datasets, UNSW Dataset, and VizSec.

Additionally, PI Yang used this grant to develop and enhance a course on Machine Intelligence for Cybersecurity Analytics at RIT. The course is a semester-long, project-based course for senior undergraduate and graduate students. The key modules developed include the winsorizing of polarized feature values, upsampling, frequency encoding for categorical features, and the use of cross entropy to assess the effect of feature engineering. Some notable Git repositories developed by students in PI Yang’s class include those from Vazgen Tadevosyan, Dan Popp, Anna Nicolais, and Varun Malhorta.
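For readers unfamiliar with these techniques, here is a minimal Python sketch of what each module name commonly refers to. It is not taken from the course materials: the function names, percentile cutoffs, and toy values are all assumptions made for illustration, and the cross-entropy function shows only one common reading (average log loss of a classifier's predicted probabilities), since this post does not describe how the course applies it.

```python
import math
import random
from collections import Counter


def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clamp extreme values to percentile bounds (cutoffs are assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_pct * (n - 1))]
    high = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, low), high) for v in values]


def frequency_encode(categories):
    """Replace each category with its relative frequency in the data."""
    counts = Counter(categories)
    total = len(categories)
    return [counts[c] / total for c in categories]


def upsample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        extra = rng.choices(pool, k=target - count)
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


def mean_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average binary cross entropy (log loss) in nats; lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Toy values, invented for illustration only.
byte_counts = [40, 52, 60, 48, 55, 1_000_000]
print(winsorize(byte_counts))  # [40, 52, 60, 48, 55, 60]
print(frequency_encode(["tcp", "tcp", "udp", "tcp", "icmp", "udp"]))
```

One way to use the last function is to train the same model with and without an engineered feature and compare `mean_cross_entropy` on held-out data; a lower value would suggest the feature helped.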

Students who worked with PI Yang on short-term research studies examined a number of continual learning and transfer learning techniques for network traffic and malware analysis. In total, six students received partial support from this project (five undergraduates and one master’s student; two women; four international students). PI Yang and Chanel Cheng (RIT undergraduate computer science student) presented a poster on “Cross-Organizational Continual Learning of Cyber Threat Models” at the Annual Computer Security Applications Conference (ACSAC) in Austin, TX in December 2022. Other students, including Matthew Heller, Vazgen Tadevosyan, and Serena Yang, helped apply the feature analysis techniques described above across open-domain datasets.

At the conclusion of the project, PI Yang and his PhD student, Reza Fayyazi, worked with visiting international students Pradumna and Stavros Damianakis to explore how Large Language Models (LLMs) may be applied to cybersecurity operations. This work has opened new research directions on the tradeoffs between LLMs’ creative imagination and their factually grounded capabilities for cyber-defense.


Lead PI: Jay Yang (Rochester Institute of Technology)

Shanchieh (Jay) Yang is a Professor in the Department of Computer Engineering at the Kate Gleason College of Engineering, Rochester Institute of Technology. He is also the Director of Research for the ESL Global Cybersecurity Institute. His research focuses on advancing generative AI, data science, and simulation for predictive cyber intelligence and anticipatory cyber defense. His research group has been supported by NSF, IARPA, DARPA, NSA, AFRL, ONR, and ARO. The group has introduced ASSERT to continuously learn and update emerging statistical attack models, CASCADES to simulate synthetic scenarios grounded in a theoretical understanding of adversary behaviors, and CAPTURE to forecast cyberattacks using unconventional signals in the public domain. Earlier works include Variable Length Markov Models (F-VLMM), Virtual Terrain (VTAC), and Attack Social Graphs (ASG) for predictive cyber situation awareness (FuSIA, VTAC, & ViSAw).

He was a 2019 NSF Trusted CI Open Science Fellow and a 2020 NSF Trusted CI TTP Fellow. In 2019, he received the IEEE Region 1 Outstanding Teaching in an IEEE Area of Interest Award for outstanding leadership and contributions to cybersecurity and computer engineering. He received the Norman A. Miles Award for Academic Excellence in Teaching in 2007, and was a co-chair for the IEEE Joint Communications and Aerospace Chapter in Rochester, NY in 2005, when the chapter was recognized as an Outstanding Chapter of Region 1. As an innovative and collaborative leader in academia, he has also established several international partnership programs and collaborations with universities across Europe and Asia.

Dr. Shanchieh (Jay) Yang received his BS degree in Electronics Engineering from National Chiao-Tung University in Taiwan in 1995, and MS and Ph.D. degrees in Electrical and Computer Engineering from the University of Texas at Austin in 1998 and 2001, respectively.