COVID Information Commons: Lightning Talks (January 2022)

Guest Post: Kenia Pujols

Keywords: Pandemic, STEM, structural biology, crystallography, MARCO polo, COVID-ARC, bibliometrics, health, data science, collaboration, archive, COVID-19.

A recording of this event is available on the Northeast Big Data Hub’s YouTube Channel. The COVID Information Commons is an NSF-funded project brought to you by the Big Data Innovation Hubs, led by the Northeast Big Data Innovation Hub at Columbia University.

This month’s COVID Information Commons (CIC) webinar took place on January 12th, 2022. In this forum, four leading COVID-19 scientists funded by the NSF presented their current research on the global pandemic. Florence Hudson, Executive Director of the Northeast Big Data Innovation Hub at Columbia University, moderated the discussion. The four researchers presented on a wide variety of topics, each touching on broader themes related to the COVID-19 pandemic. 

From the Hauptman-Woodward Medical Research Institute (HWI), Sarah Bowman presented “Enhanced SARS-CoV-2 High-Throughput Crystallization for Structural Studies”. Dr. Bowman returned to update the talk she gave in October 2020. She explained that structural biology studies the individual pieces of the virus to understand the structures of the proteins encoded by the viral genome. She noted that the X-ray crystallography performed at HWI has been critical to advancing vaccine development and possible treatments over the past two years. The high-throughput (HT) pipeline identifies crystallization conditions for SARS-CoV-2 samples by imaging them and monitoring crystal growth over time, a step toward visualizing the viral proteins in detail. To date, her team has crystallized many coronavirus proteins from groups across the country and around the world, generating close to 14,000 images per sample. They also built a machine-recognition software tool called ‘MARCO polo’. She emphasized that now is the perfect time to do outreach and collaborate with the research community.

Dominique Duncan, University of Southern California, presented the “COVID-ARC (COVID-19 Data Archive)”. Dr. Duncan returned to the CIC webinar this month to update her September 2020 presentation, in which she first discussed ‘COVID-ARC’. She explained that her institute has extensive experience working with large-scale multimodal data repositories. The program’s goal is to aggregate COVID-19 data and resources, build a platform with networked and centralized archives, and integrate visualizations and analytic tools to understand the impact of COVID-19. She reported that ‘COVID-ARC’ brings 28 datasets from around the world into one centralized archive and uses Aspera, IBM’s HIPAA-compliant encrypted high-speed file transfer system, both for uploading data to the server and for people who want to access the data. Additionally, Dr. Duncan’s students have been working on complementary projects. One example is a comparison of 40 convolutional neural network architectures for COVID-19 diagnosis, which identified EfficientNet-B5 as the most accurate, sensitive, and specific model.
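For readers less familiar with how diagnostic models are compared, here is a minimal sketch of the three metrics mentioned in this talk, computed from the counts of correct and incorrect predictions. The class name, function names, and example counts are made up for illustration; they are not taken from Dr. Duncan’s project or its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (hypothetical container)."""
    true_pos: int   # COVID-19 cases correctly flagged
    false_pos: int  # non-cases wrongly flagged
    true_neg: int   # non-cases correctly cleared
    false_neg: int  # COVID-19 cases missed


def accuracy(c):
    """Share of all predictions that were correct."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


def sensitivity(c):
    """Share of actual positive cases the model caught."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c):
    """Share of actual negative cases the model cleared."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Made-up counts, for illustration only.
example = ConfusionCounts(true_pos=45, false_pos=10, true_neg=40, false_neg=5)
print(accuracy(example), sensitivity(example), specificity(example))
```

A model can score well on one of these and poorly on another, which is why comparisons like this one usually report all three.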

Gerald Marschke, University at Albany, presented “Investigating the Impact of COVID-19 on the Future of the U.S. STEM Workforce”. Dr. Marschke shared results from one of his papers on STEM resiliency during COVID-19. He explained that STEM workers are a key workforce sector because their jobs affect scientific research and economic growth. Overall, STEM workers have fared well compared with non-STEM workers in terms of job security during the pandemic. He reported job losses in occupations with more face-to-face contact and less ability to work remotely, and concluded that higher-skilled disciplines have better labor market outcomes during an economic recession. Part of what explains the resiliency of the STEM workforce is that STEM workers are concentrated in professional, scientific, and technical services. Using STEM knowledge on the job also protects workers, in both STEM and non-STEM occupations.

Alan Porter, Search Technology, Inc. and Georgia Institute of Technology, presented “Exploring Causes and Cures for COVID-19 through Improved Access to Biomedical Research”. Dr. Porter talked about tools to classify data from multiple sources. His team developed intelligent bibliometrics to enhance research retrieval, created a recommender system to identify pertinent research outside the core literature, and built a COVID-19 dashboard. Using ‘tech mining,’ Dr. Porter has retrieved research abstracts from, which is updated monthly. ‘Tech mining’ makes it possible to trace scientific evolutionary pathways: articles on related topics are grouped within one period, and each group’s relationships are then tracked into the next period to show how research topics change over time.
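To make the idea of tracking topics from one period to the next more concrete, here is a toy sketch. It is not Dr. Porter’s method: it assumes each topic is simply a set of keywords, and it links topics across periods using Jaccard overlap with an arbitrary threshold. The function names, topic names, keywords, and threshold are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two keyword sets: shared terms / all terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_topics(earlier, later, threshold=0.3):
    """Link each earlier-period topic to later-period topics that share
    enough keywords. Returns (earlier_topic, later_topic, score) tuples."""
    links = []
    for e_name, e_terms in earlier.items():
        for l_name, l_terms in later.items():
            score = jaccard(e_terms, l_terms)
            if score >= threshold:
                links.append((e_name, l_name, score))
    return links


# Made-up topics, for illustration only.
period_one = {"transmission": {"aerosol", "droplet", "mask"}}
period_two = {
    "airborne spread": {"aerosol", "mask", "ventilation"},
    "vaccines": {"mrna", "antibody", "dose"},
}

for earlier_topic, later_topic, score in link_topics(period_one, period_two):
    print(f"{earlier_topic} -> {later_topic} (overlap {score:.2f})")
```

In this toy data, the "transmission" topic links forward to "airborne spread" because the two share half of their combined keywords, while "vaccines" shares none and is treated as a separate thread.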

Following the four presentations, Florence Hudson hosted a Q&A session where the audience engaged in a rich discussion with the researchers. It is evident from these talks that data science and molecular biology are providing excellent insights that drive innovation and discovery during the COVID-19 pandemic and beyond. As summarized above, data science enables curated search mechanisms that link data points from different disciplines. We can all learn from the resiliency of the STEM workforce as we look for solutions to the unemployment experienced during this crisis. Moreover, the PIs shared datasets, tools, and techniques with the research community to build collaborations and find immediate answers to the needs of our population. Visit our website to find more details about the researchers.

We look forward to welcoming you to our next webinar. Stay tuned! 

How else can you stay in touch with the CIC community and receive updates about future events and opportunities?

· Sign up for the CIC Mailing List
· Join the CIC Slack Channel
· Follow the CIC on Twitter (@CIC_COVID) and Instagram (@nebigdatahub)

Email us at and let us know what you’re interested in hearing about next!