Codeathon: VCF Files for Population Genomics: Scaling to Millions of Samples

NCBI and NIAID are excited to announce our codeathon, VCF Files for Population Genomics: Scaling to Millions of Samples, to be held virtually from July 31st to August 4th, 2023 (1-5 PM ET). VCF (Variant Call Format) files are a widely used format for storing information about genetic variation across multiple samples. This event aims to bring together experts in viral evolution, molecular epidemiology, and population genomics to explore methods for computing at the scale of millions of records using VCF files as inputs, with SARS-CoV-2 VCFs serving as a case study.

We invite both programmers and non-programming subject matter experts to apply for this event. During the week-long codeathon, teams of 5-10 people will collaborate virtually to design visualizations and write software to address the following objectives: 

  • Explore the use of SARS-CoV-2 VCF files as input to downstream applications such as viral classification, characterization, and phylogenetic tree construction.
  • Estimate intra-host sequence diversity and predict emerging SARS-CoV-2 variants using VCFs as input.
  • Streamline processes for assessing current therapeutic and preventive options using VCFs, for example by assessing genome-scale responses to vaccines or therapeutics.
  • Enhance and facilitate data clustering and predictive modeling using VCFs.

The codeathon will be cooperative rather than competitive, and teams will share ideas and technical expertise. At the end of the event, teams will present their work to each other and to representatives from across the National Institutes of Health (NIH).

After the codeathon, we will make the team products publicly available through the NCBI Codeathons GitHub Organization. We encourage participants to co-author a joint manuscript and to share their work online and/or at conferences.

If you are interested in participating, pitching an idea for a team project, and/or serving as a team leader, please apply or email us at [email address]. Please note that participation may be capped due to technical limitations and the total number of accepted projects.

We look forward to hearing from you and working together to explore methods for computing at the scale of millions of records.

