A scalable computational pipeline to develop polygenic risk scores from biobank data

Guest post by Dr. Hongyu Zhao, Yale School of Public Health, Yale University

This Success Story is a report on the results of the Northeast Big Data Innovation Hub’s 2020 Seed Fund program.

The goal of this project was to address the computational and implementation issues in polygenic risk score (PRS) analysis by developing a unified, user-friendly web platform for running PRS analyses and benchmarking most existing PRS methods.

Dr. Zhao’s team developed a faster version of the AnnoPred software, which roughly halves the computation time (about a 2x speedup) while achieving the best performance in the benchmark. The updated AnnoPred is freely available. The team also built a web server that presents the data processing details of a PRS calculation and visualizes benchmarking results across different algorithms. The web platform can also calculate genetic risks from genotype files or new weights uploaded by users. Dr. Zhao’s team intends to host the web server on Google Cloud.
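
To make the last step concrete, here is a minimal sketch of how a genetic risk could be computed from genotypes and per-variant weights. It assumes the standard definition of a polygenic risk score as a weighted sum of effect-allele dosages. It is not the AnnoPred implementation or the web platform's code, and the function name, variant IDs, weights, and sample names are all hypothetical.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages.

    Only variants present in both inputs contribute; variants missing
    from either the genotype or the weight table are skipped.
    """
    shared = sorted(dosages.keys() & weights.keys())
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical per-variant weights, as a PRS method might produce them.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}

# Hypothetical genotypes: effect-allele dosage (0, 1, or 2) per variant.
genotypes = {
    "sample_A": {"rs1": 2, "rs2": 0, "rs3": 1},
    "sample_B": {"rs1": 0, "rs2": 2, "rs4": 1},  # rs3 missing, rs4 has no weight
}

scores = {sample: polygenic_risk_score(d, weights) for sample, d in genotypes.items()}
print(scores)
```

In practice the weights would come from a PRS method, and the dosages from an uploaded genotype file, with care taken that both refer to the same effect allele.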

This project has established a collaboration between Dr. Zhao’s group and Dr. Robert Bjornson of the Yale Center for Research Computing, who can now work together to develop efficient computational tools for polygenic risk scoring methods. An NIH R01 application based in part on preliminary results from this seed grant was submitted and ranked in the 3rd percentile. The Yale portion of that application was budgeted at $600K over four years.

Several students and postdocs contributed to this project, including Jerry Shan, an undergraduate at Yale University; Takintayo Akinbiyi, a postdoc at Yale University; and Wei Jiang, a postdoc at Yale University.

Lead PI: Hongyu Zhao (Yale School of Public Health)

Collaborators: Robert Bjornson (Yale University), Wei Jiang (Yale School of Public Health)

Hongyu Zhao is the Ira V. Hiscock Professor of Biostatistics, Professor of Genetics, and Professor of Statistics and Data Science in the Yale School of Public Health at Yale University. He received his B.S. in Probability and Statistics from Peking University in 1990 and his Ph.D. in Statistics from the University of California at Berkeley in 1995. His research interests are the development and application of novel statistical methods to address scientific questions in genetics, molecular biology, drug development, and precision medicine.

Dr. Robert Bjornson, a Research Scientist in Computer Science at Yale University, has a background in parallel computing and bioinformatics. He has extensive experience with HPC algorithms and software in both academic and commercial environments. Dr. Bjornson manages the high-performance computing clusters for the Biological & Biomedical Sciences, and the Keck Biomedical HPC facility. In addition to providing training, consultation, programming support, and debugging assistance for user applications, he manages the installation and availability of a variety of HPC software applications, libraries, and tools. Dr. Bjornson holds a Ph.D. in Computer Science from Yale University.

Dr. Wei Jiang, an Associate Research Scientist in Biostatistics in the Yale School of Public Health at Yale University, received his Ph.D. in Electronic and Computer Engineering from the Hong Kong University of Science and Technology. His research interests lie in the fields of bioinformatics and biostatistics. His current research focuses on developing computational and statistical analysis methods for genome-wide association studies. His papers have been published in AJHG, Briefings in Bioinformatics, Bioinformatics, and PLoS Computational Biology, among other journals. He received the best paper award at APBC 2016, held in San Francisco, USA.