Using a data-driven approach to study health disparities and secular trends in the chemical and individual exposome in the NHANES (National Health and Nutrition Examination Surveys)


Guest post by Chirag Patel, Harvard Medical School

This Success Story is a report on the results of the Northeast Big Data Innovation Hub’s 2021 Seed Fund program.


This research project considered the health challenges posed by environmental hazards across the U.S., with a particular focus on the health disparities experienced by different social groups. The research team studied chemical exposomes (the totality of chemical exposures) and their secular trends to understand the relationships between complex exposures, environmental inequalities, and health disparities. Earlier research by this group showed that geographic differences play a significant role in disease risk, potentially intersecting with other factors such as income, education, and individual behaviors. Chemical environmental exposures, such as circulating dioxins and polycyclic aromatic hydrocarbons in blood, may also be an important contributor to health disparities across U.S. populations. This project was designed to systematically investigate chemical co-exposures in the U.S. and to identify disparities in the patterns, correlations, and temporal trends of exposures in disadvantaged groups, using chemical biomarker data (over 140 chemicals from 16 different classes) from multiple cycles of the National Health and Nutrition Examination Survey (NHANES, 1999 to 2018).

Specifically, the project had three aims:

1) to estimate temporal changes in chemical mixtures in relation to disadvantaged populations, using simple and set-based statistics: Spearman’s rank correlation and canonical correlation analysis

2) to calculate a novel “chemical Gini index” to succinctly quantify the environmental inequalities between and within the disadvantaged groups

3) to develop a chemo-exposure risk score (CRS) to summarize the risks of particular health outcomes from the totality of chemical exposures and investigate how CRSs and their temporal trends differ in the disadvantaged groups.
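The three aims rest on fairly standard statistical building blocks. The minimal Python sketch below illustrates them under stated assumptions, using only the standard library: Spearman's rank correlation (computed as the Pearson correlation of average ranks), the textbook Gini coefficient applied to one chemical's biomarker concentrations, and a hypothetical chemo-exposure risk score formed as a weighted sum of log-transformed, standardized exposures, in the spirit of a polygenic risk score. The project describes its "chemical Gini index" as novel and does not spell out its definition here, so the Gini function is only the conventional formula; likewise the CRS weighting scheme and all function names are illustrative assumptions, not the project's code. Canonical correlation analysis is not shown.

```python
import math
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gini(values):
    """Textbook Gini coefficient of non-negative values with a positive mean.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean); 0 means perfect equality.
    (Assumption: the project's "chemical Gini index" may be defined differently.)
    """
    n = len(values)
    mu = mean(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mu)


def chemical_risk_scores(exposures, weights):
    """Hypothetical CRS: per-person weighted sum of standardized log exposures.

    exposures: chemical name -> list of positive concentrations (one per person)
    weights:   chemical name -> assumed effect size (e.g., from a regression)
    Assumes every weighted chemical varies across people (standard deviation > 0).
    """
    n = len(next(iter(exposures.values())))
    scores = [0.0] * n
    for chem, beta in weights.items():
        logs = [math.log(c) for c in exposures[chem]]
        mu, sd = mean(logs), pstdev(logs)
        for i, v in enumerate(logs):
            # z-score of the log concentration, scaled by the chemical's weight
            scores[i] += beta * (v - mu) / sd
    return scores
```

To look at a secular trend, one could pass survey-cycle start years (1999, 2001, ...) as `x` and a chemical's concentrations as `y` to `spearman`; a negative rho suggests declining exposure over time. Comparing `gini` or mean `chemical_risk_scores` values computed separately within each demographic group gives a simple view of between-group differences. A real NHANES analysis would also need survey weights and handling of values below the limit of detection, which this sketch ignores.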

The outcomes of this project demonstrate the value of using a data-driven approach to conduct exploratory analyses in disparity and inequality research. New insights that were previously hidden in the data can guide further studies that use a traditional hypothesis-driven approach. Combining the two approaches can speed up the discovery and translation of research findings to support science-based policies that address environmental inequality and health disparity issues in the U.S.

Key results from this study were shared in a collaborative article titled “Spatio-Temporal Interpolation and Delineation of Extreme Heat Events in California between 2017-2021.”


Lead PI: Chirag Patel (Harvard Medical School)


Chirag Patel is an Associate Professor of Biomedical Informatics at Harvard Medical School. Chirag Patel’s long-term research goal is to address problems in human health and disease by developing computational and bioinformatics methods to reproducibly and efficiently reason over high-throughput data streams spanning molecules to populations. Chirag’s group aims to dissect inter-individual differences in human phenomes through strategies that integrate data sources capturing the comprehensive clinical experience (e.g., through the electronic medical record), the complex phenomena of environmental exposure (e.g., high-throughput measures of the exposome), and inherited genomic variation. He received his doctorate in biomedical informatics from Stanford University.

Learn more on Dr. Patel’s website.