Human Microbiome Compendium: large-scale curation and processing of human microbiome datasets
Full Description
ABSTRACT
Mounting evidence shows the microbial communities living in (and on) the human body play a key role in the
etiology of disease. A major obstacle in the field is the dearth of reliable methods for extracting meaningful signals
from small, noisy, intercorrelated, and highly variable microbiome datasets. Enhancing the ability of researchers
to generate robust characterizations of the complex relationship between microbiota and their hosts will support
novel, more reliable diagnosis of disease and bring the field one step closer to finding the causal links underlying
microbiome-based therapeutics. Until now, however, researchers have not had the huge volume of data required
to draw these conclusions. Although microbiome data from hundreds of thousands of samples is available in the
NCBI Sequence Read Archive (SRA), these datasets have not been leveraged at a large scale. To bridge this
gap, we will build an automated pipeline to process and aggregate more than 750,000 samples of amplicon and
shotgun metagenomics sequencing data from all publicly available human microbiome samples. We will build a
platform, which we call "The Human Microbiome Compendium," for compiling collections of relevant samples
that can be used by researchers to find ecological dynamics that have until now been hidden in the noise. The
compendium will allow users to see relative abundances of microbial taxa in every sample, which will also be
linked to NCBI metadata and annotations generated by a new tool that imputes a uniform set of descriptors for
sample type, body site, and host traits. We will also use the compendium to train machine learning models for
dimensionality reduction, which will improve the power of independent microbiome studies by incorporating
insights from the compendium's collection of hundreds of thousands of samples. These data and tools will be
distributed across multiple channels, including a web application where users will be able to upload data to be
processed in real time by the dimensionality reduction tools. The proposed studies will generate the first
comprehensive aggregation of the microbiome datasets available via the SRA, which will be used to provide
characterizations of the human microbiome in unprecedented detail. The resulting compendium will encourage
the use of publicly available data and inform new microbiome analysis tools that will help extract important
associations in studies where it's impractical to acquire the sample sizes required by conventional techniques.
Results from this study will be a starting point for identifying microbiome biomarkers of disease and for
developing novel therapeutic approaches.
Grant Number: 5R01LM013863-04
NIH Institute/Center: NIH
Principal Investigator: Ran Blekhman