A Genome Data Analysis Center Focused on Batch Effect Analysis and Data Integration
Full Description
* * * * PROJECT SUMMARY * * * *
Abstract: Technical batch effects pose a fundamental challenge to quality control and reproducibility of even
single-laboratory research projects, but the possibilities for serious error are greatly magnified in complex, multi-
institutional enterprises such as the cancer molecular profiling projects being undertaken by the NCI Center for
Cancer Genomics (CCG). To aid in detection, quantitation, interpretation, and (when appropriate) correction for
technical batch effects in such data, we have developed the MBatch software system. MBatch proved
indispensable for quality-control “surveillance” of data in The Cancer Genome Atlas (TCGA) and ongoing CCG
projects. But detecting and quantitating batch effects (or trend effects or statistical outliers) are just the first steps
in a process. The next steps involve detective work in collaboration with those who generated the data, drawing
upon expertise in integrative analysis across data types, pathways, and systems-level biology. That detective
work usually succeeds in diagnosing the cause of a batch effect as technical or biological. If technical, then
computational methods to ameliorate the batch effect can be applied (judiciously).
The primary aim of the proposed Genome Data Analysis Center (GDAC) is to continue to translate that
successful quality-control model to the CCG’s other current and future large-scale molecular profiling projects.
We will be ready to do that on Day 1. We will continue to enhance and extend the power of MBatch and
incorporate a number of innovative new algorithms, tools, and interactive visualizations into it (OmicPioneer-sc,
MutBatch, CarDEC, and CorNet). Evaluating and correcting batch effects is a complex process, so we will
collaborate with other GDACs and data generating centers to determine the influence of artifacts on any analysis
results they produce. The second aim is to contribute and enhance additional competencies. We are prepared
to (i) provide integrated cluster solutions to segregate cases into biologically relevant groups; (ii) provide tools
and expertise for high-level visualization of omic data (including single-cell data); and (iii) analyze RPPA
proteomic data from the subset of projects that generate such data. Our final aim is to communicate results and
distribute corrected data back to other network members, project stakeholders, and the scientific community.
We bring a number of assets to the table, including multidisciplinary expertise in bioinformatics, biostatistics,
software engineering, cancer biology, and cancer medicine; PIs with a combined 40+ years of experience in
molecular profiling of cancers; expertise gained in 10 years of batch-effect surveillance for TCGA and
other CCG projects; a highly professional software engineering team with a track record of producing high-end
bioinformatics tools; extensive computing resources, including one of the most powerful academic clusters in the
world; and close working relationships with first-class basic, translational, and clinical researchers across MD
Anderson, one of the foremost cancer centers in the U.S. The bottom-line mission of the GDAC will be to aid the
research community’s effort to understand cancer and to prevent, detect, diagnose, and treat it more effectively.
Grant Number: 5U24CA264006-05
NIH Institute/Center: NIH
Principal Investigator: Rehan Akbani