grant

Manifold representations and active learning for 21st century biology

Organization: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Location: CAMBRIDGE, UNITED STATES
Posted: 1 Jun 2021
Deadline: 31 May 2026
NIH · US Federal · Research Grant · FY2025
Keywords: Active Learning · Algorithm Design · Algorithmic Design · Algorithmic Engineering · Algorithms · Big Data · BigData · Biological · Biology · Biotech · Biotechnology · Body Tissues · CRISPR · CRISPR/Cas system · Cell Body · Cells · Cellular Expansion · Cellular Growth · Cellular biology · Clustered Regularly Interspaced Short Palindromic Repeats · Communities · Complex · Computational algorithm · Computer Software Tools · Computer software · Cooperative Learning · DNA mutation · Data · Data Set · Development · Dimensions · Disease · Disorder · Experiential Learning · Experimental Designs · Feedback · Functional RNA · Genes · Genetic · Genetic Change · Genetic defect · Genetic mutation · Genomics · Heterogeneity · High-Throughput Nucleotide Sequencing · High-Throughput Sequencing · Individual · Machine Learning · Measures · Memory · Modeling · Mutation · Noncoding RNA · Nontranslated RNA · Outcome · Pathologic · Performance · Phenotype · Property · Proteomics · Secure · Software · Software Tools · Spatial Design · State Interests · Systems Biology · Time · Tissues · Translating · Uncertainty · Untranslated RNA · Validation · Work · algorithm engineering · algorithmic composition · biologic · cell biology · cell growth · computational resources · computer algorithm · computing resources · design · designing · developmental · doubt · experiment · experimental research · experimental study · experiments · genome mutation · genome profiling · genomic data · genomic dataset · genomic profiling · high dimensionality · improved · insight · interest · machine based learning · machine learning based framework · machine learning framework · multi-modality · multimodality · multiomics · multiple omics · noncoding · novel · panomics · precision medicine · precision-based medicine · small molecule · software toolkit · spatial RNA sequencing · spatial gene expression analysis · spatial gene expression profiling · spatial resolved transcriptome sequencing · spatial transcriptome analysis · spatial transcriptome profiling · spatial transcriptome sequencing · spatial transcriptomics · spatially resolved transcriptomics · spatio transcriptomics · structural biology · transcriptomics · validations

Full Description

Project Summary
With the rise of high-throughput sequencing and multiplexed biotechnologies enabling single-cell multi-omics and massively parallel CRISPR experiments, the biomedical community is generating a monumental amount of data. These data promise to reveal new biology and drive personalized and precision medicine. However, the sheer volume of genomic data is overwhelming current computational resources, requiring prohibitively high compute time, memory usage, and storage. My lab has been at the forefront of solving big data challenges in genomics, designing novel algorithms that enable efficient and secure analyses that were previously computationally infeasible, and that reveal novel structural, cellular, and systems biology. Drawing upon our expertise in developing scalable and insightful algorithms for analyzing genomic, transcriptomic, and proteomic data, we aim to tackle two key data-driven challenges facing the biological community: 1) efficient, accurate, and robust characterization of tissues at the single-cell level, and 2) translating high-throughput datasets into biological discoveries via machine learning-based prediction. To solve the first challenge, we will leverage our discovery that seemingly high-dimensional sequencing data often lie on low-dimensional manifolds that capture the underlying biological state of interest. We will design algorithms that generate these compact, meaningful manifold representations of single-cell omics datasets. This will enable a number of key applications, including characterizing co-expression and gene modules that define healthy and pathologic cell states; integrating multi-modal single-cell omics datasets to more richly characterize cellular diversity; and investigating the mechanisms underlying transcriptomic diversity across tissues and developmental states. To solve the second challenge, we will take a two-pronged approach. First, we will design novel machine learning frameworks that provide a measure of confidence when predicting in unfamiliar biological states, enabling prediction that is robust to “out-of-distribution” (unobserved) examples. We will then work with our experimental collaborators and CROs to rapidly perform experimental validation of model-based predictions. Finally, we will return the experimental results to the model to further improve performance. This will enable an “active learning” feedback loop to efficiently explore a complex biological space for outcomes of interest. We will use this uncertainty-powered active learning approach to explore several pressing biological concerns, such as the identification of small molecule compounds with enzymatic or whole-cell growth inhibitory properties, efficient design of spatial-transcriptomic experiments, computationally guided CRISPR perturbation experiments, and identification of functional non-coding mutations. This project will 1) produce numerous software tools with wide utility that efficiently analyze massive biological datasets and guide complex experimentation, and 2) reveal biological insights, especially into biomolecular interactions and cellular heterogeneity.
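
The feedback loop described in the summary (predict with a confidence measure, validate the least certain prediction experimentally, return the result to the model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the uncertainty estimate is a generic bootstrap ensemble of ridge-regularized polynomial regressors, `run_experiment` is a hypothetical stand-in for wet-lab validation, and the one-dimensional candidate pool is synthetic.

```python
import numpy as np


def features(x, degree=3):
    # Polynomial features [1, x, x^2, x^3] for a 1-D input array.
    return np.vander(x, degree + 1, increasing=True)


def fit_ensemble(x, y, rng, n_models=25, ridge=1e-3):
    # Fit one ridge regressor per bootstrap resample of the labeled data.
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))
        phi = features(x[idx])
        gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
        weights.append(np.linalg.solve(gram, phi.T @ y[idx]))
    return np.array(weights)  # shape: (n_models, n_features)


def predict(weights, x):
    # Ensemble mean is the prediction; ensemble spread is the uncertainty.
    preds = features(x) @ weights.T  # shape: (n_points, n_models)
    return preds.mean(axis=1), preds.std(axis=1)


def run_experiment(x):
    # Hypothetical stand-in for an experimental measurement.
    return np.sin(3 * x)


def active_learning(pool, n_initial=4, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a few randomly chosen, already "measured" candidates.
    labeled = [int(i) for i in rng.choice(len(pool), size=n_initial, replace=False)]
    labels = [run_experiment(pool[i]) for i in labeled]
    for _ in range(n_rounds):
        weights = fit_ensemble(pool[labeled], np.array(labels), rng)
        _, std = predict(weights, pool)
        std[labeled] = -np.inf  # never re-query a measured candidate
        nxt = int(np.argmax(std))  # most uncertain unmeasured candidate
        labeled.append(nxt)
        labels.append(run_experiment(pool[nxt]))  # feed the result back
    return labeled, labels


pool = np.linspace(-1.0, 1.0, 50)  # synthetic candidate space
queried, measured = active_learning(pool)
```

Each round spends one "experiment" on the candidate where the ensemble disagrees most. A real application would replace the acquisition rule, the model, and the oracle with domain-appropriate choices.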

Grant Number: 5R35GM141861-05
NIH Institute/Center: NIH

Principal Investigator: BONNIE BERGER
