Multi-modal unsupervised embeddings to advance machine learning in healthcare

Organization: ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
Location: NEW YORK, UNITED STATES
Posted: 1 Apr 2022
Deadline: 31 Jan 2027
NIH · US Federal · Research Grant · FY2025

Keywords: Address, Case-Base Studies, Case-Comparison Studies, Case-Compeer Studies, Case-Referent Studies, Case-Referrent Studies, Case/Control Studies, Catalogs, Clinical, Communities, Complex, Computer Vision Systems, Data, Data Reporting, Data Set, Data Sources, Development, Dimensions, Disease, Disorder, ECG, EKG, Effectiveness, Electrocardiogram, Electrocardiography, Electronic Health Record, GWA study, GWAS, Genetic, Health Care, Health Care Systems, Health system, Heterogeneity, History, Hospitals, Image, Institution, Intuition, Investigators, Joints, Knowledge, Label, Learning, Link, Literature, Machine Learning, Manuals, Masks, Medical, Medical Research, Medicine, Methods, Modality, Molecular, Natural Language Processing, New York, Onset of illness, Outcome, Patients, Pattern, Phenotype, Population, Population Heterogeneity, Process, PubMed, Recording of previous events, Records, Research, Research Personnel, Researchers, Secure, Specific qualifier value, Specified, Supervision, System, Techniques, Testing, Text, Typology, UMLS, Unified Medical Language System, Visualization, Work, biobank, biorepository, case-controlled studies, catalog, clinical imaging, clinical predictors, cohort, computer vision, data modalities, data representation, data representations, deep learning, deep learning method, deep learning strategy, depository, developmental, disease onset, disease phenotype, disease subgroups, disease subtype, disorder onset, disorder subtype, diverse populations, electronic health care record, electronic health medical record, electronic health plan record, electronic health registry, electronic medical health record, feature selection, federated learning, genome wide association, genome wide association scan, genome wide association study, genomewide association scan, genomewide association study, heterogeneous population, high dimensionality, histories, imaging, improved, individualized predictions, intuitive, machine based learning, machine learning based model, machine learning model, multi-modal data, multi-modal datasets, multi-modality, multimodal data, multimodal datasets, multimodality, natural language understanding, next generation, novel, operation, operations, patient stratification, personalization of treatment, personalized health care, personalized medicine, personalized predictions, personalized therapy, personalized treatment, population diversity, precision health care, public repository, publicly accessible repository, publicly available repository, repository, stratified patient, success, tool, unsupervised learning, unsupervised machine learning, vector, whole genome association analysis, whole genome association study

Full Description

PROJECT SUMMARY
Integrating high-dimensional and heterogeneous biomedical data, such as electronic health records (EHRs), molecular data, imaging, and free text, is a key challenge for making robust discoveries that transform healthcare. Current work in the literature commonly analyzes biomedical data types separately, focuses on small disease-related cohorts of patients, and relies on domain experts and ad hoc manual selection of clinical features. Although appropriate in some situations, supervised definitions of the feature space scale poorly, do not generalize well, carry inherent biases, and miss opportunities to discover novel patterns and features. To address these issues, we will develop novel methods based on unsupervised machine learning to derive low-dimensional vector-based representations, i.e., "embeddings", of medical concepts and patient clinical histories from large-scale, multi-modal, and domain-free biomedical datasets. These pre-computed representations aim to overcome common biases related to the patient population, supervised labeling, and hospital-specific operational processes. The multi-modal embeddings can be fine-tuned and applied to a range of specific predictive tasks, improving the scalability, generalizability, and effectiveness of machine learning models in healthcare.

In particular, we will first develop unsupervised learning methods to create multi-modal embeddings of medical concepts using heterogeneous EHRs, linked biobanks, and electrocardiogram waveform data from the diverse population of five hospitals within the Mount Sinai Health System in New York, NY, together with publicly available medical knowledge. We will then create a scalable framework to compute unsupervised multi-modal embeddings that summarize patient clinical histories and enable disease subtyping and patient stratification. We will also develop a federated learning system to share, visualize, and combine embeddings generated separately at different medical institutions, capturing a larger and more diverse population and clinical landscape. We will apply these embeddings to advance methods for EHR-based disease phenotyping, onset prediction, and subtyping. Although tested on EHRs, genetic and waveform data from linked repositories, and medical knowledge, the proposed approaches will be easily extendable to other data, such as clinical images. This project will represent a step towards the next generation of machine learning (ML) in healthcare, one that can (i) scale to billions of patients, (ii) embed complex relationships in multi-modal data, and (iii) create less biased disease representations by securely learning from patients across institutions via federated learning.
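To make "unsupervised embeddings of medical concepts and patient clinical histories" concrete, here is a minimal sketch of one classic count-based technique: build a code co-occurrence matrix from patient histories, weight it with positive pointwise mutual information (PPMI), factorize it with a truncated SVD, and summarize each patient as the mean vector of their codes. This technique is an assumption chosen for illustration; the summary does not say which algorithms the project uses. The function names, the toy histories, and the use of ICD-10 codes as plain tokens are all hypothetical.

```python
import itertools

import numpy as np


def cooccurrence_counts(histories, vocab):
    """Count how often two distinct codes appear in the same patient history."""
    index = {code: i for i, code in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for history in histories:
        # Each code is counted once per patient, regardless of repeats.
        for a, b in itertools.combinations(sorted(set(history)), 2):
            counts[index[a], index[b]] += 1
            counts[index[b], index[a]] += 1
    return counts


def ppmi_svd_embeddings(counts, dim):
    """Positive PMI of the co-occurrence matrix, factorized by a truncated SVD."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
        # Zero counts give -inf and isolated codes give nan; both are clipped to 0.
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    u, s, _ = np.linalg.svd(ppmi)
    # Keep the top `dim` components, scaled by the square root of the singular values.
    return u[:, :dim] * np.sqrt(s[:dim])


def patient_embedding(history, vocab, concept_vectors):
    """Summarize one patient as the mean vector of the known codes in their history."""
    index = {code: i for i, code in enumerate(vocab)}
    rows = [index[c] for c in set(history) if c in index]
    return concept_vectors[rows].mean(axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy histories: two cardiometabolic patients, two atopic patients.
histories = [
    ["E11", "I10", "N18"],
    ["E11", "I10"],
    ["J45", "J30"],
    ["J45", "J30", "L20"],
]
vocab = sorted({code for history in histories for code in history})
vectors = ppmi_svd_embeddings(cooccurrence_counts(histories, vocab), dim=2)
patients = [patient_embedding(h, vocab, vectors) for h in histories]
```

On this toy input the co-occurrence graph splits into two disconnected groups, so the two cardiometabolic patients end up nearly parallel to each other and nearly orthogonal to the two atopic patients. Patient vectors of this kind are what a downstream clustering step could use for subtyping or stratification; real EHR pipelines would typically add time windows, frequency cutoffs, and far larger vocabularies.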
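The summary also mentions combining embeddings generated separately at different institutions. A minimal sketch of one simple option, assuming every site shares concept-level vectors indexed by a common vocabulary (never patient records), is to rotate each site's vectors onto a reference site with an orthogonal Procrustes alignment and then take a weighted average. This is not presented as the project's federated learning system, nor is it federated averaging of model parameters (FedAvg), the more common federated setup. The shared vocabulary, the per-site weights, and the function names are hypothetical.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing the Frobenius norm of source @ R - target."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def combine_site_embeddings(site_vectors, site_weights):
    """Rotate every site's vectors onto the first site's space, then average them.

    site_vectors: list of (n_concepts, dim) arrays; row i must refer to the same
        concept at every site (a shared vocabulary is assumed).
    site_weights: relative weight per site, e.g. patient counts (an assumption).
    """
    reference = site_vectors[0]
    aligned = [v @ procrustes_rotation(v, reference) for v in site_vectors]
    weights = np.asarray(site_weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * v for w, v in zip(weights, aligned))


# Toy check: site B holds the same geometry as site A under an arbitrary
# orthogonal transform, a simplified stand-in for the arbitrary orientation
# of independently trained embeddings.
rng = np.random.default_rng(0)
site_a = rng.normal(size=(20, 4))
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
site_b = site_a @ rotation
combined = combine_site_embeddings([site_a, site_b], site_weights=[300, 100])
```

Because the two toy sites differ only by an orthogonal transform, alignment recovers site A exactly and the weighted average equals site A. With real data the sites would also differ in population and coding practices, so the aligned vectors would disagree and the weighting choice would matter.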

Grant Number: 5R01LM013766-04
NIH Institute/Center: NIH

Principal Investigator: Gabriele Campanella