
Semi-supervised Approaches to Denoising Electronic Health Records Data for Risk Prediction

Organization: HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH
Location: BOSTON, UNITED STATES
Posted: 1 Aug 2021
Deadline: 30 Apr 2027
NIH · US Federal · Research Grant · FY2024

Full Description

Project Summary
While clinical trials remain a critical source for oncology research, their study findings may not be generalizable

to the real world due to the restricted patient population. In recent years, due to the increasing adoption

of electronic health records (EHR) and the linkage of EHR with specimen bio-repositories and other research

registries, integrated large datasets now exist as a new source for translational research. These integrated

datasets open opportunities for developing accurate EHR-based prediction models for disease progression

and treatment response, which can be easily incorporated into clinical practice. These models can also be

contrasted with models derived from clinical trials, bridging the gap between clinical trials and the real world.

However, efficiently deriving and evaluating personalized prediction models using such real-world data (RWD)

remains challenging due to practical and methodological obstacles. For example, validated outcome

information from EHR, such as development of colon cancer and 1-year treatment response, requires

laborious medical record review and hence is often not readily available for research. Naive use of error-prone

surrogates of the outcome, such as billing codes or procedure codes, as the true outcome may greatly hamper

the power of EHR studies and produce biased results. Semi-supervised risk prediction methods, leveraging

noisy surrogates and a small amount of human annotations on the outcome, may greatly improve the utility of

EHR for precision medicine research. Deriving a precise estimate of the risk model becomes even more

challenging when the number of candidate features is not small relative to the number of annotated outcomes.

Another major challenge with EHR risk modeling is transportability. Complex machine learning models

trained in one EHR system often attain low accuracy in another EHR system, due to the heterogeneity in the

patient population and healthcare system. Transfer learning methods that can automatically adjust a model

developed for one EHR cohort to better fit another EHR cohort are of great value. Synthesizing information

from multiple data sources can improve the quality of evidence. However, meta-analyzing EHR data from multiple

EHR cohorts faces an additional challenge due to patient privacy. We address these challenges by developing

semi-supervised risk prediction methods with high-dimensional predictors in Aim 1; semi-supervised transfer

learning methods to enable risk prediction modeling in target populations with no gold standard labels in Aim 2;

and distributed learning methods for high dimensional predictive modeling in Aim 3.
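
The semi-supervised idea in the summary (a noisy surrogate available for everyone, plus a small set of chart-reviewed labels) can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the project's method: the cohort is simulated, the two-step estimator (learn a risk direction from the surrogate on all patients, then calibrate it on the labeled subset) is a simplified stand-in, and every name and parameter (`surrogate`, `n_labeled`, `fit_logistic`, `auc`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated cohort: N patients, p features, n_labeled chart-reviewed.
N, p, n_labeled = 5000, 20, 100
X = rng.normal(size=(N, p))
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.5]                    # only a few features matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))  # true outcome
surrogate = rng.poisson(0.5 + 2.0 * y)         # noisy proxy, e.g. billing-code count
labeled = np.zeros(N, dtype=bool)
labeled[:n_labeled] = True                     # y is "observed" only for these rows


def fit_logistic(Z, t, steps=2000, lr=0.1):
    """Plain gradient-descent logistic regression with an intercept."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        pred = 1.0 / (1.0 + np.exp(-(Z1 @ w)))
        w -= lr * Z1.T @ (pred - t) / len(t)
    return w


def auc(score, t):
    """Rank-based AUC: probability that a case outranks a control."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = t.sum()
    n_neg = len(t) - n_pos
    return (ranks[t == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Step 1 (uses all N patients, no gold-standard labels): regress the surrogate
# on the features to learn a one-dimensional risk score.
X1 = np.column_stack([np.ones(N), X])
gamma, *_ = np.linalg.lstsq(X1, np.log1p(surrogate), rcond=None)
score = X @ gamma[1:]
score = (score - score.mean()) / score.std()

# Step 2 (uses only the labeled patients): calibrate the score against y.
w = fit_logistic(score[labeled], y[labeled])
risk_semi = 1.0 / (1.0 + np.exp(-(w[0] + w[1] * score)))

# Baseline: fully supervised fit on the same small labeled set.
w_sup = fit_logistic(X[labeled], y[labeled])
risk_sup = 1.0 / (1.0 + np.exp(-(w_sup[0] + X @ w_sup[1:])))

auc_semi = auc(risk_semi[~labeled], y[~labeled])
auc_sup = auc(risk_sup[~labeled], y[~labeled])
print(f"semi-supervised AUC: {auc_semi:.3f}, labeled-only AUC: {auc_sup:.3f}")
```

The design point is that step 1 spends the large unlabeled sample on the high-dimensional part of the problem, so the scarce labels are only needed to estimate two calibration parameters. How the two approaches compare in practice depends on how informative the surrogate is.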

Grant Number: 5R01LM013614-04
NIH Institute/Center: NIH

Principal Investigator: TIANXI CAI
