grant

Semi-supervised Approaches to Denoising Electronic Health Records Data for Risk Prediction

Organization: HARVARD UNIVERSITY D/B/A HARVARD SCHOOL OF PUBLIC HEALTH
Location: BOSTON, UNITED STATES
Posted: 1 Aug 2021
Deadline: 30 Apr 2027

NIH, US Federal, Research Grant, FY2024


Full Description

Project Summary
While clinical trials remain a critical source for oncology research, their findings may not be generalizable
to the real world because of the restricted patient population. In recent years, due to the increasing adoption
of electronic health records (EHR) and the linkage of EHR with specimen bio-repositories and other research
registries, integrated large datasets now exist as a new source for translational research. These integrated
datasets open opportunities for developing accurate EHR-based prediction models for disease progression
and treatment response, which can be easily incorporated into clinical practice. These models can also be
contrasted with models derived from clinical trials, bridging the gap between clinical trials and the real world.
However, efficiently deriving and evaluating personalized prediction models using such real-world data (RWD)
remains challenging due to practical and methodological obstacles. For example, validated outcome
information from EHR, such as development of colon cancer and 1-year treatment response, requires
laborious medical record review and hence is often not readily available for research. Naive use of error-prone
surrogates of the outcome, such as billing codes or procedure codes, as the true outcome may greatly hamper
the power of EHR studies and produce biased results. Semi-supervised risk prediction methods, leveraging
noisy surrogates and a small amount of human annotations on the outcome, may greatly improve the utility of
EHR for precision medicine research. Deriving a precise estimate of the risk model becomes even more
challenging when the number of candidate features is not small relative to the number of annotated outcomes.
Another major challenge with EHR risk modeling lies in the transportability. Complex machine learning models
trained in one EHR system often attain low accuracy in another EHR system, due to the heterogeneity in the
patient population and healthcare system. Transfer learning methods that can automatically adjust a model
developed for one EHR cohort to better fit another EHR cohort are of great value. Synthesizing information
from multiple data sources can improve the quality of evidence. However, meta-analyzing EHR data from multiple
EHR cohorts faces an additional challenge due to patient privacy. We address these challenges by developing
semi-supervised risk prediction methods with high-dimensional predictions in Aim 1; semi-supervised transfer
learning methods to enable risk prediction modeling in target populations with no gold-standard labels in
Aim 2; and distributed learning methods for high-dimensional predictive modeling in Aim 3.

Grant Number: 5R01LM013614-04
NIH Institute/Center: NIH
Principal Investigator: TIANXI CAI
