
From Text to Translation: Using Language Models to Resolve and Classify Variants

Organization: BRIGHAM AND WOMEN'S HOSPITAL
Location: BOSTON, UNITED STATES
Posted: 1 Sept 2025
Deadline: 31 Aug 2027
Tags: NIH, US Federal, Research Grant, FY2025
Keywords: AI language models, Assay, Attention, Automated Abstracting, Back, Benign, Bioassay, Biological Assay, Categories, Cell Body, Cells, Classification, ClinVar, Clinical, Clinical Management, Clinical Practice Guideline, Clinical genetics, Communication, Computer Analysis, Data, Data Bases, Databases, Development, Diagnostic, Disease, Disorder, Dorsum, Ensure, Evaluation, Family, Gene variant, General Population, General Public, General Taxonomy, Genes, Genetic, Genetic Screening, Genomic medicine, Genomics, Goals, Guidelines, Hep G2, HepG2, HepG2 cell line, Individual, Influentials, Information Retrieval, Information extraction, Investigators, LDL, LDL Lipoproteins, Label, Laboratories, Language, Low-Density Lipoproteins, Manuals, Maps, Measures, Mediation, Medical, Medical Genetics, Metadata, Methods, Modeling, Nature, Negotiating, Negotiation, Participant, Pathogenicity, Patient Care, Patient Care Delivery, Patient outcome, Patient-Centered Outcomes, Patient-Focused Outcomes, Patients, Performance, Persons, Predictive text, Process, Provider, Publications, Records, Reporting, Research Personnel, Research Resources, Researchers, Resources, Review Literature, Risk, Risk Assessment, Scientific Publication, Source, Standardization, Structure, Systematics, Taxonomy, Testing, Text, Therapeutic, Time, Training, Transformer language model, Translations, Update, Validation, Variant, Variant Curation Expert Panel, Variation, allelic variant, artificial intelligence language models, beta-Lipoproteins, bio-informatics pipeline, biobank, bioinformatics pipeline, biorepository, care for patients, care of patients, caring for patients, clinical practice and guidelines, clinical predictors, clinical significance, clinically significant, clinician communication, communicate to clinicians, communicate to providers, communicate with clinicians, communicate with doctors, communicate with providers, computational analyses, computational analysis, computer analyses, data base, deep learning, deep learning method, deep learning strategy, developmental, disease risk, disorder risk, doctor communication, genetic variant, genome medicine, genomic variant, improved, knowledge graph, large language model, large scale language model, massive scale language models, meta data, new approaches, novel approaches, novel strategies, novel strategy, patient oriented outcomes, phrases, prevent, preventing, proband, provider communication, text summarization, translation, unclassified variant, uptake, validations, variant of uncertain clinical significance, variant of uncertain significance, variant of undetermined significance, variant of unknown significance

Full Description

Project Summary: Deep learning methods toward resolving uncertain variant classifications
Genomic sequencing can substantially improve clinical management by optimizing surveillance and treatment options and by improving risk assessment. As the number of interpreted genetic variants grows, thousands of new variant interpretations are entering variant databases each month. Most variants in these databases have insufficient evidence to be classified as pathogenic or benign and are therefore classified as Variants of Uncertain Significance (VUSs). Although these variants may increase disease risk, information about them cannot be communicated to providers or patients because structured evidence is lacking. This translational gap prevents the many patients who collectively carry such variants from benefiting from genomic medicine.

ClinVar, a large diagnostic variant database, contains a unique abundance of predictive information that has been curated by clinical experts over many years. This includes over 1.1 million plain-text diagnostic reports that often describe case data, a literature review, and an analysis of computational predictions or functional assay data. We will use these clinical reports to predict pathogenicity and to identify which specific sources of evidence for pathogenicity each report provides. This project will enhance the value of data in ClinVar, a public resource used by thousands of investigators, clinicians, and bioinformatics pipelines.

We will first optimize a text classification model to make predictions from diagnostic summaries, evaluating and fine-tuning a set of large language models that have been pretrained on different text corpora. Using clinical reports and known classifications from ClinVar variant submissions, we will evaluate different filtering criteria applied during training. We will measure performance both on high-confidence labeled data previously reviewed by expert panels and on bona fide VUSs, using expert-panel-curated variant interpretations as ground-truth validation data.
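The filtering and evaluation steps can be sketched as below, assuming a simplified record format. The `Submission` fields, the binary label mapping (pathogenic/likely pathogenic vs. benign/likely benign), and the two filters (dropping submissions whose review status is "no assertion criteria provided" and dropping very short summaries) are illustrative assumptions, not the project's actual criteria. The language model itself is left out, so `evaluate` only compares predicted labels against expert-panel labels.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical, simplified view of one ClinVar submission."""
    variant_id: str
    summary: str          # free-text interpretation summary written by the lab
    classification: str   # e.g. "Likely pathogenic", "Uncertain significance"
    review_status: str    # e.g. "criteria provided, single submitter"


# Assumed binary target: P/LP -> 1, B/LB -> 0. VUS gets no training label.
BINARY = {"Pathogenic": 1, "Likely pathogenic": 1, "Benign": 0, "Likely benign": 0}


def build_training_set(subs, exclude_no_criteria=True, min_words=10):
    """Apply two illustrative filtering criteria; return (text, label) pairs."""
    pairs = []
    for s in subs:
        label = BINARY.get(s.classification)
        if label is None:
            continue  # VUS or other classes: not usable as labeled examples
        if exclude_no_criteria and s.review_status == "no assertion criteria provided":
            continue  # drop submissions that cite no classification criteria
        if len(s.summary.split()) < min_words:
            continue  # too short to carry evidence the model can learn from
        pairs.append((s.summary, label))
    return pairs


def evaluate(pred, truth):
    """Compare predicted binary labels with expert-panel labels.

    Both arguments map variant_id -> 0/1; only variants present in both count.
    """
    shared = [v for v in truth if v in pred]
    tp = sum(pred[v] == 1 and truth[v] == 1 for v in shared)
    tn = sum(pred[v] == 0 and truth[v] == 0 for v in shared)
    pos = sum(truth[v] == 1 for v in shared)
    neg = len(shared) - pos
    return {
        "n": len(shared),
        "accuracy": (tp + tn) / len(shared) if shared else float("nan"),
        "sensitivity": tp / pos if pos else float("nan"),
        "specificity": tn / neg if neg else float("nan"),
    }
```

Toggling `exclude_no_criteria` or `min_words` and re-running training is one simple way to compare filtering criteria; sensitivity and specificity are reported separately because pathogenic and benign labels are rarely balanced.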

Next, we will identify the information in these reports that drives predictions using post-hoc explainability methods (attention mapping, representation probing, and causal mediation analysis). We will then map this evidence to biomedical concepts related to variant interpretation and pathogenicity, using a knowledge graph refined to highlight the concepts relevant to diagnostic review criteria.
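The mapping step can be illustrated with a toy example: given one attribution score per sentence (assumed to come from a post-hoc method such as attention weights aggregated per sentence), the scores are summed into the evidence categories named in this summary. The `CONCEPTS` lexicon is a hypothetical keyword stand-in for the refined knowledge graph, and `concept_attribution` is an illustrative name; neither reflects the project's actual implementation.

```python
from collections import defaultdict

# Hypothetical lexicon standing in for the refined knowledge graph: each
# evidence category named in the summary is linked to a few trigger phrases.
CONCEPTS = {
    "case data": {"proband", "affected", "individuals", "segregat", "family"},
    "literature review": {"literature", "published", "reported in", "previously described"},
    "computational prediction": {"in silico", "computational", "predicted", "polyphen", "sift"},
    "functional assay": {"functional", "assay", "in vitro", "activity"},
}


def concept_attribution(sentences, scores):
    """Sum per-sentence attribution scores into evidence concepts.

    `scores` is assumed to come from some post-hoc explainability method;
    producing it is out of scope here. A sentence that mentions several
    concepts contributes its score to each of them.
    """
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    totals = defaultdict(float)
    for sentence, score in zip(sentences, scores):
        text = sentence.lower()
        for concept, triggers in CONCEPTS.items():
            if any(t in text for t in triggers):
                totals[concept] += score
    # Highest-attributed evidence category first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

For the three sentences "The variant was identified in 3 affected individuals.", "Functional assays showed reduced LDL receptor activity.", and "In silico tools predict a deleterious effect." with scores `[0.2, 0.5, 0.3]`, the function returns `{"functional assay": 0.5, "computational prediction": 0.3, "case data": 0.2}`. Substring matching is deliberately crude; a knowledge graph would additionally resolve synonyms and ontology terms.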

Finally, we will measure the extent to which these approaches can identify complementary evidence across reports that different clinical laboratories have submitted for the same variant; such evidence could be used to reclassify a VUS or to resolve a variant with conflicting interpretations. We will manually review a set of clinical reports to evaluate the accuracy of the recovered sources of information. If the evidence is sufficient, we will identify up to 100 variants carried by participants in the Mass General Brigham biobank and attempt to update their classifications so that these results can be communicated to patients.
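Pooling evidence across laboratories and checking it against manual review might look like the sketch below. It assumes each report has already been reduced to a set of evidence categories. The `min_categories` threshold and the rule "pooled evidence reaches the threshold while no single lab's report does" are hypothetical criteria for flagging variants for expert review, not a reclassification procedure. `source_agreement` computes micro-averaged precision and recall of recovered categories against manually reviewed ones.

```python
from collections import defaultdict


def pool_by_variant(reports):
    """Group evidence categories by variant and submitting lab.

    `reports` is an iterable of (variant_id, lab, set_of_categories) tuples,
    e.g. the concept labels recovered from each lab's plain-text report.
    """
    pooled = defaultdict(dict)
    for variant, lab, categories in reports:
        pooled[variant].setdefault(lab, set()).update(categories)
    return pooled


def complementary_candidates(reports, min_categories=3):
    """Flag variants whose evidence reaches `min_categories` distinct
    categories only when reports from different labs are combined.

    The threshold is a hypothetical rule; flagged variants would still
    need expert review before any reclassification.
    """
    flagged = {}
    for variant, labs in pool_by_variant(reports).items():
        if len(labs) < 2:
            continue  # a single lab offers nothing to combine
        combined = set().union(*labs.values())
        best_single = max(len(c) for c in labs.values())
        if len(combined) >= min_categories > best_single:
            flagged[variant] = sorted(combined)
    return flagged


def source_agreement(recovered, reviewed):
    """Micro-averaged precision and recall of recovered evidence categories
    against a manual review (both dicts map report_id -> set of categories).
    Reports absent from `reviewed` are ignored."""
    tp = fp = fn = 0
    for report_id, truth in reviewed.items():
        found = recovered.get(report_id, set())
        tp += len(found & truth)
        fp += len(found - truth)
        fn += len(truth - found)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

Requiring that no single report already meets the threshold isolates the cases where evidence is genuinely complementary, rather than variants one laboratory could have resolved on its own.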

Grant Number: 1R21HG014015-01
NIH Institute/Center: NIH

Principal Investigator: Christopher Cassa
