
From Text to Translation: Using Language Models to Resolve and Classify Variants

Organization: BRIGHAM AND WOMEN'S HOSPITAL
Location: BOSTON, UNITED STATES
Posted: 1 Sept 2025
Deadline: 31 Aug 2027

Tags: NIH, US Federal, Research Grant, FY2025


Full Description

Project Summary: Deep learning methods toward resolving uncertain variant classifications
Genomic sequencing can substantially improve clinical management by optimizing surveillance and treatment
options and improving risk assessment. As the volume of genetic variant interpretation grows, thousands of new
variant interpretations enter variant databases each month. Most variants in these databases have
insufficient evidence to be classified as pathogenic or benign, and as a result are classified as Variants of
Uncertain Significance (VUSs). Although these variants may increase disease risk, information about them cannot be
communicated to providers or patients because structured evidence is lacking. This translational gap prevents
the many patients who carry such variants from benefiting from genomic medicine.
ClinVar, a large diagnostic variant database, contains a unique abundance of predictive information that has
been curated by clinical experts over many years. This includes over 1.1 million plaintext diagnostic reports
that often describe case data, literature review, and an analysis of computational predictions or functional
assay data. We will use these clinical reports to predict pathogenicity and to identify which
specific sources of evidence of pathogenicity each report provides. This project will enhance the value of
data in ClinVar, a public resource used by thousands of investigators, clinicians, and bioinformatic pipelines.
We will first optimize a text classification model to make predictions from diagnostic summaries, evaluating and
fine-tuning a set of large language models that have been trained on different text corpora. Using clinical
reports and known classifications from ClinVar variant submissions, we will evaluate different filtering criteria
used in the training process. We will measure performance on high-confidence labeled data that have been
previously reviewed by expert panels, as well as on bona fide VUSs, using expert-panel-curated variant
interpretations as ground-truth validation data.
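
The training and validation split described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Submission` fields, the "expert panel" status string, the minimum-length filter, and the choice to keep only pathogenic and benign labels are hypothetical and are not taken from the project or from ClinVar's schema.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative, not ClinVar's schema.
@dataclass
class Submission:
    variant_id: str
    report_text: str
    classification: str  # e.g. "pathogenic", "benign", "uncertain"
    review_status: str   # e.g. "expert panel", "single submitter"


def split_for_training(submissions, min_words=20):
    """Hold out expert-panel records for validation and filter the rest.

    Any variant with an expert-panel record is excluded from training
    entirely, so no report about a validation variant leaks into training.
    The label set and length threshold are assumed filtering criteria.
    """
    held_out_ids = {
        s.variant_id for s in submissions if s.review_status == "expert panel"
    }
    train, validation = [], []
    for s in submissions:
        if s.review_status == "expert panel":
            validation.append(s)
        elif (
            s.variant_id not in held_out_ids
            and s.classification in {"pathogenic", "benign"}
            and len(s.report_text.split()) >= min_words
        ):
            train.append(s)
    return train, validation


def accuracy(predicted, truth):
    """Fraction of predictions that match the ground-truth labels."""
    if not truth:
        raise ValueError("no ground-truth labels to compare against")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

Varying `min_words` or the accepted label set and re-measuring `accuracy` on the held-out records is one simple way to compare filtering criteria.
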
Next, we will identify the information in these reports that drives predictions using post-hoc explainability
methods (attention mapping, representation probing, and causal mediation analysis), and then map this
evidence to biomedical concepts related to variant interpretation and pathogenicity, using a knowledge graph
refined to highlight the concepts relevant to diagnostic review criteria.
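
As a rough illustration of the mapping step, the sketch below assigns phrases flagged by an explainability method to evidence categories. The lookup table is a hypothetical stand-in for a knowledge graph query, and the category names and trigger phrases are assumptions loosely based on the report contents named in this summary (case data, literature review, computational predictions, functional assay data).

```python
# Hypothetical phrase-to-concept table standing in for a knowledge graph lookup.
EVIDENCE_CONCEPTS = {
    "functional assay": "functional_data",
    "in silico": "computational_prediction",
    "segregat": "case_data",
    "proband": "case_data",
    "reported in the literature": "literature_review",
}


def map_to_concepts(highlighted_phrases):
    """Return the set of evidence concepts matched by the highlighted phrases.

    Matching is a case-insensitive substring test, which is far cruder than
    a real concept-normalization step and is used here only for illustration.
    """
    concepts = set()
    for phrase in highlighted_phrases:
        lowered = phrase.lower()
        for key, concept in EVIDENCE_CONCEPTS.items():
            if key in lowered:
                concepts.add(concept)
    return concepts
```
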
Finally, we will measure the extent to which these approaches can identify complementary evidence across
reports on the same variant generated by different clinical labs, which can be used to reclassify
a VUS or resolve a variant with conflicting interpretations. We will manually review a set of clinical reports to
evaluate the accuracy of the recovered sources of information. If evidence is sufficient, we will
identify up to 100 variants carried by participants in the Mass General Brigham biobank and attempt
to update their variant classifications so that these results can be communicated to patients.
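
One way to picture the cross-lab comparison is the sketch below, which groups reports by variant, pools their evidence categories, and flags variants where labs disagree or where the pooled evidence exceeds what any single report contains. The tuple layout, lab names, and the definition of "complementary" used here are assumptions for illustration, not the project's method.

```python
from collections import defaultdict


def summarize_by_variant(reports):
    """Summarize evidence per variant across submitting labs.

    reports: iterable of (variant_id, lab, classification, evidence_concepts).
    A variant is marked complementary when more than one lab reported on it
    and the pooled evidence is larger than the largest single report's set.
    """
    grouped = defaultdict(list)
    for variant_id, lab, classification, concepts in reports:
        grouped[variant_id].append((lab, classification, set(concepts)))

    summary = {}
    for variant_id, entries in grouped.items():
        labs = {lab for lab, _, _ in entries}
        classifications = {c for _, c, _ in entries}
        pooled = set().union(*(e for _, _, e in entries))
        largest_single = max(len(e) for _, _, e in entries)
        summary[variant_id] = {
            "labs": len(labs),
            "conflicting": len(classifications) > 1,
            "combined_evidence": pooled,
            "complementary": len(labs) > 1 and len(pooled) > largest_single,
        }
    return summary
```

Variants flagged as conflicting or complementary would be natural candidates for the manual review described above.
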

Grant Number: 1R21HG014015-01
NIH Institute/Center: NIH
Principal Investigator: Christopher Cassa
