Detection and genotyping complex human genetic variation using single-molecule sequencing
Full Description
Project summary
Although single-molecule sequencing (SMS) technologies have advanced in recent years to enable routine
sequencing and assembly of human genomes, new software is required to utilize the potential of SMS in human
genetics. The long-term goal is to help improve our understanding of complex variation in human diversity and
its role in disease. To achieve this, we will develop methods to (1) detect variation in SMS reads, (2) assemble
duplicated sequences missing from SMS de novo assemblies, and (3) genotype complex variation in large HTS
datasets using lightweight data structures. Although several years of algorithm development for SMS data have
produced a software ecosystem for detecting variation in SMS genomes, continued development is needed
because sensitivity and specificity are not yet sufficient for disease studies, important classes of
variation are not resolved by current assembly approaches, and the knowledge gained from sequencing SMS
genomes must be used to improve what can be discovered in large disease studies that rely heavily on
short-read data, such as those conducted under TOPMed. The algorithmic innovations we will provide for SMS data
are an alignment algorithm that explicitly optimizes over rearranged sequences and an assembly approach that
exploits minor differences between duplication copies to resolve genome function. Software will be supported
through Bioconda installation and distributed test cases. Once a variant is discovered by SMS, it may be more
easily genotyped in short-read data. We will develop methods to generate databases of SMS variation that may
be queried with short-read data. To aid in development of assembly algorithms for duplicated sequences, we will
generate a public resource of SMS data for individuals with known copy number polymorphisms. The significance
of this work is to enable SMS genomes to be used in disease studies, both by uncovering previously hidden
variation, and by increasing the amount of variation found in large short-read datasets.
Grant Number: 5R01HG011649-05
NIH Institute/Center: NIH
Principal Investigator: Mark Chaisson