Probabilistic string similarity sketching software development for metagenomics and RNA-seq
Full Description
One of the major approaches modern biologists use to understand living things is to sequence their “genomes,” which involves reading each genome and comparing it against known biological databases. A lot of software for analyzing genomes relies on a bag of tricks that scientists have made work through trial and error, but without mathematical proofs that they work. This research will provide rigorous mathematical proofs of when those tricks are guaranteed to work, and will also extend them to additional biological analysis problems. Specifically, this research will study tricks from genome “alignment,” which measures how many differences there are between two genomes, and then apply those tricks to the problems of discovering new variants of proteins and measuring RNA levels in a cell. The broader impact of this work is that researchers will then be able to build faster genomic analysis software, improving our understanding of when living cells produce different forms of proteins.
Researchers will perform an average-case analysis of the seed-chain-extend string alignment algorithm to prove bounds on speed and accuracy for sequence alignment and read mapping software. The researchers previously performed such an analysis in the substitution-only error model; here they extend that analysis to a more biologically plausible error model that includes indels, duplications, and subsampling. Some of the probabilistic subsampling techniques will be used to improve RNA-seq quantification and novel isoform discovery. The aim is to speed up RNA-seq quantification by not mapping every individual read, and to improve the accuracy of novel isoform discovery by using a subsampling pre-filter to narrow the reads to plausible novel isoform candidates before mapping.
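To make the idea of a subsampling pre-filter concrete, the following is a minimal sketch and not the method the award describes. It assumes one common style of k-mer subsampling (keep a k-mer only if its hash is divisible by a fixed constant) and lets a read through only if its subsampled k-mers overlap those of a reference. The function names, the BLAKE2b hash, and the parameters k, c, and min_shared are all illustrative assumptions.

```python
import hashlib


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, c=4):
    """Subsample k-mers: keep those whose hash is divisible by c (roughly 1/c of them)."""
    return {km for km in kmers(seq, k) if kmer_hash(km) % c == 0}


def passes_prefilter(read, reference_sketch, k=5, c=4, min_shared=1):
    """Hypothetical pre-filter: keep a read only if it shares at least
    min_shared subsampled k-mers with the reference sketch."""
    return len(sketch(read, k, c) & reference_sketch) >= min_shared


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGATCGGATCCA"
    ref_sketch = sketch(reference)
    for read in ["GCAAGGCTTACG", "TTTTTTTTTTTT"]:
        print(read, passes_prefilter(read, ref_sketch))
```

Because the same hash rule is applied to reads and to the reference, any k-mer kept from a read is also kept from the reference when both contain it, so only reads that pass this inexpensive check would need to go on to full mapping. With a small k and a short read, the sketch of a genuinely matching read can be empty, so a real tool would have to choose k, c, and the threshold with care.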
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Award Number: 2531433
Principal Investigator: Yun Yu
Funds Obligated: $300,000
State: PA