Minipatch Learning for Selection, Stability, Inference, and Scalability
Full Description
Massive amounts of data are now collected by nearly every industry and academic discipline. Uncovering the hidden insights in such data holds the key to major scientific challenges such as understanding how the brain works, discovering mechanisms leading to diseases such as cancer and Alzheimer's disease, and combating climate change, among many others. But discovering key features and important relationships in complex and huge data poses major statistical and computational challenges. The investigator aims to develop new statistical machine learning approaches and theory for this task that break up huge data sets into small random subsets called minipatches to facilitate both faster computation and improved statistical efficiency. The new methods will be implemented in open-source software and applied to huge biomedical datasets in genomics and neuroscience. The project will provide undergraduate and graduate students training and professional development opportunities.
Discovering key features and important relationships in complex and huge data commonly found in biomedicine poses not only major computational challenges but also critical statistical challenges. To tackle these challenges, the investigator plans to develop a new framework termed minipatch learning. Inspired by the successes of random forests, stability approaches in high-dimensional statistics, and stochastic optimization strategies, the investigator will build ensembles from many random tiny subsets of both observations and features or variables called minipatches. While ensemble learning strategies are commonly used in supervised machine learning, the investigator will use minipatch learning for the tasks of feature selection, model-agnostic inference for feature importance, and learning relationships amongst features through graphical models. The approach, which trains on very tiny subsets of the data, is expected to have dramatic computational and memory savings. The investigator aims to show both theoretically and empirically that such a strategy poses significant statistical advantages as well.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Award Number: 2516872
Principal Investigator: Genevera Allen
Funds Obligated: $195,479
State: NY
Sign up free to get the apply link, save to pipeline, and set email alerts.
Sign up free →Agency Plan
7-day free trialUnlock procurement & grants
Upgrade to access active tenders from World Bank, UNDP, ADB and more — with email alerts and pipeline tracking.
$29.99 / month
- 🔔Email alerts for new matching tenders
- 🗂️Track tenders in your pipeline
- 💰Filter by contract value
- 📥Export results to CSV
- 📌Save searches with one click