
ERI: Effectively Deploying Large AI Models to Edge Systems via Activation Sparsity Compression

Organization: Kennesaw State University Research and Service Foundation
Location: Kennesaw, United States
Posted: 1 Oct 2025
Deadline: 30 Sept 2027
NSF · US Federal · Research Grant · Science Foundation · GA

Full Description

Deploying large AI/ML models on edge devices offers substantial benefits: it reduces the computational load on supercomputers and data centers, lowers application response time, enhances data privacy, and raises the intelligence of the devices themselves. However, because edge devices have limited computing resources, running large-scale AI models directly on them presents significant challenges. To tackle these challenges, researchers frequently employ compression techniques to create smaller and more efficient models while preserving accuracy. In contrast to traditional AI compression techniques such as quantization, pruning, and knowledge distillation, this project introduces a novel technique called "Predictor to Prefetcher" (P2P), which leverages patterns of activation sparsity within large AI models. The P2P approach can be integrated directly with existing techniques to make AI model compression more effective and efficient. Moreover, it enhances the AI processing capabilities of edge devices, making existing system infrastructure more scalable and sustainable. This innovation expands knowledge in AI science and education, which is crucial for supporting the ongoing growth of large AI models and maintaining the United States' leadership in AI technology.

This project will design, implement, and verify a novel compression technique for large AI models based on activation sparsity. We will begin by analyzing the predictability of Feed-Forward Network (FFN) activation patterns within recent large AI models, and will design a lightweight, dictionary-based pattern predictor built on pattern clustering to verify that predictability. Next, we will design and implement the "Predictor to Prefetcher" (P2P) module, which enables the system to prefetch only the activated weights and keep inactive ones out of main memory, thereby compressing the model from a memory perspective. Additionally, we will investigate the feasibility of building a device-end predictor using a tiny machine learning model, such as a Multilayer Perceptron (MLP), which gives system designers another option for balancing trade-offs. Finally, we will extend the P2P approach to Large Vision Models (LVMs) and Large Multimodal Models (LMMs). As part of this project, we will also create an open-source, architecture-level simulator to assess how effectively the P2P approach reduces cache/memory pollution and decreases execution time for cutting-edge large AI models.
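
To make the idea of predicting activation patterns and prefetching only the activated weights more concrete, the following is a minimal illustrative sketch in Python. It is not the project's implementation. Everything in it is an assumption made for illustration: the toy FFN with a ReLU activation, the layer sizes, the random weights, the nearest-centroid lookup standing in for the pattern-clustering step, and all names (`pattern_dict`, `p2p_forward`, and so on). "Prefetching" is modeled simply as touching only the weight rows of the predicted neurons.

```python
import random

rng = random.Random(0)
D_IN, N_HIDDEN, D_OUT, K = 8, 32, 4, 3  # toy sizes, chosen arbitrarily

# Toy FFN: y = W2 · relu(W1 · x). Row i of W1 and row i of W2T belong to neuron i.
W1 = [[rng.gauss(0, 1) for _ in range(D_IN)] for _ in range(N_HIDDEN)]
W2T = [[rng.gauss(0, 1) for _ in range(D_OUT)] for _ in range(N_HIDDEN)]

def pre_activation(i, x):
    return sum(w * xi for w, xi in zip(W1[i], x))

def active_set(x):
    """Neurons whose ReLU output is non-zero for input x."""
    return {i for i in range(N_HIDDEN) if pre_activation(i, x) > 0.0}

def ffn(x, neurons):
    """Evaluate the FFN using only the weight rows of `neurons`."""
    y = [0.0] * D_OUT
    for i in sorted(neurons):
        h = max(0.0, pre_activation(i, x))
        for j in range(D_OUT):
            y[j] += h * W2T[i][j]
    return y

def nearest(x, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

# Assumption: an offline clustering step has already produced these centroids.
centroids = [[rng.gauss(0, 1) for _ in range(D_IN)] for _ in range(K)]
calibration = [[c + rng.gauss(0, 0.05) for c in centroids[k % K]] for k in range(60)]

# Dictionary predictor: cluster id -> union of activation patterns seen in that cluster.
pattern_dict = {k: set() for k in range(K)}
for x in calibration:
    pattern_dict[nearest(x, centroids)] |= active_set(x)

def p2p_forward(x):
    """Predict the active neurons, 'prefetch' only those rows, then run the FFN."""
    fetch = pattern_dict[nearest(x, centroids)]
    return ffn(x, fetch), len(fetch)

dense = ffn(calibration[0], range(N_HIDDEN))
sparse, fetched = p2p_forward(calibration[0])
print(f"fetched {fetched}/{N_HIDDEN} neuron rows; max abs error "
      f"{max(abs(a - b) for a, b in zip(dense, sparse)):.2e}")
```

In this sketch the sparse result matches the dense one whenever the predicted set covers every neuron that actually fires, which holds by construction for the calibration inputs. For unseen inputs the predictor can miss neurons, so the output becomes an approximation. How often that happens in real models is the predictability question the project sets out to study.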


This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Award Number: 2501978
Principal Investigator: Bobin Deng

Funds Obligated: $199,324

State: GA
