EAGER: Building Idiomaticity into Natural Language Processing

Organization: Princeton University
Location: Princeton, United States
Posted: 1 Jan 2026
Deadline: 31 Jul 2026

NSF, US Federal, Research Grant, Science Foundation, NJ

Full Description

Idiomatic expressions are an essential component of everyday language use and a hallmark of native language ability. Consider the phrase "throw away": proficient speakers effortlessly understand that it takes a figurative meaning in “Britain threw away all the achievements of the last decade.” and a literal sense in “He threw away his cigarette and buried his head in his arms.” This EArly Grant for Exploratory Research (EAGER) will build a high-quality dataset that helps computers distinguish the figurative and literal senses of such expressions in general English text. The main novelty of this project lies in collecting a large class of idiomatic expressions, together with sentences containing them, so that computers can learn the inherent variability across idiomatic phrases. Collecting many sentences with phrases that have both a figurative and a literal meaning will permit computers to better understand the nuances with which these expressions are used in everyday conversation and writing. Beyond understanding, the collected examples will help computers use these expressions as native speakers do when automatically writing text, and even suggest appropriate expressions in specific contexts.

This EAGER project is inherently interdisciplinary, spanning linguistics and computation, and will investigate novel, idiomaticity-aware paradigms for natural language processing. It has two research aims: (1) creating a high-quality dataset of phrasal verbs annotated with their context-specific senses and their literal/figurative equivalent forms, and (2) testing the performance of state-of-the-art idiomaticity-aware algorithms. Because idiomatic expressions vary widely in form and structure, focusing this exploratory project on phrasal verbs (also known as verb-particle constructions) will permit the study of a very frequent class of idiomatic expressions that are syntactically different from those in currently available datasets. The primary risk of this project stems from the exploratory nature of creating large corpora with sufficient coverage for language model training. Given the prevalence of phrasal verbs in natural language, the English phrasal verb dataset will add variety to the available datasets on idiomatic expressions. Moreover, the figurative and literal ambiguity of phrasal verbs in context (apart from their polysemy) will permit a diverse look at the phenomenon of non-compositionality that characterizes idiomatic expressions. Thus, the dataset will serve as a training and test bed for algorithms that detect, interpret, and generate a broad class of idiomatic expressions. This effort will lead to new natural language processing algorithms for accurate interpretation and generation of idiomatic expressions, moving machines toward a more human-like language processing ability.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Award Number: 2611728
Principal Investigator: Suma Bhat
Funds Obligated: $20,360
State: NJ
