grant

EAGER: Building Idiomaticity into Natural Language Processing

Organization: Princeton University
Location: PRINCETON, United States
Posted: 1 Jan 2026
Deadline: 31 Jul 2026
NSF, US Federal, Research Grant, Science Foundation, NJ

Full Description

Idiomatic expressions are an essential component of everyday language use and a hallmark of native language ability. Consider the phrase throw away: proficient speakers effortlessly understand that it takes a figurative meaning in “Britain threw away all the achievements of the last decade” and a literal meaning in “He threw away his cigarette and buried his head in his arms.” This EArly-concept Grant for Exploratory Research (EAGER) will build a high-quality dataset that helps computers distinguish between the figurative and literal senses of such expressions in general English text. The main novelty of the project lies in collecting a large class of idiomatic expressions, together with sentences containing them, so that computers can learn the inherent variability across many idiomatic phrases. Collecting many sentences in which these phrases carry a figurative or a literal meaning will allow computers to better understand the nuances with which the expressions are used in everyday conversation and writing. Beyond understanding them, computers could use the collected examples to produce these expressions as native speakers do when generating text, and even to suggest appropriate expressions in specific contexts.

This EAGER project is inherently interdisciplinary, spanning linguistics and computation, and will investigate novel idiomaticity-aware paradigms for natural language processing. It has two research aims: (1) creating a high-quality dataset of phrasal verbs annotated with their context-specific senses and their literal/figurative equivalent forms, and (2) testing the performance of state-of-the-art idiomaticity-aware algorithms. Because idiomatic expressions vary widely in form and structure, focusing this exploratory project on phrasal verbs (also known as verb-particle constructions) will permit the study of a very frequent class of idiomatic expressions that are syntactically different from those in currently available datasets. The primary risk of the project stems from its exploratory goal of creating large corpora with sufficient coverage for language model training. Given the prevalence of phrasal verbs in natural language, an English phrasal-verb dataset will broaden the variety of existing datasets on idiomatic expressions. Moreover, because phrasal verbs can be ambiguous between figurative and literal readings in context (apart from their polysemy), they will permit a diverse look at the non-compositionality that characterizes idiomatic expressions. The dataset will thus serve as a training and test bed for algorithms that detect, interpret, and generate a broad class of idiomatic expressions. This effort will lead to new natural language processing algorithms for the accurate interpretation and generation of idiomatic expressions, moving machines toward more human-like language processing.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Award Number: 2611728
Principal Investigator: Suma Bhat

Funds Obligated: $20,360

State: NJ
