
CRII: RI: RUI: NLP Models in Heterogeneous Causal Effect Estimation

Organization: Williams College
Location: WILLIAMSTOWN, United States
Posted: 15 Jun 2025
Deadline: 31 May 2027
NSF · US Federal · Research Grant · Science Foundation · MA

Full Description

A goal of almost every scientific field is causal inference: inferring cause-and-effect relationships from data so that one can quantify how intervening on certain variables affects other variables in a system. Recently, researchers have started to combine causal inference with modern natural language processing (NLP) models in order to analyze text data, which can be a rich source of information about human behavior, thought, and interactions. However, methods in this area have largely focused on estimating average causal effects. The goal of many real-world applications is instead to estimate heterogeneous causal effects, that is, how effects differ across sub-groups, so that the resulting knowledge can be customized for those sub-groups. Examples include helping clinicians decide which medications to prescribe to specific patients, informing central bank committees' decisions on interest rates in relation to changes in key variables, and helping platform administrators decide how to optimally manage users. Estimating the effect of interventions from data can help inform decision making, particularly when effect estimates vary based on individuals’ features. A rich, unstructured source of features is written text: notes from electronic health records (EHRs) detail patients’ personal and medical histories, newspaper articles document national and international events, and online platforms host exchanges of users’ written opinions. Yet few methods exist that can incorporate important aspects of unstructured text data into causal estimates. This project will develop and evaluate methods to estimate how causal effects vary based on features that are computationally measured from text data. The more fine-grained causal estimates from these methods could help data scientists and the public better target interventions toward those with estimated positive effects.

This project will build, expand, and evaluate heterogeneous causal effect estimation methods that incorporate natural language processing (NLP) models and text data. The investigator will use data-driven methods to discover text-based proxies for causal variables and employ a proxy adjustment strategy to combine NLP classifiers with conditional average treatment effect (CATE) models. Then, the investigator will create datasets for realistic empirical evaluation of heterogeneous causal estimation methods, building on the investigator’s prior work that down-samples randomized controlled trials. Finally, the investigator will develop methods for settings in which text is a proxy for multiple causal variables, i.e., the treatment variable, the pre-treatment features, and the outcome variable. Overall, this project will help expand methods at the intersection of NLP and causal inference.
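
To make the idea of a text-derived feature feeding a CATE estimate concrete, here is a minimal Python sketch. It is an illustration under strong simplifying assumptions, not the project's method: the data are simulated, treatment is fully randomized, and the hypothetical `keyword_proxy` function stands in for an NLP classifier that happens to recover the underlying feature without error. The names `simulate_trial` and `cate_by_proxy`, the example notes, and the effect sizes are all invented for illustration. The proxy adjustment strategy mentioned above, which would account for classifier error, is not shown.

```python
import random
from statistics import mean


def keyword_proxy(note: str) -> int:
    """Hypothetical stand-in for an NLP classifier: 1 if the note mentions 'smoker'."""
    return int("smoker" in note.lower().split())


def simulate_trial(n: int, seed: int = 0):
    """Simulate a randomized trial whose true effect differs by a feature seen only in text."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.5
        note = "patient is a smoker" if smoker else "patient has no tobacco use"
        treated = int(rng.random() < 0.5)      # randomized assignment (assumption)
        true_effect = 2.0 if smoker else 0.5   # invented effect sizes
        outcome = true_effect * treated + rng.gauss(0.0, 1.0)
        rows.append((note, treated, outcome))
    return rows


def cate_by_proxy(rows):
    """Estimate the CATE in each proxy-defined subgroup by a difference in means."""
    estimates = {}
    for group in (0, 1):
        treated = [y for note, t, y in rows if keyword_proxy(note) == group and t == 1]
        control = [y for note, t, y in rows if keyword_proxy(note) == group and t == 0]
        estimates[group] = mean(treated) - mean(control)
    return estimates


rows = simulate_trial(20000)
cate = cate_by_proxy(rows)
print({group: round(value, 2) for group, value in cate.items()})
```

Because assignment is randomized in this simulation, a within-subgroup difference in means is an unbiased estimate of each subgroup's effect. With observational data or a noisy text classifier, this simple estimator would generally be biased, which is the kind of setting the project description targets.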


This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Award Number: 2451403
Principal Investigator: Katherine Keith

Funds Obligated: $169,637

State: MA
