
CRII: III: Towards Traceable, Affordable and Explainable Automated Feature Generation for Tabular Data

Organization: Clemson University
Location: Clemson, United States
Posted: 1 Sept 2025
Deadline: 31 May 2027
NSF, US Federal, Research Grant, Science Foundation, SC

Full Description

Features describe the characteristics of objects. For example, "age", "smoking or not", and "years of smoking" are features of a patient; they describe the patient's physical condition and can be used to predict whether the patient is likely to develop lung cancer. A combination of features can be more helpful for prediction: for example, "age" minus "years of smoking" yields a new feature indicating the age at which the patient started smoking. Creating new features by combining existing ones in this way is called feature generation. In the big data era, datasets contain enormous numbers of features, and it is not realistic for human experts to generate features manually. This project will build new technologies that automatically generate new features from existing ones, to better describe the objects and to improve prediction performance. Additionally, this project aims to substantially improve the traceability, affordability, and explainability of the generation process. The developed algorithms and tools are expected to generalize to a broad range of scientific and engineering problems, not just feature generation but also other domains such as data pre-processing, social analysis, intelligent transportation systems, healthcare, and the internet of things.
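The smoking example above can be made concrete with a small Python sketch. It is only an illustration of the idea of feature generation, not the project's method: the patient records, the field names, and the helper `add_smoking_start_age` are hypothetical and chosen for this example.

```python
# Hypothetical patient records; the values are illustrative, not real data.
patients = [
    {"age": 60, "smoker": True, "years_of_smoking": 40},
    {"age": 45, "smoker": True, "years_of_smoking": 10},
    {"age": 30, "smoker": False, "years_of_smoking": 0},
]


def add_smoking_start_age(record):
    """Return a copy of the record with one generated feature.

    The new feature is "age" minus "years of smoking", i.e. the age at
    which the patient started smoking. For non-smokers it is set to None,
    which is an assumption made for this sketch.
    """
    new_record = dict(record)  # copy so the original record is unchanged
    if record["smoker"]:
        new_record["smoking_start_age"] = (
            record["age"] - record["years_of_smoking"]
        )
    else:
        new_record["smoking_start_age"] = None
    return new_record


generated = [add_smoking_start_age(p) for p in patients]
```

Here the combination is written by hand; the project described in this abstract aims to find such combinations automatically.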

This project identifies three research tasks: (i) A Reinforcement Learning (RL) based approach to realize traceability. Two RL agents select appropriate features, and one RL agent selects the appropriate operation. The policy network will be decomposed into two sub-networks, a representation network and a value network, and the agents will share the value network to improve training convergence. (ii) A heuristic approach to realize affordability. Information theory-based utility scores will be designed to evaluate features and feature sets, and a heuristic selection strategy will be designed for the generation process. (iii) A Large Language Model (LLM) based approach to realize explainability. The tabular data will be serialized into natural language strings, and comprehensive prompts will be designed that incorporate feature generation expertise and domain expertise. Fine-tuned with these prompts, the LLM can generate features together with explanations. Two strategies will be proposed to compress the prompt. The proposed research will provide novel perspectives and methodologies for generating new features by advancing understanding of the problem and designing new generation strategies. These strategies go beyond conventional generation methodologies, which are highly dependent on domain knowledge.
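Two of the ideas in tasks (ii) and (iii) can be sketched in a few lines of Python. The abstract does not specify the utility scores or the serialization format, so everything here is an assumption made for illustration: mutual information between a discrete feature and the label stands in for an "information theory-based utility score", ranking by that score stands in for a heuristic selection strategy, and the "name is value" template stands in for the serialization of a table row. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, label):
    """Rank features by their utility score, highest first.

    `features` maps a feature name to its list of discrete values;
    `label` is the list of target values for the same rows.
    """
    scores = {name: mutual_information(vals, label) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def serialize_row(row):
    """Turn one table row (a dict) into a natural language string."""
    return ", ".join(f"{name} is {value}" for name, value in row.items()) + "."


# Hypothetical toy data: feature "a" determines the label, "b" is unrelated.
label = [0, 0, 1, 1]
ranking = rank_features({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}, label)
sentence = serialize_row({"age": 60, "years_of_smoking": 40})
```

A score of this kind is cheap to compute compared with retraining a downstream model for every candidate feature, which is one plausible reading of "affordability"; the scores and prompts the project actually designs may differ.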


This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Award Number: 2550105
Principal Investigator: Kunpeng Liu

Funds Obligated: $110,820

State: SC
