Collaborative Research: HCC: Small: Accounting for Focus Ambiguity in Visual Questions
Full Description
Ambiguous language is a common part of communication: vague words or phrases can be interpreted in multiple ways depending on the context. This project addresses how a question answering system might handle ambiguous questions about images, where it is unclear which part of an image a question refers to. For example, if someone asks "What is the medicine?" while looking at an image showing several pill bottles, a system should identify all relevant parts of the image and provide an answer for each, so that the person receives the full picture and can resolve the ambiguity later. In contrast, current visual question answering (VQA) services typically provide one answer per question and do not explain their reasoning for choosing that answer. This limits a person's ability to verify whether the intended interpretation was made. The repercussions of VQA services providing incomplete information can be grave, with adverse personal, social, professional, legal, and financial consequences for VQA service users.
In this project, we will develop a socio-technical solution that empowers people to recognize when a question is ambiguous and then resolve the ambiguity. We will introduce the first back-end AI model that can specify every plausible image region that could be the focus of a question's language, paired with natural language answers derived from those regions. We will also establish effective interaction designs within a user-facing tool that empowers people to recognize and resolve focus ambiguity in visual questions. Progress will be measured by evaluating the proposed AI model on our benchmark dataset and examining real users' experiences with this model when embedded within a larger VQA system. User studies will focus on blind individuals, since they are currently the dominant end users of VQA services. More generally, we expect project success will benefit all VQA service users, whether visually impaired or sighted.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Award Number: 2516629
Principal Investigator: Anhong Guo
Funds Obligated: $215,000
State: MI