To RAG or Not to RAG, That Is the Question: Effective Text-to-SQL Generation Under Ambiguity
Abstract:
Natural language interfaces to databases promise to democratize data access, yet current Text-to-SQL systems fail to address a fundamental challenge: user questions are often ambiguous. Existing benchmarks and frameworks either ignore ambiguity entirely or require extensive training data and domain expertise during inference, limiting their practical deployment. We present two contributions to address this gap. First, we introduce Ambrosia+, a comprehensive benchmark that extends Ambrosia with 2,535 unanswerable questions, creating a total of 6,777 examples spanning linguistic ambiguity (scope, attachment, vagueness) and unanswerability across 16 domains and multi-table databases. Second, we propose Shakespeare-SQL, an efficient modular framework that integrates adaptive question routing, hybrid retrieval combining structural and semantic similarity, and curriculum-based Group Relative Policy Optimization (GRPO) with AI feedback for interpretation generation. Our framework achieves 72.6% full coverage on the schema-ambiguous AmbiQT benchmark using only 300 training examples—a 36.5% relative improvement over the state-of-the-art DFPL (53.2%) while requiring 97% less training data. On Ambrosia+, Shakespeare-SQL achieves 73.0% full coverage compared to DFPL’s 30.5%, while generating 89% fewer SQL queries. On unambiguous benchmarks, our framework maintains competitive performance: 86.7% execution accuracy on SPIDER and 56.8% on BIRD. This work demonstrates that strategic training methodology outweighs dataset scale for ambiguity resolution, enabling production deployment without extensive domain-specific fine-tuning.