To RAG or Not to RAG, That Is the Question: Effective Text-to-SQL Generation Under Ambiguity
Abstract:
Natural language interfaces to databases promise to democratize data access, yet current Text-to-SQL systems fail to address a fundamental challenge: user questions are often ambiguous. Existing benchmarks and frameworks either ignore ambiguity entirely or require extensive training data and domain expertise during inference, limiting their practical deployment. We present two contributions to address this gap. First, we introduce Ambrosia+, a comprehensive benchmark that extends Ambrosia with 2,535 unanswerable questions, creating a total of 6,777 examples spanning linguistic ambiguity (scope, attachment, vagueness) and unanswerability across 16 domains and multi-table databases. Second, we propose Shakespeare-SQL, an efficient modular framework that integrates adaptive question routing, hybrid retrieval combining structural and semantic similarity, and curriculum-based Group Relative Policy Optimization (GRPO) with AI feedback for interpretation generation. Our framework achieves 72.6% full coverage on the schema-ambiguous AmbiQT benchmark using only 300 training examples—a 36.5% relative improvement over the state-of-the-art DFPL (53.2%) while requiring 97% less training data. On Ambrosia+, Shakespeare-SQL achieves 73.0% full coverage compared to DFPL’s 30.5%, while generating 89% fewer SQL queries. On unambiguous benchmarks, our framework maintains competitive performance: 86.7% execution accuracy on SPIDER and 56.8% on BIRD. This work demonstrates that strategic training methodology outweighs dataset scale for ambiguity resolution, enabling production deployment without extensive domain-specific fine-tuning.