Awesome-LLM-based-Text2SQL
[TKDE2025] Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | A curated list of resources (surveys, papers, benchmarks, and open-source projects) on large language model-based text-to-SQL.
https://github.com/DEEP-PolyU/Awesome-LLM-based-Text2SQL
-
Surveys
- TKDE - Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [[Paper](https://ieeexplore.ieee.org/abstract/document/11160657)] [[Code]()]
- CSUR2025 - to-SQL Tasks [[Paper]()]
- TKDE - to-SQL in the Era of LLMs: Where are We, and Where are We Going? [[Paper]()]
- TKDE
- VLDBJ2023 - to-SQL [[Paper]()]
- VLDB2023
- arXiv2022 - to-SQL Parsing: Concepts, Methods, and Future Directions [[Paper]()]
- COLING2022 - to-SQL: A Survey of What We Have and What We Expect [[Paper]()]
-
Benchmarks
- BIRD - A Big Bench for Large-Scale Database Grounded Text-to-SQL
- NeurIPS2023 - SQL + GPT-4 | 74.2 | 85.3 | [[Paper](https://openreview.net/pdf?id=p53QDxSIc5)] [[Code](https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting)] | 2023-04-21 |
- Spider2.0 - Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
- BIRD-CRITIC - Can LLMs Fix User Issues in Real-World Database Applications?
- BIRD-INTERACT - Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
- Proprietary - | **91.2** | [[Report](https://www.seek.ai/blog/miniseek-first-model-to-surpass-90-accuracy-on-spider-test-benchmark)] | 2023-11-02 |
-
Taxonomy
-
Fine-tuning
- arXiv2025 - based Schema Linking for LLM-based Text-to-SQL Generation [[Paper](https://arxiv.org/pdf/2502.12911)]
- ICLR2025 - to-SQL [[Paper](https://openreview.net/pdf?id=BAglD6NGy0)] [[Code](https://github.com/D2I-ai/Route)]
- NeurIPS202 - to-SQL in the Age of Well-Reasoned Language Models [[Paper](https://openreview.net/pdf?id=fglyh5pa7d)]
- COLING2025 - MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [[Paper](https://aclanthology.org/2025.coling-main.36.pdf)] [[Code](https://github.com/wbbeyourself/MAC-SQL)]
- NAACL2025 - MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation [[Paper](https://aclanthology.org/2025.naacl-long.107.pdf)] [[Code](https://github.com/layer6ai-labs/msc-sql)]
- ACL2025 - based Hierarchical Action CorREction Assistant for Text-to-SQL [[Paper](https://aclanthology.org/2025.acl-long.552.pdf)] [[Code](https://github.com/quge2023/SHARE)]
- Findings2024 - DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models [[Paper](https://aclanthology.org/2024.findings-emnlp.481.pdf)] [[Code](https://github.com/MohammadrezaPourreza/DTS-SQL)]
- SIGMOD2024 - source Language Models for Text-to-SQL [[Paper](https://dl.acm.org/doi/10.1145/3654930)] [[Code](https://github.com/RUCKBReasoning/codes)]
- arXiv2024 - Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL [[Paper](https://arxiv.org/pdf/2404.12560)] [[Code](https://github.com/mercatorhq/dubo-sql)]
- COLM2024 - AI-Lab/StructLM)]
- ACL2024 - Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models [[Paper](https://aclanthology.org/2024.acl-long.707.pdf)] [[Code](https://github.com/xufangzhi/Symbol-LLM)]
- VLDB2024 - to-SQL Empowered by Large Language Models: A Benchmark Evaluation [[Paper](https://www.vldb.org/pvldb/vol17/p1132-gao.pdf)] [[Code](https://github.com/BeachWang/DAIL-SQL)]
- ICML2024 - ai-lab/Consistency_LLM)]
-
In-context Learning
- arXiv2023 - shot Text-to-SQL with ChatGPT [[Paper](https://arxiv.org/pdf/2307.07306)] [[Code](https://github.com/bigbigwatermalon/C3SQL)]
- AAAI2025 - Correction Guideline for In-Context Text-to-SQL [[Paper](https://arxiv.org/pdf/2406.12692)] [[Code](https://github.com/microsoft/SynQo)]
- EMNLP2025 - World Large-Scale Multi-Database Text-to-SQL [[Paper](https://arxiv.org/pdf/2503.18596)] [[Code](https://github.com/Satissss/LinkAlign)]
- NeurIPS2023 - SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [[Paper](https://openreview.net/pdf?id=p53QDxSIc5)] [[Code](https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting)]
- ICLR2025 - to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration [[Paper](https://openreview.net/pdf?id=OuFIfDBwQd)] [[Code](https://github.com/Snowflake-Labs/ReFoRCE)]
- ACL2025 - Based Multi-Agent System for Text-to-SQL Tasks [[Paper](https://aclanthology.org/2025.trl-1.4.pdf)] [[Code](https://github.com/1ring2rta/R3)]
- ICDE2024 - then-Rank Framework for Natural Language to SQL Translation [[Paper](https://ieeexplore.ieee.org/abstract/document/10597742)] [[Code](https://github.com/Kaimary/MetaSQL)]
- EMNLP2024 - main.436.pdf)] [[Code](https://github.com/OSU-NLP-Group/Middleware)]
- ICML2025 - Guided Large Language Models for Text-to-SQL Generation [[Paper](https://openreview.net/pdf?id=gT8JSEFqaS)]
- Findings2024 - based Text-to-SQL through Workflow Paradigm [[Paper](https://aclanthology.org/2024.findings-acl.641.pdf)] [[Code](https://github.com/FlyingFeather/DEA-SQL)]
- ICONIP2023 - augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain [[Paper](https://link.springer.com/chapter/10.1007/978-981-99-8076-5_25)]
- TMLR2024 - PaLM: Improved Large Language Model Adaptation for Text-to-SQL [[Paper](https://openreview.net/pdf?id=rlloVZoKrX)]
- EMNLP2023 - main.574.pdf)] [[Code](https://github.com/RUCAIBox/StructGPT)]
- Findings2023 - to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [[Paper](https://aclanthology.org/2023.findings-emnlp.996.pdf)]
- PRICAI2023 - 3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [[Paper](https://link.springer.com/chapter/10.1007/978-981-99-7022-3_23)]
- ICML2023
- EMNLP2022 - main.231.pdf)] [[Code](https://github.com/facebookresearch/mbr-exec)]
-
Datasets
-
Original Datasets
- SIGMOD2025 - Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis [[Paper](https://arxiv.org/pdf/2401.10506)] [[Code](https://github.com/bigbigwatermalon/FinSQL)] [[Dataset](https://drive.google.com/file/d/1OtyFdH9cs-6bEVj8yKK4Zt53N52L_dBH/view?usp=sharing)]
- EMNLP2020 - Scale and Pragmatic Chinese Text-to-SQL Dataset [[Paper](https://aclanthology.org/2020.emnlp-main.562.pdf)] [[Dataset](https://www.luge.ai/#/luge/dataDetail?id=13)]
- EMNLP2018 - Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [[Paper](https://aclanthology.org/D18-1425.pdf)] [[Code](https://github.com/taoyds/spider)] [[Dataset](https://drive.google.com/file/d/1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J/view)]
- arXiv2017
-
Post-annotated Datasets
- Findings2020 - **Vietnamese** | A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese [[Paper](https://aclanthology.org/2020.findings-emnlp.364.pdf)] [[Code](https://github.com/VinAIResearch/ViText2SQL)]
- ACL2019 - Domain Semantic Parsing in Context [[Paper](https://aclanthology.org/P19-1443.pdf)] [[Code](https://github.com/taoyds/sparc)] [[Dataset](https://drive.usercontent.google.com/download?id=1Uu7NMHTR1tdQw1t7bAuM7OPU4LElVKfg&export=download&authuser=0)]<br>*Context-dependent; Annotate conversational contents*
- ICLR2023 - to-SQL Robustness [[Paper](https://openreview.net/pdf?id=Wc5bmZZU9cy)] [[Code](https://github.com/awslabs/diagnostic-robustness-text-to-sql)]
- ACL2022 - to-SQL Models Against Natural and Realistic Adversarial Table Perturbation [[Paper](https://aclanthology.org/2022.acl-long.142.pdf)] [[Code](https://github.com/microsoft/ContextualSP/tree/master/robustness_of_text_to_sql)] [[Dataset](https://github.com/microsoft/ContextualSP/blob/master/robustness_of_text_to_sql/adveta_1.0.zip)]
- Findings2022 - **Spider-SS&CG** | Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [[Paper](https://aclanthology.org/2022.findings-naacl.62.pdf)] [[Code](https://github.com/ygan/SpiderSS-SpiderCG)] [[Dataset](https://github.com/ygan/SpiderSS-SpiderCG/tree/main/Spider-SS)]
- EMNLP2021 - **Spider-DK** | Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization [[Paper](https://aclanthology.org/2021.emnlp-main.702.pdf)] [[Code](https://github.com/ygan/Spider-DK)]<br>*Knowledge-augmented; Adding domain knowledge*
- ACL2021 - **Spider-SYN** | Towards Robustness of Text-to-SQL Models against Synonym Substitution [[Paper](https://aclanthology.org/2021.acl-long.195.pdf)] [[Code](https://github.com/ygan/Spider-Syn)]<br>*Knowledge-augmented; Adding domain knowledge*
- NAACL2021 - **Spider-Realistic** | Structure-Grounded Pretraining for Text-to-SQL [[Paper](https://aclanthology.org/2021.naacl-main.105.pdf)] [[Dataset](https://zenodo.org/records/5205322)]<br>*Robustness; Removing column names in question*
-
-
Projects
-
Fine-tuning
- SQLGlot [[Code](https://github.com/tobymao/sqlglot)]
- DB-GPT [[Code](https://github.com/eosphoros-ai/DB-GPT)]
- DB-GPT-Hub [[Code](https://github.com/eosphoros-ai/DB-GPT-Hub)]
- Awesome-Text2SQL [[Code](https://github.com/eosphoros-ai/Awesome-Text2SQL)]
- PremSQL [[Code](https://github.com/premAI-io/premsql)]