Awesome-LLM-based-Text2SQL
  
  
    [TKDE2025] Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | A curated list of resources (surveys, papers, benchmarks, and open-source projects) on large language model-based text-to-SQL.
    https://github.com/DEEP-PolyU/Awesome-LLM-based-Text2SQL
  
Surveys
- TKDE - Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [[Paper](https://ieeexplore.ieee.org/abstract/document/11160657)] [[Code]()]
- CSUR2025 - to-SQL Tasks [[Paper]()]
- TKDE - to-SQL in the Era of LLMs: Where are We, and Where are We Going? [[Paper]()]
- TKDE
- VLDBJ2023 - to-SQL [[Paper]()]
- VLDB2023
- arXiv2022 - to-SQL Parsing: Concepts, Methods, and Future Directions [[Paper]()]
- COLING2022 - to-SQL: A Survey of What We Have and What We Expect [[Paper]()]
 
Benchmarks
- BIRD - A Big Bench for Large-Scale Database Grounded Text-to-SQL
- NeurIPS2023 - SQL + GPT-4 | 74.2 | 85.3 | [[Paper](https://openreview.net/pdf?id=p53QDxSIc5)] [[Code](https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting)] | 2023-04-21 |
- Spider2.0 - Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
- BIRD-CRITIC - Can LLMs Fix User Issues in Real-World Database Applications?
- BIRD-INTERACT - Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
- Proprietary - | **91.2** | [[Report](https://www.seek.ai/blog/miniseek-first-model-to-surpass-90-accuracy-on-spider-test-benchmark)] | 2023-11-02 |
 
Datasets
Original Datasets
- SIGMOD2025 - Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis [[Paper](https://arxiv.org/pdf/2401.10506)] [[Code](https://github.com/bigbigwatermalon/FinSQL)] [[Dataset](https://drive.google.com/file/d/1OtyFdH9cs-6bEVj8yKK4Zt53N52L_dBH/view?usp=sharing)]
- EMNLP2020 - Scale and Pragmatic Chinese Text-to-SQL Dataset [[Paper](https://aclanthology.org/2020.emnlp-main.562.pdf)] [[Dataset](https://www.luge.ai/#/luge/dataDetail?id=13)]
- EMNLP2018 - Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [[Paper](https://aclanthology.org/D18-1425.pdf)] [[Code](https://github.com/taoyds/spider)] [[Dataset](https://drive.google.com/file/d/1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J/view)]
- arXiv2017
 
Post-annotated Datasets
- Findings2020 - Vietnamese | A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese [[Paper](https://aclanthology.org/2020.findings-emnlp.364.pdf)] [[Code](https://github.com/VinAIResearch/ViText2SQL)]
- EMNLP2019 - Domain Semantic Parsing in Context [[Paper](https://aclanthology.org/P19-1443.pdf)] [[Code](https://github.com/taoyds/sparc)] [[Dataset](https://drive.usercontent.google.com/download?id=1Uu7NMHTR1tdQw1t7bAuM7OPU4LElVKfg&export=download&authuser=0)] *Context-dependent; Annotate conversational contents*
- ICLR2023 - to-SQL Robustness [[Paper](https://openreview.net/pdf?id=Wc5bmZZU9cy)] [[Code](https://github.com/awslabs/diagnostic-robustness-text-to-sql)]
- ACL2022 - to-SQL Models Against Natural and Realistic Adversarial Table Perturbation [[Paper](https://aclanthology.org/2022.acl-long.142.pdf)] [[Code](https://github.com/microsoft/ContextualSP/tree/master/robustness_of_text_to_sql)] [[Dataset](https://github.com/microsoft/ContextualSP/blob/master/robustness_of_text_to_sql/adveta_1.0.zip)]
- Findings2022 - Spider-SS&CG | Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [[Paper](https://aclanthology.org/2022.findings-naacl.62.pdf)] [[Code](https://github.com/ygan/SpiderSS-SpiderCG)] [[Dataset](https://github.com/ygan/SpiderSS-SpiderCG/tree/main/Spider-SS)]
- EMNLP2021 - Spider-DK | Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization [[Paper](https://aclanthology.org/2021.emnlp-main.702.pdf)] [[Code](https://github.com/ygan/Spider-DK)] *Knowledge-augmented; Adding domain knowledge*
- NAACL2021 - Spider-Realistic | Structure-Grounded Pretraining for Text-to-SQL [[Paper](https://aclanthology.org/2021.naacl-main.105.pdf)] [[Dataset](https://zenodo.org/records/5205322)] *Robustness; Removing column names in question*
- ACL2021 - Spider-SYN | Towards Robustness of Text-to-SQL Models against Synonym Substitution [[Paper](https://aclanthology.org/2021.acl-long.195.pdf)] [[Code](https://github.com/ygan/Spider-Syn)] *Robustness; Synonym substitution*
 
 
Taxonomy
In-context Learning
- EMNLP2025 - World Large-Scale Multi-Database Text-to-SQL [[Paper](https://arxiv.org/pdf/2503.18596)] [[Code](https://github.com/Satissss/LinkAlign)]
- NeurIPS2023 - SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [[Paper](https://openreview.net/pdf?id=p53QDxSIc5)] [[Code](https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting)]
- ICLR2025 - to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration [[Paper](https://openreview.net/pdf?id=OuFIfDBwQd)] [[Code](https://github.com/Snowflake-Labs/ReFoRCE)]
- ACL2025 - Based Multi-Agent System for Text-to-SQL Tasks [[Paper](https://aclanthology.org/2025.trl-1.4.pdf)] [[Code](https://github.com/1ring2rta/R3)]
- EMNLP2024 - main.436.pdf)] [[Code](https://github.com/OSU-NLP-Group/Middleware)]
- ICML2025 - Guided Large Language Models for Text-to-SQL Generation [[Paper](https://openreview.net/pdf?id=gT8JSEFqaS)]
- ICONIP2023 - augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain [[Paper](https://link.springer.com/chapter/10.1007/978-981-99-8076-5_25)]
- TMLR2024 - PaLM: Improved Large Language Model Adaptation for Text-to-SQL [[Paper](https://openreview.net/pdf?id=rlloVZoKrX)]
- EMNLP2023 - main.574.pdf)] [[Code](https://github.com/RUCAIBox/StructGPT)]
- PRICAI2023 - 3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [[Paper](https://link.springer.com/chapter/10.1007/978-981-99-7022-3_23)]
- ICML2023
- EMNLP2022 - main.231.pdf)] [[Code](https://github.com/facebookresearch/mbr-exec)]
- arXiv2023 - shot Text-to-SQL with ChatGPT [[Paper](https://arxiv.org/pdf/2307.07306)] [[Code](https://github.com/bigbigwatermalon/C3SQL)]
- AAAI2025 - Correction Guideline for In-Context Text-to-SQL [[Paper](https://arxiv.org/pdf/2406.12692)] [[Code](https://github.com/microsoft/SynQo)]
- Findings2024 - based Text-to-SQL through Workflow Paradigm [[Paper](https://aclanthology.org/2024.findings-acl.641.pdf)] [[Code](https://github.com/FlyingFeather/DEA-SQL)]
- Findings2023 - to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [[Paper](https://aclanthology.org/2023.findings-emnlp.996.pdf)]
- arXiv2025 - SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL [[Paper](https://arxiv.org/pdf/2502.11438)]
 
Fine-tuning
- NAACL2025 - SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation [[Paper](https://aclanthology.org/2025.naacl-long.107.pdf)] [[Code](https://github.com/layer6ai-labs/msc-sql)]
- arXiv2025 - based Schema Linking for LLM-based Text-to-SQL Generation [[Paper](https://arxiv.org/pdf/2502.12911)]
- ACL2025 - based Hierarchical Action CorREction Assistant for Text-to-SQL [[Paper](https://aclanthology.org/2025.acl-long.552.pdf)] [[Code](https://github.com/quge2023/SHARE)]
- COLM2024 - AI-Lab/StructLM)]
- SIGMOD2024 - source Language Models for Text-to-SQL [[Paper](https://dl.acm.org/doi/10.1145/3654930)] [[Code](https://github.com/RUCKBReasoning/codes)]
- ACL2024 - LLM: Towards Foundational Symbol-centric Interface For Large Language Models [[Paper](https://aclanthology.org/2024.acl-long.707.pdf)] [[Code](https://github.com/xufangzhi/Symbol-LLM)]
- VLDB2024 - to-SQL Empowered by Large Language Models: A Benchmark Evaluation [[Paper](https://www.vldb.org/pvldb/vol17/p1132-gao.pdf)] [[Code](https://github.com/BeachWang/DAIL-SQL)]
- ICML2024 - ai-lab/Consistency_LLM)]
- arXiv2024 - SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL [[Paper](https://arxiv.org/pdf/2404.12560)] [[Code](https://github.com/mercatorhq/dubo-sql)]
- ICLR2025 - to-SQL [[Paper](https://openreview.net/pdf?id=BAglD6NGy0)] [[Code](https://github.com/D2I-ai/Route)]
- COLING2025 - SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [[Paper](https://aclanthology.org/2025.coling-main.36.pdf)] [[Code](https://github.com/wbbeyourself/MAC-SQL)]
- Findings2024 - SQL: Decomposed Text-to-SQL with Small Large Language Models [[Paper](https://aclanthology.org/2024.findings-emnlp.481.pdf)] [[Code](https://github.com/MohammadrezaPourreza/DTS-SQL)]
- NeurIPS202 - to-SQL in the Age of Well-Reasoned Language Models [[Paper](https://openreview.net/pdf?id=fglyh5pa7d)]
 
 
Projects
Fine-tuning
- SQLGlot - io/tobymao/sqlglot)
- DB-GPT - [[Stars](https://github.com/eosphoros-ai/DB-GPT/stargazers)]
- DB-GPT-Hub - [[Stars](https://github.com/eosphoros-ai/DB-GPT-Hub/stargazers)]
- Awesome-Text2SQL
- PremSQL - [[Stars](https://github.com/premAI-io/premsql/stargazers)]
 
 