Awesome-LLM-based-Text2SQL

[TKDE2025] Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | A curated list of resources (surveys, papers, benchmarks, and open-source projects) on large language model-based text-to-SQL.
https://github.com/DEEP-PolyU/Awesome-LLM-based-Text2SQL

  • 📰 Surveys

    • TKDE - Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [[Paper](https://ieeexplore.ieee.org/abstract/document/11160657)] [[Code]()]
    • CSUR2025 - to-SQL Tasks [[Paper]()]
    • TKDE - to-SQL in the Era of LLMs: Where are We, and Where are We Going? [[Paper]()]
    • TKDE
    • VLDBJ2023 - to-SQL [[Paper]()]
    • VLDB2023
    • arXiv2022 - to-SQL Parsing: Concepts, Methods, and Future Directions [[Paper]()]
    • COLING2022 - to-SQL: A Survey of What We Have and What We Expect [[Paper]()]
  • 🏆 Benchmarks

    • BIRD - A Big Bench for Large-Scale Database Grounded Text-to-SQL
    • NeurIPS2023 - SQL + GPT-4 | 74.2 | 85.3 | [[Paper](https://openreview.net/pdf?id=p53QDxSIc5)] [[Code](https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting)] | 2023-04-21 |
    • Spider2.0 - Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
    • BIRD-CRITIC - Can LLMs Fix User Issues in Real-World Database Applications?
    • BIRD-INTERACT - Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
    • Proprietary - | **91.2** | [[Report](https://www.seek.ai/blog/miniseek-first-model-to-surpass-90-accuracy-on-spider-test-benchmark)] | 2023-11-02 |
  • 🪴 Taxonomy

    • Fine-tuning

      • arXiv2025 - based Schema Linking for LLM-based Text-to-SQL Generation [[Paper](https://arxiv.org/pdf/2502.12911)]
      • ICLR2025 - to-SQL [[Paper](https://openreview.net/pdf?id=BAglD6NGy0)] [[Code](https://github.com/D2I-ai/Route)]
      • NeurIPS202 - to-SQL in the Age of Well-Reasoned Language Models [[Paper](https://openreview.net/pdf?id=fglyh5pa7d)]
      • COLING2025 - SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [[Paper](https://aclanthology.org/2025.coling-main.36.pdf)] [[Code](https://github.com/wbbeyourself/MAC-SQL)]
      • NAACL2025 - SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation [[Paper](https://aclanthology.org/2025.naacl-long.107.pdf)] [[Code](https://github.com/layer6ai-labs/msc-sql)]
      • ACL2025 - based Hierarchical Action CorREction Assistant for Text-to-SQL [[Paper](https://aclanthology.org/2025.acl-long.552.pdf)] [[Code](https://github.com/quge2023/SHARE)]
      • Findings2024 - SQL: Decomposed Text-to-SQL with Small Large Language Models [[Paper](https://aclanthology.org/2024.findings-emnlp.481.pdf)] [[Code](https://github.com/MohammadrezaPourreza/DTS-SQL)]
      • SIGMOD2024 - source Language Models for Text-to-SQL [[Paper](https://dl.acm.org/doi/10.1145/3654930)] [[Code](https://github.com/RUCKBReasoning/codes)]
      • arXiv2024 - SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL [[Paper](https://arxiv.org/pdf/2404.12560)] [[Code](https://github.com/mercatorhq/dubo-sql)]
      • COLM2024 - AI-Lab/StructLM)]
      • ACL2024 - LLM: Towards Foundational Symbol-centric Interface For Large Language Models [[Paper](https://aclanthology.org/2024.acl-long.707.pdf)] [[Code](https://github.com/xufangzhi/Symbol-LLM)]
      • VLDB2024 - to-SQL Empowered by Large Language Models: A Benchmark Evaluation [[Paper](https://www.vldb.org/pvldb/vol17/p1132-gao.pdf)] [[Code](https://github.com/BeachWang/DAIL-SQL)]
      • ICML2024 - ai-lab/Consistency_LLM)]
    • In-context Learning

      • arXiv2023 - shot Text-to-SQL with ChatGPT [[Paper](https://arxiv.org/pdf/2307.07306)] [[Code](https://github.com/bigbigwatermalon/C3SQL)]
      • AAAI2025 - Correction Guideline for In-Context Text-to-SQL [[Paper](https://arxiv.org/pdf/2406.12692)] [[Code](https://github.com/microsoft/SynQo)]
      • EMNLP2025 - World Large-Scale Multi-Database Text-to-SQL [[Paper](https://arxiv.org/pdf/2503.18596)] [[Code](https://github.com/Satissss/LinkAlign)]
      • NeurIPS2023 - SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [[Paper](https://openreview.net/pdf?id=p53QDxSIc5)] [[Code](https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting)]
      • ICLR2025 - to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration [[Paper](https://openreview.net/pdf?id=OuFIfDBwQd)] [[Code](https://github.com/Snowflake-Labs/ReFoRCE)]
      • ACL2025 - Based Multi-Agent System for Text-to-SQL Tasks [[Paper](https://aclanthology.org/2025.trl-1.4.pdf)] [[Code](https://github.com/1ring2rta/R3)]
      • ICDE2024 - then-Rank Framework for Natural Language to SQL Translation [[Paper](https://ieeexplore.ieee.org/abstract/document/10597742)] [[Code](https://github.com/Kaimary/MetaSQL)]
      • EMNLP2024 - main.436.pdf)] [[Code](https://github.com/OSU-NLP-Group/Middleware)]
      • ICML2025 - Guided Large Language Models for Text-to-SQL Generation [[Paper](https://openreview.net/pdf?id=gT8JSEFqaS)]
      • Findings2024 - based Text-to-SQL through Workflow Paradigm [[Paper](https://aclanthology.org/2024.findings-acl.641.pdf)] [[Code](https://github.com/FlyingFeather/DEA-SQL)]
      • ICONIP2023 - augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain [[Paper](https://link.springer.com/chapter/10.1007/978-981-99-8076-5_25)]
      • TMLR2024 - PaLM: Improved Large Language Model Adaptation for Text-to-SQL [[Paper](https://openreview.net/pdf?id=rlloVZoKrX)]
      • EMNLP2023 - main.574.pdf)] [[Code](https://github.com/RUCAIBox/StructGPT)]
      • Findings2023 - to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [[Paper](https://aclanthology.org/2023.findings-emnlp.996.pdf)]
      • PRICAI2023 - 3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [[Paper](https://link.springer.com/chapter/10.1007/978-981-99-7022-3_23)]
      • ICML2023
      • EMNLP2022 - main.231.pdf)] [[Code](https://github.com/facebookresearch/mbr-exec)]
  • 🗃ïļ Datasets

    • Original Datasets

      • SIGMOD2025 - Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis [[Paper](https://arxiv.org/pdf/2401.10506)] [[Code](https://github.com/bigbigwatermalon/FinSQL)] [[Dataset](https://drive.google.com/file/d/1OtyFdH9cs-6bEVj8yKK4Zt53N52L_dBH/view?usp=sharing)]
      • EMNLP2020 - Scale and Pragmatic Chinese Text-to-SQL Dataset [[Paper](https://aclanthology.org/2020.emnlp-main.562.pdf)] [[Dataset](https://www.luge.ai/#/luge/dataDetail?id=13)]
      • EMNLP2018 - Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [[Paper](https://aclanthology.org/D18-1425.pdf)] [[Code](https://github.com/taoyds/spider)] [[Dataset](https://drive.google.com/file/d/1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J/view)]
      • arXiv2017
    • Post-annotated Datasets

      • Findings2020 - Vietnamese | A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese [[Paper](https://aclanthology.org/2020.findings-emnlp.364.pdf)] [[Code](https://github.com/VinAIResearch/ViText2SQL)]
      • ACL2019 - Domain Semantic Parsing in Context [[Paper](https://aclanthology.org/P19-1443.pdf)] [[Code](https://github.com/taoyds/sparc)] [[Dataset](https://drive.usercontent.google.com/download?id=1Uu7NMHTR1tdQw1t7bAuM7OPU4LElVKfg&export=download&authuser=0)] *Context-dependent; Annotate conversational contents*
      • ICLR2023 - to-SQL Robustness [[Paper](https://openreview.net/pdf?id=Wc5bmZZU9cy)] [[Code](https://github.com/awslabs/diagnostic-robustness-text-to-sql)]
      • ACL2022 - to-SQL Models Against Natural and Realistic Adversarial Table Perturbation [[Paper](https://aclanthology.org/2022.acl-long.142.pdf)] [[Code](https://github.com/microsoft/ContextualSP/tree/master/robustness_of_text_to_sql)] [[Dataset](https://github.com/microsoft/ContextualSP/blob/master/robustness_of_text_to_sql/adveta_1.0.zip)]
      • Findings2022 - SS&CG | Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [[Paper](https://aclanthology.org/2022.findings-naacl.62.pdf)] [[Code](https://github.com/ygan/SpiderSS-SpiderCG)] [[Dataset](https://github.com/ygan/SpiderSS-SpiderCG/tree/main/Spider-SS)]
      • EMNLP2021 - DK | Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization [[Paper](https://aclanthology.org/2021.emnlp-main.702.pdf)] [[Code](https://github.com/ygan/Spider-DK)] *Knowledge-augmented; Adding domain knowledge*
      • ACL2021 - SYN | Towards Robustness of Text-to-SQL Models against Synonym Substitution [[Paper](https://aclanthology.org/2021.acl-long.195.pdf)] [[Code](https://github.com/ygan/Spider-Syn)] *Knowledge-augmented; Adding domain knowledge*
      • NAACL2021 - Realistic | Structure-Grounded Pretraining for Text-to-SQL [[Paper](https://aclanthology.org/2021.naacl-main.105.pdf)] [[Dataset](https://zenodo.org/records/5205322)] *Robustness; Removing column names in question*
  • 📦 Projects

    • Fine-tuning

      • SQLGlot - io/tobymao/sqlglot)
      • DB-GPT - [[Stars](https://github.com/eosphoros-ai/DB-GPT/stargazers)]
      • DB-GPT-Hub - [[Stars](https://github.com/eosphoros-ai/DB-GPT-Hub/stargazers)]
      • Awesome-Text2SQL - ai/Awesome-Text2SQL?style=social)](https://github.com/premAI-io/premsql/stargazers)
      • PremSQL - [[Stars](https://github.com/premAI-io/premsql/stargazers)]