Awesome-Text2SQL
  
  
    Curated tutorials and resources for Large Language Models, Text2SQL, Text2DSL, Text2API, Text2Vis and more.
    https://github.com/eosphoros-ai/Awesome-Text2SQL
  
Survey
            
Leaderboard
| WikiSQL | Spider<br/>Exact Match (EM) | [Spider](https://yale-lily.github.io/spider)<br/>Exact Execution (EX) | [BIRD](https://bird-bench.github.io/)<br/>Reward-based Valid Efficiency Score (R-VES) | [BIRD](https://bird-bench.github.io/)<br/>Execution Accuracy (EX) |
| :---: | :---: | :---: | :---: | :---: |
| SeaD+Execution-Guided Decoding | MiniSeek | **91.2** <br/>(2023/11-MiniSeek) | **80.40** <br/>(2024/05-ExSL + granite-20b-code) | **71.83** <br/>(2024/07-Distillery + GPT-4o) |
| BRIDGE | [N-best List Rerankers + PICARD](https://arxiv.org/pdf/2210.10668.pdf) | 80.8 <br/>(2023/07-Hindsight Chain of Thought with GPT-4 and Instructions) | 69.56 <br/>(2024/04-GRA-SQL) | 65.34 <br/>(2024/07-Insights AI) |
| HydraNet+Execution-Guided Decoding | [SHiP + PICARD](https://arxiv.org/pdf/2212.08785.pdf) | 85.6 <br/>(2023/10-DPG-SQL + GPT-4 + Self-Correction) | 73.24 <br/>(2024/07-ByteBrain) | 68.87 <br/>(2024/07-ByteBrain) |
| X-SQL+Execution-Guided Decoding | RESDSQL+T5-1.1-lm100k-xl | 83.9 <br/>(2023/07-Hindsight Chain of Thought with GPT-4) | 72.63 <br/>(2024/05-[CHESS](https://arxiv.org/pdf/2405.16755)) | 66.69 <br/>(2024/05-[CHESS](https://arxiv.org/pdf/2405.16755)) |
| SDSQL | T5-SR | 82.3 <br/>(2023/06-[C3 + ChatGPT + Zero-Shot](https://arxiv.org/pdf/2307.07306.pdf)) | 71.35 <br/>(2024/01-MCS-SQL + GPT-4) | 65.45 <br/>(2024/01-MCS-SQL + GPT-4) |
| SeqGenSQL+EG | [RESDSQL-3B + NatSQL](https://arxiv.org/pdf/2302.05965.pdf) | 78.5 <br/>(2022/11-SeaD + PQL) | 68.82 <br/>(2024/07-Insights AI) | 64.84 <br/>(2024/02-PB-SQL v1) |
| IE-SQL+Execution-Guided Decoding | CatSQL + GraPPa | 86.2 <br/>(2023/08-[DAIL-SQL + GPT-4](https://arxiv.org/pdf/2308.15363.pdf)) | 68.44 <br/>(2024/09-[CHASE-SQL + Gemini](https://arxiv.org/abs/2410.01943)) | 72.28 <br/>(2024/08-OpenSearch-SQL, v2 + GPT-4o) |
| Text2SQLGen + EG | [S²SQL + ELECTRA](https://arxiv.org/pdf/2203.06958.pdf) | 79.9 <br/>(2023/02-[RESDSQL-3B + NatSQL](https://arxiv.org/pdf/2302.05965.pdf)) | 65.62 <br/>(2024/07-PURPLE + RED + GPT-4o) | 68.87 <br/>(2024/07-ByteBrain) |
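
The leaderboard above reports both Exact Match (EM) and Execution Accuracy (EX). As a rough illustration of why the two can disagree, here is a minimal sketch using Python's built-in `sqlite3`. It is not the official Spider or BIRD evaluation code: the helper names (`normalize`, `exact_match`, `execution_match`, `build_demo_db`) and the toy `singer` table are assumptions made for this example, and the EM check here is a plain normalized string comparison rather than the component-wise comparison the benchmarks use.

```python
import sqlite3
from collections import Counter


def normalize(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def exact_match(pred: str, gold: str) -> bool:
    """Simplified EM (assumption): compare normalized query strings."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Simplified EX (assumption): compare result rows as multisets, ignoring row order."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to run counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ann", 30), ("Bob", 45), ("Cy", 52)],
    )
    return conn


gold = "SELECT name FROM singer WHERE age > 40"
pred = "SELECT name FROM singer WHERE NOT age <= 40"
conn = build_demo_db()
print("EM:", exact_match(pred, gold))            # the query text differs
print("EX:", execution_match(conn, pred, gold))  # the returned rows agree
```

In this toy case the predicted query is written differently from the gold query but returns the same rows, so it fails the string-level check while passing the execution-level one.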
 
Classic Model
 
Base Model
- [[model](https://huggingface.co/THUDM/chatglm-6b)]
- General Language Model
- [[model](https://huggingface.co/tatsu-lab/alpaca-7b-wdiff/tree/main)]
- [[model](https://huggingface.co/lmsys)]
- [[model](https://huggingface.co/THUDM/chatglm2-6b)]
- phi-1 … phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters. In 2023/12, they proposed [Phi-2](https://huggingface.co/microsoft/phi-2), a 2.7-billion-parameter language model with outstanding reasoning and language understanding capabilities, showing state-of-the-art performance among base language models with fewer than 13 billion parameters.
- [[model](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)]
- [[model](https://huggingface.co/meta-llama)]
 
Fine-tuning

Dataset
- [[dataset](https://yale-lily.github.io/cosql)]
- [[dataset](https://github.com/Chia-Hsuan-Lee/KaggleDBQA/tree/main?tab=readme-ov-file#Data-Format)]
- [[dataset](https://github.com/xjtu-intsoft/chase/tree/page/data)]
 
Libraries

Practice Project
- DB-GPT-Hub
- sqlcoder
- modal_finetune_sql
 
Friendship Links
- eosphoros
- Awesome-AIGC-Tutorials
 
 