https://github.com/wbbeyourself/MAC-SQL

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL
https://github.com/wbbeyourself/MAC-SQL

Last synced: 4 months ago
JSON representation

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL

Host: GitHub
URL: https://github.com/wbbeyourself/MAC-SQL
Owner: wbbeyourself
Created: 2023-12-18T11:24:08.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-06-27T07:35:21.000Z (over 1 year ago)
Last Synced: 2024-06-27T08:51:31.919Z (over 1 year ago)
Language: Python
Homepage: https://arxiv.org/abs/2312.11242
Size: 348 KB
Stars: 139
Watchers: 5
Forks: 11
Open Issues: 2
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

Awesome-Text2SQL - [code

README

## 📖Introduction

This is the official repository for the paper ["MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL"](https://arxiv.org/abs/2312.11242).

In this paper, we propose a multi-agent collaborative Text-to-SQL framework MAC-SQL, which comprises three agents: the **Selector**, the **Decomposer**, and the **Refiner**.

# 🔥 Updates
- [**2024.11**] New Our work has been accepted by COLING 2025 [conference paper version](https://aclanthology.org/2025.coling-main.36/). Welcome to cite this paper version.
- [**2024.04**] We have updated the `sql-llama-instruct-v0.5.jsonl` and training scripts in `training_scripts` dir of this project. Please check it out.Download the `sql-llama-data.zip` from [Baidu Dsik](https://pan.baidu.com/s/1yaEBsSN894O7MlBrckciKw?pwd=htwt) or [Google Drive](https://drive.google.com/file/d/1_3s88Op1PCZo50RsHcx5m2Bj_n05PPn4/view?usp=sharing).
Unzip `sql-llama-data.zip` and get the data dir, which contains sql-llama-instruct-v0.5.jsonl (3375 instances).
- [**2024.04**] We have updated the [SQL-Llama-v0.5](https://huggingface.co/IceKingBing) model and data.zip (update dev_gold_schema.json in bird and spider) The download links of the updated data are available on [Baidu Disk](https://pan.baidu.com/s/1jU2li3d-enhzswx8VdNYdg?pwd=hfmk) and [Google Drive](https://drive.google.com/file/d/1kkkNJSmJkZKeZyDFUDG7c4mnkxsrr-om/view?usp=sharing).
- [**2024.02**] We have updated the paper, with updates mainly focusing on experiments and framework details, check it out! [link](https://arxiv.org/abs/2312.11242).
- [**2023.12**] We have updated the paper, with updates mainly focusing on the title, abstract, introduction, some details, and appendix. In addition, we give some bad case examples on `bad_cases` folder, check it out!
- [**2023.12**] We released our first version [paper](https://arxiv.org/abs/2312.11242), [code](https://github.com/wbbeyourself/MAC-SQL). Check it out!

## ⚡Environment

1. Config your local environment.

```bash
conda create -n macsql python=3.9 -y
conda activate macsql
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"
```

Note: we use `openai==0.28.1`, which use `openai.ChatCompletion.create` to call api.

2. Edit openai config at **core/api_config.py**, and set related environment variables of Azure OpenAI API.

Currently, we use `gpt-4-1106-preview` (128k version) by default, which is 2.5 times less expensive than the `gpt-4 (8k)` on average.

```bash
export OPENAI_API_BASE="YOUR_OPENAI_API_BASE"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
```

## 🔧 Data Preparation

In order to prepare the data more quickly, I have packaged the files including the databases of the BIRD dataset and the Spider dataset into `data.zip` and uploaded them.
All files were downloaded on December 19, 2023, ensuring they are the latest version at that moment.
The download links are available on [Baidu Disk](https://pan.baidu.com/s/1jU2li3d-enhzswx8VdNYdg?pwd=hfmk) and [Google Drive](https://drive.google.com/file/d/1kkkNJSmJkZKeZyDFUDG7c4mnkxsrr-om/view?usp=sharing)(update on 2024-04-22).

After downloading the `data.zip` file, you should delete the existing data folder in the project directory and replace it with the unzipped data folder from `data.zip`.

## 🚀 Run

The run script will first run 5 examples in Spider to check environment.
You should open code comments for different usage.

- `run.sh` for Linux/Mac OS
- `run.bat` for Windows OS

For SQL execution demo, you can use `app_bird.py` or `app_spider.py` to get the execution result of your SQL query.

```bash
cd ./scripts
python app_bird.py
python app_spider.py
```

If occur error `/bin/bash^M: bad interpreter` in Linux, use `sed -i -e 's/\r$//' run.sh` to solve it.

## 📝Evaluation Dataset

We evaluate our method on both BIRD dataset and Spider dataset.

EX: Execution Accuracy(%)

VES: Valid Efficiency Score(%)

Refer to our paper for the details.

## 🫡Run SQL-Llama

Download the [SQL-Llama](https://huggingface.co/IceKingBing)(current v0.5 version) and follow the [SQL-Llama-deployment.md](SQL-Llama-deployment.md) to deploy.

Uncomment the `MODEL_NAME = 'CodeLlama-7b-hf'` in `core/api_config.py` to set the global model and comment other `MODEL_NAME = xxx` lines.

Uncomment the `export OPENAI_API_BASE='http://0.0.0.0:8000/v1'` in `run.sh` to set the local model api base.

Then, run `run.sh` to start your local inference.

## 🌟 Project Structure

```txt
├─data # store datasets and databases
| ├─spider
| ├─bird
├─core
| ├─agents.py # define three agents class
| ├─api_config.py # OpenAI API ENV config
| ├─chat_manager.py # manage the communication between agents
| ├─const.py # prompt templates and CONST values
| ├─llm.py # api call function and log print
| ├─utils.py # utils function
├─scripts # sqlite execution flask demo
| ├─app_bird.py
| ├─app_spider.py
| ├─templates
├─evaluation # evaluation scripts
| ├─evaluation_bird_ex.py
| ├─evaluation_bird_ves.py
| ├─evaluation_spider.py
├─bad_cases
| ├─badcase_BIRD(dev)_examples.xlsx
| └badcase_Spider(dev)_examples.xlsx
├─evaluation_bird_ex_ves.sh # bird evaluation script
├─README.md
├─requirements.txt
├─run.py # main run script
├─run.sh # generation and evaluation script
```

## 💬Citation

If you find our work is helpful, please cite as:

```text
@inproceedings{macsql-2025,
title={MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL},
author={Wang, Bing and Ren, Changyu and Yang, Jian and Liang, Xinnian and Bai, Jiaqi and Chai, Linzheng and Yan, Zhao and Zhang, Qian-Wen and Yin, Di and Sun, Xing and others},
booktitle={Proceedings of the 31st International Conference on Computational Linguistics},
pages={540--557},
year={2025}
}
```

## 👍Contributing

We welcome contributions and suggestions!

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/wbbeyourself/MAC-SQL

Awesome Lists containing this project

README