https://github.com/ruc-datalab/SC-prompt

Last synced: 7 months ago
JSON representation

Host: GitHub
URL: https://github.com/ruc-datalab/SC-prompt
Owner: ruc-datalab
Created: 2023-04-13T06:54:47.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-05-13T05:19:58.000Z (about 2 years ago)
Last Synced: 2024-08-18T11:13:38.644Z (11 months ago)
Language: Python
Size: 68.4 KB
Stars: 10
Watchers: 2
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

Awesome-Text2SQL - [code

README

        # SC-prompt

## Introduction

This repository contains the code for the paper "Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning". In this paper, we propose SC-Prompt, a novel divide-and-conquer strategy for effectively supporting Text-to-SQL translation in the few-shot scenario. 

## Setup

```sh

git clone [email protected]:ruc-datalab/SC-prompt.git

cd SC-prompt

mkdir -p -m 777 experimental_outputs

mkdir -p -m 777 transformers_cache

cd experimental_outputs

mkdir -p -m 777 spider

mkdir -p -m 777 cosql

mkdir -p -m 777 geoquery

cd ..

```

## Dataset Download

- [Spider](https://drive.google.com/uc?export=download&id=1_AckYkinAnhqmRQtGsQgUKAnTHxxX5J0): Put it under `src/datasets/spider`.

- [Cosql](https://drive.google.com/uc?export=download&id=14x6lsWqlu6gR-aYxa6cemslDN3qT3zxP): Put it under `src/datasets/cosql`.

- [Geoquery](https://drive.google.com/file/d/1hP4gpExG1EJCN3a1vOyK4XR4mTSFi7Q1/view?usp=share_link): Put it under `src/datasets/geoquery`.

## Code Structure

```sh

|-- experimental_outputs # save the fine-tuned models and evaluation results

|-- scripts # the train/inference script

|-- src

    |-- datasets # the class to preprocess the dataset 

    |-- metrics # the class to evaluate the prediction results

    |-- utils # main code

    |-- run.py # the class to train/inference the few-shot text-to-sql model

```

## Environment

Our constrained decoding method is based on the parser provided by [Picard](https://arxiv.org/abs/2109.05093). Please use the Docker image provided by the official [repository](https://github.com/ServiceNow/picard) to build the container.

```sh

docker run -itd --gpus '"device="' --rm --user 13011:13011 --mount type=bind,source=/transformers_cache,target=/transformers_cache --mount type=bind,source=/scripts,target=/app/scripts --mount type=bind,source=/experimental_outputs,target=/app/experimental_outputs --mount type=bind,source=/src,target=/app/src tscholak/text-to-sql-eval:6a252386bed6d4233f0f13f4562d8ae8608e7445

```

You should set `` and ``.

## Quick Inference

Download the fine-tuned model and put it under the corresponding folder.

| Dataset | #Train | Model | Folder |

|-------|--------|--------|---------|

| Spider | 0.05 (350) | [link](https://drive.google.com/drive/folders/1b-16LFsnVMC5U2JxRew9nKtdOIhVr46j?usp=share_link) | experimental_outputs/spider/ |

| Spider | 0.1 (700) | [link](https://drive.google.com/drive/folders/16qcI-zcahpB-Y6BUyizLmt3-EMP8_sM7?usp=share_link) | experimental_outputs/spider/ |

| CoSQL | 0.05 (475) | [link](https://drive.google.com/drive/folders/1DxNdW5oBMQgYm7GE_VfvT9lFrJLcCpLs?usp=share_link) | experimental_outputs/cosql/ |

| CoSQL | 0.1 (950) | [link](https://drive.google.com/drive/folders/1MhbsPsyhD0RTVYFJ7jiqy8zxxUo2_4kp?usp=share_link) | experimental_outputs/cosql/ |

| Geoquery | 1. (536) | [link](https://drive.google.com/drive/folders/1Z-akKlTFhiNGdT23kmpU8VFQ3L5XvOgD?usp=share_link) | experimental_outputs/geoquery/ |

Use the scripts to inference.

```sh

# Inference on spider

CUDA_VISIBLE_DEVICES=0 bash scripts/eval_spider_scprompt.sh 0.1

# Inference on cosql

CUDA_VISIBLE_DEVICES=0 bash scripts/eval_cosql_scprompt.sh 0.1

# Inference on geoquery

CUDA_VISIBLE_DEVICES=0 bash scripts/eval_geoquery_scprompt.sh 1.

```

- The second argument refers to the proportion of using the official training set.

## Train from scrach

```sh

# Train on spider

CUDA_VISIBLE_DEVICES=0 bash scripts/train_spider_scprompt.sh 0.1

# Train on cosql

CUDA_VISIBLE_DEVICES=0 bash scripts/train_cosql_scprompt.sh 0.1

# Train on geoquery

CUDA_VISIBLE_DEVICES=0 bash scripts/train_geoquery_scprompt.sh 1.

```

- The second argument refers to the proportion of using the official training set.

The best model will be automatically saved at `experimental_outputs/`. Please note that training does not use the fine-grained constrained decoding strategy, which is only necessary for evaluation. Please refer to `Quick Inference`to evaluate the fine-tuned model.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ruc-datalab/SC-prompt

Awesome Lists containing this project

README