{"id":14181757,"url":"https://github.com/ruc-datalab/SC-prompt","last_synced_at":"2025-08-07T14:31:08.297Z","repository":{"id":164690220,"uuid":"627289319","full_name":"ruc-datalab/SC-prompt","owner":"ruc-datalab","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-13T05:19:58.000Z","size":70,"stargazers_count":10,"open_issues_count":2,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-08-18T11:13:38.644Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ruc-datalab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-13T06:54:47.000Z","updated_at":"2024-07-10T03:32:28.000Z","dependencies_parsed_at":"2024-05-11T18:15:11.794Z","dependency_job_id":null,"html_url":"https://github.com/ruc-datalab/SC-prompt","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ruc-datalab%2FSC-prompt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ruc-datalab%2FSC-prompt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ruc-datalab%2FSC-prompt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ruc-datalab%2FSC-prompt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ruc-datalab","download_url":"https://codeload.github.com/ruc-datalab/SC-prompt/tar.gz/refs/heads/main","host":{"name":"GitHu
b","url":"https://github.com","kind":"github","repositories_count":229045127,"owners_count":18011445,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-18T11:04:12.417Z","updated_at":"2024-12-10T10:31:12.233Z","avatar_url":"https://github.com/ruc-datalab.png","language":"Python","funding_links":[],"categories":["💬 Classic Model"],"sub_categories":[],"readme":"# SC-prompt\n## Introduction\nThis repository contains the code for the paper \"Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning\". In this paper, we propose SC-Prompt, a novel divide-and-conquer strategy for effectively supporting Text-to-SQL translation in the few-shot scenario. 
\n\n## Setup\n```sh\ngit clone git@github.com:ruc-datalab/SC-prompt.git\ncd SC-prompt\nmkdir -p -m 777 experimental_outputs\nmkdir -p -m 777 transformers_cache\ncd experimental_outputs\nmkdir -p -m 777 spider\nmkdir -p -m 777 cosql\nmkdir -p -m 777 geoquery\ncd ..\n```\n\n## Dataset Download\n\n- [Spider](https://drive.google.com/uc?export=download\u0026id=1_AckYkinAnhqmRQtGsQgUKAnTHxxX5J0): Put it under `src/datasets/spider`.\n- [CoSQL](https://drive.google.com/uc?export=download\u0026id=14x6lsWqlu6gR-aYxa6cemslDN3qT3zxP): Put it under `src/datasets/cosql`.\n- [Geoquery](https://drive.google.com/file/d/1hP4gpExG1EJCN3a1vOyK4XR4mTSFi7Q1/view?usp=share_link): Put it under `src/datasets/geoquery`.\n\n## Code Structure\n\n```sh\n|-- experimental_outputs # stores the fine-tuned models and evaluation results\n|-- scripts # the training/inference scripts\n|-- src\n    |-- datasets # dataset preprocessing\n    |-- metrics # evaluation of the prediction results\n    |-- utils # main code\n    |-- run.py # trains or runs inference with the few-shot text-to-SQL model\n```\n\n## Environment\nOur constrained decoding method is based on the parser provided by [Picard](https://arxiv.org/abs/2109.05093). 
Please use the Docker image provided by the official [repository](https://github.com/ServiceNow/picard) to build the container.\n\n```sh\ndocker run -itd --gpus '\"device=\u003cyour_available_gpu_ids\u003e\"' --rm --user 13011:13011 --mount type=bind,source=\u003cyour_base_dir\u003e/transformers_cache,target=/transformers_cache --mount type=bind,source=\u003cyour_base_dir\u003e/scripts,target=/app/scripts --mount type=bind,source=\u003cyour_base_dir\u003e/experimental_outputs,target=/app/experimental_outputs --mount type=bind,source=\u003cyour_base_dir\u003e/src,target=/app/src tscholak/text-to-sql-eval:6a252386bed6d4233f0f13f4562d8ae8608e7445\n```\nYou should set `\u003cyour_available_gpu_ids\u003e` and `\u003cyour_base_dir\u003e`.\n\n## Quick Inference\n\nDownload the fine-tuned model and put it under the corresponding folder.\n\n| Dataset | #Train | Model | Folder |\n|-------|--------|--------|---------|\n| Spider | 0.05 (350) | [link](https://drive.google.com/drive/folders/1b-16LFsnVMC5U2JxRew9nKtdOIhVr46j?usp=share_link) | experimental_outputs/spider/ |\n| Spider | 0.1 (700) | [link](https://drive.google.com/drive/folders/16qcI-zcahpB-Y6BUyizLmt3-EMP8_sM7?usp=share_link) | experimental_outputs/spider/ |\n| CoSQL | 0.05 (475) | [link](https://drive.google.com/drive/folders/1DxNdW5oBMQgYm7GE_VfvT9lFrJLcCpLs?usp=share_link) | experimental_outputs/cosql/ |\n| CoSQL | 0.1 (950) | [link](https://drive.google.com/drive/folders/1MhbsPsyhD0RTVYFJ7jiqy8zxxUo2_4kp?usp=share_link) | experimental_outputs/cosql/ |\n| Geoquery | 1. 
(536) | [link](https://drive.google.com/drive/folders/1Z-akKlTFhiNGdT23kmpU8VFQ3L5XvOgD?usp=share_link) | experimental_outputs/geoquery/ |\n\nUse the scripts to run inference.\n```sh\n# Inference on spider\nCUDA_VISIBLE_DEVICES=0 bash scripts/eval_spider_scprompt.sh 0.1\n# Inference on cosql\nCUDA_VISIBLE_DEVICES=0 bash scripts/eval_cosql_scprompt.sh 0.1\n# Inference on geoquery\nCUDA_VISIBLE_DEVICES=0 bash scripts/eval_geoquery_scprompt.sh 1.\n```\n- The trailing numeric argument (e.g. `0.1`) is the proportion of the official training set used.\n\n## Train from scratch\n```sh\n# Train on spider\nCUDA_VISIBLE_DEVICES=0 bash scripts/train_spider_scprompt.sh 0.1\n# Train on cosql\nCUDA_VISIBLE_DEVICES=0 bash scripts/train_cosql_scprompt.sh 0.1\n# Train on geoquery\nCUDA_VISIBLE_DEVICES=0 bash scripts/train_geoquery_scprompt.sh 1.\n```\n- The trailing numeric argument (e.g. `0.1`) is the proportion of the official training set used.\n\nThe best model is saved automatically under `experimental_outputs/`. Note that training does not use the fine-grained constrained decoding strategy, which is only needed for evaluation. Refer to `Quick Inference` to evaluate the fine-tuned model.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fruc-datalab%2FSC-prompt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fruc-datalab%2FSC-prompt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fruc-datalab%2FSC-prompt/lists"}