{"id":15012690,"url":"https://github.com/microsoft/rat-sql","last_synced_at":"2025-04-05T04:12:04.342Z","repository":{"id":45537998,"uuid":"265127379","full_name":"microsoft/rat-sql","owner":"microsoft","description":"A relation-aware semantic parsing model from English to SQL","archived":false,"fork":false,"pushed_at":"2023-08-22T16:19:45.000Z","size":113,"stargazers_count":419,"open_issues_count":41,"forks_count":118,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-03-29T03:08:33.033Z","etag":null,"topics":["dbqa","nl2sql","nlp","program-synthesis","question-answering","semantic-parsing","transformers"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1911.04942","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-19T03:01:10.000Z","updated_at":"2025-03-27T11:42:07.000Z","dependencies_parsed_at":"2024-08-18T11:12:12.065Z","dependency_job_id":"6cb70667-91e5-4886-9cf6-e1f58d9c903d","html_url":"https://github.com/microsoft/rat-sql","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2Frat-sql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2Frat-sql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2Frat-sql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/microsoft%2Frat-sql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/rat-sql/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247284951,"owners_count":20913704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dbqa","nl2sql","nlp","program-synthesis","question-answering","semantic-parsing","transformers"],"created_at":"2024-09-24T19:43:04.888Z","updated_at":"2025-04-05T04:12:04.320Z","avatar_url":"https://github.com/microsoft.png","language":"Python","readme":"# RAT-SQL\n\nThis repository contains code for the ACL 2020 paper [\"RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers\"](https://arxiv.org/abs/1911.04942).\n\nIf you use RAT-SQL in your work, please cite it as follows:\n``` bibtex\n@inproceedings{rat-sql,\n    title = \"{RAT-SQL}: Relation-Aware Schema Encoding and Linking for Text-to-{SQL} Parsers\",\n    author = \"Wang, Bailin and Shin, Richard and Liu, Xiaodong and Polozov, Oleksandr and Richardson, Matthew\",\n    booktitle = \"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics\",\n    month = jul,\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    pages = \"7567--7578\"\n}\n```\n\n## Changelog\n\n**2020-08-14:**\n- The Docker image now inherits from a CUDA-enabled base image.\n- Clarified memory and dataset requirements on the image.\n- Fixed the issue where token IDs were not 
converted to word-piece IDs for BERT value linking.  \n\n## Usage\n\n### Step 1: Download third-party datasets \u0026 dependencies\n\nDownload the datasets: [Spider](https://yale-lily.github.io/spider) and [WikiSQL](https://github.com/salesforce/WikiSQL). For Spider, make sure to download the `08/03/2020` version or newer.\nUnpack the datasets somewhere outside this project to create the following directory structure:\n```\n/path/to/data\n├── spider\n│   ├── database\n│   │   └── ...\n│   ├── dev.json\n│   ├── dev_gold.sql\n│   ├── tables.json\n│   ├── train_gold.sql\n│   ├── train_others.json\n│   └── train_spider.json\n└── wikisql\n    ├── dev.db\n    ├── dev.jsonl\n    ├── dev.tables.jsonl\n    ├── test.db\n    ├── test.jsonl\n    ├── test.tables.jsonl\n    ├── train.db\n    ├── train.jsonl\n    └── train.tables.jsonl\n```\n\nTo work with the WikiSQL dataset, clone its evaluation scripts into this project:\n``` bash\nmkdir -p third_party\ngit clone https://github.com/salesforce/WikiSQL third_party/wikisql\n```\n\n### Step 2: Build and run the Docker image\n\nWe have provided a `Dockerfile` that sets up the entire environment for you.\nIt assumes that you mount the datasets downloaded in Step 1 as a volume `/mnt/data` into a running container.\nThus, the environment setup for RAT-SQL is:\n``` bash\ndocker build -t ratsql .\ndocker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql\n```\nNote that the container requires at least 4 GB of RAM to run preprocessing.\nBy default, [Docker Desktop for Mac](https://hub.docker.com/editions/community/docker-ce-desktop-mac/) and [Docker Desktop for Windows](https://hub.docker.com/editions/community/docker-ce-desktop-windows) run containers with 2 GB of RAM.\nThe `-m4g` switch overrides this default; alternatively, you can increase the default limit in the Docker Desktop settings.\n\n\u003e If you prefer to set up and run the codebase without Docker, follow the steps in `Dockerfile` one by one.\n\u003e Note that this repository 
requires Python 3.7 or higher and a JVM to run [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/).\n\n### Step 3: Run the experiments\n\nEvery experiment has its own config file in `experiments`.\nThe pipeline for working with any model version or dataset is:\n\n``` bash\npython run.py preprocess experiment_config_file  # Step 3a: preprocess the data\npython run.py train experiment_config_file       # Step 3b: train a model\npython run.py eval experiment_config_file        # Step 3c: evaluate the results\n```\n\nUse the following experiment config files to reproduce our results:\n\n* Spider, GloVe version: `experiments/spider-glove-run.jsonnet`\n* Spider, BERT version (requires a GPU with at least 16 GB of memory): `experiments/spider-bert-run.jsonnet`\n* WikiSQL, GloVe version: `experiments/wikisql-glove-run.jsonnet`\n\nThe exact model accuracy may vary by ±2% depending on the random seed. See the [paper](https://arxiv.org/abs/1911.04942) for details.\n\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. 
You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Frat-sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Frat-sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Frat-sql/lists"}