{"id":14181813,"url":"https://github.com/taoyds/syntaxsql","last_synced_at":"2025-08-07T14:31:39.483Z","repository":{"id":95475548,"uuid":"153428054","full_name":"taoyds/syntaxSQL","owner":"taoyds","description":"SyntaxSQLNet: Syntax Tree Networks for Complex and Cross Domain Text-to-SQL Task","archived":false,"fork":false,"pushed_at":"2022-03-22T19:55:29.000Z","size":50,"stargazers_count":133,"open_issues_count":15,"forks_count":40,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-11-30T19:56:13.438Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://yale-lily.github.io/spider","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taoyds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-17T09:08:08.000Z","updated_at":"2024-11-30T17:35:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"42716eea-9fc7-442a-b1d4-7e2c81012135","html_url":"https://github.com/taoyds/syntaxSQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taoyds%2FsyntaxSQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taoyds%2FsyntaxSQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taoyds%2FsyntaxSQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taoyds%2FsyntaxSQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taoyds","download_url":"https://codeload.gi
thub.com/taoyds/syntaxSQL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229052496,"owners_count":18012564,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-18T11:04:14.061Z","updated_at":"2024-12-10T11:30:19.601Z","avatar_url":"https://github.com/taoyds.png","language":"Python","funding_links":[],"categories":["💬 Classic Model"],"sub_categories":[],"readme":"## SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task\n\nSource code of our EMNLP 2018 paper: [SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task](https://arxiv.org/abs/1810.05237).\n\n:+1: `03/20/2022`: **We open-sourced a simple but SOTA model (just T5) for 20 tasks including text-to-SQL! Please check out our code in the [UnifiedSKG repo](https://github.com/hkunlp/unifiedskg)!!**\n\n### Citation\n\n```\n@InProceedings{Yu\u0026al.18.emnlp.syntax,\n  author =  {Tao Yu and Michihiro Yasunaga and Kai Yang and Rui Zhang and Dongxu Wang and Zifan Li and Dragomir Radev},\n  title =   {SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task},\n  year =    {2018},  \n  booktitle =   {Proceedings of EMNLP},  \n  publisher =   {Association for Computational Linguistics},\n}\n```\n\n#### Environment Setup\n\n1. The code uses Python 2.7 and [PyTorch 0.2.0](https://pytorch.org/previous-versions/) GPU.\n2. Install the Python dependencies: `pip install -r requirements.txt`\n\n#### Download Data, Embeddings, Scripts, and Pretrained Models\n1. 
Download the dataset from [the Spider task website](https://yale-lily.github.io/spider) to be updated, and put `tables.json`, `train.json`, and `dev.json` under the `data/` directory.\n2. Download the pretrained [GloVe](https://nlp.stanford.edu/data/wordvecs/glove.42B.300d.zip) embeddings, and save them as `glove/glove.%dB.%dd.txt`\n3. Download `evaluation.py` and `process_sql.py` from [the Spider GitHub page](https://github.com/taoyds/spider)\n4. Download preprocessed train/dev datasets and pretrained models from [here](https://drive.google.com/file/d/1FHEcceYuf__PLhtD5QzJvexM7SNGnoBu/view?usp=sharing). It contains: \n   - `generated_datasets/`\n     - ``generated_data`` for original Spider training datasets, pretrained models can be found at `generated_data/saved_models`\n     - ``generated_data_augment`` for original Spider + augmented training datasets, pretrained models can be found at `generated_data_augment/saved_models`\n\n#### Generating Train/dev Data for Modules\nYou can find preprocessed train/dev data in ``generated_datasets/``.\n\nTo generate them yourself, update the dirs marked `TODO` in `preprocess_train_dev_data.py`, and run the following command to generate training files for each module:\n```\npython preprocess_train_dev_data.py train|dev\n```\n\n#### Folder/File Description\n- ``data/`` contains raw train/dev/test data and table file\n- ``generated_datasets/`` described as above\n- ``models/`` contains the code for each module.\n- ``evaluation.py`` is for evaluation. It uses ``process_sql.py``.\n- ``train.py`` is the main file for training. Use ``train_all.sh`` to train all the modules (see below).\n- ``test.py`` is the main file for testing. It uses ``supermodel.sh`` to call the trained modules and generate SQL queries. 
In practice, use ``test_gen.sh`` to generate SQL queries.\n- `generate_wikisql_augment.py` performs cross-domain data augmentation\n\n\n#### Training\nRun ``train_all.sh`` to train all the modules.\nIt looks like:\n```\npython train.py \\\n    --data_root       path/to/generated_data \\\n    --save_dir        path/to/save/trained/module \\\n    --history_type    full|no \\\n    --table_type      std|no \\\n    --train_component \u003cmodule_name\u003e \\\n    --epoch           \u003cnum_of_epochs\u003e\n```\n\n#### Testing\nRun ``test_gen.sh`` to generate SQL queries.\n``test_gen.sh`` looks like:\n```\nSAVE_PATH=generated_datasets/generated_data/saved_models_hs=full_tbl=std\npython test.py \\\n    --test_data_path  path/to/raw/test/data \\\n    --models          path/to/trained/module \\\n    --output_path     path/to/print/generated/SQL \\\n    --history_type    full|no \\\n    --table_type      std|no\n```\n\n#### Evaluation\nFollow the general evaluation process in [the Spider GitHub page](https://github.com/taoyds/spider).\n\n#### Cross-Domain Data Augmentation\nYou can find preprocessed augmented data at `generated_datasets/generated_data_augment`. \n\nIf you would like to run data augmentation yourself, first download `wikisql_tables.json` and `train_patterns.json` from [here](https://drive.google.com/file/d/13I_EqnAR4v2aE-CWhJ0XQ8c-UlGS9oic/view?usp=sharing), and then run ```python generate_wikisql_augment.py``` to generate more training data. Second, run `get_data_wikisql.py` to generate the WikiSQL augment JSON file. Finally, use `merge_jsons.py` to generate the final spider + wikisql + wikisql augment dataset.\n\n#### Acknowledgement\n\nThe implementation is based on [SQLNet](https://github.com/xiaojunxu/SQLNet). 
Please cite it too if you use this code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaoyds%2Fsyntaxsql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaoyds%2Fsyntaxsql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaoyds%2Fsyntaxsql/lists"}