{"id":20432364,"url":"https://github.com/captaine/syntaxsqlnet","last_synced_at":"2026-04-29T00:04:41.811Z","repository":{"id":40981106,"uuid":"195004461","full_name":"CaptainE/syntaxsqlnet","owner":"CaptainE","description":"Pytorch implementation of SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-DomainText-to-SQL Task .","archived":false,"fork":false,"pushed_at":"2022-11-22T04:10:35.000Z","size":4802,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-15T18:40:13.723Z","etag":null,"topics":["pytorch","syntaxsqlnet","text-to-sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CaptainE.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-03T07:42:38.000Z","updated_at":"2020-09-08T10:32:08.000Z","dependencies_parsed_at":"2023-01-21T13:45:07.062Z","dependency_job_id":null,"html_url":"https://github.com/CaptainE/syntaxsqlnet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CaptainE%2Fsyntaxsqlnet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CaptainE%2Fsyntaxsqlnet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CaptainE%2Fsyntaxsqlnet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CaptainE%2Fsyntaxsqlnet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CaptainE","download_url":"https://codeload.github.com/CaptainE/syntaxsqlnet/tar.
gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241975156,"owners_count":20051431,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pytorch","syntaxsqlnet","text-to-sql"],"created_at":"2024-11-15T08:14:43.899Z","updated_at":"2026-04-29T00:04:41.781Z","avatar_url":"https://github.com/CaptainE.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"PyTorch implementation of [SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-DomainText-to-SQL Task](https://arxiv.org/abs/1810.05237).\n\n## Improvements\nOur model contains several improvements over the original model:\n1. The values in WHERE and HAVING conditions, as well as the value for LIMIT, are ignored in the SPIDER evaluation. To be more useful in practice, our model includes a module for predicting these values. The module is similar to the column predictor, but selects one or more tokens from the question.\n2. A module for predicting the DISTINCT keyword.\n3. Added the BETWEEN operator.\n4. Improved the column predictor to make it possible to predict the same column multiple times.\n\nWith these changes, our model achieves the following accuracy on easy+medium questions, where we include the values.\n\n| Component   | Accuracy |\n|-------------|----------|\n| SELECT      | 72.5%    |\n| WHERE       | 48.6%    |\n| GROUP BY    | 63.5%    |\n| ORDER BY    | 65.9%    |\n| HAVING      | 88.9%    |\n| LIMIT value | 94.9%    |\n| KEYWORDS    | 94.4%    |\n| **Total**   | 49.0%    |\n\n\n## Setup\n1. Python \u003e= 3.6\n2. 
Install dependencies using ``pip install -r requirements.txt``\n\n\n### Data\nThe data for the model can be downloaded from the [Spider Dataset website](https://yale-lily.github.io/spider).\nNote that this model only focuses on easy and medium difficulty queries, meaning that we don't include multi-table queries such as joins or sub-queries.\n\nTo generate augmented data, you also need to download ``wikisql_tables.json`` from [here](https://drive.google.com/file/d/13I_EqnAR4v2aE-CWhJ0XQ8c-UlGS9oic/view?usp=sharing).\n\nThe pretrained embeddings can be downloaded from the [GloVe website](https://nlp.stanford.edu/projects/glove/).\n\n\n## Training\nRun ``python train.py`` to train each module.\nIt takes the following arguments:\n```\n  --num_layers        Number of layers in the LSTMs\n  --lr                Learning rate\n  --num_epochs \n                      Number of epochs to train the model\n  --batch_size \n  --name_postfix \n                      Optional postfix of the model name\n  --gpu \n  --hidden_dim \n  --save              Save the model during training\n  --dropout \n  --embedding_dim \n                      Dimension of the embeddings\n  --num_augmentation \n                      Number of additional augmented questions to generate\n  --N_word            Number of trained tokens for the embedding, this just\n                        corresponds to the name\n  --model             Select a model from {column,keyword,andor,agg,distinct,op,having,desasc,limitvalue,value}\n```\n\n## Testing\nRun ``python test.py`` to generate the test results.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaptaine%2Fsyntaxsqlnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaptaine%2Fsyntaxsqlnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaptaine%2Fsyntaxsqlnet/lists"}