{"id":13595572,"url":"https://github.com/localminimum/QANet","last_synced_at":"2025-04-09T13:32:32.932Z","repository":{"id":39483531,"uuid":"109454407","full_name":"localminimum/QANet","owner":"localminimum","description":"A Tensorflow implementation of QANet for machine reading comprehension","archived":false,"fork":false,"pushed_at":"2018-05-30T06:39:26.000Z","size":362,"stargazers_count":984,"open_issues_count":21,"forks_count":310,"subscribers_count":55,"default_branch":"master","last_synced_at":"2024-05-19T05:45:06.544Z","etag":null,"topics":["cnn","machine-comprehension","nlp","squad","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/localminimum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-04T00:24:06.000Z","updated_at":"2024-04-09T08:17:56.000Z","dependencies_parsed_at":"2022-08-09T14:49:24.048Z","dependency_job_id":null,"html_url":"https://github.com/localminimum/QANet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localminimum%2FQANet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localminimum%2FQANet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localminimum%2FQANet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localminimum%2FQANet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/localminimum","download_url":"https://codeload.github.com/localminimum/QANet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223394600,"owners_count":17138582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","machine-comprehension","nlp","squad","tensorflow"],"created_at":"2024-08-01T16:01:52.715Z","updated_at":"2024-11-06T18:31:13.845Z","avatar_url":"https://github.com/localminimum.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# QANet\nA TensorFlow implementation of Google's [QANet](https://openreview.net/pdf?id=B14TlG-RW) (previously Fast Reading Comprehension (FRC)) from [ICLR 2018](https://openreview.net/forum?id=B14TlG-RW). (Note: This is not an official implementation from the authors of the paper.)\n\nI wrote a blog post about implementing QANet. Check it out [here](https://medium.com/@minsangkim/implementing-question-answering-networks-with-cnns-5ae5f08e312b) for more information!\n\nThe training and preprocessing pipelines have been adapted from [R-Net by HKUST-KnowComp](https://github.com/HKUST-KnowComp/R-Net). Demo mode is working: after training, run `python config.py --mode demo` to start an interactive demo server.\n\nDue to memory constraints, single-head dot-product attention is used instead of the 8-head multi-head attention in the original paper. The hidden size is also reduced from 128 to 96 because this model was trained on a GTX 1080 rather than the P100 used in the paper. (8GB of GPU memory is insufficient. If you have a GPU with 12GB of memory, please share your training results with us.)\n\nCurrently, the best model reaches EM/F1 = 70.8/80.1 in 60k steps (6-8 hours). Detailed results are listed below.\n\n![Network Outline](/../master/screenshots/figure.png?raw=true \"Network Outline\")\n\n## Dataset\nThe dataset used for this task is the [Stanford Question Answering Dataset](https://rajpurkar.github.io/SQuAD-explorer/) (SQuAD).\nPretrained [GloVe embeddings](https://nlp.stanford.edu/projects/glove/) trained on Common Crawl (840B tokens) are used for words.\n\n## Requirements\n  * Python\u003e=2.7\n  * NumPy\n  * tqdm\n  * TensorFlow\u003e=1.5\n  * spacy==2.0.9\n  * bottle (only for demo)\n\n## Usage\nTo download and preprocess the data, run\n\n```bash\n# download SQuAD and GloVe\nsh download.sh\n# preprocess the data\npython config.py --mode prepro\n```\n\nAs in [R-Net by HKUST-KnowComp](https://github.com/HKUST-KnowComp/R-Net), hyperparameters are stored in `config.py`. To debug, train, test, or run the demo, run the following with the corresponding mode:\n\n```bash\npython config.py --mode debug/train/test/demo\n```\n\nTo evaluate the model with the official SQuAD v1.1 evaluation script, run\n```bash\npython evaluate-v1.1.py ~/data/squad/dev-v1.1.json train/{model_name}/answer/answer.json\n```\n\nThe default directory for the TensorBoard log files is `train/{model_name}/event`.\n\n### Run in Docker container (optional)\nTo build the Docker image (requires nvidia-docker), run\n\n```\nnvidia-docker build -t tensorflow/qanet .\n```\n\nSet the volume mount paths and port mappings (the port mappings are for demo mode):\n\n```\nexport QANETPATH={/path/to/cloned/QANet}\nexport CONTAINERWORKDIR=/home/QANet\nexport HOSTPORT=8080\nexport CONTAINERPORT=8080\n```\n\nOpen a bash shell in the container:\n```\nnvidia-docker run -v $QANETPATH:$CONTAINERWORKDIR -p $HOSTPORT:$CONTAINERPORT -it --rm tensorflow/qanet bash\n```\n\nOnce inside the container, follow the commands above, starting with downloading the SQuAD and GloVe datasets.\n\n### Pretrained Model\nPretrained model weights are temporarily not available.\n\n## Detailed Implementation\n\n  * The model adopts character-level convolution - max pooling - highway network for input representations, similar to [this paper by Yoon Kim](https://arxiv.org/pdf/1508.06615.pdf).\n  * The encoder consists of positional encoding - depthwise separable convolution - self-attention - feed-forward structure, with layer norm in between.\n  * Although the original paper uses a character dimension of 200, we observe that a smaller character dimension leads to better generalization.\n  * For regularization, a dropout rate of 0.1 is applied every 2 sub-layers and every 2 blocks.\n  * Stochastic depth dropout is used to randomly drop residual sub-layers, with a drop probability that increases with network depth, since this model relies heavily on residual connections.\n  * Query-to-Context attention is used along with Context-to-Query attention, which seems to improve performance more than the paper reported. This may be because single-head self-attention (as opposed to 8 heads) lacks diversity, so query-to-context attention supplies information that multiple heads might otherwise capture.\n  * The learning rate increases from 0.0 to 0.001 over the first 1000 steps on an inverse exponential scale, and is fixed at 0.001 afterwards.\n  * At inference, this model uses shadow variables maintained by an exponential moving average of all global variables.\n  * This model uses a training / testing / preprocessing pipeline from [R-Net](https://github.com/HKUST-KnowComp/R-Net) for improved efficiency.\n\n## Results\nHere are the collected results from this repository and the original paper.\n\n|      Model     | Training Steps | Hidden Size | Attention Heads | Data Size (aug) |  EM  |  F1  |\n|:--------------:|:--------------:|:----:|:---------------:|:---------------:|:----:|:----:|\n|       My model |     35,000     |  96  |        1        |   87k (no aug)  | 69.0 | 78.6 |\n|       My model |     60,000     |  96  |        1        |   87k (no aug)  | 70.4 | 79.6 |\n|       My model (reported by [@jasonwbw](https://github.com/jasonwbw))|     60,000     |  128  |        1        |   87k (no aug)  | 70.7 | 79.8 |\n|       My model (reported by [@chesterkuo](https://github.com/chesterkuo))|     60,000     |  128  |        8        |   87k (no aug)  | 70.8 | 80.1 |\n| Original Paper |     35,000     |  128 |        8        |   87k (no aug)  |  NA  | 77.0 |\n| Original Paper |     150,000    |  128 |        8        |   87k (no aug)  | 73.6 | 82.7 |\n| Original Paper |     340,000    |  128 |        8        |    240k (aug)   | 75.1 | 83.8 |\n\n## TODOs\n- [x] Training and testing the model\n- [x] Add trilinear function to Context-to-Query attention\n- [x] Apply dropouts + stochastic depth dropout\n- [x] Query-to-context attention\n- [x] Realtime Demo\n- [ ] Data augmentation by paraphrasing\n- [ ] Train with full hyperparameters (Augmented data, 8 heads, hidden units = 128)\n\n## TensorBoard\nRun TensorBoard for visualisation.\n```shell\n$ tensorboard --logdir=./\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocalminimum%2FQANet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocalminimum%2FQANet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocalminimum%2FQANet/lists"}