{"id":18439500,"url":"https://github.com/idiap/drill","last_synced_at":"2025-07-02T04:32:57.964Z","repository":{"id":144961981,"uuid":"185944421","full_name":"idiap/drill","owner":"idiap","description":"Deep residual output layers for neural language generation","archived":false,"fork":false,"pushed_at":"2019-11-30T00:13:09.000Z","size":2170,"stargazers_count":10,"open_issues_count":1,"forks_count":3,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-06-03T11:59:10.597Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idiap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-10T07:54:07.000Z","updated_at":"2020-09-14T01:10:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"7010d1f7-6fa4-4a0f-ba58-59c883876508","html_url":"https://github.com/idiap/drill","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/idiap/drill","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fdrill","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fdrill/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fdrill/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fdrill/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idiap","download_url":"https://codeload.github.com/idiap/drill/tar.gz/refs/heads/master","sbom
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2Fdrill/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263075034,"owners_count":23409862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T06:25:11.488Z","updated_at":"2025-07-02T04:32:57.956Z","avatar_url":"https://github.com/idiap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cb\u003edrill\u003c/b\u003e:\na PyTorch implementation of DRILL for building deep residual output layers for neural language generation, presented at ICML 2019 [\u003ca href=\"https://arxiv.org/pdf/1905.05513.pdf\"\u003e1\u003c/a\u003e]. Our deep output mapping explicitly captures the similarity structure of the output space, avoids overfitting with two different dropout strategies between layers, and preserves useful information with residual connections to the word embeddings and, optionally, to the outputs of previous layers. 
Evaluations on three language generation tasks show that our output label mapping can match or improve on state-of-the-art recurrent and self-attention architectures.\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://arxiv.org/pdf/1905.05513.pdf\"\u003e\u003cimg align=\"center\" src=\"images/arch.png\" alt=\" .\" width=\"420\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n```\n@inproceedings{Pappas_ICML_2019,\n  author    = {Nikolaos Pappas and James Henderson},\n  title     = {Deep Residual Output Layers for Neural Language Generation},\n  booktitle = {Proceedings of the 36th International Conference on Machine Learning},\n  address   = {Long Beach, California},\n  year      = {2019}\n}\n```\n\nInstallation\n------------\nThe code for DRILL requires the \u003ca href=\"http://www.python.org/getit/\"\u003ePython 3.5\u003c/a\u003e programming language and the \u003ca href=\"http://www.pip-installer.org/en/latest/\"\u003epip\u003c/a\u003e package manager to run.\nFor detailed instructions on how to install them, please refer to the corresponding links. Next, install the required libraries from the provided list of dependencies: \u003cbr /\u003e\n```\npip install -r dependencies.txt\n```\nTo avoid creating library conflicts in your existing pip environment, it may be more convenient to use a folder-specific pip environment with \u003ca href=\"https://github.com/pypa/pipenv\"\u003epipenv\u003c/a\u003e instead. 
Our experiments were run on NVIDIA GTX 1080 Ti GPUs with \u003ca href=\"https://developer.nvidia.com/cuda-downloads\"\u003eCUDA 8.0\u003c/a\u003e and \u003ca href=\"https://developer.nvidia.com/cudnn\"\u003ecuDNN 5110\u003c/a\u003e.\n\nTo obtain our pretrained models from Google Drive directly from the command line, download the following script:\n```\nwget https://raw.githubusercontent.com/circulosmeos/gdown.pl/master/gdown.pl; chmod +x gdown.pl;\n```\nTo obtain the corresponding datasets for language modeling and machine translation, please follow the instructions from [\u003ca href=\"https://arxiv.org/pdf/1708.02182.pdf\"\u003e2\u003c/a\u003e] and [\u003ca href=\"https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf\"\u003e3\u003c/a\u003e], respectively. For example, the former can be obtained by running \u003ca href=\"https://github.com/salesforce/awd-lstm-lm/blob/master/getdata.sh\"\u003ethis\u003c/a\u003e script.\n \n# Language Modeling\n\nUnder the main folder (./) you can find the code related to the neural language modeling experiments on the Penn Treebank and Wikitext-2 datasets from [\u003ca href=\"https://arxiv.org/pdf/1708.02182.pdf\"\u003e2\u003c/a\u003e]. Note that a large portion of this repository is borrowed from \u003ca href=\"https://github.com/salesforce/awd-lstm-lm\"\u003eawd-lstm\u003c/a\u003e. 
Below you can find the commands for training our model on the two examined datasets.\n\nPenn Treebank\n------------\nThe commands for our main results in Tables 1 and 4 are the following:\n\n```bash\npython main.py --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 28 --batch_size 20 --epoch 1000 \\\n--save exp/pen-drill-4l --joint_emb 400 --joint_emb_depth 4 --joint_dropout 0.6 --joint_locked_dropout \\\n--joint_emb_activation Sigmoid\n```\n\n```bash\npython finetune.py --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 28 --batch_size 20 --epoch 1000 \\\n--save exp/pen-drill-4l --joint_emb 400 --joint_emb_depth 4 --joint_dropout 0.6 --joint_locked_dropout \\\n--joint_emb_activation Sigmoid\n```\n\nOur pretrained version of the above model can be obtained as follows:\n\n```\n./gdown.pl https://drive.google.com/open?id=1H6xELj0gaqhiAZJUzPuKJmMqws3WWbLx penn.zip\n```\n\nWikitext-2\n-----------\nThe commands for our main results in Table 2 are the following:\n\n```bash\npython main.py --data data/wiki2 --dropouth 0.2 --seed 1882 --epoch 2000 --save exp/wiki2-drill \\\n--joint_emb 400 --joint_emb_depth 4 --joint_dropout 0.6 --joint_emb_activation ReLU\n```\n\n```bash\npython finetune.py --data data/wiki2 --dropouth 0.2 --seed 1882 --epoch 2000 --save exp/wiki2-drill \\\n--joint_emb 400 --joint_emb_depth 4 --joint_dropout 0.6 --joint_emb_activation ReLU\n```\n\nOur pretrained version of the above model can be obtained as follows:\n\n```\n./gdown.pl https://drive.google.com/open?id=1Mf1GXAGsCgjWNaj2JEGgEe0KsU45K_jo wiki2.zip\n```\n\n# Machine Translation\nUnder the onmt/ folder you can find the code related to the neural machine translation experiments on the WMT 2014 English-to-German dataset as in [\u003ca href=\"https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf\"\u003e3\u003c/a\u003e]. 
Due to its large scope, we haven't included the whole \u003ca href=\"https://github.com/OpenNMT/OpenNMT-py\"\u003eOpenNMT-py\u003c/a\u003e framework, but one can directly copy our files to the original onmt/ directory. Below you can find the command for training our model.\n\n \nEn-De (32K BPE)\n-----------\n\nThe command for our main results in Table 5 is the following:\n\n```bash\npython train.py -data data/en-de_32k -save_model drill -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 350000 -max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 \\\n-decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 -world_size 1 -gpu_ranks 0 -share_decoder_embeddings --joint_emb 512 --joint_emb_depth 2 --joint_dropout 0.0 --joint_locked_dropout --joint_emb_activation Sigmoid\n```\nOur pretrained version of the above model can be obtained as follows:\n\n```\n./gdown.pl https://drive.google.com/open?id=1xZyZ8P97roMuhRSrxn6HTURgTEs2r9FM en-de_32k.zip\n```\n\nReferences\n------------\n* [1] Nikolaos Pappas, James Henderson, \u003ca href=\"https://arxiv.org/pdf/1905.05513.pdf\"\u003eDeep Residual Output Layers for Neural Language Generation\u003c/a\u003e, Thirty-sixth International Conference on Machine Learning, Long Beach, California, 2019\n* [2] Stephen Merity, Nitish Shirish Keskar, Richard Socher, \u003ca href=\"https://arxiv.org/pdf/1708.02182.pdf\"\u003eRegularizing and Optimizing LSTM Language Models\u003c/a\u003e, Sixth International Conference on Learning Representations, Vancouver, Canada, 2018\n* [3] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. 
Gomez, Lukasz Kaiser, Illia Polosukhin, \u003ca href=\"https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf\"\u003eAttention is All You Need\u003c/a\u003e, Advances in Neural Information Processing Systems, Long Beach, California, 2017\n \nAcknowledgements\n------------\nThis work was supported by the European Union through the SUMMA project (n. 688139) and the Swiss National Science Foundation within the INTERPID project (FNS-30106).\n\n## Contact:\nnpappas@idiap.ch\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fdrill","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidiap%2Fdrill","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fdrill/lists"}