{"id":13936520,"url":"https://github.com/eladhoffer/seq2seq.pytorch","last_synced_at":"2026-02-01T19:11:22.129Z","repository":{"id":44331554,"uuid":"93027064","full_name":"eladhoffer/seq2seq.pytorch","owner":"eladhoffer","description":"Sequence-to-Sequence learning using PyTorch","archived":false,"fork":false,"pushed_at":"2019-11-12T10:05:25.000Z","size":3290,"stargazers_count":520,"open_issues_count":9,"forks_count":80,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-08-08T23:23:55.230Z","etag":null,"topics":["deep-learning","neural-machine-translation","seq2seq"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eladhoffer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-01T07:05:39.000Z","updated_at":"2024-07-01T03:04:48.000Z","dependencies_parsed_at":"2022-09-01T16:40:34.008Z","dependency_job_id":null,"html_url":"https://github.com/eladhoffer/seq2seq.pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladhoffer%2Fseq2seq.pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladhoffer%2Fseq2seq.pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladhoffer%2Fseq2seq.pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eladhoffer%2Fseq2seq.pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eladhoffer","download_url":"https://codeload.github.com/eladhoffer/seq2seq.pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226686728,"owners_count":17666928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","neural-machine-translation","seq2seq"],"created_at":"2024-08-07T23:02:45.043Z","updated_at":"2026-02-01T19:11:22.080Z","avatar_url":"https://github.com/eladhoffer.png","language":"Python","readme":"# Seq2Seq in PyTorch\nThis is a complete suite for training sequence-to-sequence models in [PyTorch](www.pytorch.org). 
It consists of several models, together with the code needed to train them and run inference with them.

Using this code you can train:
* Neural machine translation (NMT) models
* Language models
* Image-to-caption generation
* Skip-thought sentence representations
* And more...

## Installation
```
git clone --recursive https://github.com/eladhoffer/seq2seq.pytorch
cd seq2seq.pytorch; python setup.py develop
```

## Models
Models currently available:
* Simple Seq2Seq recurrent model
* Recurrent Seq2Seq with an attentional decoder
* [Google neural machine translation](https://arxiv.org/abs/1609.08144) (GNMT) recurrent model
* Transformer - the attention-only model from ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)

## Datasets
Datasets currently available:

* WMT16
* WMT17
* OpenSubtitles 2016
* COCO image captions
* [Conceptual Captions](https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html)

All datasets can be tokenized using 3 available segmentation methods:

* Character-based segmentation
* Word-based segmentation
* Byte-pair encoding (BPE), as suggested by [Sennrich et al.](https://arxiv.org/abs/1508.07909), with a selectable number of tokens

After choosing a tokenization method, a vocabulary is generated and saved for later use at inference time.

## Training methods
The models can be trained using several methods:

* Basic Seq2Seq - given an encoded source sequence, generate (decode) the output sequence. Training is done with teacher forcing (see the sketch after this list).
* Multi Seq2Seq - several tasks (such as multiple language pairs) are trained simultaneously, with the same data sequences used both as encoder inputs and as decoder targets.
* Image2Seq - used to train image-to-caption generators.
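The teacher-forcing regime mentioned above can be summarized in a few lines of generic PyTorch. This is a minimal, self-contained sketch of the idea only, not the package's `Seq2SeqTrainer` or its model classes; the sizes, shapes, and padding index below are illustrative assumptions:

```python
# Minimal teacher-forcing sketch in plain PyTorch (illustrative only; the
# package's trainer and model classes handle this internally).
import torch
import torch.nn as nn

vocab_size, hidden = 1000, 256                   # illustrative sizes
embed = nn.Embedding(vocab_size, hidden)
encoder = nn.LSTM(hidden, hidden, batch_first=True)
decoder = nn.LSTM(hidden, hidden, batch_first=True)
proj = nn.Linear(hidden, vocab_size)
criterion = nn.CrossEntropyLoss(ignore_index=0)  # assume index 0 is padding

src = torch.randint(1, vocab_size, (8, 15))      # (batch, src_len) source tokens
tgt = torch.randint(1, vocab_size, (8, 12))      # (batch, tgt_len); real data begins with a BOS token

_, state = encoder(embed(src))                   # encode the source sequence
dec_in = tgt[:, :-1]                             # teacher forcing: feed the gold prefix
dec_out, _ = decoder(embed(dec_in), state)       # decode conditioned on the encoder state
logits = proj(dec_out)                           # (batch, tgt_len - 1, vocab)
loss = criterion(logits.reshape(-1, vocab_size), tgt[:, 1:].reshape(-1))
loss.backward()                                  # gradients flow through encoder and decoder
```

At inference time the gold target is unavailable, so the decoder instead consumes its own previous predictions (greedy or beam search) rather than the shifted reference sequence.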
## Usage
Example training scripts are available in the ``scripts`` folder. Inference examples are available in the ``examples`` folder.

* Example: training a [Transformer](https://arxiv.org/abs/1706.03762) on WMT16 following the original paper's training regime:
```
DATASET=${1:-"WMT16_de_en"}
DATASET_DIR=${2:-"./data/wmt16_de_en"}
OUTPUT_DIR=${3:-"./results"}

WARMUP="4000"
LR0="512**(-0.5)"

python main.py \
  --save transformer \
  --dataset ${DATASET} \
  --dataset-dir ${DATASET_DIR} \
  --results-dir ${OUTPUT_DIR} \
  --model Transformer \
  --model-config "{'num_layers': 6, 'hidden_size': 512, 'num_heads': 8, 'inner_linear': 2048}" \
  --data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
  --b 128 \
  --max-length 100 \
  --device-ids 0 \
  --label-smoothing 0.1 \
  --trainer Seq2SeqTrainer \
  --optimization-config "[{'step_lambda':
                          \"lambda t: { \
                              'optimizer': 'Adam', \
                              'lr': ${LR0} * min(t ** -0.5, t * ${WARMUP} ** -1.5), \
                              'betas': (0.9, 0.98), 'eps':1e-9}\"
                          }]"
```

* Example: training an attentional LSTM-based model with 3 layers in both the encoder and decoder (reusing the dataset variables defined above):
```
python main.py \
  --save de_en_wmt17 \
  --dataset ${DATASET} \
  --dataset-dir ${DATASET_DIR} \
  --results-dir ${OUTPUT_DIR} \
  --model RecurrentAttentionSeq2Seq \
  --model-config "{'hidden_size': 512, 'dropout': 0.2, \
                   'tie_embedding': True, 'transfer_hidden': False, \
                   'encoder': {'num_layers': 3, 'bidirectional': True, 'num_bidirectional': 1, 'context_transform': 512}, \
                   'decoder': {'num_layers': 3, 'concat_attention': True, \
                               'attention': {'mode': 'dot_prod', 'dropout': 0, 'output_transform': True, 'output_nonlinearity': 'relu'}}}" \
  --data-config "{'moses_pretok': True, 'tokenization':'bpe', 'num_symbols':32000, 'shared_vocab':True}" \
  --b 128 \
  --max-length 80 \
  --device-ids 0 \
  --trainer Seq2SeqTrainer \
  --optimization-config "[{'epoch': 0, 'optimizer': 'Adam', 'lr': 1e-3},
                          {'epoch': 6, 'lr': 5e-4},
                          {'epoch': 8, 'lr': 1e-4},
                          {'epoch': 10, 'lr': 5e-5},
                          {'epoch': 12, 'lr': 1e-5}]"
```
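In the Transformer example, the `--optimization-config` lambda encodes the warmup-then-decay learning-rate schedule from "Attention Is All You Need": the rate grows for `WARMUP` steps, then decays as the inverse square root of the step count, scaled by `LR0 = 512**(-0.5)`. The snippet below is a standalone sketch of that schedule in plain Python, independent of the trainer:

```python
# Standalone sketch of the schedule passed via --optimization-config above:
#   lr(t) = d_model**-0.5 * min(t**-0.5, t * warmup**-1.5)
def transformer_lr(step, d_model=512, warmup=4000):
    scale = d_model ** -0.5                    # this is LR0 in the script above
    return scale * min(step ** -0.5, step * warmup ** -1.5)

for step in (1, 1000, 4000, 10000, 100000):
    print(step, f"{transformer_lr(step):.2e}")
# rises to ~7.0e-04 at step 4000, then decays as 1/sqrt(step)
```

The LSTM example instead uses a simple piecewise-constant schedule, dropping the learning rate at fixed epochs through the same `--optimization-config` mechanism.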