{"id":13717694,"url":"https://github.com/keon/seq2seq","last_synced_at":"2026-05-24T05:04:19.969Z","repository":{"id":55679178,"uuid":"112837516","full_name":"keon/seq2seq","owner":"keon","description":"Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch","archived":true,"fork":false,"pushed_at":"2020-12-13T10:53:01.000Z","size":32,"stargazers_count":689,"open_issues_count":12,"forks_count":171,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-08-04T00:13:52.065Z","etag":null,"topics":["deep-learning","machine-translation","seq2seq"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-02T11:42:18.000Z","updated_at":"2024-08-02T03:18:45.000Z","dependencies_parsed_at":"2022-08-15T06:20:31.597Z","dependency_job_id":null,"html_url":"https://github.com/keon/seq2seq","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keon%2Fseq2seq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keon%2Fseq2seq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keon%2Fseq2seq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keon%2Fseq2seq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keon","download_url":"https://codeload.github.com/keon/seq2seq/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224573514,"owners_count":17333804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-translation","seq2seq"],"created_at":"2024-08-03T00:01:25.727Z","updated_at":"2026-05-24T05:04:19.964Z","avatar_url":"https://github.com/keon.png","language":"Python","funding_links":[],"categories":["Tutorials \u0026 books \u0026 examples｜教程 \u0026 书籍 \u0026 示例","Tutorials, books, \u0026 examples","Python"],"sub_categories":["Other libraries｜其他库:","Other libraries:"],"readme":"# mini seq2seq\nMinimal Seq2Seq model with attention for neural machine translation in PyTorch.\n\nThis implementation focuses on the following features:\n\n* Modular structure that can be reused in other projects\n* Minimal code for readability\n* Full utilization of batching and the GPU\n\nThe Multi30k DE→EN dataset is loaded via HuggingFace [`datasets`](https://github.com/huggingface/datasets); tokenization uses [spaCy](https://spacy.io/).\n\n## Model description\n\n* Encoder: Bidirectional GRU\n* Decoder: GRU with attention mechanism\n* Attention: [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)\n\n![](http://www.wildml.com/wp-content/uploads/2015/12/Screen-Shot-2015-12-30-at-1.16.08-PM.png)\n\n## Requirements\n\n* Python 3.9+\n* PyTorch \u003e= 2.0 (CPU, CUDA, or Apple MPS)\n* HuggingFace `datasets` (replaces torchtext)\n* spaCy \u003e= 3.7\n\n```\npip install -r requirements.txt\npython -m spacy download de_core_news_sm\npython -m spacy download en_core_web_sm\n```\n\n## Train\n\n```\npython train.py -epochs 30 -batch_size 32 -lr 3e-4\n```\n\nThe device is auto-detected in the order CUDA → MPS → CPU. Smaller values for the `-hidden_size` and `-embed_size` flags are useful for quick smoke runs on CPU.\n\nSanity check (CPU, 500 batches, hidden size 128, embedding size 64):\n\n| step | train loss | train perplexity |\n|------|-----------:|-----------------:|\n| init |       9.19 |             9803 |\n|   50 |       6.98 |             1071 |\n|  100 |       5.48 |              239 |\n|  250 |       5.15 |              173 |\n|  500 |       4.84 |              127 |\n\nFinal validation loss: **4.93** (perplexity ≈ 138). For reference, uniform predictions over a target vocabulary `V` give a loss of `log(|V|) ≈ 9.19`, which matches the loss at initialization.\n\n## References\n\nBased on the following implementations:\n\n* [PyTorch Tutorial](http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)\n* [@spro/practical-pytorch](https://github.com/spro/practical-pytorch)\n* [@AuCson/PyTorch-Batch-Attention-Seq2seq](https://github.com/AuCson/PyTorch-Batch-Attention-Seq2seq)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeon%2Fseq2seq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeon%2Fseq2seq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeon%2Fseq2seq/lists"}