{"id":11563750,"url":"https://github.com/facebookresearch/transformer-sequential","last_synced_at":"2025-10-03T14:31:05.263Z","repository":{"id":46284471,"uuid":"364956130","full_name":"facebookresearch/transformer-sequential","owner":"facebookresearch","description":"Trains Transformer model variants. Data isn't shuffled between batches.","archived":true,"fork":false,"pushed_at":"2022-10-05T18:24:36.000Z","size":53,"stargazers_count":139,"open_issues_count":2,"forks_count":18,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-12-17T01:38:06.552Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-06T15:27:31.000Z","updated_at":"2024-11-30T08:12:45.000Z","dependencies_parsed_at":"2023-01-19T07:01:21.896Z","dependency_job_id":null,"html_url":"https://github.com/facebookresearch/transformer-sequential","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ftransformer-sequential","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ftransformer-sequential/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ftransformer-sequential/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Ftransformer-sequential/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/transformer-sequential/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235139115,"owners_count":18942110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-06-23T05:59:28.377Z","updated_at":"2025-10-03T14:30:59.932Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","funding_links":[],"categories":["时间序列"],"sub_categories":["网络服务_其他"],"readme":"# transformer-sequential\n\nThis repo contains the code for three papers:\n\n- Feedback Transformer\n- Expire-Span\n- Staircase Transformer\n\nThe training code is structured for long sequential modeling with Transformer-like architectures.\n\n## Requirements\n\nYou will need a CUDA-enabled GPU to run the code.\n\n## Setup\n\nRun the following:\n\n```\npip install -r requirements.txt\n```\n\n## Feedback Transformer\n\nIntroduced in [Addressing Some Limitations of Transformers with Feedback Memory](https://arxiv.org/abs/2002.09402v3).\n\n### Running Experiments from the Paper\n\n#### enwik8\n\n|Model|Params|Valid|Test|\n|-|-|-|-|\n|Feedback Transformer|77M|0.984|0.962|\n\n_Numbers are Bits-Per-Character_\n\n```\nbash experiments/feedback/enwik8.sh\n```\n\n#### Algorithmic\n\n|Model|3 Variable|5 Variable|\n|-|-|-|\n|Transformer|33.7|37.5|\n|Feedback Transformer|99.1|92.6|\n\n_Numbers are % Accuracy on Test_\n\n```\nbash experiments/feedback/algorithmic_3var.sh\nbash experiments/feedback/algorithmic_5var.sh\n```\n\n## Expire-Span\n\nIntroduced in [Not All Memories are Created Equal: Learning to Expire](https://ai.facebook.com/research/publications/not-all-memories-are-created-equal).\n\n### Running Experiments from the Paper\n\n#### enwik8\n\n|Model|Params|Valid|Test|\n|-|-|-|-|\n|Expire-Span 12L|38M|1.014|0.994|\n\n_Numbers are Bits-Per-Character_\n\n```\nbash experiments/expire_span/enwik8.sh\n```\n\n#### Object Collision\n\n|Model|Maximum Span|Test Error (%)|\n|-|-|-|\n|Expire-Span|16k|52.2|\n|Expire-Span|32k|36.7|\n|Expire-Span|64k|26.7|\n\n```\nbash experiments/expire_span/object_collision_16k.sh\nbash experiments/expire_span/object_collision_32k.sh\nbash experiments/expire_span/object_collision_64k.sh\n```\n\n## Staircase\n\nIntroduced in [Staircase Attention for Recurrent Processing of Sequences](https://arxiv.org/pdf/2106.04279.pdf).\n\nNote that the algorithmic task in this repo differs slightly from the one used in the paper. Although the numbers may not match exactly, they show the same trend as in the paper. The model implementation and hyperparameters remain the same.\n\n### Running Experiments from the Paper\n\n#### Algorithmic\n\n|Model|Test|\n|-|-|\n|Transformer|58.44%|\n|Staircase Transformer|3.6%|\n\n_Numbers are % error rate on Test_\n\n```\nbash experiments/staircase/algorithmic_3var.sh\n```\n\n## License\n\nThe code is licensed under the CC-BY-NC license. See the LICENSE file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Ftransformer-sequential","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2Ftransformer-sequential","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Ftransformer-sequential/lists"}