{"id":15600992,"url":"https://github.com/lucidrains/soft-moe-pytorch","last_synced_at":"2025-04-05T12:02:34.923Z","repository":{"id":186350760,"uuid":"674841239","full_name":"lucidrains/soft-moe-pytorch","owner":"lucidrains","description":"Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch","archived":false,"fork":false,"pushed_at":"2024-04-24T15:23:45.000Z","size":1441,"stargazers_count":271,"open_issues_count":4,"forks_count":8,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-29T11:04:27.944Z","etag":null,"topics":["artificial-intelligence","deep-learning","mixture-of-experts","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lucidrains.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-04T23:46:54.000Z","updated_at":"2025-03-26T17:39:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"d38a9d9c-f92c-41f9-ae08-4f8ac90e5f0a","html_url":"https://github.com/lucidrains/soft-moe-pytorch","commit_stats":{"total_commits":28,"total_committers":1,"mean_commits":28.0,"dds":0.0,"last_synced_commit":"a0d8a4e5481d896da945ebc2c6c800ebf4c7f138"},"previous_names":["lucidrains/soft-moe-pytorch"],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fsoft-moe-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fsoft-moe-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/lucidrains%2Fsoft-moe-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lucidrains%2Fsoft-moe-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lucidrains","download_url":"https://codeload.github.com/lucidrains/soft-moe-pytorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247332559,"owners_count":20921853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","mixture-of-experts","transformers"],"created_at":"2024-10-03T02:11:12.644Z","updated_at":"2025-04-05T12:02:34.900Z","avatar_url":"https://github.com/lucidrains.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"./soft-moe.1.png\" width=\"450px\"\u003e\u003c/img\u003e\n\n\u003cimg src=\"./soft-moe.2.png\" width=\"450px\"\u003e\u003c/img\u003e\n\n## Soft MoE - Pytorch\n\nImplementation of \u003ca href=\"https://arxiv.org/abs/2308.00951\"\u003eSoft MoE (Mixture of Experts)\u003c/a\u003e, proposed by Brain's Vision team, in Pytorch.\n\nThis MoE has only been made to work with a non-autoregressive encoder. However, some recent \u003ca href=\"https://arxiv.org/abs/2305.18295\"\u003etext-to-image models\u003c/a\u003e have started using MoE with great results, so it may be a fit there.\n\nIf anyone has any ideas for how to make it work for the autoregressive case, let me know (through email or discussions). I meditated on it but can't think of a good way. 
The other issue with the slot scheme is that the routing cost grows quadratically as sequence length increases (much like attention).\n\n## Appreciation\n\n- \u003ca href=\"https://stability.ai/\"\u003eStabilityAI\u003c/a\u003e for the generous sponsorship, as well as my other sponsors out there\n\n- \u003ca href=\"https://github.com/arogozhnikov/einops\"\u003eEinops\u003c/a\u003e for making my life easy\n\n## Install\n\n```bash\n$ pip install soft-moe-pytorch\n```\n\n## Usage\n\n```python\nimport torch\nfrom soft_moe_pytorch import SoftMoE\n\nmoe = SoftMoE(\n    dim = 512,         # model dimensions\n    seq_len = 1024,    # max sequence length (will automatically calculate number of slots as seq_len // num_experts) - you can also set num_slots directly\n    num_experts = 4    # number of experts - (they suggest number of experts should be high enough that each of them gets only 1 slot. wonder if that is the weakness of the paper?)\n)\n\nx = torch.randn(1, 1024, 512)\n\nout = moe(x) + x # (1, 1024, 512) - add in a transformer in place of a feedforward at a certain layer (here showing the residual too)\n```\n\nFor an improvised variant that does dynamic slots so that the number of slots ~= sequence length, just import `DynamicSlotsSoftMoE` instead\n\n```python\nimport torch\nfrom soft_moe_pytorch import DynamicSlotsSoftMoE\n\n# sequence length or number of slots need not be specified\n\nmoe = DynamicSlotsSoftMoE(\n    dim = 512,         # model dimensions\n    num_experts = 4,   # number of experts\n    geglu = True\n)\n\nx = torch.randn(1, 1023, 512)\n\nout = moe(x) + x # (1, 1023, 512)\n```\n\n## Todo\n\n- [x] address the limitation of number of slots being fixed. 
think about a way to make dynamic number of slots based on sequence length\n- [ ] once variable sequence length is handled in distributed, add to dynamic soft moe\n- [ ] the dispatch and combine tensors can also be split and moved into the `Experts` class to better distribute work\n\n## Citations\n\n```bibtex\n@misc{puigcerver2023sparse,\n    title \t= {From Sparse to Soft Mixtures of Experts}, \n    author \t= {Joan Puigcerver and Carlos Riquelme and Basil Mustafa and Neil Houlsby},\n    year \t= {2023},\n    eprint \t= {2308.00951},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.LG}\n}\n```\n\n```bibtex\n@misc{shazeer2020glu,\n    title   = {GLU Variants Improve Transformer},\n    author  = {Noam Shazeer},\n    year    = {2020},\n    url     = {https://arxiv.org/abs/2002.05202}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fsoft-moe-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucidrains%2Fsoft-moe-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucidrains%2Fsoft-moe-pytorch/lists"}