{"id":13478089,"url":"https://github.com/dvmazur/mixtral-offloading","last_synced_at":"2025-05-15T09:09:00.887Z","repository":{"id":213616544,"uuid":"731858124","full_name":"dvmazur/mixtral-offloading","owner":"dvmazur","description":"Run Mixtral-8x7B models in Colab or consumer desktops","archived":false,"fork":false,"pushed_at":"2024-04-08T08:40:22.000Z","size":267,"stargazers_count":2303,"open_issues_count":27,"forks_count":233,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-04-14T15:02:57.923Z","etag":null,"topics":["colab-notebook","deep-learning","google-colab","language-model","llm","mixture-of-experts","offloading","pytorch","quantization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dvmazur.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-15T03:32:35.000Z","updated_at":"2025-04-14T05:14:35.000Z","dependencies_parsed_at":"2024-01-05T12:31:13.691Z","dependency_job_id":"2db2cd9f-3942-455f-a284-6a44eaa275e3","html_url":"https://github.com/dvmazur/mixtral-offloading","commit_stats":null,"previous_names":["dvmazur/mixtral-offloading"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvmazur%2Fmixtral-offloading","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvmazur%2Fmixtral-offloading/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvmazur%2Fmixtral-offloading/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvmazur%2Fmixtral-offloading/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dvmazur","download_url":"https://codeload.github.com/dvmazur/mixtral-offloading/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254310520,"owners_count":22049470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colab-notebook","deep-learning","google-colab","language-model","llm","mixture-of-experts","offloading","pytorch","quantization"],"created_at":"2024-07-31T16:01:52.299Z","updated_at":"2025-05-15T09:08:55.874Z","avatar_url":"https://github.com/dvmazur.png","language":"Python","readme":"# Mixtral offloading\n\nThis project implements efficient inference of [Mixtral-8x7B models](https://mistral.ai/news/mixtral-of-experts/).\n\n## How does it work?\n\nIn summary, we achieve efficient inference of Mixtral-8x7B models through a combination of techniques:\n\n* **Mixed quantization with HQQ**. We apply separate quantization schemes for attention layers and experts to fit the model into the combined GPU and CPU memory.\n* **MoE offloading strategy**. Each expert per layer is offloaded separately and only brought back to GPU when needed. 
We store active experts in an LRU cache to reduce GPU-RAM communication when computing activations for adjacent tokens.\n\nFor more detailed information about our methods and results, please refer to our [tech-report](https://arxiv.org/abs/2312.17238).\n\n## Running\n\nTo try this demo, please use the demo notebook: [./notebooks/demo.ipynb](./notebooks/demo.ipynb) or [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dvmazur/mixtral-offloading/blob/master/notebooks/demo.ipynb)\n\nFor now, there is no command-line script available for running the model locally. However, you can create one using the demo notebook as a reference. That being said, contributions are welcome!\n\n## Work in progress\n\nSome techniques described in our technical report are not yet available in this repo. However, we are actively working on adding support for them in the near future.\n\nSome of the upcoming features are:\n* Support for other quantization methods\n* Speculative expert prefetching\n","funding_links":[],"categories":["Python","A01_文本生成_文本对话","Mixture of Experts (Sparse MoE)","Repos"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvmazur%2Fmixtral-offloading","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdvmazur%2Fmixtral-offloading","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvmazur%2Fmixtral-offloading/lists"}