{"id":13606151,"url":"https://github.com/databricks/megablocks","last_synced_at":"2025-05-13T23:05:39.371Z","repository":{"id":65917420,"uuid":"593423394","full_name":"databricks/megablocks","owner":"databricks","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-29T17:29:57.000Z","size":4128,"stargazers_count":1347,"open_issues_count":43,"forks_count":193,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-05-06T16:07:46.607Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-26T00:24:56.000Z","updated_at":"2025-05-05T21:03:58.000Z","dependencies_parsed_at":"2023-11-16T01:25:30.637Z","dependency_job_id":"782d12d6-d15c-4bb7-be22-e4037f95c992","html_url":"https://github.com/databricks/megablocks","commit_stats":{"total_commits":227,"total_committers":27,"mean_commits":8.407407407407407,"dds":0.5859030837004405,"last_synced_commit":"84286de8ab5be0c73928a0059f50c7e2b650e4b1"},"previous_names":["databricks/megablocks","stanford-futuredata/megablocks"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fmegablocks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fmegablocks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fmegablocks/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fmegablocks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks","download_url":"https://codeload.github.com/databricks/megablocks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254040701,"owners_count":22004595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:01:06.540Z","updated_at":"2025-05-13T23:05:34.362Z","avatar_url":"https://github.com/databricks.png","language":"Python","readme":"# :robot: MegaBlocks\n\nMegaBlocks is a light-weight library for mixture-of-experts (MoE) training. The core of the system is efficient \"dropless-MoE\" ([dMoE](megablocks/layers/dmoe.py), [paper](https://arxiv.org/abs/2211.15841)) and standard [MoE](megablocks/layers/moe.py) layers.\n\nMegaBlocks is integrated with [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), where we support data, expert and pipeline parallel training of MoEs. Stay tuned for tighter integration with Databricks libraries and tools!\n\n# :rocket: Performance\n\n![MegaBlocks Performance](media/dropping_end_to_end.png)\n\nMegaBlocks dMoEs outperform MoEs trained with [Tutel](https://github.com/microsoft/tutel) by up to **40%** compared to Tutel's best performing `capacity_factor` configuration. MegaBlocks dMoEs use a reformulation of MoEs in terms of block-sparse operations, which allows us to avoid token dropping without sacrificing hardware efficiency. 
In addition to being faster, MegaBlocks simplifies MoE training by removing the `capacity_factor` hyperparameter altogether. Compared to dense Transformers trained with [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), MegaBlocks dMoEs can accelerate training by as much as **2.4x**. Check out our [paper](https://arxiv.org/abs/2211.15841) for more details!\n\n# :building_construction: Installation\n\nNOTE: This assumes you have `numpy` and `torch` installed.\n\n**Training models with Megatron-LM:** We recommend using NGC's [`nvcr.io/nvidia/pytorch:23.09-py3`](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags) PyTorch container. The [Dockerfile](Dockerfile) builds on this image with additional dependencies. To build the image, run `docker build . -t megablocks-dev` and then `bash docker.sh` to launch the container. Once inside the container, install MegaBlocks with `pip install .`. See [Usage](#steam_locomotive-usage) for instructions on training MoEs with MegaBlocks + Megatron-LM.\n\n**Using MegaBlocks in other packages:** To install the MegaBlocks package for use in other frameworks, run `pip install megablocks`. For example, [Mixtral-8x7B](https://mistral.ai/news/mixtral-of-experts/) can be run with [vLLM](https://github.com/vllm-project/vllm) + MegaBlocks with this installation method.\n\n**Extras:** MegaBlocks has optional dependencies that enable additional features.\n\nInstalling `megablocks[gg]` enables dMoE computation with grouped GEMM. This feature is enabled by setting the `mlp_impl` argument to `grouped`. This is currently our recommended path for Hopper-generation GPUs.\n\nInstalling `megablocks[dev]` allows you to contribute to MegaBlocks and test locally. Installing `megablocks[testing]` allows you to test via GitHub Actions. 
If you've installed `megablocks[dev]`, you can run `pre-commit install` to configure the pre-commit hook to automatically format the code.\n\nMegaBlocks can be installed with all dependencies (except for `testing`) via the `megablocks[all]` package.\n\n# :steam_locomotive: Usage\n\nWe provide scripts for pre-training Transformer MoE and dMoE language models under the [top-level directory](megablocks/). The quickest way to get started is to use one of the [experiment launch scripts](exp/). These scripts require a dataset in Megatron-LM's format, which can be created by following their [instructions](https://github.com/NVIDIA/Megatron-LM#data-preprocessing).\n\n# :writing_hand: Citation\n\n```\n@article{megablocks,\n  title={{MegaBlocks: Efficient Sparse Training with Mixture-of-Experts}},\n  author={Trevor Gale and Deepak Narayanan and Cliff Young and Matei Zaharia},\n  journal={Proceedings of Machine Learning and Systems},\n  volume={5},\n  year={2023}\n}\n```\n","funding_links":[],"categories":["A01_文本生成_文本对话","Open Source Libraries","Python","Databricks / formerly Mosaic ML"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fmegablocks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks%2Fmegablocks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fmegablocks/lists"}