Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Automated Parallelization System and Infrastructure for Multiple Ecosystems
https://github.com/alibaba/easydist
- Host: GitHub
- URL: https://github.com/alibaba/easydist
- Owner: alibaba
- License: apache-2.0
- Created: 2023-09-07T06:44:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-19T12:24:43.000Z (23 days ago)
- Last Synced: 2024-11-30T00:08:42.390Z (13 days ago)
- Topics: automatic-parallelization, deep-learning, diffusion-models, distributed-computing, graph-neural-networks, high-performance-computing, large-language-models, llama, llm, machine-learning
- Language: Python
- Size: 522 KB
- Stars: 76
- Watchers: 5
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-distributed-ml - EasyDist: Automated Parallelization System and Infrastructure
README
# EasyDist
EasyDist is an automated parallelization system and infrastructure designed for multiple ecosystems, offering the following key features:
- **Usability**. With EasyDist, parallelizing your training or inference code to a larger scale becomes effortless with just a single line of change.
- **Ecological Compatibility**. EasyDist serves as a centralized source of truth for SPMD rules at the operator level across machine learning frameworks. It currently supports PyTorch and Jax natively, as well as the TVM Tensor Expression operator for SPMD rules.
- **Infrastructure**. EasyDist decouples auto-parallel algorithms from specific machine learning frameworks and IRs. This design choice allows for the development and benchmarking of different auto-parallel algorithms in a more flexible manner, leveraging the capabilities and abstractions provided by EasyDist.
## One Line of Code for Parallelism
To parallelize your training loop using EasyDist, you can use the `easydist_compile` decorator. Here's an example of how it can be used with PyTorch:
```python
import torch.nn as nn
# Assumed import path for easydist_compile; check your EasyDist version.
from easydist.torch.api import easydist_compile

@easydist_compile()
def train_step(net, optimizer, inputs, labels):
    outputs = net(inputs)
    loss = nn.CrossEntropyLoss()(outputs, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```

This one-line decorator parallelizes the training step. You can find more examples in the [`./examples/`](./examples/) directory.
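For context, here is a minimal sketch of how the compiled step might be driven in a loop. The toy model, optimizer, and random data below are hypothetical stand-ins for illustration, not part of the EasyDist API:

```python
import torch
import torch.nn as nn

# Hypothetical toy setup, purely illustrative; `train_step` is the
# decorated function defined above.
net = nn.Linear(32, 10)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

for step in range(10):
    inputs = torch.randn(8, 32)          # batch of 8 samples
    labels = torch.randint(0, 10, (8,))  # random class labels
    loss = train_step(net, optimizer, inputs, labels)
    print(f"step {step}: loss {float(loss):.4f}")
```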
## Overview
EasyDist introduces the concepts of MetaOp and MetaIR to decouple automatic parallelization methods from specific intermediate representations (IRs) and frameworks. It also presents the ShardCombine Algorithm, which defines operator-level Single-Program, Multiple-Data (SPMD) sharding rules without requiring manual annotations. The overall design is shown in the project's architecture diagram.
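To make "operator-level SPMD sharding rule" concrete, here is a hand-written illustration in plain PyTorch (not EasyDist's internal API): a matmul whose left operand is sharded along its row dimension can be computed shard by shard and recombined by concatenation. The ShardCombine Algorithm discovers rules of this kind automatically.

```python
import torch

# Illustration only: one SPMD sharding rule for matmul, stated by hand.
A = torch.randn(8, 4)
B = torch.randn(4, 6)

# Shard A along dim 0 across two "devices", compute locally, then combine.
A0, A1 = A.chunk(2, dim=0)
combined = torch.cat([A0 @ B, A1 @ B], dim=0)

# The combined sharded result matches the unsharded computation.
assert torch.allclose(combined, A @ B)
```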
## Installation
To install EasyDist, you can use pip and install from PyPI:
```shell
# For PyTorch users
pip install pai-easydist[torch]

# For Jax users
pip install pai-easydist[jax] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

If you prefer to install EasyDist from source, you can clone the GitHub repository and then install it with the appropriate extras:
```shell
git clone https://github.com/alibaba/easydist.git && cd easydist

# EasyDist with PyTorch installation
pip install -e '.[torch]'

# EasyDist with Jax installation
pip install -e '.[jax]' -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
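
# Optional smoke test after either install method; assumes the
# package exposes a top-level `easydist` module.
python -c "import easydist"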
```

## Contributing
See CONTRIBUTING.md for details.
## Contributors
EasyDist is developed by Alibaba Group and the NUS HPC-AI Lab. This work is supported by [Alibaba Innovative Research (AIR)](https://damo.alibaba.com/air/).
## License
EasyDist is licensed under the Apache License (Version 2.0). See LICENSE file.
This product contains some third-party test cases under other open source licenses.
See the NOTICE file for more information.