Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/MachineLearningSystem/DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
- Host: GitHub
- URL: https://github.com/MachineLearningSystem/DAPPLE
- Owner: MachineLearningSystem
- Fork: true (AlibabaPAI/DAPPLE)
- Created: 2022-10-27T11:56:03.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2020-12-11T03:13:47.000Z (about 4 years ago)
- Last Synced: 2024-08-02T19:33:29.372Z (6 months ago)
- Homepage:
- Size: 1.64 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-AI-system - DAPPLE: An Efficient Pipelined Data Parallel Approach for Large Models Training PPOPP'21
README
# DAPPLE: An Efficient Pipelined Data Parallel Approach for Large Models Training
[![](https://img.shields.io/badge/PyPI-HPGO%200.92-blue?logo=python&style=for-the-badge&logoColor=yellow)](https://pypi.org/project/HPGO/)
DAPPLE is a distributed training framework that combines pipeline parallelism
and data parallelism to address the scheduling and planning challenges of synchronous training.
This framework features a profiler, a [planner](https://github.com/AlibabaPAI/DAPPLE/tree/master/src)
and a runtime system.
The profiler takes a user’s DNN model as input, and profiles execution time, activation and parameter sizes for each layer.
Sample profiling results for some models are given in [profiling results](https://github.com/AlibabaPAI/DAPPLE/tree/master/profiling_results).
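
To make the profiler's output concrete, here is a minimal sketch of how the three per-layer quantities (execution time, activation size, parameter size) might be held in memory and summarized for a contiguous group of layers. The `LayerProfile` class, its field names, the `stage_summary` helper, and all numbers are hypothetical and chosen for illustration; they are not DAPPLE's actual data structures or the format of the files in `profiling_results`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerProfile:
    """Hypothetical per-layer record; field names are illustrative only."""
    name: str
    exec_time_ms: float     # measured execution time of the layer
    activation_bytes: int   # size of the layer's output activations
    parameter_bytes: int    # size of the layer's parameters


def stage_summary(layers: List[LayerProfile], start: int, end: int) -> dict:
    """Summarize layers[start:end] as if they formed one pipeline stage."""
    stage = layers[start:end]
    if not stage:
        raise ValueError("a stage must contain at least one layer")
    return {
        # Total compute time spent in this group of layers
        "exec_time_ms": sum(layer.exec_time_ms for layer in stage),
        # Parameters that the device(s) running this group must hold
        "parameter_bytes": sum(layer.parameter_bytes for layer in stage),
        # Output of the last layer, i.e. what would be passed downstream
        "boundary_activation_bytes": stage[-1].activation_bytes,
    }


# Made-up numbers, not taken from the repository's profiling results.
profile = [
    LayerProfile("embed", 1.5, 4096, 1_000_000),
    LayerProfile("block0", 3.0, 4096, 2_000_000),
    LayerProfile("block1", 3.0, 4096, 2_000_000),
    LayerProfile("head", 0.5, 512, 500_000),
]

first = stage_summary(profile, 0, 2)
second = stage_summary(profile, 2, 4)
print(first)
print(second)
```

Summaries of this kind are the sort of input a planner can compare when weighing different ways of cutting a model into stages; how DAPPLE's planner actually does this is defined by its source code, not by this sketch.
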
Taking the profiling results as input, the DAPPLE planner generates an optimized hybrid parallelization plan for a given global batch size,
which is further split into multiple micro-batches and scheduled for execution by the DAPPLE runtime.

This repository contains the source code implementation of DAPPLE's planning results on
5 typical models:
[VGG19](https://github.com/AlibabaPAI/DAPPLE/tree/master/vgg19),
[AmoebaNet](https://github.com/AlibabaPAI/DAPPLE/tree/master/amoeba_net),
[BERT](https://github.com/AlibabaPAI/DAPPLE/tree/master/bert),
[GNMT](https://github.com/AlibabaPAI/DAPPLE/tree/master/gnmt),
and [XLNET](https://github.com/AlibabaPAI/DAPPLE/tree/master/xlnet).

## Running the DAPPLE experiments
### DAPPLE Planner
All the planner-related experiments can be reproduced on any machine, regardless of the environment. We've provided a detailed how-to in [`PLANNER_REPRODUCTION.md`](PLANNER_REPRODUCTION.md).

### DAPPLE Runtime
Please see the launch script `run.sh` for each model for details.

## Using the Planner
### Install from PyPI as a Python 3 package
PyPI: [https://pypi.org/project/HPGO/](https://pypi.org/project/HPGO/)

```bash
pip3 install HPGO
```

### Build from source
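```bash
# Switch the default Rust toolchain to nightly
rustup default nightly
cargo build --release
# Build the Python wheel with maturin
maturin build --release
# Replace xxx.whl with the path of the wheel produced by maturin
pip3 install xxx.whl
```

### Example Usage of Python API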
```python
# Import HPGO Python API
import HPGO
# Construct the Conductor object
# conductor_from_torch_graph_and_seps(profile_filename, profile_batch_size, global_batch_size, devices)
conductor = HPGO.conductor_from_torch_graph_and_seps("./profiling_results/xlnet-36-pbs-1.txt", 1, 128, [8, 16])
result = conductor.py_orchestrate()
print(result)
```

## License
The DAPPLE Planner is open sourced under the terms of BSD-3-Clause, details of which can be found in the [`src/LICENSE.md`](src/LICENSE.md) file.

The file [`src/input/torch_graph_py.rs`](src/input/torch_graph_py.rs) contains Python source code from [PipeDream](https://github.com/msr-fiddle/pipedream), which is licensed under the MIT License.