https://github.com/MachineLearningSystem/bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
- Host: GitHub
- URL: https://github.com/MachineLearningSystem/bamboo
- Owner: MachineLearningSystem
- License: mit
- Fork: true (uclasystem/bamboo)
- Created: 2022-10-25T05:21:19.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-07-28T17:32:10.000Z (over 2 years ago)
- Last Synced: 2024-08-02T19:33:16.745Z (6 months ago)
- Homepage:
- Size: 24.4 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-AI-system - Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs NSDI'23
README
Affordable deep learning through resilient preemptible instances.
v0.1 - 01/20/22
# Summary of Bamboo
Bamboo is a system for running large-scale DNNs using **pipeline parallelism**
affordably, reliably, and efficiently on spot instances.
It is built on top of [DeepSpeed](https://github.com/microsoft/DeepSpeed).
It uses redundant computation in the pipeline by taking advantage of
pipeline bubbles to enable low-pause recovery from failures.

## Setup
Ensure you have the following requirements:
- Python 3.7
- PyTorch 1.10.0

Documentation has the following requirements:

- TeX Live
- Biber

First, create the virtual environment:

    python -m venv --system-site-packages venv
    source venv/bin/activate
    pip install -U pip
    pip install -r requirements.txt

For the documentation you may want to create a `~/.latexmkrc` file containing
the following (this example uses Evince):

    $pdf_previewer = 'start evince';

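
Once the virtual environment is active, a short script can confirm that the interpreter and PyTorch match the versions listed above. The following is a minimal sketch, not part of Bamboo: the helper names `parse_version`, `matches`, and `installed_torch_version` are hypothetical, and it assumes the listed versions are meant as exact matches (any Python 3.7.x, PyTorch 1.10.0 with or without a local build suffix such as `+cu113`).

```python
import sys
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '1.10.0+cu113' into (1, 10, 0).

    Any local build suffix after '+' is dropped, and parsing stops at the
    first component that is not purely numeric.
    """
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches(found: str, required: str) -> bool:
    """True when `found` starts with the same components as `required`."""
    want = parse_version(required)
    return parse_version(found)[: len(want)] == want


def installed_torch_version() -> Optional[str]:
    """Return the installed PyTorch version string, or None if it is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    # Assumption: the README's "Python 3.7" means any 3.7.x release.
    python_found = "{}.{}.{}".format(*sys.version_info[:3])
    status = "ok" if matches(python_found, "3.7") else "expected 3.7"
    print("Python", python_found, status)

    # Assumption: the README's "PyTorch 1.10.0" is an exact requirement.
    torch_found = installed_torch_version()
    if torch_found is None:
        print("PyTorch not installed, expected 1.10.0")
    else:
        status = "ok" if matches(torch_found, "1.10.0") else "expected 1.10.0"
        print("PyTorch", torch_found, status)
```

Saved under any file name and run with `python` inside the activated environment, the script prints one line per requirement. It only reports mismatches; it does not install or change anything.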
## Running
Start all commands with the following:

    python -m project_pactum

For the documentation, go to the directory of whichever document you want to
build and run the following:

    latexmk -pvc

This command will recompile the LaTeX file as many times as needed and open it
in your preferred PDF viewer. Keep the command running while you make changes,
and the document recompiles automatically.