https://github.com/MachineLearningSystem/ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
- Host: GitHub
- URL: https://github.com/MachineLearningSystem/ModelKeeper
- Owner: MachineLearningSystem
- Fork: true (SymbioticLab/ModelKeeper)
- Created: 2022-10-10T07:22:02.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2022-10-10T03:59:10.000Z (about 2 years ago)
- Last Synced: 2024-08-02T19:33:31.983Z (5 months ago)
- Homepage:
- Size: 16.2 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-AI-system - ModelKeeper: Accelerating DNN Training via Automated Training Warmup NSDI'23
README
# ModelKeeper
This repository contains the evaluation artifacts of our NSDI '23 paper "[ModelKeeper: Accelerating DNN Training via Automated Training Warmup](https://www.usenix.org/conference/nsdi23/presentation/lai)".
**ModelKeeper has been merged as part of [FedScale](https://github.com/SymbioticLab/FedScale) and is actively maintained there. Please try it!**
# Overview
* [Getting Started](#getting-started)
* [Run Experiments](#run-experiments)
* [Repo Structure](#repo-structure)
* [Contact](#contact)

# Getting Started
Our `install.sh` will install the following automatically:
* Anaconda Package Manager
* CUDA 10.2

Note: if you prefer different versions of conda and CUDA, please check the comments in `install.sh` for details.
Run the following commands to install ModelKeeper.
```
source install.sh
pip install -e .
```

# Run Experiments
# Repo Structure
```
Repo Root
|---- modelkeeper    # Core implementation (e.g., Matcher).
|---- evals          # MK support for different training backends
    |---- ray_tune   # Ray experiments
    |---- nni        # Retiarii experiments
|---- examples       # Toy experiments of model transformation
```

# Notes
Please consider citing our paper if you use the code or data in your research project.
```bibtex
@inproceedings{modelkeeper-nsdi23,
title={ModelKeeper: Accelerating DNN Training via Automated Training Warmup},
author={Fan Lai and Yinwei Dai and Harsha V. Madhyastha and Mosharaf Chowdhury},
booktitle={USENIX Symposium on Networked Systems Design and Implementation (NSDI)},
year={2023}
}
```

# Contact
Fan Lai ([email protected]) and Yinwei Dai ([email protected]).