https://github.com/MachineLearningSystem/ModelKeeper

A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup

Host: GitHub
URL: https://github.com/MachineLearningSystem/ModelKeeper
Owner: MachineLearningSystem
Fork: true (SymbioticLab/ModelKeeper)
Created: 2022-10-10T07:22:02.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2022-10-10T03:59:10.000Z (almost 3 years ago)
Last Synced: 2024-11-07T08:42:38.256Z (8 months ago)
Homepage:
Size: 16.2 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-AI-system - ModelKeeper: Accelerating DNN Training via Automated Training Warmup NSDI'23

README

# ModelKeeper

This repository contains the evaluation artifacts of our NSDI '23 paper "[ModelKeeper: Accelerating DNN Training via Automated Training Warmup](https://www.usenix.org/conference/nsdi23/presentation/lai)".

**ModelKeeper has been merged as part of [FedScale](https://github.com/SymbioticLab/FedScale) and is actively maintained there. Please try it!**

# Overview

* [Getting Started](#getting-started)

* [Run Experiments](#run-experiments)

* [Repo Structure](#repo-structure)

* [Contact](#contact)

# Getting Started

Our `install.sh` will install the following automatically:

* Anaconda Package Manager

* CUDA 10.2

Note: if you prefer different versions of conda or CUDA, please check the comments in `install.sh` for details.

Run the following commands to install ModelKeeper:

```
source install.sh
pip install -e .
```
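After installation, a quick sanity check can confirm that the environment looks as expected. The following is a minimal sketch, not part of the repository: it assumes that `install.sh` leaves `conda` and `nvcc` on your `PATH` and that the editable install exposes a Python package named `modelkeeper` (matching the directory name in the repo structure). The helper `check_install` is hypothetical.

```python
import importlib.util
import shutil


def check_install(package="modelkeeper", tools=("conda", "nvcc")):
    """Report which expected tools and packages are visible in this environment.

    The package name and tool names are assumptions, not taken from install.sh.
    """
    # A tool counts as found if an executable with that name is on PATH.
    report = {tool: shutil.which(tool) is not None for tool in tools}
    # A package counts as found if Python can locate it without importing it.
    report[package] = importlib.util.find_spec(package) is not None
    return report


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If any entry is reported as missing, re-run `source install.sh` in the current shell or consult the comments in `install.sh`.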

# Run Experiments

# Repo Structure

```
Repo Root
|---- modelkeeper   # Core implementation (e.g., Matcher).
|---- evals         # MK support for different training backends
    |---- ray_tune      # Ray experiments
    |---- nni           # Retiarii experiments
|---- examples      # Toy experiments of model transformation
```

# Notes

Please consider citing our paper if you use the code or data in your research project.

```bibtex
@inproceedings{modelkeeper-nsdi23,
  title={ModelKeeper: Accelerating DNN Training via Automated Training Warmup},
  author={Fan Lai and Yinwei Dai and Harsha V. Madhyastha and Mosharaf Chowdhury},
  booktitle={USENIX Symposium on Networked Systems Design and Implementation (NSDI)},
  year={2023}
}
```

# Contact

Fan Lai ([email protected]) and Yinwei Dai ([email protected]).
