Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iterative/dvc-checkpoints-mnist
Example of checkpoints in a DVC project training a simple convolutional neural net to classify MNIST data
- Host: GitHub
- URL: https://github.com/iterative/dvc-checkpoints-mnist
- Owner: iterative
- Archived: true
- Created: 2021-02-26T20:32:42.000Z (almost 4 years ago)
- Default Branch: live
- Last Pushed: 2022-06-22T19:34:14.000Z (over 2 years ago)
- Last Synced: 2024-08-04T01:13:08.268Z (6 months ago)
- Topics: example
- Language: Python
- Size: 39.1 KB
- Stars: 5
- Watchers: 16
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.md
# dvc-checkpoints-mnist
This example DVC project demonstrates the different ways to employ
[Checkpoint Experiments](https://dvc.org/doc/user-guide/experiment-management#checkpoints-in-source-code)
with DVC.

This scenario uses [DVCLive](https://dvc.org/doc/dvclive) to generate
[checkpoints](https://dvc.org/doc/api-reference/make_checkpoint) for iterative
model training. The model is a simple convolutional neural network (CNN)
classifier trained on the [MNIST](http://yann.lecun.com/exdb/mnist/) data of
handwritten digits to predict the digit (0-9) in each image.

## Switch between scenarios
This repo has several
[branches](https://github.com/iterative/dvc-checkpoints-mnist/branches) that
show different methods for using checkpoints (using a similar pipeline):

- The [live](https://github.com/iterative/dvc-checkpoints-mnist/tree/live)
scenario introduces full-featured checkpoint usage, integrating with
[DVCLive](https://github.com/iterative/dvclive).
- The [basic](https://github.com/iterative/dvc-checkpoints-mnist/tree/basic)
scenario uses single-checkpoint experiments to illustrate how checkpoints work
in a simple way.
- The
[Python-only](https://github.com/iterative/dvc-checkpoints-mnist/tree/make_checkpoint)
variation features the
[make_checkpoint](https://dvc.org/doc/api-reference/make_checkpoint) function
from DVC's Python API.
- Contrastingly, the
[signal file](https://github.com/iterative/dvc-checkpoints-mnist/tree/signal_file)
scenario shows how to make your own signal files (applicable to any
programming language).
- Finally, our
[full pipeline](https://github.com/iterative/dvc-checkpoints-mnist/tree/full_pipeline)
scenario elaborates on the full-featured usage with a more advanced process.

## Setup
To try it out for yourself:
1. Fork the repository and clone to your local workstation.
2. Install the prerequisites in `requirements.txt` (if you are using pip, run
`pip install -r requirements.txt`).

## Experimenting
Start training the model with `dvc exp run`. It will train for an unlimited
number of epochs, each of which will generate a checkpoint. Use `Ctrl-C` to stop
at the last checkpoint, and run `dvc exp run` again to resume.

DVCLive will track performance at each checkpoint. Open `dvclive.html` in your
web browser during training to track performance over time (you will need to
refresh after each epoch completes to see updates). Metrics will also be logged
to `.tsv` files in the `dvclive` directory.

Once you stop the training script, you can view the results of the experiment
with:

```bash
$ dvc exp show
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━┓
┃ Experiment    ┃ Created  ┃ step ┃    acc ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━┩
│ workspace     │ -        │    9 │ 0.4997 │
│ live          │ 03:43 PM │    - │        │
│ │ ╓ exp-34e55 │ 03:45 PM │    9 │ 0.4997 │
│ │ ╟ 2fe819e   │ 03:45 PM │    8 │ 0.4394 │
│ │ ╟ 3da85f8   │ 03:45 PM │    7 │ 0.4329 │
│ │ ╟ 4f64a8e   │ 03:44 PM │    6 │ 0.4686 │
│ │ ╟ b9bee58   │ 03:44 PM │    5 │ 0.2973 │
│ │ ╟ e2c5e8f   │ 03:44 PM │    4 │ 0.4004 │
│ │ ╟ c202f62   │ 03:44 PM │    3 │ 0.1468 │
│ │ ╟ eb0ecc4   │ 03:43 PM │    2 │  0.188 │
│ │ ╟ 28b170f   │ 03:43 PM │    1 │ 0.0904 │
│ ├─╨ 9c705fc   │ 03:43 PM │    0 │ 0.0894 │
└───────────────┴──────────┴──────┴────────┘
```

You can manage it like any other DVC
[experiment](https://dvc.org/doc/start/experiments). For example:
* Run `dvc exp run` again to continue training from the last checkpoint.
* Run `dvc exp apply [checkpoint_id]` to revert to any of the prior checkpoints
(which will update the `model.pt` output file and metrics to that point).
* Run `dvc exp run --reset` to drop all the existing checkpoints and start from
scratch.

## Adding `dvclive` checkpoints to a DVC project
Using `dvclive` to add checkpoints to a DVC project requires a few additional
lines of code.

In your training script, use `dvclive.log()` to log metrics and
`dvclive.next_step()` to make a checkpoint with those metrics.
If you need the current epoch number, use `dvclive.get_step()` (e.g.
to use a [learning rate
schedule](https://en.wikipedia.org/wiki/Learning_rate#Learning_rate_schedule)
or stop training after a fixed number of epochs). See the
[train.py](https://github.com/iterative/dvc-checkpoints-mnist/blob/live/train.py)
script for an example:

```python
import itertools

import dvclive
import torch

# Excerpt: model, the data tensors, train() and evaluate() are defined
# elsewhere in train.py.
# Iterate over training epochs, starting from the current step.
for epoch in itertools.count(dvclive.get_step()):
    train(model, x_train, y_train)
    torch.save(model.state_dict(), "model.pt")
    # Evaluate and checkpoint.
    metrics = evaluate(model, x_test, y_test)
    for metric, value in metrics.items():
        dvclive.log(metric, value)
    dvclive.next_step()
```

Then, in `dvc.yaml`, add the `checkpoint: true` option to your model output and
a `live` section to your stage. See
[dvc.yaml](https://github.com/iterative/dvc-checkpoints-mnist/blob/live/dvc.yaml)
for an example:

```yaml
stages:
  train:
    cmd: python train.py
    deps:
    - train.py
    outs:
    - model.pt:
        checkpoint: true
    live:
      dvclive:
        summary: true
        html: true
```

If you do not already have a `dvc.yaml` stage, you can use [dvc stage
add](https://dvc.org/doc/command-reference/stage/add) to create it:

```bash
$ dvc stage add -n train -d train.py -c model.pt --live dvclive python train.py
```

That's it! For users already familiar with logging metrics in DVC, note that you
no longer need a `metrics` section in `dvc.yaml` since `dvclive` is already
logging metrics.
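
Because `dvclive` already writes the metrics, the `.tsv` files in the `dvclive` directory can also be inspected directly, for example to find which step scored best before choosing a checkpoint. The following is a minimal sketch, not part of this repository: `best_step` is a hypothetical helper, and the file name `acc.tsv` and the column layout (`timestamp`, `step`, then the metric name) are assumptions that may differ between DVCLive versions. The timestamps in the demo file are made up; the accuracy values are the first four steps from the table above.

```python
import csv
import tempfile
from pathlib import Path


def best_step(tsv_path, metric, higher_is_better=True):
    """Return (step, value) for the best row of a per-metric TSV log.

    Assumes a header row containing a `step` column and a column named
    after the metric (an assumption about the log layout).
    """
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = [(int(row["step"]), float(row[metric])) for row in reader]
    if not rows:
        raise ValueError(f"no rows in {tsv_path}")
    pick = max if higher_is_better else min
    # Compare rows by metric value only; ties resolve to the earliest step.
    return pick(rows, key=lambda row: row[1])


# Demo on a throwaway file that mimics the assumed layout.
demo_file = Path(tempfile.mkdtemp()) / "acc.tsv"
demo_file.write_text(
    "timestamp\tstep\tacc\n"
    "1614371000\t0\t0.0894\n"
    "1614371010\t1\t0.0904\n"
    "1614371020\t2\t0.188\n"
    "1614371030\t3\t0.1468\n"
)
print(best_step(demo_file, "acc"))  # (2, 0.188)
```

The step returned here can be matched against the `step` column of `dvc exp show` to find the checkpoint ID to pass to `dvc exp apply`.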