{"id":13720087,"url":"https://github.com/iterative/dvc-checkpoints-mnist","last_synced_at":"2025-05-07T12:30:39.550Z","repository":{"id":40920137,"uuid":"342694112","full_name":"iterative/dvc-checkpoints-mnist","owner":"iterative","description":"Example of checkpoints in a DVC project training a simple convolutional neural net to classify MNIST data","archived":true,"fork":false,"pushed_at":"2022-06-22T19:34:14.000Z","size":40,"stargazers_count":6,"open_issues_count":3,"forks_count":5,"subscribers_count":15,"default_branch":"live","last_synced_at":"2025-05-01T05:03:10.514Z","etag":null,"topics":["example"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iterative.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-26T20:32:42.000Z","updated_at":"2024-11-19T20:53:03.000Z","dependencies_parsed_at":"2022-08-25T12:21:30.682Z","dependency_job_id":null,"html_url":"https://github.com/iterative/dvc-checkpoints-mnist","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fdvc-checkpoints-mnist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fdvc-checkpoints-mnist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fdvc-checkpoints-mnist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fdvc-checkpoints-mnist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iterative","download_url":"https://codeload.github.com/iterative/dvc-checkpoints-mnist/tar.gz/refs/heads/live","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252876289,"owners_count":21818157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["example"],"created_at":"2024-08-03T01:00:59.651Z","updated_at":"2025-05-07T12:30:39.075Z","avatar_url":"https://github.com/iterative.png","language":"Python","funding_links":[],"categories":["Tutorials"],"sub_categories":["Iterative"],"readme":"# dvc-checkpoints-mnist\n\nThis example DVC project demonstrates the different ways to employ\n[Checkpoint Experiments](https://dvc.org/doc/user-guide/experiment-management#checkpoints-in-source-code)\nwith DVC.\n\nThis scenario uses [DVCLive](https://dvc.org/doc/dvclive) to generate\n[checkpoints](https://dvc.org/doc/api-reference/make_checkpoint) for iterative\nmodel training. The model is a simple convolutional neural network (CNN)\nclassifier trained on the [MNIST](http://yann.lecun.com/exdb/mnist/) data of\nhandwritten digits to predict the digit (0-9) in each image.\n\n\u003cdetails\u003e\n\n\u003csummary\u003e🔄 Switch between scenarios\u003c/summary\u003e\n\u003cbr/\u003e\n\nThis repo has several\n[branches](https://github.com/iterative/dvc-checkpoints-mnist/branches) that\nshow different methods for using checkpoints (using a similar pipeline):\n\n- The [live](https://github.com/iterative/dvc-checkpoints-mnist/tree/live)\n  scenario introduces full-featured checkpoint usage — integrating with\n  [DVCLive](https://github.com/iterative/dvclive).\n- The [basic](https://github.com/iterative/dvc-checkpoints-mnist/tree/basic)\n  scenario uses single-checkpoint experiments to illustrate how checkpoints work\n  in a simple way.\n- The\n  [Python-only](https://github.com/iterative/dvc-checkpoints-mnist/tree/make_checkpoint)\n  variation features the\n  [make_checkpoint](https://dvc.org/doc/api-reference/make_checkpoint) function\n  from DVC's Python API.\n- Contrastingly, the\n  [signal file](https://github.com/iterative/dvc-checkpoints-mnist/tree/signal_file)\n  scenario shows how to make your own signal files (applicable to any\n  programming language).\n- Finally, our\n  [full pipeline](https://github.com/iterative/dvc-checkpoints-mnist/tree/full_pipeline)\n  scenario elaborates on the full-featured usage with a more advanced process.\n\n\u003c/details\u003e\n\n## Setup\n\nTo try it out for yourself:\n\n1. Fork the repository and clone to your local workstation.\n2. Install the prerequisites in `requirements.txt` (if you are using pip, run\n   `pip install -r requirements.txt`).\n\n## Experimenting\n\nStart training the model with `dvc exp run`. It will train for an unlimited\nnumber of epochs, each of which will generate a checkpoint. Use `Ctrl-C` to stop\nat the last checkpoint, and simply `dvc exp run` again to resume.\n\nDVCLive will track performance at each checkpoint. Open `dvclive.html` in your\nweb browser during training to track performance over time (you will need to\nrefresh after each epoch completes to see updates). Metrics will also be logged\nto `.tsv` files in the `dvclive` directory.\n\nOnce you stop the training script, you can view the results of the experiment\nwith:\n\n```bash\n$ dvc exp show\n┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━┓\n┃ Experiment    ┃ Created  ┃ step ┃    acc ┃\n┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━┩\n│ workspace     │ -        │    9 │ 0.4997 │\n│ live          │ 03:43 PM │    - │        │\n│ │ ╓ exp-34e55 │ 03:45 PM │    9 │ 0.4997 │\n│ │ ╟ 2fe819e   │ 03:45 PM │    8 │ 0.4394 │\n│ │ ╟ 3da85f8   │ 03:45 PM │    7 │ 0.4329 │\n│ │ ╟ 4f64a8e   │ 03:44 PM │    6 │ 0.4686 │\n│ │ ╟ b9bee58   │ 03:44 PM │    5 │ 0.2973 │\n│ │ ╟ e2c5e8f   │ 03:44 PM │    4 │ 0.4004 │\n│ │ ╟ c202f62   │ 03:44 PM │    3 │ 0.1468 │\n│ │ ╟ eb0ecc4   │ 03:43 PM │    2 │  0.188 │\n│ │ ╟ 28b170f   │ 03:43 PM │    1 │ 0.0904 │\n│ ├─╨ 9c705fc   │ 03:43 PM │    0 │ 0.0894 │\n└───────────────┴──────────┴──────┴────────┘\n```\n\nYou can manage it like any other DVC\n[experiments](https://dvc.org/doc/start/experiments), including:\n* Run `dvc exp run` again to continue training from the last checkpoint.\n* Run `dvc exp apply [checkpoint_id]` to revert to any of the prior checkpoints\n  (which will update the `model.pt` output file and metrics to that point).\n* Run `dvc exp run --reset` to drop all the existing checkpoints and start from\n  scratch.\n\n## Adding `dvclive` checkpoints to a DVC project\n\nUsing `dvclive` to add checkpoints to a DVC project requires a few additional\nlines of code.\n\nIn your training script, use `dvclive.log()` to log metrics and\n`dvclive.next_step()` to make a checkpoint with those metrics.\nIf you need the current epoch number, use `dvclive.get_step()` (e.g.\nto use a [learning rate\nschedule](https://en.wikipedia.org/wiki/Learning_rate#Learning_rate_schedule)\nor stop training after a fixed number of epochs). See the\n[train.py](https://github.com/iterative/dvc-checkpoints-mnist/blob/live/train.py)\nscript for an example:\n\n```python\n    # Iterate over training epochs.\n    for epoch in itertools.count(dvclive.get_step()):\n        train(model, x_train, y_train)\n        torch.save(model.state_dict(), \"model.pt\")\n        # Evaluate and checkpoint.\n        metrics = evaluate(model, x_test, y_test)\n        for metric, value in metrics.items():\n            dvclive.log(metric, value)\n        dvclive.next_step()\n```\n\nThen, in `dvc.yaml`, add the `checkpoint: true` option to your model output and\na `live` section to your stage output. See\n[dvc.yaml](https://github.com/iterative/dvc-checkpoints-mnist/blob/live/dvc.yaml)\nfor an example:\n\n```yaml\nstages:\n  train:\n    cmd: python train.py\n    deps:\n    - train.py\n    outs:\n    - model.pt:\n        checkpoint: true\n    live:\n      dvclive:\n        summary: true\n        html: true\n```\n\nIf you do not already have a `dvc.yaml` stage, you can use [dvc stage\nadd](https://dvc.org/doc/command-reference/stage/add) to create it:\n\n```bash\n$ dvc stage add -n train -d train.py -c model.pt --live dvclive python train.py\n```\n\nThat's it! For users already familiar with logging metrics in DVC, note that you\nno longer need a `metrics` section in `dvc.yaml` since `dvclive` is already\nlogging metrics.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiterative%2Fdvc-checkpoints-mnist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiterative%2Fdvc-checkpoints-mnist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiterative%2Fdvc-checkpoints-mnist/lists"}