https://github.com/ycliuhw/charm-kubeflow-pytorch-operator
Charm layer for deploying PyTorch Training as part of Kubeflow
- Host: GitHub
- URL: https://github.com/ycliuhw/charm-kubeflow-pytorch-operator
- Owner: ycliuhw
- License: apache-2.0
- Created: 2018-08-23T07:16:52.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-30T08:37:14.000Z (almost 8 years ago)
- Last Synced: 2025-03-12T18:44:28.333Z (over 1 year ago)
- Language: Python
- Size: 8.79 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
PyTorch Training for Kubeflow
=============================
This charm deploys the PyTorch Training component of Kubeflow to Kubernetes
models in Juju.
Usage
=====
To submit a model for training, create a `PyTorchJob` custom resource in
Kubernetes. For example, to submit the distributed MNIST model that is used for
end-to-end (e2e) testing, run:
```
kubectl create -n "$namespace" \
  -f https://raw.githubusercontent.com/kubeflow/pytorch-operator/master/examples/dist-mnist/pytorch_job_mnist.yaml
```
(Note: `$namespace` is the name of the Juju Kubernetes model that this charm is
deployed into.)
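To submit your own model instead of the example, you need a manifest of your own. The helper below is a minimal sketch that prints one. It assumes the operator serves the `kubeflow.org/v1` `PyTorchJob` schema (`pytorchReplicaSpecs` with `Master` and `Worker` replicas); the 2018-era operator this charm deploys may serve an earlier API version, so check the installed CRD and adjust `apiVersion` to match. The function name `make_pytorchjob` and any image you pass to it are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal PyTorchJob manifest to stdout.
# Assumes the kubeflow.org/v1 schema; change apiVersion to whatever
# version the PyTorchJob CRD in your model actually serves.
# Usage: make_pytorchjob NAME IMAGE [WORKERS]
make_pytorchjob() {
  local name="$1" image="$2" workers="${3:-1}"
  cat <<EOF
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: ${name}
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch   # default training-container name (assumed)
              image: ${image}
    Worker:
      replicas: ${workers}
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: ${image}
EOF
}
```

The output can be piped straight into `kubectl`, which reads the manifest from standard input when given `-f -`, for example `make_pytorchjob my-mnist registry.example.com/my-mnist:latest 2 | kubectl create -n "$namespace" -f -`.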
You can then check the status of the job with `kubectl`:
```
kubectl get -o yaml -n "$namespace" pytorchjobs dist-mnist-for-e2e-test
```
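In a script you may want to block until the job finishes rather than inspect its YAML by hand. The sketch below polls the job's most recent condition using kubectl's JSONPath output. It assumes the operator appends the newest entry to `.status.conditions` and reports terminal states with the condition types `Succeeded` and `Failed`; the function names `job_state` and `wait_for_job` and the default timeout and interval values are hypothetical.

```shell
#!/usr/bin/env bash
# Print the type of the newest condition on a PyTorchJob.
# Assumes conditions are appended oldest-to-newest, so [-1:] is the latest.
job_state() {
  local ns="$1" job="$2"
  kubectl get pytorchjobs "$job" -n "$ns" \
    -o jsonpath='{.status.conditions[-1:].type}'
}

# Poll until the job reports Succeeded (exit 0), Failed (exit 1),
# or the timeout in seconds elapses (exit 2).
# Usage: wait_for_job NAMESPACE JOB [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_job() {
  local ns="$1" job="$2" timeout="${3:-600}" interval="${4:-10}"
  local waited=0 state
  while (( waited < timeout )); do
    state="$(job_state "$ns" "$job")"
    case "$state" in
      Succeeded) echo "$job succeeded"; return 0 ;;
      Failed)    echo "$job failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "timed out waiting for $job (last state: ${state:-unknown})" >&2
  return 2
}
```

For the example job this would be `wait_for_job "$namespace" dist-mnist-for-e2e-test 1800 15`. Distinct exit codes let a calling script tell a failed training run apart from one that is simply still running.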