https://github.com/baguasys/operator

Kubernetes operator for Bagua distributed training job.

# Kubernetes operator for Bagua jobs

This repository implements a Kubernetes operator for Bagua distributed training jobs. It supports both static and elastic workloads. See the [CRD definition](https://github.com/BaguaSys/operator/blob/preonline/config/crd/bases/bagua.kuaishou.com_baguas.yaml).

### Prerequisites
- Kubernetes
- kubectl
- Go (only needed to run the operator locally with `go run`)

### Installation
#### Run the operator locally
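```shell
# Clone the repository
git clone https://github.com/BaguaSys/operator.git
cd operator

# Install the CRD
kubectl apply -f config/crd/bases/bagua.kuaishou.com_baguas.yaml

# Start the operator in the foreground, using the current kubectl context
go run ./main.go
```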
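
Before starting the operator, it can help to confirm that the API server has accepted the CRD. The helpers below are a minimal sketch, not part of this repository: `crd_name_from_file` and `wait_for_crd` are hypothetical names, and the CRD name is derived on the assumption that the manifest follows the `<group>_<plural>.yaml` file naming convention.

```shell
# Derive the CRD name (<plural>.<group>) from a manifest path.
# Assumption: the file is named <group>_<plural>.yaml.
crd_name_from_file() {
  base=$(basename "$1" .yaml)
  group=${base%_*}      # text before the last underscore
  plural=${base##*_}    # text after the last underscore
  printf '%s.%s\n' "$plural" "$group"
}

# Block until the CRD reports the Established condition.
# $1: manifest path, $2: optional timeout (default 60s).
wait_for_crd() {
  kubectl wait --for=condition=Established \
    "crd/$(crd_name_from_file "$1")" --timeout="${2:-60s}"
}
```

For example, `wait_for_crd config/crd/bases/bagua.kuaishou.com_baguas.yaml` can be run after `kubectl apply` and before `go run ./main.go`.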
#### Deploy the operator
Install the Bagua operator on an existing Kubernetes cluster:
```shell
kubectl apply -f https://raw.githubusercontent.com/BaguaSys/operator/master/deploy/deployment.yaml
```
The operator creates its resources in the `bagua` namespace.
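
To check that the deployment came up, one option is to wait for the pods in that namespace and then list them. This is a sketch under assumptions: `wait_for_operator` is a hypothetical helper, and it assumes the operator pods live in the `bagua` namespace mentioned above.

```shell
# Wait until every pod in the operator namespace is Ready, then list them.
# $1: namespace (default: bagua), $2: timeout (default: 120s).
wait_for_operator() {
  ns=${1:-bagua}
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$ns" --timeout="${2:-120s}" &&
    kubectl get pods --namespace "$ns"
}
```

Calling `wait_for_operator` with no arguments uses the defaults. If it is run before any pod has been created, `kubectl wait` may report that no matching resources were found, in which case retrying after a few seconds is usually enough.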

### Examples
Sample manifests are provided in `config/samples`. Run them as follows:
- static mode
```shell

kubectl apply -f config/samples/bagua_v1alpha1_bagua_static.yaml
```
Verify pods are running
```shell
kubectl get pods

NAME                           READY   STATUS    RESTARTS   AGE
bagua-sample-static-master-0   1/1     Running   0          45s
bagua-sample-static-worker-0   1/1     Running   0          45s
bagua-sample-static-worker-1   1/1     Running   0          45s
```

- elastic mode
```shell

kubectl apply -f config/samples/bagua_v1alpha1_bagua_elastic.yaml
```
Verify pods are running
```shell
kubectl get pods

NAME                            READY   STATUS    RESTARTS   AGE
bagua-sample-elastic-etcd-0     1/1     Running   0          63s
bagua-sample-elastic-worker-0   1/1     Running   0          63s
bagua-sample-elastic-worker-1   1/1     Running   0          63s
bagua-sample-elastic-worker-2   1/1     Running   0          63s
```