
https://github.com/deas/r00ki

R00ki : Taking Rook Ceph from localhost to Production 🧪









Table of Contents

1. About The Project
2. Getting Started
3. TODO
4. Known Issues
5. References
6. License

## About The Project

Kubernetes Storage. Rook. The Boss Fight. Still a bit messy. But it works. Most of the time.

There must be a reason [Red Hat OpenShift Data Foundation](https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation) is expensive ...

Now seriously: Storage is one of the most critical bits in general. Many workloads are stateful, and not every Kubernetes infrastructure solves the problem nicely. That is where I found myself a few times in the past. We were given virtual machines with basic disks attached - VMware VMDKs in my case. Customers demanded ... you name it - everything: RWX/RWO volumes, S3, snapshots, backup/recovery - superfast and always available. The code reflects these roots.

Disclaimer: We started by borrowing proven things from the Rook project and adapted them as we went along.

Demo creating a Minikube cluster and running a few tests 🪄🎩🐰

```sh
make apply-r00ki-aio test-csi-io test-csi-snapshot test-velero
```

![Demo](./assets/demo.gif)

### Goals

- Awesome local-first Rook Ceph Dev Experience
- First Class Observability
- Fail early and loud (Notifications)
- Simplicity (yes, really)
- Composability
- Target `minikube`, vanilla Kubernetes and OpenShift
- Add the Rook Ops bits not covered by the Operator
- Declarative trumps Imperative

### Non Goals

### Decisions

- ArgoCD is great, but `helmfile` appears even better for our use case
- We aim for first-class citizens. For Rook, that means the Helm charts; for some operators, it's OLM Subscriptions.

### Features

We cover:

- Single Cluster (All-in-One) Deployments targeting `minikube` and Production Kubernetes (including OpenShift)
- Two Cluster Deployments (Service and Consumer) targeting `minikube` and Production Kubernetes (including OpenShift)
- Kube-Prometheus bits all wired up - including alerts
- Shiny Dashboards (including Grafana)
- Seamless integration with ArgoCD, specifically [`deas/argcocd-conductor`](https://github.com/deas/argcocd-conductor)


## Getting Started

Some opinions first:

- Ceph is complex
- Automating Trust Relationships is hard

### Prerequisites

- `make`
- `minikube`
- `kubectl`
- `helmfile`

### Usage

Run

```sh
make
```

This shows help for basic tasks and gives you an idea of where to start.

We want the lifecycle of things (create/destroy) to be as fast as possible. We ship support to leverage registry mirrors as pull-through caches.
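
As a rough sketch of how a pull-through mirror can be wired into a local cluster: `minikube start` accepts a `--registry-mirror` flag. The helper name `start_with_mirror`, the `DRY_RUN` switch and the mirror URL below are assumptions for illustration only; the make targets in this repo may wire this up differently.

```sh
# Hypothetical helper (not part of r00ki): start minikube so that image
# pulls go through a registry mirror first.
#   $1       mirror URL (assumption: any reachable pull-through mirror)
#   DRY_RUN  set to 1 to print the command instead of running it
start_with_mirror() {
  mirror_url="${1:?usage: start_with_mirror <mirror-url>}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "minikube start --registry-mirror=$mirror_url"
  else
    minikube start --registry-mirror="$mirror_url"
  fi
}

# Example call, left as a comment so that sourcing this file has no effect:
#   DRY_RUN=1 start_with_mirror http://mirror.example:5000
```

With a warm mirror, recreating the cluster does not have to fetch every image from the upstream registry again, which is what keeps create/destroy cycles short.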


## TODO

- Use `dyff` to separate out value files?
- Separate out Observability, add Logging and Alerting
- Support for Mon v2
- Support for TLS/encryption
- Replace imperative bits by declarative ones
- Introduce Pentesting - maybe even Chaos Scenarios
- Improve Observability / Include Alerts
- Smoketests in CI
- Clean up bits around `TODO` tags sprinkled across the code
- Use LVM instead of raw disks/partitions?
- Performance: How/When do multiple disks per node make sense?
- Exercise Upgrade/Recreate and Disaster Recovery + build tests
- Introduce unhappy path tests - likely leveraging Litmus
- Proper cascaded removal of `CephCluster`?
- Finding/cleaning up orphans (volumes or buckets)
- Go deeper with `nix`/`devenv` - maybe even replace `mise`


## Known Issues

- With KVM + minikube, there appears to be a timing issue with helm when used via helmfile. `helm upgrade` sometimes fails because CRDs are not yet available - see [fix: clear the discovery cache after CRDs are installed](https://github.com/helm/helm/pull/6332)
- ["To sum up: the Docker daemon does not currently support multiple registry mirrors ..."](https://blog.alexellis.io/how-to-configure-multiple-docker-registry-mirrors/) -> `minikube start --registry-mirror="http://yourmirror"`
- KVM network DNS (dnsmasq) is slow from within minikube Kubernetes and times out for S3.
Patching CoreDNS gets around the issue.
- mons on port 3300 (workaround: use port 6789 / `ROOK_EXTERNAL_CEPH_MON_DATA`): `2024-12-16T16:56:02.784+0000 7fd593d1c000 -1 failed for service _ceph-mon._tcp
mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
Warning FailedMount 2m25s kubelet (combined from similar events): MountVolume.MountDevice failed for volume "pvc-026c86e8-9ee4-4261-a7e4-083011b80494" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 192.168.122.231:3300:/volumes/csi/csi-vol-7072e90c-5d6b-477b-bbab-655b76d0425f/e8d828a3-a1ad-4a22-9b36-7d5bc9fe9026 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/f172f41f387d01c38f46e71a4097304d70c35494e81e1c8a070549de56234790/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-2436134297,mds_namespace=myfs,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon`
- [Looking up Monitors through DNS](https://docs.ceph.com/en/latest/rados/configuration/mon-lookup-dns/)
- [OperatorHub Sub Outdated - at 1.1.1](https://operatorhub.io/operator/rook-ceph)
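
Two of the issues above lend themselves to small shell workarounds. The following is only a sketch under stated assumptions: `retry` and `mons_use_v1_port` are hypothetical helpers that are not part of this repo, the mon name `a` is made up, and retrying is a blunt way around the CRD timing issue rather than a fix for it.

```sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Meant for the sporadic "CRDs unavailable" failure.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Hypothetical helper: rewrite port 3300 to port 6789 in a mon list of
# the form "name=ip:port[,name=ip:port...]".
mons_use_v1_port() {
  printf '%s\n' "$1" | sed -E 's/:3300(,|$)/:6789\1/g'
}

# Example calls, left as comments so that sourcing this file has no effect:
#   retry 3 10 make apply-r00ki-aio
#   export ROOK_EXTERNAL_CEPH_MON_DATA="$(mons_use_v1_port 'a=192.168.122.231:3300')"
```

Only a port that ends a mon entry is rewritten, so an address that merely contains the digits `3300` stays untouched.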

## References

- [Monitor OpenShift Virtualization using user-defined projects and Grafana](https://developers.redhat.com/articles/2024/08/19/monitor-openshift-virtualization-using-user-defined-projects-and-grafana)
- [How to create a long lived service account token in RHOCP4](https://access.redhat.com/solutions/7025261)

## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request


## License

Distributed under the MIT License. See `LICENSE.txt` for more information.
