{"id":19888234,"url":"https://github.com/project-codeflare/codeflare-operator","last_synced_at":"2025-05-02T17:31:53.308Z","repository":{"id":72242919,"uuid":"595784045","full_name":"project-codeflare/codeflare-operator","owner":"project-codeflare","description":"Operator for installation and lifecycle management of CodeFlare distributed workload stack","archived":false,"fork":false,"pushed_at":"2024-10-17T14:49:40.000Z","size":1222,"stargazers_count":7,"open_issues_count":35,"forks_count":43,"subscribers_count":14,"default_branch":"main","last_synced_at":"2024-11-01T04:52:20.427Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/project-codeflare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-31T19:59:21.000Z","updated_at":"2024-10-17T14:49:38.000Z","dependencies_parsed_at":"2024-04-15T16:16:45.438Z","dependency_job_id":"3a87210d-f98d-4985-970f-2fb9a17d5b1e","html_url":"https://github.com/project-codeflare/codeflare-operator","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Fcodeflare-operator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Fcodeflare-operator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Fcodeflare-operator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-cod
eflare%2Fcodeflare-operator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/project-codeflare","download_url":"https://codeload.github.com/project-codeflare/codeflare-operator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224324499,"owners_count":17292521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T18:06:42.000Z","updated_at":"2025-05-02T17:31:53.286Z","avatar_url":"https://github.com/project-codeflare.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# codeflare-operator\n\nThe CodeFlare-Operator embeds two controllers. The [RayCluster controller](https://github.com/project-codeflare/codeflare-operator/blob/main/pkg/controllers/raycluster_controller.go) creates the resources that RayClusters need to work as expected, including secrets, ingresses, routes, services, service accounts, and cluster role bindings.\n\nThe [AppWrapper Controller](https://github.com/project-codeflare/appwrapper/blob/main/internal/controller/appwrapper/appwrapper_controller.go) provides a flexible and workload-agnostic mechanism that enables Kueue to manage a group of Kubernetes resources as a single logical unit and provides an additional level of automatic fault detection and recovery.\n\nThe webhooks for each controller can be found [here](https://github.com/project-codeflare/codeflare-operator/tree/main/pkg/controllers).\n\n\u003c!-- Don't delete these comments, they are used to generate 
Compatibility Matrix table for release automation --\u003e\n\u003c!-- Compatibility Matrix start --\u003e\nCodeFlare Stack Compatibility Matrix\n\n| Component                    | Version                                                                                           |\n|------------------------------|---------------------------------------------------------------------------------------------------|\n| CodeFlare Operator           | [v1.15.0](https://github.com/project-codeflare/codeflare-operator/releases/tag/v1.15.0)             |\n| CodeFlare-SDK                | [v0.27.0](https://github.com/project-codeflare/codeflare-sdk/releases/tag/v0.27.0)                |\n| AppWrapper                   | [v1.0.4](https://github.com/project-codeflare/appwrapper/releases/tag/v1.0.4)                   |\n| KubeRay                      | [v1.2.2](https://github.com/ray-project/kuberay/releases/tag/v1.2.2)                           |\n| Kueue                        | [v0.10.1](https://github.com/kubernetes-sigs/kueue/releases/tag/v0.10.1)                             |\n\u003c!-- Compatibility Matrix end --\u003e\n\n## Development\n\nRequirements:\n- GNU sed - sed is used in several Makefile commands. The default macOS sed is incompatible, so GNU sed is needed for these commands to run correctly.\n  If GNU sed is installed on macOS, you can specify the binary using:\n  ```bash\n  # brew install gnu-sed\n  make install -e SED=/usr/local/bin/gsed\n  ```\n- Kind - Kind is used in the kind-e2e command in the Makefile. Follow the kind setup instructions \u003ca href=\"https://kind.sigs.k8s.io/docs/user/quick-start/\" target=\"_blank\"\u003ehere\u003c/a\u003e.\n\n### Testing\n\nThe e2e tests can be executed locally by running the following commands:\n\n1. 
Use an existing cluster, or set up a test cluster, e.g.:\n\n    ```bash\n    # Create a KinD cluster\n    make kind-e2e\n    ```\n\n\u003e [!NOTE]\n   Some e2e tests cover access to services via Ingresses, as end users would, which requires access to the Ingress controller load balancer by its IP.\n   On macOS, this requires installing [docker-mac-net-connect](https://github.com/chipmk/docker-mac-net-connect).\n\n2. Set up the rest of the CodeFlare stack.\n\n   ```bash\n   make setup-e2e\n   ```\n   \n\u003e [!NOTE]\n   Kueue will only activate its Ray integration if KubeRay is installed before Kueue (as done by this make target).\n\n\u003e [!NOTE]\n   On OpenShift, the KubeRay operator pod is assigned a random user, which is then used to run the Ray cluster.\n   However, this random user doesn't have rights to store the dataset downloaded as part of test execution, causing tests to fail.\n   To prevent this failure on OpenShift, enforce user 1000 for KubeRay and the Ray cluster by creating this SCC in the KubeRay operator namespace (replace the namespace placeholder):\n\n   ```yaml\n    kind: SecurityContextConstraints\n    apiVersion: security.openshift.io/v1\n    metadata:\n      name: run-as-ray-user\n    seLinuxContext:\n      type: MustRunAs\n    runAsUser:\n      type: MustRunAs\n      uid: 1000\n    users:\n      - 'system:serviceaccount:$(namespace):kuberay-operator'\n   ```\n\n3.  In the /etc/hosts file, add the following lines:\n    ```bash\n    127.0.0.1 ray-dashboard-raycluster-test-ns-1.kind\n    127.0.0.1 ray-dashboard-raycluster-test-ns-2.kind\n    ```\n\n4.  Build, push and deploy the codeflare-operator image:\n    ```bash\n    make image-push IMG=\u003cfull-registry\u003e:\u003ctag\u003e\n    make deploy -e IMG=\u003cfull-registry\u003e:\u003ctag\u003e -e ENV=\"e2e\"\n    ```\n\n5.  
To run the tests, run the command:\n    ```bash\n    make test-e2e\n    ```\n\n   Alternatively, you can run the e2e test(s) from your IDE / debugger.\n\n#### Testing on disconnected cluster\n\nTo run e2e tests on a disconnected cluster, provide the following additional environment variables to configure the testing environment:\n\n- `CODEFLARE_TEST_PYTORCH_IMAGE` - image tag for the image used to run the training job\n- `CODEFLARE_TEST_RAY_IMAGE` - image tag for the Ray cluster image\n- `MNIST_DATASET_URL` - URL where the MNIST dataset is available\n- `PIP_INDEX_URL` - URL where the PyPI server with the needed dependencies is running\n- `PIP_TRUSTED_HOST` - PyPI server hostname\n\nFor ODH tests, additional environment variables are needed:\n\n- `NOTEBOOK_IMAGE_STREAM_NAME` - name of the ODH Notebook ImageStream to be used\n- `ODH_NAMESPACE` - namespace where ODH is installed\n\n## Release\n\n1. Invoke [project-codeflare-release.yml](https://github.com/project-codeflare/codeflare-operator/actions/workflows/project-codeflare-release.yml)\n2. Once all jobs within the action are completed, verify that the compatibility matrix in [README](https://github.com/project-codeflare/codeflare-operator/blob/main/README.md) was properly updated.\n3. Verify that the pull request opened against the [OpenShift community operators repository](https://github.com/redhat-openshift-ecosystem/community-operators-prod) has the proper content.\n4. Once the PR is merged, announce the new release in Slack and on mailing lists, if any.\n5. Trigger the [auto-merge-sync workflow](https://github.com/red-hat-data-services/codeflare-operator/actions/workflows/auto-merge-sync.yaml) and verify it ran successfully. This will sync changes to the [ODH CodeFlare-Operator repo](https://github.com/opendatahub-io/codeflare-operator), and the [Red Hat CodeFlare Operator repo](https://github.com/red-hat-data-services/codeflare-operator). 
Please review the new merge-commit and commit history, and verify changes are also in the latest `rhoai` release branch. If the auto-merge fails, conflicts must be resolved and force pushed manually to each downstream repository and release branch.\n6. In ODH/CFO, verify that the [Build and Push action](https://github.com/opendatahub-io/codeflare-operator/actions/workflows/build-and-push.yaml) was triggered and ran successfully.\n7. Make sure that release automation created a PR updating the CodeFlare SDK version in the [ODH Notebooks repository](https://github.com/opendatahub-io/notebooks). Make sure the PR gets merged.\n8. Run the [ODH CodeFlare Operator release workflow](https://github.com/opendatahub-io/codeflare-operator/actions/workflows/odh-release.yml) to produce the ODH CodeFlare Operator release.\n\n### Releases involving part of the stack\n\nThere may be instances in which a new CodeFlare stack release requires releases of only a subset of the stack components. Examples could be hotfixes for a specific component. In these instances:\n\n1. Build updated components as needed:\n    - Build and release [CodeFlare-SDK](https://github.com/project-codeflare/codeflare-sdk)\n\n2. Invoke the [tag-and-build.yml](https://github.com/project-codeflare/codeflare-operator/actions/workflows/tag-and-build.yml) GitHub action. This action will create a repository tag, then build and push the operator image.\n3. Check the result of the [tag-and-build.yml](https://github.com/project-codeflare/codeflare-operator/actions/workflows/tag-and-build.yml) GitHub action. It should pass.\n4. Verify that the compatibility matrix in [README](https://github.com/project-codeflare/codeflare-operator/blob/main/README.md) was properly updated.\n5. 
Follow steps 3-6 from the previous section.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-codeflare%2Fcodeflare-operator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproject-codeflare%2Fcodeflare-operator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-codeflare%2Fcodeflare-operator/lists"}