{"id":49291092,"url":"https://github.com/k-krew/omen","last_synced_at":"2026-04-26T00:04:38.043Z","repository":{"id":349342300,"uuid":"1192231818","full_name":"k-krew/omen","owner":"k-krew","description":"A lightweight, declarative chaos engineering operator for Kubernetes","archived":false,"fork":false,"pushed_at":"2026-04-12T20:04:07.000Z","size":162,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-12T22:08:43.270Z","etag":null,"topics":["chaos-engineering","chaos-testing","controller-runtime","fault-injection","golang","helm-chart","kubebuilder","kubernetes","kubernetes-operator","reliability","sre"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/k-krew.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-26T02:37:31.000Z","updated_at":"2026-04-12T20:01:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"f670af79-3ed1-4e39-9b32-08d2a5139445","html_url":"https://github.com/k-krew/omen","commit_stats":null,"previous_names":["k-krew/omen"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/k-krew/omen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-krew%2Fomen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-krew%2Fomen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/k-krew%2Fomen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-krew%2Fomen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/k-krew","download_url":"https://codeload.github.com/k-krew/omen/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/k-krew%2Fomen/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32280982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"ssl_error","status_checked_at":"2026-04-25T18:29:32.149Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chaos-engineering","chaos-testing","controller-runtime","fault-injection","golang","helm-chart","kubebuilder","kubernetes","kubernetes-operator","reliability","sre"],"created_at":"2026-04-26T00:04:34.693Z","updated_at":"2026-04-26T00:04:38.029Z","avatar_url":"https://github.com/k-krew.png","language":"Go","readme":"![Claude Assisted](https://img.shields.io/badge/Made%20with-Claude-8A2BE2?logo=anthropic)\n![CI](https://github.com/k-krew/omen/actions/workflows/ci.yml/badge.svg)\n\n# Omen\n\nA lightweight Kubernetes chaos engineering operator with transparent target selection and optional manual approval.\n\n## Overview\n\nOmen lets you declaratively define chaos experiments against your workloads. 
Each run:\n\n1. Selects a fixed set of target pods (preview)\n2. Optionally waits for manual approval\n3. Executes the chaos action against those exact targets\n4. Records per-target results and a summary\n\nTwo CRDs are provided:\n\n- **Experiment** — defines the schedule, target selector, action, safety limits, and approval policy\n- **ExperimentRun** — a single execution instance created by the controller, holding the target preview, approval state, and results\n\n## Roadmap\n\nCurious about what's coming next? Check out our [Roadmap](ROADMAP.md) to see our plans for advanced target filtering, ChatOps integrations, and more!\n\n## Breaking Changes in v0.3.0\n\nVersion 0.3.0 introduces a major architectural shift to an **opt-in sidecar model** for network chaos, replacing the old ephemeral containers approach.\n\n- **Namespace Opt-in Required:** You must explicitly label target namespaces with `chaos.kreicer.dev/enabled=true`.\n- **Sidecar Injection:** A mutating webhook now automatically injects the `omen-agent` sidecar into pods in enabled namespaces.\n- **Removed Flags:** The `ProtectedNamespaces` CLI flag and Helm value have been completely removed in favor of the new opt-in label system.\n\n## Install via Helm\n\n```bash\nhelm install omen oci://ghcr.io/k-krew/charts/omen \\\n  --namespace omen-system \\\n  --create-namespace \\\n  --version \u003cversion\u003e\n```\n\nTo customise the installation:\n\n```bash\nhelm install omen oci://ghcr.io/k-krew/charts/omen \\\n  --namespace omen-system \\\n  --create-namespace \\\n  --version \u003cversion\u003e \\\n  --set manager.leaderElect=true \\\n  --set resources.limits.memory=256Mi \\\n  --set manager.agentImage=\"ghcr.io/k-krew/omen-agent:\u003cversion\u003e\" \\\n  --set manager.agentPort=9999\n```\n\n### Controller flags\n\n| Flag | Default | Description |\n|---|---|---|\n| `--webhook-timeout` | `10s` | Timeout for outgoing approval webhook HTTP requests. 
|\n| `--leader-elect` | `false` | Enable leader election for HA deployments. |\n| `--metrics-bind-address` | `0` | Address for the metrics endpoint (`0` disables it). |\n| `--health-probe-bind-address` | `:8081` | Address for liveness/readiness probes. |\n| `--agent-image` | `ghcr.io/k-krew/omen-agent:v0.3.1` | Container image injected as the `omen-agent` sidecar into target pods. |\n| `--agent-port` | `9999` | Port the agent sidecar listens on. Change if it conflicts with application ports. |\n\n## Examples\n\nReady-to-apply YAML manifests live in the [`examples/`](examples/) directory:\n\n| File | Description |\n|---|---|\n| [`delete-pod-once.yaml`](examples/delete-pod-once.yaml) | One-shot pod deletion, fixed count |\n| [`delete-pod-percent.yaml`](examples/delete-pod-percent.yaml) | One-shot pod deletion, percentage-based |\n| [`delete-pod-repeat-approval.yaml`](examples/delete-pod-repeat-approval.yaml) | Recurring deletion with manual approval and webhook notification |\n| [`network-fault-latency.yaml`](examples/network-fault-latency.yaml) | Inject 100ms latency + 10ms jitter for 5 minutes |\n| [`network-fault-packet-loss.yaml`](examples/network-fault-packet-loss.yaml) | Drop 30% of packets for 3 minutes |\n| [`network-fault-blackhole.yaml`](examples/network-fault-blackhole.yaml) | Complete network blackhole (100% packet loss) with approval gate |\n\nTo approve a pending run:\n\n```bash\nkubectl patch experimentrun \u003crun-name\u003e \\\n  --type=merge \\\n  -p '{\"spec\":{\"approved\":true}}'\n```\n\n## Action Types\n\n### `delete_pod`\n\nDeletes the selected pods. Supports `force: true` for immediate deletion (grace period 0).\n\n### `network_fault`\n\nInjects network chaos into target pods using Linux Traffic Control (`tc netem`). The controller sends HTTP requests to the `omen-agent` sidecar running inside each target pod, which applies and removes the fault. 
The fault is automatically rolled back after the configured `duration`.\n\n**Prerequisite:** The target namespace must be labeled `chaos.kreicer.dev/enabled=true` so that the sidecar is injected (see [Architecture](#architecture) below).\n\n**Parameters (`spec.action.networkFault`):**\n\n| Field | Type | Description |\n|---|---|---|\n| `latency` | duration | Fixed delay added to outgoing packets (e.g., `100ms`). |\n| `jitter` | duration | Random variation on top of latency (e.g., `10ms`). Requires `latency`. |\n| `packetLoss` | integer (1-100) | Percentage of packets to drop. Set to `100` for a full blackhole. |\n| `duration` | duration | How long to hold the fault before automatic rollback. Defaults to `5m`. |\n\nAt least one of `latency` or `packetLoss` must be set.\n\n## Architecture\n\nOmen uses an **opt-in sidecar model**. Chaos is only allowed in namespaces explicitly labeled with `chaos.kreicer.dev/enabled=true`. A mutating webhook automatically injects the `omen-agent` sidecar into all new pods in these namespaces.\n\n```bash\nkubectl label namespace \u003ctarget-ns\u003e chaos.kreicer.dev/enabled=true\n```\n\nThe controller then:\n1. Selects targets only from pods that live in labeled namespaces.\n2. For `delete_pod`: deletes the pod via the Kubernetes API.\n3. For `network_fault`: sends an HTTP `POST /network-fault` to the agent sidecar inside the pod to apply `tc` rules, then `DELETE /network-fault` after the duration to roll back.\n\n### Security\n\nThe `omen-agent` sidecar requires the `NET_ADMIN` Linux capability to run `tc` commands. Pod Security Admission rejects pods that add `NET_ADMIN` at both the `baseline` and `restricted` levels, so namespaces used for network chaos must enforce the `privileged` level:\n\n```bash\nkubectl label namespace \u003ctarget-ns\u003e \\\n  pod-security.kubernetes.io/enforce=privileged\n```\n\nCommunication between the controller and agents is authenticated with a shared token (generated by Helm and stored in a Kubernetes Secret). 
The token is automatically injected into each agent sidecar as `OMEN_SECRET_TOKEN` by the mutating webhook.\n\nA `NetworkPolicy` is shipped with the Helm chart that restricts ingress to agent sidecars so only the controller pod can reach them.\n\n### Pre-flight registry check\n\nOn startup, the controller performs a TCP connectivity check to the agent image registry. If the registry is unreachable (e.g., in an air-gapped cluster without proper registry credentials), sidecar injection is **disabled** automatically so that user pods are never blocked by `ImagePullBackOff`. A warning is logged:\n\n```\nWARNING: agent image registry is not reachable — sidecar injection will be disabled\n```\n\n## Safety: Pod-level Opt-out\n\nIndividual pods can be excluded from all chaos experiments by adding the annotation `chaos.kreicer.dev/ignore: \"true\"`. Annotated pods are neither injected with the agent sidecar nor selected as targets.\n\n```bash\nkubectl annotate pod \u003cpod-name\u003e chaos.kreicer.dev/ignore=true\n```\n\nOr in the pod template:\n\n```yaml\nmetadata:\n  annotations:\n    chaos.kreicer.dev/ignore: \"true\"\n```\n\nExperiment-level protection is also available via `spec.safety.denyNamespaces`:\n\n```yaml\nspec:\n  safety:\n    denyNamespaces:\n      - my-critical-namespace\n```\n\n## Observability\n\nEvery phase transition of an `ExperimentRun` emits a standard Kubernetes Event on the object:\n\n```bash\nkubectl describe experimentrun \u003crun-name\u003e\n```\n\nEvents use `Normal` type for successful transitions (`PreviewGenerated`, `Approved`, `Running`, `Completed`) and `Warning` for failure states (`Failed`, `Expired`).\n\nThe `TOTAL` column in `kubectl get expruns` is populated as soon as targets are selected during the `PreviewGenerated` phase, so you can see how many pods will be affected before the run executes.\n\n## Safe Deletion\n\n`Experiment` objects carry a finalizer (`chaos.omen.com/finalizer`). 
When an `Experiment` is deleted, the controller first deletes all owned `ExperimentRun`s and waits for them to be removed before releasing the finalizer.\n\n`ExperimentRun`s executing a `network_fault` action carry an additional finalizer (`chaos.omen.com/network-fault`). Before the run object is removed, the controller sends `DELETE /network-fault` to the agent in all targets where the fault was still active, ensuring the network is restored even if the experiment is aborted mid-flight.\n\n## Dry Run\n\nSet `dryRun: true` on the `Experiment` to preview target selection without executing any action. For `delete_pod`, no pods are deleted. For `network_fault`, no HTTP requests are sent to the agent. Results are recorded as `Success` in both cases.\n\n## Run locally (against Kind or Minikube)\n\n### Prerequisites\n\n- Go 1.26+\n- `kubebuilder` v4\n- `kubectl` pointing at a local cluster\n\n```bash\n# Install CRDs\nGOTOOLCHAIN=local make install\n\n# Run the controller locally (uses ~/.kube/config)\nGOTOOLCHAIN=local make run\n```\n\nThe controller reads `POD_NAMESPACE` to exclude its own pods from target selection. Set it when running locally:\n\n```bash\nPOD_NAMESPACE=omen-system GOTOOLCHAIN=local make run\n```\n\n## Development\n\n```bash\n# Regenerate CRDs and RBAC after editing types\nGOTOOLCHAIN=local make manifests generate\n\n# Build the binary\nGOTOOLCHAIN=local make build\n\n# Run tests (requires setup-envtest)\ngo install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest\nexport KUBEBUILDER_ASSETS=$(setup-envtest use --print path)\ngo test ./... -v\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk-krew%2Fomen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fk-krew%2Fomen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fk-krew%2Fomen/lists"}