{"id":47614439,"url":"https://github.com/observantio/watchdog","last_synced_at":"2026-04-23T10:02:55.746Z","repository":{"id":342758348,"uuid":"1157201378","full_name":"observantio/watchdog","owner":"observantio","description":"Observantio's Watchdog is a unified control plane for infrastructure health, correlating metrics, logs, traces, AIOps, and alerts into a single interface that eliminates observability silos.","archived":false,"fork":false,"pushed_at":"2026-04-19T12:20:55.000Z","size":19099,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-19T14:32:38.880Z","etag":null,"topics":["alertmanager","devops","fastapi","grafana","lgtm-stack","loki","mimir","observability","oidc","opentelemetry","otel","self-hosted","sso","tempo"],"latest_commit_sha":null,"homepage":"https://observantio.github.io/pitch","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/observantio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE.md","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-13T14:50:56.000Z","updated_at":"2026-04-14T14:53:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/observantio/watchdog","commit_stats":null,"previous_names":["observantio/beobservant","observantio/watchdog"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/observantio/watchdog","repository_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/observantio%2Fwatchdog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observantio%2Fwatchdog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observantio%2Fwatchdog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observantio%2Fwatchdog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/observantio","download_url":"https://codeload.github.com/observantio/watchdog/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observantio%2Fwatchdog/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32175041,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T02:19:40.750Z","status":"ssl_error","status_checked_at":"2026-04-23T02:17:55.737Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alertmanager","devops","fastapi","grafana","lgtm-stack","loki","mimir","observability","oidc","opentelemetry","otel","self-hosted","sso","tempo"],"created_at":"2026-04-01T21:07:48.879Z","updated_at":"2026-04-23T10:02:55.739Z","avatar_url":"https://github.com/observantio.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n# Observantio's Watchdog\n\n  \u003cimg 
src=\"assets/wolf.png\" alt=\"Watchdog settings preview\" width=\"150\" /\u003e\n\n  \u003cp\u003e\n    \u003ca href=\"https://github.com/observantio/resolver\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/RCA-Resolver-7c3aed?style=flat-square\" alt=\"Resolver\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/observantio/ojo\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Telemetry-Ojo-0f766e?style=flat-square\" alt=\"Ojo\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/observantio/notifier\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Alerting-Notifier-1f2937?style=flat-square\" alt=\"Notifier\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/observantio/watchdog/tree/main/gatekeeper\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Security-Gatekeeper-0ea5e9?style=flat-square\" alt=\"Gatekeeper\" /\u003e\n    \u003c/a\u003e\n  \u003c/p\u003e\n  \u003cp\u003e\n    \u003ca href=\"https://github.com/observantio/watchdog/actions/workflows/ci.yml\"\u003e\n      \u003cimg src=\"https://github.com/observantio/watchdog/actions/workflows/ci.yml/badge.svg?branch=main\" alt=\"Watchdog CI\" /\u003e\n    \u003c/a\u003e\n  \u003c/p\u003e\n  \u003cp\u003e\n    \u003ca href=\"https://github.com/observantio/watchdog/blob/main/DEPLOYMENT.md\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/🚀%20Deploy-Quick%20Setup-0ea5e9?style=flat-square\u0026logo=docker\u0026logoColor=white\" alt=\"Deploy\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/observantio/watchdog/blob/main/USER%20GUIDE.md\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/📘%20User%20Guide-Read%20Docs-16a34a?style=flat-square\u0026logo=readthedocs\u0026logoColor=white\" alt=\"User Guide\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv\u003e\n  \u003cp\u003e \n    \u003cstrong\u003eSelf-hosted observability control plane for multi-tenant 
teams\u003c/strong\u003e\n  \u003c/p\u003e\n  \u003cp\u003e\n  Watchdog is built around Grafana, Loki, Tempo, Mimir, and Alertmanager, with application services that add tenancy, access control, alert workflows, and AI-assisted root cause analysis.\n  \u003c/p\u003e\n\u003c/div\u003e\n\n\n![Observantio Quick Demo](assets/watchdog.gif)\n\nIf you are new to the project, the simplest way to think about it is this: the Grafana stack is the storage and query layer, and Watchdog is the application layer that makes it practical for teams and enterprises to use that stack together.\n\nIn plain terms, this workspace gives you:\n\n- A secure entry point for telemetry ingestion.\n- A web UI for logs, traces, dashboards, alert rules, incidents, and RCA.\n- A control-plane API that sits in front of the Grafana stack.\n- An alerting service that stores channel, rule, silence, and incident state.\n- An RCA engine that correlates logs, metrics, and traces to rank possible causes.\n\nThis repository is best understood as one product made of several cooperating services.\n\n## What The System Is Trying To Achieve\n\nWatchdog aims to turn the raw LGTM stack into a usable multi-user application.\n\nThe base Grafana components already do storage and querying well:\n\n- Loki stores and queries logs.\n- Tempo stores and queries traces.\n- Mimir stores and evaluates metrics and alert rules.\n- Alertmanager handles alert routing and silences.\n- Grafana renders dashboards and data sources.\n\nWatchdog adds the pieces those components do not provide as a single opinionated product:\n\n- Authentication and session management.\n- User, group, permission, and API key management.\n- Tenant-aware OTLP token validation.\n- A single UI across observability, alerting, and RCA workflows.\n- Shared integrations such as Jira and notification channels.\n- Incident lifecycle tracking.\n- AI-assisted RCA and anomaly workflows.\n\n## What Lives In This Workspace\n\n| Component | Role |\n| --- | --- |\n| 
`watchdog` | Main FastAPI control plane. Handles auth, users, groups, API keys, Grafana proxy bootstrap, Loki/Tempo/Mimir-facing APIs, system metrics, and secure proxying to Notifier and Resolver. |\n| `gatekeeper` | OTLP token validation service for Envoy `ext_authz`. Validates `x-otlp-token`, applies allowlists and rate limits, and returns `X-Scope-OrgID` for downstream tenancy. |\n| `notifier` | Alerting workflow service. Stores and serves alert rules, channels, silences, incidents, and Jira integrations. Consumes Alertmanager webhooks and protects most endpoints with an internal service token. |\n| `resolver` | RCA and analysis engine. Reads logs, metrics, and traces from Loki, Mimir, and Tempo; runs anomaly detection and job-based RCA; stores RCA jobs and reports. |\n| `ui` | React/Vite frontend. Exposes dashboards, logs, traces, alerts, incidents, integrations, API keys, users/groups, audit views, and RCA pages. |\n| `docker-compose.yml` | Local reference deployment for the entire stack. |\n| `.env.example` | Environment contract for all services. |\n| `tests` | OTEL collector and sample telemetry generators used to feed demo traces and logs into the stack. 
|\n\n## Repo Links\n\n- **Watchdog (main control plane)**: https://github.com/observantio/watchdog\n- **Ojo (OpenTelemetry agent)**: https://github.com/observantio/ojo\n- **Notifier (alerting \u0026 incidents)**: https://github.com/observantio/notifier\n- **Resolver (RCA / AIOps engine)**: https://github.com/observantio/resolver\n\n## High-Level Architecture\n\n![Observantio Architecture](assets/watchdog.png)\n\n### Service Responsibilities\n\n#### Watchdog | Main Proxy\n\nThis is the main application server.\n\nFrom the code, it does all of the following:\n\n- Boots the main database schema and auth service.\n- Exposes login, logout, registration, OIDC exchange, MFA, user, group, audit, and API key endpoints.\n- Stores and resolves the current user context, permissions, and API-key-backed scope.\n- Proxies observability operations to Loki, Tempo, Grafana, Alertmanager, and Resolver.\n- Exposes `/api/internal/otlp/validate` so Gatekeeper can validate OTLP tokens against Watchdog's auth model.\n- Provides `/health` and `/ready` checks and a `/api/system/metrics` endpoint for internal UI metrics.\n- Sets security headers, request-size limits, concurrency limits, and CORS.\n\n#### Gatekeeper | Secure Gate Keeper\n\nThis service is the telemetry gatekeeper.\n\nIt is designed to sit behind Envoy's external authorization hook and does the following:\n\n- Reads `x-otlp-token` from inbound telemetry requests.\n- Applies optional IP allowlists.\n- Applies request rate limiting.\n- Caches token validation results in memory or Redis.\n- Calls the Watchdog internal validation API when a cache miss occurs.\n- Returns `X-Scope-OrgID` so Loki, Tempo, and Mimir receive the correct tenant scope.\n\nWithout this service, the system would still have storage backends, but not a protected multi-tenant OTLP ingestion path.\n\n#### Notifier | Notification and Rule Engine\n\nThis service owns alerting workflows beyond raw Alertmanager delivery.\n\nFrom the routers and services, it is 
responsible for:\n\n- CRUD for alert rules.\n- Importing rules from YAML, including a dry-run preview flow.\n- Syncing rule definitions to Mimir for the target organization.\n- CRUD for notification channels such as email, Slack, Teams, webhook, and PagerDuty.\n- CRUD for silences.\n- Maintaining incidents and enforcing incident lifecycle rules.\n- Recording assignment and status changes.\n- Sending assignment emails when configured.\n- Jira integration management and Jira ticket/comment synchronization.\n- Accepting inbound Alertmanager webhooks.\n\n#### Resolver | RCA and AIops Engine\n\nThis service is the RCA engine.\n\nIt does not replace Loki, Tempo, or Mimir. It reads from them, analyzes their data, and produces reports.\n\nIts responsibilities include:\n\n- Waiting for logs, metrics, and trace backends to become reachable.\n- Creating RCA jobs asynchronously.\n- Listing and retrieving jobs and saved reports.\n- Running anomaly analysis for metrics, logs, and traces.\n- Running signal correlation, topology, causal, forecast, and SLO analysis endpoints.\n- Storing RCA jobs and reports in its own database.\n- Enforcing internal service-to-service auth and tenant-aware permission context.\n\n#### React UI | Interface for Users\n\nThe frontend is not a demo shell. 
It is the main operator experience.\n\nThe route map shows these primary pages:\n\n- Dashboard: system summary cards and activity widgets.\n- Logs: Loki query builder, raw LogQL mode, labels, quick filters, log volume, and saved state.\n- Traces: Tempo query and exploration UI with dependency maps.\n- Alert Manager: active alerts, alert rules, silences, hidden items, rule import, and rule testing.\n- Incidents: incident board with assignment, state changes, notes, Jira actions, and correlation labels.\n- Grafana: dashboards, folders, datasources, and a controlled hand-off into Grafana through the auth proxy.\n- RCA: job creation, queue view, saved report lookup, root-cause ranking, anomalies, topology, causal analysis, forecast/SLO views, and report deletion.\n- Integrations: notification channels and Jira integrations with visibility and sharing controls.\n- Users, Groups, API Keys, Audit/Compliance: access-management workflows.\n\n## Docker Compose Topology\n\nThe included `docker-compose.yml` brings up the full local stack:\n\n- `postgres` for application data.\n- `redis` for rate limiting, token cache, and shared ephemeral state.\n- `watchdog` as the main API.\n- `notifier` for alerts, incidents, and integrations.\n- `gateway-auth` for OTLP auth.\n- `resolver` for RCA.\n- `otlp-gateway` as Envoy on port `4320`.\n- `loki`, `tempo`, `mimir`, and `alertmanager` as the storage and routing backends.\n- `grafana` plus `grafana-proxy` on port `8080`.\n- `ui` on port `5173`.\n- `otel-agent` as a local telemetry generator harness, located in the `otel` directory.\n\n### Important Runtime Endpoints\n\n| Endpoint | Service | Purpose |\n| --- | --- | --- |\n| `http://localhost:5173` | `ui` | Web UI |\n| `http://localhost:4319` | `watchdog` | Main API and docs |\n| `http://localhost:4320` | `otlp-gateway` | OTLP ingress through Envoy |\n| `http://localhost:4323` | `notifier` | Alerting service |\n| `http://localhost:8080` | `grafana-proxy` | Browser access to Grafana 
|\n\nInternal-only services in the default compose layout:\n\n- `gateway-auth` (`4321`) is reachable on the Docker network, not via host `localhost`.\n- `resolver` (`4322`) is reachable on the Docker network, not via host `localhost`.\n\n## Choose An Install Path\n\n- Local development and repo hacking: use [install.py](install.py). It creates a working `.env`, clones the companion repos if needed, and brings up the compose stack for a developer workstation.\n- Release bundle on a Linux host: use [download.sh](download.sh), which unpacks the release bundle and hands off to [release/install.sh](release/install.sh).\n- Kubernetes: use [charts/observantio/installer.sh](charts/observantio/installer.sh), which wraps the Helm chart and its profile-driven values files, or download the matching `observantio-${BUNDLE_VERSION}-helm-charts.tar.gz` release asset and run `installer.sh` from the extracted chart root.\n\nIf you only want to evaluate the code locally, the experimental compose installer is the fastest path. If you want the release tarball experience, stay on the release-bundle flow. 
If you want a cluster deployment, use the Helm chart path.\n\n## Kubernetes Helm Charts\n\nIf you want Kubernetes deployment instead of Docker Compose, use the chart under `charts/observantio`.\n\n- Chart path: `charts/observantio`\n- Release asset: `observantio-${BUNDLE_VERSION}-helm-charts.tar.gz`\n- Chart docs: [`charts/observantio/README.md`](charts/observantio/README.md)\n- Installer script: `charts/observantio/installer.sh`\n\nQuick start:\n\n```bash\nbash charts/observantio/installer.sh --profile production --foreground\n```\n\nUseful installer modes:\n\n- `--profile production` for full production defaults\n- `--profile compact` for smaller/constrained clusters\n- `--detach` for background port-forwards\n- `--no-port-forward` when you only want deployment\n- `--remove` to remove the release/namespace (smoke teardown)\n\nCustomization points:\n\n- Base values: `charts/observantio/values.yaml`\n- Production defaults: `charts/observantio/values-production.yaml`\n- Compact overrides: `charts/observantio/values-compact.yaml`\n- Image versions: `release/versions.json` and chart values/image tags\n\n## Environment File Overview\n\nThe root `.env.example` is the configuration contract for the whole stack.\n\nIt is large because it configures multiple services at once. Read it in these groups:\n\n- Core runtime: host, port, log level, database URLs.\n- Auth: JWT signing, bootstrap admin, OIDC, Keycloak, MFA, cookie security.\n- Ingestion security: OTLP tokens, gateway allowlists, rate limits, proxy trust settings.\n- Service-to-service auth: shared tokens and signing keys for Notifier and Resolver.\n- Alerting: channel types, webhook tokens, SMTP settings, Jira support.\n- Grafana runtime: admin password, auth proxy config, datasource provisioning.\n- Resolver analysis tuning: correlation window, thresholds, timeouts, quality gating.\n- Optional Vault and backup settings.\n\nTwo practical warnings for new users:\n\n1. 
A few example values are placeholders, not safe defaults. Replace every `replace_with_...` value.\n2. Some example lines show choices such as `AUTH_PROVIDER=local | oidc | keycloak`. You must replace those with one actual value, for example `AUTH_PROVIDER=local`.\n\n## Quick Start\n\n### Option A: Experimental Installer\n\nThe included installer is meant for evaluation and local testing. It is best to use the Experimental Installer if you want to develop the code, since it creates a working `.env` and starts all the required services cleanly for development.\n\nIf you want a shorter entrypoint, run `make quickstart` from the repository root.\n\nIt will:\n\n- Check for required commands.\n- Clone missing repos for `resolver` and `notifier`.\n- Create or update `.env`.\n- Generate secrets and a bootstrap admin account.\n- Start the compose stack.\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/observantio/watchdog/main/install.py -o install.py \u0026\u0026 python3 install.py\n```\n\n### Option B: Manual Setup\n\n```bash\ngit clone https://github.com/observantio/watchdog Observantio\ncd Observantio\ncp .env.example .env\n```\n\nBefore you run `docker compose up -d --build`, generate the host-aware observability config files:\n\n```bash\nbash scripts/run_optimal_config.sh\n```\n\nFor local developer tooling, the workspace root and the `resolver` and `notifier` service folders now each include a `pyproject.toml` with the canonical pytest, coverage, and mypy defaults for that scope.\nThe root `observantio` package is a meta package for tooling and extras; install with extras (for example `pip install -e \".[dev]\"` or `pip install -e \".[schemathesis]\"`) rather than expecting base runtime dependencies.\n\nThen edit `.env` and set, at minimum:\n\n- Strong Postgres password values.\n- `DEFAULT_ADMIN_USERNAME`\n- `DEFAULT_ADMIN_PASSWORD`\n- `DEFAULT_ADMIN_EMAIL`\n- `DATA_ENCRYPTION_KEY`\n- `DEFAULT_OTLP_TOKEN`\n- `GATEWAY_INTERNAL_SERVICE_TOKEN`\n- 
`NOTIFIER_SERVICE_TOKEN` and `NOTIFIER_EXPECTED_SERVICE_TOKEN`\n- `RESOLVER_SERVICE_TOKEN` and `RESOLVER_EXPECTED_SERVICE_TOKEN`\n- `NOTIFIER_CONTEXT_SIGNING_KEY` and `NOTIFIER_CONTEXT_VERIFY_KEY`\n- `RESOLVER_CONTEXT_SIGNING_KEY` and `RESOLVER_CONTEXT_VERIFY_KEY`\n\nStart the stack:\n\n```bash\ndocker compose up -d --build\n```\n\nCheck health:\n\n```bash\ndocker compose ps\ncurl http://localhost:4319/health\ncurl http://localhost:4319/ready\ncurl http://localhost:4323/health\n```\n\nFor internal services that are not published to host ports (`gateway-auth`, `resolver`), use `docker compose logs` or container-internal checks.\n\n## Developer Quality Gates\n\nGlobal quality scripts in `scripts/` support either all services or a single service argument (`resolver`, `gatekeeper`, `notifier`, `watchdog`).\n\nFor a quick workflow map, see [DEVELOPERS.md](DEVELOPERS.md).\n\nThe root [Makefile](Makefile) provides a small wrapper around the same workflow, including `make quickstart`, `make lint`, `make typecheck`, and `make test`.\n\nRun all services:\n\n```bash\nscripts/run_global_mypy.sh\nscripts/run_global_pylint.sh\nscripts/run_global_pytests.sh\n```\n\nRun one service only:\n\n```bash\nscripts/run_global_mypy.sh watchdog\nscripts/run_global_pylint.sh watchdog\nscripts/run_global_pytests.sh watchdog\n```\n\nUse `-h` on each script for the full usage contract and environment options.\n\n\n## First-Run User Journey\n\n1. Open `http://localhost:5173`.\n2. Sign in with the bootstrap admin configured in `.env`.\n3. Create one or more API keys. These keys are not only UI objects; they drive tenant-scoped access and OTLP token usage.\n4. Choose which API key should be the default scope in the UI. That choice affects what the frontend queries and where new rules are targeted.\n5. Use the API Keys page to copy the OTLP token or generate a starter OpenTelemetry Collector YAML file.\n6. Send telemetry to `http://localhost:4320` with the `x-otlp-token` header.\n7. 
Confirm data in Logs and Traces.\n8. Create or import alert rules, then connect channels and test them.\n9. Review incident creation and update flows.\n10. Run an RCA job after data exists.\n\n## Known-Good Starting Point For Telemetry\n\nThe included test harness sends example traces and logs through a local OpenTelemetry Collector. If you want to connect your own collector, the important idea is:\n\n- Logs go to `http://localhost:4320/loki`\n- Traces go to `http://localhost:4320/tempo`\n- Metrics go to `http://localhost:4320/mimir`\n- Every request must include `x-otlp-token`\n\nA collector pattern to start from looks like this:\n\n```yaml\nexporters:\n  otlphttp/logs:\n    endpoint: http://localhost:4320/loki\n    headers:\n      x-otlp-token: YOUR_OTLP_TOKEN\n\n  otlphttp/traces:\n    endpoint: http://localhost:4320/tempo\n    headers:\n      x-otlp-token: YOUR_OTLP_TOKEN\n\n  otlphttp/metrics:\n    endpoint: http://localhost:4320/mimir\n    headers:\n      x-otlp-token: YOUR_OTLP_TOKEN\n```\n\n## Alerting Philosophy In This Stack\n\nThe alerting flow is intentionally opinionated:\n\n- Rules are managed as application objects, not only as raw backend config.\n- Rules are synchronized to Mimir for evaluation.\n- Active alerts surface in the Watchdog UI.\n- Alertmanager webhook events feed Notifier.\n- Incidents become first-class objects with assignees, notes, and optional Jira linkage.\n\nIf you are new to the rule editor, start from a known-good template, then tune expressions and thresholds for your environment. That approach matches how the stack is built: validate the workflow first, then narrow noise and sensitivity.\n\n\n## What the UI Gives an Operator\n\n#### Dashboard\n\nThe Dashboard provides a high-level view of platform health, including active alerts, log volume, dashboard count, silence count, datasource count, and overall service status.\n\nIf OIDC is enabled, operators are asked to set a backup local password during setup. 
This supports a fallback to local authentication if the business later decides to change authentication methods.\n\nDashboard widgets are draggable, so users can reorder components to suit their workflow. The UI also supports easy switching between dark and light themes.\n\n#### Logs\n\nThe Logs view provides label discovery, builder-mode filtering, raw LogQL support, log volume visualisation, result browsing, and quick filters.\n\nFor most investigations, the quick filters are the fastest way to search text and review log volume over time, making it easier to identify bursts or unusual spikes in activity.\n\n#### Traces\n\nThe Traces view provides Tempo-backed trace exploration, direct trace lookup, and a graph view for comparing traces and understanding service relationships.\n\nOperators can filter traces, inspect trace data, and use the dependency map to identify pain points, bottlenecks, and issues in service-to-service data flow.\n\n#### Alert Manager\n\nAlert Manager provides:\n\n* Active alerts\n* Alert rules\n* Silences\n* YAML rule import with preview\n* Rule testing\n* Hidden and shared object handling\n\nAlerts and silences are fully scoped by tenant and channel configuration. Integrations such as Jira are also scoped appropriately. 
All related configuration is stored securely and encrypted in PostgreSQL.\n\n#### Incidents\n\nThe Incidents view provides a board-based operational workflow for managing incidents, including assignment, notes, status updates, and Jira integration.\n\nOperators can create notes, assign incidents to users, and link incidents to Jira so that comments and lifecycle changes remain synchronised across both systems.\n\n#### API Keys\n\nThe API Keys area provides tenant and product scoping, OTLP token management, key sharing with users and groups, token regeneration, and a downloadable starter OpenTelemetry Collector configuration.\n\nOperators can create a new API key, download a YAML configuration for that key, or use their own collector configuration with the provided token. Once the collector runs with `otelcol-contrib --config otel.yaml`, the platform accepts metrics, logs, and traces, and maps them to the correct organisation or tenant context for retrieval through Mimir, Tempo, Loki, and Resolver.\n\n#### Users and Groups\n\nThe Users and Groups section provides user creation, role and permission management, group-based permission inheritance, temporary password reset flows, and membership administration.\n\nOperators can rename users, manage passwords, update permissions and roles, create groups, and assign group permissions that members inherit. A user cannot create a group with permissions higher than their own. The same restriction applies to users with `manage:tenants` capabilities — they can only grant permissions up to their own level.\n\nAdmins can update the roles of existing members. Only an admin can deactivate another admin, and admins cannot delete other admins.\n\n#### Audit and Compliance\n\nThe Audit and Compliance section provides searchable audit history with filters, detailed inspection, and CSV export for administrative review.\n\nAudit records are not currently designed as immutable at the database level. 
However, there are no routes or services that allow audit logs to be edited or deleted.\n\n#### Grafana\n\nThe Grafana section provides controlled management of dashboards, folders, and datasources, along with secure access into the Grafana UI through the auth proxy.\n\nAll access is scoped according to the user’s permissions and visibility rights. Folder visibility acts as a container-level boundary for dashboards. If a folder is public, dashboard visibility still depends on the visibility settings of each individual dashboard.\n\n#### RCA\n\nThe RCA section provides job creation, queue monitoring, historical report lookup, ranked root causes, anomaly detection, topology views, causal analysis, and forecast/SLO views.\n\nThis area is functionally in place, but it still requires real production data for full validation and testing.\n\n## Important Security Model\n\nThere are three different security boundaries in this stack:\n\n1. User-to-application auth.\n   Watchdog handles login, sessions, permissions, API keys, and optional OIDC/Keycloak.\n\n2. Telemetry-ingest auth.\n   Gatekeeper validates `x-otlp-token` before Envoy forwards data to Loki, Tempo, or Mimir.\n\n3. Service-to-service auth.\n   Watchdog talks to Notifier and Resolver using dedicated service tokens and signed context JWTs.\n\n## Limits And Expectations\n\n- This workspace is well suited for local evaluation, demos, and homelab environments.\n- The installer is explicitly experimental.\n- The docs in this repository should be treated as the source of truth for this workspace, not older external deployment examples.\n- Empty environments will not produce useful RCA. 
Resolver needs enough logs, metrics, and traces to correlate signals.\n\n## Documentation\n\n- Detailed walkthrough: [User Guide](USER%20GUIDE.md)\n- Environment reference: [Example Environment File](.env.example)\n- Release deployment and hardening: [Deployment Guide](DEPLOYMENT.md)\n\n## License And Notices\n\nThis repository includes Apache 2.0 licensing and notice files in the root and service folders. Review them before redistribution or commercial use.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobservantio%2Fwatchdog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fobservantio%2Fwatchdog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobservantio%2Fwatchdog/lists"}