https://github.com/fluential/ttr.rip
A simple health check monitoring service.
- Host: GitHub
- URL: https://github.com/fluential/ttr.rip
- Owner: fluential
- Created: 2025-09-19T23:40:56.000Z (10 months ago)
- Default Branch: dev
- Last Pushed: 2025-09-20T02:00:21.000Z (9 months ago)
- Last Synced: 2025-10-10T04:26:25.982Z (9 months ago)
- Language: Python
- Size: 470 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
## Authentication
### How Authentication Works (Overall Flow)
The application has two distinct authentication mechanisms:
1. API Authentication (JWT-based): This is the primary, secure method used by the frontend JavaScript to communicate with the backend API (/api/v1/*).
2. Web UI Authentication (Cookie-based): This is a simpler mechanism used only to control access to the HTML dashboard page itself.

The flow for the API is as follows:
1. When the dashboard page (/) loads, the JavaScript in dashboard.html immediately calls the loginAndGetToken() function.
2. This function sends a POST request to /api/v1/token with the hardcoded credentials username: 'admin' and password: 'password'.
3. The /api/v1/token endpoint (in app/api/v1/endpoints/login.py) verifies these credentials against the user in the database.
4. If the credentials are correct, it generates a JSON Web Token (JWT) and returns it to the browser.
5. The JavaScript stores this JWT in a variable (apiToken).
6. For all subsequent API requests (like fetching or creating checks), the JavaScript includes this token in the Authorization header, like so: Authorization: Bearer <token>.
7. The API endpoints for checks (in app/api/v1/endpoints/checks.py) are protected and use a dependency to validate this token on every request.
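The flow above can be sketched from a script's point of view. This is a minimal illustration using only the standard library, not the dashboard's actual JavaScript; the BASE address and the helper names are assumptions, while the form-encoded body and the access_token field follow the curl examples later in this README.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8000"  # assumed local dev address


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST /api/v1/token request with a form-encoded body."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(f"{BASE}/api/v1/token", data=body, method="POST")


def build_api_request(path: str, token: str) -> urllib.request.Request:
    """Attach the JWT as a Bearer token for a protected API call."""
    return urllib.request.Request(
        f"{BASE}{path}", headers={"Authorization": f"Bearer {token}"}
    )


def login_and_get_token(username: str, password: str) -> str:
    """Send the login request and return access_token (needs a running server)."""
    with urllib.request.urlopen(build_token_request(username, password)) as resp:
        return json.load(resp)["access_token"]
```

Only login_and_get_token touches the network; the two builders can be inspected offline to confirm the headers and body a client would send.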
### How JWT is Used and Validated
JWT Creation:
- The creation happens in app/security.py within the create_access_token function.
- When a user logs in successfully via the /api/v1/token endpoint, this function is called.
- It creates a Python dictionary (the "payload") containing the user's username (as sub, a standard JWT claim for "subject") and an expiration timestamp (exp).
- It then uses the jose.jwt.encode() method to sign this payload. The signing process uses the SECRET_KEY and the ALGORITHM (HS256) defined in app/core/config.py.
- The result is the compact, signed JWT string that is sent back to the client.

JWT Validation:
- Validation happens in app/security.py inside the get_current_user function, which acts as a FastAPI dependency.
- Protected API endpoints, like read_checks in app/api/v1/endpoints/checks.py, include this function in their signature: current_user: db_models.User = Depends(security.get_current_user).
- FastAPI automatically extracts the token from the Authorization: Bearer ... header.
- The get_current_user function then uses jose.jwt.decode() to verify and decode the token. This process uses the same SECRET_KEY and ALGORITHM to check the token's signature and ensure it hasn't been tampered with. It also automatically checks if the token has expired.
- If the token is valid, the function extracts the username from the payload, fetches the corresponding user from the database, and returns the user object.
- If the token is invalid, expired, or the signature doesn't match, a 401 Unauthorized HTTP exception is raised, and the request is denied.
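To make the creation and validation steps concrete, here is a standard-library sketch of what HS256 signing and verification involve. It is not the project's code, which uses jose.jwt.encode() and jose.jwt.decode(); the function names sign_token and verify_token and the 900-second lifetime are assumptions chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = "a_very_secret_key"  # development default from app/core/config.py


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, key: str) -> str:
    digest = hmac.new(key.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def sign_token(username: str, ttl_seconds: int = 900, key: str = SECRET_KEY) -> str:
    """Build header.payload.signature with 'sub' and 'exp' claims."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": username, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', key)}"


def verify_token(token: str, key: str = SECRET_KEY) -> dict:
    """Return the claims, or raise ValueError for a bad or expired token."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    # Recompute the HMAC with the shared secret and compare in constant time.
    if not hmac.compare_digest(signature, _sign(f"{header}.{payload}", key)):
        raise ValueError("signature mismatch")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

Because the same secret both signs and verifies, anyone who learns SECRET_KEY can mint valid tokens, which is why the development default should never reach production.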
### Where the Validation Keys are Stored
The application uses a symmetric algorithm (HS256), which means it uses a single secret key for both signing and validating tokens, not a public/private key pair.
This secret key is managed in app/core/config.py:
```python
# app/core/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # ...
    SECRET_KEY: str = "a_very_secret_key"  # development default
    ALGORITHM: str = "HS256"
    # ...
    model_config = SettingsConfigDict(env_file=".env")


settings = Settings()
```
Settings are loaded from environment variables. SECRET_KEY defaults to "a_very_secret_key" for development and is intended to be overridden in production, either by setting a SECRET_KEY environment variable or by placing it in a .env file, as shown in .env.example.
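One way to make that override harder to forget is a startup guard. The sketch below is hypothetical and not part of the repository: load_secret_key, generate_secret_key, and the production flag are illustrative names, and only the standard library is used.

```python
import os
import secrets

DEFAULT_SECRET_KEY = "a_very_secret_key"  # development default shown above


def load_secret_key(environ=os.environ, production=False) -> str:
    """Read SECRET_KEY from the environment; refuse the dev default in production."""
    key = environ.get("SECRET_KEY", DEFAULT_SECRET_KEY)
    if production and key == DEFAULT_SECRET_KEY:
        raise RuntimeError("SECRET_KEY must be overridden in production")
    return key


def generate_secret_key() -> str:
    """Produce a random URL-safe string suitable for a .env entry."""
    return secrets.token_urlsafe(32)
```

Running generate_secret_key() once and pasting the result into .env as SECRET_KEY=... is enough for a single-node deployment.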
# ttr.rip — Simple, resilient health‑check monitoring
ttr.rip is a FastAPI-based uptime and job monitoring service with anonymous key-based access, a clean, themeable UI, and robust Redis/Celery-powered background processing. It’s easy to run on a single node and durable enough for production, featuring adaptive rate control, flapping suppression, and Prometheus metrics out of the box.
- Elegant web UI with multiple themes (Cyberpunk, Retro, Blueprint, Terminal, Solarized, Arcade)
- Anonymous “access key” login, optional “Login with Telegram”
- Public status pages with shareable badges
- Telegram, Slack, Discord, and generic Webhook notifications
- Adaptive rate control (AIMD + backoff) and flapping detection
- Prometheus metrics and operational summary APIs
- Docker-first deployment (Postgres, Redis, web, worker, beat, Caddy)
---
## Table of contents
- [Features](#features)
- [Screenshots](#screenshots)
- [Quickstart (Docker)](#quickstart-docker)
- [Configuration](#configuration)
- [Running locally (without Docker)](#running-locally-without-docker)
- [Concepts & Architecture](#concepts--architecture)
- [API overview](#api-overview)
- [Admin & Security model](#admin--security-model)
- [Observability & Metrics](#observability--metrics)
- [Background processing](#background-processing)
- [Cleanup & Lifecycle](#cleanup--lifecycle)
- [Import/Export](#importexport)
- [Development](#development)
- [Roadmap](#roadmap)
- [License](#license)
---
## Features
User & Auth
- Anonymous key-based access via cookie or X-Auth-Key header
- Optional “Login with Telegram”
- Per-user slug for clean ping URLs and public pages
- CSRF protection for forms and APIs (Double Submit Cookie pattern)
Checks
- Scheduling: interval, cron, or systemd OnCalendar
- Durable status in DB: up / down / new, with last_ping, last_start, last_duration_seconds
- Deadlines and grace windows computed/persisted in DB (reliable overdue detection)
- Optional content validation (present/absent or regex) on ping payloads
- Pause/resume with correct counters and metrics updates
- Cursor-based pagination, ETag’d list/aggregate responses
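As a rough illustration of the deadline and grace-window idea for interval checks, the sketch below computes when a check becomes overdue. It assumes the deadline is simply last_ping plus interval plus grace; the function names are hypothetical and the real scheduler (which also handles cron and OnCalendar) may differ.

```python
from datetime import datetime, timedelta, timezone


def compute_deadline(last_ping: datetime, interval_seconds: int, grace_seconds: int) -> datetime:
    """Latest moment the next ping may arrive before the check counts as overdue."""
    return last_ping + timedelta(seconds=interval_seconds + grace_seconds)


def is_overdue(deadline: datetime, now: datetime) -> bool:
    """A check is overdue once 'now' has passed the stored deadline."""
    return now > deadline


# Example: a 60 s interval with a 30 s grace window.
last = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
deadline = compute_deadline(last, interval_seconds=60, grace_seconds=30)
```

Persisting the computed deadline means overdue detection becomes a simple comparison against the current time, with no need to re-derive the schedule on every scan.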
Integrations & Alerts
- Telegram, Slack, Discord, and generic Webhook
- Adaptive global rate control (AIMD + exponential backoff), cross-worker via Redis
- Flapping detection with suppression windows
- Test flows: immediate send or queue-based
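Flapping detection with a suppression window can be pictured roughly as follows. This is a hypothetical in-memory sketch, not the logic in app/services/alerting.py; the class name, thresholds, and the choice of a sliding window over state transitions are all assumptions.

```python
from collections import deque


class FlapDetector:
    """Suppress alerts when a check changes state too often within a window."""

    def __init__(self, max_transitions=4, window_seconds=600, suppress_seconds=900):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self.suppress_seconds = suppress_seconds
        self.transitions = deque()  # timestamps of recent up/down changes
        self.suppressed_until = 0.0

    def should_alert(self, now: float) -> bool:
        """Record a state change at 'now' and decide whether to notify."""
        self.transitions.append(now)
        # Forget transitions that fell out of the sliding window.
        while self.transitions and self.transitions[0] <= now - self.window_seconds:
            self.transitions.popleft()
        if now < self.suppressed_until:
            return False
        if len(self.transitions) >= self.max_transitions:
            self.suppressed_until = now + self.suppress_seconds
            return False
        return True
```

In a multi-worker deployment the transition history and the suppression deadline would need to live in shared storage such as Redis rather than in process memory.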
Status Pages
- Public pages under /s/{user_slug}/{page_slug}
- Layouts: cards, grid, timeline
- Safe “recent activity” with country code/name and connection hints
- Badge endpoint: /p/{user_slug}/{check_id_or_slug}/badge.svg
UI & Theming
- Multiple themes, light/dark mode, persisted user preference
- Compact, accessible dashboard with inline actions and quick copy
- Real-time feel with periodic refresh, countdowns, and subtle glow indicators
Observability
- Prometheus metrics (/metrics, admin-only)
- Summary API (/api/v1/metrics/summary) with ETag caching
- Cross-worker latency aggregation in Redis
- Worker heartbeats and “workers online” gauge
Performance & Resilience
- Redis-backed runtime hints (e.g., last_content, recent pings)
- Buffered Redis HINCRBY with coalesced flushes
- Fail-open design on metrics and cache paths
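The buffered-increment idea can be sketched without Redis: increments to the same hash field are summed in memory and written out in one batch. The class below is a hypothetical illustration; the flush callable stands in for a Redis HINCRBY call, and the timer-driven flush is omitted.

```python
from collections import defaultdict


class IncrBuffer:
    """Coalesce hash-field increments in memory and flush them in one batch."""

    def __init__(self, flush, max_ops=100):
        self.flush_fn = flush  # callable(key, field, amount), e.g. an HINCRBY wrapper
        self.max_ops = max_ops
        self.pending = defaultdict(int)
        self.ops = 0

    def incr(self, key, field, amount=1):
        """Add to the pending total; flush once max_ops increments have accumulated."""
        self.pending[(key, field)] += amount
        self.ops += 1
        if self.ops >= self.max_ops:
            self.flush()

    def flush(self):
        """Write one coalesced increment per (key, field) and reset the buffer."""
        pending, self.pending, self.ops = self.pending, defaultdict(int), 0
        for (key, field), amount in pending.items():
            try:
                self.flush_fn(key, field, amount)
            except Exception:
                pass  # fail open: losing a metric must not break the caller
```

Swallowing flush errors mirrors the fail-open stance listed above, at the cost of occasionally dropping counter updates.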
Maintenance
- Periodic cleanup of long-inactive checks/users (configurable)
- Alembic migrations
- Import/export checks as JSON
---
## Screenshots
- Dashboard: user checks, status counters, pagination, quick actions
- Integrations: per-check settings and live rate snapshots
- Public status pages: cards/grid/timeline views
(See app/web/templates and app/static/css/themes for layouts and styles.)
---
## Quickstart (Docker)
Requirements:
- Docker and docker-compose
- A valid Fernet ENCRYPTION_KEY (32 url-safe base64-encoded bytes)
1) Prepare environment
- Copy the defaults and edit as needed:
cp .env.example .env
- Generate a Fernet key and set ENCRYPTION_KEY in .env:
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
2) Start the stack
- Bring up Postgres, Redis, web, worker, beat, and Caddy:
docker-compose up -d --build
3) Access
- Via Caddy (recommended): http://localhost:8080
- Direct FastAPI (dev): http://localhost:8000
- Optional: Cloudflare Tunnel
  - Set CLOUDFLARED_TOKEN in .env to attach to a named Tunnel (hostnames managed under Cloudflare Zero Trust → Tunnels → Public Hostnames, e.g. status.example.com → http://caddy:8080).
  - Leave CLOUDFLARED_TOKEN empty to start a temporary Quick Tunnel (random trycloudflare.com URL).
  - cloudflared forwards to http://caddy:8080 by default; override with CLOUDFLARED_URL. Extra flags via CLOUDFLARED_OPTS.
  - Compatibility: you can also set CLOUDFLARE_TUNNEL_TOKEN (alias used by community examples); compose prefers CLOUDFLARED_TOKEN, then CLOUDFLARE_TUNNEL_TOKEN.
4) Get a key
- Click “Get a New Key” to obtain an access key, then go to /dashboard.
- Your key is stored in an HttpOnly cookie; keep it safe.
Admin UI
- Create an admin user (see “Admin & Security model”).
- Visit http://localhost:8080/admin/login
Stop & logs
- Stop: docker-compose down
- Logs: docker-compose logs -f web (or worker/beat)
---
## Configuration
Settings live in app/core/config.py and are overridden by .env (see .env.example):
Core
- DATABASE_URL: DB DSN (Postgres recommended)
- REDIS_URL: Redis broker/result and cache
- SECRET_KEY: JWT signing secret
- ENCRYPTION_KEY: Fernet key (required) for encrypting user secrets
- DEBUG_MODE: If true, Celery tasks run eagerly (no broker required)
Scheduling & Worker
- SCHEDULER_INTERVAL_SECONDS
- AUTO_START_EMBEDDED_WORKER, WORKER_CONCURRENCY
- EMBED_BEAT (use worker -B or separate beat service)
Redis Counters Buffer
- INCR_BUFFER_ENABLED, INCR_BUFFER_FLUSH_INTERVAL_MS, INCR_BUFFER_MAX_OPS
Telegram
- TELEGRAM_AUTH_ENABLED, TELEGRAM_BOT_NAME, TELEGRAM_BOT_TOKEN
Cleanup
- CLEANUP_ENABLED, CLEANUP_INACTIVE_DAYS, CLEANUP_INTERVAL_HOURS
Security & CSP
- USER_SLUG_ENABLED
- XAUTH_ENFORCE_ORIGIN, XAUTH_ENFORCE_IP
- CSP_USER_DASHBOARD (Content-Security-Policy for user-facing pages)
GeoIP & Logging
- SAVE_CHECK_LAST_LOGS
- GEOIP_DATABASE_PATH (optional)
- LOG_LEVEL, UVICORN_LOG_LEVEL
Most features degrade gracefully if Redis is absent or DEBUG_MODE is enabled.
---
## Running locally (without Docker)
1) Install dependencies
- Python 3.11+
- Postgres and Redis running
- pip install -r requirements.txt
2) Configure environment
- cp .env.example .env
- Set ENCRYPTION_KEY (Fernet key)
3) Migrate DB
- alembic upgrade head
4) Run services
- API (dev): uvicorn app.main:app --reload
- Worker: celery -A app.worker.celery_app worker --loglevel=info -P solo
- Beat (if not embedded): celery -A app.worker.celery_app beat --loglevel=info
Open http://localhost:8000
---
## Concepts & Architecture
- FastAPI application (app/main.py) serving:
  - Public pages (/, /dashboard, /check/{id}/integrations)
  - Public status pages (/s/{user_slug}/{page_slug})
  - REST APIs under /api/v1
  - Admin SPA endpoints (/admin/*)
- Database (SQLAlchemy + Alembic): Users, Checks, Tags, StatusPages
- Redis:
  - Celery broker/result backend
  - Runtime cache (recent pings, last content)
  - Global counters and latency aggregation
  - Worker heartbeats
- Celery worker (app/worker.py):
  - Notification tasks (Telegram/Slack/Discord/Webhook)
  - Periodic overdue/long-running detection (Beat)
- Adaptive rate control (app/services/rate_control.py):
  - AIMD refill, min drip, exponential backoff
  - Per-identity state (e.g., sha256(token)[:10])
- Alerting (app/services/alerting.py):
  - Flapping detection with suppression TTL
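The adaptive rate control listed above combines two familiar ideas: AIMD (raise the rate slowly on success, cut it sharply on a rate-limit response) and exponential backoff between retries. The sketch below shows both in isolation; it is not the code in app/services/rate_control.py, and the class name, default parameters, and the use of a hex digest for the identity are assumptions.

```python
import hashlib


def identity_for(token: str) -> str:
    """Short, non-reversible key for per-identity state: sha256(token)[:10]."""
    return hashlib.sha256(token.encode()).hexdigest()[:10]


class AimdRate:
    """Additive-increase / multiplicative-decrease rate with exponential backoff."""

    def __init__(self, rate=1.0, min_rate=0.1, max_rate=30.0, step=0.5,
                 factor=0.5, base_backoff=1.0, max_backoff=300.0):
        self.rate, self.min_rate, self.max_rate = rate, min_rate, max_rate
        self.step, self.factor = step, factor
        self.base_backoff, self.max_backoff = base_backoff, max_backoff
        self.failures = 0

    def on_success(self) -> None:
        """Probe upward slowly and clear the backoff streak."""
        self.failures = 0
        self.rate = min(self.max_rate, self.rate + self.step)

    def on_rate_limited(self) -> float:
        """Cut the rate sharply and return how long to wait before retrying."""
        self.failures += 1
        # min_rate plays the role of a "min drip": sending never stops entirely.
        self.rate = max(self.min_rate, self.rate * self.factor)
        return min(self.max_backoff, self.base_backoff * 2 ** (self.failures - 1))
```

Keying state by a truncated hash of the bot token lets several checks that share one token share one budget without storing the token itself in the key.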
Data flow examples
- Ping endpoint (/p/{user_slug}/{check_identifier}):
  - Optionally logs geo-hints and UA to Redis
  - Validates content, updates DB state and deadlines
  - Schedules notifications (with adaptive rate + retries)
- Metrics:
  - /metrics for Prometheus
  - /api/v1/metrics/summary for UI (ETag-cached JSON)
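The content-validation step in the ping flow (present/absent or regex, as listed under Features) might look like the following. The function name and the mode strings are assumptions made for this sketch; the project's own field names may differ.

```python
import re


def validate_content(body: str, mode: str, pattern: str) -> bool:
    """Return True when the ping payload satisfies the rule.

    mode: "present" (substring must appear), "absent" (substring must not
    appear) or "regex" (pattern must match somewhere in the body).
    """
    if mode == "present":
        return pattern in body
    if mode == "absent":
        return pattern not in body
    if mode == "regex":
        return re.search(pattern, body) is not None
    raise ValueError(f"unknown validation mode: {mode}")
```

For example, a backup job could post its log tail and be marked healthy only when the body does not contain "ERROR".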
---
## API overview
Public (X-Auth-Key)
- GET /api/v1/checks
  - Query params: size, sort_by, sort_direction, cursor, tag
  - ETag’d responses, cursor pagination
- GET /api/v1/checks/aggregate
  - One-call dashboard aggregate (checks + stats + metrics + tags)
- GET /api/v1/checks/stats
- POST /api/v1/checks
- PUT /api/v1/checks/{check_id}
- DELETE /api/v1/checks/{check_id}
- PUT /api/v1/checks/{id}/{integration} (telegram|slack|discord|webhook)
- POST /api/v1/checks/{id}/{integration}/test (immediate)
- POST /api/v1/checks/{id}/{integration}/test-queue (enqueue)
- GET /api/v1/checks/{id}/{integration}/rate (live rate snapshot)
- Status pages: /api/v1/status-pages (CRUD)
- Import/Export:
  - GET /api/v1/checks/export
  - POST /api/v1/checks/import
- Pings:
  - /p/{user_slug}/{check_identifier} (GET/POST)
  - /p/{user_slug}/{check_identifier}/start
  - /p/{user_slug}/{check_identifier}/fail
  - /p/{user_slug}/{check_identifier}/badge.svg
Admin (JWT)
- GET /api/v1/admin/stats
- GET /metrics (Prometheus, admin-only)
Public Pages
- GET /s/{user_slug}/{page_slug}
- GET /s/{user_slug}/{page_slug}/data
---
## Admin & Security model
- Admin users:
  - Username/password -> short-lived access token (Bearer) + HttpOnly refresh cookie
  - SPA flow with token refresh (CSRF-protected)
- Public users:
  - X-Auth-Key via cookie/header
  - Optional Telegram binding
  - Blacklisting: keys can be temporarily blacklisted in Redis
- CSRF:
  - Double Submit Cookie pattern for forms/APIs
- CSP:
  - Strict CSP applied to user-facing pages (configurable)
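The Double Submit Cookie pattern mentioned above boils down to issuing a random token as a cookie and requiring the client to echo it back in a header or form field. A minimal sketch of the server-side check follows; the function names are hypothetical, and where the echoed value travels (header versus form field) is left open.

```python
import hmac
import secrets


def issue_csrf_token() -> str:
    """Random value set as a cookie and echoed back by the client."""
    return secrets.token_urlsafe(32)


def csrf_ok(cookie_token, submitted_token) -> bool:
    """Accept the request only if both copies exist and match (constant-time compare)."""
    if not cookie_token or not submitted_token:
        return False
    return hmac.compare_digest(cookie_token, submitted_token)
```

The check works because a cross-site page can cause the browser to send the cookie but cannot read it, so it cannot supply the matching second copy.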
---
## Observability & Metrics
- Prometheus endpoint: /metrics (admin-only)
- Metrics summary API: /api/v1/metrics/summary
  - Totals (checks, users, notifications)
  - Average latencies (API/DB/Redis/queue), queue depth
  - Health colors for quick at-a-glance
- Redis-based cross-worker aggregation for accurate averages
- Worker heartbeats in Redis to compute “workers online”
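Accurate cross-worker averages generally require combining sums and sample counts rather than averaging per-worker averages. The sketch below illustrates that arithmetic with a plain dict standing in for the shared Redis state; the function and field names are assumptions.

```python
def merge_latency(totals: dict, worker_sum_ms: float, worker_count: int) -> None:
    """Add one worker's running sum and sample count to the shared totals."""
    totals["sum_ms"] = totals.get("sum_ms", 0.0) + worker_sum_ms
    totals["count"] = totals.get("count", 0) + worker_count


def average_latency(totals: dict) -> float:
    """Global average; 0.0 when no samples have been recorded."""
    count = totals.get("count", 0)
    return totals.get("sum_ms", 0.0) / count if count else 0.0
```

With one worker reporting a single 100 ms sample and another reporting nine samples totalling 200 ms, the merged average is 30 ms, whereas averaging the two per-worker averages would give roughly 61 ms.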
---
## Background processing
- Celery worker tasks:
  - Notifications with retries and RateLimitedError handling
  - Overdue checks and long-running detection (Beat every few seconds)
- Heartbeats:
  - metrics:workers_online:set + per-worker TTL keys
- Eager mode in DEBUG (no broker required)
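The heartbeat scheme (a set of worker IDs plus per-worker keys that expire) can be modelled in memory to show how a "workers online" figure falls out of it. This is a hypothetical stand-in for the Redis structures, with an assumed 30-second TTL.

```python
class HeartbeatRegistry:
    """In-memory stand-in for a Redis set of worker IDs plus per-worker TTL keys."""

    def __init__(self, ttl_seconds=30.0):
        self.ttl_seconds = ttl_seconds
        self.last_seen = {}  # worker_id -> timestamp of last heartbeat

    def beat(self, worker_id: str, now: float) -> None:
        """Record a heartbeat, which refreshes the worker's TTL."""
        self.last_seen[worker_id] = now

    def workers_online(self, now: float) -> int:
        """Drop workers whose heartbeat expired, then count the rest."""
        self.last_seen = {
            w: t for w, t in self.last_seen.items() if now - t < self.ttl_seconds
        }
        return len(self.last_seen)
```

A worker that crashes simply stops refreshing its entry and disappears from the count after one TTL, with no explicit deregistration needed.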
---
## Cleanup & Lifecycle
- Periodic cleanup (app/tasks/cleanup.py) when enabled:
  - Deletes long-inactive checks
  - Deletes users without active checks and no Telegram linkage
  - Best-effort Redis cleanup of related keys
- Manual run:
  - python app/commands/cleanup_cmd.py
---
## Import/Export
- Export:
  - GET /api/v1/checks/export → JSON list (includes integration flags/urls)
- Import:
  - POST /api/v1/checks/import → accepts same format
  - Secrets are re-encrypted using the current user’s auth key
---
## Development
- Stack:
  - FastAPI, SQLAlchemy (async), Alembic
  - Redis asyncio client with pooled connections
  - Celery (Redis broker/result), Prometheus client
- Useful entry points:
  - app/main.py (FastAPI app, routes mounting)
  - app/api/v1/endpoints/* (REST endpoints)
  - app/web/* (templates and routes)
  - app/services/* (notifications, alerting, rate control, queue stats)
  - app/worker.py (Celery config, periodic tasks)
  - app/db/models.py (ORM models)
- Local dev:
  - uvicorn app.main:app --reload
  - celery -A app.worker.celery_app worker --loglevel=info -P solo
  - celery -A app.worker.celery_app beat --loglevel=info
- Code style:
  - Use your preferred formatters/linters (e.g., black/ruff/mypy)
---
## Roadmap
- More integrations (email/SMS gateways)
- Quotas/rate-plans and richer admin controls
- Secret backends (KMS/HSM adapters)
- Multi-region setups and sharding options
- Deeper analytics and dashboards
---
## License
Licensed under the terms of the LICENSE file in this repository.
---
## Developer quickstart: curl and API usage
This section shows how to interact with ttr.rip over HTTP using curl. You can use these patterns to build simple scripts or SDKs.
Environment setup
- BASE is the base URL for your deployment.
- AUTH_KEY is your anonymous access key (from the UI “Get a New Key” or your cookie).
- ADMIN_TOKEN is a short‑lived JWT for admin APIs.
```bash
# Public base URL (examples assume local dev)
BASE=http://localhost:8000
# Public auth: use your X‑Auth‑Key for public endpoints
# Replace with your actual key (32 url-safe chars); do not share it publicly.
AUTH_KEY="YOUR_PUBLIC_AUTH_KEY"
# Admin auth: exchange username/password for a JWT
ADMIN_TOKEN=$(curl -s -X POST -d "username=admin&password=password" "$BASE/api/v1/token" | jq -r '.access_token')
```
Notes
- Public APIs: send an X-Auth-Key: <key> header.
- Admin APIs: send an Authorization: Bearer <token> header.
- Time fields are ISO 8601 (UTC). Status values: up | down | new | paused.
### Checks API (public)
List checks (paginated):
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks?size=10&sort_by=id&sort_direction=desc" | jq .
```
Create a check (interval schedule):
```bash
curl -s -X POST "$BASE/api/v1/checks" \
-H "X-Auth-Key: $AUTH_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"My Job","schedule_type":"interval","interval_seconds":60,"grace_seconds":30}' | jq .
```
Update a check:
```bash
curl -s -X PUT "$BASE/api/v1/checks/123" \
-H "X-Auth-Key: $AUTH_KEY" \
-H "Content-Type: application/json" \
-d '{"name":"My Job (renamed)","schedule_type":"interval","interval_seconds":120,"grace_seconds":30}' | jq .
```
Delete a check:
```bash
curl -s -X DELETE -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/123" -i
```
Export all checks:
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" -H "Accept: application/json" "$BASE/api/v1/checks/export" -o ttr_rip_checks_export.json
```
Import checks (from a file produced by export):
```bash
curl -s -X POST "$BASE/api/v1/checks/import" \
-H "X-Auth-Key: $AUTH_KEY" \
-F "file=@ttr_rip_checks_export.json" | jq .
```
Get last content captured for a check:
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/123/content" | jq .
```
Toggle pause:
```bash
curl -s -X POST -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/123/toggle-pause" | jq .
```
Check slug availability:
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/slug-check?slug=my-slug" | jq .
```
Tags for your checks:
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/tags" | jq .
```
User stats (counts, averages):
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/stats" | jq .
```
### Integrations per check (public)
Update Telegram settings:
```bash
curl -s -X PUT "$BASE/api/v1/checks/123/telegram" \
-H "X-Auth-Key: $AUTH_KEY" \
-H "Content-Type: application/json" \
-d '{"telegram_enabled":true,"telegram_chat_id":"123456789","telegram_bot_token":"1234:abcd"}' | jq .
```
Send test immediately / via queue:
```bash
curl -s -X POST -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/123/telegram/test" | jq .
curl -s -X POST -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/123/telegram/test-queue" | jq .
```
Live rate snapshot (AIMD/backoff):
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/checks/123/telegram/rate" | jq .
```
Slack/Discord/Webhook endpoints are analogous:
- PUT /api/v1/checks/{id}/slack
- PUT /api/v1/checks/{id}/discord
- PUT /api/v1/checks/{id}/webhook
- POST /api/v1/checks/{id}/{integration}/test
- POST /api/v1/checks/{id}/{integration}/test-queue
- GET /api/v1/checks/{id}/{integration}/rate
### Status pages (public)
List pages:
```bash
curl -s -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/status-pages" | jq .
```
Create/update/delete:
```bash
curl -s -X POST "$BASE/api/v1/status-pages" \
-H "X-Auth-Key: $AUTH_KEY" -H "Content-Type: application/json" \
-d '{"name":"Prod","slug":"prod","check_ids":[1,2,3]}' | jq .
curl -s -X PUT "$BASE/api/v1/status-pages/10" \
-H "X-Auth-Key: $AUTH_KEY" -H "Content-Type: application/json" \
-d '{"name":"Prod","slug":"prod","check_ids":[1,3]}' | jq .
curl -s -X DELETE -H "X-Auth-Key: $AUTH_KEY" "$BASE/api/v1/status-pages/10" -i
```
Public page and data feed:
```bash
# HTML
curl -s "$BASE/s/{user_slug}/{page_slug}" -i
# JSON feed (etagged, 2s buckets)
curl -s "$BASE/s/{user_slug}/{page_slug}/data" | jq .
```
### Pings and badges (public)
Send a ping to your check:
```bash
# GET-based ping
curl -s "$BASE/p/{user_slug}/{check_identifier}?ok=1"
# POST payload ping
curl -s -X POST "$BASE/p/{user_slug}/{check_identifier}" \
-H "Content-Type: text/plain" \
--data-binary 'hello from cron'
```
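For jobs written in Python, the same pings can be sent from a small wrapper that signals start, success, and failure. This is a sketch under assumptions: the BASE address and helper names are illustrative, and it assumes the /start and /fail endpoints accept a plain GET, which this README does not state explicitly.

```python
import urllib.request

BASE = "http://localhost:8000"  # assumed local dev address


def ping_url(user_slug: str, check_identifier: str, event: str = "") -> str:
    """Build /p/{user_slug}/{check_identifier} with an optional /start or /fail suffix."""
    url = f"{BASE}/p/{user_slug}/{check_identifier}"
    return f"{url}/{event}" if event else url


def send_ping(url: str, body=None) -> None:
    """GET the URL, or POST when a bytes body is given (needs a running server)."""
    headers = {"Content-Type": "text/plain"} if body is not None else {}
    request = urllib.request.Request(url, data=body, headers=headers)
    urllib.request.urlopen(request, timeout=10).close()


def run_monitored(job, user_slug: str, check_identifier: str, send=send_ping):
    """Signal start, run the job, then report success or failure."""
    send(ping_url(user_slug, check_identifier, "start"))
    try:
        result = job()
    except Exception:
        send(ping_url(user_slug, check_identifier, "fail"))
        raise
    send(ping_url(user_slug, check_identifier))
    return result
```

Passing a different send callable makes the wrapper easy to test without a server, for example by collecting the URLs in a list.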
Badge:
```bash
curl -s "$BASE/p/{user_slug}/{check_identifier}/badge.svg" -o badge.svg
```
### Admin APIs
Exchange credentials for a JWT:
```bash
ADMIN_TOKEN=$(curl -s -X POST -d "username=admin&password=password" "$BASE/api/v1/token" | jq -r '.access_token')
```
System stats:
```bash
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" "$BASE/api/v1/admin/stats" | jq .
```
Prometheus metrics (admin-only):
```bash
curl -s -H "Authorization: Bearer $ADMIN_TOKEN" "$BASE/metrics"
```
Operational metrics summary (public read):
```bash
curl -s "$BASE/api/v1/metrics/summary" | jq .
```
### SDK tips
- Authentication
  - Public: X-Auth-Key in header; cookie is used by the web UI but not required for APIs.
  - Admin: Authorization: Bearer <token>.
- IDs vs slugs
  - Checks can be addressed by numeric ID in APIs, and by slug or UUID in ping URLs.
- Rate control
  - Notification senders are throttled with AIMD/backoff; 429s are handled internally. Rate snapshots expose state you can surface to users.
- ETags and caching
  - Many list endpoints provide weak ETags with short max-age to balance freshness and load.
- Error handling
  - Validation errors return 400 with a detail message; missing resources return 404; unauthorized returns 401.
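Pulling these tips together, a thin SDK could start as a request builder that adds the X-Auth-Key header and replays stored ETags. The class below is a hypothetical sketch using only the standard library; TtrClient and its methods are not part of the project, and it assumes the server honours If-None-Match on its ETag'd endpoints.

```python
import urllib.parse
import urllib.request


class TtrClient:
    """Tiny request builder for the public API; sending is left to the caller."""

    def __init__(self, base: str, auth_key: str):
        self.base = base.rstrip("/")
        self.auth_key = auth_key
        self.etags = {}  # url -> last ETag seen, for conditional GETs

    def request(self, method: str, path: str, params=None, data=None) -> urllib.request.Request:
        """Build a request with the auth header and, for GETs, a stored ETag."""
        url = f"{self.base}{path}"
        if params:
            url += "?" + urllib.parse.urlencode(params)
        headers = {"X-Auth-Key": self.auth_key}
        if method == "GET" and url in self.etags:
            headers["If-None-Match"] = self.etags[url]  # server may answer 304
        return urllib.request.Request(url, data=data, headers=headers, method=method)

    def remember_etag(self, url: str, etag) -> None:
        """Store the ETag response header after a successful GET."""
        if etag:
            self.etags[url] = etag

    def list_checks(self, size=10, cursor=None) -> urllib.request.Request:
        """GET /api/v1/checks with optional cursor pagination."""
        params = {"size": size}
        if cursor:
            params["cursor"] = cursor
        return self.request("GET", "/api/v1/checks", params)
```

Pass the returned Request to urllib.request.urlopen, call remember_etag with the response's ETag header, and treat a 304 reply as "reuse the cached body".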