https://github.com/aurevlan/whatisup

Open-source uptime monitoring: multi-probe geographic correlation, browser scenario recorder, real-time dashboard, teams RBAC, SSO/OIDC, public status pages. One Docker command to deploy.
https://github.com/aurevlan/whatisup
alerting devops docker fastapi incident-management infrastructure-monitoring monitoring multi-probe open-source playwright python self-hosted slo ssl status-page uptime uptime-monitoring vuejs
Last synced: about 1 month ago
JSON representation
Open-source uptime monitoring: multi-probe geographic correlation, browser scenario recorder, real-time dashboard, teams RBAC, SSO/OIDC, public status pages. One Docker command to deploy.
Host: GitHub
URL: https://github.com/aurevlan/whatisup
Owner: AurevLan
Created: 2026-03-01T01:19:29.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-05-03T21:43:17.000Z (about 2 months ago)
Last Synced: 2026-05-03T22:27:54.459Z (about 2 months ago)
Topics: alerting, devops, docker, fastapi, incident-management, infrastructure-monitoring, monitoring, multi-probe, open-source, playwright, python, self-hosted, slo, ssl, status-page, uptime, uptime-monitoring, vuejs
Language: Python
Size: 2.2 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project

README

          
WhatIsUp




  The self-hosted uptime platform that actually tells you where things break — and stops shouting when it shouldn't.





  Multi-probe geographic correlation · real-time dashboard · SLO tracking · intelligent alerting · public status pages · mobile app.





  

   

  

  

  

  





  Quick start ·

  What's new in 1.5 ·

  Why WhatIsUp ·

  Features ·

  Architecture ·

  Changelog



---

## Why WhatIsUp

There's no shortage of uptime tools. WhatIsUp focuses on three things most of them don't do well at once:

- 🌍 **Real multi-probe correlation** — deploy lightweight probes in any datacenter, office, or region, and let WhatIsUp tell you if an outage is global, regional, or probe-local. One failed probe no longer means one false page.

- 🔕 **Alerting that shuts up** — flapping suppression, incident groups, dependency-aware cascade suppression, maintenance windows, storm protection, and a brand-new **impact preview** (see v1.1 below) so you calibrate thresholds with data instead of vibes.

- 🎛 **Self-hosted, batteries included** — one `docker compose up`, no SaaS lock-in, no per-monitor pricing. Playwright scenarios, SSO/OIDC, teams & RBAC, IaC import/export, and a mobile app all ship in the box.

It's built for teams who want Datadog-grade monitoring without Datadog-grade bills, and who'd rather own their data than rent it.

---

## What's new in 1.5

**Wave 1 Tier 1 complete** — eight ⭐ items shipped at once to close the SRE adoption + UX backlog. 100% backward-compatible, two additive migrations.

- ⌨️ **Command palette v2 + global shortcuts** (T1-10 / T1-15) — Cmd/Ctrl+K opens fuzzy search across monitors, incidents, recent items and actions. Hover a row to **pause/resume a monitor** or **acknowledge an incident** without leaving the palette. New keyboard shortcuts (`g d/m/i/a/p/s` for navigation, `c` to create, `/` to search, `?` for the cheatsheet) wired everywhere. Recent visits persist in `localStorage` (capped at 12).

- 🔕 **Programmable alert silences** (T1-01) — new `AlertSilence` resource and `Silences` page in the sidebar. Mute alerts for one monitor or all of them during a known-noisy window (cert renewal, deploy storm) without distorting uptime. Built-in 15 m / 1 h / 4 h / 1 d duration presets, status badges (Active / Scheduled / Past). Dispatch is short-circuited *before* any external IO.

- 📱 **Quick-ack & snooze from mobile push** (T1-04) — FCM payload now ships `actions: [ack, snooze_1h, snooze_4h]`. The mobile app calls the matching endpoint on tap, no need to open the UI. New `Incident.snooze_until` field with auto-clear on resolve. Bounded duration (5 min – 24 h).

- ✅ **Multi-select bulk actions enriched** (T1-12) — `MonitorsView` gets two new dropdowns: **Move to group** and **Add tag**. `IncidentsView` gains a per-row checkbox plus an **Acknowledge all** button — one round-trip via the new `POST /incidents/bulk-ack` endpoint.

- 🪄 **CreateMonitor wizard** (T1-14) — new 3-step flow (type → target → review + notifications) replaces the legacy modal for the four most common types (HTTP, TCP, DNS, heartbeat). Mobile-scrollable body. Falls back to the advanced form for scenario / composite / keyword / json_path.

- ✨ **Polish: skeletons, empty states, replayable tour** (T1-16 / T1-18) — generic `SkeletonBox` / `Row` / `Text` components replace `animate-pulse` placeholders on Dashboard, Monitors and MonitorDetail. Six empty states standardised with contextual CTAs and doc links. The onboarding wizard is now replayable from any empty state via `?tour=1`.

See the full [CHANGELOG](CHANGELOG.md#150---2026-04-25) for the per-item breakdown, including 14 new pytest cases and 50 new vitest cases (suite at 161 green).

---

## What's new in 1.4

- 🔗 **Shareable filter URLs** (T1-11) — MonitorsView and IncidentsView now persist their filters (search, status, type, group, days) via both the querystring *and* localStorage. Refresh keeps your view; copying the URL reproduces the exact same filtering for a teammate. A new generic `useFilterPreset` composable drives both views with 8 tests.

- 🌍 **User timezone preference** (T1-13) — New `User.timezone` field (IANA, nullable). Default `null` means the browser's resolved zone is used. Settings page gets a "Preferences" card with a 45-zone picker + auto option. Dates across IncidentsView and MonitorDetailView now format in your selected zone; hover a date to see the absolute ISO + zone tooltip. Backend validates against `zoneinfo.available_timezones()` — invalid zones are rejected with 422.

- 🆕 **`PATCH /auth/me`** — self-update endpoint limited to non-privileged fields (`full_name`, `timezone`). Escalation attempts are silently ignored.

See the full [CHANGELOG](CHANGELOG.md#140---2026-04-24) for details.

---

## What's new in 1.3

- 📖 **Per-monitor runbooks** — attach an incident response procedure (markdown) to any monitor. A dedicated `Runbook` tab appears on the detail page *only* when enabled, and the rendered content is shown inline on open incidents in the incidents list — so the on-caller sees the steps-to-take without leaving the page. Unchecking the toggle wipes the content server-side (no orphan data). Built-in safe markdown renderer (headings, task checkboxes, code blocks, http/https links only) — no extra dependency.

- ⚡ **Dashboard load-time — 8 s → < 200 ms** — the monitors list aggregated queries were scanning 1.6 M `check_results` rows on every request. Sparkline switched from a `row_number() OVER (PARTITION BY)` window function (7 s) to a `JOIN LATERAL ... LIMIT 20` (3 ms, index-only per monitor), plus a new BRIN index on `check_results.checked_at` for time-window aggregates (P95 288 ms → 75 ms, uptime bulk 147 ms → 30 ms).

- 🩹 **Stability fixes** — FastAPI `X-Forwarded-Proto` handling (no more HTTPS → HTTP redirect breakage behind nginx), `/monitors/graph` route order (was 422), sidebar menu click (vue-router 5 slot navigate rewrite), charts rendering (`apexchart` global registration), stricter CSP with external theme init, Cache-Control hardening on `/index.html`.

See the full [CHANGELOG](CHANGELOG.md#130---2026-04-23) for details.

---

## What's new in 1.1

  

    

    

  

  

    

      Alert matrix v2 — cards + impact preview


      The per-monitor alerting panel is now a stack of collapsible cards: one card per condition, coloured channel chips, an "Advanced" section that hides noise, and a live ≈ N / 30j badge that replays the last 30 days of data through your rules so you can calibrate thresholds before they page you.

    

    

      One-click alerting templates


      Apply a preset (Standard, Strict/Paging, Low noise) in a single click. Built-in templates ship seeded in the database and admins can create, edit and delete their own from a dedicated section in the Alerts page. Channels stay empty — you still decide where alerts fire.

    

  

  

    

  

  

    

      Monitor tags & tag-scoped RBAC


      Label monitors with env:prod, team:backend, tier:critical, whatever makes sense. Filter dashboards and lists by tag. Target alert rules at a tag (AlertRule.tag_selector) so one rule covers every monitor that carries it. Grant users view/edit/admin access scoped to a tag via UserTagPermission.

    

  

See the full [CHANGELOG](CHANGELOG.md#110---2026-04-14) for the complete list, including the removal of the never-implemented `uptime_below` condition.

---

## Screenshots

| Dashboard | Monitor detail |

|-----------|---------------|

| ![Dashboard](docs/screenshots/dashboard.svg) | ![Monitor detail](docs/screenshots/monitor-detail.svg) |

| Monitors list | Probe map |

|--------------|-----------|

| ![Monitors](docs/screenshots/monitors-view.svg) | ![Probes](docs/screenshots/probes-map.svg) |

| Public status page | Scenario builder |

|-------------------|-----------------|

| ![Status page](docs/screenshots/public-status.svg) | ![Scenario](docs/screenshots/scenario-builder.svg) |

| Alert matrix v2 | Alerting templates |

|-----------------|-------------------|

| ![Alert matrix](docs/screenshots/alert-matrix-cards.svg) | ![Templates](docs/screenshots/alert-templates.svg) |

| Tags & RBAC | Browser extension recorder |

|-------------|----------------------------|

| ![Tags](docs/screenshots/tags-rbac.svg) | ![Extension](docs/screenshots/extension-recorder.svg) |

---

## Features

### Monitoring

- **HTTP / HTTPS** — status codes, redirect following, response time, SSL certificate expiry

- **TCP** — port reachability (databases, SSH, SMTP, custom services)

- **UDP** — datagram probe; ICMP port-unreachable = down, timeout = filtered/open

- **DNS** — record resolution with optional value assertion (A, AAAA, CNAME, MX, TXT, NS); drift detection (baseline auto-learn); cross-probe consistency check with split-horizon support

- **Keyword** — response body scan with optional negate mode

- **JSON Path** — structured response validation (e.g. `$.status == "ok"`)

- **SMTP** — banner + EHLO handshake with optional STARTTLS; measures banner-to-ready time

- **Ping** — ICMP round-trip time via system `ping`

- **Domain expiry** — WHOIS lookup; configurable warning days before domain expiration

- **Browser scenarios** — multi-step Playwright automation (navigate, click, fill, assert, extract, screenshot) with Core Web Vitals (LCP, CLS, INP)

- **Composite monitors** — aggregate multiple monitors with `all_up`, `any_up`, `majority_up`, or `weighted_up` rules; drives the full incident pipeline

- **Heartbeat / cron monitoring** — dead-man's switch for scheduled jobs; unique ping URL per monitor

- **Advanced assertions** — regex body check, response header validation (exact or `/regex/`), JSON Schema validation

### Infrastructure

- **Multi-probe architecture** — deploy lightweight probe agents in any location; correlate outages geographically

- **Network type** — tag each probe as `external` (public internet) or `internal` (corporate LAN) to distinguish internal vs external failures

- **Probe map on dashboard** — Leaflet world map with per-probe 24h uptime (🟢 ≥ 99 % / 🟡 ≥ 90 % / 🔴 < 90 %) and online/offline status; auto-refreshes every 60 s

- **Network scope per monitor** — restrict each monitor to `all`, `internal`, or `external` probes; useful for LAN-only services

- **Probe groups** — admin-defined groups; assign probes and grant visibility to specific users

- **City / address geocoding** — type any address or city to auto-resolve GPS coordinates (Nominatim, no API key)

### Observability

- **Real-time dashboard** — WebSocket push, no polling

- **SLO / Error budget** — configurable target (%) and window (days); burn rate and budget-remaining tracking

- **SLA reports** — custom date range, uptime %, incident list, P95 response time; JSON download

- **Custom push metrics** — `POST /api/v1/metrics/{monitor_id}` for business KPIs (orders, latency…)

- **Annotations** — timestamped notes on the monitor timeline (deployments, changes)

- **Response time trend** — 6-hour rolling comparison with colour-coded indicator

### Incidents & alerting

- **Alert matrix v2 (1.1)** — card-based editor: one card per condition, coloured channel chips, repliable "Advanced" params (threshold, min-duration, re-notify, business-hours schedule), multi-select condition picker, and per-condition "How it works" help in plain language

- **Impact preview (1.1)** — live `≈ N / 30j` badge on each rule, computed server-side by replaying the proposed configuration against the last 30 days of check results and incidents (statistical tail estimate for anomaly detection)

- **Alerting templates (1.1)** — apply a preset (Standard, Strict/Paging, Low noise) in one click; templates are stored in DB and managed from a dedicated section in the Alerts page; superadmins create/edit their own, built-in templates are read-only

- **Automatic incident lifecycle** — open on failure, resolve on recovery, flapping detection with per-monitor thresholds

- **Incident groups** — monitors sharing the same failing probes within a 90 s window are grouped into one persistent incident group; one notification instead of N

- **Monitor dependencies** — when a parent monitor is down, child incidents are automatically suppressed; eliminates cascade alert storms

- **Alert storm protection** — per-rule rate cap (`storm_max_alerts` within `storm_window_seconds`); forced digest when threshold is exceeded

- **Performance baseline alerting** — alert when response time exceeds a configurable multiple of the 7-day rolling hourly baseline

- **Anomaly detection** — z-score against a 7-day rolling mean ± stddev, filtered to the same ±3 h window of the day so day/night traffic patterns are respected

- **Tag-scoped alert rules (1.1)** — target a single rule at every monitor carrying a given tag via `AlertRule.tag_selector`

- **Auto post-mortem** — Markdown report generated on incident resolution (timeline, alerts, metrics)

- **Alert channels** — Email (SMTP), Webhook (HMAC-SHA256), Telegram Bot, Slack, PagerDuty, Opsgenie, [Signal](#signal-alerts), FCM (native mobile push)

- **Persistent digest** — digest scheduling stored in Redis; survives server restarts

- **Maintenance windows** — suppress alerts during planned downtime; group-level suppression support

### Public status pages

- **Shareable URL** — `/status/{slug}`, no login required

- **90-day history bars** — daily uptime visualisation per component

- **Incident timeline** — 30-day incident log with duration

- **Email subscriptions** — visitors subscribe to outage updates; secure unsubscribe token

### Platform

- **Monitor tags & tag-scoped RBAC (1.1)** — label monitors with free-form `key:value` tags (`env:prod`, `team:backend`, `tier:critical`); filter lists and dashboards by tag; grant users `view`/`edit`/`admin` access scoped to a tag via `UserTagPermission`; one alert rule can target every monitor carrying a given tag

- **Teams & RBAC** — create teams, invite members with 4 roles (`owner` > `admin` > `editor` > `viewer`); monitors, groups, channels, and maintenance windows can be team-scoped; backward-compatible — single-user mode preserved when no teams are created

- **SSO / OIDC** — OpenID Connect PKCE flow; link user accounts to any OIDC provider (Keycloak, Authentik, Auth0, Google…); optional auto-provisioning of new accounts on first login; configured entirely from the admin GUI (no restart required)

- **Admin panel** — dedicated UI for user management (`is_active`, `can_create_monitors`), probe group access control, all-monitors view, and live OIDC settings

- **Probe groups** — admin-defined groups linking probes to users; regular users see only the probes assigned to their groups

- **Network scope** — per-monitor `network_scope` field (`all` / `internal` / `external`); restricts which probe types run each check (e.g. internal-only services stay on LAN probes)

- **Multi-language** — English (default) and French; toggle in the top bar; persisted to `localStorage`

- **Light / dark theme** — toggle in top bar; auto-detected from `prefers-color-scheme`; persisted to `localStorage`

- **Onboarding wizard** — guided 4-step setup for new users (first monitor, first alert); auto-dismissed after completion

- **Infrastructure-as-Code** — `GET /api/v1/config` exports full config as JSON; `PUT /api/v1/config` imports declaratively with diff, dry-run, and prune support; resources matched by name for idempotence

- **Plugin architecture** — check types and alert channels use a registry-based plugin system; extend without modifying core code

- **Bulk actions** — multi-select monitors; bulk enable / pause / delete / export CSV

- **Audit trail** — every admin action logged with before/after diff

- **Data retention** — configurable auto-purge of old check results (default: 90 days)

- **One-command deploy** — interactive wizard generates secrets, `.env`, and starts the stack

- **Accessibility** — `prefers-reduced-motion` support, skip-to-content link, ARIA labels on interactive elements

### Browser extension — scenario recorder

The WhatIsUp Chrome extension records browser actions and sends them directly to a monitor:

1. Click **Start recording** in the extension popup

2. Navigate and interact with any website — clicks, form fills (including passwords), and navigations are captured automatically

3. Click **Stop** then **Send to WhatIsUp** — the scenario is created as a monitor in one click

**Security**: password values are stored as `{{password_N}}` placeholders in the step list; the real values are kept in a separate encrypted store, encrypted at rest with Fernet, and masked in all API responses. They are decrypted only when delivered to the probe at check time.

Install the extension from `extension/` by loading it as an unpacked extension in Chrome (`chrome://extensions → Load unpacked`).

---

## Minimum requirements

### Central server (API + frontend + PostgreSQL + Redis)

| Probes | Monitors | CPU | RAM | Disk | PostgreSQL | Redis |

|--------|----------|-----|-----|------|------------|-------|

| 1–3 | ≤ 50 | 2 vCPU | 2 GB | 20 GB SSD | shared (in-stack) | shared (in-stack) |

| 3–10 | 50–200 | 4 vCPU | 4 GB | 40 GB SSD | shared or dedicated | shared |

| 10–30 | 200–1 000 | 4–8 vCPU | 8 GB | 80 GB SSD | dedicated (4 GB RAM) | dedicated (1 GB) |

| 30+ | 1 000+ | 8+ vCPU | 16 GB | 160 GB+ SSD | dedicated (8 GB+ RAM) | dedicated (2 GB+) |

**Disk growth** — each check result row is ~300 bytes. With 200 monitors × 60 s interval × 5 probes, expect ~2.5 GB/month in PostgreSQL before retention purge (default: 90 days).

### Probe agent

| Mode | CPU | RAM | Notes |

|------|-----|-----|-------|

| HTTP / TCP / DNS / Ping only | 1 vCPU | 256 MB | Lightweight; runs on any VPS or Raspberry Pi |

| With Playwright scenarios | 2 vCPU | 1 GB | Chromium loaded on demand; set `MAX_CONCURRENT_SCENARIOS=2` |

| High-volume (100+ monitors) | 2 vCPU | 1–2 GB | Increase `MAX_CONCURRENT_CHECKS` (default: 10) |

### Network

| Component | Ports | Protocol |

|-----------|-------|----------|

| Central server (prod) | 80, 443 | HTTP/S (Nginx reverse proxy) |

| Central server (dev) | 5173 (frontend), 8000 (API) | HTTP |

| PostgreSQL | 5432 | TCP (internal only) |

| Redis | 6379 | TCP (internal only) |

| Probe → Server | 443 (or 8000 dev) | HTTPS outbound only |

### Software

- Docker ≥ 24 and Docker Compose v2

- Linux amd64 or arm64 (all images are multi-arch)

---

## Quick start

### Requirements

- Docker ≥ 24 and Docker Compose v2

- 2 GB RAM minimum (see [Minimum requirements](#minimum-requirements) for sizing)

- Ports 80 / 443 available (production) or 5173 / 8000 (development)

### Development (local)

```bash

git clone https://github.com/AurevLan/WhatIsUp.git

cd whatisup

# Start all services (PostgreSQL, Redis, API, frontend, local probe)

docker compose up -d

# Wait for all services to become healthy

docker compose ps

```

| Service | URL |

|---------|-----|

| Frontend (Vite dev server) | http://localhost:5173 |

| API (FastAPI) | http://localhost:8000 |

| API docs (Swagger UI) | http://localhost:8000/docs |

On first start an **admin account** and a **local probe** are created automatically. The admin password is written to `/shared/ADMIN_PASSWORD` inside the server container:

```bash

docker compose exec server cat /shared/ADMIN_PASSWORD

# Delete the file after reading

docker compose exec server rm /shared/ADMIN_PASSWORD

```

### Production deploy

> **Recommended** — use the interactive wizard for all deployments:

```bash

bash deploy.sh

```

The wizard generates secrets, writes `.env`, starts the stack, and **displays the admin password on screen** before securely deleting the temp file. See [`deploy.sh`](#deploying-with-deploysh) below for details.

#### Manual production setup

```bash

# 1. Copy and edit the environment file

cp .env.example .env

# 2. Generate required secrets

SECRET_KEY=$(openssl rand -hex 32)

FERNET_KEY=$(python3 -c \

  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")

# Add to .env

echo "SECRET_KEY=$SECRET_KEY" >> .env

echo "FERNET_KEY=$FERNET_KEY" >> .env

# 3. Start the production stack

docker compose -f docker-compose.prod.yml up -d

# 4. Apply database migrations

docker compose -f docker-compose.prod.yml exec server alembic upgrade head

```

#### Environment variables

| Variable | Required | Default | Description |

|----------|----------|---------|-------------|

| `SECRET_KEY` | ✅ prod | — | JWT signing key (`openssl rand -hex 32`) |

| `FERNET_KEY` | ✅ prod | — | Fernet key for encrypting alert secrets at rest |

| `DATABASE_URL` | ✅ | `postgresql+asyncpg://whatisup:whatisup@localhost/whatisup` | PostgreSQL connection string |

| `REDIS_URL` | — | `redis://localhost:6379/0` | Redis connection string |

| `CORS_ALLOWED_ORIGINS` | ✅ prod | `http://localhost:5173` | Comma-separated HTTPS origins |

| `ENVIRONMENT` | — | `production` | Set to `development` to relax security checks |

| `REGISTRATION_OPEN` | — | `true` | `false` = invite-only after first user |

| `DATA_RETENTION_DAYS` | — | `90` | Days to keep check results (0 = keep forever) |

| `SMTP_HOST` | — | `localhost` | SMTP server for email alerts |

| `SMTP_PORT` | — | `587` | SMTP port |

| `SMTP_USER` | — | — | SMTP username |

| `SMTP_PASSWORD` | — | — | SMTP password |

| `SMTP_FROM` | — | `noreply@example.com` | Sender address |

| `OIDC_ENABLED` | — | `false` | Enable OIDC login (can also be set from admin GUI) |

| `OIDC_ISSUER_URL` | — | — | OIDC provider discovery URL (e.g. `https://accounts.google.com`) |

| `OIDC_CLIENT_ID` | — | — | Client ID registered with the OIDC provider |

| `OIDC_CLIENT_SECRET` | — | — | Client secret (stored encrypted in DB when set from admin GUI) |

| `OIDC_REDIRECT_URI` | — | — | Callback URL (leave empty to auto-detect from request base URL) |

| `OIDC_SCOPES` | — | `openid email profile` | Space-separated OIDC scopes |

| `OIDC_AUTO_PROVISION` | — | `true` | Create user accounts on first OIDC login |

---

## Deploying probe agents

Probes are lightweight Python processes that run checks from a given location and report results to the central server. Deploy as many as you need in different datacenters, offices, or cloud regions.

### 1. Register the probe

Go to **Probes → Register probe** in the UI:

1. Enter a **name** (e.g. `paris-dc1`) and **location** (any address, city, or landmark)

2. Click **Locate** — Nominatim resolves the location to GPS coordinates automatically

3. Choose **Network type**: `External` (public internet) or `Internal` (corporate LAN)

4. Save — copy the API key displayed **only once**

### 2. Run the probe

```bash

docker run -d \

  --name whatisup-probe \

  --restart unless-stopped \

  -e CENTRAL_URL=https://your-whatisup.example.com \

  -e PROBE_API_KEY=wiu_your_api_key_here \

  -e PROBE_LOCATION="Paris DC1" \

  ghcr.io/your-org/whatisup-probe:latest

```

Or with Docker Compose:

```yaml

# docker-compose.probe.yml

services:

  probe:

    image: ghcr.io/your-org/whatisup-probe:latest

    restart: unless-stopped

    environment:

      CENTRAL_URL: https://your-whatisup.example.com

      PROBE_API_KEY: wiu_your_api_key_here

      PROBE_LOCATION: "Paris DC1"

      MAX_CONCURRENT_CHECKS: "10"

      HEARTBEAT_INTERVAL: "15"

```

### Probe environment variables

| Variable | Required | Default | Description |

|----------|----------|---------|-------------|

| `CENTRAL_URL` | ✅ | — | WhatIsUp server base URL |

| `PROBE_API_KEY` | ✅ | — | API key from probe registration |

| `PROBE_LOCATION` | — | `unknown` | Display name in the UI |

| `MAX_CONCURRENT_CHECKS` | — | `10` | Max parallel checks |

| `MAX_CONCURRENT_SCENARIOS` | — | `2` | Max concurrent Playwright/Chromium instances (subset of `MAX_CONCURRENT_CHECKS`; reduce on low-memory machines) |

| `HEARTBEAT_INTERVAL` | — | `15` | Seconds between server heartbeats |

---

## Signal alerts

WhatIsUp sends Signal messages through a small REST gateway that runs alongside the server — it does not talk to Signal directly. The gateway project is [**bbernhard/signal-cli-rest-api**](https://github.com/bbernhard/signal-cli-rest-api), a maintained wrapper around the official `signal-cli`.

### 1. Run the gateway

Add a service to your `docker-compose.yml`:

```yaml

signal-api:

  image: bbernhard/signal-cli-rest-api:latest

  restart: unless-stopped

  environment:

    - MODE=normal

  volumes:

    - ./signal-data:/home/.local/share/signal-cli

  ports:

    - "8080:8080"

```

### 2. Register a phone number

Follow the [gateway's README](https://github.com/bbernhard/signal-cli-rest-api#register-a-number). Typical flow:

```bash

# Request the SMS code

curl -X POST "http://localhost:8080/v1/register/+33612345678"

# Enter the code you received

curl -X POST "http://localhost:8080/v1/register/+33612345678/verify/123456"

```

### 3. Add a Signal channel in WhatIsUp

In the UI: **Alerts → Add channel → Signal**, then fill:

| Field | Example |

|---|---|

| **API URL** | `http://signal-api:8080` (internal hostname if the gateway is in the same Compose network) |

| **Sender number** | `+33612345678` (E.164 format, the number you registered above) |

| **Recipients** | `+33612345678, +33698765432` (comma-separated; Signal group IDs are also accepted as recipients) |

Click **Test** to send a confirmation message. The channel configuration (`api_url`, `sender_number`, `recipients`) is encrypted at rest with Fernet like every other alert channel.

Implementation: [`server/whatisup/services/channels/signal.py`](server/whatisup/services/channels/signal.py).

---

## Heartbeat monitoring (cron jobs)

Create a monitor of type **Heartbeat**, copy the generated ping URL, then call it from your job:

```bash

# In your crontab or CI pipeline

curl -s https://your-whatisup.example.com/api/v1/ping/your-heartbeat-slug

```

WhatIsUp opens an incident automatically if no ping arrives within `interval + grace` seconds.

---

## Custom push metrics

Push any numeric metric from your application and visualise it alongside uptime data:

```bash

curl -X POST https://your-whatisup.example.com/api/v1/metrics/{monitor_id} \

  -H "Authorization: Bearer $TOKEN" \

  -H "Content-Type: application/json" \

  -d '{"metric_name": "orders_per_minute", "value": 42.5, "unit": "req/min"}'

```

Metrics appear as time-series graphs grouped by `metric_name` in the monitor detail view.

---

## API reference

Full interactive documentation at `/docs` (Swagger UI) and `/redoc`.

### Authentication

```bash

TOKEN=$(curl -s -X POST https://your-whatisup.example.com/api/v1/auth/login \

  -H "Content-Type: application/x-www-form-urlencoded" \

  -d "username=admin@example.com&password=your_password" \

  | jq -r '.access_token')

curl https://your-whatisup.example.com/api/v1/monitors/ \

  -H "Authorization: Bearer $TOKEN"

```

### Selected endpoints

| Method | Endpoint | Description |

|--------|----------|-------------|

| `GET` | `/api/v1/monitors/` | List monitors |

| `POST` | `/api/v1/monitors/` | Create monitor |

| `POST` | `/api/v1/monitors/bulk` | Bulk enable / pause / delete |

| `POST` | `/api/v1/monitors/{id}/trigger-check` | Trigger immediate check |

| `GET` | `/api/v1/monitors/{id}/slo` | SLO / error budget status |

| `GET` | `/api/v1/monitors/{id}/report` | SLA report (custom date range) |

| `GET` | `/api/v1/monitors/{id}/incidents/{inc}/postmortem` | Auto post-mortem (Markdown) |

| `GET` | `/api/v1/monitors/{id}/annotations` | List timeline annotations |

| `POST` | `/api/v1/metrics/{monitor_id}` | Push custom metric |

| `GET` | `/api/v1/metrics/{monitor_id}` | List custom metrics |

| `GET` | `/api/v1/public/pages/{slug}/monitors` | Public status page data (no auth) |

| `POST` | `/api/v1/public/pages/{slug}/subscribe` | Subscribe to status page |

| `GET` | `/api/v1/ping/{slug}` | Heartbeat ping |

| `GET` | `/api/v1/config/` | Export full config (IaC) |

| `PUT` | `/api/v1/config/` | Import declarative config (IaC) |

| `POST` | `/api/v1/teams/` | Create team |

| `GET` | `/api/v1/teams/` | List user's teams |

| `POST` | `/api/v1/teams/{id}/members` | Add team member |

| `GET` | `/api/v1/onboarding/status` | Onboarding progress |

| `POST` | `/api/v1/onboarding/complete` | Mark onboarding done |

| `GET` | `/api/v1/status/monitors` | External status API |

---

## Architecture

```

┌─────────────────────────────────────────────────────────┐

│                        Browser                           │

│   Vue 3 · Pinia · Vite · Tailwind · ApexCharts · Leaflet│

│   vue-i18n (EN / FR)                                    │

└───────────────────────┬─────────────────────────────────┘

                        │ HTTP + WebSocket

┌───────────────────────▼─────────────────────────────────┐

│                    FastAPI server                         │

│  auth · monitors · probes · alerts · metrics · ws        │

│  slowapi · structlog · Alembic · Prometheus metrics      │

└─────┬──────────────────┬──────────────────┬─────────────┘

      │                  │                  │

┌─────▼──────┐  ┌────────▼──────┐  ┌───────▼───────────┐

│ PostgreSQL │  │     Redis     │  │   Probe agent(s)  │

│  (main DB) │  │ cache · pub/  │  │  APScheduler      │

│            │  │ sub · rate    │  │  Playwright        │

└────────────┘  └───────────────┘  └───────────────────┘

```

| Layer | Location |

|-------|----------|

| API endpoints | `server/whatisup/api/v1/` |

| ORM models | `server/whatisup/models/` |

| Pydantic schemas | `server/whatisup/schemas/` |

| Business logic | `server/whatisup/services/` |

| Core (config, security, db) | `server/whatisup/core/` |

| Probe agent | `probe/whatisup_probe/` |

| Frontend | `frontend/src/` |

---

## Development

### Tests & linting

```bash

# Backend (server + probe)

cd server && pip install -e ".[dev]" && pytest

cd probe && pip install -e ".[dev]" && pytest

ruff check . && ruff format .

pip-audit

# Frontend (Vitest + jsdom)

cd frontend

npm install

npm test

npm run lint

npm audit

```

Tests also run inside Docker:

```bash

docker compose run --rm --no-deps server pytest tests/

docker compose run --rm --no-deps probe pytest tests/

docker run --rm -v ./frontend:/app -w /app node:25-alpine npx vitest run

```

### Database migrations

```bash

cd server

# Generate after model changes

alembic revision --autogenerate -m "short description"

# Apply

alembic upgrade head

# Rollback one step

alembic downgrade -1

```

---

## Deploying with `deploy.sh`

The root `deploy.sh` script is an interactive wizard (in French) that handles the entire production setup. Run it with:

```bash

bash deploy.sh

```

### Deployment modes

| Mode | Description |

|------|-------------|

| **1 — Serveur + sonde centrale** | Full platform with a local probe (recommended for single-server setups) |

| **2 — Serveur seul** | Server only; add remote probes later |

| **3 — Sonde distante** | Standalone probe agent that auto-enrolls to an existing server via API |

### What the wizard does

1. **Checks dependencies** — Docker, Docker Compose, `curl`, `openssl`

2. **Generates secrets** — `SECRET_KEY` (hex), `FERNET_KEY` (Fernet), PostgreSQL and Redis passwords

3. **Prompts for configuration** — domain name, SMTP settings, DNS servers (for probe modes), Let's Encrypt email

4. **Generates `.env` files** — `.env` for the server stack, `.env.probe` for remote probe mode; file permissions set to `600`

5. **Self-signed certificate** — generates a temporary TLS cert if Let's Encrypt is not configured

6. **Probe auto-enrollment** (mode 3) — registers the probe via `POST /api/v1/probes/register` and writes the API key to `.env.probe`

7. **Starts the stack** — builds and launches Docker Compose services

8. **Displays credentials** — reads the admin password from a temp file, displays it in a framed box, then deletes the file from the container (first boot only)

> **Tip**: for Let's Encrypt, ensure port 80 is reachable from the internet and set your DNS A record before running the wizard.

---

## Security

- **JWT** — HS256, access 15 min + refresh 7 days, Redis-revocable

- **OIDC / SSO** — PKCE authorization-code flow; `oidc_client_secret` encrypted at rest with Fernet; secret never returned by the API

- **Probe auth** — `X-Probe-Api-Key` bcrypt 12 rounds + Redis cache 300 s

- **WebSocket auth** — JSON message frame (`{"type":"auth","token":"…"}`), never URL parameter

- **Secrets at rest** — Fernet encryption for alert channel secrets (SMTP passwords, Telegram tokens, webhook secrets, PagerDuty / Opsgenie keys), OIDC client secret, **and** scenario variables (`secret: true`); `FERNET_KEY` is required in production (server refuses to start without it)

- **SSRF protection** — all outbound HTTP requests (webhooks, OIDC discovery, probe checks, scenario navigation) validated against private/loopback/link-local IP ranges; redirect targets re-validated after following

- **CORS** — explicit origins only; HTTP origins rejected in production

- **CSP** — `default-src 'self'; script-src 'self'`

- **Rate limiting** — all mutating endpoints rate-limited (30/min PATCH/DELETE, 60/min public pages); login 10/min, register 5/min, heartbeat 30/min, results 60/min, monitor creation 10/min

- **Input validation** — Pydantic schemas use `extra="forbid"` to reject unexpected fields on all create/update endpoints

- **WebSocket** — per-IP connection limit enforced before the auth handshake; public slug validated against DB before accepting

- **Ownership enforcement** — all mutating endpoints (including alert rule delete) verify resource ownership via JOIN; superadmin bypass is explicit

- **Docker** — non-root user in all images; CPU/memory resource limits in production

See [SECURITY.md](SECURITY.md) for the responsible disclosure policy.

---

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for the full version history.

## License

MIT — see [LICENSE](LICENSE).
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/aurevlan/whatisup

Awesome Lists containing this project

README

WhatIsUp