{"id":48129140,"url":"https://github.com/krystall9/docker-surgeon","last_synced_at":"2026-07-02T02:01:58.295Z","repository":{"id":320220286,"uuid":"1081261352","full_name":"kRYstall9/docker-surgeon","owner":"kRYstall9","description":"Monitor and restart unhealthy, killed, or stopped Docker containers according to a user-defined restart policy, including any dependent containers.","archived":false,"fork":false,"pushed_at":"2026-06-30T00:09:09.000Z","size":1447,"stargazers_count":59,"open_issues_count":1,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-30T02:10:36.454Z","etag":null,"topics":["docker","docker-compose","docker-container","self-hosted","selfhosted"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kRYstall9.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"kRYstall9","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":"kRYstall9","thanks_dev":null,"custom":null}},"created_at":"2025-10-22T14:30:50.000Z","updated_at":"2026-06-13T16:24:06.000Z","dependencies_parsed_at":"2025-10-22T17:17:34.864Z","dependency_job_id":"7a63bf21-7797-447b-9e88-4e204ab90947","html_url":"https://github.com/kRYstall9/docker-surgeon","commit_stats":null,"previous_names":["krystall9/docker-surgeon"]
,"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/kRYstall9/docker-surgeon","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kRYstall9%2Fdocker-surgeon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kRYstall9%2Fdocker-surgeon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kRYstall9%2Fdocker-surgeon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kRYstall9%2Fdocker-surgeon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kRYstall9","download_url":"https://codeload.github.com/kRYstall9/docker-surgeon/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kRYstall9%2Fdocker-surgeon/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35029796,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-02T02:00:06.368Z","response_time":173,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","docker-container","self-hosted","selfhosted"],"created_at":"2026-04-04T16:36:24.850Z","updated_at":"2026-07-02T02:01:58.289Z","avatar_url":"https://github.com/kRYstall9.png","language":"Python","funding_links":["https://github.com/sponsors/kRYstall9","https://buymeacoffee.com/kRYstall9"],"categories":[],
"sub_categories":[],"readme":"# Docker Surgeon\nA Python service that monitors Docker containers in real time and automatically restarts them based on customizable rules, including any dependent containers.\nIdeal for environments where high availability matters and zombie containers are not welcome at the party.\n\n## ✨ Key Features\n- Monitors Docker events in real time.\n- Automatically restarts containers that are unhealthy or have unexpectedly exited.\n- Supports a restart policy configurable via environment variables.\n- Handles container dependencies using labels (`com.monitor.depends.on`).\n- Detailed, timezone-aware logging.\n- Supports container exclusion from restart policies.\n- Supports real-time [notifications](#-notifications) through [Apprise](https://github.com/caronc/apprise).\n- **Multi-host support via Agents** — monitor and manage containers across multiple machines from a single server.\n\n## 🧭 How It Works\n\nThe service listens to Docker daemon events.\nWhen it detects that a container is in an unhealthy state or has exited with a non-excluded code, it restarts it.\nIf other containers depend on it (declared through labels), it restarts those too, in dependency order, using topological sorting.\n\nExample: `[db] --\u003e [backend] --\u003e [frontend]` \u003c/br\u003e\nIf `db` goes down, the service will restart `db`, then `backend`, and finally `frontend`.\n\n## 🤖 Agents (Multi-host Support)\n\nDocker Surgeon supports a distributed mode where a central **server** manages multiple remote **agents**, each running on a different machine. 
This allows you to monitor and control containers across your entire infrastructure from a single point.\n\n### Architecture\n\n```\n[Server] ──HTTP/HTTPS──▶ [Agent A - Machine 1]\n         ──HTTP/HTTPS──▶ [Agent B - Machine 2]\n         ──HTTP/HTTPS──▶ [Agent C - Machine 3]\n```\n\n- The **server** runs the dashboard and the monitoring logic, and communicates with all configured agents.\n- Each **agent** runs on a remote machine, exposes a secured REST API, and has access to the local Docker daemon via the Docker socket.\n\n### Running an Agent\n\nOn each remote machine, run the agent with Docker:\n\n```yaml\n# docker-compose.yml (on the remote machine)\nservices:\n  agent:\n    image: krystall0/docker-surgeon:latest\n    container_name: docker-surgeon-agent\n    command: agent\n    ports:\n      - \"8001:8001\"\n    restart: unless-stopped\n    volumes:\n      - /var/run/docker.sock:/var/run/docker.sock\n    environment:\n      - AGENT_HOST=0.0.0.0\n      - AGENT_PORT=8001\n      - AGENT_TOKEN=yoursecuretokenhere\n```\n\n### Registering Agents on the Server\n\nOn the server, configure agents via the `AGENTS_CONFIG` environment variable as a JSON array:\n\n```env\nAGENTS_CONFIG='[\n  {\n    \"name\": \"machine-1\",\n    \"host\": \"192.168.1.50\",\n    \"port\": 8001,\n    \"token\": \"yoursecuretokenhere\"\n  },\n  {\n    \"name\": \"machine-2\",\n    \"host\": \"https://agent.example.com\",\n    \"port\": 443,\n    \"token\": \"anothersecuretoken\",\n    \"verify_ssl\": true\n  }\n]'\n```\n\nEach agent entry supports the following fields:\n\n| Field | Required | Default | Description |\n|---|---|---|---|\n| `host` | ✅ | — | IP address or domain of the agent. Prefix with `https://` for TLS. |\n| `name` | ❌ | `null` | Friendly name for the agent (used in logs). |\n| `port` | ❌ | `80` / `443` | Port the agent listens on. Defaults to `443` if `host` starts with `https://`. |\n| `token` | ❌ | `null` | Bearer token used to authenticate with the agent. 
Must match `AGENT_TOKEN` on the agent. |\n| `verify_ssl` | ❌ | `true` | Whether to verify the agent's SSL certificate. Set to `false` for self-signed certificates on internal networks. |\n\n### Agent Environment Variables\n\n| Variable | Default | Description |\n|---|---|---|\n| `AGENT_HOST` | `127.0.0.1` | Address the agent binds to. Use `0.0.0.0` to accept remote connections. |\n| `AGENT_PORT` | `8001` | Port the agent listens on. |\n| `AGENT_TOKEN` | `null` | Secret token required to authenticate incoming requests. |\n\n### Security Considerations\n\n- Always set `AGENT_TOKEN` on every agent. Without it, the API is open to anyone who can reach the port.\n- If the agent is exposed to the internet, place it behind a reverse proxy (Nginx, Caddy) with HTTPS enabled.\n- For internal networks, HTTP with a strong token is generally sufficient.\n- If using a self-signed certificate, set `\"verify_ssl\": false` in that agent's `AGENTS_CONFIG` entry on the server.\n\n---\n\n
## 🧪 Environment Variables\nConfiguration is handled through a `.env` file in the project root.\nHere's an example:\n\n```\n# Restart policy in JSON format\n# - excludedContainers: containers that are never restarted. List several as [\"container1\", \"container2\"]\n# - codesToExclude: exit codes that do not trigger a restart. List several as [0, 143]\nRESTART_POLICY='{\n    \"excludedContainers\": [\"container_name\"],\n    \"statuses\": {\n        \"exited\": {\n            \"codesToExclude\": [0]\n        }\n    }\n}'\n\nENABLE_DASHBOARD=True #-\u003e Possible values [True | False]\nLOGS_AMOUNT=10 #-\u003e This will display the last n logs on the dashboard to clearly indicate the issue that triggered the restart policy\nDASHBOARD_ADDRESS=0.0.0.0 #-\u003e Possible values [0.0.0.0 | 127.0.0.1]\nDASHBOARD_PORT=8000 #-\u003e Possible values [ Any free port ]\nADMIN_PASSWORD=\nENABLE_NOTIFICATIONS=True #-\u003e Possible values [True | False]\nNOTIFICATION_URLS='[\"url1\", \"url2\"]' #-\u003e Check https://github.com/caronc/apprise/wiki#notification-services\nNOTIFICATION_TITLE=\"\" #-\u003e Edit the notification title as you wish\nNOTIFICATION_BODY=\"\" #-\u003e Edit the notification body as you wish\n\n\n###############\n#   LOGGING   #\n###############\n\n# --- Log Level ---\n# Set the verbosity of logs. Options: \"error\", \"warn\", \"info\", \"debug\"\n# Default: info\nLOG_LEVEL=info\n\n# --- Log Timezone ---\n# Adjust the timezone used for logging\n# e.g. Europe/Rome, America/New_York\nLOG_TIMEZONE=UTC\n\nAGENTS_CONFIG='[{\"name\": \"my-server\", \"host\": \"192.168.1.50\", \"port\": 8001, \"token\": \"secret\"}]'\n\n# Address the agent binds to.\n# Use 127.0.0.1 to restrict access when the agent runs on the same machine as the main application, or 0.0.0.0 to accept remote connections.\n# Possible values [127.0.0.1 | 0.0.0.0]\n# Default: 127.0.0.1\nAGENT_HOST=127.0.0.1\n\n# Port on which the agent listens for incoming requests. Make sure it is a free port.\n# Possible values [ Any free port ]\n# Default: 8001\nAGENT_PORT=8001\n\n# This is the token that the agent will use to authenticate incoming requests. 
Make sure to set this to a strong, unique value.\n# Default: None\nAGENT_TOKEN=yourtoken\n\n```\n\n### RESTART_POLICY\n\nDefines which containers to ignore and which states should trigger a restart.\n\n- `excludedContainers`: list of containers that should never be restarted.\n- `statuses`:\n    - `exited` → restart if the container exited with a non-excluded code.\n        - `codesToExclude`: list of exit codes that should *not* trigger a restart. See common container exit codes [here](https://komodor.com/learn/exit-codes-in-containers-and-kubernetes-the-complete-guide/).\n\n\n### LOG_LEVEL\n\nControls log verbosity.\u003c/br\u003e\nSupported values: `error`, `warn`, `info`, `debug`.\u003c/br\u003e\nDefault: `info`.\n\n### LOG_TIMEZONE\n\nSets the timezone used in logs.\u003c/br\u003e\nMust be a valid pytz timezone.\u003c/br\u003e\nExamples: `UTC`, `Europe/Rome`, `America/New_York`.\u003c/br\u003e\nDefault: `UTC`\n\nSee the list of valid timezones [here](https://gist.github.com/heyalexej/8bf688fd67d7199be4a1682b3eec7568).\n\n### ENABLE_DASHBOARD\nEnables or disables the web dashboard.\u003c/br\u003e\nDefault: `False`\n\n### LOGS_AMOUNT\nNumber of most recent log entries captured from a container when it is restarted. They are shown on the dashboard and included in notifications.\n\nDefault: `10`\n\n### DASHBOARD_ADDRESS\nNetwork interface the dashboard binds to:\n- `127.0.0.1` -\u003e local access only\n- `0.0.0.0` -\u003e all interfaces (accessible from the LAN)\n\nDefault: `0.0.0.0`\n\n### DASHBOARD_PORT\nPort on which the dashboard is served.\u003c/br\u003e\nDefault: `8000`\n\n### ADMIN_PASSWORD\nPassword for accessing the dashboard.\nThree formats are supported:\n- **Plain text**\n  - ADMIN_PASSWORD=r4nd0mP4ssW0rD\n- [**Bcrypt**](https://bcrypt-generator.com/)\n  - ADMIN_PASSWORD=$2a$12$9s8F...\n- [**Argon2**](https://argon2.online/)\n  - ADMIN_PASSWORD=$argon2id$v=19$m=65536,t=3,p=4$...\n\nThe system automatically detects whether the value is plain text, bcrypt, or Argon2.\u003c/br\u003e\nIf you want a strong random password 
(plain text), you can generate one with `openssl rand -hex 32`. *This produces a plain-text password, not a hash.*\n\n### ENABLE_NOTIFICATIONS\nEnables or disables real-time notifications.\u003c/br\u003e\nSupported values: `True` | `False`\u003c/br\u003e\nDefault: `False`\u003c/br\u003e\nSee the [Notifications](#-notifications) section for more details.\n\n### NOTIFICATION_URLS\nA JSON-formatted list of notification endpoints, as documented in the [Apprise URL specification](https://github.com/caronc/apprise/wiki).\u003c/br\u003e\nExpected syntax: `'[\"url1\", \"url2\"]'`\u003c/br\u003e\n⚠️ *This must be valid JSON: use double quotes inside the list.*\n\n### NOTIFICATION_TITLE\nThe title template for notifications.\u003c/br\u003e\nSupports placeholders and emoji.\u003c/br\u003e\nDefault: `'⚠️ [{agent_name}] {container_name} crashed'`\n\nSupported placeholders:\n- {container_name}\n- {logs}\n- {exit_code}\n- {n_logs}\n- {agent_name}\n\n### NOTIFICATION_BODY\nThe body template for notifications.\u003c/br\u003e\nSupports placeholders, multiline text (\\n), and Markdown formatting.\u003c/br\u003e\nIcons/emoji may **not** be supported, depending on the provider.\u003c/br\u003e\nDefault: ```'`exit code`: `{exit_code}`\\nLast {n_logs} logs of `{container_name}`: {logs}'```\n\nSupported placeholders:\n- {container_name}\n- {logs}\n- {exit_code}\n- {n_logs}\n\n\n## 🔐 Authentication Flow\n1. The user submits their password to `/auth/login`.\n2. The server validates it in this order:\n    - Argon2 verification\n    - bcrypt `checkpw`\n    - direct comparison (plain text)\n3. If the password is valid, a JWT is created and stored in an **HttpOnly cookie**.\n4. 
Protected routes require this cookie to be present and valid.\n\n## 🔗 Managing Container Dependencies\n\nYou can define container dependencies using the label `com.monitor.depends.on`.\u003c/br\u003e\nWhen a parent container is restarted, its dependent containers will be restarted too, in the correct order.\n\nExample `docker-compose.yml`:\n\n```\nservices:\n    db:\n        image: postgres\n        container_name: db\n\n    backend:\n        image: my-backend\n        container_name: backend\n        labels:\n        - \"com.monitor.depends.on=db\"\n\n    frontend:\n        image: my-frontend\n        container_name: frontend\n        labels:\n        - \"com.monitor.depends.on=backend\"\n\n    docker-surgeon:\n        image: docker-surgeon-image\n        container_name: docker-surgeon\n        volumes:\n            - /var/run/docker.sock:/var/run/docker.sock\n        env_file:\n            - path/to/.env\n```\n\nIn this setup:\u003c/br\u003e\nIf `db` crashes → `db`, `backend`, and `frontend` will be restarted in order.\u003c/br\u003e\nIf `backend` crashes → `backend` and `frontend` will be restarted.\u003c/br\u003e\nIf `frontend` crashes → only `frontend` will be restarted.\n\nA container can depend on several containers: list their names in the label, separated by commas, e.g. `com.monitor.depends.on=backend,frontend,db`\n\n## 🚀 Quick Start\n```\n# Mounting /your/path/data keeps data persistent (recommended if the dashboard is enabled)\ndocker run -d \\\n  --name docker-surgeon \\\n  -v /var/run/docker.sock:/var/run/docker.sock \\\n  -v /your/path/data:/app/app/data \\\n  -v $(pwd)/.env:/app/.env \\\n  krystall0/docker-surgeon:latest\n```\n\nYou can also override environment variables directly:\n```\n# Mounting /your/path/data keeps data persistent (recommended if the dashboard is enabled)\ndocker run -d \\\n  --name docker-surgeon \\\n  -v /var/run/docker.sock:/var/run/docker.sock \\\n  -v /your/path/data:/app/app/data \\\n  -e LOG_LEVEL=info \\\n  -e LOG_TIMEZONE=Europe/Rome \\\n  -e 
RESTART_POLICY='{\"excludedContainers\":[\"pihole\"],\"statuses\":{\"exited\":{\"codesToExclude\":[0]}}}' \\\n  krystall0/docker-surgeon:latest\n```\n\n### Example `docker-compose.yml`\n```\nversion: \"3.8\"\n\nservices:\n  docker-surgeon:\n    image: krystall0/docker-surgeon:latest\n    container_name: docker-surgeon\n    restart: always\n    volumes:\n      - /var/run/docker.sock:/var/run/docker.sock\n      - /your/path/data:/app/app/data # persistent data (recommended if dashboard is enabled)\n    env_file:\n      - /path/to/.env\n\n  db:\n    image: postgres\n    container_name: db\n\n  backend:\n    image: my-backend\n    container_name: backend\n    labels:\n      - \"com.monitor.depends.on=db\"\n\n  frontend:\n    image: my-frontend\n    container_name: frontend\n    labels:\n      - \"com.monitor.depends.on=backend\"\n```\n\n## 📊 Dashboard Overview\nDocker Surgeon includes a built-in web dashboard that provides:\n- Recent container crashes\n- Logs grouped by container\n- Crash statistics over time\n- Interactive charts\n- Date-based filtering\n- Full log viewer with multiline formatting\n\nTo access the dashboard:\u003c/br\u003e\n```\nhttp://\u003cyour-ip\u003e:\u003cyour-port\u003e\n```\n(Requires authentication — see [**Authentication Flow**](#-authentication-flow))\n\n### Dashboard Preview\n![Docker Surgeon dashboard preview](docs/images/preview.png)\n\n## 🔔 Notifications\n\nDocker Surgeon can send real-time notifications whenever a container crashes.\nNotifications are handled through Apprise, supporting 70+ services including:\n- Discord\n- Telegram\n- Slack\n- Matrix\n- Email\n- Webhooks\n- Gotify / Pushover / Pushbullet\n\nAnd many others…\n\nSee [Apprise](https://github.com/caronc/apprise) for more details.\n\n### Enabling Notifications\nAdd these variables to your `.env`:\u003c/br\u003e\n```\nENABLE_NOTIFICATIONS=True\nNOTIFICATION_URLS='[\"discord://\u003cwebhook_id\u003e/\u003cwebhook_token\u003e\"]'\nNOTIFICATION_TITLE=\"⚠️ {container_name} 
crashed\"\nNOTIFICATION_BODY=\"`exit code`: `{exit_code}`\\nLast {n_logs} logs:\\n{logs}\"\n```\n\n### Formatting Notifications\nDocker Surgeon supports placeholder variables inside `NOTIFICATION_TITLE` and `NOTIFICATION_BODY`.\u003c/br\u003e\nAvailable placeholders:\n- `{container_name}` → name of the crashed container\n- `{exit_code}` → container exit code\n- `{logs}` → last N logs (ANSI colors removed)\n- `{n_logs}` → number of logs configured in `LOGS_AMOUNT`\n\nExample notification body:\u003c/br\u003e\n`exit code`: `{exit_code}`\u003c/br\u003e\nContainer `{container_name}` crashed.\u003c/br\u003e\nLast {n_logs} logs:\u003c/br\u003e\n{logs}\n\n\n### ⚠️ Security Notes\n- Do **not** expose the dashboard over the internet without HTTPS and reverse-proxy protection.\n- Always use a strong admin password (preferably stored as a bcrypt or Argon2 hash).\n- Always set `AGENT_TOKEN` on every agent to prevent unauthorized access.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrystall9%2Fdocker-surgeon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrystall9%2Fdocker-surgeon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrystall9%2Fdocker-surgeon/lists"}