https://github.com/tmknight/docker-autoheal

Monitor and remediate unhealthy Docker containers
Host: GitHub
URL: https://github.com/tmknight/docker-autoheal
Owner: tmknight
License: gpl-3.0
Created: 2024-01-06T15:31:16.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-12-19T21:57:22.000Z (28 days ago)
Last Synced: 2024-12-19T22:45:44.118Z (28 days ago)
Topics: automation, docker, management
Language: Rust
Homepage:
Size: 504 KB
Stars: 24
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README

# Docker-Autoheal

[![GitHubRelease][GitHubReleaseBadge]][GitHubReleaseLink]

[![DockerPublishing][DockerPublishingBadge]][DockerLink]

[![DockerSize][DockerSizeBadge]][DockerLink]

[![DockerPulls][DockerPullsBadge]][DockerLink]

A cross-platform tool to monitor and remediate unhealthy Docker containers

Written in Rust and designed to be OS agnostic, flexible, and performant in large environments via concurrency and multi-threading

The `docker-autoheal` binary may be executed in a native OS or from a Docker container

## How to Use

### You must first apply `HEALTHCHECK` to your docker images

- See the Docker `HEALTHCHECK` documentation for details
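
For images that do not define a `HEALTHCHECK` themselves, Docker also accepts a health check at run time through the `--health-*` flags of `docker run`. The following is a minimal sketch, not part of `docker-autoheal`: the image name `my-image:latest`, the container name, the port, and the `/health` endpoint are all hypothetical, and it assumes `curl` is available inside the image. The command is wrapped in a function (an illustrative name) so it can be reused.

```bash
# Start a container with a run-time health check.
# Assumptions: image, container name, port, and endpoint are hypothetical.
run_with_healthcheck() {
    docker run -d \
        --name my-app \
        --health-cmd='curl -fsS http://localhost:8080/health || exit 1' \
        --health-interval=30s \
        --health-timeout=3s \
        --health-retries=3 \
        --label autoheal.monitor.enable=true \
        my-image:latest
}
```

Once the health check fails the configured number of times, Docker marks the container `unhealthy`, which is the state `docker-autoheal` acts on.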

### Environment Variables

| Variable                     | Default                  | Description |
|:----------------------------:|:------------------------:|:-----------:|
| **AUTOHEAL_CONNECTION_TYPE** | local                    | Determines how `docker-autoheal` connects to Docker (one of: local, socket, http, ssl) |
| **AUTOHEAL_STOP_TIMEOUT**    | 10                       | Docker waits `n` seconds for a container to stop before killing it during restarts (override via label; see below) |
| **AUTOHEAL_INTERVAL**        | 5                        | Check container health every `n` seconds |
| **AUTOHEAL_START_DELAY**     | 0                        | Wait `n` seconds before first health check |
| **AUTOHEAL_POST_ACTION**     |                          | The absolute path of an executable to be run after restart attempts; container `name`, `id` and `stop-timeout` are passed as arguments in that order |
| **AUTOHEAL_MONITOR_ALL**     | FALSE                    | Set to `TRUE` to monitor all containers on the host, or leave as `FALSE` and control monitoring via the `autoheal.monitor.enable` label |
| **AUTOHEAL_LOG_ALL**         | FALSE                    | Allow (`TRUE`/`FALSE`) logging (and webhook/apprise if set) for containers with `autoheal.restart.enable=FALSE` |
| **AUTOHEAL_LOG_PERSIST**     | FALSE                    | Allow (`TRUE`/`FALSE`) external persistent logging and reporting of historical data |
| **AUTOHEAL_TCP_HOST**        | localhost                | Address of Docker host |
| **AUTOHEAL_TCP_PORT**        | 2375 (ssl: 2376)         | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT**     | 10                       | Time in `n` seconds before failing connection attempt |
| **AUTOHEAL_PEM_PATH**        | /opt/docker-autoheal/tls | Absolute path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
| **AUTOHEAL_APPRISE_URL**     |                          | Apprise URL to which messages are posted following actions on an unhealthy container |
| **AUTOHEAL_WEBHOOK_KEY**     |                          | JSON key used when posting messages to the webhook following actions on an unhealthy container |
| **AUTOHEAL_WEBHOOK_URL**     |                          | Webhook URL to which messages are posted following actions on an unhealthy container |
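
To illustrate `AUTOHEAL_POST_ACTION`, here is a minimal sketch of a handler that consumes the three positional arguments described in the table (container `name`, `id`, and `stop-timeout`, in that order). The function name and the message format are assumptions made for illustration; in practice the body would live in an executable script whose absolute path is given to `AUTOHEAL_POST_ACTION`.

```bash
#!/bin/sh
# Hypothetical post-action handler: prints one line per restart attempt.
# $1 = container name, $2 = container id, $3 = stop timeout in seconds
post_action() {
    name="$1"
    id="$2"
    stop_timeout="$3"
    printf 'post-action: container=%s id=%s stop-timeout=%ss\n' \
        "$name" "$id" "$stop_timeout"
}

# When saved as a standalone script, forward the script's own arguments:
# post_action "$@"
```

A real handler might instead append to a file or call another tool; the only part taken from the table above is the order of the arguments.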

### Optional Container Labels

| Label                        | Default | Description |
|:----------------------------:|:-------:|:-----------:|
| **autoheal.stop.timeout**    |         | Per container override (in seconds) of `AUTOHEAL_STOP_TIMEOUT` during restart (e.g. some container routinely takes longer to cleanly exit) |
| **autoheal.monitor.enable**  | FALSE   | Per container override (true/false) to control whether the container is monitored (e.g. if you have a large number of containers that you wish to monitor and restart, apply this label as `FALSE` to the few that you do not wish to monitor and set `AUTOHEAL_MONITOR_ALL` to `TRUE`) |
| **autoheal.restart.enable**  | TRUE    | Per container override (true/false) to control whether the container is restarted when unhealthy (e.g. if you have a large number of containers that you wish to monitor and restart, apply this label as `FALSE` to the few that you do not wish to restart and set `AUTOHEAL_MONITOR_ALL` to `TRUE`) |
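
The interaction between `AUTOHEAL_MONITOR_ALL` and the `autoheal.monitor.enable` label can be sketched as a small decision function. This is one reading of the two tables above (an explicit label wins, otherwise the global setting applies), not the project's actual implementation; the function name is hypothetical.

```bash
# Decide whether a container would be monitored.
# $1 = value of AUTOHEAL_MONITOR_ALL (TRUE/FALSE, any case)
# $2 = value of the autoheal.monitor.enable label, or empty if the label is unset
should_monitor() {
    monitor_all=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
    label=$(printf '%s' "${2:-}" | tr '[:upper:]' '[:lower:]')
    case "$label" in
        true)  return 0 ;;   # label opts the container in
        false) return 1 ;;   # label opts the container out
    esac
    # No label present: fall back to the global setting.
    [ "$monitor_all" = "true" ]
}
```

For example, `should_monitor TRUE false` returns a non-zero status, matching the case where all containers are monitored except the few labelled `autoheal.monitor.enable=false`.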

### Binary Options

Used when executed in a native OS (NOTE: the environment variables are also accepted)

```bash
Options:
    -a, --apprise-url
                        The apprise url
    -c, --connection-type
                        One of local, socket, http, or ssl
    -d, --start-delay
                        Time in seconds to wait for first check
    -h, --help          Print help
    -i, --interval
                        Time in seconds to check health
    -j, --webhook-key
                        The webhook json key string
    -k, --key-path
                        The absolute path to requisite ssl PEM files
    -l, --log-all       Enable logging of unhealthy containers where restart
                        is disabled (WARNING, this could be chatty)
    -m, --monitor-all   Enable monitoring of all containers that have a
                        healthcheck
    -n, --tcp-host
                        The hostname or IP address of the Docker host (when -c
                        http or ssl)
    -p, --tcp-port
                        The tcp port number of the Docker host (when -c http
                        or ssl)
    -s, --stop-timeout
                        Time in seconds to wait for action to complete
    -t, --tcp-timeout
                        Time in seconds to wait for connection to complete
    -w, --webhook-url
                        The webhook url
    -L, --log-persist   Enable external persistent logging and reporting of
                        historical data
    -P, --post-action
                        The absolute path to a script that should be executed
                        after container restart
    -V, --version       Print version information
```

### Local

```bash
/usr/local/bin/docker-autoheal --monitor-all --log-persist > /var/log/docker-autoheal.log &
```

Will connect to the local Docker host, monitor all containers, and generate a persistent log at `/opt/docker-autoheal/log.json`

### Socket

```bash
docker run -d --read-only \
    --user=[uid]:[gid] \
    --name docker-autoheal \
    --network=none \
    --restart=always \
    --env="AUTOHEAL_CONNECTION_TYPE=socket" \
    --env="AUTOHEAL_MONITOR_ALL=true" \
    --env="AUTOHEAL_LOG_PERSIST=true" \
    --volume=/var/run/docker.sock:/var/run/docker.sock:ro \
    --volume=/opt/docker-autoheal/log.json:/opt/docker-autoheal/log.json:rw \
    tmknight88/docker-autoheal:latest
```

Will connect to the Docker host via the Unix socket at `/var/run/docker.sock` (or the Windows named pipe at `//./pipe/docker_engine`), monitor all containers, and write persistent log data to `/opt/docker-autoheal/log.json` as the user with the specified `uid:gid`

### HTTP

```bash
docker run -d --read-only \
    --user=[uid]:[gid] \
    --name docker-autoheal \
    --restart=always \
    --env="AUTOHEAL_CONNECTION_TYPE=http" \
    --env="AUTOHEAL_TCP_HOST=MYHOST" \
    --env="AUTOHEAL_TCP_PORT=2375" \
    --env="AUTOHEAL_LOG_PERSIST=true" \
    --volume=/opt/docker-autoheal/log.json:/opt/docker-autoheal/log.json:rw \
    tmknight88/docker-autoheal:latest
```

Will connect to the Docker host via hostname or IP and the specified port, monitor only containers with a label `autoheal.monitor.enable=true`, and write persistent log data to `/opt/docker-autoheal/log.json` as the user with the specified `uid:gid`

### Logging

```bash
2024-01-23 03:03:23-0500 [WARNING] [nordvpn] Container (886d37fd9f5c) is unhealthy with 3 failures
2024-01-23 03:03:23-0500 [WARNING] [nordvpn] Container (886d37fd9f5c) last output: [4] Status: Unstable
2024-01-23 03:03:23-0500 [WARNING] [nordvpn] Restarting container (886d37fd9f5c) with 10s timeout
2024-01-23 03:03:34-0500 [   INFO] [nordvpn] Restart of container (886d37fd9f5c) was successful
2024-01-23 03:03:34-0500 [   INFO] [nordvpn] Container (886d37fd9f5c) has been unhealthy 1 time
2024-01-23 03:04:48-0500 [WARNING] [privoxy] Container (74f74eb7b2d0) is unhealthy with 3 failures
2024-01-23 03:04:48-0500 [WARNING] [privoxy] Container (74f74eb7b2d0) last output: [-1] Health check exceeded timeout (3s)
2024-01-23 03:04:48-0500 [WARNING] [privoxy] Restarting container (74f74eb7b2d0) with 10s timeout
2024-01-23 03:04:59-0500 [   INFO] [privoxy] Restart of container (74f74eb7b2d0) was successful
2024-01-23 03:04:59-0500 [   INFO] [privoxy] Container (74f74eb7b2d0) has been unhealthy 1 time
```

Example output when docker-autoheal is in action
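
Because each line carries the container name in square brackets, the output can be summarised with standard text tools. The following is a minimal sketch that counts successful restarts per container; it assumes the line layout shown in the example above, and the function name is hypothetical.

```bash
# Count successful restarts per container from docker-autoheal output on stdin.
# Assumes lines of the form:
#   <timestamp> [   INFO] [<name>] Restart of container (<id>) was successful
count_restarts() {
    sed -n 's/^.*] \[\([^]]*\)] Restart of container .* was successful$/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}
```

If the output was redirected to a file, as in the Local example, it could be fed in with `count_restarts < /var/log/docker-autoheal.log`.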

### Persistent Logging

Examples of working with log.json:

```bash
jq -s 'group_by(.name) | map({name: .[0].name, data: (group_by(.id) | map({id: .[0].id, data: .}))})' /opt/docker-autoheal/log.json
```

Group all entries by name and then group by container id

```bash
jq -s 'map(select(.name=="privoxy"))' /opt/docker-autoheal/log.json
```

Find all occurrences of 'privoxy'

```bash
jq -s 'map(select(.name=="privoxy")) | group_by(.name) | map({name: .[0].name, data: (group_by(.id) | map({id: .[0].id, data: .}))})' /opt/docker-autoheal/log.json
```

Find all occurrences of 'privoxy' and group by container id

## Other Info

### Docker Labels

a) Apply the label `autoheal.monitor.enable=true` to your container to have it watched

OR

b) Set ENV `AUTOHEAL_MONITOR_ALL=true` (or apply `--monitor-all` to the binary) to watch all running containers

### SSL Connection Type

See the Docker documentation for how to configure TCP with mTLS

The certificates and keys need these names:

- ca.pem

- cert.pem

- key.pem
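
Before switching to `AUTOHEAL_CONNECTION_TYPE=ssl`, it can help to confirm that the three files are present under the expected names. A minimal sketch of such a check follows; the function name is hypothetical, and the default directory is the documented default of `AUTOHEAL_PEM_PATH`.

```bash
# Verify that a directory holds the three PEM files named above.
# $1 = directory to check (defaults to /opt/docker-autoheal/tls)
check_pem_dir() {
    dir="${1:-/opt/docker-autoheal/tls}"
    missing=0
    for f in ca.pem cert.pem key.pem; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 only when all three files are readable.
    return "$missing"
}
```

The check only looks at file names and readability; it does not validate the certificates themselves.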

### Docker Security

Additional security can be obtained with the following measures:

- Use a unique user for monitoring and remediating

  - Create a new user

  - Add that user to the `docker` group

  - Execute the binary or docker container with that uid:gid

- Run docker in [rootless mode](https://docs.docker.com/engine/security/rootless/)
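
The `[uid]:[gid]` placeholder used in the `docker run` examples can be filled in from the dedicated user's account. A minimal sketch using the standard `id` utility; the function name is hypothetical, and it reports the user's primary group, which is an assumption about what the deployment needs (if the primary group is not `docker`, the gid of the `docker` group may be required instead for socket access).

```bash
# Print "uid:gid" for a user, suitable for `docker run --user=...`.
# $1 = user name (optional; defaults to the current user)
user_spec() {
    if [ -n "${1:-}" ]; then
        printf '%s:%s\n' "$(id -u "$1")" "$(id -g "$1")"
    else
        printf '%s:%s\n' "$(id -u)" "$(id -g)"
    fi
}
```

With a dedicated account in place, the result could be passed along as `--user="$(user_spec some-user)"`, where `some-user` stands for whatever account was created.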

### Docker Timezone

If you need the `docker-autoheal` container timezone to match the local machine, you can map `/etc/localtime`

```bash

docker run ... -v /etc/localtime:/etc/localtime:ro

```

### Webhook/Apprise

- The payload includes the following, separated by `|`: the Docker system hostname, the last health output, and the result of the restart action
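
On the receiving side, a `|`-separated payload of this shape can be split back into its parts. The sketch below assumes exactly three fields in the order listed above and no surrounding whitespace; the function name and the sample payload in the usage note are hypothetical, and the real message may differ in detail.

```bash
# Split a '|'-separated payload into host, last health output, and result.
# $1 = payload string
parse_payload() {
    printf '%s\n' "$1" | {
        # Only '|' splits fields, so spaces inside a field are preserved;
        # any further '|' characters end up in the last field.
        IFS='|' read -r host output result
        printf 'host=%s\noutput=%s\nresult=%s\n' "$host" "$output" "$result"
    }
}
```

For instance, `parse_payload 'docker01|[4] Status: Unstable|Restart was successful'` prints the three parts on separate lines. A health output that itself contains `|` would shift the fields, so a receiver may want to treat the split as best effort.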

### A Word of Caution about Excluding from Restart and Logging of those Exclusions

- Excluding a container from restarts and enabling logging for excluded containers will generate numerous log messages whenever that container becomes unhealthy

- Additionally, if a webhook or apprise is also configured, it will be triggered at each monitoring interval for those containers

## Credits

- [willfarrell](https://github.com/willfarrell)

[GitHubReleaseBadge]: https://img.shields.io/github/actions/workflow/status/tmknight/docker-autoheal/github-release.yml?branch=main&style=flat&logo=github&color=32c855&label=generate%20release&cacheSeconds=9000

[GitHubReleaseLink]: https://github.com/tmknight/docker-autoheal/releases

[DockerPublishingBadge]: https://img.shields.io/github/actions/workflow/status/tmknight/docker-autoheal/docker-publish.yml?branch=main&style=flat&logo=github&color=32c855&label=publish%20image&cacheSeconds=9000

[DockerPullsBadge]: https://img.shields.io/docker/pulls/tmknight88/docker-autoheal?style=flat&logo=docker&color=blue&cacheSeconds=9000

[DockerSizeBadge]: https://img.shields.io/docker/image-size/tmknight88/docker-autoheal?sort=date&arch=amd64&style=flat&logo=docker&color=blue&cacheSeconds=9000

[DockerLink]: https://hub.docker.com/r/tmknight88/docker-autoheal