
# WireGuard Watchdog

**An Unraid plugin that keeps your WireGuard tunnel healthy.**
Pings a peer through the tunnel on a schedule; resets peer state with
`wg syncconf` the moment the peer goes silent, falling back to a full
`wg-quick down/up` only as a last resort when the soft recovery path
fails.

[![Latest release](https://img.shields.io/github/v/release/pacnpal/wireguard-watchdog?label=release&color=88171a)](https://github.com/pacnpal/wireguard-watchdog/releases/latest)
[![License: MIT](https://img.shields.io/github/license/pacnpal/wireguard-watchdog?color=blue)](LICENSE)
[![Unraid 6.12+](https://img.shields.io/badge/Unraid-6.12%2B-f15a2c)](https://unraid.net/)
[![Lint](https://img.shields.io/github/actions/workflow/status/pacnpal/wireguard-watchdog/lint.yml?branch=main&label=lint)](https://github.com/pacnpal/wireguard-watchdog/actions/workflows/lint.yml)
[![Release workflow](https://img.shields.io/github/actions/workflow/status/pacnpal/wireguard-watchdog/release.yml?label=release%20build)](https://github.com/pacnpal/wireguard-watchdog/actions/workflows/release.yml)
[![Downloads](https://img.shields.io/github/downloads/pacnpal/wireguard-watchdog/total?color=2ee4a3)](https://github.com/pacnpal/wireguard-watchdog/releases)

> [!IMPORTANT]
> Requires Unraid's built-in **WireGuard** support (Settings → VPN
> Manager) and at least one configured tunnel (`wg0`, `wg1`, …). The
> watchdog uses `wg syncconf` (and, only as a last resort, `wg-quick`),
> the same tools Unraid uses internally — so the two coexist cleanly.

## Why?

WireGuard is silent when it fails. A peer going down or a NAT mapping
expiring leaves the tunnel "up" from the local side: `wg show` looks
fine, but no traffic flows. The usual fix is to bounce the tunnel.
This plugin automates that bounce, gated behind a real liveness check
(a ping through the interface, not just a check that the interface
exists).

Use it if you:
- run a site-to-site tunnel and want unattended recovery from the
remote side rebooting or losing internet briefly,
- depend on the tunnel for critical traffic (Docker containers, VMs)
and don't want to babysit it,
- want a quick visual confirmation in the UI that the tunnel is
reachable right now.

## Install

1. Open the Unraid web UI → **Plugins** tab → **Install Plugin**.
2. Paste the `.plg` URL:
```
https://raw.githubusercontent.com/pacnpal/wireguard-watchdog/main/plugin/wg-watchdog.plg
```
3. Click **Install**. The plugin downloads its `.txz` from the matching
GitHub release and installs to `/usr/local/emhttp/plugins/wg-watchdog/`.
4. Open **Tools → User Utilities → WireGuard Watchdog**, fill in the
form, set **Enabled = yes**, click **Apply**.

The plugin defaults to `Enabled=no` on first install. Nothing runs until
you explicitly enable it.

## Configuration

| Field | Default | Notes |
|-------------------|-------------------------------|-------|
| Enabled | `no` | Master toggle. `no` removes the cron entry. |
| Tunnel interface | `wg0` | Must be a configured WireGuard interface. |
| Peer IP to ping | `10.99.0.1` | Reachable through the tunnel. |
| Check interval | `60` seconds (min `20`) | Below 60s: cron uses per-minute lines with `sleep` offsets. |
| Verbose logging | `no` | If `yes`, each successful ping is logged too. |
| Log file | `/var/log/wg-watchdog.log` | Read-only display in the UI. |
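
The sub-minute scheduling mentioned in the table can be pictured with a short sketch. It assumes one cron line per offset within the minute, each later one prefixed by a `sleep`. `gen_cron_lines` is a hypothetical helper, the script path is inferred from the install directory, and the exact lines the plugin writes may differ.

```shell
# Sketch only: gen_cron_lines is a hypothetical helper, not the plugin's code.
# WATCHDOG is assumed from the install directory named in this README.
WATCHDOG=/usr/local/emhttp/plugins/wg-watchdog/scripts/watchdog.sh

gen_cron_lines() {
  interval=$1
  if [ "$interval" -ge 60 ]; then
    # Whole minutes map onto cron's step syntax (60 -> every minute).
    minutes=$(( interval / 60 ))
    if [ "$minutes" -le 1 ]; then
      echo "* * * * * $WATCHDOG"
    else
      echo "*/$minutes * * * * $WATCHDOG"
    fi
    return 0
  fi
  # Below 60 s: cron fires every minute and sleep staggers the extra runs.
  offset=0
  while [ "$offset" -lt 60 ]; do
    if [ "$offset" -eq 0 ]; then
      echo "* * * * * $WATCHDOG"
    else
      echo "* * * * * sleep $offset; $WATCHDOG"
    fi
    offset=$(( offset + interval ))
  done
}
```

With these assumptions, `gen_cron_lines 20` prints three lines with offsets 0, 20 and 40. An interval that does not divide 60 evenly (for example 25) would leave a shorter gap at the minute boundary.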

**Buttons:**
- **Apply** — posts the form to Unraid's `/update.php`, which writes
`/boot/config/plugins/wg-watchdog/wg-watchdog.cfg` and runs
`scripts/install_cron.sh` (regenerates the cron file and calls
`update_cron`).
- **Test Now** — runs `watchdog.sh --test` once and shows the output
inline. Honours your settings but ignores the Enabled toggle.
- **View Log** — tails the last 200 lines of the configured log file.
- **Clear Log** — truncates the log file (with confirmation prompt).
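
Because the cfg written by **Apply** is sourced by the watchdog, loading it can be sketched in a few lines of shell. The key names below are the ones this README uses elsewhere (`SERVICE_ENABLED`, `INTERFACE`, `PEER_IP`, `INTERVAL`, `VERBOSE`). `load_cfg` and the way it clamps the interval are hypothetical, not the plugin's actual code.

```shell
# Sketch only: load_cfg is a hypothetical helper. The clamping logic is an
# assumption about how the 20-second floor might be enforced.
load_cfg() {
  # Defaults from the Configuration table.
  SERVICE_ENABLED="no"
  INTERFACE="wg0"
  PEER_IP="10.99.0.1"
  INTERVAL="60"
  VERBOSE="no"
  # The cfg holds KEY="value" lines, so sourcing it overrides the defaults.
  if [ -r "$1" ]; then
    . "$1"
  fi
  # Fall back to the default on non-numeric input, then apply the 20 s floor.
  case "$INTERVAL" in
    ''|*[!0-9]*) INTERVAL=60 ;;
  esac
  if [ "$INTERVAL" -lt 20 ]; then
    INTERVAL=20
  fi
}
```

Calling `load_cfg /boot/config/plugins/wg-watchdog/wg-watchdog.cfg` would leave the effective settings in shell variables. A missing file simply leaves the defaults in place.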

> _Screenshot placeholder: Tools → User Utilities → WireGuard Watchdog._

## How it works

- `scripts/watchdog.sh` is the only thing scheduled. It:
1. Sources `/boot/config/plugins/wg-watchdog/wg-watchdog.cfg`.
2. Holds an exclusive `flock` on `/var/lock/wg-watchdog.lock` so
overlapping cron firings can't trample each other.
3. Runs `ping -c 2 -W 3 -I $INTERFACE $PEER_IP`.
4. On failure (soft bounce): runs `wg-quick strip $INTERFACE` and
verifies the output is non-empty before touching any live state.
Only then does it drop every peer via
`wg set $INTERFACE peer <public-key> remove` and re-apply the stripped
conf with `wg syncconf $INTERFACE /dev/stdin`. This ordering
matters: if strip fails, the live interface is left untouched
and we go straight to the hard fallback — never an "interface
up but peers wiped" half-state.
5. Hard bounce only if `wg-quick strip` or `wg syncconf` itself
fails (malformed or unreadable conf): `wg-quick down $INTERFACE`
→ `sleep 2` → `wg-quick up $INTERFACE`. A successful
`wg syncconf` is taken as the source of truth — even if some
peers failed to pre-remove — because it means the running
interface matches the on-disk conf.
6. **The hard bounce is gated.** Before running it the watchdog
parses the conf and checks `wg show $INTERFACE fwmark`. If the
conf is "redirect-prone" — `AllowedIPs = 0.0.0.0/0` (or `::/0`)
without `Table = off` — AND no auto-routing fwmark is currently
set on the live interface, the hard bounce would newly install
`ip rule not fwmark ... table ...` and silently redirect all
unmarked host traffic (including any `--network host` Docker
container) through the tunnel. The watchdog refuses and exits 1
with an explanation; add `Table = off` to the conf to opt in.
If `$INTERFACE` is missing entirely, the precheck at the top
of the script logs `FAIL: interface ... does not exist` and
exits — the watchdog heals an existing tunnel, it does not
bring one up from cold.
- `scripts/install_cron.sh` reads the cfg and writes
`/boot/config/plugins/wg-watchdog/wg-watchdog.cron`, then calls
`/usr/local/sbin/update_cron`. Unraid persists cron files from
`/boot/config/plugins/*/...cron` across reboots.
- `event/started` re-syncs cron when the array starts.
- `event/stopping` removes the active cron entry so no checks fire
during shutdown.
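
The lock, ping and soft-bounce steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's `watchdog.sh`: the function names, the `0`/`2` return convention, the overridable `WG`/`WG_QUICK`/`PING`/`LOCK` variables and the exact log wording are all hypothetical. The `wg` and `wg-quick` subcommands are the ones named in the steps, plus `wg show <interface> peers` to list the live peers.

```shell
# Sketch only, not the plugin's watchdog.sh. Commands are overridable via
# WG / WG_QUICK / PING so the sequence can be exercised without a tunnel.
WG=${WG:-wg}
WG_QUICK=${WG_QUICK:-wg-quick}
PING=${PING:-ping}
LOCK=${LOCK:-/var/lock/wg-watchdog.lock}

# Returns 0 when the soft bounce applied cleanly, 2 when the caller should
# evaluate the hard fallback (this 0/2 convention is an assumption).
soft_bounce() {
  iface=$1
  # Strip first: if this fails or prints nothing, live state is untouched.
  stripped=$("$WG_QUICK" strip "$iface") || return 2
  [ -n "$stripped" ] || return 2
  # Drop every live peer; a failed removal is tolerated because the
  # syncconf result below is what decides success.
  for key in $("$WG" show "$iface" peers); do
    "$WG" set "$iface" peer "$key" remove || true
  done
  # Re-apply the stripped conf from stdin.
  printf '%s\n' "$stripped" | "$WG" syncconf "$iface" /dev/stdin || return 2
  return 0
}

run_check() {
  iface=$1
  peer=$2
  (
    # Non-blocking exclusive lock on fd 9: an overlapping run skips itself.
    flock -n 9 || { echo "skipped: previous run still in progress"; exit 0; }
    if "$PING" -c 2 -W 3 -I "$iface" "$peer" >/dev/null 2>&1; then
      echo "OK: $peer reachable via $iface"
      exit 0
    fi
    echo "FAIL: $peer unreachable via $iface -- bouncing tunnel"
    soft_bounce "$iface"
  ) 9>"$LOCK"
}
```

The key design point is the ordering inside `soft_bounce`: nothing is removed from the live interface until `wg-quick strip` has produced usable output, so a broken conf can never leave the interface up with its peers wiped.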

The plugin uses the same tools as Unraid's built-in WireGuard
(`wg`, `wg-quick`). The soft-bounce path was chosen specifically to
avoid a `wg-quick down`/`up` side effect: when a conf has
`AllowedIPs = 0.0.0.0/0` without `Table = off` (typical of imported
VPN-provider configs), `wg-quick up` adds an `ip rule not fwmark
51820 table 51820` rule that funnels every unmarked packet through the
tunnel — which redirects host traffic, including any Docker container
running with `--network host`. `wg syncconf` doesn't do that.
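
The "redirect-prone" test that gates the hard bounce can be approximated with a small conf parser. This is a minimal sketch, not the plugin's parser: `is_redirect_prone` and `hard_bounce_allowed` are hypothetical names, only the two `AllowedIPs` values named in this README are checked, and it assumes that `wg show $INTERFACE fwmark` reports `off` when no fwmark is set.

```shell
# Sketch only: hypothetical helpers, not the plugin's actual parser.
# Exit 0 if the conf has AllowedIPs 0.0.0.0/0 or ::/0 and no "Table = off".
is_redirect_prone() {
  awk -F= '
    { sub(/#.*/, "") }                      # drop comments
    {
      key = tolower($1); gsub(/[ \t]/, "", key)
      val = tolower($2); gsub(/[ \t\r]/, "", val)
    }
    key == "allowedips" {
      n = split(val, ips, ",")
      for (i = 1; i <= n; i++)
        if (ips[i] == "0.0.0.0/0" || ips[i] == "::/0") full = 1
    }
    key == "table" && val == "off" { off = 1 }
    END { exit !(full && !off) }
  ' "$1"
}

# $1: conf path, $2: current fwmark (e.g. output of `wg show $INTERFACE fwmark`).
# Exit 1 (refuse) only when the bounce would newly install the redirect rule.
hard_bounce_allowed() {
  conf=$1
  fwmark=$2
  if is_redirect_prone "$conf"; then
    case "$fwmark" in
      ''|off) return 1 ;;   # no fwmark live: wg-quick up would add the rule
    esac
  fi
  return 0
}
```

A call such as `hard_bounce_allowed /etc/wireguard/wg0.conf "$(wg show wg0 fwmark)"` would then decide whether the `wg-quick down`/`up` fallback is safe to run.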

## Build & release

Releases are cut from the **Actions** tab → **Build and Release** →
**Run workflow** ([release.yml](.github/workflows/release.yml)).

`./build.sh` is for local testing only — the workflow builds and
attaches the public release assets.

## Test plan

Tested target: Unraid 7.2.x in a VM with a `wg0` tunnel configured in
**Settings → VPN Manager** against a reachable peer.

1. **Install**
- Build with `./build.sh`. Push to a test branch + create a release.
- Paste the .plg URL into Plugins → Install Plugin.
- Verify install log ends with the "wg-watchdog … installed" banner.
- Verify **Tools → User Utilities → WireGuard Watchdog** appears.

2. **Defaults**
- Open the page. Confirm `Enabled = no`, `INTERFACE = wg0`,
`PEER_IP = 10.99.0.1`, `INTERVAL = 60`, log path shown.
- Confirm `/boot/config/plugins/wg-watchdog/wg-watchdog.cfg` exists.
- Confirm no cron file at `/etc/cron.d/wg-watchdog` (disabled state).

3. **Apply / cron install**
- Set `Enabled = yes`, `PEER_IP` to the actual peer's tunnel IP,
`INTERVAL = 60`, click **Apply**.
- Confirm `/boot/config/plugins/wg-watchdog/wg-watchdog.cron` was
written and `/etc/cron.d/wg-watchdog` was created by `update_cron`.

4. **Test Now (happy path)**
- Click **Test Now**.
- Expect output containing `OK: reachable via wg0`.

5. **Failure simulation (soft path)**
- With `wg0` up, change the peer's PublicKey on the remote side
(or block the peer's UDP port on the remote firewall) so the
handshake stops working but the local interface stays up.
- Wait one cron interval (or click **Test Now** to force).
- Expect log entry `FAIL: ... unreachable via wg0 -- bouncing tunnel`,
followed by `wg syncconf wg0: ok (peer state reset; routes preserved)`.
- Confirm `ip rule show` and `ip route show table all` are unchanged
vs. before the bounce.
- Restore the peer; expect a fresh handshake within ~5 s of the next
ping.

6. **Failure simulation (hard path)**
- With `wg0` up, make the soft bounce fail by temporarily corrupting
the on-disk conf, e.g.:
`mv /etc/wireguard/wg0.conf{,.bak} && printf 'garbage\n' > /etc/wireguard/wg0.conf`
(the interface itself stays up; only `wg-quick strip` will fail).
- Click **Test Now** (or wait an interval).
- Expect `wg-quick strip wg0: failed (rc=...) -- skipping soft
bounce`, then `soft bounce did not recover; evaluating hard
fallback`, then `wg-quick down wg0: ...` and `wg-quick up wg0:
failed (rc=...)` — the hard bounce will also fail because the
conf is broken, which is the point of the test (we just want
to see the fallback fire).
- Confirm the live interface's peers are still intact via
`wg show wg0` — the strip-first ordering means we never
touched them when strip failed.
- Restore: `mv /etc/wireguard/wg0.conf.bak /etc/wireguard/wg0.conf`
and bring the tunnel back up via VPN Manager or `wg-quick up wg0`.

7. **Interface-missing exit**
- From SSH: `wg-quick down wg0`.
- Click **Test Now**.
- Expect `FAIL: interface wg0 does not exist` and exit 1 — no
bounce, no `wg-quick up`. Restore via VPN Manager.

8. **Lock contention**
- Set `INTERVAL = 20`, click **Apply**.
- Tail the log; with verbose enabled, confirm only one run executes
at a time even with overlapping firings.

9. **Persistence**
- Reboot the server.
- After array start, confirm `/etc/cron.d/wg-watchdog` is back
(regenerated by `event/started`).
- Confirm log contains an `event: started` entry.

10. **Disable**
- Set `Enabled = no`, **Apply**.
- Confirm both `/boot/config/.../wg-watchdog.cron` and
`/etc/cron.d/wg-watchdog` are gone.

11. **Uninstall**
- Remove the plugin from the Plugins tab.
- Confirm `/boot/config/plugins/wg-watchdog/` and
`/usr/local/emhttp/plugins/wg-watchdog/` are gone.
- Confirm `/var/log/wg-watchdog.log` is **preserved**.
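
For the "routes unchanged" check in step 5, comparing snapshots by eye is error-prone, so a small helper may be handy. The sketch below is an assumption-laden convenience, not part of the plugin: `snapshot_routing` and `routing_unchanged` are hypothetical names, and the `IP` variable exists only so the helper can be exercised without touching real routing state.

```shell
# Sketch only: hypothetical test helpers, not shipped with the plugin.
IP=${IP:-ip}

snapshot_routing() {
  # Write policy rules and every routing table to the file named by $1.
  {
    "$IP" rule show
    "$IP" route show table all
  } > "$1"
}

routing_unchanged() {
  # cmp -s is silent and exits 0 only when the two snapshots are identical.
  cmp -s "$1" "$2"
}
```

A typical run would be `snapshot_routing /tmp/before.txt`, trigger the bounce, `snapshot_routing /tmp/after.txt`, then `routing_unchanged /tmp/before.txt /tmp/after.txt && echo unchanged`. Routes that change for unrelated reasons between the two snapshots would also show up as a difference.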

## Troubleshooting

| Symptom | Likely cause | Where to look |
|---|---|---|
| **Test Now** prints `FAIL: wg0 is not configured under Settings -> VPN Manager` | The conf file `/etc/wireguard/wg0.conf` is missing — VPN Manager creates it when you add a tunnel. Either the interface name in the watchdog cfg is wrong, or no tunnel exists yet. | Settings → VPN Manager. Verbose mode lists the configured `*.conf` files. |
| **Test Now** prints `FAIL: interface wg0 does not exist (configured but not active...)` | The conf exists but the tunnel is currently down. | Toggle the tunnel on under Settings → VPN Manager (or `wg-quick up wg0`). Verbose mode lists the active wg interfaces. |
| Test passes, but cron never fires | Service disabled, or `update_cron` wasn't called after Apply. | `cat /etc/cron.d/wg-watchdog` should exist; `cat /boot/config/plugins/wg-watchdog/wg-watchdog.cfg` should show `SERVICE_ENABLED="yes"`. |
| Bounces happen but tunnel stays down | The peer is genuinely unreachable, or the soft bounce isn't enough and `wg-quick up` is failing. | Tail `/var/log/wg-watchdog.log` for `wg syncconf wg0: failed` followed by `wg-quick up wg0: failed`; run them manually to see the error. |
| Log says `REFUSING hard bounce: ... has AllowedIPs=0.0.0.0/0 ... without 'Table = off'` | The soft bounce didn't recover and the conf would cause `wg-quick up` to install `ip rule not fwmark ... table ...`, which redirects host (and `--network host` Docker) traffic. The watchdog refuses to inflict that. | Add `Table = off` to `[Interface]` in `/etc/wireguard/wg0.conf` and manage routes yourself via PostUp/PostDown — or, if you genuinely want full-tunnel redirect, run `wg-quick up wg0` once manually so the auto-routing fwmark is in place; the watchdog will then allow the hard bounce. |
| Log says `skipped: previous run still in progress` repeatedly | A check is taking longer than the interval (DNS hangs, network stalls). | Lengthen the interval, or set `VERBOSE="no"` to suppress these messages. |
| Log file fills `/var/log` (RAM-backed on Unraid, not the flash drive) | Verbose left on for months. | Set Verbose=no, or rotate by truncating: `: > /var/log/wg-watchdog.log`. |
| **View Log** says "log file not yet created" | First boot or just installed; nothing's run yet. | Click **Test Now** once. |

For anything else, file an issue with the contents of
`/boot/config/plugins/wg-watchdog/wg-watchdog.cfg` and the last ~50
lines of `/var/log/wg-watchdog.log`.

## Repo layout

```
wireguard-watchdog/
├── README.md
├── LICENSE
├── build.sh
├── wg-watchdog.plg.in          # template; build.sh fills @@VERSION@@/@@MD5@@/@@PKG@@
├── .github/
│   ├── ISSUE_TEMPLATE/{bug_report,feature_request}.yml
│   └── workflows/{release,lint}.yml
├── assets/
│   ├── logo.svg                # source vector
│   ├── logo{,-128,-512}.png    # rasterised by render-png.py
│   └── render-png.py
├── source/                     # installs to /usr/local/emhttp/plugins/wg-watchdog/
│   ├── default.cfg
│   ├── wg-watchdog.page
│   ├── include/{test,log,clear}.php
│   ├── scripts/{watchdog,install_cron,remove_cron}.sh
│   └── event/{started,stopping}
└── dist/                       # produced by build.sh; not checked in
    ├── wg-watchdog-<version>-noarch-1.txz
    └── wg-watchdog.plg
```

## Notes

- The watchdog prefers `wg syncconf` (a strict re-application of the
on-disk conf to the running interface). The hard `wg-quick down/up`
fallback only runs when the soft path fails, and is itself gated
behind a redirect-prone-conf check: the script refuses to run a
hard bounce that would newly install wg-quick's auto-routing
(`ip rule not fwmark ... table ...`) on a host where it isn't
already in effect. This is the rigorous fix for the "redirected
host / `--network host` Docker traffic through wg0" report.
- Tests live under `tests/`; run `bash tests/run.sh` locally. CI runs
them on every push and PR via `.github/workflows/lint.yml`.

## License

[MIT](LICENSE).