https://github.com/zephinzer/healthcheckr
Simple but effective healthcheck tool with alerting
- Host: GitHub
- URL: https://github.com/zephinzer/healthcheckr
- Owner: zephinzer
- License: mit
- Created: 2024-03-29T09:47:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-05T15:31:45.000Z (12 months ago)
- Last Synced: 2025-02-01T19:13:55.134Z (4 months ago)
- Language: Go
- Size: 44.9 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Healthcheckr
- [Healthcheckr](#healthcheckr)
- [Usage](#usage)
  - [Job mode](#job-mode)
  - [Worker mode](#worker-mode)
    - [Worker configuration documentation](#worker-configuration-documentation)
      - [Configuration properties](#configuration-properties)
      - [Channel type configuration](#channel-type-configuration)
  - [Debug mode](#debug-mode)
    - [Telegram chat ID retrieval](#telegram-chat-id-retrieval)
    - [Utility HTTP server](#utility-http-server)
- [Development](#development)
  - [Getting started](#getting-started)
  - [Executing via `go`](#executing-via-go)
- [License](#license)

# Usage
## Job mode
To run `healthcheckr` in job mode:
```sh
# with defaults
healthcheckr verify http;

# with all available flags
healthcheckr verify http \
  --expect-body-regex 'google' \
  --expect-response-time-ms 1000 \
  --expect-status-code 200 \
  --log-level 4 \
  --use-hostname yahoo.com \
  --use-method get \
  --use-path / \
  --use-query abc=def \
  --use-query ghi=jkl \
  --use-scheme https \
  --use-user-agent healthcheckr/example;

# get help on available flags
healthcheckr verify http --help;
```

## Worker mode
`healthcheckr` can also run as a long-running worker process with multiple healthchecks configured.
```sh
# with defaults
healthcheckr start worker;

# with all available flags
healthcheckr start worker \
  --config-path /path/to/config/file.yaml \
  --server-addr 0.0.0.0 \
  --server-port 8080;

# get help on flags
healthcheckr start worker --help;
```

Configuration is done via YAML:
```yaml
http:
  - scheme: https
    hostname: google.com
    path: /
    queries:
      - aaa=bbb
      - bbb=ccc
    method: get
    userAgent: healthcheckr/1.0/example
    timeoutMs: 3000
    expectStatusCode: 200
    expectBodyRegexes:
    intervalMs: 5000
    failureThreshold: 3
    channels:
      - telegram-default
channels:
  - name: telegram-default
    type: telegram
    apiKey:
      fromEnv: TELEGRAM_BOT_TOKEN
    chatId:
      value: "123456789"
  - name: slack-default
    type: slack
    url:
      fromEnv: SLACK_WEBHOOK_URL
```

An example is available at [`examples/config.yaml`](examples/config.yaml).
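
The `failureThreshold` and `alertMinimumIntervalS` properties (see the configuration properties table) together decide when a failing check actually notifies its channels. The following is a minimal Go sketch of that gating logic, not the tool's actual implementation. The `alertGate` type and the `newAlertGate`, `record` and `at` helpers are hypothetical names. It also assumes that failures are counted consecutively and that the first alert fires when the count reaches the threshold, neither of which is confirmed by this README.

```go
package main

import (
	"fmt"
	"time"
)

// alertGate is a hypothetical helper (not taken from the healthcheckr source)
// that decides whether a failed check should trigger a notification.
type alertGate struct {
	failureThreshold int           // mirrors http[].failureThreshold
	minInterval      time.Duration // mirrors http[].alertMinimumIntervalS
	failures         int           // consecutive failures seen so far (assumption)
	lastAlert        time.Time     // zero until the first alert is sent
}

// newAlertGate builds a gate from the two integer configuration values.
func newAlertGate(failureThreshold, alertMinimumIntervalS int) *alertGate {
	return &alertGate{
		failureThreshold: failureThreshold,
		minInterval:      time.Duration(alertMinimumIntervalS) * time.Second,
	}
}

// record registers one check result observed at time now and reports
// whether a failure notification should be sent for it.
func (g *alertGate) record(ok bool, now time.Time) bool {
	if ok {
		// A passing check resets the failure counter.
		g.failures = 0
		return false
	}
	g.failures++
	if g.failures < g.failureThreshold {
		return false
	}
	// Suppress alerts that come too soon after the previous one.
	if !g.lastAlert.IsZero() && now.Sub(g.lastAlert) < g.minInterval {
		return false
	}
	g.lastAlert = now
	return true
}

// at returns a fixed reference time shifted by the given number of seconds.
func at(seconds int) time.Time {
	base := time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(seconds) * time.Second)
}

func main() {
	// Simulate a check that fails every 5 seconds (intervalMs: 5000)
	// with failureThreshold: 3 and alertMinimumIntervalS: 60.
	gate := newAlertGate(3, 60)
	for i := 0; i < 16; i++ {
		elapsed := i * 5
		if gate.record(false, at(elapsed)) {
			fmt.Printf("t+%ds: failure %d, send alert\n", elapsed, i+1)
		}
	}
}
```

Passing the current time into `record` keeps the sketch deterministic. A real worker would call it with `time.Now()` after each check.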
### Worker configuration documentation
#### Configuration properties
| Property | Type | Description |
| --- | --- | --- |
| `http[]` | `list(object)` | `http` is the root level property that defines a list of HTTP-based checks. |
| `http[].scheme` | `string` | This defines the scheme to use for the request. Only `http` and `https` are supported. |
| `http[].hostname` | `string` | This defines the hostname component of the URL. |
| `http[].path` | `string` | This defines the path component of the URL. Defaults to `"/"`. |
| `http[].queries` | `list(string)` | This defines a list of `key=value` strings which are used as the query string of the HTTP-based request. |
| `http[].method` | `string` | This defines the method to use for the request. Defaults to a `"GET"` request. |
| `http[].userAgent` | `string` | This defines a custom User Agent string to use for the request. Defaults to `"healthcheckr/1.0"`. |
| `http[].timeoutMs` | `string` | This defines a timeout for the request in terms of milliseconds. Defaults to `5000` milliseconds. |
| `http[].expectStatusCode` | `integer` | This defines the expected integer status code of the response. Defaults to `200`. |
| `http[].expectBodyRegexes` | `list(string)` | This defines a list of regular expressions to match against the response body. |
| `http[].intervalMs` | `integer` | This defines the interval between checks in terms of milliseconds. Defaults to `5000` milliseconds. |
| `http[].failureThreshold` | `integer` | This defines the maximum number of check failures before a notification is triggered to one of the defined channels. |
| `http[].alertMinimumIntervalS` | `integer` | This defines the minimum duration between alerts in terms of seconds. Assuming a value of `60`, failure notifications will be sent at most once every 60 seconds, even if failures continue beyond the `.failureThreshold`. |
| `http[].channels` | `list(string)` | This defines a list of channel names which this HTTP check should notify upon failure/resolution. Each name must match a `channels[].name` value, otherwise an error will be thrown. |
| `channels[]` | `list(object)` | `channels` is a root level property defining a list of channels to which checks can send notifications. |
| `channels[].name` | `string` | This defines the name of the channel and must be unique across all channels. |
| `channels[].type` | `string` | This defines the type of channel which affects how the `apiKey` and `chatId` properties are consumed. Currently only `"telegram"` and `"slack"` are supported. See notes at the bottom of this section for instructions on what fields to define. |
| `channels[].apiKey` | `ChannelValue` | This defines the API key to use for this channel where applicable. |
| `channels[].apiKey.fromEnv` | `string` | This defines the environment variable from which to retrieve the value of the API key. |
| `channels[].apiKey.value` | `string` | This defines the literal value of the API key. Takes precedence over the `.fromEnv` property. |
| `channels[].chatId` | `ChannelValue` | This defines the chat ID to use for this channel where applicable. |
| `channels[].chatId.fromEnv` | `string` | This defines the environment variable from which to retrieve the value of the chat ID. |
| `channels[].chatId.value` | `string` | This defines the literal value of the chat ID. Takes precedence over the `.fromEnv` property. |
| `channels[].url` | `ChannelValue` | This defines the URL to use to send notifications to where applicable. |
| `channels[].url.fromEnv` | `string` | This defines the environment variable from which to retrieve the value of the URL. |
| `channels[].url.value` | `string` | This defines the literal value of the URL. Takes precedence over the `.fromEnv` property. |

#### Channel type configuration
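
The `apiKey`, `chatId` and `url` properties are all of type `ChannelValue`: each accepts a literal `value` or a `fromEnv` variable name, with `value` taking precedence. A minimal Go sketch of that resolution rule follows. The `ChannelValue` struct and its `Resolve` method are hypothetical and not taken from the `healthcheckr` source, and returning an error when nothing can be resolved is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ChannelValue is a hypothetical mirror of the documented property shape:
// a literal value and/or the name of an environment variable.
type ChannelValue struct {
	FromEnv string
	Value   string
}

// Resolve returns the literal value when it is set. Otherwise it looks up
// FromEnv through the supplied lookup function (os.LookupEnv in real use).
// Returning an error for a missing value is an assumption of this sketch.
func (c ChannelValue) Resolve(lookup func(string) (string, bool)) (string, error) {
	if c.Value != "" {
		// The literal value takes precedence over fromEnv.
		return c.Value, nil
	}
	if c.FromEnv == "" {
		return "", errors.New("neither value nor fromEnv is set")
	}
	v, ok := lookup(c.FromEnv)
	if !ok || v == "" {
		return "", fmt.Errorf("environment variable %s is not set", c.FromEnv)
	}
	return v, nil
}

func main() {
	// chatId has both fields set, so the literal value wins.
	chatID := ChannelValue{FromEnv: "TELEGRAM_CHAT_ID", Value: "123456789"}
	// apiKey only names an environment variable.
	apiKey := ChannelValue{FromEnv: "TELEGRAM_BOT_TOKEN"}

	id, err := chatID.Resolve(os.LookupEnv)
	fmt.Println("chatId:", id, "error:", err)

	key, err := apiKey.Resolve(os.LookupEnv)
	// Avoid printing the secret itself; only report whether it resolved.
	fmt.Println("apiKey resolved:", key != "", "error:", err)
}
```

Taking the lookup function as a parameter keeps the rule testable without touching the process environment.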
When `"telegram"` is used for `channels[].type`:
- Set the `apiKey` to the Bot Token provided by @BotFather
- Set the `chatId` to the ID of the chat which notifications should be sent to

When `"slack"` is used:

- Set the `url` property to the webhook URL

## Debug mode
### Telegram chat ID retrieval
To use a Telegram channel for alerting:
1. Create a new bot with [@BotFather](https://t.me/BotFather) and receive a Telegram bot token (referred to as `${BOT_TOKEN}` from here on). Set this in your terminal by running `export BOT_TOKEN=${BOT_TOKEN}` or define it in a local `.envrc` file. For container deployments, specify it in the deployment manifest using your platform's recommended mechanism for secrets.
2. Start `healthcheckr` in Telegram debug mode while specifying the `${BOT_TOKEN}`:
   ```sh
   healthcheckr debug telegram --bot-token ${BOT_TOKEN};
   ```
3. Add the Telegram bot to a chat
4. Use `/info` to trigger a response that will indicate the chat ID (referred to as `${CHAT_ID}` from here on)
5. Start `healthcheckr` with the Telegram chat ID and Telegram bot token specified as one of the channels in the configuration file:
   ```yaml
   # ... other properties ...
   channels:
     # ... other channels ...
     - name: telegram-01
       type: telegram
       apiKey:
         fromEnv: BOT_TOKEN
       chatId:
         value: "${CHAT_ID}"
   # ... other properties ...
   ```
6. Use the channel by specifying its name in the check configuration:
   ```yaml
   # ... other properties ...
   http:
     - scheme: https
       hostname: google.com
       # ... other properties ...
       channels:
         - telegram-01
   # ... other properties ...
   ```

### Utility HTTP server
A debug/utility HTTP server is included as part of the tool. To execute it, run:
```sh
healthcheckr debug http;
```

A server should start on port `8080`. This server returns either a `200` or `500` status code depending on whether the failure mode is turned on. To switch the mode, do a `curl` to the `/mode` endpoint:

```sh
curl localhost:8080/mode;
```

You should observe one of the following logs:

```sh
INFO[3886] fail mode toggled, / will now return 500
INFO[3931] fail mode toggled, / will now return 200
```

The configuration file at [`./examples/config-local.yaml`](./examples/config-local.yaml) uses this endpoint and can be used to verify the notification workflow.
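
To illustrate the behaviour described above, here is a minimal Go sketch of a toggle server using only the standard library. It is not the tool's actual implementation. The `toggleServer` type and its methods are hypothetical, and the response body of `/mode` is an assumption. The sketch exercises the handler in-process via `httptest` so that it terminates on its own; passing `handler()` to `http.ListenAndServe(":8080", ...)` instead would serve it on port `8080`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// toggleServer is a hypothetical stand-in for the utility server:
// / returns 200 or 500 depending on whether fail mode is on.
type toggleServer struct {
	mu      sync.Mutex
	failing bool
}

// codeFor maps the fail mode flag to the status code served on /.
func codeFor(failing bool) int {
	if failing {
		return http.StatusInternalServerError
	}
	return http.StatusOK
}

// status returns the status code that / currently responds with.
func (s *toggleServer) status() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return codeFor(s.failing)
}

// toggle flips fail mode and returns the status code / will now return.
func (s *toggleServer) toggle() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.failing = !s.failing
	return codeFor(s.failing)
}

// handler wires the two endpoints described in this section.
func (s *toggleServer) handler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/mode", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "fail mode toggled, / will now return %d\n", s.toggle())
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(s.status())
	})
	return mux
}

func main() {
	// Run the handler on a throwaway local listener and walk through
	// a check, a toggle, another check, and a toggle back.
	srv := httptest.NewServer((&toggleServer{}).handler())
	defer srv.Close()

	for _, path := range []string{"/", "/mode", "/", "/mode", "/"} {
		resp, err := http.Get(srv.URL + path)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		fmt.Println(path, resp.StatusCode)
	}
}
```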
# Development
## Getting started
Run the following to do a smoke test on Google:
```sh
go run . verify http;
```

To test the worker mode locally, you'll need to:

1. Create a Telegram bot (ask @BotFather)
2. Create a Slack webhook (add the app in Slack)
3. Start the utility HTTP server (instructions above)

Then create a `.envrc` file and define the following:

```sh
export TELEGRAM_BOT_TOKEN=xxx
export TELEGRAM_CHAT_ID=xxx
export SLACK_WEBHOOK_URL=xxx
```

And then run:

```sh
go run . start worker -c ./examples/config-local.yaml;
```

You can toggle the mode of the utility HTTP server to observe the notifications workflow:

```sh
curl localhost:8080/mode;
```

## Executing via `go`
To execute locally without compiling, replace all `healthcheckr` invocations with `go run .`
# License
This tool is licensed under the permissive MIT license.