https://github.com/ndelitski/rancher-alarms

Will kick your ass if it finds an unhealthy service in a Rancher environment

# rancher-alarms

Sends notifications when something goes wrong in Rancher.

## Features
- Will kick your ass when a service goes down, and send a message when it recovers
- Various notification mechanisms:
  - email
  - slack
  - please create an issue if you need more
- Configure notification mechanisms globally or per service (currently supported only in the `.json` config)
- Customize your notification messages

## Quick start

### Inside Rancher environment using rancher-compose CLI
```yml
rancher-alarms:
  image: ndelitski/rancher-alarms
  environment:
    ALARM_SLACK_WEBHOOK_URL: https://hooks.slack.com/services/:UUID
  labels:
    io.rancher.container.create_agent: 'true'
    io.rancher.container.agent.role: environment
```
[How to create a Slack webhook URL](https://my.slack.com/services/new/incoming-webhook/)

NOTE: Including the Rancher agent labels is crucial; otherwise you need to provide Rancher credentials manually with the `RANCHER_*` variables.

### Outside Rancher environment using `docker run`
```
docker run \
-d \
-e RANCHER_ADDRESS=rancher.yourdomain.com \
-e RANCHER_ACCESS_KEY=ACCESS-KEY \
-e RANCHER_SECRET_KEY=SECRET-KEY \
-e RANCHER_PROJECT_ID=1a8 \
-e ALARM_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR_SLACK_UUID \
--name rancher-alarms \
ndelitski/rancher-alarms
```

## How it works

On startup, rancher-alarms fetches the list of services and starts a healthcheck monitor for each service that is in a running state. Removed, purged and other inactive services are ignored.

The list of healthcheck monitors is refreshed every `pollServicesInterval` milliseconds. Once a service is removed, it is no longer monitored.
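The polling step can be sketched in JavaScript roughly as follows. This is not the project's code: it assumes Node 18+ (global `fetch`), and the `/v1/projects/<projectId>/services` endpoint, the `active` state name, the `{ data: [...] }` response shape and the `createMonitor` factory are all assumptions made for illustration.

```javascript
// Sketch only: endpoint path, response shape and "active" state name are
// assumptions about the Rancher v1 API, not taken from rancher-alarms.
// `baseUrl` is assumed to include the scheme, e.g. "https://rancher.yourdomain.com".
async function fetchServices({ baseUrl, projectId, accessKey, secretKey }) {
  const auth = Buffer.from(`${accessKey}:${secretKey}`).toString("base64");
  const res = await fetch(`${baseUrl}/v1/projects/${projectId}/services`, {
    headers: { Authorization: `Basic ${auth}` },
  });
  if (!res.ok) throw new Error(`Rancher API returned ${res.status}`);
  const body = await res.json();
  return body.data; // assumed: collections are wrapped in { data: [...] }
}

// Bring the monitor map in line with the latest service list:
// stop monitors for services that are gone or no longer running,
// and start monitors for newly running services.
function reconcileMonitors(monitors, services, createMonitor) {
  const running = new Map(
    services.filter((s) => s.state === "active").map((s) => [s.id, s])
  );
  for (const [id, monitor] of monitors) {
    if (!running.has(id)) {
      monitor.stop();
      monitors.delete(id); // deleting during Map iteration is safe in JS
    }
  }
  for (const [id, service] of running) {
    if (!monitors.has(id)) monitors.set(id, createMonitor(service));
  }
  return monitors;
}
```

A caller would run `reconcileMonitors(monitors, await fetchServices(settings), createMonitor)` on a timer set to `pollServicesInterval`, so removed services drop out on the next refresh.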

When a service transitions to a degraded state, every configured target is invoked to send its notification.
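The `healthyThreshold` and `unhealthyThreshold` settings suggest a monitor that only changes state after several consecutive checks agree. A minimal sketch under that assumption follows; the `HealthMonitor` class, its `record` method and the `onChange` callback are hypothetical names, and the initial `healthy` state is also an assumption.

```javascript
// Hypothetical sketch: flip to "degraded" after `unhealthyThreshold`
// consecutive failed checks, and back to "healthy" after
// `healthyThreshold` consecutive passing checks.
class HealthMonitor {
  constructor({ healthyThreshold = 2, unhealthyThreshold = 3 } = {}, onChange = () => {}) {
    this.healthyThreshold = healthyThreshold;
    this.unhealthyThreshold = unhealthyThreshold;
    this.onChange = onChange; // called as onChange(prevState, newState)
    this.state = "healthy"; // assumed initial state
    this.streak = 0; // consecutive checks that disagree with the current state
  }

  // Feed one check result; returns the (possibly new) state.
  record(isHealthy) {
    const agrees = (this.state === "healthy") === isHealthy;
    if (agrees) {
      this.streak = 0; // any agreeing check resets the counter
      return this.state;
    }
    this.streak += 1;
    const limit = this.state === "healthy" ? this.unhealthyThreshold : this.healthyThreshold;
    if (this.streak >= limit) {
      const prev = this.state;
      this.state = isHealthy ? "healthy" : "degraded";
      this.streak = 0;
      this.onChange(prev, this.state); // e.g. notify all targets here
    }
    return this.state;
  }
}
```

With the defaults, three failed checks in a row trigger `healthy -> degraded`, two passing checks bring the service back, and a single passing check in between resets the failure count.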

## docker-compose configuration

### Docker compose for Slack notification target

```yml
rancher-alarms:
  image: ndelitski/rancher-alarms
  environment:
    RANCHER_ADDRESS: your-rancher.com
    ALARM_SLACK_WEBHOOK_URL: https://hooks.slack.com/services/...
```

See [examples](https://github.com/ndelitski/rancher-alarms/tree/master/examples) for more docker-compose examples.

## Configuration

### Environment variables

#### Rancher settings
Can be omitted when running inside a Rancher environment, as long as the service is started as a Rancher agent (see the agent labels in the quick start).
- `RANCHER_ADDRESS`
- `RANCHER_PROJECT_ID`
- `RANCHER_ACCESS_KEY`
- `RANCHER_SECRET_KEY`

#### Polling settings
- `ALARM_POLL_INTERVAL`
- `ALARM_MONITOR_INTERVAL`
- `ALARM_MONITOR_HEALTHY_THRESHOLD`
- `ALARM_MONITOR_UNHEALTHY_THRESHOLD`
- `ALARM_FILTER`

#### Email target settings
- `ALARM_EMAIL_ADDRESSES`
- `ALARM_EMAIL_USER`
- `ALARM_EMAIL_PASS`
- `ALARM_EMAIL_SSL`
- `ALARM_EMAIL_SMTP_HOST`
- `ALARM_EMAIL_SMTP_PORT`
- `ALARM_EMAIL_FROM`
- `ALARM_EMAIL_SUBJECT`
- `ALARM_EMAIL_TEMPLATE`
- `ALARM_EMAIL_TEMPLATE_FILE`

#### Slack target settings
- `ALARM_SLACK_WEBHOOK_URL`
- `ALARM_SLACK_CHANNEL`
- `ALARM_SLACK_BOTNAME`
- `ALARM_SLACK_TEMPLATE`
- `ALARM_SLACK_TEMPLATE_FILE`

See [examples](https://github.com/ndelitski/rancher-alarms/tree/master/examples) for docker-compose files configured through environment variables.
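The Slack target settings map naturally onto a Slack incoming-webhook request. The sketch below is an illustration, not the project's code: `slackPayload` and `postToSlack` are hypothetical helpers, it assumes Node 18+ (global `fetch`), and mapping `ALARM_SLACK_CHANNEL` to `channel` and `ALARM_SLACK_BOTNAME` to `username` is an assumption (newer Slack app webhooks may ignore these overrides).

```javascript
// Hypothetical helper: build an incoming-webhook payload from the
// ALARM_SLACK_* environment variables (mapping is assumed).
function slackPayload(text, env = process.env) {
  const payload = { text };
  if (env.ALARM_SLACK_CHANNEL) payload.channel = env.ALARM_SLACK_CHANNEL;
  if (env.ALARM_SLACK_BOTNAME) payload.username = env.ALARM_SLACK_BOTNAME;
  return payload;
}

// Hypothetical helper: POST the payload as JSON to the webhook URL.
async function postToSlack(text, env = process.env) {
  const res = await fetch(env.ALARM_SLACK_WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(slackPayload(text, env)),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```

Keeping payload construction separate from the HTTP call makes the mapping easy to check without sending anything to Slack.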

### Local json config

```json
{
  "rancher": {
    "address": "rancher-host:port",
    "auth": {
      "accessKey": "",
      "secretKey": ""
    },
    "projectId": "1a5"
  },
  "pollServicesInterval": 10000,
  "filter": [
    "app/*"
  ],
  "notifications": {
    "*": {
      "targets": {
        "email": {
          "recipients": [
            "[email protected]"
          ]
        }
      },
      "healthcheck": {
        "pollInterval": 5000,
        "healthyThreshold": 2,
        "unhealthyThreshold": 3
      }
    },
    "frontend": {
      "targets": {
        "email": {
          "recipients": [
            "[email protected]"
          ]
        }
      }
    }
  },
  "targets": {
    "email": {
      "smtp": {
        "from": " [email protected]",
        "auth": {
          "user": "[email protected]",
          "password": "Str0ngPa$$"
        },
        "host": "smtp.gmail.com",
        "secureConnection": true,
        "port": 465
      }
    },
    "slack": {
      "webhookUrl": "https://hooks.slack.com/services/YOUR_SLACK_UUID",
      "botName": "rancher-alarm",
      "channel": "#devops"
    }
  }
}
```

#### Config file sections
- `rancher` Rancher API settings. `required`
- `pollServicesInterval` interval in ms between fetches of the service list. `required`
- `filter` whitelist filter for stack/service names in the environment. A list of strings, each treated as a regular expression, so you can use something like `frontend/*` to match all services of a stack. `optional`
- `notifications` per-service notification settings; the `*` wildcard matches any service. `required`
  - `healthcheck` monitoring state options. `optional`; defaults are:
    ```js
    {
      pollInterval: 5000,
      healthyThreshold: 2,
      unhealthyThreshold: 3
    }
    ```
  - `targets` which notification targets to use. These settings override the base target settings in the root `targets` section. Currently each target must be an object; if you have nothing to override from the base settings, just use `{}` as the value. `optional`
- `targets` base settings for each notification target. `required`
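One way to read these rules is as a lookup that resolves the effective settings for a single service. The sketch below is hypothetical and not the project's code; it assumes names are matched as `stack/service`, that a service-specific `notifications` entry (keyed here by service name) replaces the `*` entry rather than merging with it, and that per-service target settings are shallow-merged over the root `targets`. Note that in a regular expression `*` is a quantifier, so `frontend/*` means "frontend" followed by zero or more slashes; tested unanchored, it matches any name containing `frontend`.

```javascript
// Hypothetical sketch of resolving per-service settings from the JSON config.
const HEALTHCHECK_DEFAULTS = { pollInterval: 5000, healthyThreshold: 2, unhealthyThreshold: 3 };

// Whitelist check: each filter entry is used as an unanchored RegExp.
function isWatched(config, fullName) {
  const filter = config.filter;
  if (!filter || filter.length === 0) return true; // no whitelist: watch everything
  return filter.some((pattern) => new RegExp(pattern).test(fullName));
}

// Returns { healthcheck, targets } for the service, or null if it is not watched.
function resolveServiceSettings(config, stackName, serviceName) {
  if (!isWatched(config, `${stackName}/${serviceName}`)) return null;
  // Assumed: a service-specific entry wins over the "*" wildcard entry.
  const entry = config.notifications[serviceName] ?? config.notifications["*"];
  if (!entry) return null;
  const targets = {};
  for (const [name, overrides] of Object.entries(entry.targets ?? {})) {
    // Per-service settings override the base settings from the root `targets`.
    targets[name] = { ...config.targets[name], ...overrides };
  }
  return {
    healthcheck: { ...HEALTHCHECK_DEFAULTS, ...entry.healthcheck },
    targets,
  };
}
```

This also shows why `{}` is a valid per-service target value: it simply inherits the root settings unchanged.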

## Templates
### List of template variables:
- `healthyState` HEALTHY or UNHEALTHY
- `state` service state as named in the Rancher API
- `prevMonitorState` previous service state name, as tracked by rancher-alarms
- `monitorState` current service state name, as tracked by rancher-alarms (e.g. always `degraded` for an unhealthy service)
- `serviceName` name of the service in Rancher
- `serviceUrl` URL of the running service in the Rancher UI
- `stackUrl` URL of the stack in the Rancher UI
- `stackName` name of the stack in Rancher
- `environmentName` name of the environment in Rancher
- `environmentUrl` URL of the environment in the Rancher UI

### Using variables in a template string:
```
Hey buddy! Your service #{serviceName} became #{healthyState}, direct link to the service #{serviceUrl}
```
More detailed examples can be found in the `examples` folder.
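To make the `#{name}` placeholder syntax concrete, here is a minimal substitution sketch. It is an illustration only, not the project's template engine: `renderTemplate` is a hypothetical name, placeholders are assumed to be plain identifiers, and unknown placeholders are left untouched.

```javascript
// Hypothetical sketch: replace every #{name} with vars[name];
// placeholders with no matching variable are kept as-is.
function renderTemplate(template, vars) {
  return template.replace(/#\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match
  );
}
```

For example, `renderTemplate("Your service #{serviceName} became #{healthyState}", { serviceName: "web", healthyState: "UNHEALTHY" })` yields `"Your service web became UNHEALTHY"`.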

## Roadmap
- [ ] Simplify configuration.
- [ ] More use of Rancher labels and metadata. Alternate configuration through Rancher labels/metadata (can be used in conjunction with the initial config).
- [ ] Run in a Rancher environment as an agent with a new label `agent: true`. No need to specify keys anymore!
- [ ] More notification mechanisms: AWS SNS, HTTP, SMS
- [x] Support templating
- [ ] Test coverage. Set up drone.io
- [x] Notify when all services operate normally after some of them were in a degraded state
- [ ] Refactor code
- [x] Shrink image size with Alpine Linux