https://github.com/airbnb/interferon
Signaling you about infrastructure or application issues
- Host: GitHub
- URL: https://github.com/airbnb/interferon
- Owner: airbnb
- License: mit
- Created: 2015-03-16T05:13:40.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2020-03-20T18:21:54.000Z (over 5 years ago)
- Last Synced: 2025-08-13T14:40:51.107Z (about 2 months ago)
- Language: Ruby
- Size: 258 KB
- Stars: 238
- Watchers: 204
- Forks: 36
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# Interferon #
This repo contains the interferon gem.
This gem enables you to store your alerts configuration in code.
You should create your own repository, with a `Gemfile` which imports the interferon gem.
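A minimal `Gemfile` for such a repository might look like the sketch below. It assumes the gem is installed from RubyGems under the name `interferon`; pin a version or point at the Git repository instead if your setup differs.

```ruby
# Gemfile for a hypothetical alerts repository
source 'https://rubygems.org'

# Assumption: the gem is available from RubyGems as `interferon`
gem 'interferon'
```

After `bundle install`, the `interferon` executable can be run through `bundle exec`.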
For an example of such a repository, along with example configuration and alerts files, see https://www.github.com/airbnb/alerts

## Running This Gem ##
This gem provides a single executable, called `interferon`.
You are meant to invoke it like so:

```bash
$ bundle exec interferon --config /path/to/config_file
```

Additional options:
* `-h`, `--help` -- prints out usage information
* `-n`, `--dry-run` -- runs interferon without making any changes to alerting destinations

## Configuration File ##
The configuration file is written in YAML.
It accepts the following parameters:
* `verbose_logging` -- whether to print more output
* `alerts_repo_path` -- the path to your alerts repo, which contains your interferon DSL files
* `group_sources` -- a list of sources which can return groups of people to alert
* `host_sources` -- a list of sources which can read inventory systems and return lists of hosts to monitor
* `destinations` -- a list of alerting providers, which can monitor metrics and dispatch alerts as specified in your alerts DSL files
* `processes` -- number of processes to run the alert generation on (optional; default is to use all available cores)

For more information, see the [config.example.yaml](config.example.yaml) file in this repo.
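As a rough illustration of how these parameters fit together, the Ruby sketch below parses a hypothetical configuration and reports any missing top-level keys. The `type`/`enabled`/`options` layout of each source entry, the option names, and the choice of which keys count as required are assumptions made for this example; `config.example.yaml` remains the authoritative reference.

```ruby
require 'yaml'

# Hypothetical configuration using the documented top-level keys.
# The shape of each list entry (type/enabled/options) is an assumption.
EXAMPLE_CONFIG = <<~CONFIG
  verbose_logging: true
  alerts_repo_path: /path/to/alerts_repo
  group_sources:
    - type: filesystem
      enabled: true
      options:
        paths:
          - /path/to/alerts_repo/groups
  host_sources:
    - type: optica
      enabled: true
      options:
        host: optica.example.com
  destinations:
    - type: datadog
      enabled: true
      options:
        api_key: YOUR_API_KEY
        app_key: YOUR_APP_KEY
  processes: 4
CONFIG

# Assumption: everything except `verbose_logging` and `processes` is required.
REQUIRED_KEYS = %w[alerts_repo_path group_sources host_sources destinations].freeze

# Returns the required keys that are absent from the parsed config hash.
def missing_config_keys(config)
  REQUIRED_KEYS.reject { |key| config.key?(key) }
end

config = YAML.safe_load(EXAMPLE_CONFIG)
missing = missing_config_keys(config)
puts missing.empty? ? 'config looks complete' : "missing keys: #{missing.join(', ')}"
```

A check like this is only a convenience before a `--dry-run`; it does not replace whatever validation interferon itself performs.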
## The Moving Parts ##
This repo knows about four kinds of objects:
* *host_sources*: these query various inventory systems and return lists of hosts or entities to alert on
* *destinations*: these are metric systems, which can watch metrics and alert engineers
* *groups*: these are groups of actual engineers who can be alerted in case of trouble
* *alerts*: these are Ruby DSL files which specify when and how engineers and groups are alerted via the destination about hosts

### Host Sources ###
* optica: can read a list of AWS instances from [optica](https://www.github.com/airbnb/optica)
* optica_services: returns SmartStack service information parsed from optica
* aws_rds: lists RDS instances
* aws_dynamo: lists DynamoDB tables
* aws_elasticache: lists ElastiCache nodes and clusters

### Destinations ###
#### Datadog ####
Datadog is our only alerting destination at the moment.
Datadog's alerting syntax rules are documented here: [http://docs.datadoghq.com/api/#alerts](http://docs.datadoghq.com/api/#alerts)
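To make the shape of such a query concrete, here is a small Ruby sketch that assembles one from its parts, mirroring the pieces labelled in the chart below. The `datadog_query` helper and its keyword arguments are hypothetical; they are not part of interferon or of the Datadog client library.

```ruby
# Hypothetical helper: builds a Datadog alert query string from its parts.
#   condition  - max, avg, min, change or pct_change
#   interval   - lookback window without the `last_` prefix, e.g. '5m'
#   aggregator - math on the metric over all tags: max, min, avg or sum
#   tags       - tags to slice the metric by (`*` is used when none are given)
#   by         - tags that each trigger a separate alert; may be omitted
#   comparison - the threshold expression, e.g. '> 0'
def datadog_query(condition:, interval:, aggregator:, metric:, comparison:, tags: {}, by: [])
  tag_filter = tags.empty? ? '*' : tags.map { |key, value| "#{key}:#{value}" }.join(',')
  query = "#{condition}(last_#{interval}):#{aggregator}:#{metric}{#{tag_filter}}"
  query += " by {#{by.join(',')}}" unless by.empty?
  "#{query} #{comparison}"
end

query = datadog_query(
  condition: 'max',
  interval: '5m',
  aggregator: 'avg',
  metric: 'haproxy_count_by_status',
  tags: { role: 'web', status: 'up' },
  by: ['host'],
  comparison: '> 0'
)
puts query
# max(last_5m):avg:haproxy_count_by_status{role:web,status:up} by {host} > 0
```

In real alert files the role is typically interpolated per host (the `<%= role %>` placeholder in the chart), so the literal `web` here is only a stand-in.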
Here's a chart explaining the Datadog metric syntax ([generated via asciiflow](http://www.asciiflow.com/#669823367132047287/1039453499)):

```
  +---------+ alert condition +-------------------------------------------------+
  |                                                                             |
  |              +-----+ metric to alert on                                     |
  |              |                                                              |
  |              |     tags to slice the metric by +------+                     |
  |              |                                        |                     |
  v              v                                        v                     v
|----------| |-------------------------||--------------------------|          |---|
max(last_5m):avg:haproxy_count_by_status{role:<%= role %>,status:up} by {host} > 0
^      ^      ^                                                           ^
|      |      |                                                           |
|      | +----+------------------------------+                            |
|      | | math on the metric over all tags  |                            |
|      | |-----------------------------------|          +------------------------------------+
|      | | * max, min, avg, sum              |          |trigger a separate alert for each   |
|      + +-----------------------------------+          |different value of these tags; the  |
| +----+----------------------------------------------+ |entire `by {}` clause can be omitted|
| | the interval to look at; always starts with last_ | +------------------------------------+
| |---------------------------------------------------|
| | * 5m, 10m, 15m, 30m                               |
| | * 1h, 2h, 4h                                      |
+ +---------------------------------------------------+
+------------------------------------------------------------------------------------------------+
| metric condition, can be one of:                                                               |
|------------------------------------------------------------------------------------------------|
| * max: the metric gets this high at least once during the interval                             |
| * avg: the metric is this on average during the interval                                       |
| * min: the metric is this small at least once during the interval                              |
| * change: the metric changes this much between a value N minutes ago and now (raw difference). |
| * pct_change: the metric changes this much between a value N minutes ago and now (percentage). |
+------------------------------------------------------------------------------------------------+
```

### Groups ###
Groups actually come from *group_sources*.
We only have a single group source right now, which reads groups in YAML files from the filesystem.
However, we would like to add additional group sources, such as LDAP-based ones.
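As a sketch of what a filesystem-backed group source does, the Ruby snippet below reads every YAML file in a directory and maps each group name to its members. The `name` and `people` keys, the `load_groups` helper, and the example addresses are assumptions for illustration; check the example alerts repository for the layout interferon actually expects.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical loader: maps group name => list of people for every
# *.yaml file in `dir`. The `name`/`people` keys are an assumed layout.
def load_groups(dir)
  Dir.glob(File.join(dir, '*.yaml')).sort.each_with_object({}) do |path, groups|
    data = YAML.safe_load(File.read(path))
    groups[data['name']] = Array(data['people'])
  end
end

# Demonstrate with a throwaway directory containing one group file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'ops.yaml'), <<~GROUP)
    name: ops
    people:
      - alice@example.com
      - bob@example.com
  GROUP

  puts load_groups(dir).inspect
end
```

An LDAP-based source would only need to produce the same name-to-members mapping from a directory query instead of from files.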