https://github.com/grosser/kennel
Datadog monitors/dashboards/slos as code, avoid chaotic management via UI
https://github.com/grosser/kennel
Last synced: 7 months ago
JSON representation
Datadog monitors/dashboards/slos as code, avoid chaotic management via UI
- Host: GitHub
- URL: https://github.com/grosser/kennel
- Owner: grosser
- Created: 2017-12-09T00:55:22.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2024-09-11T16:44:04.000Z (about 1 year ago)
- Last Synced: 2024-10-30T03:42:52.570Z (about 1 year ago)
- Language: Ruby
- Homepage:
- Size: 2.02 MB
- Stars: 131
- Watchers: 8
- Forks: 41
- Open Issues: 9
-
Metadata Files:
- Readme: Readme.md
Awesome Lists containing this project
- awesome-platform-engineering - kennel - Datadog monitors/dashboards/slos as code, avoid chaotic management via UI (Dashboards as code / Shell into containers)
README

Manage Datadog Monitors / Dashboards / Slos as code
- DRY, searchable, audited, documented
- Changes are PR reviewed and applied on merge
- Updating shows diff before applying
- Automated import of existing resources
- Resources are grouped into projects that belong to teams and inherit tags
- No copy-pasting of ids to create new resources
- Automated cleanup when removing code
- [Helpers](#helpers) for automating common tasks
### Applying changes

### Example code
```Ruby
# teams/foo.rb
module Teams
class Foo < Kennel::Models::Team
defaults(mention: -> { "@slack-my-team" })
end
end
# projects/bar.rb
class Bar < Kennel::Models::Project
defaults(
team: -> { Teams::Foo.new }, # use mention and tags from the team
tags: -> { super() + ["project:bar"] }, # unique tag for all project components
parts: -> {
[
Kennel::Models::Monitor.new(
self, # the current project
type: -> { "query alert" },
kennel_id: -> { "load-too-high" }, # pick a unique name
name: -> { "Foobar Load too high" }, # nice descriptive name that will show up in alerts and emails
message: -> {
<<~TEXT
This is bad!
#{super()} # inserts mention from team
TEXT
},
query: -> { "avg(last_5m):avg:system.load.5{hostgroup:api} by {pod} > #{critical}" },
critical: -> { 20 }
)
]
}
)
end
```
## Installation
- create a new private `kennel` repo for your organization (do not fork this repo)
- use the template folder as starting point:
```Bash
git clone git@github.com:your-org/kennel.git
git clone git@github.com:grosser/kennel.git seed
mv seed/template/* kennel/
cd kennel && git add . && git commit -m 'initial'
```
- add a basic projects and teams so others can copy-paste to get started
- setup CI build for your repo (travis and Github Actions supported)
- uncomment `.travis.yml` section for datadog updates on merge (TODO: example setup for Github Actions)
- follow `Setup` in your repos Readme.md
## Structure
- `projects/` monitors/dashboards/etc scoped by project
- `teams/` team definitions
- `parts/` monitors/dashboards/etc that are used by multiple projects
- `generated/` projects as json, to show current state and proposed changes in PRs
## About the models
Kennel provides several classes which act as models for different purposes:
* `Kennel::Models::Dashboard`, `Kennel::Models::Monitor`, `Kennel::Models::Slo`, `Kennel::Models::SyntheticTest`;
these models represent the various Datadog objects
* `Kennel::Models::Project`; a container for a collection of Datadog objects
* `Kennel::Models::Team`; provides defaults and values (e.g. tags, mentions) for the other models.
After loading all the `*.rb` files under `projects/`, Kennel's starting point
is to find all the subclasses of `Kennel::Models::Project`, and for each one,
create an instance of that subclass (via `.new`) and then call `#parts` on that
instance. `parts` should return a collection of the Datadog-objects (Dashboard / Monitor / etc).
### Model Settings
Each of the models defines various settings; for example, a Monitor has `name`, `message`,
`type`, `query`, `tags`, and many more.
When defining a subclass of a model, one can use `defaults` to provide default values for
those settings:
```Ruby
class MyMonitor < Kennel::Models::Monitor
defaults(
name: "Error rate",
type: "query alert",
critical: 5.0,
query: -> {
"some datadog metric expression > #{critical}"
},
# ...
)
end
```
This is equivalent to defining instance methods of those names, which return those values:
```Ruby
class MyMonitor < Kennel::Models::Monitor
def name
"Error rate"
end
def type
"query alert"
end
def critical
5.0
end
def query
"some datadog metric expression > #{critical}"
end
end
```
except that `defaults` will complain if you try to use a setting name which doesn't
exist. Note also that you can use either plain values (`critical: 5.0`), or procs
(`query: -> { ... }`). Using a plain value is equivalent to using a proc which returns
that same value; use whichever suits you best.
When you _instantiate_ a model class, you can pass settings in the constructor, after
the project:
```Ruby
project = Kennel::Models::Project.new
my_monitor = MyMonitor.new(
project,
critical: 10.0,
message: -> {
<<~MESSAGE
Something bad is happening and you should be worried.
#{super()}
MESSAGE
},
)
```
This works just like `defaults` (it checks the setting names, and it accepts
either plain values or procs), but it applies just to this instance of the class,
rather than to the class as a whole (i.e. it defines singleton methods, rather
than instance methods).
Most of the examples in this Readme use the proc syntax (`critical: -> { 5.0 }`) but
for simple constants you may prefer to use the plain syntax (`critical: 5.0`).
## Workflows
### Adding a team
- `mention` is used for all team monitors via `super()`
- `renotify_interval` is used for all team monitors (defaults to `0` / off)
- `tags` is used for all team monitors/dashboards (defaults to `team:`)
```Ruby
# teams/my_team.rb
module Teams
class MyTeam < Kennel::Models::Team
defaults(
mention: -> { "@slack-my-team" }
)
end
end
```
### Adding a new monitor
- use [datadog monitor UI](https://app.datadoghq.com/monitors#create) to create a monitor
- see below
### Updating an existing monitor
- use [datadog monitor UI](https://app.datadoghq.com/monitors/manage) to find a monitor
- run `URL='https://app.datadoghq.com/monitors/123' bundle exec rake kennel:import` and copy the output
- find or create a project in `projects/`
- add the monitor to `parts: [` list, for example:
```Ruby
# projects/my_project.rb
class MyProject < Kennel::Models::Project
defaults(
team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
parts: -> {
[
Kennel::Models::Monitor.new(
self,
id: -> { 123456 }, # id from datadog url, not necessary when creating a new monitor
type: -> { "query alert" },
kennel_id: -> { "load-too-high" }, # make up a unique name
name: -> { "Foobar Load too high" }, # nice descriptive name that will show up in alerts and emails
message: -> {
# Explain what behavior to expect and how to fix the cause
# Use #{super()} to add team notifications.
<<~TEXT
Foobar will be slow and that could cause Barfoo to go down.
Add capacity or debug why it is suddenly slow.
#{super()}
TEXT
},
query: -> { "avg(last_5m):avg:system.load.5{hostgroup:api} by {pod} > #{critical}" }, # replace actual value with #{critical} to keep them in sync
critical: -> { 20 }
)
]
}
)
end
```
- run `PROJECT=my_project bundle exec rake plan`, an Update to the existing monitor should be shown (not Create / Delete)
- alternatively: `bundle exec rake generate` to only locally update the generated `json` files
- review changes then `git commit`
- make a PR ... get reviewed ... merge
- datadog is updated by CI
### Deleting
Remove the code that created the resource. The next update will delete it (see above for PR workflow).
### Adding a new dashboard
- go to [datadog dashboard UI](https://app.datadoghq.com/dashboard/lists) and click on _New Dashboard_ to create a dashboard
- see below
### Updating an existing dashboard
- go to [datadog dashboard UI](https://app.datadoghq.com/dashboard/lists) and click on _New Dashboard_ to find a dashboard
- run `URL='https://app.datadoghq.com/dashboard/bet-foo-bar' bundle exec rake kennel:import` and copy the output
- find or create a project in `projects/`
- add a dashboard to `parts: [` list, for example:
```Ruby
class MyProject < Kennel::Models::Project
defaults(
team: -> { Teams::MyTeam.new }, # use existing team or create new one in teams/
parts: -> {
[
Kennel::Models::Dashboard.new(
self,
id: -> { "abc-def-ghi" }, # id from datadog url, not needed when creating a new dashboard
title: -> { "My Dashboard" },
description: -> { "Overview of foobar" },
template_variables: -> { ["environment"] }, # see https://docs.datadoghq.com/api/?lang=ruby#timeboards
kennel_id: -> { "overview-dashboard" }, # make up a unique name
layout_type: -> { "ordered" },
definitions: -> {
[ # An array or arrays, each one is a graph in the dashboard, alternatively a hash for finer control
[
# title, viz, type, query, edit an existing graph and see the json definition
"Graph name", "timeseries", "area", "sum:mystats.foobar{$environment}"
],
[
# queries can be an Array as well, this will generate multiple requests
# for a single graph
"Graph name", "timeseries", "area", ["sum:mystats.foobar{$environment}", "sum:mystats.success{$environment}"],
# add events too ...
events: [{q: "tags:foobar,deploy", tags_execution: "and"}]
]
]
}
)
]
}
)
end
```
### Updating existing resources with id
Setting `id` makes kennel take over a manually created datadog resource.
When manually creating to import, it is best to remove the `id` and delete the manually created resource.
When an `id` is set and the original resource is deleted, kennel will fail to update,
removing the `id` will cause kennel to create a new resource in datadog.
### Organizing projects with many resources
When project files get too long, this structure can keep things bite-sized.
```Ruby
# projects/project_a/base.rb
module ProjectA
class Base < Kennel::Models::Project
defaults(
kennel_id: -> { "project_a" },
parts: -> {
[
Monitors::FooAlert.new(self),
...
]
}
...
# projects/project_a/monitors/foo_alert.rb
module ProjectA
module Monitors
class FooAlert < Kennel::Models::Monitor
...
```
### Updating a single project or resource
- Use `PROJECT=` for single project:
Use the projects `kennel_id` (and if none is set then snake_case of the class name including modules)
to refer to the project. For example for `class ProjectA` use `PROJECT=project_a` but for `Foo::ProjectA` use `foo_project_a`.
- Use `TRACKING_ID=:` for single resource:
Use the project kennel_id and the resources kennel_id, for example `class ProjectA` and `FooAlert` would give `project_a:foo_alert`.
### Skipping validations
Some validations might be too strict for your usecase or just wrong, please [open an issue](https://github.com/grosser/kennel/issues) and
to unblock use the `validate: -> { false }` option.
### Linking resources with kennel_id
Link resources with their kennel_id in the format `project kennel_id` + `:` + `resource kennel_id`,
this should be used to create dependent resources like monitor + slos,
so they can be created in a single update and can be re-created if any of them is deleted.
|Resource|Type|Syntax|
|---|---|---|
|Dashboard|uptime|`monitor: {id: "foo:bar"}`|
|Dashboard|alert_graph|`alert_id: "foo:bar"`|
|Dashboard|slo|`slo_id: "foo:bar"`|
|Dashboard|timeseries|`queries: [{ data_source: "slo", slo_id: "foo:bar" }]`|
|Monitor|composite|`query: -> { "%{foo:bar} && %{foo:baz}" }`|
|Monitor|slo alert|`query: -> { "error_budget(\"%{foo:bar}\").over(\"7d\") > 123.0" }`|
|Slo|monitor|`monitor_ids: -> ["foo:bar"]`|
### Debugging changes locally
- rebase on updated `master` to not undo other changes
- figure out project name by converting the class name to snake_case
- run `PROJECT=foo bundle exec rake kennel:update_datadog` to test changes for a single project (monitors: remove mentions while debugging to avoid alert spam)
- use `PROJECT=foo,bar,...` for multiple projects
### Reuse
Add to `parts/`.
```Ruby
module Monitors
class LoadTooHigh < Kennel::Models::Monitor
defaults(
name: -> { "#{project.name} load too high" },
message: -> { "Shut it down!" },
type: -> { "query alert" },
query: -> { "avg(last_5m):avg:system.load.5{hostgroup:#{project.kennel_id}} by {pod} > #{critical}" }
)
end
end
```
Reuse it in multiple projects.
```Ruby
class Database < Kennel::Models::Project
defaults(
team: -> { Kennel::Models::Team.new(mention: -> { '@slack-foo' }, kennel_id: -> { 'foo' }) },
parts: -> { [Monitors::LoadTooHigh.new(self, critical: -> { 13 })] }
)
end
```
## Helpers
### Listing un-muted alerts
Run `rake kennel:alerts TAG=service:my-service` to see all un-muted alerts for a given datadog monitor tag.
### Validating mentions work
`rake kennel:validate_mentions` should run as part of CI
### Grepping through all of datadog
```Bash
rake kennel:dump > tmp/dump
cat tmp/dump | grep foo
```
focus on a single type: `TYPE=monitors`
Show full resources or just their urls by pattern:
```Bash
rake kennel:dump_grep DUMP=tmp/dump PATTERN=foo URLS=true
https://foo.datadog.com/dasboard/123
https://foo.datadog.com/monitor/123
```
### Find all monitors with No-Data
`rake kennel:nodata TAG=team:foo`
### Finding the tracking id of a resource
When trying to link resources together, this avoids having to go through datadog UI.
```Bash
rake kennel:tracking_id ID=123 RESOURCE=monitor
```
## Development
### Benchmarking
- Setting `FORCE_GET_CACHE=true` will cache all get requests, which makes benchmarking improvements more reliable.
- Setting `STORE=false` will make `rake plan` not update the files on disk and save a bit of time
### Integration testing
```Bash
rake play
cd template
rake plan
```
Then make changes to play around, do not commit changes and make sure to revert with a `rake kennel:update_datadog` after deleting everything.
To make changes via the UI, make a new free datadog account and use it's credentials instead.
Author
======
[Michael Grosser](http://grosser.it)
michael@grosser.it
License: MIT
