An open API service indexing awesome lists of open source software.

https://github.com/slok/sloth

🦥 Easy and simple Prometheus SLO (service level objectives) generator
https://github.com/slok/sloth

alerting alerts autogenerates-prometheus-slo controller google-slo kubernetes metrics monitoring observability oncall openslo operator prometheus prometheus-operator service-level service-level-objective sla sli slo

Last synced: 10 days ago
JSON representation

🦥 Easy and simple Prometheus SLO (service level objectives) generator

Awesome Lists containing this project

README

        


sloth

# Sloth

[![CI](https://github.com/slok/sloth/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/slok/sloth/actions/workflows/ci.yaml)
[![Go Report Card](https://goreportcard.com/badge/github.com/slok/sloth)](https://goreportcard.com/report/github.com/slok/sloth)
[![Apache 2 licensed](https://img.shields.io/badge/license-Apache2-blue.svg)](https://raw.githubusercontent.com/slok/sloth/master/LICENSE)
[![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/slok/sloth)](https://github.com/slok/sloth/releases/latest)
![Kubernetes release](https://img.shields.io/badge/Kubernetes-v1.32-green?logo=Kubernetes&style=flat&color=326CE5&logoColor=white)
[![OpenSLO](https://img.shields.io/badge/OpenSLO-v1alpha-green?color=4974EA&style=flat)](https://github.com/OpenSLO/OpenSLO#slo)

## Project status

Sloth is actively maintained and in continuous development.

We’re currently working on internal improvements to make Sloth even more extensible and adaptable to real-world use cases. While Sloth already supports custom SLI plugins, upcoming changes will bring more flexibility to how SLOs are generated, making it easier to adapt Sloth to your custom needs. Stay tuned for updates!

## Introduction

Meet the easiest way to generate [SLOs][google-slo] for Prometheus.

Sloth generates understandable, uniform and reliable Prometheus SLOs for any kind of service. Using a simple SLO spec that results in multiple metrics and [multi window multi burn][mwmb] alerts.

https://sloth.dev

## Features

- Simple, maintainable and understandable SLO spec.
- Reliable SLO metrics and alerts.
- Based on [Google SLO][google-slo] implementation and [multi window multi burn][mwmb] alerts framework.
- Autogenerates Prometheus SLI recording rules in different time windows.
- Autogenerates Prometheus SLO metadata rules.
- Autogenerates Prometheus SLO [multi window multi burn][mwmb] alert rules (Page and warning).
- SLO spec validation (including `validate` command for Gitops and CI).
- Customization of labels, disabling different type of alerts...
- A single way (uniform) of creating SLOs across all different services and teams.
- Automatic [Grafana dashboard][grafana-dashboard] to see all your SLOs state.
- Single binary and easy to use CLI.
- Kubernetes ([Prometheus-operator]) support.
- Kubernetes Controller/operator mode with CRDs.
- Support different [SLI types](#sli-types-manifests).
- Support for [SLI plugins](#sli-plugins)
- A library with [common SLI plugins][common-sli-plugins].
- [OpenSLO] support.
- Safe SLO period windows for 30 and 28 days by default.
- Customizable SLO period windows for advanced use cases.

![Small Sloth SLO dashboard](docs/img/sloth_small_dashboard.png)

## Getting started

Release the Sloth!

```bash
sloth generate -i ./examples/getting-started.yml
```

```yaml
version: "prometheus/v1"
service: "myservice"
labels:
owner: "myteam"
repo: "myorg/myservice"
tier: "2"
slos:
# We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
- name: "requests-availability"
objective: 99.9
description: "Common SLO based on availability for HTTP request responses."
labels:
category: availability
sli:
events:
error_query: sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[{{.window}}]))
total_query: sum(rate(http_request_duration_seconds_count{job="myservice"}[{{.window}}]))
alerting:
name: "MyServiceHighErrorRate"
labels:
category: "availability"
annotations:
# Overwrite default Sloth SLO alert summmary on ticket and page alerts.
summary: "High error rate on 'myservice' requests responses"
page_alert:
labels:
severity: "pageteam"
routing_key: "myteam"
ticket_alert:
labels:
severity: "slack"
slack_channel: "#alerts-myteam"
```

[This](examples/_gen/getting-started.yml) would be the result you would obtain from the above [spec example](examples/getting-started.yml).

## Documentation

[Check the docs to know more about the usage, examples, and other handy features!][docs]

## SLI plugins

Looking for common SLI plugins? Check [this repository][common-sli-plugins], if you are looking for the sli plugins docs, check [this][docs-sli-plugins] instead.

## Development and Contributing

Check [CONTRIBUTING.md](CONTRIBUTING.md).

[google-slo]: https://landing.google.com/sre/workbook/chapters/alerting-on-slos/
[mwmb]: https://landing.google.com/sre/workbook/chapters/alerting-on-slos/#6-multiwindow-multi-burn-rate-alerts
[prometheus-operator]: https://github.com/prometheus-operator
[grafana-dashboard]: https://grafana.com/grafana/dashboards/14348
[openslo]: https://openslo.com/
[common-sli-plugins]: https://github.com/slok/sloth-common-sli-plugins
[docs-sli-plugins]: https://sloth.dev/usage/plugins/
[docs]: https://sloth.dev