Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-sre
Awesome SRE page
https://github.com/adriannovegil/awesome-sre
13. References
- Site Reliability Engineering: How Google Runs Production Systems
- Site Reliability Engineering - Rodolpho Eckhardt
- Site Reliability Engineering at Dropbox
- Episode 98: Rodolpho Eckhardt - YouTube, Google and SRE
6. Incident Response and Post-Mortem
- A collection of post-mortems
- A collection of postmortem templates
- OpenDuty - An incident escalation tool similar to PagerDuty (no longer maintained).
- Geneos - Real-time monitoring for all your environments in one platform.
- AlertOps - Transform real-time operational intelligence into automated incident response.
- Our incident postmortem template - Hosted Graphite's postmortem template.
- Postmortem exercise
- PagerDuty - Your platform for digital operations management.
- Blameless - The Blameless SRE Platform supports engineering and DevOps teams through incidents and retrospectives, and helps them detect recurring patterns in their incident data.
- OnPage - Incident alert management system with a secure smartphone app, enabling response teams to get the most out of their digital technology investments.
- PagerTree - Intelligent alert routing for the modern team.
- Cabot - Get alerted when services go down or metrics go crazy.
- xMatters - Automate operations workflows, ensure applications are always working, and deliver remarkable products at scale with the xMatters service reliability platform.
- Derdack Enterprise Alert - Enterprise Alert Notification Software.
- BigPanda - An AIOps event correlation and automation platform that helps Tech Ops teams keep the digital economy running.
- ngDesk - ngDesk includes support, sales, asset management, marketing and pager in an all-in-one application that is ready to go and easy to use.
- Rootly - The fastest way to declare an incident.
8. Chaos Engineering
11. Tools
- SLO Generator - Tool to compute and export Service Level Objectives (SLOs), Error Budgets and Burn Rates, using configurations written in YAML (or JSON) format.
- SLO Computer - SLOs, error windows and alerts are complicated; this tool attempts to make them easy.
- SLO Tracker - A simple but effective way to track SLOs and error budgets. SLO Tracker can be integrated with a few alerting tools via webhooks to receive SLO-violating incidents.
- SLO exporter - Computes standardized Service Level Indicator (SLI) and Service Level Objective (SLO) metrics from events coming from various data sources.
- Pyrra - Making SLOs with Prometheus manageable, accessible, and easy to use for everyone.
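The SLO tools above share the same core arithmetic: the error budget is 1 - SLO target, and the burn rate is the observed error ratio divided by that budget (a burn rate of 1.0 spends exactly the whole budget over the SLO period). Below is a minimal Python sketch of that arithmetic under stated assumptions; it is not the implementation of any tool listed here. The `WindowCounts` type and all function names are hypothetical. The default paging threshold of 14.4 is the example value the Google SRE Workbook uses for a 1-hour window against a 30-day SLO period (2% of the budget spent in one hour), and the two-window check follows the Workbook's multiwindow burn-rate alerting idea.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Hypothetical input: good and total request counts observed in one window."""
    good: int
    total: int


def error_budget(slo_target: float) -> float:
    """Allowed fraction of bad events, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(counts: WindowCounts, slo_target: float) -> float:
    """Observed error ratio in the window divided by the error budget."""
    if counts.total == 0:
        return 0.0  # no traffic, so nothing is being burned
    error_ratio = (counts.total - counts.good) / counts.total
    return error_ratio / error_budget(slo_target)


def budget_remaining(counts: WindowCounts, slo_target: float) -> float:
    """Fraction of the error budget left for the whole SLO period (may go negative)."""
    if counts.total == 0:
        return 1.0
    allowed_bad = error_budget(slo_target) * counts.total
    actual_bad = counts.total - counts.good
    return 1.0 - actual_bad / allowed_bad


def should_page(long_window: WindowCounts, short_window: WindowCounts,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Multiwindow check: page only if BOTH windows burn faster than the threshold."""
    return (burn_rate(long_window, slo_target) > threshold
            and burn_rate(short_window, slo_target) > threshold)


if __name__ == "__main__":
    slo = 0.999
    # 500 bad requests out of roughly 1,000 allowed leaves about half the budget.
    print(budget_remaining(WindowCounts(good=999_500, total=1_000_000), slo))
    # A 2% error ratio in both the 1-hour and 5-minute windows burns at about 20x.
    print(should_page(WindowCounts(98_000, 100_000), WindowCounts(9_800, 10_000), slo))
```

Requiring both a long and a short window to exceed the threshold is intended to make the alert fire quickly during a real outage and stop firing soon after it ends, instead of staying active for the length of the long window.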
12. Books
14. License
5. Alerting