Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


awesome-sre

Awesome SRE page
https://github.com/adriannovegil/awesome-sre


6. Incident Response and Post-Mortem
- A collection of post-mortems
- A collection of postmortem templates
- OpenDuty - An incident escalation tool similar to PagerDuty (no longer maintained).
- Splunk On-Call - Helps developers, DevOps, and operations teams make on-call suck less while reducing mean time to acknowledge and resolve outages.
- Geneos - Real-time monitoring for all your environments in one platform.
- AlertOps - Transform real-time operational intelligence into automated incident response.
- Our incident postmortem template - Hosted Graphite postmortem template.
- Postmortem exercise
- PagerDuty - Your platform for digital operations management.
- Blameless - The Blameless SRE Platform empowers engineering and DevOps teams through incidents, retrospectives, and detecting the interesting patterns. With the right data, of course.
- OnPage - Incident alert management system with a secure smartphone app, enabling response teams to get the most out of their digital technology investments.
- PagerTree - Intelligent alert routing for the modern team.
- Cabot - Get alerted when services go down or metrics go crazy.
- xMatters - Automate operations workflows, ensure applications are always working, and deliver remarkable products at scale with the xMatters service reliability platform.
- Derdack Enterprise Alert - Enterprise Alert Notification Software.
- BigPanda - An AIOps event correlation and automation platform that enables Tech Ops teams to keep the digital economy running.
- ngDesk - ngDesk includes support, sales, asset management, marketing and pager in an all-in-one application that is ready to go and easy to use.
- Rootly - The fastest way to declare an incident.
8. Chaos Engineering
- My Awesome Chaos Repo ;-)
11. Tools
- SLO Generator - Tool to compute and export Service Level Objectives (SLOs), Error Budgets and Burn Rates, using configurations written in YAML (or JSON) format.
- SLO Computer - SLOs, Error windows and alerts are complicated. Here's an attempt to make it easy.
- SLO Tracker - A simple but effective way to track SLOs and error budgets. SLO Tracker can be integrated with a few alerting tools via webhooks to receive SLO-violating incidents.
- SLO exporter - Computes standardized Service Level Indicator (SLI) and Service Level Objectives (SLO) metrics based on events coming from various data sources.
- Pyrra - Making SLOs with Prometheus manageable, accessible, and easy to use for everyone.
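
The tools above all revolve around the same basic arithmetic: an SLI measured from events, an error budget derived from the SLO target, and a burn rate comparing the two. The following is a minimal sketch of that arithmetic only, not the implementation of any tool listed here. The names `SloReport` and `slo_report` are hypothetical, and the event-ratio SLI is an assumption (the listed tools support other SLI types and configuration formats).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    """Hypothetical container for the values an SLO tool typically reports."""
    sli: float           # fraction of good events, e.g. 0.995
    error_budget: float  # allowed fraction of bad events, 1 - SLO target
    error_rate: float    # observed fraction of bad events, 1 - SLI
    burn_rate: float     # error_rate / error_budget; above 1.0 means over budget


def slo_report(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compute SLI, error budget, and burn rate for one measurement window.

    Assumes an event-based SLI (good events / total events) and an SLO
    target strictly between 0 and 1, such as 0.999.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / total_events
    error_budget = 1 - slo_target
    error_rate = 1 - sli
    burn_rate = error_rate / error_budget
    return SloReport(sli, error_budget, error_rate, burn_rate)


if __name__ == "__main__":
    # 50 failed requests out of 10,000 against a 99% target.
    report = slo_report(good_events=9_950, total_events=10_000, slo_target=0.99)
    print(f"SLI={report.sli:.4f} budget={report.error_budget:.4f} "
          f"burn rate={report.burn_rate:.2f}")
```

A burn rate of 1.0 means the error budget is being spent exactly as fast as the SLO allows over the window; sustained values above 1.0 are what burn-rate alerts usually fire on.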
5. Alerting
- My Awesome Observability Repo ;-)
13. References
12. Books
- Building Secure and Reliable Systems
14. License
- CC0

Programming Languages

- Go: 3
- Python: 2
- TypeScript: 1

Categories

- 6. Incident Response and Post-Mortem: 18
- 13. References: 7
- 11. Tools: 5
- 14. License: 1
- 5. Alerting: 1
- 12. Books: 1
- 8. Chaos Engineering: 1

Keywords

slo (4), observability (3), sli (3), metrics (3), prometheus (3), sre (3), service-level-indicator (2), post-mortem (2), service-level-objective (2), sla (2), monitoring (2), golang (2), awesome (2), awesome-list (2), chaos-engineering (1), chaos (1), site-reliability-engineering (1), site-reliability (1), postmortem-templates (1), postmortem (1), incident-response (1), incident-reports (1), incident-reporting (1), devops (1), debugging (1), time-series (1), thanos (1), kubernetes (1), docker (1), sre-workbook (1), slo-exporter (1), grafana (1), exporter (1), alerting (1), slo-tracker (1), pingdom (1), opensource (1), newrelic (1), incidents (1), error-budgets (1), datadog (1), sre-team (1)