https://github.com/operate-first/operations

The sig-operations repository.

site-reliability-engineering sre

# Site Reliability Engineering (SRE) Support

This repository contains all the SRE (Site Reliability Engineering) principles and guidelines for managing the [Operate First](https://operate-first.github.io/) services.

## What is SRE?

SRE is a software engineering approach to managing operations for systems, applications, and services. We use software as a tool to manage systems, solve problems, and automate operations tasks.

## Get started

If you'd like to learn SRE practices and get hands-on experience with them, but aren't sure where or how to start, let us help!

1. Follow this link to find [beginner-friendly issues](https://github.com/operate-first/support/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
2. Tag yourself in the issue.
3. Join the [Slack](https://join.slack.com/t/operatefirst/shared_invite/zt-o2gn4wn8-O39g7sthTAuPCvaCNRnLww) and let us know that you're interested in helping: post in the `#support` channel with a short introduction of yourself and a link to the issue you'd like to complete.
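
The link in step 1 is a GitHub issue search (`is:open is:issue label:"good first issue"`) on the `operate-first/support` repository. If you prefer to list those issues from a script, the following is a minimal sketch using only the Python standard library. It assumes the public GitHub REST search endpoint (`https://api.github.com/search/issues`) and unauthenticated, rate-limited access; the function names `build_query`, `build_url`, and `fetch_issues` are hypothetical helpers for illustration and are not part of this repository.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public GitHub REST search endpoint (not mentioned in this README).
SEARCH_API = "https://api.github.com/search/issues"


def build_query(repo="operate-first/support", label="good first issue"):
    """Return the search string from the step 1 link, scoped to one repository."""
    return f'repo:{repo} is:open is:issue label:"{label}"'


def build_url(query, per_page=10):
    """Percent-encode the search string into a search API URL."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    return f"{SEARCH_API}?{params}"


def fetch_issues(url):
    """Fetch matching issues as (number, title, link) tuples. Needs network access."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return [
        (item["number"], item["title"], item["html_url"])
        for item in payload["items"]
    ]
```

Calling `fetch_issues(build_url(build_query()))` should return the same open issues that the link shows, as long as the label name on the repository is unchanged.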

To learn more, read the [incident management procedures](https://www.operate-first.cloud/operations/sre/incident-management/incident-management-procedure.md) and the [GitHub receiver setup](https://www.operate-first.cloud/operations/sre/incident-management/github-receiver-setup.md) guide, learn to [configure Prometheus alerts](https://www.operate-first.cloud/operations/sre/incident-management/configure-prometheus-alerts.md), or browse the [GitHub repo](https://github.com/operate-first/support).