{"id":49712236,"url":"https://github.com/operate-first/SRE","last_synced_at":"2026-05-25T05:01:30.578Z","repository":{"id":37944506,"uuid":"458212885","full_name":"operate-first/sre","owner":"operate-first","description":"SRE content","archived":false,"fork":false,"pushed_at":"2023-11-29T09:13:21.000Z","size":7761,"stargazers_count":56,"open_issues_count":2,"forks_count":39,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-02-06T12:18:21.710Z","etag":null,"topics":["devops","hacktoberfest"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/operate-first.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-11T14:11:22.000Z","updated_at":"2026-01-24T17:08:52.000Z","dependencies_parsed_at":"2023-01-27T17:16:19.035Z","dependency_job_id":"a9a0fbae-6d9a-43ac-bc78-c9b03c98a273","html_url":"https://github.com/operate-first/sre","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/operate-first/sre","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/operate-first%2Fsre","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/operate-first%2Fsre/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/operate-first%2Fsre/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/operate-first%2Fsre/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/operate-first","download_url":"htt
ps://codeload.github.com/operate-first/sre/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/operate-first%2Fsre/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33461090,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T02:24:28.008Z","status":"ssl_error","status_checked_at":"2026-05-25T02:23:23.339Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["devops","hacktoberfest"],"created_at":"2026-05-08T16:00:25.196Z","updated_at":"2026-05-25T05:01:30.563Z","avatar_url":"https://github.com/operate-first.png","language":null,"funding_links":[],"categories":["15. 
Examples and Sandboxes"],"sub_categories":[],"readme":"# SIG-SRE — the Operate First Site Reliability Engineering Special Interest Group\n\n## Introduction\n\nWhen starting the Operate First project, we looked around and said, \"There is no Open group of SRE practitioners.\nNowhere to discuss and document all the Why, What, and How of our discipline.\nWe should fix that.\"\n\nSo we are fixing that, and you are invited to participate in creating SIG-SRE, a [community of practice](https://en.wikipedia.org/wiki/Community_of_practice) around Site Reliability Engineering (SRE).\n\nThis is SIG SRE, the Special Interest Group (SIG) for Site Reliability Engineering (SRE) in the Operate First project.\n\nWe focus on a core part of the Operate First goal to improve managed services by fully Opening all aspects of the cloud environment.\nIn particular, we focus on observability, managing fleets, incident response, operability, and similar core disciplines.\n\nThis new _Open Cloud_ environment is going to need Open SRE practices to go with it, and this is where SIG SRE comes in.\n\nRead the [SIG charter](charter.md) for more details about our purpose and approach.\n\nIf you are interested in keeping track of our progress, you can [subscribe to our announcements group](https://lists.operate-first.cloud/archives/list/sig-sre-announce@lists.operate-first.cloud/).\n\nTo participate in bootstrapping this community of practice, [join our discussion group](https://lists.operate-first.cloud/archives/list/sig-sre@lists.operate-first.cloud/).\n\n## What kind of SRE practices belong here?\n\nIt's easy to say, \"All of the good ones,\" :) and while that is true, we are realistic about where we are today.\n\nAs a starting point, we are focusing on practices that relate to a Kubernetes-based environment, and specifically an OpenShift environment.\n\nWe believe reliability engineering is agnostic of specific technologies.\nWe welcome SRE practitioners of all kinds who work with all types of 
technologies.\n\nBUT there are a few points to understand:\n\n1. Artifacts created in this community are [Open Works](https://fossrit.github.io/open-work-definition/) -- licensed with an Open Source license, etc.\n2. We default to Open, including for the kind of software and cloud environments we focus on.\n\nStrictly speaking, we will welcome contributions that guide SRE practices for non-Open Source-based cloud environments.\nBut when it comes to the central focus of this community, it will always be Open Source-based environments.\n\n## Table of Contents\n\n### Getting Started\n\n* [The Reliability Nightmares Colouring Book](https://github.com/operate-first/sre/raw/main/sre-coloring-book/red-hat-sre-coloring-book.pdf)\n\n    A readable, accessible guide to the various ways in which SRE principles and managed services can solve many of the problems associated with running complex IT services (and restaurants).\n\n* [The SLO Bootstrap Guide](./slo_bootstrap_guide.md)\n\n    A self-contained bootstrapping guide for teams looking to use the SRE approach to supporting services.\n\n* [Incident Management Process Guide](./process/incident_management.md)\n\n    A useful starting point for developing your own incident management processes and procedures.\n\n### Metrics and Monitoring\n\n* [Picking good Service Level Indicators (SLIs)](./picking_good_slis.md)\n\n    Service Level Indicators are one of the most important sets of metrics in SRE, and defining them correctly is key to running an effective SRE service.\n\n* [Picking good Service Level Objectives (SLOs)](./picking_good_slos.md)\n\n    Where SLIs are used to answer the question \"Is everything working as it should?\", Service Level Objectives (SLOs) define the expectations for how much and how often a service should perform correctly according to its SLIs.\n    They are probably the single most important set of statistics used for evaluating the performance of a managed service, as they show 
maintainers and customers alike just how well things are working.\n\n* [Prometheus Alerting Consistency](./prometheus_alerting_consistency.md)\n\n    Alerts are only useful if they're clear, understandable and actionable. This guide presents some suggested best practices for designing alerts that are consistent and useful.\n\n### Decision-Making\n\n* Some sample SIG-SRE [Architecture Decision Records](https://github.com/operate-first/sre/tree/main/ADRs/RH/SIG-SRE)\n\n    When major decisions are made on design or operational aspects of a system, it can be very worthwhile to keep a clear record of those\n    decisions for future reference.\n    An Architecture Decision Record (ADR) acts as a document of the discussion and reasoning behind the decision it relates to, and in years to come can be extremely valuable for answering the eternal question - \"Why do we do things this way?\".\n    This directory contains some sample ADRs from the SIG-SRE group.\n\n### Other Documents\n\n* A work in progress - the [SRE Maturity Model](./sre_maturity.md)\n\n    Moving to an SRE model generally happens in a series of steps rather than as a single big change.\n    The SRE Maturity Model is intended to describe a set of milestones along the route from \"no SRE at all\" to \"a fully functioning SRE organisation\".\n\n## Contributing organizations\n\n### Red Hat\nThis repository is used by the SRE teams at Red Hat for collecting documentation on the nitty-gritty aspects of building and operating an effective SRE organisation:\n\nHello! 
We're a cross-team group of people inside Red Hat with an interest in promoting and developing strong SRE culture.\nAs a result, you can expect some of the material we bring here to be a little slanted toward the way we do things inside Red Hat.\nBut we're doing our best to stick to what are generally accepted to be good SRE practices, so you should be able to make use of it with — at most — minor modifications.\n\n### Operate First contributing orgs\n\nThese organizations have a contributing stake in the Operate First project, which uses this repository as a general practice upstream.\n- [MOC Alliance](https://massopen.cloud)\n- [Red Hat Collaboratory](https://www.bu.edu/rhcollab/) at Boston University\n\n## Contributing\n\n_For complete information on contributing to this SIG, refer to the canonical [CONTRIBUTING.md](./CONTRIBUTING.md) file._\n\nFeedback is always welcome, and contributions are too!\nFeel free to file an [issue](https://github.com/operate-first/sre/issues/new) or send us a pull request (PR). If you'd like to know more about what we do or find out about ways you can get involved, then you can read about [becoming part of the Operate First community](https://www.operate-first.cloud/our-community).\n\nIf you're submitting Markdown, please check for any linting problems.\nThe [vscode-markdownlint](https://github.com/DavidAnson/vscode-markdownlint) plugin can be used to do this in VS Code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foperate-first%2FSRE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foperate-first%2FSRE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foperate-first%2FSRE/lists"}