SRE
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
- GitHub: https://github.com/topics/sre
- Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering
- Aliases: site-reliability-engineering
- Last updated: 2026-03-24 00:29:21 UTC
https://github.com/tedilabs/terraform-aws-container
🌳 A sustainable Terraform Package which creates resources for Container Services on AWS
aws aws-ecr aws-eks devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules type-module
Last synced: 23 Feb 2026
https://github.com/getyourguide/istio-config-validator
go121 istio sre validation virtualservice
Last synced: 11 Apr 2025
https://github.com/nobl9/sloctl
A command line tool to cast SLO spells 🪄
cli go golang nobl9 reliability slo sre
Last synced: 27 Feb 2026
https://github.com/ory/jobs
Want to build the next generation identity stack? You've come to the right place!
go hiring jobs kubernetes open-source opensource ory react sre
Last synced: 17 Mar 2025
https://github.com/icco/postmortems
Postmortem metadata from danluu/post-mortems.
hacktoberfest postmortem-metadata sre
Last synced: 21 Mar 2025
https://github.com/sitectl/cuttle
Blue Box SRE Operations Platform
ansible bastion bluebox elk operations sensu sre
Last synced: 11 Apr 2025
https://github.com/ramizpolic/sre-playground
A set of Site Reliability Engineering notes & challenges
cicd cloud guide infrastructure site-reliability-engineer sre tasks
Last synced: 14 Apr 2025
https://github.com/seveas/herd
Massively parallel ssh client
cli orchestration sre ssh sysadmin system-administration
Last synced: 25 Jun 2025
https://github.com/fkie-cad/logprep
Log data preprocessing, generation, and shipping in Python
etl kafka log logdata loggenerator logshipper opensearch preprocessing python soar sre
Last synced: 02 Mar 2026
https://github.com/last9/openmetrics-registry
Do more with your metrics
exporter hcl modules open-metrics openmetrics prometheus registry sre
Last synced: 22 Feb 2025
https://github.com/enola-dev/enola
Enola 🕵🏾‍♀️ Holmes was an SRE.
graph graphviz mermaid modeling rdf semantic-web sre visualization
Last synced: 16 Jun 2025
https://github.com/bobek/masscan_as_a_service
masscan as a service
audit bare-metal cloud containers git-scraping masscan phabricator security security-scanner security-tools sre
Last synced: 25 Jan 2026
https://github.com/keycloak/keycloak-sre-sig
Keycloak's Site Reliability Engineers Special Interest Group (Keycloak SRE SIG): To improve the lives of people running and operating Keycloak
Last synced: 12 Apr 2025
https://github.com/lwindolf/multi-status
Aggregator PWA for status pages of online services. Know which of your 3rd party SaaS/PaaS are having issues right now.
cloud devops monitoring paas pwa saas sre
Last synced: 11 Apr 2025
https://github.com/dxcfg/dxcfg
Configuration as code for the masses
configuration deno denoland deployment-automation devops iaac jkcfg kubernetes kubernetes-deployment sre
Last synced: 24 Jan 2026
https://github.com/tedilabs/terraform-aws-network
🌳 A sustainable Terraform Package which creates VPC resources (VPC, Subnet, NACL, NAT Gateway, Route Table) on AWS
aws aws-vpc devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 15 Apr 2025
https://github.com/immobiliare/collectd-haproxy-plugin
Collectd plugin to pull metrics from HAProxy instances
collectd collectd-plugin grafana haproxy metrics monitoring sre
Last synced: 01 Apr 2025
https://github.com/stacksimplify/terraform-on-google-kubernetes-engine
GCP GKE: Terraform on Google Kubernetes Engine with DevOps, SRE, and 40 real-world demos
devops gateway-api gcp gke gke-cluster gke-terraform google google-cloud google-cloud-platform google-kubernetes google-kubernetes-engine iac kubernetes-cluster kubernetes-deployment kubernetes-service kubernetes-service-account sre terraform
Last synced: 21 Apr 2025
https://github.com/dynatrace-oss/customersuccess
Open source solutions that help you level up your observability game with Dynatrace.
adoption ai automation dashboards dynatrace intelligence notebooks observability obsolescence software sre value workflows
Last synced: 07 Jan 2026
https://github.com/grafana/xk6-chaos
xk6 extension for running chaos experiments with k6 💣
chaos chaos-engineering k6-extension reliability sre testing xk6
Last synced: 01 Oct 2025
https://github.com/tedilabs/terraform-aws-load-balancer
🌳 A sustainable Terraform Package which creates resources for Load Balancers on AWS
aws aws-alb aws-clb aws-elb aws-load-balancer aws-nlb devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 14 Mar 2026
https://github.com/kjkuan/Runbook.md
Write Bash executable runbooks in Markdown.
bash devops devops-tools literate-programming markdown operations ops playbook runbook sre sre-automation task-runner
Last synced: 01 May 2025
https://github.com/operate-first/operations
The sig-operations repository.
site-reliability-engineering sre
Last synced: 16 Jan 2026
https://github.com/ghostinthewires/Team-Handbook-Template
An employee / team handbook template
devops devops-team devops-tools devopsteam handbook process sre team-devops template
Last synced: 10 May 2025
https://github.com/k8sgpt-ai/community
Community Management for K8sGPT
devops kubernetes openai sre tooling
Last synced: 15 Apr 2025
https://github.com/nathanielvarona/pritunl-client-github-action
Establish automated secure Pritunl VPN connections with Pritunl Client in GitHub Actions, supporting OpenVPN and WireGuard.
cicd devops github-actions hacktoberfest openvpn pritunl pritunl-vpn sre vpn-client vpn-server wireguard
Last synced: 10 Mar 2026
https://github.com/ocheops/paths-in-tech
For people finding it hard in tech
career-development career-guide career-path careercoach ceh computer-science devops product sre ui ui-design ux-design
Last synced: 18 Mar 2026
https://github.com/be-next/awesome-performance-engineering
A curated, opinionated collection of tools and resources dedicated to Performance Engineering, covering both Observability and Performance Testing.
awesome awesome-list devops load-testing monitoring observability performance performance-engineering performance-testing sre
Last synced: 08 Mar 2026
https://github.com/googlecloudplatform/reliable-app-platforms
An MVP of a platform for delivering reliable applications on Google Cloud
gke google-cloud kubernetes reliability slos sre terraform
Last synced: 20 Oct 2025
https://github.com/build5nines/terraform-quickstart-templates
Terraform Quickstart Templates
azure cloud devops hashicorp hcl iac microsoft microsoft-azure microsoftazure sre terraform terraform-modules terraform-templates
Last synced: 11 Apr 2025
https://github.com/microsoft/tdslib
Open implementation of the TDS protocol (version 7.4) in managed C# code.
Last synced: 17 Aug 2025
https://github.com/tedilabs/terraform-aws-security
🌳 A sustainable Terraform Package which creates Security resources on AWS
aws aws-access-analyzer aws-config devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 25 Feb 2026
https://github.com/devopsext/sre
Golang SRE framework for logs, metrics, traces, and events. It supports Jaeger, Prometheus, DataDog, OpenTelemetry, New Relic, and Grafana.
events logs metrics observability sre traces
Last synced: 12 Jan 2026
https://github.com/tedilabs/terraform-aws-domain
🌳 A sustainable Terraform Package which creates resources for Domain Services on AWS
aws aws-route53 devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 15 Apr 2025
https://github.com/wpjunior/multi-burn-rate-calculator
Calculator to view detection time using error budget consumption rates, based on lessons from the Site Reliability Engineering Workbook
Last synced: 17 Mar 2026
https://github.com/diogopms/monit-docker
Monit is a free, open source utility for managing and monitoring processes, programs, files, directories, and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
devops docker kubernetes monit monitoring sre status
Last synced: 25 Oct 2025
https://github.com/aptible/unpage
Unpage is the open source framework for building SRE agents with infrastructure context and secure access to any dev tool.
agent agentic-workflow agents ai-agent ai-sre aiops automation devops dspy incident-response incident-response-tooling mcp monitoring observability site-reliability-engineering sre sre-agent
Last synced: 08 Sep 2025
https://github.com/rootlyhq/terraform-provider-rootly
Terraform provider for Rootly - manage incident management, on-call schedules, workflows, and alerts as code
devops go golang hashicorp iac incident-management incident-response infrastructure-as-code on-call rootly site-reliability-engineering sre terraform terraform-provider
Last synced: 11 Mar 2026
https://github.com/qainsights/performance-engineers-clubhouse
Join Performance Engineers Clubhouse 🏡
clubhouse devops performance performance-monitoring performance-testing sre testing
Last synced: 08 Jan 2026
https://github.com/angelopoerio/oom-notifier
Notifies about OOM-killed processes, reporting the full command line
devops kubernetes linux observability rust site-reliability-engineering sre
Last synced: 17 Jan 2026
https://github.com/dpogorzelski/speedrun
Control your compute fleet at scale
automation cli cloud command-execution devops gcp go google-cloud sre sysadmin
Last synced: 12 Mar 2026
https://github.com/Ramsbaby/openclaw-self-healing
AI-powered self-healing system for OpenClaw Gateway • 4-tier autonomous recovery • macOS & Linux
ai-agent artificial-intelligence automation bash claude-ai claude-code crash-recovery devops homelab launchd macos monitoring observability openclaw reliability self-healing sre watchdog
Last synced: 19 Feb 2026
https://github.com/luan78zaoha/kaldi-timit-sre-ivector
Develop speaker recognition model based on i-vector using TIMIT database
chinese i-vector kaldi speaker-recognition speaker-verification sre
Last synced: 11 Mar 2025
https://github.com/tedilabs/terraform-aws-data
🌳 A sustainable Terraform Package which creates resources for Data Services on AWS
aws aws-athena devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 03 Oct 2025
https://github.com/runvoy/runvoy
Serverless command runner
admin-tool cli cloudcomputing containers devops fargate golang serverless sre terraform
Last synced: 08 Mar 2026
https://github.com/tedilabs/k8s-repository
♻️ Repository for Reusable Kubernetes App Manifests with Kustomize
devops gitops hacktoberfest k8s kubernetes kustomize lang-yaml sre tedilabs
Last synced: 19 Oct 2025
https://github.com/last9/last9-integrations
Sample applications of supported integrations by Last9 Products
integrations last9 reliability-engineering sre timeseries-database
Last synced: 28 Apr 2025
https://github.com/dingus-technology/dingus
Identify and squash bugs in your code with Dingus!
ai bugs deployment devops docker grafana infrastructure k8s kubernetes llm logging loki metrics monitoring openai prometheus python sre
Last synced: 24 Nov 2025
https://github.com/rsionnach/nthlayer
Generate the complete reliability stack from a service spec in 5 minutes. Dashboards, alerts, SLOs, PagerDuty - zero toil.
alerts devops grafana monitoring observability pagerduty prometheus python slo sre
Last synced: 18 Jan 2026
https://github.com/dkorunic/axfr2hosts
Fetches one or more DNS zones via AXFR and dumps them in Unix hosts format for local use
bind bind9 bind9-dns dns dns-server domain linux networking security sre sysops unix zone
Last synced: 12 Apr 2025
https://github.com/tedilabs/terraform-aws-db
🌳 A sustainable Terraform Package which creates resources for Databases on AWS
aws aws-db aws-elasticache aws-rds devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 15 Apr 2025
https://github.com/bjarneo/gecho
Gecho - an HTTP request echo debugging service
debugging devops echo golang http http-server request sre
Last synced: 25 Apr 2025
https://github.com/todd-dsm/mac-ops
QnD Automation to build a MacBook Pro for DevOps
customizable devops devops-tools macbook-configuration macbook-setup macos sre
Last synced: 13 Apr 2025
https://github.com/tedilabs/terraform-aws-observability
🌳 A sustainable Terraform Package which creates resources for Observability Services on AWS
aws aws-cloudawtch-logs aws-cloudwatch aws-logs devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules type-module
Last synced: 02 Mar 2026
https://github.com/chatwoot/faultline
An open-source AI agent for infrastructure debugging.
Last synced: 24 Feb 2026
https://github.com/gopatchy/bkl
Layered Configuration Language
configuration deployment devops json k8s kubernetes sre toml yaml
Last synced: 17 Jan 2026
https://github.com/avivl/cloud-sre-agent
An autonomous SRE agent that monitors cloud logs across multiple platforms, leveraging AI models from various providers to detect anomalies, perform root cause analysis, and automate remediation by creating GitHub Pull Requests.
ai-agents ai-ops automation aws cloud devops gcp gemini-ai google-cloud incident-response llm log-analysis log-monitoring platform-engineering python resilience sre vertex-ai
Last synced: 09 Mar 2026
https://github.com/xe-nvdk/terraform-recipes
This is the repo where I save #Terraform recipes, mostly posted on cduser.com
devops iaac infrastructure-as-code sre terraform
Last synced: 11 Apr 2025
https://github.com/woodprogrammer/postgresql-connection-manager
A project to manage PostgreSQL connections via cgroup v2
cgroups devops pg postgresql sre
Last synced: 28 Apr 2025
https://github.com/christiangalsterer/mongodb-driver-prometheus-exporter
A prometheus exporter exposing metrics for the official MongoDB Node.js driver.
grafana grafana-dashboard metrics mongodb monitoring node-js nodejs prometheus prometheus-exporter sre typescript
Last synced: 15 Mar 2026
https://github.com/input-output-hk/devshell-capsules
Space Capsules for the Modern DevShell
Last synced: 13 Oct 2025
https://github.com/fluxninja/aperture-go
SDK to interact with Aperture Agent
concurrency-limiter flow-control rate-limiter sdk sre
Last synced: 14 Oct 2025
https://github.com/tedilabs/terraform-aws-secret
🌳 A sustainable Terraform Package which creates Secret resources on AWS
aws aws-kms aws-parameter-store aws-secrets-manager aws-ssm-parameter-store devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 03 Aug 2025
https://github.com/apiaryio/ivy
A Node.js queue library focused on easy, yet flexible task execution.
Last synced: 30 Jul 2025
https://github.com/last9/last9-cdk
Last9 CDK
observability prometheus prometheus-metrics python sre
Last synced: 28 Apr 2025
https://github.com/tedilabs/terraform-aws-firewall
🌳 A sustainable Terraform Package which creates resources for Firewall Services on AWS
aws aws-firewall aws-waf devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 21 Jan 2026
https://github.com/madetech/productionisation
The Made Tech Productionisation Checklist for Software Projects
Last synced: 12 Apr 2025
https://github.com/tedilabs/terraform-aws-vpc-connectivity
🌳 A sustainable Terraform Package which creates VPC Connectivity resources (Private Link, Client VPN, Site-to-Site VPN, DX, VPC Lattice) on AWS
aws aws-client-vpn aws-direct-connect aws-dx aws-site-to-site-vpn aws-vpc aws-vpc-lattice aws-vpc-private-link aws-vpn devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 24 Oct 2025
https://github.com/antolius/deployments-and-disasters
A tabletop RPG for practicing incident management.
Last synced: 05 May 2025
https://github.com/tedilabs/.github
📣 Default community health files for @tedilabs organization on GitHub
devops github hacktoberfest sre tedilabs
Last synced: 15 Apr 2025
https://github.com/tedilabs/github
♥️ The best way to manage GitHub organization in @tedilabs
devops github github-organization github-repository github-team hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-github terraform-module terraform-modules
Last synced: 15 Apr 2025
https://github.com/apiaryio/docker-base-images
Base docker images for Apiary applications
Last synced: 26 Jun 2025
https://github.com/tedilabs/terraform-aws-misc
🌳 A sustainable Terraform Package which creates MISC resources on AWS
aws devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 28 Oct 2025
https://github.com/misterzurg/tbank-sre
🏦 TBank backend academy SRE course.
helm kubernetes kubernetes-operator minikube oncall oncall-prober oncall-sla sre t-bank
Last synced: 14 Oct 2025
https://github.com/tedilabs/terraform-aws-cloudfront
🌳 A sustainable Terraform Package which creates CloudFront resources on AWS
aws aws-cloudfront aws-network devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 15 Apr 2025
https://github.com/hom3chuk/psqlrc-helpers
A pack of psql helper commands to maintain a PostgreSQL
cheatsheet dba devops devops-tools maintenance performance performance-analysis postgres postgresql psql psql-client psqlrc sre
Last synced: 27 Oct 2025
https://github.com/linhng98/mess-around
Playground to demonstrate many awesome DevOps tools, enforce the GitOps pattern, and build a scalable and sustainable application cluster
devops homelab kubernetes mess-around sre
Last synced: 17 Jan 2026
https://github.com/diptochakrabarty/learn_devops_with_projects
Learn DevOps through practical projects. Covers tech stacks including k8s, Ansible, Docker, Python, and more
ansible devops golang hacktoberfest kubernetes python sre
Last synced: 13 Jun 2025
https://github.com/tedilabs/terraform-aws-ml
🌳 A sustainable Terraform Package which creates Machine Learning resources on AWS
aws devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 16 Feb 2026
https://github.com/shantoroy/site-reliability-engineering-101
This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.
100daysofcode alerting automation chaos-engineering devops devsecops monitoring reliability-engineering service-level-agreement service-level-indicator service-level-objective site-reliability-engineering sre
Last synced: 28 Dec 2025
https://github.com/powerhome/keess
Keep secrets and configmaps synchronized across clusters and namespaces
Last synced: 04 Mar 2026
https://github.com/ajinkyakadam/systemhealthai
An AI SRE for triaging system health
agents ai aiops devops devops-tools llm llmops mlops observability sre
Last synced: 03 Nov 2025
https://github.com/hamidgholami/k8s-lab
Kubernetes Laboratory
cncf devops devops-tools k3s k3s-architecture k3s-cluster k3s-minicluster k8s k8s-cluster k8s-learn kubernetes kubernetes-cluster kubernetes-labs kubernetes-learning sre
Last synced: 01 May 2025
https://github.com/lawouach/ebpf-2021-talk
Code for my talk at the eBPF 2021 conference
devops ebpf reliability reliably sre
Last synced: 12 Apr 2025
https://github.com/certwatch-app/cw-agent
SSL/TLS certificate monitoring agent for Kubernetes and on-prem infrastructure. Scan certificates, detect expiration, validate chains, and sync to CertWatch cloud.
certificate cli cloud-native devops golang kubernetes monitoring security sre ssl tls
Last synced: 13 Jan 2026
https://github.com/christiangalsterer/node-postgres-prometheus-exporter
A prometheus exporter for node-postgres
grafana grafana-dashboards metrics monitoring node-js node-postgres nodejs pg postgres postgresql prometheus prometheus-exporter sre
Last synced: 10 Apr 2025
https://github.com/guilt/chaossquirrel
Like Netflix's Chaos Monkey, packaged to run standalone.
chaos-monkey reliability-engineering sre
Last synced: 12 Aug 2025
https://github.com/christiangalsterer/kafkajs-prometheus-exporter
A prometheus exporter exposing metrics for KafkaJS
grafana grafana-dashboard kafka kafkajs metrics monitoring node-js nodejs prometheus prometheus-exporter sre
Last synced: 28 Jul 2025
https://github.com/skyzyx/engineering-for-site-reliability
Overall map of topics to cover for my “Engineering for Site Reliability” blog series.
ci-cd cicd devops docker security site-reliability site-reliability-engineering sre terraform
Last synced: 25 Mar 2025