SRE
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
- GitHub: https://github.com/topics/sre
- Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering
- Aliases: site-reliability-engineering
- Last updated: 2026-06-25 00:25:51 UTC
https://github.com/expeor/aws-automation
AWS operations automation CLI - multi-account/multi-region support, Excel report generation
automation aws cli compliance cost-optimization devops inventory multi-account multi-region ops python security-audit sre
Last synced: 14 Jan 2026
https://github.com/usekarma/adage-fabric
A universal, governance-first pattern for unifying event streams with Kafka, ClickHouse, and Grafana.
adage clickhouse event-streaming fabric governance grafana kafka observability sre
Last synced: 03 May 2026
https://github.com/rvhoney/sre-learning-notes
SRE & DevOps roadmap with Obsidian-friendly notes and projects.
automation cloud devops infrastructure learning monitoring networking obsidian projects roadmap security sre study-notes version-control
Last synced: 05 Oct 2025
https://github.com/meiiie/structured-root-cause-research-skill
Portable SKILL.md for evidence-based root-cause analysis across Claude Code, Codex, and AI coding agents
5-whys agent-skills ai-agents claude-code codex debugging postmortem root-cause-analysis security-review skill-md software-engineering sre
Last synced: 09 Jun 2026
https://github.com/viniciushammett/Golang-DevOps-SRE-Aplicado
Project developed to practice DevOps/SRE through applied, day-to-day study using the Golang language
Last synced: 15 Aug 2025
https://github.com/loneexpert/stashfin_learning
api-testing automation automation-testing java restassured sre
Last synced: 07 Oct 2025
https://github.com/moneycat-inc/otel-ops-pack
Hardened OpenTelemetry Collector ops pack for Windows: day-2 tooling, deterministic canary, safe change control, chaos drills, audit evidence.
observability opentelemetry ops signoz sre windows
Last synced: 18 May 2026
https://github.com/cloudputation/iterator
Automate infrastructure management with observability
automation devops golang infrastructure-as-code sre
Last synced: 27 Jan 2026
https://github.com/felippemozer/go-devops-sre-udemy
Solving DevOps and SRE use-case problems with the Go programming language - Udemy course "Programação Go para DevOps e SREs"
devops golang sre udemy-course
Last synced: 14 Jan 2026
https://github.com/aronmilenait/homelab
A personal homelab for experimenting with DevOps, SRE, and Linux tools on physical hardware.
Last synced: 10 Aug 2025
https://github.com/roybidani/sre-lab-infra
🚀 Complete SRE Training Environment - Production-grade infrastructure with Kubernetes, Prometheus, Grafana, and advanced SRE practices for hands-on learning
aws chaos-engineering devops grafana kubernetes monitoring prometheus sre terraform training
Last synced: 09 Apr 2026
https://github.com/oragazz0/viy
CLI-first Kubernetes chaos engineering toolkit in Go — modular, safe, observable. "Omniscient chaos, unveiled."
chaos-engineering chaos-toolkit cli cloud-native devops fault-injection go golang k8s kubernetes observability resilience-testing sre
Last synced: 18 Apr 2026
https://github.com/ranching-farm/kubectl-addon
Kubectl addon for connecting Kubernetes clusters to ranching.farm - an AI-powered Kubernetes management platform. Simplify cluster operations and get intelligent assistance for common tasks.
ai-assistant ai-assisted cluster-management devops helm k8s krew krew-plugin kubectl kubectl-commands kubectl-plugin kubectl-plugins kubernetes kustomize ranching-farm sre
Last synced: 19 Jan 2026
https://github.com/jemo19/runbook-copilot
RAG-based incident assistant for runbook-backed troubleshooting plans with citations and safety controls.
ai fastapi incident-response python rag runbooks sre
Last synced: 16 Jun 2026
https://github.com/aswinbennyofficial/sre-exercises
Building a resilient backend project from scratch with Dockerization and CI/CD, incorporating Site Reliability Engineering (SRE) principles. Demonstrating best practices in modular architecture, logging, database migration, and CI/CD pipelines for automated testing and deployment.
api backend docker go go-chi golang pgx postgres rest-api sre student-management-system vyper-config yaml-configuration zerolog
Last synced: 04 May 2026
https://github.com/leehmdev/gke-gitops-observability-lab
End-to-end GKE GitOps & Observability lab using Terraform, Helm, Argo CD, Prometheus, and Grafana
argocd devops gitops gke grafana helm kubernetes prometheus sre terraform
Last synced: 05 May 2026
https://github.com/rafaellimatecnologia-cloud/local-first-ai-service
Local-first AI service with deterministic routing, deadline enforcement, and graceful degradation
deterministic edge-ai latency local-first observability python reliability sre
Last synced: 13 Jan 2026
https://github.com/felipe-veas/felipe-veas
SRE-DevOps Engineer specializing in Kubernetes, Terraform, cloud infrastructure, and observability platforms
aws cloud-engineering devops gcp gitops kubernetes observability sre terraform
Last synced: 10 Mar 2026
https://github.com/mariano-tp/github-observability-demo
Prometheus + Grafana + exporter for GitHub metrics. CI with GitHub Actions.
devops docker-compose github-actions grafana observability portfolio prometheus sre
Last synced: 24 Sep 2025
https://github.com/suresh-1001/linux-auto-debug
bash devops linux sre troubleshooting
Last synced: 05 May 2026
https://github.com/nudgebee/node-agent
Per-node observability agent for Kubernetes and Linux hosts. Gathers container and host metrics, logs, and L7 traffic via eBPF; exports to Prometheus and OpenTelemetry. Includes LLM API observability.
ebpf golang kubernetes llm-observability monitoring node-agent observability opentelemetry prometheus sre
Last synced: 11 Jun 2026
https://github.com/sanjaysv18/att-website-monitoring-
🚀 Full-stack monitoring with Prometheus & Grafana | Docker-based infrastructure monitoring | SRE practices demonstration
devops docker docker-compose grafana monitoring observability prometheus sre
Last synced: 06 May 2026