An open API service indexing awesome lists of open source software.

SRE

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

https://github.com/expeor/aws-automation

AWS operations automation CLI - multi-account/multi-region support, Excel report generation

automation aws cli compliance cost-optimization devops inventory multi-account multi-region ops python security-audit sre

Last synced: 14 Jan 2026

https://github.com/usekarma/adage-fabric

A universal, governance-first pattern for unifying event streams with Kafka, ClickHouse, and Grafana.

adage clickhouse event-streaming fabric governance grafana kafka observability sre

Last synced: 03 May 2026

https://github.com/meiiie/structured-root-cause-research-skill

Portable SKILL.md for evidence-based root-cause analysis across Claude Code, Codex, and AI coding agents

5-whys agent-skills ai-agents claude-code codex debugging postmortem root-cause-analysis security-review skill-md software-engineering sre

Last synced: 09 Jun 2026

https://github.com/viniciushammett/Golang-DevOps-SRE-Aplicado

Project developed to practice DevOps/SRE through applied, day-to-day study using the Golang language

development devops golang sre

Last synced: 15 Aug 2025

https://github.com/excoriate/lazy-aliases

Golang CLI Boilerplate is a bulletproof Golang CLI template with batteries included 🔋

cli devops ecs example sre tooling

Last synced: 30 Mar 2025

https://github.com/moneycat-inc/otel-ops-pack

Hardened OpenTelemetry Collector ops pack for Windows: day-2 tooling, deterministic canary, safe change control, chaos drills, audit evidence.

observability opentelemetry ops signoz sre windows

Last synced: 18 May 2026

https://github.com/cloudputation/iterator

Automate infrastructure management with observability

automation devops golang infrastructure-as-code sre

Last synced: 27 Jan 2026

https://github.com/felippemozer/go-devops-sre-udemy

Solving DevOps and SRE use-case problems with the Go programming language - Udemy course "Programação Go para DevOps e SREs"

devops golang sre udemy-course

Last synced: 14 Jan 2026

https://github.com/aronmilenait/homelab

A personal homelab for experimenting with DevOps, SRE, and Linux tools on physical hardware.

devops homelab linux sre

Last synced: 10 Aug 2025

https://github.com/misterzurg/tbank-fsa

🐝 ОСА: a course on the fundamentals of system administration

devops docker fsa linux sre tbank

Last synced: 04 May 2026

https://github.com/roybidani/sre-lab-infra

🚀 Complete SRE Training Environment - Production-grade infrastructure with Kubernetes, Prometheus, Grafana, and advanced SRE practices for hands-on learning

aws chaos-engineering devops grafana kubernetes monitoring prometheus sre terraform training

Last synced: 09 Apr 2026

https://github.com/oragazz0/viy

CLI-first Kubernetes chaos engineering toolkit in Go — modular, safe, observable. "Omniscient chaos, unveiled."

chaos-engineering chaos-toolkit cli cloud-native devops fault-injection go golang k8s kubernetes observability resilience-testing sre

Last synced: 18 Apr 2026

https://github.com/ranching-farm/kubectl-addon

Kubectl addon for connecting Kubernetes clusters to ranching.farm - an AI-powered Kubernetes management platform. Simplify cluster operations and get intelligent assistance for common tasks.

ai-assistant ai-assisted cluster-management devops helm k8s krew krew-plugin kubectl kubectl-commands kubectl-plugin kubectl-plugins kubernetes kustomize ranching-farm sre

Last synced: 19 Jan 2026

https://github.com/jemo19/runbook-copilot

RAG-based incident assistant for runbook-backed troubleshooting plans with citations and safety controls.

ai fastapi incident-response python rag runbooks sre

Last synced: 16 Jun 2026

https://github.com/aswinbennyofficial/sre-exercises

Building a resilient backend project from scratch with Dockerization and CI/CD, incorporating Site Reliability Engineering (SRE) principles. Demonstrates best practices in modular architecture, logging, database migration, and CI/CD pipelines for automated testing and deployment.

api backend docker go go-chi golang pgx postgres rest-api sre student-management-system vyper-config yaml-configuration zerolog

Last synced: 04 May 2026

https://github.com/leehmdev/gke-gitops-observability-lab

End-to-end GKE GitOps & Observability lab using Terraform, Helm, Argo CD, Prometheus, and Grafana

argocd devops gitops gke grafana helm kubernetes prometheus sre terraform

Last synced: 05 May 2026

https://github.com/rafaellimatecnologia-cloud/local-first-ai-service

Local-first AI service with deterministic routing, deadline enforcement, and graceful degradation

deterministic edge-ai latency local-first observability python reliability sre

Last synced: 13 Jan 2026

https://github.com/felipe-veas/felipe-veas

SRE-DevOps Engineer specializing in Kubernetes, Terraform, cloud infrastructure, and observability platforms

aws cloud-engineering devops gcp gitops kubernetes observability sre terraform

Last synced: 10 Mar 2026

https://github.com/mariano-tp/github-observability-demo

Prometheus + Grafana + exporter for GitHub metrics. CI with GitHub Actions.

devops docker-compose github-actions grafana observability portfolio prometheus sre

Last synced: 24 Sep 2025

https://github.com/nudgebee/node-agent

Per-node observability agent for Kubernetes and Linux hosts. Gathers container and host metrics, logs, and L7 traffic via eBPF; exports to Prometheus and OpenTelemetry. Includes LLM API observability.

ebpf golang kubernetes llm-observability monitoring node-agent observability opentelemetry prometheus sre

Last synced: 11 Jun 2026

https://github.com/sanjaysv18/att-website-monitoring-

🚀 Full-stack monitoring with Prometheus & Grafana | Docker-based infrastructure monitoring | SRE practices demonstration

devops docker docker-compose grafana monitoring observability prometheus sre

Last synced: 06 May 2026