An open API service indexing awesome lists of open source software.

SRE

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

https://github.com/letusdevops/learngo

A 30-day Golang roadmap for DevOps, with exercises.

devops golang roadmap sre

Last synced: 13 Jun 2025

https://github.com/amaurybsouza/terraform-aws-ec2-ssh

Amazon Elastic Compute Cloud (EC2) is a web service that provides resizable compute capacity in the cloud. It is one of the core services offered by Amazon Web Services (AWS) and provides a wide range of features and capabilities.

aws aws-ec2 devops devops-tools github github-actions infrastructure-as-code infrstructure sre terraform terraform-aws terraform-managed terraform-modules terraform-provider

Last synced: 21 Jan 2026

https://github.com/imgautamm/srerepo

SRE Assessment Repo

dataengineering docker postgres python sre

Last synced: 30 Apr 2025

https://github.com/tedilabs/terraform-http-modules

🌳 A sustainable Terraform package that manages useful data modules via the HTTP provider

devops hacktoberfest hcl2 http lang-hcl sre tedilabs terraform terraform-module terraform-modules

Last synced: 11 Jun 2025

https://github.com/tiagotartari/observability-dotnet-opentelemetry-first-steps

This project demonstrates how to implement observability in .NET applications using OpenTelemetry.

dotnet dotnet8 logs metrics observability opentelemetry opentelemetry-collector opentelemetry-dotnet sre traces

Last synced: 20 Jan 2026

https://github.com/pfrederiksen/blast-radius

Local-first AWS dependency graph CLI to understand blast radius before changes

aws aws-sdk-go-v2 cli cloudops devops golang observability sre

Last synced: 25 Jan 2026

https://github.com/apiaryio/example-intersphinx-repo

This repository demonstrates using Intersphinx with indexes exported in a Docker volume

sre

Last synced: 26 Jun 2025

https://github.com/konstruktoid/disruella

A very small digitalized primate responsible for randomly preventing something from continuing as usual or as expected.

chaos-engineering hacktoberfest high-availability python-black python3 resilience sre systemd test-automation

Last synced: 16 Feb 2026

https://github.com/toolsascode/gomodeler

Go Modeler is a small CLI and Library that brings the powerful features of the golang template into a simplified form.

ci cloud devops github-actions infra it pipeline platform sre

Last synced: 13 Oct 2025

https://github.com/shakibamoshiri/dq

Debug docker quickly using Docker Query

debugging devops-tools docker nodejs sre

Last synced: 19 Jun 2025

https://github.com/dingus-technology/DINGUS

Identify and solve bugs in your code by talking to your logs!

ai bugs deployment devops docker grafana infrastructure llm logging loki metrics monitoring openai prometheus python sre

Last synced: 31 Dec 2025

https://github.com/sanjaysv18/att-website-monitoring-

🚀 Full-stack monitoring with Prometheus & Grafana | Docker-based infrastructure monitoring | SRE practices demonstration

devops docker docker-compose grafana monitoring observability prometheus sre

Last synced: 06 Oct 2025

https://github.com/sixtusagbo/alx-system_engineering-devops

System engineering and DevOps training at ALX Holberton school

api automation back-end bash ci-cd debugging devops mysql puppet python scripting shell sre sysadmin

Last synced: 20 Jan 2026

https://github.com/99icar/devops

A fully autonomous, AI-powered DevOps platform for managing cloud infrastructure across multiple providers, with AWS and GitHub integration, powered by OpenAI's Agents SDK.

book containers devops-course docker hacktoberfest jenkins kubernetes leanpub prometheus sql sre terraform vagrant yaml

Last synced: 27 Mar 2025

https://github.com/moneycat-inc/otel-ops-pack

Hardened OpenTelemetry Collector ops pack for Windows: day-2 tooling, deterministic canary, safe change control, chaos drills, audit evidence.

observability opentelemetry ops signoz sre windows

Last synced: 07 Oct 2025

https://github.com/cloudputation/iterator

Automate infrastructure management with observability

automation devops golang infrastructure-as-code sre

Last synced: 27 Jan 2026

https://github.com/felippemozer/go-devops-sre-udemy

Solving DevOps and SRE use-case problems with the Go programming language - Udemy course "Programação Go para DevOps e SREs"

devops golang sre udemy-course

Last synced: 14 Jan 2026

https://github.com/logan-bobo/user_infomation_api

A RESTful API built with Python and Flask allowing management of user information such as first name, last name and email through CRUD operations against a persistent database.

api aws cloud database devops flask python rest-api sql sre

Last synced: 20 Aug 2025

https://github.com/ranching-farm/kubectl-addon

Kubectl addon for connecting Kubernetes clusters to ranching.farm - an AI-powered Kubernetes management platform. Simplify cluster operations and get intelligent assistance for common tasks.

ai-assistant ai-assisted cluster-management devops helm k8s krew krew-plugin kubectl kubectl-commands kubectl-plugin kubectl-plugins kubernetes kustomize ranching-farm sre

Last synced: 19 Jan 2026

https://github.com/pinepain/laravel-system-info

A set of tools to help maintain a Laravel application in Kubernetes

devops k8s laravel php sre

Last synced: 26 Mar 2025

https://github.com/cloud-automation-portfolio/cloud-monitoring-automation

IaC and scripts to deploy secure, multi-cloud logging, monitoring, and alerting. Integrates AWS CloudWatch, Azure Monitor/Sentinel, and Kubernetes (Prometheus/Alertmanager/Grafana) into a single signed Alert Hub (API Gateway + Lambda) for ChatOps delivery. Uses GitHub OIDC for CI with policy and security gates.

alertmanager api-gateway aws azure chatops cloudwatch github-actions grafana iac kubernetes lambda log-analytics observability oidc prometheus security sentinel sre terraform

Last synced: 16 Aug 2025

https://github.com/viniciushammett/Golang-DevOps-SRE-Aplicado

Project developed for practicing DevOps/SRE through applied, day-to-day study using the Go language

development devops golang sre

Last synced: 15 Aug 2025

https://github.com/ranching-farm/k8s-agent

Kubernetes agent for deploying ranching.farm directly into your cluster. Connect your K8s deployment to our AI-powered management platform with a single line of code.

ai-assistant ai-assisted cluster-management devops helm k8s kubectl kubernetes kustomize ranching-farm sre

Last synced: 03 Feb 2026

https://github.com/aronmilenait/homelab

A personal homelab for experimenting with DevOps, SRE, and Linux tools on physical hardware.

devops homelab linux sre

Last synced: 10 Aug 2025

https://github.com/aswinbennyofficial/sre-exercises

Building a resilient backend project from scratch with Dockerization and CI/CD, incorporating Site Reliability Engineering (SRE) principles. Demonstrates best practices in modular architecture, logging, database migration, and CI/CD pipelines for automated testing and deployment.

api backend docker go go-chi golang pgx postgres rest-api sre student-management-system vyper-config yaml-configuration zerolog

Last synced: 12 Mar 2025

https://github.com/cloudon-one/opensearch-monitoring

Reusable OpenSearch Monitoring configs

aws aws-lambda sre sre-terraform-managed

Last synced: 30 Dec 2025

https://github.com/toolsascode/scoop-bucket

Scoop bucket for official GoModeler CLI

cli cloud devops golang gotemplate scoop sre

Last synced: 20 Oct 2025

https://github.com/briancain/cats-as-a-service

This is a helper repo used during a role playing based incident training.

cat cats dnd incident-response roleplay sre sre-infrastructure

Last synced: 28 Jan 2026

https://github.com/tedilabs/github-required-actions

♥️ The best way to manage GitHub Actions Required Workflows in @tedilabs

devops github github-actions hacktoberfest sre tedilabs

Last synced: 27 Mar 2025

https://github.com/ricoberger/ricoberger.de

Personal website with links to my LinkedIn, Xing, Twitter, GitHub and Medium profiles.

cloud-native github gopher hacker linkedin medium site-reliability-engineer sre twitter xing

Last synced: 22 Feb 2025

https://github.com/tedilabs/terraform-tfe-modules

🌳 A sustainable Terraform package to manage everything on Terraform Enterprise (Terraform Cloud)

devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-cloud terraform-enterprise terraform-module terraform-modules tfe type-module

Last synced: 05 Mar 2026

https://github.com/curiouslearner/cache_sniper

A small utility to detect page caching on CDNs

cache cache-invalidation devops-tools rust rust-lang sre

Last synced: 28 Oct 2025

https://github.com/usekarma/adage-fabric

A universal, governance-first pattern for unifying event streams with Kafka, ClickHouse, and Grafana.

adage clickhouse event-streaming fabric governance grafana kafka observability sre

Last synced: 16 Sep 2025

https://github.com/fedekau/mercado-libre-sre

An API for centralizing and caching queries to other Mercado Libre APIs.

api mercadolibre nodejs sre

Last synced: 28 Oct 2025

https://github.com/debaghtk/opsfordevs

DevOps bootcamp material that I have taught at previous companies

bootcamp devops operations sre

Last synced: 15 Feb 2026

https://github.com/viniciushammett/n8n-devops-lab

n8n lab for DevOps/SRE: incident automation, SLO digests, and webhooks, running in Docker (Postgres + Redis, queue mode).

automation cron devops docker docker-compose incident-management n8n postgres redis sre webhook workflows

Last synced: 09 Aug 2025

https://github.com/korchasa/severin

PoC server chat agent

agent agentic-ai devops llm sre

Last synced: 31 Jan 2026

https://github.com/mikeshobes718/eks-orchestrator

Python CLI to manage EKS cluster/nodegroup lifecycle, RBAC, addons, and GitOps-safe rollouts with dry-run plans.

cli devops eks gitops helm kubectl kubernetes python sre

Last synced: 17 Mar 2026

https://github.com/mikeshobes718/cluster-admin-toolkit

SRE toolkit for day‑2 ops: nodes/deployments views, health, logs/events, rollout status, cordon/drain, restart workflows.

cli kubectl kubernetes observability operations sre

Last synced: 17 Mar 2026

https://github.com/excoriate/tfgenctl

tfgenctl is a CLI for generating Terraform code, for lazy folks like me.

cli devops ecs example sre tooling

Last synced: 30 Mar 2025

https://github.com/monim279/ai-powered-devops

🤖 Discover AI tools and techniques to optimize DevOps processes through practical challenges and hands-on projects in this comprehensive 10-day course.

agent agentic-ai cicd cloud devops devops-platform devops-workflow engineering-productivity environment-manager go grafana hacktoberfest llmops mcp microsoft-teams prometheus self-hosted sre

Last synced: 24 Sep 2025

https://github.com/mariano-tp/github-observability-demo

Prometheus + Grafana + exporter for GitHub metrics. CI with GitHub Actions.

devops docker-compose github-actions grafana observability portfolio prometheus sre

Last synced: 24 Sep 2025

https://github.com/knaeckebrothero/kubernetes-cluster-project

This project focuses on setting up and managing a Kubernetes Cluster, because who doesn't want one?

deployment devops kubernetes kubernetes-cluster sre

Last synced: 05 Apr 2025

https://github.com/moemoe89/ansible-ji2

📓 Ansible project for my Medium story material

ansibe aws cloud compute-engine devops docker ec2 gcp haproxy sre

Last synced: 27 Dec 2025

https://github.com/misterzurg/tbank-fsa

🐝 ОСА: a course on the fundamentals of system administration

devops docker fsa linux sre tbank

Last synced: 30 Oct 2025

https://github.com/vlad2030/mindbox-sre

test assignment for employment

devops kubernetes sre

Last synced: 04 Mar 2025

https://github.com/goseind/schablone

Template repository for app infrastructure based on SRE principles

actions azure cicd helm kubernetes sre terraform

Last synced: 30 Dec 2025

https://github.com/oguzhan-yilmaz/auto-blackbox-exporter

SSL Certificate Expiry alerts for existing K8s Ingress hosts — Auto generate a Prometheus ScrapeConfig for blackbox exporter — Install with Helm or ArgoCD

alertmanager alerts argocd blackbox-exporter gitops grafana grafana-dashboard helm helm-chart kubernetes monitoring prometheus prometheus-exporter sre ssl-certificate ssl-certificate-expired-check

Last synced: 24 Dec 2025

https://gitlab.com/ek-it/guias

Guides for installing and configuring servers and services in a data center, based on open source technologies and free software.

devops linux sre sysadmin

Last synced: 11 Mar 2025

https://github.com/fsaintjacques/survivalkit

A survival kit is a package of basic tools and supplies prepared in advance as an aid to survival in an emergency.

c health-check healthcheck logger monitoring sre

Last synced: 21 Mar 2025

https://github.com/dbalucas/kb_dbalucas

This repository contains all my know-how and is a list of quick, searchable commands for administering databases and the other types of systems I work with.

cloud dba ddl dml k8s linux postgresql sql sre

Last synced: 30 Dec 2025

https://github.com/thanhnguyxn/alert-alchemy

🧪 CLI incident-response simulator: brew fixes from alerts using realistic logs, metrics & traces (offline).

chaos-engineering cli debugging devops game incident-response learning monitoring observability oncall postmortem python rich runbooks simulation site-reliability-engineering sre terminal typer yaml

Last synced: 13 Jan 2026

https://github.com/oluwatobi-roie/sre-diskmonitor

Monitor disk usage on a MySQL server and auto-reset binary logs safely when space runs low.

automation bash cronjob devops diskmonitoring mysql server-maintenance sre

Last synced: 02 Jul 2025

https://github.com/deas/ka0s

Building Chaos around LitmusChaos on Kubernetes

chaos-engineering flux2 kubernetes litmuschaos sre

Last synced: 15 Mar 2025

https://github.com/roybidani/sre-lab-infra

🚀 Complete SRE Training Environment - Production-grade infrastructure with Kubernetes, Prometheus, Grafana, and advanced SRE practices for hands-on learning

aws chaos-engineering devops grafana kubernetes monitoring prometheus sre terraform training

Last synced: 30 Dec 2025

https://github.com/jojees/project-genesis

Project Genesis is a comprehensive, hands-on learning initiative designed to build and manage a tangible, multi-service application within a modern DevOps ecosystem. This project serves as a real-world sandbox, demonstrating best practices across various disciplines, including DevOps, Site Reliability Engineering (SRE), DevSecOps, and FinDevOps.

cicd devops docker gitops grafana high-availability kubernetes microservices-architecture observability postgres prometheus rabbitmq redis sre

Last synced: 30 Dec 2025

https://github.com/leehmdev/gke-gitops-observability-lab

End-to-end GKE GitOps & Observability lab using Terraform, Helm, Argo CD, Prometheus, and Grafana

argocd devops gitops gke grafana helm kubernetes prometheus sre terraform

Last synced: 23 Nov 2025

https://github.com/rafaellimatecnologia-cloud/local-first-ai-service

Local-first AI service with deterministic routing, deadline enforcement, and graceful degradation

deterministic edge-ai latency local-first observability python reliability sre

Last synced: 13 Jan 2026

https://github.com/pezzos/pezzos

Some info about me 🤓

curriculum-vitae cv devops sre

Last synced: 11 Feb 2026

https://github.com/miare-ir/sreinterview

This repo holds the material for the technical step of our SRE interview process.

ansible celery django interview miare sre

Last synced: 18 Nov 2025

https://github.com/ramesh-852000/devops-practices-and-interview-prep

A collection of DevOps practices, scripts, interview questions, and real-world examples covering Linux, Jenkins, AWS, Kubernetes, Docker, Ansible, Terraform, CI/CD pipelines, Monitoring, and Cloud Platforms.

ansible aws azure cloud devops docker elastic gcp interview-questions jenkins kubernetes linux nosql prometheus sql sre terraform

Last synced: 30 Dec 2025

https://github.com/ziad-hsn/cpra

CPRA is a high-performance infrastructure monitoring system designed for platform teams managing large-scale microservice architectures. Built on Entity-Component-System (ECS) architecture and queueing theory principles, CPRA handles 1,000,000+ concurrent health checks with automatic worker pool scaling to meet SLO targets.

concurrency devops golang observability self-hosted sre uptime-monitor

Last synced: 07 Mar 2026

https://github.com/accuknox/rinc

Kubernetes-native system metrics reporter

kubernetes reporting sre

Last synced: 25 Mar 2025

https://github.com/felipe-veas/handling-production-incidents

Runbooks, processes, and guidelines for effectively managing production incidents

documentation incident-management reliability runbooks sre

Last synced: 10 Mar 2026

https://github.com/felipe-veas/felipe-veas

SRE-DevOps Engineer specializing in Kubernetes, Terraform, cloud infrastructure, and observability platforms

aws cloud-engineering devops gcp gitops kubernetes observability sre terraform

Last synced: 10 Mar 2026

https://github.com/guibes/runbook-operator

A cloud-native Kubernetes operator that automatically generates and manages runbook documentation from PrometheusRule configurations with multiple output formats.

alerting automation cloud-native devops documentation gitops incident-response kubernetes monitoring operator prometheus runbooks sre

Last synced: 23 Jun 2025

https://github.com/juanfranciscocis/devprobe_tesis

DevProbe is a progressive web application that provides a platform for Site Reliability Engineers to monitor their websites. The app is built with Ionic, Angular and Firebase.

angular gemini gemini-api ionic ionic-framework reliability-engineering site site-reliability-engineering site-reliability-engineering-sre sre sre-team typescript

Last synced: 01 Apr 2025

https://github.com/excoriate/lazy-aliases

Golang CLI Boilerplate is a bulletproof Golang CLI template with batteries included 🔋

cli devops ecs example sre tooling

Last synced: 30 Mar 2025

https://github.com/macbre/http-shadow

Compares HTTP responses from two different backends

sre sus

Last synced: 20 Jul 2025

https://github.com/timyiu478/sadservers

Notes on SadServers

devops linux sre troubleshooting

Last synced: 03 Jul 2025

https://github.com/apolzek/shared

A collection of proofs of concept (PoCs) created to explore ideas and test technologies

devops devops-tools laboratory proof-of-concept sre

Last synced: 17 Jan 2026

https://github.com/itsfoss0/writeups

Writeups about my homelab and postmortems for incidents and outages in it

devops incident-management incident-response kubernetes sre

Last synced: 08 Apr 2025

https://github.com/powerhome/pac-quota-controller

PAC Resource Sharing Validation Webhook

kubernetes-controller pac sre

Last synced: 16 Jan 2026

https://github.com/swenyai/sweny

AI-powered engineering workflows — Learn from any source, Act through any tool, Report through any channel

ai ai-agent automation claude devops github-action observability sre triage typescript

Last synced: 11 Mar 2026

https://github.com/opscart/opscart-k8s-watcher

Kubernetes security awareness and troubleshooting tool featuring CIS Benchmark scoring, environment-aware analysis (PROD vs DEV), and actionable recommendations. Not for compliance auditing - use kube-bench for official CIS compliance.

cis-benchmark cloud-native cluster-monitoring devops devsecops kubernetes-security platform-engineering resource-optimization sre troubleshooting

Last synced: 21 Feb 2026

https://github.com/trafik255/platform-engineering-starter-kit

A complete AWS platform engineering reference architecture using Terraform, ECS Fargate, ALB, CloudWatch, and GitHub Actions.

alb aws aws-ecs aws-vpc cicd cloudwatch devops ecs fastapi infrastructure-as-code microservices observability platform-engineering sre starter-kit terraform

Last synced: 27 Nov 2025