An open API service indexing awesome lists of open source software.

SRE

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

https://github.com/excoriate/daggerx

DaggerX is a Go package 📦 that helps you stay DRY (avoid repeating yourself) while developing Dagger modules.

cli devops ecs example sre tooling

Last synced: 03 Jul 2025

https://github.com/tedilabs/.github

📣 Default community health files for @tedilabs organization on GitHub

devops github hacktoberfest sre tedilabs

Last synced: 15 Apr 2025

https://github.com/madetech/productionisation

The Made Tech Productionisation Checklist for Software Projects

checklist productionise sre

Last synced: 12 Apr 2025

https://github.com/tedilabs/terraform-aws-misc

🌳 A sustainable Terraform Package which creates MISC resources on AWS

aws devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 28 Oct 2025

https://github.com/apiaryio/docker-base-images

Base docker images for Apiary applications

sre

Last synced: 26 Jun 2025

https://github.com/linhng98/mess-around

Playground demonstrating many DevOps tools, enforcing the GitOps pattern, and building a scalable, sustainable application cluster

devops homelab kubernetes mess-around sre

Last synced: 17 Jan 2026

https://github.com/arun0009/pulse

Batteries-included Spring Boot observability starter. Cardinality firewall, timeout-budget propagation, SLO-as-code, async context, PII masking, error fingerprints - one dependency, zero agents

cardinality distributed-tracing micrometer observability opentelemetry slo spring-boot-starter sre structured-logging

Last synced: 30 May 2026

https://github.com/deoops-net/dotam

toil for developers

automa golang pipline sre toil workflow

Last synced: 10 Mar 2026

https://github.com/tedilabs/terraform-aws-ml

🌳 A sustainable Terraform Package which creates Machine Learning resources on AWS

aws devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 16 Feb 2026

https://github.com/diptochakrabarty/learn_devops_with_projects

Learn DevOps through practical projects. Includes tech stacks such as k8s, Ansible, Docker, Python, and more.

ansible devops golang hacktoberfest kubernetes python sre

Last synced: 13 Jun 2025

https://github.com/powerhome/keess

Keep secrets and configmaps synchronized across clusters and namespaces

pac sre

Last synced: 04 Mar 2026

https://github.com/ajinkyakadam/systemhealthai

An AI SRE for triaging system health

agents ai aiops devops devops-tools llm llmops mlops observability sre

Last synced: 03 Nov 2025

https://github.com/ohmydevops/devops-culture-or-tools

Slides for the talk "DevOps: Culture or Tools?" presented at meetup no. 2 of the Mashhad Innovation Factory programmers

agile devops devops-handbook devopsdays sre

Last synced: 18 Feb 2026

https://github.com/kubeshop/fuse-releases

Platform Engineering Copilot. AI-powered expertise with deep domain knowledge at your fingertips.

ai cli copilot devops platform-engineering sre

Last synced: 24 Jan 2026

https://github.com/skyzyx/engineering-for-site-reliability

Overall map of topics to cover for my “Engineering for Site Reliability” blog series.

ci-cd cicd devops docker security site-reliability site-reliability-engineering sre terraform

Last synced: 25 Mar 2025

https://github.com/samgiles/health

Async healthchecks for Golang applications supporting liveness and readiness checks

golang healthcheck k8s kubernetes microservice sre

Last synced: 14 Jan 2026

https://github.com/guilt/chaossquirrel

Like Netflix's Chaos Monkey, packaged to run standalone.

chaos-monkey reliability-engineering sre

Last synced: 12 Aug 2025

https://github.com/prologic/prologic

Hiya 👋 I'm James Mills, a Senior SRE / DevOps engineer, formerly a Software Engineer, and an enthusiastic Gopher (Golang) programmer. I love open source and contributing back. Unfortunately, recent events have led me to self-host more of my own projects and data. Please read on! 🙇‍♂️

devops golang open-source software software-engineering sre

Last synced: 13 Apr 2025

https://github.com/lawouach/ebpf-2021-talk

Code for my talk at the eBPF 2021 conference

devops ebpf reliability reliably sre

Last synced: 12 Apr 2025

https://github.com/tedilabs/terraform-github-modules

🌳 A sustainable Terraform Package which manages all things on GitHub

devops github hacktoberfest hcl2 lang-hcl sre tedilabs terraform terraform-module terraform-modules type-module

Last synced: 05 Apr 2026

https://github.com/thotischner/observability-mcp

Unified observability gateway for AI agents — one MCP server for Prometheus, Loki, and any backend, with cross-signal anomaly detection and a built-in Web UI.

ai-agents anomaly-detection anthropic claude gateway helm kubernetes llm loki mcp mcp-server model-context-protocol monitoring observability prometheus sre

Last synced: 11 Jun 2026

https://github.com/certwatch-app/cw-agent

SSL/TLS certificate monitoring agent for Kubernetes and on-prem infrastructure. Scan certificates, detect expiration, validate chains, and sync to CertWatch cloud.

certificate cli cloud-native devops golang kubernetes monitoring security sre ssl tls

Last synced: 13 Jan 2026

https://github.com/johndeere/work-tracker

Observe and protect your Java web application.

elasticsearch java java11 java8 mdc metadata observability spring spring-boot sre

Last synced: 04 Oct 2025

https://github.com/mstryoda/sre-ai-agent

An autonomous Kubernetes troubleshooting and healing agent powered by AI Agents and LLMs

agent ai kubernetes llm python sre troubleshooting

Last synced: 13 Oct 2025

https://github.com/persys-dev/persys-cloud

Community Driven Cloud Automation :)

automation cloud cluster golang kubernetes pipelines platform sre

Last synced: 08 Apr 2026

https://github.com/tedilabs/terraform-aws-messaging

🌳 A sustainable Terraform Package which creates resources for Messaging Services (EventBridge, MSK, SNS, SQS) on AWS

aws aws-eventbridge aws-msk aws-sns aws-sqs devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 19 Sep 2025

https://github.com/tedilabs/terraform-aws-ec2

🌳 A sustainable Terraform Package which creates resources for EC2 Services on AWS

aws aws-ec2 devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules type-module

Last synced: 27 Feb 2026

https://github.com/aligoren/sre-book-tr

Turkish translation of the Google SRE book. Prepared to bring Site Reliability Engineering principles and practices to the Turkish-speaking technical community.

site-reliability-engineering sre turkish

Last synced: 07 Feb 2026

https://github.com/tedilabs/terraform-aws-lambda

🌳 A sustainable Terraform Package which creates Lambda & Step Functions resources on AWS

aws aws-lambda aws-sfn aws-step-functions devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 08 Mar 2026

https://github.com/spechtlabs/tka

Zero-friction Kubernetes access using Tailscale and ephemeral service accounts

access-control authentication kubernetes sre tailscale

Last synced: 18 May 2026

https://github.com/beztebya666/k8s-view

Fast, self-hosted Kubernetes web UI for multi-cluster ops — stream pod logs, exec, port-forward, edit YAML with live diff, rollout history & one-click rollback, Prometheus + metrics-server charts. Single Go binary, no agents, scales to 150k+ objects per cluster.

data-visualization devops devops-tools docker k8s k8s-cluster k8s-dashboard k8s-ui k8s-view kubernetes kubernetes-dashboard kubernetes-ui kubernetes-view linux monitoring sre

Last synced: 01 Jun 2026

https://github.com/projecthelena/warden

Open-source uptime monitoring built in Go. Multi-zone checks, status pages, unlimited team members — the production-grade upgrade from Uptime Kuma.

devops docker go golang monitoring open-source self-hosted sre status-page uptime uptime-kuma-alternative uptime-monitor

Last synced: 02 Apr 2026

https://github.com/mattermost/ponos

A ChatOps SRE toil elimination tool

chatops sre sre-team toil-elimination

Last synced: 14 Jan 2026

https://github.com/tedilabs/terraform-aws-ipam

🌳 A sustainable Terraform Package which creates IPAM resources (IPAM, Elastic IP, Prefix List) on AWS

aws aws-eip aws-elastic-ip aws-ipam aws-prefix-list aws-vpc devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 02 Aug 2025

https://github.com/jayvdb/sre-tools

Helpers for sre_parse, transforming regexes

python3 regex regular-expressions sre sre-parse

Last synced: 19 Aug 2025

https://github.com/vacovsky/poolse

Control health checks and toggle upstream node status in load balancers with ease.

application-monitoring devops devops-tools f5-health-monitor go golang health-check healthcheck load-balancer nginx-proxy proxy site-reliability-engineering sre

Last synced: 26 Mar 2025

https://github.com/k8sgpt-ai/charts

Helm Charts for K8sGPT

devops kubernetes openai sre tooling

Last synced: 11 Mar 2026

https://github.com/excoriate/golang-cli-boilerplate

Golang CLI Boilerplate is a bulletproof Golang CLI template with batteries included 🔋

cli devops ecs example sre tooling

Last synced: 28 Apr 2025

https://github.com/priyanshujain/infragpt

InfraGPT is an AI SRE Copilot for the Cloud that provides infrastructure management agents through Slack integration. The system consists of multiple services that work together to deliver intelligent DevOps workflows.

artificial-intelligence google-cloud-platform infragpt infrastructure sre terraform

Last synced: 28 Jun 2025

https://github.com/mlabouardy/devops-test-foxintelligence

The technical test to apply for a DevOps engineer role at Foxintelligence https://foxintelligence.fr/jobs.html

angular aws devops docker golang nodejs sre swarm

Last synced: 12 Apr 2025

https://github.com/ctsrc/mdrun

Runs command-line pipelines embedded in Markdown and CommonMark documents. Keeps your authored docs up to date. Even usable as an alternative to IPython notebooks.

abstract-syntax-tree ast commonmark computer-science data-science devops iac infrastructure-as-code markdown qa quality-assurance software-development software-engineering sre technical-writing writing-tool

Last synced: 02 Aug 2025

https://github.com/ayazhankadessova/grafana-prometheus

Prometheus-based Grafana dashboard featuring a latency chart, CPU usage gauge, request rates table, and node host metrics, using PromLabs' public server.

grafana monitoring prometheus sre

Last synced: 22 Jul 2025

https://github.com/gorillati/guias

Installation and configuration guides for servers and services in a data center, based on open source technologies and free software.

configuration-management cpd datacenter free-software guias linux-server manuals open-source server services sre sre-infra sysadmin system-administration

Last synced: 11 Mar 2025

https://github.com/fusakla/coordinator

Tool to coordinate on-call, incident and maintenance management

alerting communication coordination dashboard devops oncall sre

Last synced: 22 Apr 2026

https://github.com/abhishekpanda0620/eol-check

A CLI tool to check the End-Of-Life (EOL) status of your development environment and project dependencies.

cli-tool devops end-of-life eol nodejs security sre typescript

Last synced: 13 Jan 2026

https://github.com/dynatrace/obslab-release-validation

Use Grafana k6, Dynatrace business events, workflows and site reliability guardian to validate software releases

automation demo dynatrace grafana-k6 k6 load-testing obslab openfeature release-validation site-reliability-engineering site-reliability-guardian sre workflow

Last synced: 11 Jul 2025

https://github.com/dknauss/wordpress-runbook-template

WordPress operations runbook template: production procedures for deployment, maintenance, backup, incident response, and recovery.

incident-response operations runbook sre wordpress wordpress-security

Last synced: 01 Apr 2026

https://github.com/itsubaki/subset

Load balancing algorithm written in Go

google loadbalancing sre

Last synced: 09 Mar 2026

https://github.com/purbon/kafka-interesting-stories

Compilation of public incident/interesting/horror stories related to Kafka operations

incidents kafka post-mortem production-engineering sre

Last synced: 18 Mar 2026

https://github.com/richwrd/postgres-ha-cluster-lab

Bachelor's Thesis project focusing on the quantitative analysis and implementation of a PostgreSQL HA cluster with Patroni, etcd, and Pgpool-II.

database-administration disaster-recovery docker docker-compose etcd failover high-availability iac metrics open-source patroni pgpool-ii postgresql sre streaming-replication

Last synced: 24 May 2026

https://github.com/johndeere/outstanding

A Java concurrent collection for in-progress work

java java-collections java8 sre

Last synced: 14 Jan 2026

https://github.com/amaurybsouza/devops-deep-dive

DevOps week of the Linux Tips channel - Ansible, Kubernetes, Docker and AWS.

ansible automation aws bash cloud devops devops-tools linux playbook sre terraform

Last synced: 02 Apr 2026

https://github.com/tiendanube/eks-orb

CircleCI Orb to interact with EKS

circleci eks orb sre

Last synced: 25 Feb 2026

https://github.com/input-output-hk/nixagii

Nixago Pebbles for IOG

devshell devx sre

Last synced: 13 Oct 2025

https://github.com/sourcehawk/triagent

An agent driven incident investigation platform

agentic incident-analysis incident-investigation investigation-tool sre

Last synced: 18 Jun 2026

https://github.com/shmokmt/tfhk

Terraform Housekeeper. A utility that removes blocks left over from refactoring, such as moved blocks.

devops iac infrastructure-as-code sre terraform

Last synced: 15 May 2026

https://github.com/eabykov/sre

Reliability is not the absence of failures. It is the ability of a system, a team, and a person to get back up together after a fall, rethink, rebuild, and move on, with new rules of the game in which human vulnerability is not a threat but part of the equation.

chaos-testing error-budget incident monitoring mttd mttm mttr postmortem reliability sla sli slo sre stamp

Last synced: 19 Jan 2026

https://github.com/tedilabs/terraform-okta-modules

🌳 A sustainable Terraform Package which manages all things on Okta

devops hacktoberfest hcl2 iac lang-hcl okta sre tedilabs terraform terraform-module terraform-modules terraform-okta

Last synced: 29 Jan 2026

https://github.com/russlank/backup-cleanup

A lightweight SQL Server backup cleanup utility that safely removes expired FULL, DIFF, and LOG backup files according to configurable Grandfather-Father-Son (GFS) retention rules.

backup backup-cleanup backup-retention cli devops gfs golang linux mssql sql-server sre windows

Last synced: 24 May 2026

https://github.com/alivzh/rahbia-live-coding

In the RahBia Live Coding Series, hosted by Mr. AhmadRafiee, we'll walk through a complete DevOps journey from start to finish. Together, we'll cover every step, from initial server configuration to final production-ready service deployment.

argocd ceph cicd docker elk gitops grafana haproxy linux observability openstack prometheus sre sre-team terraform traefik

Last synced: 10 Apr 2025

https://github.com/amaurybsouza/portfolio

Helps global projects improve security posture while optimizing costs and ensuring business continuity. I'm a dedicated Cloud Security Engineer committed to safeguarding cloud environments and fostering a DevSecOps culture.

ansible aws cicd cloud devops devops-team devops-tools git github gitlab gitops infrastructure-as-code kubernetes sre terraform

Last synced: 12 May 2025

https://github.com/amaurybsouza/my-github-actions

🪐🤖🚀 An awesome list of useful GitHub Actions, with workflow examples and real-world use cases for daily use.

actions ansible aws azure azure-devops cicd cloud devops devops-pipeline devops-tools github github-actions gitlab infrastructure-as-code kubernetes-deployment pipeline sre terraform

Last synced: 11 Apr 2026

https://github.com/gdagil/vmprober

Network connectivity and service availability prober with WAL-backed metrics export to VictoriaMetrics

devops dns-probe golang grpc-probe health-check http-probe icmp infrastructure metrics monitoring network-monitoring observability probing prometheus sre tcp-probes victoriametrics

Last synced: 13 Jan 2026

https://github.com/lukaspustina/usereport-rs

Collect system information for the first 60 seconds of a performance analysis

analysis cli performance sre statistics

Last synced: 03 Aug 2025

https://github.com/ohmydevops/now-status

A social network of people's status! This is a simple project to organize my thoughts for interviews. Wish me luck.

devops interviews sre

Last synced: 31 Mar 2025

https://github.com/exospherehost/ai-reliability-standards

Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).

ai ai-agents ai-reliability aiops durable-execution enterprise evals evaluation observability reliability-engineering sre

Last synced: 15 Feb 2026

https://github.com/tedilabs/helm-charts

♻️ Repository for Reusable Kubernetes Helm Charts

devops gitops hacktoberfest helm helm-charts k8s kubernetes lang-yaml sre tedilabs

Last synced: 01 Mar 2026

https://github.com/andrewaylett/self-throttle

Helps clients to not overwhelm the services they call

client nodejs sre

Last synced: 05 Feb 2026

https://github.com/rootly-ai-labs/gmcq-benchmark

Evaluation benchmark for language models to understand code to close pull requests.

ai benchmark evals evaluation-metrics llm sre

Last synced: 25 Feb 2026

https://github.com/mathisve/aiosre

All In One SRE Docker Container

aws docker hacktoberfest kubernetes sre

Last synced: 26 Feb 2026

https://github.com/anqorithm/fastapi-helm-chart

This repository contains a Helm chart for a FastAPI application to be deployed on OpenShift clusters with minimal effort with customizable configurations.

automation cicd deployment devops docker fastapi helm k8s kubernetes oc openshift poetry python sre

Last synced: 12 Feb 2026

https://github.com/getbettr/www

Source code for my personal homepage.

advent-of-code adventure devops journal kubernetes rust sre technology

Last synced: 14 Feb 2026

https://github.com/ehsaniara/delay-box

This tool simplifies the workflow by removing the need to write code to handle Redis and Kafka development complexities. It manages these tasks for you through straightforward REST calls.

delayed-job devops distributed-systems kafka redis scheduled-tasks sre

Last synced: 01 Mar 2026

https://github.com/geekxflood/prometheus-inventory-manager

PRIM exports all Prometheus metrics and rules set on your Prometheus instance to CSV

monitoring observability prometheus reporting sre

Last synced: 04 Mar 2026

https://github.com/mattyopon/faultray

Zero-risk infrastructure chaos simulation — 5 engines, 2000+ scenarios, 3-Layer availability proof. No production fault injection.

availability chaos-engineering devops infrastructure python resilience simulation sre

Last synced: 02 Apr 2026

https://github.com/conallob/mcp-ssh-wingman

MCP Server for providing read only access to your shell prompt

automation debugging devops llm mcp mcp-server pair-programming platform-engineering shell sre tmux

Last synced: 03 Apr 2026

https://github.com/sergkondr/skondrashov-zsh-theme

My superminimalistic theme for oh-my-zsh

devops oh-my-zsh oh-my-zsh-theme shell-theme sre zsh zsh-theme

Last synced: 23 Apr 2026

https://github.com/lucasepe/resto

A minimalist CLI REST client that calls APIs, waits for conditions, and retries intelligently.

command-line devops expression-evaluator jq kubernetes rest-client retry sre tooling

Last synced: 27 Apr 2026

https://github.com/meysam81/liveness-check

Kubernetes-native health checker that automatically finds and verifies your latest pods are ready before considering deployments successful - perfect for preview environments

ci-cd cli-tool container-native cross-platform devops docker golang health-check http-client kubernetes kubernetes-health liveness-probe microservices monitoring preview-environments readiness-probe single-binary site-reliability sre zero-dependencies

Last synced: 29 Apr 2026