An open API service indexing awesome lists of open source software.

SRE

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

https://github.com/prologic/prologic

Hiya 👋 I'm James Mills, a Senior SRE / DevOps Engineer (formerly a Software Engineer) and an enthusiastic Gopher (Golang) programmer. I love open source and contributing back; unfortunately, recent events have led me to self-host more of my own projects and data. Please read on! 🙇‍♂️

devops golang open-source software software-engineering sre

Last synced: 13 Apr 2025

https://github.com/ohmydevops/devops-culture-or-tools

Presentation file for the talk "DevOps: Culture or Tools?" given at Programmers' Meetup No. 2 of the Mashhad Innovation Factory.

agile devops devops-handbook devopsdays sre

Last synced: 18 Feb 2026

https://github.com/johndeere/work-tracker

Observe and protect your Java web application.

elasticsearch java java11 java8 mdc metadata observability spring spring-boot sre

Last synced: 04 Oct 2025

https://github.com/guilt/chaossquirrel

Like Netflix's Chaos Monkey, packaged to run standalone.

chaos-monkey reliability-engineering sre

Last synced: 12 Aug 2025

https://github.com/tedilabs/terraform-aws-ec2

🌳 A sustainable Terraform Package which creates resources for EC2 Services on AWS

aws aws-ec2 devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules type-module

Last synced: 27 Feb 2026

https://github.com/k8sgpt-ai/charts

Helm Charts for K8sGPT

devops kubernetes openai sre tooling

Last synced: 11 Mar 2026

https://github.com/excoriate/golang-cli-boilerplate

Golang CLI Boilerplate is a bulletproof Golang CLI template with batteries included 🔋

cli devops ecs example sre tooling

Last synced: 28 Apr 2025

https://github.com/tedilabs/terraform-github-modules

🌳 A sustainable Terraform Package which manages all things on GitHub

devops github hacktoberfest hcl2 lang-hcl sre tedilabs terraform terraform-module terraform-modules

Last synced: 01 Mar 2026

https://github.com/tedilabs/terraform-aws-ipam

🌳 A sustainable Terraform Package which creates IPAM resources (IPAM, Elastic IP, Prefix List) on AWS

aws aws-eip aws-elastic-ip aws-ipam aws-prefix-list aws-vpc devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 02 Aug 2025

https://github.com/jayvdb/sre-tools

Helpers for sre_parse, transforming regexes

python3 regex regular-expressions sre sre-parse

Last synced: 19 Aug 2025

https://github.com/mattermost/ponos

A ChatOps SRE toil elimination tool

chatops sre sre-team toil-elimination

Last synced: 14 Jan 2026

https://github.com/aligoren/sre-book-tr

Turkish translation of the Google SRE book, prepared to bring Site Reliability Engineering principles and practices to the Turkish-speaking technical community.

site-reliability-engineering sre turkish

Last synced: 07 Feb 2026

https://github.com/tedilabs/terraform-aws-lambda

🌳 A sustainable Terraform Package which creates Lambda & Step Functions resources on AWS

aws aws-lambda aws-sfn aws-step-functions devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 08 Mar 2026

https://github.com/mstryoda/sre-ai-agent

An autonomous Kubernetes troubleshooting and healing agent powered by AI Agents and LLMs

agent ai kubernetes llm python sre troubleshooting

Last synced: 13 Oct 2025

https://github.com/tedilabs/terraform-aws-messaging

🌳 A sustainable Terraform Package which creates resources for Messaging Services (EventBridge, MSK, SNS, SQS) on AWS

aws aws-eventbridge aws-msk aws-sns aws-sqs devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 19 Sep 2025

https://github.com/persys-dev/persys-cloud

Community-driven cloud provider :)

automation cloud cluster golang kubernetes pipelines platform sre

Last synced: 14 Apr 2025

https://github.com/vacovsky/poolse

Control health checks and toggle upstream node status in load balancers with ease.

application-monitoring devops devops-tools f5-health-monitor go golang health-check healthcheck load-balancer nginx-proxy proxy site-reliability-engineering sre

Last synced: 26 Mar 2025

https://github.com/amaurybsouza/devops-deep-dive

DevOps week of the Linux Tips channel: Ansible, Kubernetes, Docker, and AWS.

ansible automation aws bash cloud devops devops-tools linux playbook sre terraform

Last synced: 30 Dec 2025

https://github.com/eabykov/sre

Reliability is not the absence of failures. It is the ability of the system, the team, and the individual to get back up together after a fall, to rethink, rebuild, and move on, with new rules of the game in which human vulnerability is not a threat but part of the equation.

chaos-testing error-budget incident monitoring mttd mttm mttr postmortem reliability sla sli slo sre stamp

Last synced: 19 Jan 2026

https://github.com/fusakla/coordinator

Tool to coordinate on-call, incident and maintenance management

alerting communication coordination dashboard devops oncall sre

Last synced: 06 Mar 2025

https://github.com/input-output-hk/nixagii

Nixago Pebbles for IOG

devshell devx sre

Last synced: 13 Oct 2025

https://github.com/itsubaki/subset

Load balancing algorithm written in Go

google loadbalancing sre

Last synced: 09 Mar 2026

https://github.com/gorillati/guias

Installation and configuration guides for servers and services in a data center, based on open source technologies and free software.

configuration-management cpd datacenter free-software guias linux-server manuals open-source server services sre sre-infra sysadmin system-administration

Last synced: 11 Mar 2025

https://github.com/johndeere/outstanding

A Java concurrent collection for in-progress work

java java-collections java8 sre

Last synced: 14 Jan 2026

https://github.com/dynatrace/obslab-release-validation

Use Grafana k6, Dynatrace business events, workflows and site reliability guardian to validate software releases

automation demo dynatrace grafana-k6 k6 load-testing obslab openfeature release-validation site-reliability-engineering site-reliability-guardian sre workflow

Last synced: 11 Jul 2025

https://github.com/tedilabs/terraform-okta-modules

🌳 A sustainable Terraform Package which manages all things on Okta

devops hacktoberfest hcl2 iac lang-hcl okta sre tedilabs terraform terraform-module terraform-modules terraform-okta

Last synced: 29 Jan 2026

https://github.com/tiendanube/eks-orb

CircleCI Orb to interact with EKS

circleci eks orb sre

Last synced: 25 Feb 2026

https://github.com/priyanshujain/infragpt

InfraGPT is an AI SRE Copilot for the Cloud that provides infrastructure management agents through Slack integration. The system consists of multiple services that work together to deliver intelligent DevOps workflows.

artificial-intelligence google-cloud-platform infragpt infrastructure sre terraform

Last synced: 28 Jun 2025

https://github.com/mlabouardy/devops-test-foxintelligence

The technical test to apply for a DevOps engineer role at Foxintelligence https://foxintelligence.fr/jobs.html

angular aws devops docker golang nodejs sre swarm

Last synced: 12 Apr 2025

https://github.com/ctsrc/mdrun

Runs command-line pipelines embedded in Markdown and CommonMark documents. Keeps your authored docs up to date. Even usable as an alternative to IPython notebooks.

abstract-syntax-tree ast commonmark computer-science data-science devops iac infrastructure-as-code markdown qa quality-assurance software-development software-engineering sre technical-writing writing-tool

Last synced: 02 Aug 2025

https://github.com/purbon/kafka-interesting-stories

Compilation of public incident/interesting/horror stories related to Kafka operations

incidents kafka post-mortem production-engineering sre

Last synced: 18 Mar 2026

https://github.com/ayazhankadessova/grafana-prometheus

Prometheus-based Grafana dashboard featuring a latency chart, a CPU usage gauge, a request-rate table, and node host metrics, using PromLabs' public server.

grafana monitoring prometheus sre

Last synced: 22 Jul 2025

https://github.com/abhishekpanda0620/eol-check

A CLI tool to check the End-Of-Life (EOL) status of your development environment and project dependencies.

cli-tool devops end-of-life eol nodejs security sre typescript

Last synced: 13 Jan 2026

https://github.com/amaurybsouza/portfolio

Helps global projects improve security posture while optimizing costs and ensuring business continuity. I'm a dedicated Cloud Security Engineer committed to safeguarding cloud environments and fostering a DevSecOps culture.

ansible aws cicd cloud devops devops-team devops-tools git github gitlab gitops infrastructure-as-code kubernetes sre terraform

Last synced: 12 May 2025

https://github.com/gdagil/vmprober

Network connectivity and service availability prober with WAL-backed metrics export to VictoriaMetrics

devops dns-probe golang grpc-probe health-check http-probe icmp infrastructure metrics monitoring network-monitoring observability probing prometheus sre tcp-probes victoriametrics

Last synced: 13 Jan 2026

https://github.com/lukaspustina/usereport-rs

Collect system information for the first 60 seconds of a performance analysis

analysis cli performance sre statistics

Last synced: 03 Aug 2025

https://github.com/ohmydevops/now-status

A social network of people's status! This is a simple project to organize my thoughts for interviews. Wish me luck.

devops interviews sre

Last synced: 31 Mar 2025

https://github.com/shmokmt/tfhk

Terraform Housekeeper. A utility to remove refactoring blocks, such as moved blocks.

devops iac infrastructure-as-code sre terraform

Last synced: 13 Jul 2025

https://github.com/exospherehost/ai-reliability-standards

Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).

ai ai-agents ai-reliability aiops durable-execution enterprise evals evaluation observability reliability-engineering sre

Last synced: 15 Feb 2026

https://github.com/newrelic-experimental/nr1-command-center-v2

Consolidated view of incidents, anomalies, and issues across all accessible accounts

alerts anomalies issues nrai nrlabs nrlabs-viz ops sre

Last synced: 28 Nov 2025

https://github.com/amaurybsouza/my-github-actions

🪐🤖🚀 An awesome list of useful GitHub Actions, with workflow examples and real-world use cases for everyday use.

actions ansible aws azure azure-devops cicd cloud devops devops-pipeline devops-tools github github-actions gitlab infrastructure-as-code kubernetes-deployment pipeline sre terraform

Last synced: 30 Dec 2025

https://github.com/tedilabs/helm-charts

♻️ Repository for Reusable Kubernetes Helm Charts

devops gitops hacktoberfest helm helm-charts k8s kubernetes lang-yaml sre tedilabs

Last synced: 01 Mar 2026

https://github.com/andrewaylett/self-throttle

Helps clients to not overwhelm the services they call

client nodejs sre

Last synced: 05 Feb 2026

https://github.com/alivzh/rahbia-live-coding

In the RahBia Live Coding Series, we'll walk through a complete DevOps journey from start to finish. Together, we'll cover every step, from initial server configuration to a final production-ready service deployment, in a series hosted by Mr. AhmadRafiee.

argocd ceph cicd docker elk gitops grafana haproxy linux observability openstack prometheus sre sre-team terraform traefik

Last synced: 10 Apr 2025

https://github.com/rootly-ai-labs/gmcq-benchmark

Evaluation benchmark measuring how well language models understand code in order to close pull requests.

ai benchmark evals evaluation-metrics llm sre

Last synced: 25 Feb 2026

https://github.com/mathisve/aiosre

All In One SRE Docker Container

aws docker hacktoberfest kubernetes sre

Last synced: 26 Feb 2026

https://github.com/anqorithm/fastapi-helm-chart

This repository contains a Helm chart for deploying a FastAPI application on OpenShift clusters with minimal effort and customizable configuration.

automation cicd deployment devops docker fastapi helm k8s kubernetes oc openshift poetry python sre

Last synced: 12 Feb 2026

https://github.com/usmanmern/semester-4

Semester4 Books Repo - GCUF SE: Access study materials for Computer Networking, OS, Design and Algorithm, DBMS, and Software Requirement Engineering. Excel in your studies! 📚

computer-networking operating-system os sre

Last synced: 02 Mar 2025

https://github.com/getbettr/www

Source code for my personal homepage.

advent-of-code adventure devops journal kubernetes rust sre technology

Last synced: 14 Feb 2026

https://github.com/ehsaniara/delay-box

This tool simplifies the workflow by removing the need to write code to handle Redis and Kafka development complexities. It manages these tasks for you through straightforward REST calls.

delayed-job devops distributed-systems kafka redis scheduled-tasks sre

Last synced: 01 Mar 2026

https://github.com/geekxflood/prometheus-inventory-manager

PRIM exports all Prometheus metrics and rules set on your Prometheus instance to CSV.

monitoring observability prometheus reporting sre

Last synced: 04 Mar 2026

https://github.com/maestre3d/k8s-microservices-sample

A sample platform using Kubernetes (K8s) to manage a set of container-based microservices clusters and web clients written in Java, Golang, Elixir, Rust, Javascript (+ NodeJS) and Python.

elixir golang java javascript kubernetes microservices pyhton rust sre

Last synced: 02 Apr 2025

https://github.com/omarmfathy219/k8s-stuck-pod-cleaner

A lightweight, automated solution to resolve one of the most common operational issues in Kubernetes: pods stuck in Terminating state.

cron-job devops helm-chart k8s kubernetes kubernetes-automation kubernetes-operator pods sre stuck-pods terminating

Last synced: 15 Oct 2025

https://github.com/lucasepe/resto

A minimalist CLI REST client that calls APIs, waits for conditions, and retries intelligently.

command-line devops expression-evaluator jq kubernetes rest-client retry sre tooling

Last synced: 23 Jun 2025

https://github.com/sredevopsorg/.github

Site Reliability Engineering (SRE), DevOps, DevSecOps, Cloud Native, Linux, AI, ML, OpenSource, and Platform Engineering in Spanish, Portuguese (Brazil), and English

community devops kubernetes linux open-source organization platform-engineering site-reliability-engineering sre

Last synced: 18 Jan 2026

https://github.com/katavinanguyen/data-center-staffing-optimization-simulator

Simulates incident handling in data centers using Python and SimPy. Analyze how staffing levels, shift timing, and triage rules affect SLA compliance, resolution time, and backlog size.

critical-infrastructure data-center discrete-event-simulation incident-management noc operations-research python simpy simulation sla-monitoring sre staffing-optimization

Last synced: 28 Jul 2025

https://github.com/apiaryio/heroku-datadog-drain

Funnel metrics from multiple Heroku apps into DataDog using statsd

deprecated sre

Last synced: 20 Jan 2026

https://github.com/centerdevice/ceres

SRE Tool for CenterDevice

cli sre

Last synced: 09 Apr 2025

https://github.com/centerdevice/ceres-lambda

SRE Tool for CenterDevice - AWS Lambda Functions

aws lambda ops serverless sre

Last synced: 10 Aug 2025

https://github.com/tedilabs/terraform-aws-vpn

🌳 A sustainable Terraform Package which creates VPN resources (Client VPN, Site-to-Site VPN) on AWS

aws aws-client-vpn aws-site-to-site-vpn aws-vpn devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules

Last synced: 09 Apr 2025

https://github.com/arun0009/go-resilience-mock

Chaos engineering in a box. A high-performance mock server to test your API's resilience against latency, failures, and resource exhaustion

chaos-engineering cpu-stress fault-injection go golang http-mock mock-server observability prometheus resilience-testing sre

Last synced: 13 Jan 2026

https://github.com/efcloud/sre-docker-digger

Docker image for a small tool that checks connectivity.

docker docker-image infrastructure sre

Last synced: 11 Mar 2025

https://github.com/aruizeac/k8s-microservices-sample

A sample platform using Kubernetes (K8s) to manage a set of container-based microservices clusters and web clients written in Java, Golang, Elixir, Rust, Javascript (+ NodeJS) and Python.

elixir golang java javascript kubernetes microservices pyhton rust sre

Last synced: 22 Jun 2025

https://github.com/meysam81/liveness-check

Kubernetes-native health checker that automatically finds and verifies your latest pods are ready before considering deployments successful - perfect for preview environments

ci-cd cli-tool container-native cross-platform devops docker golang health-check http-client kubernetes kubernetes-health liveness-probe microservices monitoring preview-environments readiness-probe single-binary site-reliability sre zero-dependencies

Last synced: 28 Oct 2025

https://github.com/admodev/my-dockerfiles

Dockerfiles I use on a daily basis. Useful for SRE and DevOps engineers.

devops docker dockerfile dockerfiles engineering image images sre

Last synced: 26 Aug 2025

https://github.com/woodprogrammer/skript

A shell script wrapper written in Python

bash python shell sre

Last synced: 30 Mar 2025

https://github.com/glasnostic/helm-charts

Glasnostic Helm Chart Repository

control devops helm-charts k8s kubernetes sre

Last synced: 17 Jan 2026

https://github.com/walmartdigital/hirings

Job offers at Walmart Chile

hiring jobs sre

Last synced: 17 Jan 2026

https://github.com/brunopadz/memcached-ok

Simple way to test connection to memcached

infrastructure memcached site-reliability-engineering sre

Last synced: 05 Oct 2025

https://github.com/lucasloureiror/slh

Service Level Helper is a CLI tool for calculating Service Level related metrics like SLO, SLA, Error Budgets and probing frequency.

availability devops golang sla slo sre

Last synced: 06 Feb 2026

https://github.com/toolsascode/protomagic

ProtoMagic is a CLI that helps convert database tables into Protocol Buffers files (.proto).

api cloud dev developer devops golang grpc opensource proto protobuf software sre

Last synced: 26 Jul 2025

https://github.com/sergkondr/fake-web-service

Fake web service for testing purposes

golang kubernetes sre testing web-service

Last synced: 02 Mar 2026

https://github.com/emdneto/playground

scripts and some random stuff for study

aws k8s kafka sre terraform

Last synced: 22 Jul 2025

https://github.com/cantrellr/ultimate-k8s-toolbox

🛠️ Comprehensive Kubernetes administration workstation with 50+ pre-installed tools. Deploy a fully-equipped debugging pod directly into your cluster. Air-gapped ready.

air-gapped cloud-native debugging devops helm helm-chart k8s k9s kubectl kubernetes mongosh offline platform-engineering sre toolbox troubleshooting

Last synced: 13 Jan 2026

https://github.com/nusnewob/kube-changejob

A Kubernetes operator that triggers Jobs when specific Kubernetes resources change

automation controller-runtime crd devops golang jobs kubernetes kubernetes-operator sre

Last synced: 16 Jan 2026

https://github.com/kintsdev/automountify

Automountify is a Go-based CLI tool to format, mount disks, and update /etc/fstab for persistent mounting

go golang sre ubuntu

Last synced: 27 Mar 2025

https://github.com/rmkraus/demo-ansible-monitoring

Demo Builder - Automated Issue Remediation with Zabbix + Ansible

ansible demo sre zabbix

Last synced: 21 Aug 2025

https://github.com/itsfoss0/alx-backend

Backend Engineering concepts, projects and resources at ALX Africa

alx-africa alx-backend backend backend-api sre

Last synced: 09 Oct 2025