SRE
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
- GitHub: https://github.com/topics/sre
- Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering
- Aliases: site-reliability-engineering
- Last updated: 2026-06-25 00:25:51 UTC
https://github.com/newrelic-experimental/nr1-command-center-v2
Consolidated view of incidents, anomalies, and issues across all accessible accounts
alerts anomalies issues nrai nrlabs nrlabs-viz ops sre
Last synced: 08 Jun 2026
https://github.com/usmanmern/semester-4
Semester4 Books Repo - GCUF SE: Access study materials for Computer Networking, OS, Design and Algorithm, DBMS, and Software Requirement Engineering. Excel in your studies! 📚
computer-networking operating-system os sre
Last synced: 10 May 2026
https://github.com/itsfoss0/alx-backend
Backend Engineering concepts, projects and resources at ALX Africa
alx-africa alx-backend backend backend-api sre
Last synced: 09 Oct 2025
https://github.com/efcloud/sre-docker-digger
Docker image for a small tool that checks connectivity.
docker docker-image infrastructure sre
Last synced: 11 Mar 2025
https://github.com/glasnostic/helm-charts
Glasnostic Helm Chart Repository
control devops helm-charts k8s kubernetes sre
Last synced: 17 Jan 2026
https://github.com/vsingh55/homelab-ops
A production-grade Hybrid Cloud Platform spanning On-Prem (Proxmox) and GCP. Engineered with Terraform, Ansible, K3s, and WireGuard Mesh to demonstrate Zero-Trust networking, FinOps, and SRE principles.
ansible automation devops finops gcp gitops grafana hybrid-cloud infrastructure-as-code kubernetes observability platform-engineering proxmox self-hosted sre terraform wireguard zero-trust
Last synced: 11 Apr 2026
https://github.com/kmadsdev/devops-challenge
PicPay Jr Devops Challenge Solution
api cors devops docker go html-css javascript microservices python redis sre
Last synced: 13 Jan 2026
https://github.com/inxbit/prismtty
Fast terminal output highlighter focused on network devices and Unix systems
ansi chromaterm cisco cli devops fortinet juniper network-tools networking rust sre ssh sysadmin terminal terminal-ui
Last synced: 30 May 2026
https://github.com/dingus-technology/DINGUS
Identify and solve bugs in your code by talking to your logs!
ai bugs deployment devops docker grafana infrastructure llm logging loki metrics monitoring openai prometheus python sre
Last synced: 31 Dec 2025
https://github.com/maestre3d/k8s-microservices-sample
A sample platform using Kubernetes (K8s) to manage a set of container-based microservices clusters and web clients written in Java, Golang, Elixir, Rust, Javascript (+ NodeJS) and Python.
elixir golang java javascript kubernetes microservices pyhton rust sre
Last synced: 02 Apr 2025
https://github.com/apiaryio/heroku-datadog-drain
Funnel metrics from multiple Heroku apps into Datadog using StatsD
Last synced: 20 Jan 2026
https://github.com/letusdevops/learngo
30 days roadmap for Golang for DevOps along with exercises.
Last synced: 21 Apr 2026
https://github.com/sredevopsorg/.github
Site Reliability Engineering (SRE), DevOps, DevSecOps, Cloud Native, Linux, AI, ML, OpenSource, Platform Engineering en Español, Portugués (Brasil) and English
community devops kubernetes linux open-source organization platform-engineering site-reliability-engineering sre
Last synced: 18 Jan 2026
https://github.com/sharanch/inkwell-complete
Microservices blogging platform — Go services, React frontend, Kubernetes (Minikube), GitOps with ArgoCD, CI/CD via GitHub Actions. SRE/DevOps portfolio project.
argocd devops docker github-actions golang istio kubernetes microservices portfolio postgresql react redis sre
Last synced: 30 May 2026
https://github.com/arun0009/go-resilience-mock
Chaos engineering in a box. A high-performance mock server to test your API's resilience against latency, failures, and resource exhaustion
chaos-engineering cpu-stress fault-injection go golang http-mock mock-server observability prometheus resilience-testing sre
Last synced: 13 Jan 2026
https://github.com/cantrellr/ultimate-k8s-toolbox
🛠️ Comprehensive Kubernetes administration workstation with 50+ pre-installed tools. Deploy a fully-equipped debugging pod directly into your cluster. Air-gapped ready.
air-gapped cloud-native debugging devops helm helm-chart k8s k9s kubectl kubernetes mongosh offline platform-engineering sre toolbox troubleshooting
Last synced: 13 Jan 2026
https://github.com/admodev/my-dockerfiles
Dockerfiles I use on a daily basis. Useful for SRE and DevOps engineers.
devops docker dockerfile dockerfiles engineering image images sre
Last synced: 26 Aug 2025
https://github.com/brunopadz/memcached-ok
A simple way to test connections to memcached
infrastructure memcached site-reliability-engineering sre
Last synced: 05 Oct 2025
https://github.com/rmkraus/demo-ansible-monitoring
Demo Builder - Automated Issue Remediation with Zabbix + Ansible
Last synced: 21 Aug 2025
https://github.com/learn-software-engineering/website
Learn-Software.com Website
blog devops github-pages golang hugo kubernetes platform-engineering programming python site-reliability-engineering software software-engineering sre website
Last synced: 09 Apr 2026
https://github.com/centerdevice/ceres-lambda
SRE Tool for CenterDevice - AWS Lambda Functions
Last synced: 18 May 2026
https://github.com/charles-adedotun/kubepulse
Intelligent Kubernetes health monitoring with AI-powered diagnostics, predictive analytics, and auto-remediation
ai ai-agents automation claude cloud-native devops go kubernetes monitoring observability react sre typescript
Last synced: 14 Apr 2026
https://github.com/amaurybsouza/terraform-aws-ec2-ssh
Amazon Elastic Compute Cloud (EC2) is a web service that provides resizable compute capacity in the cloud. It is one of the core services offered by Amazon Web Services (AWS) and provides a wide range of features and capabilities.
aws aws-ec2 devops devops-tools github github-actions infrastructure-as-code infrstructure sre terraform terraform-aws terraform-managed terraform-modules terraform-provider
Last synced: 21 Jan 2026
https://github.com/meysam81/healthchecks-client
🏥 A production-ready CLI tool for monitoring HTTP endpoints and automatically reporting success/failure to healthchecks.io. Single binary & cross-platform.
alerting cli devops golang golang-cli healthcheck healthchecks http-monitoring infrastructure-monitoring microservices-monitoring monitoring observability ping-monitoring production-monitoring service-health service-monitoring site-reliability-engineering sre status-monitoring uptime-monitoring
Last synced: 07 Aug 2025
https://github.com/cyanheads/devops-status-mcp-server
Check vendor status pages, inspect SSL/TLS certificates, verify DNS propagation, and get incident-response playbooks via MCP. STDIO or Streamable HTTP.
ai-agents ai-tools cyanheads devops mcp mcp-server model-context-protocol monitoring sre status statuspage typescript
Last synced: 20 Jun 2026
https://github.com/katavinanguyen/data-center-staffing-optimization-simulator
Simulates incident handling in data centers using Python and SimPy. Analyze how staffing levels, shift timing, and triage rules affect SLA compliance, resolution time, and backlog size.
critical-infrastructure data-center discrete-event-simulation incident-management noc operations-research python simpy simulation sla-monitoring sre staffing-optimization
Last synced: 28 Jul 2025
https://github.com/konstruktoid/disruella
A very small digitalized primate responsible for randomly preventing something from continuing as usual or as expected.
chaos-engineering hacktoberfest high-availability python-black python3 resilience sre systemd test-automation
Last synced: 16 Feb 2026
https://github.com/toolsascode/gomodeler
Go Modeler is a small CLI and library that brings the powerful features of Go templates into a simplified form.
ci cloud devops github-actions infra it pipeline platform sre
Last synced: 13 Oct 2025
https://github.com/omarmfathy219/k8s-stuck-pod-cleaner
A lightweight, automated solution to resolve one of the most common operational issues in Kubernetes: pods stuck in Terminating state.
cron-job devops helm-chart k8s kubernetes kubernetes-automation kubernetes-operator pods sre stuck-pods terminating
Last synced: 15 Oct 2025
https://github.com/woodprogrammer/skript
A shell-script wrapper written in Python
Last synced: 14 Apr 2026
https://github.com/boshu2/12-factor-agentops
DevOps + SRE principles for operating LLM applications reliably at scale. Complementary to 12-Factor Agents for building
12-factor agent-orchestration agentops agents ai-agents ai-agents-framework ai-operations argocd context-engineering devops flux gitops infrastructure-as-code kubernetes kyverno llm openshift platform-engineering production-operations sre
Last synced: 12 May 2026
https://github.com/aruizeac/k8s-microservices-sample
A sample platform using Kubernetes (K8s) to manage a set of container-based microservices clusters and web clients written in Java, Golang, Elixir, Rust, Javascript (+ NodeJS) and Python.
elixir golang java javascript kubernetes microservices pyhton rust sre
Last synced: 08 Apr 2026
https://github.com/tiagotartari/observability-dotnet-opentelemetry-first-steps
This project demonstrates how to implement observability in .NET applications using OpenTelemetry.
dotnet dotnet8 logs metrics observability opentelemetry opentelemetry-collector opentelemetry-dotnet sre traces
Last synced: 20 Jan 2026
https://github.com/pfrederiksen/blast-radius
Local-first AWS dependency graph CLI to understand blast radius before changes
aws aws-sdk-go-v2 cli cloudops devops golang observability sre
Last synced: 25 Jan 2026
https://github.com/ingero-io/ingero-fleet
GPU cluster straggler detection - custom OTEL Collector distribution
anomaly-detection distributed-training gpu gpu-observability kubernetes llm-inference machine-learning observability opentelemetry opentelemetry-collector otlp sre straggler-detection
Last synced: 02 May 2026
https://github.com/apiaryio/example-intersphinx-repo
This repository demonstrates using Intersphinx with indexes exported to a Docker volume
Last synced: 26 Jun 2025
https://github.com/lucasloureiror/slh
Service Level Helper is a CLI tool for calculating Service Level related metrics like SLO, SLA, Error Budgets and probing frequency.
availability devops golang sla slo sre
Last synced: 06 Feb 2026
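The error-budget arithmetic that a tool like slh automates can be sketched in a few lines. This is a minimal illustration, not slh's code or CLI: it assumes an availability SLO expressed as a fraction (e.g. 0.999) and a time window in days, and the function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: float = 30.0) -> float:
    """Allowed minutes of unavailability for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    # The budget is simply the complement of the SLO applied to the window.
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of a request-based error budget still unspent (negative once overspent)."""
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo) * total  # failures the SLO tolerates for this traffic
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))            # 99.9% over 30 days
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 3))
```

For a 99.9% SLO over 30 days this gives about 43.2 minutes of allowed downtime; 500 failures out of 1,000,000 requests leaves half of the request-based budget.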
https://github.com/dvilaverde/k8s-countermeasures
Kubernetes operator deploying run-books as code.
automation countermeasure devops golang k8s kubernetes operator operator-sdk prod-support runbooks sre
Last synced: 15 Jan 2026
https://github.com/toolsascode/homebrew-tap
Homebrew library repository.
brew cloud devops golang gotemplate homebrew homebrew-tap it pipeline platform sre
Last synced: 27 Feb 2026
https://github.com/sergkondr/fake-web-service
Fake web service for testing purposes
golang kubernetes sre testing web-service
Last synced: 02 Mar 2026
https://github.com/nudgebee/nudgebee-docs
Public documentation for NudgeBee — AI-powered Kubernetes operations platform. Built with Docusaurus 3, published at docs.nudgebee.com.
devops documentation docusaurus kubernetes nudgebee sre
Last synced: 17 May 2026
https://github.com/ops4life/claude4ops
Production-ready DevOps superpowers for everyone. Streamline and automate complex workflows across AWS, GCP, Azure, and Kubernetes.
ai-devops anthropic aws azure cicd claude claude-code devops gcp helm iac incident-management infrastructure-as-code kubernetes monitoring observability platform-engineering sre terraform
Last synced: 05 Jun 2026
https://github.com/chukwuemekaaham/cloud-gcp-projects
Google Cloud Platform Projects, Workshop Training and Skill Badge
anthos big-data-analytics case-study cloud-identity cloud-infrastructure cloudbuild data-engineering devsecops gcp grafana-dashboard landing-zone migration mlops prometheus service-account spinnaker sre terraform vpn
Last synced: 16 May 2026
https://github.com/bigg01/claude-ci-agent
Autonomous Claude Code CI agent in a rootless Podman sandbox — GitLab CI & GitHub Actions, two personalities (read-write Agent, read-only Advisor), OpenTelemetry audit to Elastic, Helm chart for AKS/OpenShift.
ai-agent anthropic cert-manager ci-cd claude-code devops elasticsearch github-actions gitlab-ci helm kubernetes llm llm-gateway observability openbao openshift opentelemetry podman rootless sre
Last synced: 26 Jun 2026
https://github.com/kintsdev/automountify
Automountify is a Go-based CLI tool to format, mount disks, and update /etc/fstab for persistent mounting
Last synced: 27 Mar 2025
https://github.com/tedilabs/terraform-http-modules
🌳 A sustainable Terraform package that manages useful data modules via the HTTP provider
devops hacktoberfest hcl2 http lang-hcl sre tedilabs terraform terraform-module terraform-modules
Last synced: 06 Jun 2026
https://github.com/meysam81/terraform-modules
automation ci-cd cloud-infrastructure cloud-native deployment-automation devops github-actions github-repositories github-workflow iac infrastructure-as-code infrastructure-management kubernetes platform-engineering site-reliability-engineering sre terraform terraform-best-practices terraform-configuration terraform-modules
Last synced: 10 May 2026
https://github.com/clear-route/vault-client-count-exporter
A dead-simple Prometheus exporter to monitor Vault's client count for the entire cluster and each namespace
Last synced: 26 Jun 2026
https://github.com/imgautamm/srerepo
SRE Assessment Repo
dataengineering docker postgres python sre
Last synced: 30 Apr 2025
https://github.com/tedilabs/terraform-aws-vpn
🌳 A sustainable Terraform package that creates VPN resources (Client VPN, Site-to-Site VPN) on AWS
aws aws-client-vpn aws-site-to-site-vpn aws-vpn devops hacktoberfest hcl2 iac lang-hcl sre tedilabs terraform terraform-aws terraform-module terraform-modules
Last synced: 29 Apr 2026
https://github.com/polsebas/agente-admin-observabilidad
Automatic alert analysis system built with the Agno Framework + Grafana Stack. Includes an ObservabilityTeam (WatchdogAgent, TriageAgent, ReportAgent) and Quick Commands for real-time observability.
agno ai alerting devops grafana loki multi-agent observability prometheus sre tempo
Last synced: 02 May 2026
https://github.com/manishklach/ai-host-observability
Linux host observability toolkit for AI/GPU infrastructure, exposing Prometheus metrics for memory pressure, RDMA/NIC health, PCIe/VFIO, NUMA, GPUs, and kernel events.
ai-infrastructure ai-ops gpu gpu-monitoring infiniband kernel linux linux-monitoring mlx5 node-exporter numa nvidia observability pcie performance-engineering prometheus rdma rdma-monitoring sre vfio
Last synced: 09 Jun 2026
https://github.com/mkozjak/mkozjak.github.io
My business website.
aws cloud consultancy devops engineer gcp kubernetes programmer services sre terraform
Last synced: 05 May 2026
https://github.com/23seriy/devops-ai-workflows
Curated collection of AI-agent workflows, prompts & rules for DevOps/SRE — Kubernetes debugging, AWS audits, Terraform plan reviews, CI/CD triage, Dockerfile reviews, secrets scanning & incident response. Works with Windsurf, Cursor, Claude Code or any LLM.
ai-agent ai-workflows aws chatops cicd cursor devops docker incident-response kubernetes llm observability platform-engineering prompts security sre terraform windsurf
Last synced: 03 Jun 2026
https://github.com/shakibamoshiri/dq
Debug docker quickly using Docker Query
debugging devops-tools docker nodejs sre
Last synced: 09 May 2026
https://github.com/oaslananka-lab/mcp-ssh-tool
Production-grade MCP SSH automation server for secure remote command, file, tunnel, service, metrics, and policy-controlled operations over stdio/HTTP, with npm distribution, MCP Registry metadata, and ChatGPT app readiness.
automation chatgpt claude codex cursor devops infrastructure mcp mcp-server model-context-protocol nodejs npm-package openai remote-automation security sre ssh ssh-client typescript vscode
Last synced: 13 May 2026
https://github.com/jnbdz/sre-quickstarts
Software Reverse Engineering (SRE) Quickstarts!
disassembler linux quickstart quickstarts reverse-engineering software-analysis software-reverse-engineering sre
Last synced: 16 Jun 2026
https://github.com/toolsascode/gomodeler-action
GitHub Action for GoModeler
ci cloud devops github-actions golang gomodeler gotemplate pltaform sre summary template
Last synced: 14 May 2026
https://github.com/greenblade29/loglense
AI-Powered Log Analysis for the Command Line
ai-analysis ai-powered anthropic artificial-intelligence cli-tool debugging devops gemini llm log-analysis machine-learning ollama openai python root-cause-analysis sre troubleshooting
Last synced: 12 Apr 2026
https://github.com/nusnewob/kube-changejob
A Kubernetes operator that triggers Jobs when specific Kubernetes resources change
automation controller-runtime crd devops golang jobs kubernetes kubernetes-operator sre
Last synced: 16 Jan 2026
https://github.com/marchenkovit/brewfile
One-command MacBook Pro M3 setup — Homebrew packages, casks, VS Code extensions, shell config, macOS defaults, kubectl contexts. Idempotent install.sh skips apps already installed manually.
apple-silicon automation aws brewfile devops dotfiles homebrew idempotent installer jetbrains kubernetes m3 mac-setup macbook macos setup sre terraform vscode zsh
Last synced: 13 May 2026
https://github.com/jwalsh/observex-demo
ObserveX demo of an Internal Developer Platform (IDP)
devops distributed-systems distributed-tracing guile-scheme idp internal-developer-platform observability platform-engineering simulation sre
Last synced: 13 Mar 2026
https://github.com/suhasramanand/predictive-reliability-platform
End-to-end predictive reliability platform with anomaly detection, auto-remediation, and comprehensive observability for microservices
anomaly-detection auto-remediation chaos-engineering devops docker fastapi grafana kubernetes microservices monitoring observability predictive-maintenance prometheus python react site-reliability sre typescript
Last synced: 08 Apr 2026
https://github.com/mrwogu/portguard
Port Monitoring Health Check Service
devops go golang health-check http-service kubernetes load-balancer monitoring port-monitoring sre systemd tcp
Last synced: 27 Jan 2026
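At its core, a port-monitoring health check like the one portguard describes comes down to attempting a TCP connection with a timeout. The sketch below shows only that primitive using Python's standard `socket` module; it is not portguard's implementation, and the function name `tcp_port_open` is hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, DNS failures and timeouts.
        return False


if __name__ == "__main__":
    print(tcp_port_open("127.0.0.1", 22))
```

A service wrapping this would typically run the check on a schedule and expose the result over HTTP for a load balancer or orchestrator to poll.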
https://github.com/mizcausevic-dev/kinetic-gain-operator-console
Mission-control operator console for the Kinetic Gain Protocol Suite — interactive topology mesh, configurable SRE operator dashboard, audit-stream visualization, PDF export. Deploys to console.kineticgain.com.
ai-governance audit-stream dataviz kinetic-gain kinetic-gain-protocol-suite operator-console react sre topology typescript vite
Last synced: 01 Jun 2026
https://github.com/mizcausevic-dev/slo-budget-tracker
SLO + error-budget tracker for Python services. FastAPI middleware, Prometheus exporter, multi-window burn-rate alerts. Part of the Platform Reliability Stack.
asgi burn-rate error-budget fastapi monitoring prometheus python reliability slo sre
Last synced: 01 Jun 2026
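Multi-window burn-rate alerting, which the slo-budget-tracker description mentions, pages only when the error budget is being consumed quickly over both a long and a short window. Below is a minimal sketch, assuming per-window error and request counts are already available. The 14.4 default is the value commonly cited for a 1-hour/5-minute paging pair on a 30-day SLO (from Google's SRE Workbook), not necessarily what this project uses, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    errors: int
    requests: int


def burn_rate(stats: WindowStats, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    if stats.requests == 0:
        return 0.0
    error_rate = stats.errors / stats.requests
    return error_rate / (1.0 - slo)  # 1.0 means the budget lasts exactly the SLO window


def should_page(long_window: WindowStats, short_window: WindowStats,
                slo: float, threshold: float = 14.4) -> bool:
    # The long window shows the problem is significant; the short window
    # shows it is still happening, so alerts reset quickly after recovery.
    return (burn_rate(long_window, slo) >= threshold
            and burn_rate(short_window, slo) >= threshold)


if __name__ == "__main__":
    print(should_page(WindowStats(20, 1000), WindowStats(2, 100), slo=0.999))
```

Requiring both windows to exceed the threshold trades a little detection latency for far fewer alerts that fire after an incident has already ended.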
https://github.com/mizcausevic-dev/request-shadow-rs
Async request mirroring with sampling, divergence detection, and structured response diffs. The SRE primitive for safe migrations. Part of the Platform Reliability Stack.
async diff migration mirror reliability rust shadow sre tokio
Last synced: 01 Jun 2026
https://github.com/mizcausevic-dev/mcp-reliability-toolkit
MCP server exposing SLO math + reliability config recipes. Compute burn rate, size rate limiters, pick breaker thresholds, get drop-in Python and Rust configs back. Part of the Platform Reliability Stack.
circuit-breaker claude kinetic-gain mcp model-context-protocol rate-limiter reliability slo sre typescript
Last synced: 01 Jun 2026
https://github.com/mizcausevic-dev/latency-budget-enforcer
Go policy engine for latency budget enforcement, dependency drag review, tail-latency breaches, and operator-facing service-path response planning
backend go golang governance latency net-http observability performance-engineering platform-engineering policy-engine sre
Last synced: 01 Jun 2026
https://github.com/mizcausevic-dev/agent-canary
Progressive rollout, shadow mode, and auto-rollback for AI agents. Sticky-percent routing with promote/rollback gates driven by real metrics. Platform engineering reliability for the agent era.
ai-agents canary deployment feature-flags platform-engineering progressive-rollout python reliability shadow-deployment sre
Last synced: 01 Jun 2026
https://github.com/mizcausevic-dev/rate-limit-shield
Production-grade rate limiting, circuit breaking, and retry shaping for LLM APIs. Token bucket + breaker + jittered backoff with HTTP 429 / Retry-After awareness.
anthropic circuit-breaker llm llmops openai python rate-limiting reliability retry-policy sre
Last synced: 01 Jun 2026
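Two of the building blocks named in the rate-limit-shield description, a token bucket and jittered exponential backoff that respects `Retry-After`, can be sketched independently of any LLM client. This is an illustrative sketch, not the project's code: the class and function names are hypothetical, the clock and random source are injectable only to make the behaviour testable, and "full jitter" is one of several common jitter strategies.

```python
import random
import time
from typing import Callable, Optional


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  retry_after: Optional[float] = None,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # A server-supplied Retry-After (e.g. on HTTP 429) takes precedence.
        return float(retry_after)
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap, base * 2 ** attempt)
```

Jitter spreads retries from many clients over time, so they do not all hit a recovering API at the same instant.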
https://github.com/mizcausevic-dev/observability-incident-command-api
TypeScript API for incident severity analysis, escalation routing, responder visibility, and operational incident-command workflows.
backend express incident-response nodejs openapi platform-engineering sre typescript
Last synced: 01 Jun 2026
https://github.com/mizcausevic-dev/grpc-mesh-shadow
Typed gRPC shadow traffic client. Mirrors requests from a stable primary to an under-test candidate; diffs responses asynchronously; returns the primary to your caller. Sampling, timeouts, pluggable sinks. bufconn-tested.
ai-governance canary golang grpc platform-engineering protobuf service-mesh shadow-traffic sre
Last synced: 01 Jun 2026
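The shadow-traffic pattern described by grpc-mesh-shadow and request-shadow-rs (serve from the primary, mirror a sample of requests to a candidate, and diff the responses off the caller's path) is language-agnostic. The following is a Python `asyncio` sketch of that pattern, not either project's Go or Rust API. It assumes handlers are async functions returning dicts and that `sink` is a plain callback; every name is hypothetical.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable

Handler = Callable[[dict], Awaitable[dict]]

# Strong references so fire-and-forget comparison tasks are not garbage-collected.
_background_tasks: set = set()


def diff_responses(primary: dict, candidate: dict) -> dict:
    """Map each differing key to its (primary, candidate) value pair."""
    keys = sorted(primary.keys() | candidate.keys())
    return {k: (primary.get(k), candidate.get(k))
            for k in keys if primary.get(k) != candidate.get(k)}


async def shadow_call(request: dict, primary: Handler, candidate: Handler,
                      sink: Callable[[dict], Any], sample_rate: float = 0.1,
                      timeout: float = 1.0,
                      rng: Callable[[], float] = random.random) -> dict:
    response = await primary(request)  # the caller only ever sees this result
    if rng() < sample_rate:
        async def compare() -> None:
            try:
                shadow = await asyncio.wait_for(candidate(request), timeout)
            except Exception as exc:  # runs in the background, never reaches the caller
                sink({"request": request, "error": repr(exc)})
                return
            diff = diff_responses(response, shadow)
            if diff:
                sink({"request": request, "diff": diff})

        task = asyncio.create_task(compare())
        _background_tasks.add(task)
        task.add_done_callback(_background_tasks.discard)
    return response
```

For simplicity this sketch calls the candidate after the primary responds; a production mirror would usually dispatch both concurrently and bound the number of in-flight shadow requests.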
https://github.com/zahidhasann88/prometheus-mcp
claude devops mcp model-context-protocol observability prometheus promql sre
Last synced: 10 Jun 2026
https://github.com/ricoberger/ricoberger.de
Personal website with links to my LinkedIn, Xing, Twitter, Github and Medium profile.
cloud-native github gopher hacker linkedin medium site-reliability-engineer sre twitter xing
Last synced: 17 May 2026
https://github.com/rbryce90/linux-time-machine
Local-first Linux observability with historical scrubbing, semantic journald search, and an MCP server for Claude-driven investigation. Go + SQLite + Ollama embeddings.
bubbletea embeddings golang linux local-first mcp model-context-protocol observability ollama rag sre systems-monitoring time-series tui
Last synced: 22 May 2026
https://github.com/vfolgosa/bifrost-proxy
A lightweight Layer-7 Kafka proxy. Route traffic across clusters with port-based routing, SASL passthrough, and autonomous failover.
devops failover golang kafka kafka-proxy load-balancing proxy sre
Last synced: 20 Jun 2026
https://github.com/gma1k/gma1k
Introducing
automation cloud devops kubernetes linux platform-engineering security sre
Last synced: 06 Mar 2026
https://github.com/toolsascode/scoop-bucket
Scoop bucket for official GoModeler CLI
cli cloud devops golang gotemplate scoop sre
Last synced: 20 Oct 2025
https://github.com/volkv/server-pulse
Lightweight Linux server monitoring with Telegram alerts. CPU, RAM, disk, load, Docker, OOM. Pure bash, systemd timer, no daemon.
alerting bash dedicated-server devops disk-space docker homelab linux-monitoring monitoring oom-killer self-hosted server-monitoring shell-script sre systemd telegram-alerts telegram-bot vps
Last synced: 21 Jun 2026
https://github.com/ranching-farm/k8s-agent
Kubernetes agent for deploying ranching.farm directly into your cluster. Connect your K8s deployment to our AI-powered management platform with a single line of code.
ai-assistant ai-assisted cluster-management devops helm k8s kubectl kubernetes kustomize ranching-farm sre
Last synced: 03 Feb 2026
https://github.com/briancain/cats-as-a-service
This is a helper repo used during a role playing based incident training.
cat cats dnd incident-response roleplay sre sre-infrastructure
Last synced: 28 Jan 2026
https://github.com/tswcbyy1107/ansible-k8s
Deploy Kubernetes (k8s) with Ansible
ansible calico containerd docker flannel k8s sre
Last synced: 02 Mar 2025
https://github.com/rebound-how/rebound
The open source toolbox for resilient operations
agentic-ai ai chaos-engineering chaostoolkit devops mcp-server reliability-engineering reliability-tools resilience resilience-testing sre
Last synced: 08 Jul 2025
https://github.com/aliariff/argus
Tool to export WebPageTest results into InfluxDB.
devops grafana influxdb monitoring performance python sre webpagetest
Last synced: 18 Apr 2026
https://github.com/ramesh-852000/devops-practices-and-interview-prep
A collection of DevOps practices, scripts, interview questions, and real-world examples covering Linux, Jenkins, AWS, Kubernetes, Docker, Ansible, Terraform, CI/CD pipelines, Monitoring, and Cloud Platforms.
ansible aws azure cloud devops docker elastic gcp interview-questions jenkins kubernetes linux nosql prometheus sql sre terraform
Last synced: 04 Apr 2026
https://github.com/felipe-veas/handling-production-incidents
Runbooks, processes, and guidelines for effectively managing production incidents
documentation incident-management reliability runbooks sre
Last synced: 10 Mar 2026
https://github.com/curiouslearner/cache_sniper
A small utility to detect page caching on CDNs
cache cache-invalidation devops-tools rust rust-lang sre
Last synced: 28 Oct 2025
https://github.com/Terraform-Tutorials/terraform-aws-autoscaling
A basic test of coding an Auto Scaling group using Terraform on AWS
aws aws-autoscaling aws-ec2 cloud devops devops-tools ec2 infrastructure-as-code sre tech terraform terraform-modules vpc
Last synced: 10 Mar 2025
https://github.com/macbre/http-shadow
Compares HTTP responses from two different backends
Last synced: 20 Jul 2025