An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with reliability-engineering

A curated list of projects in awesome lists tagged with reliability-engineering.

https://github.com/litmuschaos/litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

chaos-engineering chaos-experiments chaos-testing chaoshub cloud-native cncf devops fault-injection fault-simulation golang google-summer-of-code hacktoberfest k8s kubernetes lfx litmuschaos operator-sdk reliability-engineering resilience-testing site-reliability-engineering

Last synced: 23 Oct 2025

https://github.com/bregman-arie/sre-checklist

A checklist for anyone practicing Site Reliability Engineering

automation checklist gitops kubernetes reliability-engineering sre terraform

Last synced: 15 May 2025

https://github.com/awslabs/aws-well-architected-labs

Hands-on labs and code to help you learn, measure, and build using architectural best practices.

aws cost lab reliability reliability-engineering resilience resiliency security well-architected wellarchitected

Last synced: 09 Jan 2026

https://github.com/Azure/Mission-Critical

This repository provides a design methodology and approach to building highly reliable applications on Microsoft Azure for mission-critical workloads.

azure business-critical mission-critical reliability-engineering safety-critical

Last synced: 22 Jul 2025

https://github.com/artilleryio/chaos-lambda

Serverless chaos monkey for AWS (runs on AWS Lambda) ☁️ 💥

aws chaos-monkey fault-tolerance reliability-engineering

Last synced: 22 Jul 2025

https://github.com/rakhimov/scram

Probabilistic Risk Analysis Tool (fault tree analysis, event tree analysis, etc.)

bdd c-plus-plus cpp17 event-tree fault-tree fta pra psa python qt5 reliability-engineering risk-analysis zbdd

Last synced: 11 May 2025

https://github.com/temperlang/temper

A programming language for writing libraries that are translated into all the other languages

distributed-systems interoperability programming-language reliability-engineering translation

Last synced: 25 Apr 2026

https://github.com/alphagov/paas-cf

GOV.UK PaaS - Cloud Foundry

cloud-foundry concourse paas reliability-engineering

Last synced: 08 May 2025

https://github.com/theodesp/stable-systems-checklist

An opinionated list of attributes and policies that need to be met in order to establish a stable software system.

architecture continuous-delivery continuous-integration fault-tolerance reliability-engineering security

Last synced: 07 Jan 2026

https://github.com/alphagov/puppet-aptly

No longer maintained: Puppet module for aptly

govuk puppet reliability-engineering

Last synced: 30 Sep 2025

https://github.com/alphagov/paas-billing

A Go application for generating billing data from Cloud Foundry events

cloud-foundry paas reliability-engineering

Last synced: 18 Jun 2025

https://github.com/alphagov/paas-admin

Administration tool for GOV.UK PaaS

cloud-foundry node paas reliability-engineering typescript webpack

Last synced: 08 May 2025

https://github.com/alphagov/paas-aiven-broker

A service broker to provide Aiven Elasticsearch and InfluxDB services to Cloud Foundry users

aiven aws broker cloud-foundry paas reliability-engineering

Last synced: 12 Jul 2025

https://github.com/last9/last9-integrations

Sample applications for the integrations supported by Last9 products

integrations last9 reliability-engineering sre timeseries-database

Last synced: 28 Apr 2025

https://github.com/alphagov/paas-bootstrap

Bootstrap a VPC with BOSH and Concourse to run PaaS

bosh concourse paas reliability-engineering

Last synced: 16 Jun 2025

https://github.com/alphagov/paas-tech-docs

Technical documentation for GOV.UK PaaS

documentation paas reliability-engineering tech-docs-template

Last synced: 08 May 2025

https://github.com/shantoroy/site-reliability-engineering-101

This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.

100daysofcode alerting automation chaos-engineering devops devsecops monitoring reliability-engineering service-level-agreement service-level-indicator service-level-objective site-reliability-engineering sre

Last synced: 27 Mar 2026

https://github.com/louisaslett/reliabilitytheory

ReliabilityTheory R package: Tools for structural reliability analysis

r reliability-engineering

Last synced: 12 Apr 2025

https://github.com/guilt/chaossquirrel

Like Netflix's Chaos Monkey, packaged to run standalone.

chaos-monkey reliability-engineering sre

Last synced: 12 Aug 2025

https://github.com/alphagov/paas-elasticache-broker

A Cloud Foundry service broker for AWS ElastiCache Redis services

aws broker cloud-foundry paas reliability-engineering

Last synced: 08 May 2025

https://github.com/alphagov/paas-s3-broker

An Open Service Broker API-compatible service broker for AWS S3

aws broker cloud-foundry paas reliability-engineering s3

Last synced: 08 May 2025

https://github.com/alphagov/zendesk-scripts

Various scripts in various languages to interact with GDS Zendesk

reliability-engineering

Last synced: 06 Oct 2025

https://github.com/govuk-paas/paas-elasticache-broker

A Cloud Foundry service broker for AWS ElastiCache Redis services

aws broker cloud-foundry paas reliability-engineering

Last synced: 07 Oct 2025

https://github.com/alphagov/paas-release-ci

Central release CI repository

bosh concourse paas reliability-engineering

Last synced: 08 May 2025

https://github.com/carlescg/faulttreetutorial

This is a tutorial for the package FaultTree from openreliability.com

fault-tree reliability-engineering risk-analysis risk-models tutorial

Last synced: 13 Jul 2025

https://github.com/alphagov/paas-prometheus-charts

Generates SVG charts for PromQL queries

paas reliability-engineering

Last synced: 06 Jul 2025

https://github.com/exospherehost/ai-reliability-standards

Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).

ai ai-agents ai-reliability aiops durable-execution enterprise evals evaluation observability reliability-engineering sre

Last synced: 15 Feb 2026

https://github.com/alphagov/white-chapel-building-map

🏢 🌐 Maps of the GDS office in the White Chapel Building

reliability-engineering

Last synced: 08 May 2025

https://github.com/govuk-paas/paas-prometheus-charts

Generates SVG charts for PromQL queries

paas reliability-engineering

Last synced: 11 Oct 2025

https://github.com/alphagov/paas-service-broker-base

Provides a base for building new service brokers

broker cloud-foundry paas reliability-engineering

Last synced: 08 May 2025

https://github.com/exospherehost/claudeye

Watchtower for Claude Code & Agents SDK - replay sessions, run custom evals, debug agent traces. Uncover, Understand, and Utilize

ai-agents claude claude-agents-sdk claude-code cli eval eval-framework npm observability reliability reliability-engineering

Last synced: 01 Mar 2026

https://github.com/alphagov/paas-sqs-broker

An Open Service Broker API-compatible service broker for AWS SQS

aws broker cloud-foundry paas reliability-engineering sqs

Last synced: 08 May 2025

https://github.com/thalesgroup/statistical-reliability-ml

This package provides implementations of Monte Carlo methods to estimate the probability of failure of neural networks under noisy inputs.

monte-carlo-methods neural-networks reliability-engineering statistical-analysis

Last synced: 17 Mar 2025

https://github.com/1mb-dev/autobreaker

Adaptive circuit breaker for Go with percentage-based thresholds that automatically adjust to traffic patterns. Zero dependencies, <100ns overhead.

adaptive-threshold circuit-breaker distributed-systems fault-tolerance go golang microservices observability reliability-engineering resilience

Last synced: 24 Feb 2026

https://github.com/alphagov/paas-auditor

Stores Cloud Foundry audit events in a Postgres database

cloud-foundry paas reliability-engineering security

Last synced: 08 May 2025

https://github.com/alphagov/paas-metrics-collector-cw

PaaS metrics collector to CloudWatch

cloudwatch paas reliability-engineering

Last synced: 08 May 2025

https://github.com/openpra-org/inverse-canopy

An inverse estimation technique for back-fitting conditional/functional event probability distributions in an event tree to match target end-state frequencies.

event-tree fault-tree pra probabilistic-risk-assessment probability-distribution reliability reliability-engineering risk-assessment

Last synced: 11 Apr 2026

https://github.com/alphagov/paas-observability-release

A repository for the observability prototype for GOV.UK PaaS

bosh observability paas reliability-engineering

Last synced: 08 May 2025

https://github.com/alphagov/paas-rds-metric-collector

A small application that connects to all RDS instances hosted on GOV.UK PaaS, gathers their metrics, and pushes them to Loggregator.

aws cloud-foundry metrics paas reliability-engineering

Last synced: 19 Sep 2025

https://github.com/jsabo/aws-lambda-failure-flag-app

This project demonstrates how to integrate Gremlin Failure Flags into an AWS Lambda function, enabling you to simulate injected latency and exceptions while measuring processing performance. It’s a serverless demo for testing resiliency and fault injection in real-world cloud environments.

aws chaos-engineering gremlin lambda performance-testing reliability-engineering

Last synced: 08 Jul 2025

https://github.com/alphagov/paas-db-admin-boshrelease

A BOSH release for running administrative operations against Postgres

bosh paas reliability-engineering

Last synced: 08 May 2025

https://github.com/alphagov/paas-submit

🔥 Firebreak Project ℹ️

node paas reliability-engineering typescript webpack

Last synced: 08 May 2025

https://github.com/chirag2203/softwarereliability_gwo_ieee

Research paper "Comparative analysis of software reliability using GWO and ML", published by IEEE at the IATMSI conference.

gwo-optimization-algorithm ieee ml python reliability-engineering software-reliability

Last synced: 07 Sep 2025

https://github.com/allenpandas/se4ml-toolkit

A research toolkit for the intersection of artificial intelligence and computer security 🔧 SE4ML: Security for Machine Learning. This repository is a toolkit for the security, robustness, and reliability of machine learning.

ai-security aisecurity machine-learning reinforcement-learning reliability-engineering robustness security software-engineering software-testing tool toolkit

Last synced: 07 Aug 2025

https://github.com/alphagov/paas-rubbernecker

A summary of stuff in Pivotal Tracker

non-platform-tools paas reliability-engineering

Last synced: 31 Aug 2025

https://github.com/mosher-labs/helm-charts

🚀 This repository serves as a centralized collection of Helm charts for deploying and managing Kubernetes applications. 🎯

axes devops helm helm-charts infrastructure-as-code k8s kubernetes mosher-labs reliability-engineering viking

Last synced: 26 Feb 2025

https://github.com/mosher-labs/basic-repo-template

🚀 This repository serves as a basic template for creating new repositories. It's designed to be a foundation for structure and organization. 🎯

axes devops infrastructure-as-code mosher-labs reliability-engineering templates viking

Last synced: 05 Mar 2026

https://github.com/juanfranciscocis/devprobe_tesis

DevProbe is a progressive web application that provides a platform for Site Reliability Engineers to monitor their websites. The app is built with Ionic, Angular, and Firebase.

angular gemini gemini-api ionic ionic-framework reliability-engineering site site-reliability-engineering site-reliability-engineering-sre sre sre-team typescript

Last synced: 10 Apr 2026

https://github.com/texasbe2trill/constellation-engine

A dependency graph–driven system for reasoning about failure propagation, blast radius, and architectural risk in complex systems.

ai-reasoning architecture good-first-issue graph-theory python reliability-engineering systems-engineering

Last synced: 26 Jan 2026

https://github.com/pedroliman/uniconf

University Reliability Software

reliability reliability-engineering shiny shinyapps

Last synced: 01 Apr 2025

https://github.com/mosher-labs/ansible-node-setup

🚀 This repo provides Ansible playbooks and roles designed to configure and manage nodes for lightweight Kubernetes clusters using K3s. 🎯

ansible axes devops infrastructure-as-code mosher-labs reliability-engineering viking

Last synced: 16 Jan 2026

https://github.com/vsamidurai/outage-reports

Curated list of technical outage/incident reports

cloud learning reliability-engineering sharing-is-caring

Last synced: 02 Feb 2026

https://github.com/mosher-labs/.github

⚡⚔️🌩️ Welcome to Mosher Labs! Combining Scandinavian heritage and cutting-edge cloud technologies, we deliver precision-crafted solutions with AWS and Infrastructure as Code. ⚡⚔️🌩️

automation aws axes cicd-pipelines cloud-computing cloud-native devops homelab infrastructure-as-code mosher-labs reliability-engineering viking

Last synced: 25 Apr 2026

https://github.com/cakmoel/resilio

Professional technology-agnostic load testing suite built for performance engineering and durability auditing. Implements research-based methodologies (Jain, 1991) and ISO 25010 standards to validate speed, endurance, and scalability across any backend stack.

apachebench benchmarking devops-tools endurance-testing load-testing performance-testing quality-assurance reliability-engineering scalability sre stress-testing tech-agnostic

Last synced: 13 Apr 2026

https://github.com/kretski/orac-nt-ssd-thermal-sdk

Deterministic Physics-based Thermal Control SDK for NVMe Controllers. Extending NAND lifetime by 31.6% using ORAC-NT Vitality modeling and Arrhenius-based BER optimization.

nand-flash-memory nvme reliability-engineering ssd-controller thermal-management

Last synced: 06 Apr 2026

https://github.com/vnykmshr/autobreaker

Adaptive circuit breaker for Go with percentage-based thresholds that automatically adjust to traffic patterns. Zero dependencies, <100ns overhead.

adaptive-threshold circuit-breaker distributed-systems fault-tolerance go golang microservices observability reliability-engineering resilience

Last synced: 23 Jan 2026

https://github.com/mosher-labs/basic-helm-charts-template

🚀 This repo provides a clean, minimal starting point for creating and managing Helm charts for Kubernetes applications. 🎯

axes devops helm helm-chart helm-charts infrastructure-as-code k8s kubernetes mosher-labs reliability-engineering viking

Last synced: 29 Jan 2026