An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with reliability

A curated list of projects in awesome lists tagged with reliability.

https://github.com/alibaba/sentinel

A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices.)

alibaba circuit-breaker cloud-native java microservice microservices rate-limiting reliability resiliency

Last synced: 12 May 2025

https://github.com/alibaba/Sentinel

A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices.)

alibaba circuit-breaker cloud-native java microservice microservices rate-limiting reliability resiliency

Last synced: 30 Mar 2025

https://github.com/upgundecha/howtheysre

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

alerting chaos-engineering dev-ops devops hacktoberfest hacktoberfest-accepted incident-management incident-response infrastructure ml-ops monitoring observability on-call post-mortem reliability security site-reliability-engineering software-engineering sre sre-culture

Last synced: 12 May 2025

https://github.com/upsonic/upsonic

The most reliable AI agent framework that supports MCP.

agent agent-framework claude computer-use llms mcp model-context-protocol openai rag reliability

Last synced: 18 Jan 2026

https://github.com/hjacobs/kubernetes-failure-stories

Compilation of public failure/horror stories related to Kubernetes

failures incidents kubernetes post-mortem postmortem production-engineering reliability sre

Last synced: 27 Sep 2025

https://github.com/codersguild/system-design

It's just fascinating. How is modern software designed? 🤔 Some design-level considerations for scalability, maintainability, eventual consistency, availability & reliability. 👨‍💻 Interview Prep. 👨‍💻

amazon architecture computer-science eventual-consistency facebook google hacktoberfest interview interview-preparation microsoft netflix reliability scalability scale software-engineering system-design toc whatsapp youtube

Last synced: 15 May 2025

https://github.com/codersguild/System-Design

It's just fascinating. How is modern software designed? 🤔 Some design-level considerations for scalability, maintainability, eventual consistency, availability & reliability. 👨‍💻 Interview Prep. 👨‍💻

amazon architecture computer-science eventual-consistency facebook google hacktoberfest interview interview-preparation microsoft netflix reliability scalability scale software-engineering system-design toc whatsapp youtube

Last synced: 10 Apr 2025

https://github.com/awslabs/aws-well-architected-labs

Hands-on labs and code to help you learn, measure, and build using architectural best practices.

aws cost lab reliability reliability-engineering resilience resiliency security well-architected wellarchitected

Last synced: 09 Jan 2026

https://github.com/tnballo/high-assurance-rust

A free book about developing secure and robust systems software.

book reliability rust security systems-programming

Last synced: 08 Apr 2025

https://github.com/hynek/stamina

Production-grade retries for Python

python reliability retry retrying

Last synced: 29 Apr 2025

https://github.com/mspnp/cloud-design-patterns

Sample implementations for cloud design patterns found in the Azure Architecture Center.

azure cloud cost-optimization design-patterns operational-excellence performance-efficiency reliability security

Last synced: 08 Jul 2025

https://github.com/valkey-io/valkey-glide

An open source Valkey client library that supports Valkey and Redis open source 6.2, 7.0, and 7.2. Valkey GLIDE is designed for reliability, optimized performance, and high availability in applications based on Valkey and Redis OSS. GLIDE is a multi-language client library written in Rust, with bindings for programming languages such as Java and Python.

cache csharp database fault-tolerance golang java javascript key-value kotlin nodejs open-source performance pubsub python reliability rust scala typescript valkey valkey-client

Last synced: 09 Feb 2026

https://github.com/uber/arachne

An always-on framework that performs end-to-end functional network testing for reachability, latency, and packet loss

arachne cloud data-center latency monitoring network-monitoring networking packet-loss reachability reliability

Last synced: 11 Jun 2025

https://github.com/p-org/PSharp

A framework for rapid development of reliable asynchronous software.

asynchronous-programming automated-testing dotnet reliability specifications state-machines

Last synced: 29 Apr 2025

https://github.com/teivah/designdeck

An Open-Source Collection of 230+ Flash Cards to Help You Succeed in Your System Design Interview and More 💯

cache cloud database http interview interview-preparation kafka leetcode network reliability scalability security system-design

Last synced: 03 Apr 2025

https://github.com/prequel-dev/preq

preq is the community-driven problem detector for Common Reliability Enumerations (CREs)⚡️

detection monitoring reliability sre

Last synced: 25 Jan 2026

https://github.com/socketsomeone/nestjs-resilience

🛡️ A module for improving the reliability and fault-tolerance of your NestJS applications

circuit-breaker hystrix nest nestjs reliability retry timeout typescript

Last synced: 15 May 2025

https://github.com/oldratlee/software-practice-thoughts

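📚 🐣 A collection of essays on software practice. Any topic is welcome as long as the thinking and discussion are interesting and substantive, including system model analysis/quantitative analysis, a hitchhiker's guide to open source, software reliability design practice, the logic and execution of platform products… 🥤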

best-practices code-review git miscellaneous model-analysis open-source-practice product quantitative-analysis reliability software-practice

Last synced: 10 Jul 2025

https://github.com/OpenIBC/Ohsce

PHP HI-REL SOCKET TCP/UDP/ICMP/Serial. A high-reliability PHP communication and control framework: SOCKET TCP/UDP/ICMP, hardware serial RS232/RS422/RS485, and more!

automation driver iot ipc modbus network-engineers ohsce-php php pursuit reliability rs232 rs485 rtu serial socket tcp udp

Last synced: 12 May 2025

https://github.com/SocketSomeone/nestjs-resilience

🛡️ A module for improving the reliability and fault-tolerance of your NestJS applications

circuit-breaker hystrix nest nestjs reliability retry timeout typescript

Last synced: 22 Jul 2025

https://github.com/imtt-dev/steer

The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and automate reliability

ai-agents llm observability python reliability

Last synced: 13 Jan 2026

https://github.com/snowflake-labs/sansshell

A non-interactive daemon for host management

administration automation go reliability security unshelled

Last synced: 02 Oct 2025

https://github.com/My-Random-Thoughts/QA-Checks-v4

PowerShell scripts to ensure consistent and reliable build quality and configuration for your servers

automation checks compliance configuration consistency gold-image powershell powershell-qa-scripts ps1 qa qa-checks quality reliability reliable service-acceptance verify winrm

Last synced: 10 Apr 2025

https://github.com/kvz/nsfailover

Let's Make DNS Outage Suck Less

bash failover high-availability nameserver reliability

Last synced: 15 Apr 2025

https://github.com/Snowflake-Labs/sansshell

A non-interactive daemon for host management

administration automation go reliability security unshelled

Last synced: 12 May 2025

https://github.com/krkn-chaos/cerberus

Guardian of Kubernetes clusters. Tool to monitor cluster health and signal/alert on failures.

component-failures health-check kubernetes monitoring performance-testing reliability scalability watcher

Last synced: 18 Jan 2026

https://github.com/seuros/breaker_machines

Modern circuit breaker for Ruby & Rails. Thread-safe, fiber-ready async support. Built-in fallbacks, rich introspection, clean DSL. Memory-efficient with jitter & monitoring.

async circuit-breaker concurrent-ruby dsl error-handling failover fallback fault-tolerance fiber-safe high-availability hystrix microservices monitoring observability rails reliability resilience ruby state-machine thread-safe

Last synced: 23 Jan 2026

https://github.com/openslo/slogen

Tool to create and manage content for reliability tracking from logs/event data.

command-line-tool golang openslo reliability slo sumologic terraform

Last synced: 12 Jan 2026

https://github.com/LNFWebsite/Streamly

Portable, independent, web-based, simple streaming YouTube video queues and playlists for music videos, audiobooks, etc.

android javascript music playlist reliability stream video web youtube youtube-player youtube-playlist youtube-video-queue

Last synced: 10 May 2025

https://github.com/djgagne/hagelslag

Hagelslag supports segmentation and tracking of weather fields and scalable verification, including performance diagrams and reliability diagrams.

geojson hail hrrr machine-learning mrms netcdf performance performance-diagram python reliability segmentation storms tracking verification weather zarr

Last synced: 08 May 2025

https://github.com/nasa/fmdtools

System Resilience Modelling, Simulation, and Assessment in Python

fault-model hazard-assessment reliability resilience safety simulation

Last synced: 05 Mar 2026

https://github.com/natlabrockies/pvdegradationtools

Set of tools to calculate degradation responses and degradation related parameters for PV.

degradation duramat photovoltaic-systems pv-modules python reliability

Last synced: 07 Feb 2026

https://github.com/jgantunes/pulsarcast

A pub-sub system for the distributed web - my master thesis @ IST

decentralized delivery-guarantees libp2p p2p persistence pubsub reliability scalability thesis

Last synced: 13 Apr 2025

https://github.com/nobl9/sloctl

A command line tool to cast SLO spells 🪄

cli go golang nobl9 reliability slo sre

Last synced: 27 Feb 2026

https://github.com/NREL/PV_ICE

An open-source tool to quantify Solar Photovoltaics (PV) Energy and Mass Flows in the Circular Economy, from a Reliability and Lifetime approach

circular-economy circularity circularity-metrics lifetime mass-flow photovoltaics recycle reliability repair reuse solar-energy

Last synced: 07 May 2025

https://github.com/haochenpan/rabia

Rabia: Simplifying State-Machine Replication Through Randomization (SOSP 2021)

consensus distributed-systems fault-tolerance formal-verification reliability state-machine-replication

Last synced: 16 Jan 2026

https://github.com/NREL/PVDegradationTools

Set of tools to calculate degradation responses and degradation related parameters for PV.

degradation duramat photovoltaic-systems pv-modules python reliability

Last synced: 07 May 2025

https://github.com/intuit/sac3

Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

blackbox consistency hallucinations large-language-models llm reliability semantic

Last synced: 12 Sep 2025

https://github.com/szaghi/fury

Fortran Units (environment) for Reliable phYsical math

fortran oop reliability unit-of-measure uom

Last synced: 25 Feb 2026

https://github.com/chanzuckerberg/redis-memo

A Redis-based version addressable caching system. Memoize pure functions, aggregated database queries, and 3rd party API calls.

activerecord cache caching memoization performance rails redis reliability ruby sql

Last synced: 29 Jul 2025

https://github.com/prathamesh-sonpatki/o11y-wiki

A glossary of all terms related to Observability, starting from A to Z!

glossary glossary-terms metrics monitoring monitoring-tool observability prometheus reliability wiki

Last synced: 23 Apr 2025

https://github.com/wschella/llm-reliability

Code for the paper "Larger and more instructable language models become less reliable"

bloom evaluation gpt llama llm reliability rlhf scaling supervision

Last synced: 12 Apr 2025

https://github.com/checkedc/checkedc

This was a fork of Checked C used from 2021-2024. The changes have been merged into the original Checked C repo.

c c-programming-language reliability security systems-programming

Last synced: 29 Mar 2025

https://github.com/yueyuel/reliablelm4code

Collections of research, benchmarks and tools towards more robust and reliable language models for code; LM4Code; LM4SE; reliable LLM; LLM4Code

code-generation code-intelligence language-models llm4code lm4se reliability software-

Last synced: 31 Jan 2026

https://github.com/checkedc/checkedc-fork

This was a fork of Checked C used from 2021-2024. The changes have been merged into the original Checked C repo.

c c-programming-language reliability security systems-programming

Last synced: 31 Oct 2025

https://github.com/manifoldco/healthz

Easily add health checks to your go services

go golang healthcheck reliability

Last synced: 01 Jul 2025

https://github.com/nobl9/nobl9-backstage-plugin

Nobl9 plugin for Backstage

backstage nobl9 reliability slo

Last synced: 09 Feb 2026

https://github.com/grafana/xk6-chaos

xk6 extension for running chaos experiments with k6 💣

chaos chaos-engineering k6-extension reliability sre testing xk6

Last synced: 01 Oct 2025

https://github.com/googlecloudplatform/reliable-app-platforms

An MVP of a platform for delivering reliable applications on Google Cloud

gke google-cloud kubernetes reliability slos sre terraform

Last synced: 20 Oct 2025

https://github.com/rhesis-ai/rhesis-sdk

Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.

application-insights compliance llm-evaluation llm-testing open-source quality-assessment reliability responsible-ai robustness trustworthiness validation

Last synced: 29 Jan 2026

https://github.com/jakecoffman/rely

reliable UDP messages for Go

ack multiplayer-game reliability rtt udp

Last synced: 02 Aug 2025

https://github.com/kelunik/retry

A tiny library for retrying failed operations.

amphp backoff reliability retry

Last synced: 14 Jul 2025

https://github.com/cutenode/mingine

A module to get the minimum usable engine(s)

engines node nodejs package-json package-lock reliability supoort

Last synced: 28 Apr 2025

https://github.com/cooldogedev/spectral

Spectral is a blazingly fast and lightweight network engine built on UDP, designed for real-time, low-latency applications.

go networking protocol real-time reliability udp

Last synced: 09 Jul 2025

https://github.com/MathiasRenner/optimize-ubuntu

Optimize Ubuntu for usability, security, privacy and stability

linux privacy reliability security ubuntu

Last synced: 07 Sep 2025

https://github.com/fsepy/sfeprapy

Structural Fire Engineering - Probabilistic Reliability Assessment

eurocode fire ibmb-fire monte-carlo-simulation parametric-fire probabilistic-analysis reliability structural travelling-fire

Last synced: 14 Jan 2026

https://github.com/atlantix-eda/atlantix-eda

Programmatically generated PCB libraries facilitating robust electronic product design.

altium altium-libraries capacitor flexibility generation kicad kicad-pcb lib libraries pcb reliability resistor resistor-library resistor-series rust rust-lang

Last synced: 10 Mar 2026

https://github.com/globocom/reliable-request

A golang opinionated library to provide reliable request using hystrix-go, go-cache, and go-resiliency.

caching circuit-breaker golang http http-client reliability requests retry-library

Last synced: 08 Sep 2025

https://github.com/yueyuel/xaiforandroidmalware

Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?

android-app explainable-ai malware-detection reliability

Last synced: 29 Apr 2025

https://github.com/saturn77/atlantix-eda

Programmatically generated PCB libraries facilitating robust electronic product design.

altium altium-libraries capacitor flexibility generation kicad lib libraries pcb reliability resistor resistor-library resistor-series rust rust-lang

Last synced: 03 Mar 2025

https://github.com/nouamanetazi/website-monitor

A tool written in Go that helps you monitor a collection of websites using various metrics.

go http monitoring reliability uptime

Last synced: 14 Apr 2025

https://github.com/grafana/k6-cloud-feature-requests

The place to propose, discuss and vote for k6 Cloud features and ideas.

k6 k6cloud loadtesting performance-testing reliability

Last synced: 01 Oct 2025

https://github.com/cooldogedev/spectral-php

Spectral is a blazingly fast and lightweight network engine built on UDP, designed for real-time, low-latency applications.

networking php protocol real-time reliability udp

Last synced: 09 Jul 2025

https://github.com/mikhailknyazev/kube-course

Main resources for Udemy course "Configuring Kubernetes for Reliability with LitmusChaos"

chaos-engineering eks helm kubernetes litmuschaos pipelines reliability terraform

Last synced: 15 Apr 2025

https://github.com/steadybit/extension-kubernetes

A Steadybit extension to check the state of the Kubernetes cluster and inject faults.

chaos-engineering chaos-testing helm kubernetes reliability

Last synced: 24 Feb 2026

https://github.com/torchei/torchei

TorchEI is a high-speed toolbox for DNN Reliability's Research and Development

bit-flip-attack bit-flipping error-injection fault-injection reliability torch

Last synced: 01 Oct 2025

https://github.com/devmotion/pycalibration

Estimation and hypothesis tests of calibration in Python using CalibrationErrors.jl and CalibrationTests.jl.

calibration julia python reliability

Last synced: 15 Apr 2025

https://github.com/cmu-safari/harp

HARP is a memory error profiling algorithm (i.e., for identifying error-prone cells) designed for use with memory chips that use on-die error-correcting codes (ECC). This tool uses Monte-Carlo simulation to evaluate HARP and other error profilers. HARP and this tool are described in the 2021 MICRO paper by Patel et al.: https://arxiv.org/abs/2109.12697.

dram error-correcting-codes error-correction monte-carlo reliability simulator

Last synced: 27 Jan 2026

https://github.com/adarshpatil/dve

Improving DRAM Reliability and Performance On-Demand via Coherent Replication [ISCA 2021]

coherence dram reliability replication sockets

Last synced: 24 Jan 2026

https://github.com/connerdouglass/go-retry

Go library for automatic retries

context golang reliability retry

Last synced: 14 Oct 2025

https://github.com/jehiah/retrydb

RetryDB transparently retries *sql.DB operations against a secondary datasource.

database go reliability

Last synced: 11 Apr 2025

https://github.com/jaketarnow/snoopy

Home network wifi scanner application to view all devices on your home network and run diagnostics

diagnostics icmp networking objective-c ping reliability scanner snoopy upnp wifi

Last synced: 10 Mar 2026

https://github.com/friesischscott/survivalsignature.jl

Computation and numerical approximation of survival signatures.

monte-carlo-simulation reliability survival-signature

Last synced: 09 Apr 2025

https://github.com/lingrino/uptime

uptime calculator

reliability sla slo snowpack svelte tailwind

Last synced: 18 Aug 2025

https://github.com/devmotion/reliabilitydiagrams.jl

Visualization of model calibration

calibration reliability

Last synced: 14 Apr 2025

https://github.com/lawouach/ebpf-2021-talk

Code for my talk at ebpf 2021 conference

devops ebpf reliability reliably sre

Last synced: 12 Apr 2025

https://github.com/pumpkinseed/netrel

Internet reliability check - CLI tool

internet-connection reliability reliability-analysis

Last synced: 28 Jun 2025

https://github.com/ekmolloy/fmri_test-retest

Documentation and MATLAB code for test-retest functional MRI studies.

neuroimaging reliability resting-state-fmri

Last synced: 24 Dec 2025

https://github.com/djmgit/deathstar

A tool for load testing web-based services in an easy, automated, cloud-native, and quick way, without spending time on infrastructure setup for load generation.

automation chaos devops loadtesting reliability resiliency

Last synced: 12 Jun 2025

https://github.com/steadybit/reliability-hub-db

Database containing the content for Steadybit's Reliability Hub

chaos-engineering chaos-testing database reliability resilience steadybit

Last synced: 14 Apr 2025

https://github.com/aponysus/redress

Composable, low-overhead retry policies with pluggable classification, per-class backoff strategies, and structured observability hooks. Designed for services that need predictable retry behavior and clean integration with metrics/logging.

backoff backoff-library exponential-backoff fault-tolerance observability python reliability resilience resilient-system retry retry-library

Last synced: 06 Mar 2026