Projects in Awesome Lists tagged with reliability
A curated list of projects in awesome lists tagged with reliability.
https://github.com/alibaba/sentinel
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices)
alibaba circuit-breaker cloud-native java microservice microservices rate-limiting reliability resiliency
Last synced: 12 May 2025
https://github.com/alibaba/Sentinel
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices)
alibaba circuit-breaker cloud-native java microservice microservices rate-limiting reliability resiliency
Last synced: 30 Mar 2025
https://github.com/upgundecha/howtheysre
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
alerting chaos-engineering dev-ops devops hacktoberfest hacktoberfest-accepted incident-management incident-response infrastructure ml-ops monitoring observability on-call post-mortem reliability security site-reliability-engineering software-engineering sre sre-culture
Last synced: 12 May 2025
https://github.com/upsonic/upsonic
The most reliable AI agent framework that supports MCP.
agent agent-framework claude computer-use llms mcp model-context-protocol openai rag reliability
Last synced: 18 Jan 2026
https://github.com/hjacobs/kubernetes-failure-stories
Compilation of public failure/horror stories related to Kubernetes
failures incidents kubernetes post-mortem postmortem production-engineering reliability sre
Last synced: 27 Sep 2025
https://github.com/codersguild/system-design
It's just fascinating. How is modern software designed? 🤔 Some design-level considerations for scalability, maintainability, eventual consistency, availability & reliability. 👨‍💻 Interview Prep. 👨‍💻
amazon architecture computer-science eventual-consistency facebook google hacktoberfest interview interview-preparation microsoft netflix reliability scalability scale software-engineering system-design toc whatsapp youtube
Last synced: 15 May 2025
https://github.com/codersguild/System-Design
It's just fascinating. How is modern software designed? 🤔 Some design-level considerations for scalability, maintainability, eventual consistency, availability & reliability. 👨‍💻 Interview Prep. 👨‍💻
amazon architecture computer-science eventual-consistency facebook google hacktoberfest interview interview-preparation microsoft netflix reliability scalability scale software-engineering system-design toc whatsapp youtube
Last synced: 10 Apr 2025
https://github.com/awslabs/aws-well-architected-labs
Hands on labs and code to help you learn, measure, and build using architectural best practices.
aws cost lab reliability reliability-engineering resilience resiliency security well-architected wellarchitected
Last synced: 09 Jan 2026
https://github.com/chaostoolkit/chaostoolkit
Chaos Engineering Toolkit & Orchestration for Developers
automation chaos-engineering chaostoolkit devops-tools reliability reliability-engineering resiliency sre
Last synced: 14 May 2025
https://github.com/tnballo/high-assurance-rust
A free book about developing secure and robust systems software.
book reliability rust security systems-programming
Last synced: 08 Apr 2025
https://github.com/hynek/stamina
Production-grade retries for Python
python reliability retry retrying
Last synced: 29 Apr 2025
https://github.com/mspnp/cloud-design-patterns
Sample implementations for cloud design patterns found in the Azure Architecture Center.
azure cloud cost-optimization design-patterns operational-excellence performance-efficiency reliability security
Last synced: 08 Jul 2025
https://github.com/rosehgal/trashemail
A hosted disposable-email Telegram bot. Extremely privacy-friendly and proudly hosted for the community.
disposable-email docker docker-compose docker-image dockerfile hacktoberfest hacktoberfest2020 imap-server java java-spring-boot mailbox reliability smtp spring telegram telegram-bot temp-email temporary-email trash-email trashmail
Last synced: 07 Apr 2025
https://github.com/rosehgal/TrashEmail
A hosted disposable-email Telegram bot. Extremely privacy-friendly and proudly hosted for the community.
disposable-email docker docker-compose docker-image dockerfile hacktoberfest hacktoberfest2020 imap-server java java-spring-boot mailbox reliability smtp spring telegram telegram-bot temp-email temporary-email trash-email trashmail
Last synced: 07 May 2025
https://github.com/valkey-io/valkey-glide
An open source Valkey client library that supports Valkey and Redis open source 6.2, 7.0, and 7.2. Valkey GLIDE is designed for reliability, optimized performance, and high availability for applications based on Valkey and Redis OSS. GLIDE is a multi-language client library, written in Rust with bindings for programming languages such as Java and Python.
cache csharp database fault-tolerance golang java javascript key-value kotlin nodejs open-source performance pubsub python reliability rust scala typescript valkey valkey-client
Last synced: 09 Feb 2026
https://github.com/uber/arachne
An always-on framework that performs end-to-end functional network testing for reachability, latency, and packet loss
arachne cloud data-center latency monitoring network-monitoring networking packet-loss reachability reliability
Last synced: 11 Jun 2025
https://github.com/p-org/PSharp
A framework for rapid development of reliable asynchronous software.
asynchronous-programming automated-testing dotnet reliability specifications state-machines
Last synced: 29 Apr 2025
https://github.com/teivah/designdeck
An Open-Source Collection of 230+ Flash Cards to Help You Succeed in Your System Design Interview and More 💯
cache cloud database http interview interview-preparation kafka leetcode network reliability scalability security system-design
Last synced: 03 Apr 2025
https://github.com/prequel-dev/preq
preq is the community-driven problem detector for Common Reliability Enumerations (CREs)⚡️
detection monitoring reliability sre
Last synced: 25 Jan 2026
https://github.com/traceloop/jest-opentelemetry
Easily run integration tests for your backends
api-testing e2e-testing integration-testing javascript jest open-telemetry otel reliability test-automation testing tracing typescript
Last synced: 06 Apr 2025
https://github.com/socketsomeone/nestjs-resilience
🛡️ A module for improving the reliability and fault-tolerance of your NestJS applications
circuit-breaker hystrix nest nestjs reliability retry timeout typescript
Last synced: 15 May 2025
https://github.com/oldratlee/software-practice-thoughts
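📚 🐣 A collection of essays on software practice. Any topic is welcome as long as the thinking and discussion are interesting and substantive, including system model analysis/quantitative analysis, an open-source hitchhiker's guide, software reliability design practice, the logic and execution of platform products… 🥤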
best-practices code-review git miscellaneous model-analysis open-source-practice product quantitative-analysis reliability software-practice
Last synced: 10 Jul 2025
https://github.com/OpenIBC/Ohsce
PHP HI-REL SOCKET TCP/UDP/ICMP/Serial. A high-reliability PHP communication and control framework: SOCKET-TCP/UDP/ICMP, hardware Serial-RS232/RS422/RS485, AND MORE!
automation driver iot ipc modbus network-engineers ohsce-php php pursuit reliability rs232 rs485 rtu serial socket tcp udp
Last synced: 12 May 2025
https://github.com/orra-dev/orra
Resilience for AI Agent workflows.
agents ai ai-agents ai-developer-tools ai-in-production durable-execution go golang javascript-sdk llm-apps orchestrator python-sdk reasoning reliability
Last synced: 10 May 2025
https://github.com/gwsystems/composite
A component-based OS
components composite embedded-systems os parallelism real-time reliability
Last synced: 09 May 2025
https://github.com/dastergon/wheel-of-misfortune
A role-playing game for incident management training
chaos-engineering devops incident-management incident-response incident-scenario oncall-engineers postmortem reliability site-reliability site-reliability-engineering sre
Last synced: 28 Feb 2026
https://github.com/iaar-shanghai/xfinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
benchmark cc-by-nc-nd-4 chatglm dataset evaluation gpt judge-model key-answer-extraction large-language-models llm llm-as-a-judge llm-as-evaluator lm-evaluation open-compass phi qwen regex reliability reliable-evaluation xfinder
Last synced: 24 Oct 2025
https://github.com/boschresearch/pylife
a general library for fatigue and reliability
education engineering fatigue fatigue-analysis lifetime material material-fatigue material-science mechanical-engineering rainflow reliability woehler
Last synced: 07 Oct 2025
https://github.com/SocketSomeone/nestjs-resilience
🛡️ A module for improving the reliability and fault-tolerance of your NestJS applications
circuit-breaker hystrix nest nestjs reliability retry timeout typescript
Last synced: 22 Jul 2025
https://github.com/irrustible/async-backplane
Simple, Erlang-inspired fault-tolerance framework for Rust Futures.
async async-await asynchronous erlang-otp fault-detection fault-tolerance recovery reliability reliability-backplane rust
Last synced: 09 Apr 2025
https://github.com/imtt-dev/steer
The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and automate reliability
ai-agents llm observability python reliability
Last synced: 13 Jan 2026
https://github.com/snowflake-labs/sansshell
A non-interactive daemon for host management
administration automation go reliability security unshelled
Last synced: 02 Oct 2025
https://github.com/My-Random-Thoughts/QA-Checks-v4
PowerShell scripts to ensure consistent and reliable build quality and configuration for your servers
automation checks compliance configuration consistency gold-image powershell powershell-qa-scripts ps1 qa qa-checks quality reliability reliable service-acceptance verify winrm
Last synced: 10 Apr 2025
https://github.com/iaar-shanghai/xverify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
benchmark cc-by-nc-nd-4 chatgpt deepseek-math evaluation judge-model llm llm-as-a-judge math-verify open-compass open-r1 reasoning-models regex reliability reliability-tools xverify
Last synced: 07 Oct 2025
https://github.com/kvz/nsfailover
Let's Make DNS Outage Suck Less
bash failover high-availability nameserver reliability
Last synced: 15 Apr 2025
https://github.com/Snowflake-Labs/sansshell
A non-interactive daemon for host management
administration automation go reliability security unshelled
Last synced: 12 May 2025
https://github.com/krkn-chaos/cerberus
Guardian of Kubernetes clusters. A tool to monitor cluster health and signal/alert on failures.
component-failures health-check kubernetes monitoring performance-testing reliability scalability watcher
Last synced: 18 Jan 2026
https://github.com/seuros/breaker_machines
Modern circuit breaker for Ruby & Rails. Thread-safe, fiber-ready async support. Built-in fallbacks, rich introspection, clean DSL. Memory-efficient with jitter & monitoring.
async circuit-breaker concurrent-ruby dsl error-handling failover fallback fault-tolerance fiber-safe high-availability hystrix microservices monitoring observability rails reliability resilience ruby state-machine thread-safe
Last synced: 23 Jan 2026
https://github.com/openslo/slogen
tool to create and manage content for reliability tracking from logs/event data.
command-line-tool golang openslo reliability slo sumologic terraform
Last synced: 12 Jan 2026
https://github.com/LNFWebsite/Streamly
Portable, independent, web-based, simple streaming YouTube video queues and playlists for music videos, audiobooks, etc.
android javascript music playlist reliability stream video web youtube youtube-player youtube-playlist youtube-video-queue
Last synced: 10 May 2025
https://github.com/djgagne/hagelslag
Hagelslag supports segmentation and tracking of weather fields and scalable verification, including performance diagrams and reliability diagrams.
geojson hail hrrr machine-learning mrms netcdf performance performance-diagram python reliability segmentation storms tracking verification weather zarr
Last synced: 08 May 2025
https://github.com/nasa/fmdtools
System Resilience Modelling, Simulation, and Assessment in Python
fault-model hazard-assessment reliability resilience safety simulation
Last synced: 05 Mar 2026
https://github.com/natlabrockies/pvdegradationtools
Set of tools to calculate degradation responses and degradation related parameters for PV.
degradation duramat photovoltaic-systems pv-modules python reliability
Last synced: 07 Feb 2026
https://github.com/jgantunes/pulsarcast
A pub-sub system for the distributed web - my master's thesis @ IST
decentralized delivery-guarantees libp2p p2p persistence pubsub reliability scalability thesis
Last synced: 13 Apr 2025
https://github.com/nobl9/sloctl
A command line tool to cast SLO spells 🪄
cli go golang nobl9 reliability slo sre
Last synced: 27 Feb 2026
https://github.com/NREL/PV_ICE
An open-source tool to quantify Solar Photovoltaics (PV) Energy and Mass Flows in the Circular Economy, from a Reliability and Lifetime approach
circular-economy circularity circularity-metrics lifetime mass-flow photovoltaics recycle reliability repair reuse solar-energy
Last synced: 07 May 2025
https://github.com/haochenpan/rabia
Rabia: Simplifying State-Machine Replication Through Randomization (SOSP 2021)
consensus distributed-systems fault-tolerance formal-verification reliability state-machine-replication
Last synced: 16 Jan 2026
https://github.com/NREL/PVDegradationTools
Set of tools to calculate degradation responses and degradation related parameters for PV.
degradation duramat photovoltaic-systems pv-modules python reliability
Last synced: 07 May 2025
https://github.com/intuit/sac3
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
blackbox consistency hallucinations large-language-models llm reliability semantic
Last synced: 12 Sep 2025
https://github.com/szaghi/fury
Fortran Units (environment) for Reliable phYsical math
fortran oop reliability unit-of-measure uom
Last synced: 25 Feb 2026
https://github.com/chanzuckerberg/redis-memo
A Redis-based version addressable caching system. Memoize pure functions, aggregated database queries, and 3rd party API calls.
activerecord cache caching memoization performance rails redis reliability ruby sql
Last synced: 29 Jul 2025
https://github.com/prathamesh-sonpatki/o11y-wiki
A glossary of all terms related to Observability, starting from A to Z!
glossary glossary-terms metrics monitoring monitoring-tool observability prometheus reliability wiki
Last synced: 23 Apr 2025
https://github.com/wschella/llm-reliability
Code for the paper "Larger and more instructable language models become less reliable"
bloom evaluation gpt llama llm reliability rlhf scaling supervision
Last synced: 12 Apr 2025
https://github.com/checkedc/checkedc
This was a fork of Checked C used from 2021-2024. The changes have been merged into the original Checked C repo.
c c-programming-language reliability security systems-programming
Last synced: 29 Mar 2025
https://github.com/yueyuel/reliablelm4code
Collections of research, benchmarks and tools towards more robust and reliable language models for code; LM4Code; LM4SE; reliable LLM; LLM4Code
code-generation code-intelligence language-models llm4code lm4se reliability software-
Last synced: 31 Jan 2026
https://github.com/checkedc/checkedc-fork
This was a fork of Checked C used from 2021-2024. The changes have been merged into the original Checked C repo.
c c-programming-language reliability security systems-programming
Last synced: 31 Oct 2025
https://github.com/manifoldco/healthz
Easily add health checks to your go services
go golang healthcheck reliability
Last synced: 01 Jul 2025
https://github.com/erickwendel/high-reliability-js
Examples of Error Handling and reaching High Reliability with vanilla JavaScript
child-process deep-dive-javascript error-handling graceful-shutdown javascript kubernetes let-it-crash nodejs reliability zero-downtime
Last synced: 06 Sep 2025
https://github.com/nobl9/nobl9-backstage-plugin
Nobl9 plugin for Backstage
backstage nobl9 reliability slo
Last synced: 09 Feb 2026
https://github.com/grafana/xk6-chaos
xk6 extension for running chaos experiments with k6 💣
chaos chaos-engineering k6-extension reliability sre testing xk6
Last synced: 01 Oct 2025
https://github.com/googlecloudplatform/reliable-app-platforms
An MVP of a platform for delivering reliable applications on Google Cloud
gke google-cloud kubernetes reliability slos sre terraform
Last synced: 20 Oct 2025
https://github.com/rhesis-ai/rhesis-sdk
Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.
application-insights compliance llm-evaluation llm-testing open-source quality-assessment reliability responsible-ai robustness trustworthiness validation
Last synced: 29 Jan 2026
https://github.com/devmotion/calibrationerrors.jl
Estimation of calibration errors.
calibration julia machine-learning reliability statistics
Last synced: 24 Aug 2025
https://github.com/jakecoffman/rely
reliable UDP messages for Go
ack multiplayer-game reliability rtt udp
Last synced: 02 Aug 2025
https://github.com/kelunik/retry
A tiny library for retrying failed operations.
amphp backoff reliability retry
Last synced: 14 Jul 2025
https://github.com/cutenode/mingine
A module to get the minimum usable engine(s)
engines node nodejs package-json package-lock reliability supoort
Last synced: 28 Apr 2025
https://github.com/Ramsbaby/openclaw-self-healing
AI-powered self-healing system for OpenClaw Gateway • 4-tier autonomous recovery • macOS & Linux
ai-agent artificial-intelligence automation bash claude-ai claude-code crash-recovery devops homelab launchd macos monitoring observability openclaw reliability self-healing sre watchdog
Last synced: 19 Feb 2026
https://github.com/jgalego/awesome-safety-critical-ai
A curated list of references on the role of AI in safety-critical systems ⚠️
ai-privacy ai-safety artificial-intelligence awesome awesome-list awesome-lists compliance explainability genai machine-learning red-team reliability responsible-ai safety-critical trustworthy-ai
Last synced: 01 Mar 2025
https://github.com/cooldogedev/spectral
Spectral is a blazingly fast and lightweight network engine built on UDP, designed for real-time, low-latency applications.
go networking protocol real-time reliability udp
Last synced: 09 Jul 2025
https://github.com/MathiasRenner/optimize-ubuntu
Optimize Ubuntu for usability, security, privacy and stability
linux privacy reliability security ubuntu
Last synced: 07 Sep 2025
https://github.com/fsepy/sfeprapy
Structural Fire Engineering - Probabilistic Reliability Assessment
eurocode fire ibmb-fire monte-carlo-simulation parametric-fire probabilistic-analysis reliability structural travelling-fire
Last synced: 14 Jan 2026
https://github.com/atlantix-eda/atlantix-eda
Programmatically generated PCB libraries facilitating robust electronic product design.
altium altium-libraries capacitor flexibility generation kicad kicad-pcb lib libraries pcb reliability resistor resistor-library resistor-series rust rust-lang
Last synced: 10 Mar 2026
https://github.com/globocom/reliable-request
A golang opinionated library to provide reliable request using hystrix-go, go-cache, and go-resiliency.
caching circuit-breaker golang http http-client reliability requests retry-library
Last synced: 08 Sep 2025
https://github.com/yueyuel/xaiforandroidmalware
Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?
android-app explainable-ai malware-detection reliability
Last synced: 29 Apr 2025
https://github.com/saturn77/atlantix-eda
Programmatically generated PCB libraries facilitating robust electronic product design.
altium altium-libraries capacitor flexibility generation kicad lib libraries pcb reliability resistor resistor-library resistor-series rust rust-lang
Last synced: 03 Mar 2025
https://github.com/JonasRieger/rollinglda
A rolling version of the Latent Dirichlet Allocation.
consistency latent-dirichlet-allocation lda model-selection reliability text-mining textdata topic-model topic-models topicmodel topicmodeling topicmodelling
Last synced: 30 Jul 2025
https://github.com/nouamanetazi/website-monitor
A tool written in Go that helps you monitor a collection of websites using various metrics.
go http monitoring reliability uptime
Last synced: 14 Apr 2025
https://github.com/grafana/k6-cloud-feature-requests
The place to propose, discuss and vote for k6 Cloud features and ideas.
k6 k6cloud loadtesting performance-testing reliability
Last synced: 01 Oct 2025
https://github.com/cooldogedev/spectral-php
Spectral is a blazingly fast and lightweight network engine built on UDP, designed for real-time, low-latency applications.
networking php protocol real-time reliability udp
Last synced: 09 Jul 2025
https://github.com/mikhailknyazev/kube-course
Main resources for Udemy course "Configuring Kubernetes for Reliability with LitmusChaos"
chaos-engineering eks helm kubernetes litmuschaos pipelines reliability terraform
Last synced: 15 Apr 2025
https://github.com/steadybit/extension-kubernetes
A Steadybit extension to check the state of the Kubernetes cluster and inject faults.
chaos-engineering chaos-testing helm kubernetes reliability
Last synced: 24 Feb 2026
https://github.com/torchei/torchei
TorchEI is a high-speed toolbox for DNN Reliability's Research and Development
bit-flip-attack bit-flipping error-injection fault-injection reliability torch
Last synced: 01 Oct 2025
https://github.com/devmotion/pycalibration
Estimation and hypothesis tests of calibration in Python using CalibrationErrors.jl and CalibrationTests.jl.
calibration julia python reliability
Last synced: 15 Apr 2025
https://github.com/cmu-safari/harp
HARP is a memory error profiling algorithm (i.e., for identifying error-prone cells) designed for use with memory chips that use on-die error-correcting codes (ECC). This tool uses Monte-Carlo simulation to evaluate HARP and other error profilers. HARP and this tool are described in the 2021 MICRO paper by Patel et al.: https://arxiv.org/abs/2109.12697.
dram error-correcting-codes error-correction monte-carlo reliability simulator
Last synced: 27 Jan 2026
https://github.com/adarshpatil/dve
Improving DRAM Reliability and Performance On-Demand via Coherent Replication [ISCA 2021]
coherence dram reliability replication sockets
Last synced: 24 Jan 2026
https://github.com/connerdouglass/go-retry
Go library for automatic retries
context golang reliability retry
Last synced: 14 Oct 2025
https://github.com/jehiah/retrydb
RetryDB transparently retries *sql.DB operations against a secondary datasource.
Last synced: 11 Apr 2025
https://github.com/jaketarnow/snoopy
Home network wifi scanner application to view all devices on your home network and run diagnostics
diagnostics icmp networking objective-c ping reliability scanner snoopy upnp wifi
Last synced: 10 Mar 2026
https://github.com/requiemofthesouls/pigeomail
✉️ A service that securely provides personal email addresses right in Telegram
disposable-email docker docker-compose email golang mail-server mailbox mongodb pigeomail rabbitmq reliability smtp smtp-server telegram telegram-bot telegram-bot-api temp-email temporary-email trash-email trashmail
Last synced: 12 Feb 2026
https://github.com/levitation-opensource/tcpoverudp2
Reliably forwards TCP connections using UDP over two network interfaces in parallel.
reliability reliable-networking reliable-packets reliable-protocol reliable-udp reliable-udp-library tcp tunnel tunnel-client tunnel-server tunneling tunneling-proxies tunnels udp
Last synced: 16 Jan 2026
https://github.com/friesischscott/survivalsignature.jl
Computation and numerical approximation of survival signatures.
monte-carlo-simulation reliability survival-signature
Last synced: 09 Apr 2025
https://github.com/lingrino/uptime
uptime calculator
reliability sla slo snowpack svelte tailwind
Last synced: 18 Aug 2025
https://github.com/devmotion/reliabilitydiagrams.jl
Visualization of model calibration
Last synced: 14 Apr 2025
https://github.com/madarauchiha-314/lifesim
LifeSim: A lifetime reliability simulator for manycore systems
benchmark computer-architecture lifetime-reliability operating-system reliability scheduler simulation simulator
Last synced: 30 Oct 2025
https://github.com/lawouach/ebpf-2021-talk
Code for my talk at ebpf 2021 conference
devops ebpf reliability reliably sre
Last synced: 12 Apr 2025
https://github.com/pumpkinseed/netrel
Internet reliability check - CLI tool
internet-connection reliability reliability-analysis
Last synced: 28 Jun 2025
https://github.com/ekmolloy/fmri_test-retest
Documentation and MATLAB code for test-retest functional MRI studies.
neuroimaging reliability resting-state-fmri
Last synced: 24 Dec 2025
https://github.com/djmgit/deathstar
A tool for load testing web-based services in an easy, automated, cloud-native, and quick way, without spending time on infrastructure setup for load generation.
automation chaos devops loadtesting reliability resiliency
Last synced: 12 Jun 2025
https://github.com/steadybit/reliability-hub-db
Database containing the content for Steadybit's Reliability Hub
chaos-engineering chaos-testing database reliability resilience steadybit
Last synced: 14 Apr 2025
https://github.com/aponysus/redress
Composable, low-overhead retry policies with pluggable classification, per-class backoff strategies, and structured observability hooks. Designed for services that need predictable retry behavior and clean integration with metrics/logging.
backoff backoff-library exponential-backoff fault-tolerance observability python reliability resilience resilient-system retry retry-library
Last synced: 06 Mar 2026