awesome-chaos-engineering
A curated list of Chaos Engineering resources.
https://github.com/dastergon/awesome-chaos-engineering
-
Conferences & Meetups
- Chaos Engineering Community Meetup Group - Bay Area Meetup group for Chaos Engineers.
- London Chaos Engineering Community
- Stockholm Chaos Engineering Meetup
- Chaos Engineering Community - A collection of meetups across the globe about Chaos Engineering.
- Conf42.com: Chaos Engineering - Chaos Engineering for practitioners and adopters - London UK, 23 Jan 2020.
- Kubernetes Chaos Engineering Meetup Group India - India Meetup group for Chaos Engineers.
- Chaos Carnival - A global two-day virtual conference for Cloud Native Chaos Engineering.
- Chaos Conf - A day of Chaos Engineering demos, expert advice, and connecting with peers who are putting chaos into practice at their companies.
- SRECon Conferences - The official SRE conference.
- LISA Conferences - Prominent conference about SysAdmin/DevOps/SRE.
- O'Reilly Velocity Conference - Prominent conference about Systems Engineering/DevOps/SRE.
-
Forums
-
Culture
- Chaos Community
- Principles Of Chaos Engineering
- Chaos Engineering
- O'Reilly Velocity San Jose 2017: Precision Chaos
- The Discipline of Chaos Engineering
- Chaos Monkey for Fun and Profit
- Fault Injection in Production: Making the case for resilience testing
- Lord of Chaos - Becoming a Chaos Engineer
- Orchestrated Chaos
- Video
- AMA Chaos Engineering + DiRT
- SRECON17: Principles of Chaos Engineering
- Chaos & Intuition Engineering at Netflix
- Mastering Chaos - A Netflix Guide to Microservices
- Netflix, the Simian Army, and the culture of freedom and responsibility
- Inside Azure Search: Chaos Engineering
- FIT: Failure Injection Testing
- The Netflix Simian Army
- Automated Failure Testing
- The Verification of a Distributed System by Caitie McCaffrey
- The Journey to Chaos Engineering begins with a single step - Bruce Wong and James Burns (Twilio)
- Chaos Engineering by Lorin Hochstein
- Aaron Rinehart - ChaoSlingr: Introducing Security based Chaos Testing
- Chaos Engineering - Casey Rosenthal
- video
- How Netflix DDoS’d Itself To Help Protect the Entire Internet
- 10 Years of Crashing Google
- Weathering the Unexpected
- SRECON17: Breaking Things on Purpose
- PuppetConf 2016: Chaos Patterns - Architecting for Failure in Distributed Systems
- Ship More, Sink Less - Changing Chaos Engineering and Distributed Tracing
- Cloudcast - Discipline of Chaos Engineering
- Software Engineering Daily - Failure Injection with Kolton Andrus podcast
- Responding to Failures in Playback Features with Haley Tucker podcast
- "Antics, drift, and chaos" by Lorin Hochstein
- re:invent 2017: Nora Jones Describes Why We Need More Chaos - Chaos Engineering, That Is
- Failure Friday: Four Years On
- Monkeys & Lemurs and Locusts, Oh my!
- Practical Chaos Engineering
- Chaos Day in the Met Office Cloud
- Cloud Native and Chaos Engineering
- Chaos Engineering with Kolton Andrus
- Chaos Engineering: the history, principles, and practice
- Embracing the Chaos of Chaos Engineering
- Designing Services for Resilience: Netflix Lessons
- Chaos Engineering: A cheat sheet
- How to convince your boss and make them say “Yes!” to Chaos Engineering?
- Why the World Needs More Resilient Systems
- Chaos Architecture
- Gremlin’s Tammy Bütow on the Business Side of Chaos Engineering
- Kubernetes Chaos Engineering: Lessons Learned
- Chaos Engineering: managing complexity by breaking things
- Podcast: Database Chaos with Tammy Butow
- LinkedOut: A Request-Level Failure Injection Framework
- GOTO 2018 - Breaking Things on Purpose - Kolton Andrus
- Why should Chaos be part of your Distributed Systems Engineering?
- Brian Holt - Chaos Monkeys in Your Browser What Chaos Engineering Means For the Front End
- Chaos Engineering: Why the World Needs More Resilient Systems
- video
- Orchestrating Chaos using Grab's Experimentation Platform
- Breaking to Learn: Chaos Engineering Explained
- Chaos Engineering Traps
- Chaos Engineering - The Art of Breaking Things Purposefully
- Disasterpiece Theater: Slack’s process for approachable Chaos Engineering
- Taming chaos: Preparing for your next incident
- The Future of Chaos Engineering w/ Conde Nast
- Chaos Engineering For People Systems w/ Dave Rensin of Google
- Performing chaos engineering in a serverless world (AWS re:Invent 2019 CMY301)
- Building Confidence in Healthcare Systems through Chaos Engineering
- Break Your App before Someone Else Does
- Preparing for Traffic Spikes with Chaos Engineering
- Automating Chaos Engineering GameDays with Terraform
- Postmortem Culture: Learning from failure
- Problem Detection by John Allspaw
- New Paradigms for the Next Era of Security
- Cloud-Native Chaos Engineering
- Building resilient services at Prime Video with chaos engineering
- Making Chaos Part of Kubernetes/OpenShift Performance and Scalability Tests
- Lucky Lotto, chaos engineering but for teams
- Using Fault Injection Testing to Improve DoorDash Reliability
- Chaos Engineering At Ant Group
-
Books
- Chaos Engineering: Building Confidence in System Behavior through Experiment
- Site Reliability Engineering: How Google Runs Production Systems
- The Practice Of Cloud System Administration: Designing and Operating Large Distributed Systems
- Antifragile Systems and Teams
- The InfoQ eMag: Chaos Engineering
- Learning Chaos Engineering
- Chaos Engineering: System Resilience in Practice
- Chaos Engineering: Crash test your applications
- Security Chaos Engineering: Gaining Confidence in Resilience and Safety at Speed and Scale
- Chaos Engineering Observability
-
Education
- Slides
- Chaos Engineering 101
- Intro to Chaos Engineering
- Learn the basics of the Chaos Toolkit
- Build System Confidence with Chaos Engineering
- Run Chaos Experiments Without Risking Your Job
- Increasing the Resilience of APIs with Chaos Engineering
- How To Install Distributed Tensorflow on GCP and Perform Chaos Engineering Experiments
- Monitoring Your Chaos Experiments
- 3 key steps for running chaos engineering experiments
- Exploring Multi-level Weaknesses using Automated Chaos Experiments
- Chaos Monkey Guide for Engineers
- Chaos Engineering for Serverless
- Network Fire Drills with Chaos Engineering
- DevOps Foundations: Chaos Engineering
- Resilience Engineering: Short Course
- The Chaos Engineering Collection
- PenTester Academic
- How we break things at Twitter: failure testing
- Your First Chaos Experiment
- A Primer on Automating Chaos
- Consul and Chaos Engineering
-
Notable Tools
- orchestrator - MySQL replication topology management and HA.
- Gremlin Inc. - Failure as a Service.
- steadybit - A Chaos Engineering platform (SaaS or On-Prem) with auto discovery features, different attack types, user management, and more.
- PowerfulSeal - Adds chaos to your Kubernetes clusters, so that you can detect problems in your systems as early as possible. It kills targeted pods and takes VMs up and down.
- Wiremock - API mocking (Service Virtualization) which enables modeling real world faults and delays
- MockLab - API mocking (Service Virtualization) as a service which enables modeling real world faults and delays.
- Byteman - A Swiss Army Knife for Byte Code Manipulation.
- Perses - A project to cause (controlled) destruction to a JVM application.
- go-fault - Fault injection middleware in Go
- Proofdock's Chaos Engineering Platform - A chaos engineering platform that seamlessly integrates in Azure DevOps and has a focus on the Azure cloud platform.
- Pystol - Pystol is a fault injection platform allowing users to execute fault injection Actions in cloud-native environments in a controlled and prescribed way.
- NetHavoc - A Chaos Engineering Tool for Linux, K8s, Windows, PCF, Cloud, and Containers for injecting Resource, Infrastructure, Network, and Application failures.
- Chaos Frontend Toolkit - A set of tools to apply Chaos Engineering to frontend
- Mitigant - The Continuous Security Verification Platform enables confidence in cloud security posture by leveraging security chaos engineering.
-
Cloud Services
- Testing Amazon Aurora Using Fault Injection Queries
- Azure Chaos Studio - A managed fault injection service for Azure applications. See also [Azure Fault Analysis Service](https://docs.microsoft.com/azure/service-fabric/service-fabric-testability-overview) for Azure Service Fabric applications.
- Security Chaos Engineering for Cloud Services
-
Papers
- Maelstrom: Mitigating Datacenter-level Disasters by Draining Interdependent Traffic Safely and Efficiently
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems
- Principles of Antifragile Software
- Why is random testing effective for partition tolerance bugs?
- Chaos Engineering
- A Platform for Automating Chaos Experiments
- A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM
- TripleAgent: Monitoring, Perturbation And Failure-obliviousness for Automated Resilience Improvement in Java Applications
- Lineage-driven Fault Injection
- Antifragility is a Fragile Concept
- Chaos Engineering Security
- Security Chaos Engineering: A new paradigm for cybersecurity
- Security Challenges around Chaos Engineering
- CloudStrike: Security Chaos Engineering for Cloud Services
- Observability and Chaos Engineering on System Calls for Containerized Applications in Docker
- Maximizing Error Injection Realism for Chaos Engineering with System Calls
- Chaos Engineering of Ethereum Blockchain Clients
- Automating Failure Testing Research at Internet Scale
-
Blogs & Newsletters
- Site Reliability Engineering resources - A curated list of awesome Site Reliability and Production Engineering resources.
- Netflix Technology Blog - Learn more about how Netflix designs, builds, and operates our systems and engineering organizations.
- SRE Weekly - Weekly Site Reliability Newsletter.
- SysAdvent - One article for each day of December, ending on the 25th article.
- Gremlin Blog - Blogs on Chaos Engineering from Gremlin Inc.
- O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
- LaunchDarkly Blog - Continuous delivery and feature flags blog.
- Verica - Chaos engineering, security chaos engineering and continuous verification.
- Proofdock - Reliability, resilience and chaos engineering with a focus on MS Azure
- LitmusChaos Blog - Blogs on Chaos Engineering from LitmusChaos
- ChaosEngineering.news - Chaos Engineering newsletter. All things chaos engineering, directly to your inbox!
- Chaos Mesh Blog - Blogs on Chaos Engineering from Chaos Mesh.
- Chaos Experimentation Framework - source framework built on top of Envoy Proxy
- Squadcast - Blog on Site Reliability engineering.
- steadybit Blog - Blogs on Chaos Engineering, Resilience, SRE and OPS from steadybit.
-
Gamedays
- Target: What is a Gameday? - Chaos Gamedays experience by Target.
- Codecentric: Chaos Engineering Gamedays - Chaos Gamedays by Codecentric.
- New Relic: How to run a Gameday? - Chaos Gamedays experience by New Relic.
- Dius: Gamedays resources - Resources for getting started with GameDay and Chaos Engineering.
- Gremlin: Gamedays - Resources for getting started with GameDay and Chaos Engineering.
- Gremlin: How to run a Gameday? - Methodology to run Gamedays according to Gremlin.
- Gremlin DB: Breaking Dynamo DB - Example of a Gameday with DynamoDB by Gremlin.
- Gremlin: Introduction to Gameday - What is a Gameday according to Gremlin.
- Gremlin: Inside Gremlin 2019 Gremlin Gamedays Roadmap - Chaos Gamedays experience by Gremlin.
- Gremlin: What I learned running the Chaos Lab with Kafka - Example of a Gameday with Kafka by Gremlin.
- Chaos Toolkit: Chaos Engineering with Humans in the loop - Article about Chaos Gamedays.
- GoCardless: All fun and games until you start with Gamedays - Article about Chaos Gamedays.
- InfoQ: Gamedays - Achieving Resilience through Chaos Engineering - InfoQ Presentation with experiences about Chaos Gamedays.
- Gremlin: Planning your own Chaos Day - Example of a Gameday with DynamoDB by Gremlin.
-
Podcasts
- Break Things On Purpose - Monthly podcast about Chaos Engineering presented by Gremlin Inc. Also available on Spotify, Google Play, and Stitcher.