Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-chaos-engineering
A curated list of Chaos Engineering resources.
https://github.com/eric-erki/awesome-chaos-engineering
-
Culture
- SRECON17: Principles of Chaos Engineering
- Chaos & Intuition Engineering at Netflix
- Mastering Chaos - A Netflix Guide to Microservices
- SRECON17: Breaking Things on Purpose
- Inside Azure Search: Chaos Engineering
- Netflix, the Simian Army, and the culture of freedom and responsibility
- FIT: Failure Injection Testing
- Principles Of Chaos Engineering
- The Netflix Simian Army
- Automated Failure Testing
- O'Reilly Velocity San Jose 2017: Precision Chaos
- The Discipline of Chaos Engineering
- Chaos Monkey for Fun and Profit
- Fault Injection in Production: Making the case for resilience testing
- Lord of Chaos - Becoming a Chaos Engineer
- Orchestrated Chaos
- The Journey to Chaos Engineering begins with a single step - Bruce Wong and James Burns (Twilio)
- Chaos Engineering by Lorin Hochstein
- Aaron Rinehart - ChaoSlingr: Introducing Security based Chaos Testing
- Chaos Engineering - Casey Rosenthal
- 10 Years of Crashing Google
- PuppetConf 2016: Chaos Patterns - Architecting for Failure in Distributed Systems
- Ship More, Sink Less - Changing Chaos Engineering and Distributed Tracing
- Software Engineering Daily - Failure Injection with Kolton Andrus podcast
- "Antics, drift, and chaos" by Lorin Hochstein
- re:invent 2017: Nora Jones Describes Why We Need More Chaos - Chaos Engineering, That Is
- Failure Friday: Four Years On
- Monkeys & Lemurs and Locusts, Oh my!
- Practical Chaos Engineering
- Chaos Day in the Met Office Cloud
- Cloud Native and Chaos Engineering
- Chaos Engineering with Kolton Andrus
- Chaos Engineering: the history, principles, and practice
- Chaos Engineering: A cheat sheet
- Gremlin’s Tammy Bütow on the Business Side of Chaos Engineering
- Kubernetes Chaos Engineering: Lessons Learned
- Chaos Engineering: managing complexity by breaking things
- Podcast: Database Chaos with Tammy Butow
- LinkedOut: A Request-Level Failure Injection Framework
- GOTO 2018 - Breaking Things on Purpose - Kolton Andrus
- Brian Holt - Chaos Monkeys in Your Browser What Chaos Engineering Means For the Front End
- Chaos Engineering: Why the World Needs More Resilient Systems
- Orchestrating Chaos using Grab's Experimentation Platform
- Chaos Engineering Traps
- The Future of Chaos Engineering w/ Conde Nast
- Chaos Engineering For People Systems w/ Dave Rensin of Google
- Performing chaos engineering in a serverless world (AWS re:Invent 2019 CMY301)
- How to convince your boss and make them say “Yes!” to Chaos Engineering?
-
Education
- Run Chaos Experiments Without Risking Your Job
- Slides
- Your First Chaos Experiment
- Chaos Engineering 101
- A Primer on Automating Chaos
- Intro to Chaos Engineering
- Learn the basics of the Chaos Toolkit
- Build System Confidence with Chaos Engineering
- How To Install Distributed Tensorflow on GCP and Perform Chaos Engineering Experiments
- Monitoring Your Chaos Experiments
- Exploring Multi-level Weaknesses using Automated Chaos Experiments
- Chaos Monkey Guide for Engineers
- Network Fire Drills with Chaos Engineering
- How we break things at Twitter: failure testing
- 3 key steps for running chaos engineering experiments
-
Notable Tools
- Gremlin Inc. - Failure as a Service.
- Byteman - A Swiss Army Knife for Byte Code Manipulation.
-
Papers
- Maelstrom: Mitigating Datacenter-level Disasters by Draining Interdependent Traffic Safely and Efficiently
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems
- Principles of Antifragile Software
- Chaos Engineering
- A Platform for Automating Chaos Experiments
- A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM
- TripleAgent: Monitoring, Perturbation And Failure-obliviousness for Automated Resilience Improvement in Java Applications
- Lineage-driven Fault Injection
- Automating Failure Testing Research at Internet Scale
-
Books
-
Gamedays
- Gremlin: Gamedays - Resources for getting started with GameDay and Chaos Engineering.
- Gremlin: How to run a Gameday? - Methodology for running Gamedays according to Gremlin.
- Gremlin DB: Breaking Dynamo DB - Example of a Gameday with DynamoDB by Gremlin.
- Gremlin: Introduction to Gameday - What a Gameday is, according to Gremlin.
- Gremlin: Inside Gremlin 2019 Gremlin Gamedays Roadmap - Chaos Gamedays experience by Gremlin.
- Gremlin: What I learned running the Chaos Lab with Kafka - Example of a Gameday with Kafka by Gremlin.
- Chaos Toolkit: Chaos Engineering with Humans in the loop - Article about Chaos Gamedays.
- GoCardless: All fun and games until you start with Gamedays - Article about Chaos Gamedays.
- Dius: Gamedays resources - Resources for getting started with GameDay and Chaos Engineering.
- Gremlin: Planning your own Chaos Day - Example of a Gameday with DynamoDB by Gremlin.
-
Blogs & Newsletters
- SRE Weekly - Weekly Site Reliability Newsletter.
- Site Reliability Engineering resources - A curated list of awesome Site Reliability and Production Engineering resources.
- Verica - Chaos engineering, security chaos engineering and continuous verification.
-
Podcasts
- Break Things On Purpose - Monthly podcast about Chaos Engineering presented by Gremlin Inc. Also available on Spotify, Google Play, and Stitcher.
-
Conferences & Meetups
- Chaos Conf - A day of Chaos Engineering demos, expert advice, and connecting with peers who are putting chaos into practice at their companies.
- SRECon Conferences - The official SRE conference.
- LISA Conferences - Prominent conference about SysAdmin/DevOps/SRE.
- Chaos Engineering Community Meetup Group - Bay Area Meetup group for Chaos Engineers.
- London Chaos Engineering Community
- Stockholm Chaos Engineering Meetup
-
Forums
-
Twitter