Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/StanzaSystems/awesome-load-management
A repo of links to articles, papers, conference talks, and tooling related to load management in software services.
https://github.com/StanzaSystems/awesome-load-management
List: awesome-load-management
Last synced: 19 days ago
JSON representation
A repo of links to articles, papers, conference talks, and tooling related to load management in software services.
- Host: GitHub
- URL: https://github.com/StanzaSystems/awesome-load-management
- Owner: StanzaSystems
- License: gpl-2.0
- Created: 2023-04-28T14:01:59.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-03T23:29:46.000Z (over 1 year ago)
- Last Synced: 2024-05-21T09:09:32.347Z (6 months ago)
- Homepage:
- Size: 43 KB
- Stars: 187
- Watchers: 8
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Awesome Load Management
A repo of links to articles, papers, conference talks, and tooling related to load management in software services: loadshedding, circuitbreaking, quota management and throttling.
PRs welcome.## Table of contents
- [Loadshedding](#loadshedding)
- [Articles and Papers](#articles-and-papers)
- [Videos](#videos)
- [Fairness and Isolation](#fairness-and-isolation)
- [Articles and Papers](#articles-and-papers)
- [Circuit Breaking and Adaptive Concurrency Control](#circuit-breaking-and-adaptive-concurrency-control)
- [Articles and Papers](#articles-and-papers)
- [Quota Management and Ratelimiting](#quota-management-and-ratelimiting)
- [Reliability and traffic management generally](#reliability-and-traffic-management)## Loadshedding
### Articles and Papers
* [Handling Overload, from the Google SRE Book, by Alejandro Forero Cuervo and edited by Sarah Chavis](https://sre.google/sre-book/handling-overload/)
* [Addressing Cascading Failures, by Mike Ulrich](https://sre.google/sre-book/addressing-cascading-failures/)
* [Managing Load, from the Google SRE Workbook, by Cooper Bethea et al.](https://sre.google/workbook/managing-load/)
* [Using load shedding to avoid overload, from the AWS Builders Library, by David Yanacek](https://aws.amazon.com/builders-library/using-load-shedding-to-avoid-overload/)
* [Timeouts, Retries, Backoff, and Jitter, from the AWS Builders Library, by Marc Brooker](https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/)
* [FIFO Considered Harmful, by Jos Visser](https://medium.com/swlh/fifo-considered-harmful-793b76f98374)
* [Using load shedding to survive a success disaster—CRE life lessons, from Google Cloud Blog, by Dave Rensin and Adrian Hilton](https://cloud.google.com/blog/products/gcp/using-load-shedding-to-survive-a-success-disaster-cre-life-lessons/)
* [How to avoid a self-inflicted DDoS Attack—CRE life lessons, from Google Cloud Blog, by Dave Rensin and Adrian Hilton](https://cloud.google.com/blog/products/gcp/how-to-avoid-a-self-inflicted-ddos-attack-cre-life-lessons/)
* [Keeping Netflix Reliable Using Prioritized Load Shedding, from the Netflix Tech Blog, by Manuel Correa, Arthur Gonigberg, and Daniel West](https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94)
* [Why Disaster Happens at the Edges: An Introduction to Queue Theory, by Avishai Ish-Shalom](https://thenewstack.io/an-introduction-to-queue-theory-why-disaster-happens-at-the-edges/)
* [Applying Back Pressure When Overloaded, by Martin Thompson](https://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html)
* [Why your reliability problems are really traffic problems](https://www.stanza.systems/post/why-your-reliability-problems-are-really-traffic-problems)### Videos
* [Load Shedding—Approaches, Principles, Experiences, and Impact in Service Management
by Acacio Cruz, at SREcon EMEA 2016](https://www.usenix.org/conference/srecon16europe/program/presentation/cruz)
* [Envoy, Take the Wheel: Real-time Adaptive Circuit Breaking, by Tony Allen, at KubeCon + CloudNativeCon Europe 2020](https://www.youtube.com/watch?v=CQvmSXlnyeQ)## Fairness and Isolation
### Articles and Papers
* [Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service, Mostafa Elhemali et al., Amazon Web Services](https://www.usenix.org/system/files/atc22-elhemali.pdf)
* [Fairness in multi-tenant systems, from the AWS Builders Library, by David Yanacek](https://aws.amazon.com/builders-library/fairness-in-multi-tenant-systems/)## Circuit Breaking and Adaptive Concurrency Control
### Articles and Papers
* [Method overloading the circuit, Christopher Meiklejohn et al.](https://dl.acm.org/doi/pdf/10.1145/3542929.3563466)
* [Service mesh circuit breaker: From panic button to performance management tool, by Mohammad Reza Saleh Sedghpour, Johan Tordsson](https://dl.acm.org/doi/abs/10.1145/3447851.3458740)
* [Will circuit breakers solve my problems?, by Marc Brooker](https://brooker.co.za/blog/2022/02/16/circuit-breakers.html)
* [Performance Under Load: Adaptive Concurrency Limits @ Netflix, by Eran Landau, William Thurston, Tim Bozarth](https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581)## Quota Management and Ratelimiting
### Tools
* [Envoy Proxy Global Ratelimiting](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_features/global_rate_limiting)
* [Sentinel flow control](https://github.com/alibaba/Sentinel)* [Why your reliability problems are really traffic problems](https://www.stanza.systems/post/why-your-reliability-problems-are-really-traffic-problems)