{"id":27253429,"url":"https://github.com/marselester/capacity","last_synced_at":"2025-06-14T07:38:10.146Z","repository":{"id":57657685,"uuid":"227864531","full_name":"marselester/capacity","owner":"marselester","description":"Capacity management demo based on Jon Moore's talk https://www.youtube.com/watch?v=m64SWl9bfvk.","archived":false,"fork":false,"pushed_at":"2023-02-15T16:13:35.000Z","size":1769,"stargazers_count":34,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-11T01:36:30.008Z","etag":null,"topics":["capacity-planning","grafana","prometheus","queueing-theory"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marselester.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-13T15:06:04.000Z","updated_at":"2025-01-21T15:36:25.000Z","dependencies_parsed_at":"2024-06-20T08:18:12.112Z","dependency_job_id":"e8509958-847a-435d-836a-ffdd9ab841ad","html_url":"https://github.com/marselester/capacity","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/marselester/capacity","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marselester%2Fcapacity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marselester%2Fcapacity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marselester%2Fcapacity/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marsel
ester%2Fcapacity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marselester","download_url":"https://codeload.github.com/marselester/capacity/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marselester%2Fcapacity/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259777755,"owners_count":22909733,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["capacity-planning","grafana","prometheus","queueing-theory"],"created_at":"2025-04-11T01:27:30.842Z","updated_at":"2025-06-14T07:38:10.126Z","avatar_url":"https://github.com/marselester.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Capacity management\n\nTable of contents:\n\n- [Get started](#get-started)\n- [Experiments](#experiments)\n  - [Slower processing, service is down](#slower-processing-service-is-down)\n  - [Slower processing, service is up](#slower-processing-service-is-up)\n  - [Fixed in-flight requests quota](#fixed-in-flight-requests-quota)\n  - [Adaptive in-flight requests quota](#adaptive-in-flight-requests-quota)\n\n\u003e The most common mechanism available in both open source and commercial API gateways is rate limiting:\n\u003e making sure a particular client sends no more than a certain number of requests per unit time.\n\u003e As it turns out, this is exactly the wrong abstraction for multiple reasons.\n\nThis small project aims to reproduce results from Jon Moore's talk\n[Stop Rate Limiting! 
Capacity Management Done Right](https://www.youtube.com/watch?v=m64SWl9bfvk).\nHe illustrated where rate limiting can break down using\n[Little's law](https://en.wikipedia.org/wiki/Little%27s_law) `N = X * R`:\n\n- N is capacity (number of workers)\n- X is throughput (request arrival rate)\n- R is service time (how long it takes a worker to process a request)\n\nIn the examples, the client sends 5 requests per second using 10 workers,\nwhich wait for a response no longer than 2.5 seconds.\n\n```sh\n$ ./client -worker=10 -rps=5 -origin=http://origin:8000\n```\n\nOrigin server has an SLO to serve 99% of requests within 1 second.\nIt has a fixed number of workers, each of which takes 1 second on average to process a request.\nRequests are enqueued when workers are busy and discarded when the queue is full.\n\n```sh\n$ ./origin -worker=7 -worktime=1s -queue=100\n```\n\nAccording to Little's law, origin should be able to handle 7 requests per second.\n\n```\nN = X * R\n7 workers = X rps * 1s\nX = 7/1 = 7 rps\n```\n\n[Run the programs](#get-started) in Docker Compose and check out the Grafana dashboard:\n\n- client sends 5 requests per second and receives HTTP 200 OK responses\n- origin processes 5 requests per second\n- origin has served requests with average latency 1 second\n- origin has served 50% of requests (50th percentile) within 1 second\n- origin has served 99% of requests (99th percentile) within 1 second\n\nNote that [quantiles are estimated](https://prometheus.io/docs/practices/histograms/#errors-of-quantile-estimation).\nAlmost all observations fall into the bucket `{le=\"1.05\"}`, i.e. 
the bucket from 1s to 1.05s.\nThe histogram implementation guarantees that the true 99th percentile is somewhere between 1s and 1.05s.\nThe calculated quantile might give the impression that the API is close to breaching the SLO\nif bucket boundaries were not chosen appropriately (sharp spikes).\n\n![Normal processing, service is up](images/normal.png)\n\n## Get started\n\nClone the repository.\n\n```sh\n$ git clone https://github.com/marselester/capacity.git\n$ cd ./capacity/docker\n```\n\nRun origin server, client (load generator), Grafana, and Prometheus using Docker Compose.\n\n```sh\n$ docker-machine start\n$ eval \"$(docker-machine env)\"\n$ docker-compose up\n$ docker-machine ip\n192.168.99.100\n```\n\nOpen Grafana http://192.168.99.100:3000 with default credentials admin/admin.\nPrometheus dashboard is available at http://192.168.99.100:9090.\n\nClean up once you're done experimenting.\n\n```sh\n$ docker-compose down\n$ docker rmi marselester/capacity\n$ docker image prune --filter label=stage=intermediate\n$ docker-machine stop\n```\n\n## Experiments\n\n### Slower processing, service is down\n\nA new release of the origin server has a bug that makes each worker take 2 seconds to process a request.\n\n```sh\n$ ORIGIN_WORKTIME=2s docker-compose up\n```\n\nAccording to Little's law, origin should be able to handle 3.5 requests per second.\n\n```\nN = X * R\n7 workers = X rps * 2s\nX = 7/2 = 3.5 rps\n```\n\nSince the worker pool is able to process 3.5 requests per second, it can drain the queue at the same rate.\nRequests arrive at 5 rps, which means the queue will be growing infinitely.\nTherefore processing time will also be growing infinitely.\n\n```\nN = X * R\n∞ = 3.5 rps * R\nR = ∞ / 3.5 = ∞ seconds\n```\n\nObservations:\n\n- client sends 5 requests per second and they all time out\n- origin processes 3.5 requests per second\n- origin's average latency grows with the queue\n- origin's 50th and 99th percentiles show maximum configured latency of 4 seconds (the biggest 
bucket)\n\n![Slower processing, service is down](images/slow-down.png)\n\n### Slower processing, service is up\n\nDevelopers increased the number of origin workers to 20 while they investigate\nwhy a worker takes 2 seconds to process a request instead of 1 second.\n\n```sh\n$ ORIGIN_WORKER=20 ORIGIN_WORKTIME=2s docker-compose up\n```\n\nAccording to Little's law, origin should be able to handle 10 requests per second.\n\n```\nN = X * R\n20 workers = X rps * 2s\nX = 20/2 = 10 rps\n```\n\nObservations:\n\n- client sends 5 requests per second and receives HTTP 200 OK responses\n- origin processes 5 requests per second\n- origin has served requests with average latency 2 seconds\n- origin has served 50% of requests (50th percentile) within 2 seconds\n- origin has served 99% of requests (99th percentile) within 2 seconds\n\n![Slower processing, service is up](images/slow-up.png)\n\n### Fixed in-flight requests quota\n\nA client should be able to send X=5 requests per second with average response time R=1 second.\nThis means a client should be limited to N=5 concurrent requests on a proxy.\n\n```\nN = X * R\nN = 5 rps * 1s = 5 requests in flight\n```\n\nWhen processing time increases to R=2 seconds, a client is still limited to 5 requests in flight by the proxy.\nTherefore a client will end up limited to X=2.5 rps.\n\n```\nN = X * R\n5 requests in flight = X rps * 2s\nX = 5/2 = 2.5 rps\n```\n\nIn order to allow a service to recover, a client is forced to back off: send 2.5 rps instead of 5 rps.\nThe proxy limits concurrency (how many requests are in flight), not request rate (rps).\n\n```sh\n$ ORIGIN_WORKTIME=2s CLIENT_ORIGIN=http://proxy:7000 docker-compose up\n```\n\nObservations:\n\n- client sends 5 requests per second and receives 2.3 rps (HTTP 200)\n- proxy oscillates between 4 and 5 in-flight requests\n- origin processes 2.3 requests per second\n- origin has served requests with average latency 2 seconds\n- origin has served 50% of requests (50th percentile) within 2 seconds\n- 
origin has served 99% of requests (99th percentile) within 2 seconds\n\n![Fixed in-flight requests quota](images/fixed-quota.png)\n\n### Adaptive in-flight requests quota\n\nIf origin's capacity is unknown, it's possible to estimate capacity using\n[AIMD algorithm](https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease)\n(additive-increase/multiplicative-decrease):\n\n- increase target concurrency by a constant `c` per unit time, e.g., allow 1 more rps every second\n- set target concurrency to a fraction `p` of its current size (0 \u003c= p \u003c= 1), e.g.,\n  back-off to 75% when a service is overloaded (429 or 50x status codes, connection timeout)\n\nClient waits for 2.5 seconds before timing out (cancels request).\nIn order to receive HTTP 200 OK responses a request should be processed in less than 2.5 seconds.\n\nRequest is processed within 2 seconds by origin's workers (N=7).\n\n```\nN = X * R\n7 workers = X rps * 2s\nX = 7/2 = 3.5 rps\n```\n\nRequest has only 2.5s - 2s = 0.5 second to stay in a queue, otherwise a client won't receive it due to timeout.\nExpected origin's queue length is 1.75 requests.\n\n```\nN = X * R\nN = 3.5 rps * 0.5s = 1.75 requests in queue\n```\n\nExpected number of concurrent requests to origin is 8.75.\n\n```\nN = X * R\nN = 3.5 rps * 2.5s = 8.75 requests in flight\n```\n\n\u003cimg src=\"images/capacity.png\" width=\"400\" /\u003e\n\nPerformance of my naive proxy is inferior and results don't correspond to Jon Moore's chart (Nginx/Lua).\n\n```sh\n$ ORIGIN_WORKTIME=2s CLIENT_ORIGIN=http://proxy:7000 PROXY_ADAPTIVE=true docker-compose up\n```\n\nObservations:\n\n- client sends 5 requests per second and receives between 2.4 rps and 2.7 rps (HTTP 200)\n- proxy oscillates between 4 and 6 in-flight requests\n- origin processes between 2.7 and 3 requests per second\n- origin has served requests with average latency 2.2 seconds\n- origin has served 50% of requests (50th percentile) within 2 seconds\n- origin has served 99% 
of requests (99th percentile) within 3 seconds with periodic spikes\n\n![Adaptive in-flight requests quota](images/adaptive-quota.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarselester%2Fcapacity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarselester%2Fcapacity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarselester%2Fcapacity/lists"}