{"id":17862674,"url":"https://github.com/stevana/elastically-scalable-thread-pools","last_synced_at":"2025-03-21T00:30:35.761Z","repository":{"id":142654092,"uuid":"611177958","full_name":"stevana/elastically-scalable-thread-pools","owner":"stevana","description":"An experiment in controlling the size of a thread pool using a PID controller.","archived":false,"fork":false,"pushed_at":"2023-10-17T11:06:49.000Z","size":134,"stargazers_count":117,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-17T19:11:24.679Z","etag":null,"topics":["elastic","pid-controller","scalable","thread-pool"],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevana.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-08T09:30:33.000Z","updated_at":"2024-12-26T19:31:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"357d825c-1614-471e-85fe-e83cb1a6ec1d","html_url":"https://github.com/stevana/elastically-scalable-thread-pools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Felastically-scalable-thread-pools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Felastically-scalable-thread-pools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Felastically-scalable-thread-pools/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/stevana%2Felastically-scalable-thread-pools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevana","download_url":"https://codeload.github.com/stevana/elastically-scalable-thread-pools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244717155,"owners_count":20498280,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elastic","pid-controller","scalable","thread-pool"],"created_at":"2024-10-28T08:54:38.170Z","updated_at":"2025-03-21T00:30:35.457Z","avatar_url":"https://github.com/stevana.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Elastically scalable thread pools\n\nAn experiment in controlling the size of a thread pool using a PID controller.\n\n## Motivation\n\nA tried and tested way to achieve parallelism is to use pipelining. It's used\nextensively in manufacturing and in computer hardware.\n\nFor example, Airbus [apparently](https://youtu.be/oxjT7veKi9c?t=2682) outputs\ntwo airplanes per day on average, even though it takes two months to build a\nsingle airplane from start to finish. It's also used inside CPUs to [pipeline\ninstructions](https://en.wikipedia.org/wiki/Instruction_pipelining).\n\nLet's imagine we want to take advantage of pipelining in some software system.\nTo make things more concrete, let's say we have a system where some kind of\nrequests come on over the network and we want to process them in some way. 
The\nfirst stage of the pipeline is to parse the incoming requests from raw\nbytestrings into some more structured data, the second stage is to apply some\nvalidation logic to the parsed data and the third stage is to process the valid\ndata and produce some outputs that are then sent back to the client or stored\nsomewhere.\n\n![](https://raw.githubusercontent.com/stevana/elastically-scalable-thread-pools/main/img/pipeline.svg)\n\nThe service time of an item can differ from stage to stage, for example parsing\nmight be slower than validation, which can create bottlenecks. Luckily it's\nquite easy to spot bottlenecks by merely observing the queue lengths and once a\nslow stage is found we can often fix it by merely adding an additional parallel\nprocessor to that stage. For example we could spin up two or more threads that\ntake bytestrings from the first queue and turn them into structured data and\nthereby compensate for parsing being slow.\n\nBy spinning up more threads we can decrease latency (waiting time in the queue)\nand increase throughput (process more items), but we are also on the other hand\nusing more energy and potentially hogging CPU resources that might be better\nused elsewhere in the pipeline or system at large.\n\nSo here's the question that the rest of this post is concerned about: can we\ndynamically spin up and spin down threads at a stage in response to the input\nqueue length for that stage?\n\n## Plan\n\nLet's focus on a single stage of the pipeline to make things easier for\nourselves.\n\n![](https://raw.githubusercontent.com/stevana/elastically-scalable-thread-pools/main/img/stage.svg)\n\nWe'd like to increase the parallelism of the processors if the input queue\ngrows, and decrease it when the queue shrinks. One simple strategy might be to\nestablish thresholds, i.e. 
if there are more than $100$ items in the input queue then\nallocate more processors and if there are no items in the queue then deallocate\nthem.\n\nSince allocating and deallocating processors can be an expense in itself, we'd\nlike to avoid changing the processor count unnecessarily.\n\nThe threshold-based approach is prone to changing the count unnecessarily if\nthe arrival rate of work fluctuates. The reason is that it only\ntakes the *present* queue length into account.\n\nWe can do better by also incorporating the *past* and trying to predict the\n*future*; this is the basic idea of [PID\ncontrollers](https://en.wikipedia.org/wiki/PID_controller) from [control\ntheory](https://en.wikipedia.org/wiki/Control_theory).\n\nHere's what the picture looks like with a PID controller in the loop:\n\n\n```\n                                            +----------------------------------+\n                                            |                                  |\n    -------------------------------------------\u003e[Input queue]--\u003e[Worker pool]-----\u003e[Output queue]--\u003e\n                                            |                                  |\n     r(t)   e(t)                    u(t)    |                                  |\n    -----\u003e+------\u003e[PID controller]--------\u003e |                                  |\n          ^                                 |                                  |\n          |                                 +----------------------------------+\n          |                                                 | y(t)\n          +-------------------------------------------------+\n\n```\n\nThe PID controller monitors the queue length $y(t)$, compares it to some desired\nqueue length $r(t)$ (also known as the setpoint) and calculates the error $e(t)$.\nThe error determines the control variable $u(t)$ which is used to grow or shrink\nthe processor pool.\n\n## Pseudo-code\n\nLet's start top-down with the `main` 
function which drives our whole experiment.\n\n### Main\n\n```\nmain =\n\n  // Create the in- and out-queues.\n  inQueue  := newQueue()\n  outQueue := newQueue()\n\n\n  // The workers don't do anything interesting, they merely sleep for a bit to\n  // pretend to be doing some work.\n  worker := sleep 0.025s\n\n  // Create an empty worker pool.\n  pool := newPool(worker, inQueue, outQueue)\n\n  // Start the PID controller in a background thread. The parameters provided\n  // here allow us to tune the PID controller, we'll come back to them later.\n  kp := 1\n  ki := 0.05\n  kd := 0.05\n  dt := 0.01s\n  fork(pidController(kp, ki, kd, dt, pool))\n\n\n  // Create a workload for our workers. We use the sine function to create\n  // between 0 and 40 work items every 0.1s for 60s. The idea being that because\n  // the workload varies over time the PID controller will have some work to do\n  // figuring out how many workers are needed.\n  sineLoadGenerator(inQueue, 40, 0.1s, 60s)\n```\n\n### Worker pool\n\nThe worker pool itself is merely a struct which packs up the necessary data we\nneed to be able to scale it up and down.\n\n```\nstruct Pool =\n  { inQueue:  Queue\u003cInput\u003e\n  , outQueue: Queue\u003cOutput\u003e\n  , worker:   Function\u003cInput, Output\u003e\n  , pids:     List\u003cProcessId\u003e\n  }\n```\n\nCreating a `newPool` creates the struct with an empty list of process ids.\n\n```\nnewPool worker inQueue outQueue = Pool { ..., pids: emptyList }\n```\n\nScaling up and down are functions that take and return a `Pool`.\n\n```\nscaleUp pool =\n  work := forever\n            x := readQueue(pool.inQueue)\n            y := pool.worker(x)\n            writeQueue(pool.outQueue, y)\n  pid   := fork(work)\n  pool' := pool.pids = append(pid, pool.pids)\n  return pool'\n```\n\nThe function `scaleDown` does the inverse, i.e. 
kills and removes the last\nprocess id from `pool.pids`.\n\n### Load generator\n\nTo create a workload that varies over time we use the sine function. The\nsine function oscillates between $-1$ and $1$:\n\n![](https://raw.githubusercontent.com/stevana/elastically-scalable-thread-pools/main/img/sine.svg)\n\nWe would like to have it oscillate between $0$ and some max value $m$. By\nmultiplying the output of the sine function by $m/2$ we get an oscillation\nbetween $-m/2$ and $m/2$; we can then add $m/2$ to make it oscillate between $0$\nand $m$.\n\nWe'll sample the resulting function once every `timeStep` seconds, which gives\nus the number of work items (`n`) to create. We then spread those out evenly in\ntime and repeat until we reach some `endTime`.\n\n```\nsineLoadGenerator inQueue workItem maxItems timeStep endTime =\n  for t := 0; t \u003c endTime; t += timeStep\n    n := sin(t) * maxItems / 2 + maxItems / 2\n    for i := 0; i \u003c n; i++\n      writeQueue(inQueue, workItem)\n      sleep(timeStep / n)\n```\n\n### PID controller\n\nThe PID controller implementation follows the pseudo-code given at\n[Wikipedia](https://en.wikipedia.org/wiki/PID_controller#Pseudocode):\n\n```\nprevious_error := 0\nintegral := 0\nloop:\n   error := setpoint − measured_value\n   proportional := error;\n   integral := integral + error × dt\n   derivative := (error − previous_error) / dt\n   output := Kp × proportional + Ki × integral + Kd × derivative\n   previous_error := error\n   wait(dt)\n   goto loop\n```\n\nWhere `Kp`, `Ki` and `Kd` are, respectively, the proportional, integral and\nderivative gains and `dt` is the loop interval time. The proportional part acts\non the *present* error value, the integral acts on the *past* and the derivative\ntries to predict the *future*. The measured value is the input queue length and\nthe setpoint, i.e. desired queue length, is set to zero. If the `output` of the\nPID controller is less than $-100$ (i.e. 
the queue length is over $100$ taking\nthe present, past and possible future into account) then we scale up and if it's\nmore than $-20$ (i.e. the queue length is less than $20$) then we scale down the\nworker pool.\n\n## How it works\n\nWe start off by only setting the proportional part and keeping the integral and\nderivative parts at zero; this is called a P-controller. We see below that it will\nscale the worker count up and down proportionally to the sine wave shaped load:\n\n![](https://raw.githubusercontent.com/stevana/elastically-scalable-thread-pools/main/img/elastically-scalable-thread-pools-1.0-0.0-0.0.svg)\n\nA P-controller only focuses on the *present*, and we see that it allocates and\ndeallocates workers unnecessarily. In order to smooth things out we introduce\nthe integral part, i.e. a PI-controller. The integral part takes the *past* into\naccount. We see now that the worker count stabilises at $28$:\n\n![](https://raw.githubusercontent.com/stevana/elastically-scalable-thread-pools/main/img/elastically-scalable-thread-pools-1.0-5.0e-2-0.0.svg)\n\nWe can improve on this by adding the derivative part which takes the *future*\ninto account. We then see that it stabilises at $26$ workers:\n\n![](https://raw.githubusercontent.com/stevana/elastically-scalable-thread-pools/main/img/elastically-scalable-thread-pools-1.0-5.0e-2-5.0e-2.svg)\n\nWith the full PID controller, which stabilises using fewer workers than the\nPI-controller, we see that the queue length spikes up to $20$ or so each time\nthe workload generator hits one of the sine function's peaks. 
Recall that we\nstarted scaling down once the queue length was less than $20$.\n\n## Usage\n\nThe above graphs were generated by running: `cabal run app -- kp ki kd`, where\nthe $K_p$, $K_i$, and $K_d$ parameters are the tuning parameters for the\nPID controller.\n\nIf you don't have the GHC Haskell compiler and the `cabal` build tool already\ninstalled, then the easiest way to get them is via\n[`ghcup`](https://www.haskell.org/ghcup/). Alternatively, if you have `nix`, then\n`nix-shell` should give you access to all the dependencies you need.\n\n## Contributing\n\nThere are many ways we can build upon this experiment, here are a few ideas:\n\n- [ ] We probably want to limit the max number of threads in a pool;\n- [ ] [Clamp](https://github.com/m-lundberg/simple-pid/blob/master/simple_pid/pid.py#L128)\n      the integral part to avoid integral windup;\n- [ ] If two or more threads take items from some input queue and put them on\n      some output queue then there's no guarantee that the order of the output\n      items will be the same as the input items. We could solve this, and regain\n      determinism, by using array based queues and shard on the index, i.e. even\n      indices go to one processor and odd to another, or more generally\n      modulus N can be used to shard between N processors. This is essentially\n      what the [LMAX\n      Disruptor](https://en.wikipedia.org/wiki/Disruptor_(software)) does;\n- [ ] We've only looked at one stage in a pipeline, what happens if we have\n      multiple stages? Is it enough to control each individual stage separately\n      or do we need more global control?\n- [ ] Can we come up with other things to control? E.g. batch sizes?\n- [ ] We've only monitored the current queue length, could we combine this with\n      other data? E.g. time series of the queue length from the previous day?\n- [ ] Is it robust to wildly changing usage patterns? E.g. 
bursty traffic or the\n      [Slashdot effect](https://en.wikipedia.org/wiki/Slashdot_effect)?\n- [ ] We've looked at scaling up and down on a single machine (vertical\n      scaling), what about scaling out and in across multiple machines\n      (horizontal scaling)?\n- [ ] We generated and processed real work items (by sleeping), could we do a\n      discrete-event simulation instead to avoid having to wait for the sleeps?\n- [ ] I just picked random values for the PID controller parameters, there are\n      more principled\n      [ways](https://en.wikipedia.org/wiki/PID_controller#Overview_of_tuning_methods)\n      of tuning the PID controller;\n- [ ] The PID controller we implemented merely followed the pseudo-code from\n      Wikipedia, there are probably better ways of implementing it?\n\nIf any of this sounds interesting, feel free to get in touch!\n\n## See also\n\n* [*A Review of Auto-scaling Techniques for Elastic Applications in Cloud\n  Environments*](https://www.researchgate.net/publication/265611546_A_Review_of_Auto-scaling_Techniques_for_Elastic_Applications_in_Cloud_Environments)\n  (2014) is a survey paper which talks about both threshold and PID controllers;\n\n* [*SEDA: An Architecture for Well-Conditioned Scalable Internet\n  Services*](https://people.eecs.berkeley.edu/~brewer/papers/SEDA-sosp.pdf)\n  (2001), this is the paper from which I got the idea for elastically scalable thread pools.\n  They use a threshold approach rather than a PID controller, saying:\n\n  \u003e The controller periodically samples the input queue (once per second by\n  \u003e default) and adds a thread when the queue length exceeds some threshold (100\n  \u003e events by default). 
Threads are removed from a stage when they are idle for a\n  \u003e specified period of time (5 seconds by default).\n\n  But also:\n\n  \u003e Under SEDA, the body of work on control systems can be brought to bear on\n  \u003e service resource management, and we have only scratched the surface of the\n  \u003e potential for this technique.\n\n  A bit more explanation is provided by Matt Welsh, who is one of the authors, in\n  his PhD\n  [thesis](https://cs.uwaterloo.ca/~brecht/servers/readings-new/mdw-phdthesis.pdf)\n  (2002):\n\n  \u003e A benefit to ad hoc controller design is that it does not rely on complex\n  \u003e models and parameters that a system designer may be unable to understand or to\n  \u003e tune. A common complaint of classic PID controller design is that it is often\n  \u003e difficult to understand the effect of gain settings.\n\n* There are many introductory textbooks on control theory, but there are far\n  fewer resources on how to apply control theory to software systems. Here are a\n  few resources:\n\n    - [*Feedback Control for Computer\n      Systems*](https://janert.org/books/feedback-control-for-computer-systems/)\n      book by Philipp K. Janert (2013);\n\n    - [*Tutorial: Recent Advances in the Application of Control Theory to\n      Network and Service\n      Management*](https://www.cse.wustl.edu/~lu/control-tutorials/im09/).\n\n* It could very well be that the way we've applied classic PID controllers isn't\n  suitable for unpredictable internet traffic loads. 
There are branches of\n  control theory that might be better suited for this, see, for example,\n  [robust](https://en.wikipedia.org/wiki/Robust_control) and\n  [adaptive](https://en.wikipedia.org/wiki/Adaptive_control) control theory;\n\n* The .NET thread pool apparently uses the [hill\n  climbing](https://en.wikipedia.org/wiki/Hill_climbing) optimisation technique\n  to [elastically\n  scale](https://mattwarren.org/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm/);\n\n* My previous post: [*An experiment in declaratively programming parallel\n  pipelines of state\n  machines*](https://github.com/stevana/pipelined-state-machines#pipelined-state-machines).\n\n## Discussion\n\n* [Hacker News](https://news.ycombinator.com/item?id=35148068);\n* [lobste.rs](https://lobste.rs/s/ybtxic/experiment_elastically_scaling_thread);\n* [r/haskell](https://old.reddit.com/r/haskell/comments/11qyfw7/an_experiment_in_elastically_scaling_a_thread/);\n* Also see Glyn Normington's\n  [comment](https://github.com/stevana/elastically-scalable-thread-pools/issues/1)\n  in the issue tracker.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevana%2Felastically-scalable-thread-pools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevana%2Felastically-scalable-thread-pools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevana%2Felastically-scalable-thread-pools/lists"}