{"id":15469109,"url":"https://github.com/epwalsh/batched-fn","last_synced_at":"2025-04-28T10:17:14.558Z","repository":{"id":42518820,"uuid":"248860942","full_name":"epwalsh/batched-fn","owner":"epwalsh","description":"🦀 Rust server plugin for deploying deep learning models with batched prediction","archived":false,"fork":false,"pushed_at":"2024-03-10T23:17:46.000Z","size":73,"stargazers_count":21,"open_issues_count":5,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-10T09:09:39.272Z","etag":null,"topics":["batching","deep-learning","rust"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/batched-fn","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epwalsh.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-20T22:06:11.000Z","updated_at":"2024-12-30T09:30:25.000Z","dependencies_parsed_at":"2024-10-02T01:51:06.908Z","dependency_job_id":"4c1c6cd3-3aca-416a-b3dc-2339eb43e2fa","html_url":"https://github.com/epwalsh/batched-fn","commit_stats":{"total_commits":58,"total_committers":2,"mean_commits":29.0,"dds":0.06896551724137934,"last_synced_commit":"ac6c530b660e4002fc948075724ea455a9407bd8"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epwalsh%2Fbatched-fn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epwalsh%2Fbatched-fn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epwalsh%2Fbatc
hed-fn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epwalsh%2Fbatched-fn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epwalsh","download_url":"https://codeload.github.com/epwalsh/batched-fn/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246296471,"owners_count":20754627,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batching","deep-learning","rust"],"created_at":"2024-10-02T01:50:54.920Z","updated_at":"2025-03-31T10:30:44.211Z","avatar_url":"https://github.com/epwalsh.png","language":"Rust","readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003ebatched-fn\u003c/h1\u003e\n    Rust server plugin for deploying deep learning models with batched prediction.\n\u003c/div\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/epwalsh/batched-fn/actions\"\u003e\n        \u003cimg alt=\"Build\" src=\"https://github.com/epwalsh/batched-fn/workflows/CI/badge.svg?event=push\u0026branch=main\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/epwalsh/batched-fn/blob/main/LICENSE\"\u003e\n        \u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/epwalsh/batched-fn.svg?color=blue\u0026cachedrop\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://crates.io/crates/batched-fn\"\u003e\n        \u003cimg alt=\"Crates\" src=\"https://img.shields.io/crates/v/batched-fn.svg?color=blue\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://docs.rs/batched-fn/\"\u003e\n     
   \u003cimg alt=\"Docs\" src=\"https://img.shields.io/badge/docs.rs-API%20docs-blue\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\u003cbr/\u003e\n\n\u003c!--\nDO NOT EDIT BELOW THIS POINT BY HAND!\n\nEverything below this point is automatically generated using cargo-rdme: https://github.com/orium/cargo-rdme\nJust run `make readme` to update.\n--\u003e\n\n\u003c!-- cargo-rdme start --\u003e\n\nDeep learning models are usually implemented to make efficient use of a GPU by batching inputs together\nin \"mini-batches\". However, applications serving these models often receive requests one-by-one.\nSo using a conventional single or multi-threaded server approach will under-utilize the GPU and lead to latency that increases\nlinearly with the volume of requests.\n\n`batched-fn` is a drop-in solution for deep learning webservers that queues individual requests and provides them as a batch\nto your model. It can be added to any application with minimal refactoring simply by inserting the [`batched_fn`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html)\nmacro into the function that runs requests through the model.\n\n## Features\n\n- 🚀 Easy to use: drop the `batched_fn!` macro into existing code.\n- 🔥 Lightweight and fast: queue system implemented on top of the blazingly fast [flume crate](https://github.com/zesterer/flume).\n- 🙌 Easy to tune: simply adjust [`max_delay`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) and [`max_batch_size`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config).\n- 🛑 [Back pressure](https://medium.com/@jayphelps/backpressure-explained-the-flow-of-data-through-software-2350b3e77ce7) mechanism included:\n  just set [`channel_cap`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) and handle\n  [`Error::Full`](https://docs.rs/batched-fn/latest/batched_fn/enum.Error.html#variant.Full) by returning a 503 from your webserver.\n\n## Examples\n\nSuppose you 
have a model API that looks like this:\n\n```rust\n// `Batch` could be anything that implements the `batched_fn::Batch` trait.\ntype Batch\u003cT\u003e = Vec\u003cT\u003e;\n\n#[derive(Debug)]\nstruct Input {\n    // ...\n}\n\n#[derive(Debug)]\nstruct Output {\n    // ...\n}\n\nstruct Model {\n    // ...\n}\n\nimpl Model {\n    fn predict(\u0026self, batch: Batch\u003cInput\u003e) -\u003e Batch\u003cOutput\u003e {\n        // ...\n    }\n\n    fn load() -\u003e Self {\n        // ...\n    }\n}\n```\n\nWithout `batched-fn` a webserver route would need to call `Model::predict` on each\nindividual input, resulting in a bottleneck from under-utilizing the GPU:\n\n```rust\nuse once_cell::sync::Lazy;\nstatic MODEL: Lazy\u003cModel\u003e = Lazy::new(Model::load);\n\nfn predict_for_http_request(input: Input) -\u003e Output {\n    let mut batched_input = Batch::with_capacity(1);\n    batched_input.push(input);\n    MODEL.predict(batched_input).pop().unwrap()\n}\n```\n\nBut by dropping the [`batched_fn`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html) macro into your code you automatically get batched\ninference behind the scenes without changing the one-to-one relationship between inputs and\noutputs:\n\n```rust\nasync fn predict_for_http_request(input: Input) -\u003e Output {\n    let batch_predict = batched_fn! 
{\n        handler = |batch: Batch\u003cInput\u003e, model: \u0026Model| -\u003e Batch\u003cOutput\u003e {\n            model.predict(batch)\n        };\n        config = {\n            max_batch_size: 16,\n            max_delay: 50,\n        };\n        context = {\n            model: Model::load(),\n        };\n    };\n    batch_predict(input).await.unwrap()\n}\n```\n\n❗️ *Note that the `predict_for_http_request` function now has to be `async`.*\n\nHere we set the [`max_batch_size`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) to 16 and [`max_delay`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config)\nto 50 milliseconds. This means the batched function will wait at most 50 milliseconds after receiving a single\ninput to fill a batch of 16. If 15 more inputs are not received within 50 milliseconds\nthen the partial batch will be run as-is.\n\n## Tuning max batch size and max delay\n\nThe optimal batch size and delay will depend on the specifics of your use case, such as how big a batch you can fit in memory\n(typically on the order of 8, 16, 32, or 64 for a deep learning model) and how long a delay you can afford.\nIn general you want to set `max_batch_size` as high as you can, assuming the total processing time for `N` examples is minimized\nwith a batch size of `N`, and keep `max_delay` small relative to the time it takes for your\nhandler function to process a batch.\n\n## Implementation details\n\nWhen the `batched_fn` macro is invoked it spawns a new thread where the\n[`handler`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#handler) will\nbe run. 
Within that thread, every object specified in the [`context`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#context)\nis initialized and then passed by reference to the handler each time it is run.\n\nThe object returned by the macro is just a closure that sends a single input and a callback\nthrough an asynchronous channel to the handler thread. When the handler finishes\nrunning a batch, it invokes each input's callback with the corresponding output,\nwhich triggers the closure to wake up and return the output.\n\n\u003c!-- cargo-rdme end --\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepwalsh%2Fbatched-fn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepwalsh%2Fbatched-fn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepwalsh%2Fbatched-fn/lists"}