{"id":30668380,"url":"https://github.com/sitetester/auto-batching-proxy","last_synced_at":"2026-05-19T07:07:22.212Z","repository":{"id":311311881,"uuid":"1043294340","full_name":"sitetester/auto-batching-proxy","owner":"sitetester","description":"It will automatically batch inference requests from multiple independent users together in a single batch request for efficiency, so that for users the interface looks like individual requests, but internally it is handled as a batch request","archived":false,"fork":false,"pushed_at":"2025-08-28T15:28:59.000Z","size":18853,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-01T00:23:30.142Z","etag":null,"topics":["proxy","rest","rocket","tei","text-embeddings-inference"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sitetester.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-23T14:52:34.000Z","updated_at":"2025-08-28T15:29:03.000Z","dependencies_parsed_at":"2025-08-24T07:13:34.551Z","dependency_job_id":null,"html_url":"https://github.com/sitetester/auto-batching-proxy","commit_stats":null,"previous_names":["sitetester/auto-batching-proxy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sitetester/auto-batching-proxy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sitetester%2Fauto-batching-proxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sitetester%2Fa
uto-batching-proxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sitetester%2Fauto-batching-proxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sitetester%2Fauto-batching-proxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sitetester","download_url":"https://codeload.github.com/sitetester/auto-batching-proxy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sitetester%2Fauto-batching-proxy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281549789,"owners_count":26520515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-29T02:00:06.901Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["proxy","rest","rocket","tei","text-embeddings-inference"],"created_at":"2025-09-01T00:00:39.212Z","updated_at":"2026-05-19T07:07:22.203Z","avatar_url":"https://github.com/sitetester.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Auto Batching Proxy\n\nThis proxy automatically batches inference requests from independent users into a single batch request for efficiency. To users the interface looks like individual requests, but internally they are handled as a batch \nrequest. Essentially, it provides a REST API wrapper around some inference service 
like https://github.com/huggingface/text-embeddings-inference\n\n\nThe proxy server is configured with the following parameters:\n\n_Max Wait Time_ - maximum time a user request can wait for other requests to accumulate in a batch  \n_Max Batch Size_ - maximum number of requests that can be accumulated in a batch.\n\n## Setup Inference Service \nFirst, try running the inference service in a container with `--model-id nomic-ai/nomic-embed-text-v1.5`\n```\ndocker run --rm -it -p 8080:80 --pull always \\\n    ghcr.io/huggingface/text-embeddings-inference:cpu-latest \\\n    --model-id nomic-ai/nomic-embed-text-v1.5\n```\nIf it fails to start, try one of the alternatives. Currently, the code is known to work with  \n`--model-id sentence-transformers/all-MiniLM-L6-v2` and `--model-id sentence-transformers/all-mpnet-base-v2`.  \nCheck [/screenshots](./screenshots) for some of the models tried.\n\n```\ndocker run --rm -it -p 8080:80 --pull always \\\n  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \\\n  --model-id sentence-transformers/all-MiniLM-L6-v2\n```\nNote: [Backend does not support a batch size \u003e 8](screenshots/run_model_status/max_batch_size.png),\nbut the proxy respects this config param and will not send more requests than the supported batch size (nor more inputs than the max inputs limit, which is 32 for `all-MiniLM-L6-v2`). 
\n\n## Setup Proxy Service\n- Either run `cargo run` at the root of the project, which launches Rocket [with default configuration params](./screenshots/cargo/cargo_run.png)\n- or run it [with custom params](./screenshots/cargo/cargo_run_with_params.png), for example: \n```\nRUST_LOG=INFO cargo run -- --max-batch-size 50 --max-wait-time-ms 3000\n```\n\n\n**[Unit tests](https://doc.rust-lang.org/book/ch11-03-test-organization.html#unit-tests)**   \nRelevant unit tests are provided inside the `/src` source files.\n\n**[Integration tests](https://doc.rust-lang.org/book/ch11-03-test-organization.html#integration-tests)**  \nCheck the `/tests` folder; the tests cover various scenarios.\n\nRun all tests via `cargo test`. Currently, the tests are verified to pass against  \n`--model-id sentence-transformers/all-MiniLM-L6-v2` and `--model-id sentence-transformers/all-mpnet-base-v2`.\nThe tests also document why each part of the code was written and which use case it covers.\n\nUse the following simple curl commands for quick testing:\n- for the inference service\n```\ncurl -X POST http://localhost:8080/embed \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"inputs\": [\"Hello world\"]}'\n```\n- for the proxy\n```\ncurl -X POST http://localhost:3000/embed \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"inputs\": [\"Hello\", \"World\"]}'\n```\nTo verify that the proxy handles multiple concurrent requests, run:\n```\ncd scripts\n./proxy_concurrent_calls.sh\n```\n\n**Benchmark test results:**  \nThe following output is taken from:\n```\n$ RUSTFLAGS=\"-A dead_code\" cargo test test_compare_single_input_inference_service_vs_auto_batching_proxy_with_x_separate_requests -- --nocapture\n```\n- `-A` is short for `--allow` (it suppresses warnings generated for functions reported as unused, even though they are actually used in tests) \n- [--nocapture](https://doc.rust-lang.org/cargo/commands/cargo-test.html#display-options) disables output capture, so output printed by the tests is displayed \n\n[Full 
output:](./screenshots/timing_summary_full.png)\n![timing_summary.png](screenshots/timing_summary.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsitetester%2Fauto-batching-proxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsitetester%2Fauto-batching-proxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsitetester%2Fauto-batching-proxy/lists"}