# AsyncBatcher - Asynchronous Batching for Python

## Overview
AsyncBatcher is a generic, asynchronous batch processor for Python. It groups incoming items into batches and processes
each batch asynchronously. It is designed for scenarios where many requests or tasks are handled more efficiently, and
with higher throughput, in batches than one at a time.

## Key Features
- Asynchronous processing: Uses `asyncio` for non-blocking execution.
- Batching mechanism: Groups items into batches based on a maximum batch size and a maximum queue time.
- Concurrency control: Limits the number of batches processed concurrently.
- Custom processing logic: Users must implement the `process_batch` method to define how a batch is processed.
- Queue management: Uses an `asyncio.Queue` to hold incoming items.
- Error handling: Ensures robust error reporting and handling.

## How it works

### 1. Receiving Items for Processing
- Users call `process(item)`, which adds the item to an internal queue together with a `Future`.
- The caller awaits the result asynchronously; the future is resolved once the item's batch has been processed.

### 2. Queue Management and Batching
- A background task (`run()`), started on the first call to `process()`, continuously monitors the queue.
- Items are collected into batches based on:
  - `max_batch_size`: Maximum number of items per batch.
  - `max_queue_time`: Maximum time an item can wait in the queue before its batch is processed.
- Once a batch is ready, it is passed to the processing function.

### 3. Processing the Batch
- If `process_batch` is asynchronous, it is awaited directly.
- If `process_batch` is synchronous, it runs in an executor (via `run_in_executor`), so it does not block the event loop.
- Each item's future is resolved with the result at the same position in the list returned by `process_batch`.

### 4. Concurrency Control
- If `concurrency > 0`, a semaphore limits the number of batches processed simultaneously to `concurrency`.
- Otherwise, there is no limit and all batches run concurrently.

### 5. Stopping the Batcher
- Calling `stop(force=True)` cancels all ongoing tasks.
- Calling `stop(force=False)` waits for pending items to be processed before shutting down.

```mermaid
sequenceDiagram
    participant User
    participant AsyncBatcher
    participant Queue as asyncio.Queue
    participant RunLoop
    participant Semaphore
    participant BatchProcessor

    User->>AsyncBatcher: process(item)
    activate AsyncBatcher
    AsyncBatcher->>Queue: put(QueueItem(item, future))
    AsyncBatcher-->>User: returns future
    deactivate AsyncBatcher

    Note over AsyncBatcher: Starts RunLoop on first process()

    loop Run Loop (run() method)
        RunLoop->>Queue: Collect items (max_batch_size/max_queue_time)
        activate Queue
        Queue-->>RunLoop: Batch [QueueItem1, QueueItem2...]
        deactivate Queue

        alt Concurrency Limited (concurrency > 0)
            RunLoop->>Semaphore: acquire()
            activate Semaphore
            Semaphore-->>RunLoop: acquired
            deactivate Semaphore
        end

        RunLoop->>BatchProcessor: create_task(_batch_run(batch))
        activate BatchProcessor

        alt Async process_batch
            BatchProcessor->>AsyncBatcher: await process_batch(batch)
        else Sync process_batch
            BatchProcessor->>Executor: run_in_executor(process_batch)
        end

        AsyncBatcher-->>BatchProcessor: results [S1, S2...]
        BatchProcessor->>QueueItem1.future: set_result(S1)
        BatchProcessor->>QueueItem2.future: set_result(S2)
        deactivate BatchProcessor

        alt Concurrency Limited
            RunLoop->>Semaphore: release()
        end
    end

    Note over User, BatchProcessor: User's await future gets resolved
```

## How to use

To use the library, you need to install the package in your environment.
You can install it with pip:

```bash
pip install async-batcher
```

Then create a subclass of `AsyncBatcher` and implement its `process_batch` method:

```python
import asyncio
import logging

from async_batcher.batcher import AsyncBatcher

class MyBatchProcessor(AsyncBatcher[int, int]):
    async def process_batch(self, batch: list[int]) -> list[int]:
        await asyncio.sleep(1)  # Simulate processing delay
        return [x * 2 for x in batch]  # Example: Doubling each item

async def main():
    batcher = MyBatchProcessor(max_batch_size=5, max_queue_time=2.0, concurrency=2)
    results = await asyncio.gather(*[batcher.process(i) for i in range(10)])
    print(results)  # Output: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    await batcher.stop()

# Set logging level to DEBUG if you want to see more details and understand the flow
logging.basicConfig(level=logging.DEBUG)
asyncio.run(main())
```

## Benchmark

Benchmark results are available in [BENCHMARK.md](https://github.com/hussein-awala/async-batcher/blob/main/BENCHMARK.md).

## When to Use AsyncBatcher?
AsyncBatcher is well suited to applications that need to handle many asynchronous requests efficiently by processing
them in batches, such as:

### Machine Learning Model Serving
- Batching inference requests to improve model-serving throughput (e.g., TensorFlow, PyTorch, scikit-learn).

### Database Bulk Operations
- Inserting multiple records in a single query or batch request to improve I/O efficiency and reduce costs (e.g.,
  PostgreSQL, MySQL, AWS DynamoDB).
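
The bulk-insert case can be sketched end to end with only the standard library. This is a minimal sketch under stated assumptions, not the library's implementation: `MiniBatcher` is a hypothetical stand-in that mimics the batching described under How it works (flush a batch at `max_batch_size` items or after `max_queue_time` seconds, run a synchronous batch function in an executor, resolve each item's future by position). Unlike `AsyncBatcher`, it processes one batch at a time and has no `concurrency` option. An in-memory SQLite database stands in for PostgreSQL or MySQL, and `insert_batch` and the `events` table are illustrative names.

```python
import asyncio
import sqlite3


class MiniBatcher:
    """Hypothetical, simplified stand-in for AsyncBatcher (not the library's code).

    A background task collects queued items until the batch holds
    max_batch_size items or max_queue_time seconds have passed since its
    first item was dequeued, runs the synchronous batch function in the
    default executor, and resolves each caller's future.
    """

    def __init__(self, process_batch, max_batch_size=100, max_queue_time=0.01):
        self._process_batch = process_batch
        self._max_batch_size = max_batch_size
        self._max_queue_time = max_queue_time
        self._queue = asyncio.Queue()
        self._task = None

    async def process(self, item):
        if self._task is None:
            # Start the background loop lazily, on the first call.
            self._task = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until the first item arrives
            deadline = loop.time() + self._max_queue_time
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            try:
                results = await loop.run_in_executor(None, self._process_batch, items)
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)  # a batch-level failure fails every item
            else:
                for (_, fut), result in zip(batch, results):
                    if not fut.done():
                        fut.set_result(result)  # results are matched to items by position

    async def stop(self):
        # Cancel the background loop (similar in spirit to stop(force=True)).
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


# In-memory SQLite stands in for the database; the executor thread shares the connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE events (name TEXT)")
batch_sizes = []


def insert_batch(names):
    # One executemany() call and one commit per batch instead of one per item.
    batch_sizes.append(len(names))
    with conn:
        conn.executemany("INSERT INTO events (name) VALUES (?)", [(n,) for n in names])
    return [True] * len(names)


async def main():
    batcher = MiniBatcher(insert_batch, max_batch_size=4, max_queue_time=0.05)
    results = await asyncio.gather(*(batcher.process(f"event-{i}") for i in range(10)))
    await batcher.stop()
    return results


results = asyncio.run(main())
```

Here the 10 inserts are written in batches of at most 4 rows, each in its own transaction, rather than one transaction per row. With the real package, the body of `insert_batch` would go in a synchronous `process_batch` on an `AsyncBatcher` subclass, which the README says is run inside an executor.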

### Messaging and Network Optimization
- Sending multiple messages in a single API call to reduce per-call overhead and costs (e.g., Kafka, RabbitMQ, AWS SQS,
  AWS SNS).

### Rate-Limited API Calls
- Aggregating requests to stay within API rate limits (e.g., GitHub API, Twitter API, OpenAI API).

## Final Notes
- Implement `process_batch` according to your needs.
- Tune `max_batch_size` and `max_queue_time` to your performance requirements: larger values favor throughput, while
  smaller values shorten how long an item can wait before it is processed.
- Handle exceptions inside `process_batch` so that a failure does not affect other tasks.
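
One way to follow the last note is to catch errors per item inside `process_batch` and return them as values, so a single bad input cannot fail the other items of the same batch. The sketch below is self-contained and does not import the library: `ItemResult` and `parse_one` are hypothetical names, and in a real project this function would be the `process_batch` method of your `AsyncBatcher` subclass (taking `self`), typed for example as `AsyncBatcher[str, ItemResult]`.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemResult:
    """Hypothetical per-item result wrapper (not part of async-batcher)."""

    value: Optional[float] = None
    error: Optional[str] = None


def parse_one(raw: str) -> float:
    # Hypothetical per-item work that can fail on bad input.
    return float(raw)


async def process_batch(batch: list[str]) -> list[ItemResult]:
    # Catch failures per item so one bad input does not fail the whole batch:
    # every caller still gets a result at the same position as its item.
    results = []
    for raw in batch:
        try:
            results.append(ItemResult(value=parse_one(raw)))
        except ValueError as exc:
            results.append(ItemResult(error=str(exc)))
    return results


results = asyncio.run(process_batch(["1.5", "oops", "2"]))
# results[1] carries the error message; results[0] and results[2] carry values.
```

The trade-off is that callers must check `error` on each result instead of relying on an exception; letting an exception escape `process_batch` instead would affect every item in the batch.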