{"id":16107252,"url":"https://github.com/neonwatty/quick_batch","last_synced_at":"2026-05-02T03:31:43.833Z","repository":{"id":167650513,"uuid":"643025222","full_name":"neonwatty/quick_batch","owner":"neonwatty","description":"ultra simple command line tool for docker-scaling batch processing","archived":false,"fork":false,"pushed_at":"2023-06-09T19:23:01.000Z","size":9692,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-04T17:34:04.964Z","etag":null,"topics":["containerization","data-science","deep-learning","docker","large-scale","machine-learning","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neonwatty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-19T22:43:57.000Z","updated_at":"2023-05-24T13:23:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"33ece1f2-51de-443f-a47b-b08c4024ce48","html_url":"https://github.com/neonwatty/quick_batch","commit_stats":null,"previous_names":["jermwatt/quick_batch","jermwatt/qbatch","jermwatt/quickbatch","neonwatty/quick_batch"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/neonwatty/quick_batch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonwatty%2Fquick_batch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonwatty%2Fquick_batch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonwatty%2Fquick_batch/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonwatty%2Fquick_batch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neonwatty","download_url":"https://codeload.github.com/neonwatty/quick_batch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonwatty%2Fquick_batch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32522245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T01:12:54.858Z","status":"online","status_checked_at":"2026-05-02T02:00:05.923Z","response_time":132,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["containerization","data-science","deep-learning","docker","large-scale","machine-learning","python"],"created_at":"2024-10-09T19:15:46.631Z","updated_at":"2026-05-02T03:31:43.735Z","avatar_url":"https://github.com/neonwatty.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Python application](https://github.com/jermwatt/quick_batch/actions/workflows/python-app.yml/badge.svg)](https://github.com/jermwatt/quick_batch/actions/workflows/python-app.yml)\n[![Upload Python Package](https://github.com/jermwatt/quick_batch/actions/workflows/python-publish.yml/badge.svg)](https://github.com/jermwatt/quick_batch/actions/workflows/python-publish.yml)\n\n# quick_batch\n\n`quick_batch` is an ultra-simple command-line tool for 
large batch python-driven processing and transformation.  It was designed to be fast to deploy, transparent, and portable.  This allows you to scale any `processor` function that needs to be run over a large set of input data, enabling batch/parallel processing of the input with minimal setup and teardown.\n\n\n- [quick\\_batch](#quick_batch)\n- [Getting started](#getting-started)\n  - [Usage](#usage)\n  - [Scaling](#scaling)\n  - [Installation](#installation)\n  - [The `processor.py` file](#the-processorpy-file)\n- [Why use quick\\_batch](#why-use-quick_batch)\n\n# Getting started\n\nAll you need to scale batch transformations with `quick_batch` is:\n\n- one or more transformation functions in a `processor.py` file\n- a `Dockerfile` containing a container build appropriate to your processor\n- an optional `requirements.txt` file listing required Python modules\n\nRecord the paths to these files, along with other parameters, in a `config.yaml` file of the form below.\n\nUnder `processor` you can define either a `dockerfile_path` pointing to your Dockerfile or an `image_name` for a pre-built image to be pulled.\n\n\n```yaml\ndata:\n  input_path: /path/to/your/input/data\n  output_path: /path/to/your/output/data\n  log_path: /path/to/your/log/file\n\nqueue:\n  feed_rate: \u003cint - number of examples processed per processor instance\u003e\n  order_files: \u003cboolean - whether or not to order input files by size\u003e\n\nprocessor:\n  dockerfile_path: /path/to/your/Dockerfile OR\n  image_name: \u003cimage_name_to_pull\u003e\n  requirements_path: /path/to/your/requirements.txt\n  processor_path: /path/to/your/processor/processor.py\n  num_processors: \u003cint - instances of processor to run in parallel\u003e\n\n```\n\n`quick_batch` will point your `processor.py` at the `input_path` defined in this `config.yaml` and process the files found there in parallel, at a scale given by your choice of `num_processors`.  
\n\nOutput will be written to the `output_path` specified in the configuration file.\n\nSee the `examples` directory for examples of valid configs, processors, requirements, and dockerfiles.\n\n\n## Usage\n\nTo start processing with your `config.yaml`, use `quick_batch`'s `config` command at the terminal by typing\n\n```bash\nquick_batch config /path/to/your/config.yaml\n```\n\nThis will start the build and deploy process for processing your data as defined in your `config.yaml`.\n\n## Scaling\n\nUse the `scale` command to manually scale the number of processors / containers running your process:\n\n```bash\nquick_batch scale \u003cnum_processors\u003e\n```\n\nHere `\u003cnum_processors\u003e` is an integer \u003e= 1.  For example, to scale to 3 parallel processors / containers: `quick_batch scale 3`.\n\n## Installation\n\nTo install quick_batch, simply use `pip`:\n\n```bash\npip install quick-batch\n```\n\n## The `processor.py` file\n\nCreate a `processor.py` file with the following basic pattern:\n\n```python\nimport ...\n\ndef processor(todos):\n  for file_name in todos.file_paths_to_process:\n    # processing code\n```\n\nThe `todos` object passed to `processor` carries `feed_rate` file names to process in its `.file_paths_to_process` attribute.  
\n\nNote: the function name `processor` is mandatory.\n\n\n# Why use quick_batch\n\nquick_batch aims to be\n\n- **dead simple to use:** versus standard cloud batch transformation services that require significant configuration / service understanding\n\n- **ultra fast setup:** versus setup of heavier orchestration tools like `airflow` or `mlflow`, which may be a hindrance due to time / familiarity / organisational constraints\n\n- **100% portable:** use quick_batch on any machine, anywhere\n\n- **processor-invariant:** quick_batch works with arbitrary processes, not just machine learning or deep learning tasks.\n\n- **transparent and open source:** quick_batch uses Docker under the hood and only abstracts away the not-so-fun stuff, including instantiation, scaling, and teardown.  You can still monitor your processing using familiar Docker commands (like `docker service ls`, `docker service logs`, etc.).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneonwatty%2Fquick_batch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneonwatty%2Fquick_batch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneonwatty%2Fquick_batch/lists"}