{"id":19805371,"url":"https://github.com/mbaynton/batch-framework","last_synced_at":"2025-06-17T05:34:54.660Z","repository":{"id":62525906,"uuid":"80366289","full_name":"mbaynton/batch-framework","owner":"mbaynton","description":"An API and foundational algorithms for efficient processing of long-running jobs that can be divided into small work units.","archived":false,"fork":false,"pushed_at":"2019-01-03T14:37:59.000Z","size":87,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-28T17:53:05.294Z","etag":null,"topics":["batch","batch-framework","parallelization"],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mbaynton.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-29T19:16:36.000Z","updated_at":"2019-01-03T14:38:01.000Z","dependencies_parsed_at":"2022-11-02T14:16:12.023Z","dependency_job_id":null,"html_url":"https://github.com/mbaynton/batch-framework","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbaynton%2Fbatch-framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbaynton%2Fbatch-framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbaynton%2Fbatch-framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbaynton%2Fbatch-framework/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mbaynton","download_url":"https://co
deload.github.com/mbaynton/batch-framework/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbaynton%2Fbatch-framework/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":258339177,"owners_count":22685544,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch","batch-framework","parallelization"],"created_at":"2024-11-12T09:03:41.752Z","updated_at":"2025-06-17T05:34:54.622Z","avatar_url":"https://github.com/mbaynton.png","language":"PHP","readme":"\u003ch1\u003eBatch Processing Framework\u003c/h1\u003e\n\n[![Build Status](https://travis-ci.org/mbaynton/batch-framework.svg?branch=master)](https://travis-ci.org/mbaynton/batch-framework)\n[![Coverage Status](https://coveralls.io/repos/github/mbaynton/batch-framework/badge.svg?branch=master)](https://coveralls.io/github/mbaynton/batch-framework?branch=master)\n\nThis library offers foundational algorithms and structures to enable scenarios\nwhere long-running tasks that can be divided into small work units get processed\nprogressively by successive calls to a PHP script on a webserver. This avoids\nexceeding script execution time and network timeout limitations often found in \nweb execution environments.\n\nIt emphasizes minimal overhead of the framework itself so that jobs\ncomplete as quickly as possible.\n\nFeatures include:\n * Support for processing the batch of work units across the lifespan of many\n   requests when being run in a web environment. 
This prevents individual\n   responses and webserver processes from running longer than is desirable.\n * Efficient determination of when to stop running more work units, based on\n   past work units' runtimes, so that requests complete around a target\n   duration.\n * Attention to minimizing the amount of state data and the number of trips to a\n   backing store involved in handing off between requests.\n * Support for parallel execution of embarrassingly parallelizable problems, i.e.\n   those where individual work units do not need to communicate or coordinate\n   with each other during their execution. See\n   [parallelization](#parallelization-using-multiple-runners) for details.\n * No requirement to use a particular PHP framework, but with an awareness of\n   controller and service design patterns.\n\nAs this is a library, it offers no functionality \"out of the box.\"\n\n## Dependencies\n * PHP 5.4+\n * `Psr\\Http\\Message\\ResponseInterface` available via Composer, and any \n   implementation of this interface.\n   \n## Documentation / Examples\nThe docs here will help you get started writing code that works with this\nframework. If you find gaps or have questions about the information here, you may want to\nrefer to the [Curator application on GitHub](http://github.com/curator-wik/curator),\nwhich uses and was written alongside this framework.\n\nDocumentation is accurate for `v1.0.0`.\n\n### Terms and their definitions\n  * **Runnable**:  \n    One of the user-implemented classes that model a long-running task. An instance of a Runnable\n    models and provides the implementation for a single unit of work. 
Its `run()`\n    method does the actual work/computation that furthers the Task's progress.\n  * **Runnable Iterator**:  \n    A PHP `\\Iterator` (please extend `AbstractRunnableIterator`) that produces `Runnables`\n    appropriate to the segment of the overall task that should be performed, given as\n    input the `Runner rank` and the number of `Runnables` already performed by prior\n    incarnations of the `Runner`.    \n  * **Runner**:  \n    The server-side code that runs the show. The Runner pumps the Runnable Iterator for\n    new Runnables, launches them, monitors how long Runnables are taking and how much\n    time remains in order to decide when to stop, dispatches Runnable and Task execution\n    events to Task and Controller callbacks, and initiates Runnable and Task intermediate\n    result aggregation.\n  * **Runner id**:  \n    An integer uniquely identifying a given logical `Runner`.\n    Clients are expected to create as many corresponding `Runner` requests\n    as the framework's current `Task instance state` supports, initially assigning\n    each of these requests a unique integer id that the client has not used before. \n  * **Runner incarnation**:  \n    Logically, the framework tries to create the illusion of `n` `Runnable` units of\n    work that are executed by `x` `Runners` (concurrently if `x \u003e 1`). However, in order\n    to prevent the HTTP request that started the `Runnable` from remaining incomplete\n    for longer than desired, the framework may stop launching new `Runnables`, let\n    the `Runner` stop doing work early, and signal the client to make a successive\n    request with the same `Runner id`. Each HTTP request that's handled by starting a\n    `Runner` bearing the same `Runner id` is called an *incarnation* of the runner with\n    that id. All incarnations of a `Runner` also share the same `Runner rank`.\n  * **Runner rank**:  \n    A number uniquely identifying a given `Runner` within a Task. 
If your Task only\n    supports one concurrent `Runner`, this will always be `0`. If your `Task` declares\n    support for `n` concurrent `Runner`s, this will range from `0` to `n-1`. Unlike\n    the `Runner id`, its range is always `0` to `n-1`.\n  * **Task**:  \n    One of the user-implemented classes that model a long-running task. The `Task`\n    serves as a factory for `Runnable Iterator`s, tells the framework what to do\n    with results of `Runnable`s, may intervene if a `Runnable` throws an error\n    or exception, provides methods to reduce multiple `Runnable` results\n    to simpler intermediate results, and provides a method to translate\n    the complete `Runnable` results to a `Psr\\Http\\Message\\ResponseInterface`.\n  * **Task instance state**:  \n    One of the user-implemented classes that model a long-running task. Task instance\n    state captures the variable properties of a given task execution, such as where to\n    find inputs to operate on, who (in terms of PHP session id) is currently running\n    this `Task`, how large the `Task` is estimated to be (in terms of `Runnable`s), and\n    how many concurrent `Runners` the `Task` supports. 
Typically, one can extend the\n    `TaskInstanceState` class, which handles almost everything except your task's unique inputs.\n    Note that this class is not intended to be used to capture `Runnable` output.\n  \nThis framework primarily provides an implementation of the `Runner` in the class `AbstractRunner`.\nA complete system leveraging this library will typically include a concrete extension \nof `AbstractRunner` to interface with your application's persistence layer (e.g., a\ndatabase), and a controller or other script making use of the `HttpRunnerControllerTrait`\nto handle incoming requests and interface with your application's session layer.\n\nCoding a long-running task typically involves setting up the following components:\n  - An implementation of `TaskInterface`.\n  - An extension of `AbstractRunnableIterator` to serve `Runnables`.\n  - An implementation of `RunnableInterface` to do the work units.\n  - An extension of `TaskInstanceState` to provide input properties specific to the job.\n\n### Parallelization: using multiple runners\nStrictly speaking, this framework supports concurrent execution of more than one runnable\nfrom the same Task at a time. 
But to run Runnables concurrently, a lot of other\ncode must support this, too:\n  * Your extension of `AbstractRunner` must implement its methods in a concurrency-safe\n    manner. In particular, `AbstractRunner::retrieveRunnerState()` and\n    `AbstractRunner::finalizeRunner()` should read and write their underlying storage\n    in a way that does not cause corruption or lost writes if several `Runner` instances\n    for the same Task instance run simultaneously.\n  * Your client must be programmed to send multiple concurrent batch runner requests.\n  * The work you want to do must be [embarrassingly parallelizable](https://en.wikipedia.org/wiki/Embarrassingly_parallel).\n    Each runnable can produce output, but runnables cannot take other runnables' output\n    from the `Task` as input, or otherwise interfere with each other if they access\n    a shared resource.\n  * Your `Task instance state`'s `getNumRunners()` must return more than 1 to declare\n    concurrent support for more than 1 `Runner`.\n  * The `Runnable Iterator` constructed by your `Task` must take the `Runner rank` into\n    account and assign a portion of the total `Runnable`s to each `Runner rank`,\n    as evenly as possible, so that each `Runnable` unit of work is given out to exactly one\n    `Runner`, exactly once.\n  * Your overall application (request controller, etc.) must tolerate several\n    simultaneous requests from the same user, and must not hold the [PHP session lock](http://php.net/manual/en/function.session-write-close.php)\n    while the runnables are executing.\n    \n### Why is the Task's final result always an HTTP response?\nPackaging the batch run's overall result in a standard HTTP response format enables\napplications to receive requests and decide whether or not to defer them to a batch task. \nIn either case, the HTTP response that the client is expecting is ultimately generated. 
This\nworks well when clients are implemented using libraries that support request middleware \nand the Promise pattern. The request middleware watches for raw responses that indicate \na batch task is necessary and, rather than resolving the client application code's Promise\nwith this incomplete raw response, launches `Runner` requests until it obtains the final HTTP \nresponse, with which it then resolves the original Promise.\n\n## License\nMIT\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmbaynton%2Fbatch-framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmbaynton%2Fbatch-framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmbaynton%2Fbatch-framework/lists"}