{"id":21113888,"url":"https://github.com/nyoungstudios/multiflow","last_synced_at":"2025-03-14T10:26:21.115Z","repository":{"id":57443935,"uuid":"384309958","full_name":"nyoungstudios/multiflow","owner":"nyoungstudios","description":"A Python multithreading library for data processing pipelines, data streaming, etc.","archived":false,"fork":false,"pushed_at":"2021-12-01T03:55:34.000Z","size":379,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-21T04:41:34.904Z","etag":null,"topics":["concurrency","data-streaming","multithreading","python","thread-pool"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/multiflow/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nyoungstudios.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-09T03:25:32.000Z","updated_at":"2024-07-11T16:00:43.000Z","dependencies_parsed_at":"2022-09-14T01:02:19.740Z","dependency_job_id":null,"html_url":"https://github.com/nyoungstudios/multiflow","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyoungstudios%2Fmultiflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyoungstudios%2Fmultiflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyoungstudios%2Fmultiflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nyoungstudios%2Fmultiflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nyoungstud
ios","download_url":"https://codeload.github.com/nyoungstudios/multiflow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243560672,"owners_count":20310969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrency","data-streaming","multithreading","python","thread-pool"],"created_at":"2024-11-20T01:59:26.249Z","updated_at":"2025-03-14T10:26:21.093Z","avatar_url":"https://github.com/nyoungstudios.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003emultiflow\u003c/h1\u003e\n\n[![tests](https://github.com/nyoungstudios/multiflow/actions/workflows/python-test.yml/badge.svg)](https://github.com/nyoungstudios/multiflow/actions/workflows/python-test.yml)\n[![codecov](https://codecov.io/gh/nyoungstudios/multiflow/branch/main/graph/badge.svg?token=9M2UZ4WJ36)](https://codecov.io/gh/nyoungstudios/multiflow)\n[![Gitpod ready](https://img.shields.io/badge/Gitpod-ready-blue?logo=gitpod)](https://gitpod.io/#https://github.com/nyoungstudios/multiflow)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/multiflow.svg)](https://pypi.python.org/project/multiflow/)\n[![PyPI license](https://img.shields.io/pypi/l/multiflow.svg)](https://pypi.python.org/project/multiflow/)\n\n## About\n`multiflow` is a Python multithreading library for data processing pipelines/workflows, streaming, etc. It extends `concurrent.futures` by allowing the input and output to be generator objects. 
It also makes it easy to string multiple thread pools together to create a multithreaded pipeline.\n\nAdditionally, `multiflow` comes with periodic logging, automatic retries, error handling, and argument expansion.\n\n## Why?\nThe ability to accept an input generator object while yielding an output generator object makes it ideal for concurrently doing multiple jobs where the output of the first job is the input of the second job. This means it can start work on the second job before the first job completes, so the total work finishes sooner.\n\nA great use case for this is streaming data. For example, with `multiflow` and [`smart_open`](https://github.com/RaRe-Technologies/smart_open), you could stream images from S3 and process them in a multithreaded environment before exporting them elsewhere.\n\n## Install\n```sh\npip install multiflow\n```\n\n## Quickstart\n```python\nfrom multiflow import MultithreadedFlow\n\n\nimage_paths = []  # list of images\n\n\ndef transform(image_path):\n    # do some work; placeholder that returns the input path unchanged\n    new_path = image_path\n    return new_path\n\n\nwith MultithreadedFlow() as flow:\n    flow.consume(image_paths)  # can accept generator object or iterable item (see examples below for generator)\n    flow.add_function(transform)\n\n    for output in flow:\n        if output:  # if successful\n            print(output)  # new_path\n        else:\n            e = output.get_exception()\n\n    success = flow.get_successful_job_count()\n    failed = flow.get_failed_job_count()\n\n```\n\n## Examples\nFor a working program using `multiflow`, see this [example](https://github.com/nyoungstudios/multiflow/blob/main/examples/resize/resize.py), which resizes the images in an S3 bucket to 50% and saves the resized images locally.\n\n## Documentation\nThe documentation is still a work in progress; for the most up-to-date version, please see this 
[page](https://github.com/nyoungstudios/multiflow/blob/main/docs/thread.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyoungstudios%2Fmultiflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnyoungstudios%2Fmultiflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyoungstudios%2Fmultiflow/lists"}