{"id":34102988,"url":"https://github.com/rcorrero/light-pipe","last_synced_at":"2025-12-14T17:06:45.421Z","repository":{"id":60934840,"uuid":"519015405","full_name":"rcorrero/light-pipe","owner":"rcorrero","description":"A high-level syntax for data pipelines, designed to make pipeline development quick and painless.","archived":false,"fork":false,"pushed_at":"2024-08-05T16:44:19.000Z","size":1595,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-11-27T18:34:09.664Z","etag":null,"topics":["data","data-pipelines","data-processing","geospatial-analysis","geospatial-processing","pipeline"],"latest_commit_sha":null,"homepage":"https://www.light-pipe.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rcorrero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-07-28T22:54:43.000Z","updated_at":"2024-08-05T16:44:22.000Z","dependencies_parsed_at":"2023-12-31T00:13:03.567Z","dependency_job_id":"f3520516-e56c-498c-8761-7661132f843e","html_url":"https://github.com/rcorrero/light-pipe","commit_stats":{"total_commits":78,"total_committers":1,"mean_commits":78.0,"dds":0.0,"last_synced_commit":"aac3a65f2a2f390094067c3b8d3943c90d2ea239"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/rcorrero/light-pipe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcorrero%2Flight-pipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcorrero%2Flight-pipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcorrero%2Flight-pipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcorrero%2Flight-pipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rcorrero","download_url":"https://codeload.github.com/rcorrero/light-pipe/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcorrero%2Flight-pipe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27732237,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-14T02:00:11.348Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-pipelines","data-processing","geospatial-analysis","geospatial-processing","pipeline"],"created_at":"2025-12-14T17:06:44.857Z","updated_at":"2025-12-14T17:06:45.415Z","avatar_url":"https://github.com/rcorrero.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [Light-Pipe](https://github.com/rcorrero/light-pipe)\n\n---\n\n## Overview\n\n[Light-Pipe](https://www.light-pipe.io/) is a high-level syntax for data pipelines, designed to make pipeline development quick and painless. It is an extensible, light-weight Python framework with zero non-standard dependencies for data pipelines that scale effortlessly. It abstracts away the implementation details of the pipeline itself, meaning that the developer only has to define the transformations performed within the pipeline on individual units of data.\n\nPipelines defined using Light-Pipe scale effortlessly, with native support for all forms of concurrency, allowing for the mixing and matching of asynchronous, multi-threaded, and multi-process operations all within a single pipeline. It's easily extensible for use with distributed processing services such as [Celery](https://docs.celeryq.dev/en/stable/). It's also super fast and efficient, having been used to perform critical geospatial data processing tasks [at least an order of magnitude faster than existing systems](https://github.com/rcorrero/light-pipe/blob/depth_first/data/plots/test_geo_tiling.png).\n\nLight-Pipe is released under a [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause).\n\n## Installing Light-Pipe\n\n```console\n$ pip install light-pipe\n```\n\n## A Basic Example\n\n```python\n\u003e\u003e\u003e from light_pipe import make_data, make_transformer\n\u003e\u003e\u003e \n\u003e\u003e\u003e \n\u003e\u003e\u003e @make_data\n\u003e\u003e\u003e def gen_dicts(x: int):\n\u003e\u003e\u003e     for i in range(x):\n\u003e\u003e\u003e         yield {\n\u003e\u003e\u003e             \"one\": 3 * i, \n\u003e\u003e\u003e             \"two\": 3 * i + 1, \n\u003e\u003e\u003e             \"three\": 3 * i + 2\n\u003e\u003e\u003e         }\n\u003e\u003e\u003e \n\u003e\u003e\u003e @make_transformer\n\u003e\u003e\u003e def get_third(one: int, two: int, three: int):\n\u003e\u003e\u003e     print(f\"Third: {three}\")\n\u003e\u003e\u003e     return three\n\u003e\u003e\u003e \n\u003e\u003e\u003e \n\u003e\u003e\u003e data = gen_dicts(x=3, store_results=True)\n\u003e\u003e\u003e data \u003e\u003e get_third()\n\u003e\u003e\u003e \n\u003e\u003e\u003e print(data(block=True))\nThird: 2\nThird: 5\nThird: 8\n[2, 5, 8]\n\u003e\u003e\u003e\n\u003e\u003e\u003e print(data(block=True))\n[2, 5, 8]\n```\n\n## A (Slightly) More Interesting Example\n\n```python\n\u003e\u003e\u003e import asyncio\n\u003e\u003e\u003e import time\n\u003e\u003e\u003e \n\u003e\u003e\u003e from light_pipe import AsyncGatherer, make_data, make_transformer\n\u003e\u003e\u003e \n\u003e\u003e\u003e \n\u003e\u003e\u003e @make_data\n\u003e\u003e\u003e def gen(x: int):\n\u003e\u003e\u003e     yield from range(x)\n\u003e\u003e\u003e \n\u003e\u003e\u003e \n\u003e\u003e\u003e @make_transformer\n\u003e\u003e\u003e async def add_one(x: int):\n\u003e\u003e\u003e     await asyncio.sleep(1)\n\u003e\u003e\u003e     return x + 1\n\u003e\u003e\u003e \n\u003e\u003e\u003e \n\u003e\u003e\u003e data = gen(x=8)\n\u003e\u003e\u003e \n\u003e\u003e\u003e t = add_one(parallelizer=AsyncGatherer())\n\u003e\u003e\u003e \n\u003e\u003e\u003e \n\u003e\u003e\u003e for _ in range(10):\n\u003e\u003e\u003e     data \u003e\u003e t\n\u003e\u003e\u003e \n\u003e\u003e\u003e start = time.time()\n\u003e\u003e\u003e print(data(block=True))\n[12, 10, 14, 11, 16, 15, 13, 17]\n\u003e\u003e\u003e \n\u003e\u003e\u003e end = time.time()\n\u003e\u003e\u003e diff = end - start\n\u003e\u003e\u003e print(f\"Total time to execute tasks: {diff:.1f} seconds.\")\nTotal time to execute tasks: 10.0 seconds.\n```\n\n## More Information\n\n- [GitHub](https://github.com/rcorrero/light-pipe)\n\n- [Documentation](https://www.light-pipe.io/)\n\n---\n\nCopyright 2020-Present [Richard Correro](https://www.richardcorrero.com/).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcorrero%2Flight-pipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frcorrero%2Flight-pipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcorrero%2Flight-pipe/lists"}