https://github.com/austinv11/pypeline
A simple data pipeline builder for Python 3+
- Host: GitHub
- URL: https://github.com/austinv11/pypeline
- Owner: austinv11
- License: apache-2.0
- Created: 2018-09-04T03:30:34.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-05T01:15:06.000Z (about 7 years ago)
- Last Synced: 2025-07-22T14:39:48.906Z (5 months ago)
- Topics: data, leveldb, pypeline, python, python3, stream-processing
- Language: Python
- Homepage: https://pypi.org/project/data-pypeline/
- Size: 63.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Pypeline
This is a package for creating iterative data processing pipelines. Note that
it is NOT a general-purpose stream processing library: it is designed only to
be low-overhead and simple to set up. For large-scale production applications,
use something like Kafka instead.
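To make the idea of an iterative pipeline concrete, here is a minimal, stdlib-only sketch of the data flow used in the Trivial Example section: a source step produces items, and each later step (multiply by 10, then add 1) is applied to every item in turn. The `run_steps` helper is hypothetical and is not part of pypeline's API. It only illustrates the pattern.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


# Hypothetical helper (not part of pypeline): run a source step, then pass
# every item it produces through each remaining step, in order.
async def run_steps(
    source: Callable[[], Awaitable[Iterable[Any]]],
    *steps: Callable[[Any], Awaitable[Any]],
) -> List[Any]:
    items = list(await source())
    for step in steps:
        # Apply the current step to all items concurrently; gather keeps order.
        items = await asyncio.gather(*(step(item) for item in items))
    return list(items)


async def produce():
    return range(5)


async def times_ten(i):
    return i * 10


async def plus_one(i):
    return i + 1


results = asyncio.run(run_steps(produce, times_ten, plus_one))
print(results)  # [1, 11, 21, 31, 41]
```

In this sketch each step finishes the whole batch before the next one starts. This README does not say whether pypeline itself moves items through steps one at a time or in batches.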
## Warning
This library is still in an ALPHA stage, so things may not work as intended
and the API is not final!
## Trivial Example
```python
from pypeline import build_action, Pypeline, ForkingPypelineExecutor, wrap
import asyncio


async def step1():
    # The first step produces the input items for the rest of the pipeline
    results = []
    for i in range(1000):
        results.append(wrap(i))
    return results


async def step2(i):
    return i * 10


async def step3(i):
    return i + 1


async def run_pipeline():
    pypeline = Pypeline()
    # Adding actions to the pipeline
    pypeline.add_action(build_action("Step1", step1)) \
        .add_action(build_action("Step2", step2)) \
        .add_action(build_action("Step3", step3, serialize_dir="./example"))  # Serialize results so future runs will skip this step entirely
    results = await pypeline.run(executor=ForkingPypelineExecutor())  # Custom executor that avoids the GIL
    # Results are wrapped in a utility namedtuple, so let's flatten it.
    results = [r.args[0] for r in results]
    return results


# asyncio.run() (Python 3.7+) creates and closes the event loop for us
results = asyncio.run(run_pipeline())
for result in results:
    print(result)
```
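The `serialize_dir` argument in the example stores a step's output on disk so later runs can skip that step. The sketch below illustrates that caching idea using only the standard library (`pickle` and `pathlib`). The `cached_step` decorator and the `<name>.pickle` file layout are hypothetical, and the sketch caches a whole no-argument step rather than per-item results. Pypeline's real storage format may differ; the project's topics mention LevelDB.

```python
import asyncio
import pickle
import tempfile
from pathlib import Path
from typing import Any, Awaitable, Callable


# Hypothetical decorator (not pypeline's API): persist an async step's result
# under `cache_dir` and reuse it on later runs instead of recomputing it.
def cached_step(name: str, cache_dir: str):
    def decorator(step: Callable[[], Awaitable[Any]]):
        async def wrapper() -> Any:
            path = Path(cache_dir) / f"{name}.pickle"
            if path.exists():
                # A previous run already serialized this step: load it and skip the work.
                return pickle.loads(path.read_bytes())
            result = await step()
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


calls = 0


async def expensive_step():
    global calls
    calls += 1  # count how often the real work runs
    return [i * 10 for i in range(5)]


with tempfile.TemporaryDirectory() as tmp:
    step = cached_step("Step2", tmp)(expensive_step)
    first = asyncio.run(step())   # computes and writes the cache file
    second = asyncio.run(step())  # loads from disk; expensive_step is not called again
    print(first == second, calls)  # True 1
```

Because the cache key here is only the step name, this sketch would return stale data if the step's code or inputs changed. Deleting the cache directory forces a recompute.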
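The example's comment says `ForkingPypelineExecutor` avoids the GIL. A common way to get that effect in plain Python is to run CPU-bound steps in separate worker processes instead of threads. The sketch below does this with `concurrent.futures.ProcessPoolExecutor` and `loop.run_in_executor`. It is not how pypeline's executor is implemented, `cpu_step` and `run_in_processes` are hypothetical names, and it assumes a POSIX platform where the `fork` start method is available (it is not on Windows).

```python
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from typing import List


def cpu_step(i: int) -> int:
    # Module-level so worker processes can locate it; stands in for CPU-heavy work.
    return sum(k * k for k in range(i))


async def run_in_processes(items: List[int]) -> List[int]:
    loop = asyncio.get_running_loop()
    # Assumption: "fork" is available (Linux/macOS). Each worker is a separate
    # interpreter with its own GIL, so CPU-bound calls can run in parallel.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        futures = [loop.run_in_executor(pool, cpu_step, i) for i in items]
        return list(await asyncio.gather(*futures))


results = asyncio.run(run_in_processes([0, 1, 2, 3, 4]))
print(results)  # [0, 0, 1, 5, 14]
```

Processes pay a startup and pickling cost that threads do not, so this approach only helps when each step does enough CPU work to outweigh that overhead.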