Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fkodom/unipipe
Build batch pipelines in Python that run anywhere.
cloud data-processing docker machine-learning pipeline python
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/fkodom/unipipe
- Owner: fkodom
- License: mit
- Created: 2022-07-17T03:08:37.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-06T18:52:01.000Z (about 2 years ago)
- Last Synced: 2024-10-23T03:17:05.485Z (2 months ago)
- Topics: cloud, data-processing, docker, machine-learning, pipeline, python
- Language: Python
- Homepage:
- Size: 402 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# unipipe
**Uni**fied **pipe**line library.
:warning: Experimental :warning:
* Build batch pipelines in Python that run anywhere -- on your laptop, on the server, and in the cloud.
* Easily scale local experiments to the cloud without any changes
* Save time by only writing each pipeline once
* Save money by only paying for the compute infrastructure you need
## About
`unipipe` makes it easy to build batch pipelines in Python, then run them either locally or in the cloud. It was originally created for machine learning workflows, but it works for any batch data processing pipeline.
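To give a feel for how a decorator-based pipeline DSL like this can work, here is a toy sketch of the core idea: decorated components record calls as steps instead of executing them immediately, so the same pipeline definition can later be handed to any backend. This is an illustration of the pattern only, not unipipe's actual implementation; all names here are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    func: Callable[..., Any]
    kwargs: dict[str, Any]

_steps: list[Step] = []

def component(func: Callable[..., Any]) -> Callable[..., None]:
    # Calling a decorated component records a step instead of running it.
    def record(**kwargs: Any) -> None:
        _steps.append(Step(func, kwargs))
    return record

def run_pipeline(build: Callable[[], None]) -> list[Any]:
    # Build the step list, then execute each recorded step in-process.
    _steps.clear()
    build()
    return [step.func(**step.kwargs) for step in _steps]

@component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

def pipeline() -> None:
    say_hello(name="world")

print(run_pipeline(pipeline))  # prints ['Hello, world!']
```

Because execution is deferred, the recorded steps could just as easily be shipped to a container or a cloud service instead of being run in-process.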
## Install
From PyPI:
```bash
# Minimal install
pip install unipipe

# With additional executors (e.g. 'docker', 'vertex')
pip install unipipe[vertex]
```

From source:
```bash
# Minimal install
pip install "unipipe @ git+ssh://[email protected]/fkodom/unipipe.git"

# With additional executors (e.g. 'docker', 'vertex')
pip install "unipipe[vertex] @ git+ssh://[email protected]/fkodom/unipipe.git"
```

If you'd like to contribute, install all dependencies and pre-commit hooks:
```bash
# Install all dependencies
pip install "unipipe[all] @ git+ssh://[email protected]/fkodom/unipipe.git"
# Setup pre-commit hooks
pre-commit install
```

## Getting Started
Build a pipeline once using the `unipipe` DSL:
```python
from unipipe import dsl

@dsl.component
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline
def pipeline():
    say_hello(name="world")
```

Then, run the pipeline using any of the supported backends:
```python
from unipipe import run

run(
    # Supported executors include:
    #   'python' --> runs in the current Python process
    #   'docker' --> runs each component in a separate Docker container
    #   'vertex' --> runs in GCP through Vertex, which in turn uses KFP
    executor="python",
    pipeline=pipeline(),
)
```

Expected output:
```bash
INFO:root:[say_hello-1603ae3e] - Hello, world!
```

## Run Any Python Script
Or scale **any** Python script to the cloud using the `unipipe` CLI:
```bash
# Same choices of executors as above.
unipipe run-script \
--executor vertex \
    --pipeline-root "gs://bucket-name/artifact-root/" \
./examples/ex01_hello_world.py
```

This makes experimentation easy. `unipipe` will automatically compose your script into a pipeline, and launch it with your chosen executor. [See this example for more details.](./examples/ex11_using_scripts.py)
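The "run anywhere" behavior rests on an executor abstraction: the same pipeline is handed to whichever backend you name, and each backend implements one common interface. A minimal sketch of that dispatch pattern (names are illustrative, not unipipe's internals):

```python
from typing import Callable, Protocol

class Executor(Protocol):
    def execute(self, steps: list[Callable[[], object]]) -> list[object]: ...

class PythonExecutor:
    # Runs every step in the current Python process.
    def execute(self, steps: list[Callable[[], object]]) -> list[object]:
        return [step() for step in steps]

EXECUTORS: dict[str, Executor] = {"python": PythonExecutor()}

def run(steps: list[Callable[[], object]], executor: str = "python") -> list[object]:
    # Dispatch on the executor name; a 'docker' or 'vertex' backend
    # would register itself in EXECUTORS the same way.
    return EXECUTORS[executor].execute(steps)

print(run([lambda: "Hello, world!"]))  # prints ['Hello, world!']
```

Swapping backends then changes only the `executor` string, never the pipeline itself.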
## More Examples
Link | Description
-----|------------
[Hello World](./examples/ex01_hello_world.py) | Create/run your first `unipipe` pipeline
[Hello Pipeline](./examples/ex02_hello_pipeline.py) | Create pipelines with multiple steps
[Multi-output Components](./examples/ex03_multi_output_components.py) | Build components that return more than one type-checked value
[Pipeline Arguments](./examples/ex04_pipeline_arguments.py) | Make pipelines reusable with dynamic inputs
[Dependency Management](./examples/ex05_dependency_management.py) | Install and use other Python packages in your pipelines
[Hardware Specs](./examples/ex06_hardware_specs.py) | Request hardware (CPUs, Memory, GPUs) for your pipeline runs
[Nested Pipelines](./examples/ex07_nested_pipelines.py) | Call existing pipelines from inside another pipeline
[Control Flow](./examples/ex08_control_flow.py) | Add conditional control flow to your pipelines
[Advanced Control Flow](./examples/ex09_advanced_control_flow.py) | Best practices for advanced control flow
[Private Dependencies](./examples/ex10_private_dependencies.py) | Using private Python packages
[Run Any Python Script](./examples/ex11_using_scripts.py) | Run any Python script using `unipipe`

## Why `unipipe`?
1. **`unipipe` was designed to mitigate issues with Kubeflow Pipelines (KFP).**
* Kubeflow and KFP are often used by machine learning engineers to orchestrate training jobs, data preprocessing, and other computationally intensive tasks.
2. **KFP pipelines only run on Kubeflow.**
* Kubeflow requires specialized knowledge and additional compute resources. It can be expensive and/or impractical for individuals and small teams.
* Managed, serverless platforms like Vertex (Google Cloud) exist, which automate all of that. But still, pipelines only run on KFP/Vertex -- not on your laptop.
3. **Why write the same pipeline twice?**
    * KFP developers often write the same pipeline twice: one script for their laptop, and another for the cloud.
    * TODO: Finish this section...

## TODO
1. Add executor for KFP clusters, in addition to Vertex.
2. Better up-front type checking (in progress).
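Up-front type checking (item 2) can be approximated by comparing call arguments against a component's annotations before anything runs, so mistakes surface at build time rather than mid-pipeline. A hypothetical sketch of that idea, not unipipe's actual implementation:

```python
from typing import get_type_hints

def check_call(func, **kwargs):
    # Compare keyword arguments against the function's annotations
    # before anything runs, so type errors surface up front.
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name!r} expects {expected.__name__}, got {type(value).__name__}"
            )

def say_hello(name: str) -> str:
    return f"Hello, {name}!"

check_call(say_hello, name="world")  # passes; name=123 would raise TypeError
```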