https://github.com/joschnitzbauer/dalymi
A lightweight, data-focused and non-opinionated pipeline manager written in and for Python.
- Host: GitHub
- URL: https://github.com/joschnitzbauer/dalymi
- Owner: joschnitzbauer
- License: MIT
- Created: 2017-12-02T17:32:46.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2019-11-24T22:45:47.000Z (about 6 years ago)
- Last Synced: 2026-01-08T00:28:19.554Z (29 days ago)
- Topics: dag, data, data-science, pipeline, python, workflow
- Language: Python
- Homepage:
- Size: 158 KB
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# dalymi
*[data like you mean it]*
[Documentation](http://dalymi.readthedocs.io/en/latest/?badge=latest)
A lightweight, data-focused and non-opinionated pipeline manager written in and for Python.
--------------------------------------------------------------------------------
_dalymi_ lets you build data processing pipelines as [directed acyclic graphs](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAGs) and facilitates rapid, but controlled, model development. The goal is to prototype quickly, but scale to production with ease.
To achieve this, _dalymi_ uses "make"-style workflows: a task whose input is missing triggers the execution of the input-producing tasks before running itself. At the same time, _dalymi_ provides fine-grained control to run and undo specific parts of the pipeline for quick test iterations. This ensures reproducible output and minimizes manual errors.
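The "make"-style resolution can be illustrated with a small plain-Python sketch. This is only a conceptual illustration with hypothetical task and resource names, not dalymi's internal API:

```python
# "Make"-style resolution: a task runs only after the tasks producing
# its missing inputs have run; existing outputs are not recomputed.
producers = {            # resource -> task that produces it
    'numbers': 'create_numbers',
    'squares': 'square_numbers',
}
inputs = {               # task -> resources it consumes
    'create_numbers': [],
    'square_numbers': ['numbers'],
}
available = set()        # resources that already exist
order = []               # execution order the resolver settles on

def ensure(resource):
    """Produce `resource` if it is missing, resolving its inputs first."""
    if resource in available:
        return           # output already exists -> skip the producing task
    task = producers[resource]
    for upstream in inputs[task]:
        ensure(upstream)
    order.append(task)
    available.add(resource)

ensure('squares')
print(order)             # ['create_numbers', 'square_numbers']
```

Requesting the final resource is enough: the resolver walks the DAG upstream and schedules only what is actually missing.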
Several features facilitate _dalymi_'s goal:
- simple, non-opinionated API (most choices left to user)
- no external dependencies for pipeline execution
- one-line installation (ready for use)
- no configuration
- auto-generated command line interface for pipeline execution
- quick start, but high flexibility to customize and extend:
    - task output can be stored in any format Python can touch (local files being the default)
    - customizable command line arguments
    - templated output locations (e.g. timestamped files)
    - support for automated data integrity checks during runtime
- DAG visualization using [graphviz](https://www.graphviz.org/)
- API design encourages good development practices (modular code, defined data schemas, self-documenting code, easy workflow viz, etc.)
## Installation
_dalymi_ requires Python >= 3.5.
``` bash
pip install dalymi
```
For the latest development:
``` bash
pip install git+https://github.com/joschnitzbauer/dalymi.git
```
## Documentation
http://dalymi.readthedocs.io/
## Simple example
simple.py:
``` python
from dalymi import Pipeline
from dalymi.resources import PandasCSV
import pandas as pd

# Define resources:
numbers_resource = PandasCSV(name='numbers', loc='numbers.csv', columns=['number'])
squares_resource = PandasCSV(name='squares', loc='squares.csv', columns=['number', 'square'])

# Define the pipeline:
pl = Pipeline()

@pl.output(numbers_resource)
def create_numbers(**context):
    return pd.DataFrame({'number': range(11)})

@pl.output(squares_resource)
@pl.input(numbers_resource)
def square_numbers(numbers, **context):
    numbers['square'] = numbers['number']**2
    return numbers

if __name__ == '__main__':
    # Run the default command line interface:
    pl.cli()
```
Command line:
```bash
python simple.py run  # executes the pipeline; tasks whose output already exists are skipped
```
More examples can be found [here](https://github.com/joschnitzbauer/dalymi/tree/master/examples).
## Roadmap
- More docstrings
- Unit tests
- Parallel task processing
- REST API during pipeline run
- Web interface for pipeline run
## Warranty
Although _dalymi_ is used successfully in smaller applications, it is not yet battle-tested and lacks unit tests. If you decide to use it, be prepared to communicate issues or fix bugs (it's not a lot of code... :)).
## Contributions
... are welcome!