https://github.com/factorpricingmodel/prefect-yaml

Schedule your tasks with YAML configuration files in Prefect perfectly

# Prefect YAML

A package to run Prefect with YAML configuration files. For further details, please refer
to the [documentation](https://prefect-yaml.readthedocs.io/en/latest/).

## Installation

Install this via pip (or your favourite package manager):

`pip install prefect-yaml`

## Usage

Run the `prefect-yaml` command-line tool with a configuration file passed
via `-c`.

For example, the following YAML configuration is located in [examples/simple_config.yaml](examples/simple_config.yaml).

```yaml
metadata:
  output:
    directory: .output

task:
  task_a:
    caller: math:fabs
    parameters:
      - -9.0
    output:
      format: json
  task_b:
    caller: math:sqrt
    parameters:
      - !data task_a
    output:
      directory: null
  task_c:
    caller: math:fsum
    parameters:
      - [!data task_b, 1]
```
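The `!data task_a` tag in `task_b` refers to the output of `task_a`, so the references between tasks form a dependency graph that fixes the execution order. A minimal sketch of deriving that order with Python's standard `graphlib` module, assuming the `!data` references have already been extracted into a dict; `task_dependencies` and `execution_order` are hypothetical names, not part of `prefect-yaml`:

```python
from graphlib import TopologicalSorter

# Hypothetical view of examples/simple_config.yaml: each task maps to the
# set of tasks its parameters reference through the `!data` tag.
task_dependencies = {
    "task_a": set(),        # parameters: [-9.0]
    "task_b": {"task_a"},   # parameters: [!data task_a]
    "task_c": {"task_b"},   # parameters: [[!data task_b, 1]]
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return task names so that every task comes after the tasks it references."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(task_dependencies))  # ['task_a', 'task_b', 'task_c']
```

`TopologicalSorter` also raises `graphlib.CycleError` if two tasks reference each other, which is one way a tool could reject such a configuration.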

Run the following command to write the exported task outputs to the
`.output` directory under the current working directory.

```shell
prefect-yaml -c examples/simple_config.yaml
```

The output directory then contains the outputs of `task_a` and `task_c`,
each in its configured format.

```shell
% tree .output
.output
├── task_a.json
└── task_c.pickle

0 directories, 2 files
```

The expected behavior is as follows:

1. `task_a` computes `fabs(-9.0)` and dumps the result (`9.0`) to the output directory in JSON format.
2. `task_b` computes `sqrt(9.0)`, taking its input from the output of `task_a`.
3. `task_c` computes `fsum([3.0, 1.0])` and dumps the result (`4.0`) to the output directory in pickle format.

Because the output directory of `task_b` is overridden with `null`, its output is not written to disk and is passed to `task_c` in memory. The output format of `task_c`
is not specified, so its output is dumped in the default format (pickle).
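For reference, the same data flow can be traced in plain Python. This is a hypothetical sketch of the expected results only: it does not use Prefect or `prefect-yaml`, `run_simple_config` is an illustrative name, and the file layout simply mirrors the `tree` listing of `.output`.

```python
import json
import math
import pickle
import tempfile
from pathlib import Path


def run_simple_config(output_dir: Path) -> float:
    """Hypothetical plain-Python equivalent of examples/simple_config.yaml."""
    output_dir.mkdir(parents=True, exist_ok=True)

    # task_a: caller math:fabs with parameters [-9.0]; `format: json` -> task_a.json
    task_a = math.fabs(-9.0)
    (output_dir / "task_a.json").write_text(json.dumps(task_a))

    # task_b: caller math:sqrt on the output of task_a; `directory: null`,
    # so the value stays in memory and is never written to disk
    task_b = math.sqrt(task_a)

    # task_c: caller math:fsum on [task_b, 1]; no format given -> default pickle
    task_c = math.fsum([task_b, 1])
    with open(output_dir / "task_c.pickle", "wb") as f:
        pickle.dump(task_c, f)
    return task_c


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / ".output"
    result = run_simple_config(out)
    written = sorted(p.name for p in out.iterdir())
    # Read the exported values back, as a downstream consumer might
    task_a_value = json.loads((out / "task_a.json").read_text())
    with open(out / "task_c.pickle", "rb") as f:
        task_c_value = pickle.load(f)

print(written)                     # ['task_a.json', 'task_c.pickle']
print(task_a_value, task_c_value)  # 9.0 4.0
```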

For further details, please see the [configuration](https://prefect-yaml.readthedocs.io/en/latest/configuration.html) section of the documentation.

## Configuration

The `output` section defines how a task dumps and loads its return value. The `output` section under `metadata` applies globally to all tasks, while the `output` section of an individual task
overrides the global parameters for that task.
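A minimal sketch of this override rule, assuming a shallow key-by-key merge in which a task's keys replace the global ones. The library's actual merge logic may differ (for example for nested parameters), and `resolve_output_settings` is a hypothetical name. The dictionaries mirror the `output` sections of `examples/simple_config.yaml` after YAML parsing.

```python
from typing import Any, Optional


def resolve_output_settings(
    global_output: dict[str, Any], task_output: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Shallow merge: task-level keys win over the global metadata.output keys."""
    merged = dict(global_output)  # copy so the global settings are not mutated
    merged.update(task_output or {})
    return merged


global_output = {"directory": ".output"}
task_outputs = {
    "task_a": {"format": "json"},
    "task_b": {"directory": None},  # YAML `null` parses to Python None
    "task_c": None,                 # no task-level output section
}

for name, overrides in task_outputs.items():
    print(name, resolve_output_settings(global_output, overrides))
# task_a {'directory': '.output', 'format': 'json'}
# task_b {'directory': None}
# task_c {'directory': '.output'}
```

Under this reading, a resolved `directory` of `None` is what keeps `task_b` in memory, while `task_c` inherits the global directory and the default format.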

For the definitions of the parameters in each section, please see the [documentation](https://prefect-yaml.readthedocs.io/en/latest/configuration.html#output).

## Output

The built-in output formats are pickle (the default) and JSON, and users
can also define their own output formats.

For example, to use `pandas` to dump and load Parquet files with the
`pyarrow` engine by default, define the configuration as below.

```yaml
metadata:
  output:
    format: parquet
    dump-caller: object.to_parquet
    dump-parameters:
      engine: pyarrow
    load-caller: pandas:read_parquet
    load-parameters:
      engine: pyarrow
```
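The callers in these examples use two notations: `module:function` (such as `math:fabs` or `pandas:read_parquet`) and `object.<method>` (such as `object.to_parquet`), which appears to call a method on the task's return value. A minimal sketch of resolving both notations with `importlib`, assuming exactly these two conventions; `resolve_caller` is a hypothetical helper and the library's own resolution rules may differ.

```python
import importlib
from typing import Any, Callable


def resolve_caller(spec: str, obj: Any = None) -> Callable[..., Any]:
    """Resolve a caller string into a callable (hypothetical helper).

    Assumed notations, inferred from the README examples:
      "module:attr"    e.g. "pandas:read_parquet" -> attribute imported from the module
      "object.method"  e.g. "object.to_parquet"   -> bound method of `obj`
    """
    if spec.startswith("object."):
        # Method on the task's return value, e.g. DataFrame.to_parquet
        return getattr(obj, spec[len("object."):])
    module_name, _, attr_path = spec.partition(":")
    target: Any = importlib.import_module(module_name)
    for attr in attr_path.split("."):  # allows nested names such as "Class.method"
        target = getattr(target, attr)
    return target


# With pandas and pyarrow installed, the Parquet settings above would roughly mean:
#   resolve_caller("object.to_parquet", df)(path, engine="pyarrow")
#   df = resolve_caller("pandas:read_parquet")(path, engine="pyarrow")
# Stdlib-only demonstration of both notations:
print(resolve_caller("math:fabs")(-9.0))            # 9.0
print(resolve_caller("object.upper", "parquet")())  # PARQUET
```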

All output parameters, such as the directory, dumper and loader, can be overridden
at the task level. You can also choose which tasks export their outputs to the output
directory, while the outputs of the remaining tasks are only passed downstream in memory.

For further details, please see the [output](https://prefect-yaml.readthedocs.io/en/latest/output.html) section of the documentation.

## Roadmap

The project is still under development. Most of the basic features are
available, and the following features are coming soon:

- Multi-cloud storage support
- Support for subtasks within each task

## Contributing

Contributions of all levels are welcome. Please refer to the [contributing](https://prefect-yaml.readthedocs.io/en/latest/contributing.html)
section for development and release guidelines.