https://github.com/scailfin/flowserv-core
Reproducible and Reusable Data Analysis Workflow Server
benchmarks data-analysis reproducibility reusability workflows
- Host: GitHub
- URL: https://github.com/scailfin/flowserv-core
- Owner: scailfin
- License: mit
- Created: 2019-09-25T21:22:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-10-12T09:52:53.000Z (over 3 years ago)
- Last Synced: 2025-09-28T00:08:28.794Z (4 months ago)
- Topics: benchmarks, data-analysis, reproducibility, reusability, workflows
- Language: Python
- Homepage:
- Size: 3.83 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 7
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- License: LICENSE
=======================================================
Reproducible and Reusable Data Analysis Workflow Server
=======================================================
.. image:: https://img.shields.io/pypi/pyversions/flowserv-core.svg
    :target: https://pypi.org/pypi/flowserv-core

.. image:: https://badge.fury.io/py/flowserv-core.svg
    :target: https://badge.fury.io/py/flowserv-core

.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
    :target: https://github.com/scailfin/flowserv-core/blob/master/LICENSE

.. image:: https://github.com/scailfin/flowserv-core/workflows/build/badge.svg
    :target: https://github.com/scailfin/flowserv-core/actions?query=workflow%3A%22build%22

.. image:: https://readthedocs.org/projects/flowserv-core/badge/?version=latest
    :target: https://flowserv-core.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://codecov.io/gh/scailfin/flowserv-core/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/scailfin/flowserv-core

.. figure:: https://github.com/scailfin/flowserv-core/blob/master/docs/figures/logo.png
    :align: center
    :alt: flowServ Logo
About
=====
This repository contains the implementation of the core infrastructure for the *Reproducible and Reusable Data Analysis Workflow Server* (**flowServ**). This is an experimental prototype to support reuse and evaluation of published data analysis pipelines as well as community benchmarks of data analysis algorithms. **flowServ** is not yet another workflow engine. Instead, the aim is to provide a layer between a client (e.g., a Web user interface) and a workflow engine to facilitate the execution of defined workflow templates (as shown in the figure below). **flowServ** is designed to be independent of the underlying workflow engine.
Workflow templates contain placeholders for workflow steps and/or input data and parameters that are provided by the user (e.g., by providing Docker containers that implement the workflow steps or by uploading input data files). **flowServ** triggers and monitors the execution of the workflow for the given input values and maintains the workflow results. The API provides the functionality to submit new workflow runs and to retrieve the evaluation results of completed workflow runs.
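The idea of a template with user-supplied placeholders can be illustrated with a minimal sketch. This is not the **flowServ** template syntax or API: the template layout, the ``$name`` placeholder notation (borrowed from Python's ``string.Template``), and the ``instantiate`` helper are all assumptions made for illustration only.

```python
from string import Template

# Hypothetical template: each step is a command line with $placeholders.
# The layout and placeholder notation are illustrative assumptions,
# not the actual flowServ template syntax.
TEMPLATE = {
    "steps": [
        "python $code --input $data --output results.json --threshold $threshold"
    ]
}


def instantiate(template, arguments):
    """Return the template steps with all placeholders replaced.

    Raises a KeyError if a placeholder has no value in ``arguments``.
    """
    return [Template(step).substitute(arguments) for step in template["steps"]]


# User-provided values for one run of the (hypothetical) workflow.
commands = instantiate(
    TEMPLATE,
    {"code": "predict.py", "data": "names.txt", "threshold": 0.5},
)
print(commands)
```

Using ``substitute`` rather than ``safe_substitute`` means that a run with a missing argument fails immediately instead of producing a command that still contains an unresolved placeholder.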
.. figure:: https://github.com/scailfin/flowserv-core/blob/master/docs/figures/flowserv-overview.png
    :align: center
    :alt: ROB Architecture
**flowServ** was motivated by the `Reproducible Open Benchmarks for Data Analysis Platform (ROB) `_. The goal of ROB is to allow user communities to evaluate the performance of their different data analysis algorithms in a controlled competition-style format. In ROB, the benchmark coordinator defines the workflow template along with input data. Benchmark participants provide their own implementation of the variable workflow steps. The workflow engine processes workflows on submission. Execution results are maintained by **flowServ** in an internal database. The goal of **flowServ** is to be a more generic platform that can be used not only for benchmarks but also for other types of data analysis workflows.
More Information
================
Workflow templates are motivated by the goal of allowing users to run pre-defined data analytics workflows while providing their own input data, parameters, and code modules. Workflow templates are inspired by, but not limited to, workflow specifications for the `Reproducible Research Data Analysis Platform (REANA) `_. The `Workflow Templates Section `_ provides further information about templates and their syntax. These templates are used by **flowServ** to run workflows and to maintain benchmark results.
The **flowServ** API defines the main interface to programmatically interact with the underlying database and workflow engine. The API implementation that is included in this repository provides a default serialization of all API resources as Python dictionaries. The API is intended to be used by Web applications. These applications can be built using different frameworks. The `current default Web API implementation for ROB `_ uses the `Flask web framework `_.
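As a rough illustration of what serializing API resources as Python dictionaries can look like, the sketch below converts a run object into a plain dictionary that any Web framework could return as JSON. The ``RunHandle`` class, its fields, and the ``serialize_run`` function are hypothetical names chosen for this example. They are not taken from the **flowServ** code base.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class RunHandle:
    """Hypothetical stand-in for a workflow run resource."""

    run_id: str
    state: str
    results: Optional[Dict[str, Any]] = None


def serialize_run(run: RunHandle) -> Dict[str, Any]:
    """Serialize a run as a plain dictionary (assumed layout)."""
    doc: Dict[str, Any] = {"id": run.run_id, "state": run.state}
    # Results are included only if the run has produced any.
    if run.results is not None:
        doc["results"] = run.results
    return doc


run = RunHandle(run_id="r0001", state="SUCCESS", results={"score": 0.87})
doc = serialize_run(run)
# A Web layer would typically turn the dictionary into a JSON response body.
body = json.dumps(doc)
print(body)
```

Keeping the serialization in framework-independent dictionaries is what allows the same API layer to sit behind different Web frameworks.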
ROB currently provides two different interfaces to interact with a Web API: the `Command Line Client `_ and the `Web User Interface `_. See the respective repositories for further information on how to install and use these interfaces.
For an overview of ROB, see the `slides `_ from the ROB Demo at the `Moore-Sloan Data Science Environment's annual summit 2019 `_ and our `presentation `_ at the `Analysis Systems Topical Workshop `_.
The full documentation is also available on `readthedocs.io `_.
Note
====
**flowServ** originated from the Reproducible Open Benchmarks for Data Analysis Platform (ROB). This repository replaces `Workflow Templates `_ and the `Reproducible Benchmark Engine `_ from an earlier version of ROB.