Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/industrydive/fileflow
Airflow plugin to transfer arbitrary files between operators
- Host: GitHub
- URL: https://github.com/industrydive/fileflow
- Owner: industrydive
- License: apache-2.0
- Created: 2016-10-26T15:58:42.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-19T01:30:48.000Z (about 6 years ago)
- Last Synced: 2024-08-03T02:06:59.116Z (5 months ago)
- Language: Python
- Homepage: http://fileflow.readthedocs.io/en/latest/
- Size: 67.4 KB
- Stars: 78
- Watchers: 12
- Forks: 21
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-apache-airflow - fileflow - Collection of modules to support large data transfers between Airflow operators through either local file system or S3. This addresses a gap where data is too large for XCOMs but too small or inconvenient for loading directly in the operator. Built by [Industry Dive](https://www.industrydive.com/). (Libraries, Hooks, Utilities)
README
# fileflow
[![Documentation Status](https://readthedocs.org/projects/fileflow/badge/?version=latest)](http://fileflow.readthedocs.io/en/latest/?badge=latest)
Fileflow is a collection of modules that support data transfer between Airflow tasks via file targets and dependencies, backed by either the local file system or S3. The concept is inherited from other pipelining systems such as Make, Drake, Pydoit, and Luigi, which organize pipeline dependencies around file targets. In some ways this is an alternative to Airflow's XCOM system, but it supports arbitrarily large and arbitrarily formatted data, whereas an XCOM can only carry a pickle as large as the backend database's BLOB (binary large object) implementation allows.
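To make the file-target idea concrete, here is a minimal sketch using only the Python standard library and a local directory. It is not fileflow's API: the helper names (`target_path`, `write_target`, `read_upstream`) and the `dag_id/task_id/run_date` path layout are assumptions made for illustration. The upstream task writes its output to a deterministic path, and the downstream task rebuilds the same path to read it.

```python
import json
import tempfile
from pathlib import Path


def target_path(base_dir, dag_id, task_id, run_date):
    """Build a deterministic file path for one task's output (hypothetical layout)."""
    return Path(base_dir) / dag_id / task_id / run_date


def write_target(base_dir, dag_id, task_id, run_date, data):
    """Upstream side: persist the task's output to its file target."""
    path = target_path(base_dir, dag_id, task_id, run_date)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(data)
    return path


def read_upstream(base_dir, dag_id, upstream_task_id, run_date):
    """Downstream side: load the file an upstream task wrote for the same run."""
    return target_path(base_dir, dag_id, upstream_task_id, run_date).read_text()


if __name__ == "__main__":
    # A temporary directory stands in for the shared storage location.
    with tempfile.TemporaryDirectory() as base:
        rows = [{"id": i, "value": i * i} for i in range(3)]
        write_target(base, "example_dag", "extract", "2016-10-26", json.dumps(rows))
        loaded = json.loads(read_upstream(base, "example_dag", "extract", "2016-10-26"))
        print(len(loaded))
```

Because the path depends only on the DAG, the task, and the run date, the two tasks need no shared state beyond the storage location, and the payload can be in any format and of any size the storage allows.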
### Installation
Install from git with pip: `pip install git+https://github.com/industrydive/fileflow.git#egg=fileflow` (GitHub no longer serves the unauthenticated `git://` protocol, so use `git+https://`.)
### Resources
- Read the docs at [readthedocs.io](http://fileflow.readthedocs.io/en/latest/).
- Learn why Industry Dive chose to make fileflow in [this video from PyData DC 2016](https://www.youtube.com/watch?v=60FUHEkcPyY&index=35&list=PLGVZCDnMOq0qLoYpkeySVtfdbQg1A_GiB), given by contributor [@lauralorenz](https://github.com/lauralorenz).
### Contributors
- [@lauralorenz](https://github.com/lauralorenz)
- [@MiriamSexton](https://github.com/MiriamSexton)
- [@dbarbar](https://github.com/dbarbar)
- [@dvetal](https://github.com/dvetal)