Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eschmidt42/etl-with-metaflow
The bare minimum to create a csv -> parquet ETL using metaflow.
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/eschmidt42/etl-with-metaflow
- Owner: eschmidt42
- License: bsd-3-clause
- Created: 2023-12-04T14:28:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-04T17:21:19.000Z (about 1 year ago)
- Last Synced: 2024-11-08T22:47:00.118Z (2 months ago)
- Language: Makefile
- Homepage:
- Size: 44.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Using metaflow for an etl
[Metaflow](https://docs.metaflow.org/introduction/what-is-metaflow) is a fun tool to orchestrate data sciency workflows.
In this repo you will find the bare minimum to create a csv -> parquet ETL using metaflow.
The data was taken from: https://data.gov.lv/dati/dataset/76c7e3fe-2a07-4164-a391-9bca8e039992/resource/7af98218-6266-4459-a79d-a7dfe29277e0/download/t.csv
## Setup (linux / macOS)
    git clone https://github.com/eschmidt42/etl-with-metaflow.git
    cd etl-with-metaflow
    make install

## Usage

    make etl

or

    source .venv/bin/activate
    python src/etl_with_metaflow/flows/etl.py run --source data/

Surprisingly, one needs to call a module containing a flow directly. Currently it seems metaflow is designed not to be run from within Python (other than through Python's `subprocess`).
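If you do want to trigger the flow from other Python code, the `subprocess` route mentioned above could look roughly like this. It is only a sketch and not part of this repo: `build_flow_command` and `run_flow` are hypothetical helper names, and the flow path and the `--source` argument are simply copied from the usage command.

```python
import subprocess
import sys


def build_flow_command(flow_file, source):
    """Build the argument list for running a flow module as a script.

    Hypothetical helper: mirrors the shell command
    `python <flow_file> run --source <source>`.
    """
    # sys.executable is the interpreter running this code, so an
    # activated virtual environment is reused by the child process.
    return [sys.executable, str(flow_file), "run", "--source", str(source)]


def run_flow(flow_file, source):
    """Run the flow in a child process and raise if it exits non-zero."""
    return subprocess.run(build_flow_command(flow_file, source), check=True)
```

With the virtual environment active, `run_flow("src/etl_with_metaflow/flows/etl.py", "data/")` should then behave like the command line call above.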