https://github.com/datitran/luigi_boilerplate
Postgres + PySpark
- Host: GitHub
- URL: https://github.com/datitran/luigi_boilerplate
- Owner: datitran
- Created: 2016-02-27T14:43:16.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-02-27T23:07:27.000Z (over 10 years ago)
- Last Synced: 2024-10-29T08:04:39.326Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 1.95 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
## Luigi Boilerplate for Postgres + PySpark
This is a simple boilerplate for setting up a Postgres and PySpark pipeline with [Luigi](https://github.com/spotify/luigi). It includes only a dummy Postgres task and a PySpark task. The boilerplate is a work in progress.
Run `python pipeline.py` to start the workflow. I included a dummy target for the Postgres task, since the user has to provide the database connection in the `PSQLConn` class. Note that, unlike GNU Make, Luigi does not remove targets itself; the user has to delete them. The Spark task also creates a dummy target, and both targets have to be removed before you can restart the pipeline.
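Since the targets have to be deleted by hand before a re-run, a small helper can do the cleanup. The following is only a minimal sketch using the standard library: the function `reset_targets` and the file names in `DUMMY_TARGETS` are hypothetical and not part of this repository, so replace them with the paths that the tasks in `pipeline.py` actually write.

```python
from pathlib import Path

# Hypothetical target names (assumption): replace them with the paths
# returned by the output() methods of the tasks in pipeline.py.
DUMMY_TARGETS = ["postgres_task.done", "spark_task.done"]


def reset_targets(names, base_dir="."):
    """Delete every target file that exists and return the removed paths."""
    removed = []
    for name in names:
        target = Path(base_dir) / name
        # Missing targets are skipped, so the helper is safe to run twice.
        if target.is_file():
            target.unlink()
            removed.append(target)
    return removed


if __name__ == "__main__":
    for target in reset_targets(DUMMY_TARGETS):
        print(f"removed {target}")
```

Running this script before `python pipeline.py` would clear the old targets so that Luigi schedules both tasks again.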
Author: Dat Tran