https://github.com/elipapa/pokemon_luigi
example ETL pipeline task using luigi
- Host: GitHub
- URL: https://github.com/elipapa/pokemon_luigi
- Owner: elipapa
- Created: 2017-02-27T17:06:22.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-12-20T16:13:29.000Z (almost 9 years ago)
- Last Synced: 2025-01-14T14:58:31.430Z (9 months ago)
- Language: Python
- Size: 26.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# pokemon_luigi
Example ETL pipeline task using luigi

### Install
```shell
$ virtualenv pokemon_luigi.env
$ source pokemon_luigi.env/bin/activate
$ pip install -r requirements.txt
# create sqlite database
$ python manage.py version_control
$ python manage.py upgrade
```

### Usage
```shell
# run the luigi task
# PYTHONPATH=. is required because luigi imports --module from the Python path,
# which does not include the current directory by default
$ PYTHONPATH=. luigi --local-scheduler --module pokemon_etl PokemonAddTypeCounts
# use --LoadPokemonTask-csv-file to load a specific dataset
$ PYTHONPATH=. luigi --local-scheduler --module pokemon_etl \
> PokemonAddTypeCounts --LoadPokemonTask-csv-file datasets/pokemon_cleaned.csv
# verify the results in the db
$ sqlite3 -csv -batch pokemon.db "SELECT * FROM pokemon_type_counts ORDER BY type_1_count DESC"
```
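
The verification query can also be tried out from Python with the standard `sqlite3` module. The sketch below is only an illustration of the kind of aggregation the pipeline appears to produce, not the code in `pokemon_etl`. It assumes a hypothetical source table `pokemon` with `name` and `type_1` columns and uses made-up sample rows in an in-memory database. Only the table name `pokemon_type_counts` and the column `type_1_count` come from the query above.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the real data lives in the datasets/ CSV files.
SAMPLE_CSV = """name,type_1
Bulbasaur,Grass
Charmander,Fire
Squirtle,Water
Oddish,Grass
Vulpix,Fire
Bellsprout,Grass
"""


def build_type_counts(conn, csv_text):
    """Load CSV rows into an assumed `pokemon` table, then aggregate by type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["name"], r["type_1"]) for r in reader]
    conn.execute("CREATE TABLE pokemon (name TEXT, type_1 TEXT)")
    conn.executemany("INSERT INTO pokemon VALUES (?, ?)", rows)
    # One row per primary type with the number of pokemon of that type.
    conn.execute(
        "CREATE TABLE pokemon_type_counts AS "
        "SELECT type_1, COUNT(*) AS type_1_count FROM pokemon GROUP BY type_1"
    )
    conn.commit()


def fetch_type_counts(conn):
    """Run the same verification query as the sqlite3 command line call."""
    query = "SELECT * FROM pokemon_type_counts ORDER BY type_1_count DESC"
    return conn.execute(query).fetchall()


# An in-memory database keeps the sketch from touching pokemon.db.
conn = sqlite3.connect(":memory:")
build_type_counts(conn, SAMPLE_CSV)
type_counts = fetch_type_counts(conn)
for row in type_counts:
    print(row)
```

To inspect the real results instead, pass `"pokemon.db"` to `sqlite3.connect` and call only `fetch_type_counts`, since the luigi task has already created the table.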