
https://github.com/sharifah-malhan/etl-with-pyspark

Python script for an ETL job with PySpark; both the source and destination databases are MySQL (the Spark job is embedded in Flask for ease of deployment).

etl etl-automation etl-job etl-pipeline flask pyspark python spark


README


In this project I developed an ETL job using PySpark.

Once a connection to the source database (MySQL) is established, the ETL job starts.
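A minimal sketch of the connection settings, assuming the job reads MySQL through Spark's built-in JDBC data source with MySQL Connector/J 8.x on the classpath. The `MySQLConfig` class and the host, database and credential values are hypothetical placeholders, not taken from the repository; the class only assembles the JDBC URL and connection properties that Spark's JDBC reader and writer accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLConfig:
    """Hypothetical holder for one MySQL connection's settings."""

    host: str
    port: int
    database: str
    user: str
    password: str

    @property
    def jdbc_url(self) -> str:
        # MySQL Connector/J URL format: jdbc:mysql://<host>:<port>/<database>
        return f"jdbc:mysql://{self.host}:{self.port}/{self.database}"

    @property
    def properties(self) -> dict:
        # Connection properties handed to Spark's JDBC reader/writer.
        # Assumes Connector/J 8.x, whose driver class is com.mysql.cj.jdbc.Driver.
        return {
            "user": self.user,
            "password": self.password,
            "driver": "com.mysql.cj.jdbc.Driver",
        }


# Hypothetical values; in practice read them from environment variables or a secrets store.
source = MySQLConfig("localhost", 3306, "source_db", "etl_user", "change-me")
```

With a running `SparkSession` named `spark`, a source table could then be read with `spark.read.jdbc(url=source.jdbc_url, table="customers", properties=source.properties)`, where `customers` is again a hypothetical table name. Keeping the settings in one object also makes it easy to create a second instance for the destination database.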

The ETL has three stages:

1- Extract the data from the tables in the source database into DataFrames.

2- Apply the transformation rules to these DataFrames and check that the rules are satisfied.

3- Connect to the destination database (also MySQL, with its tables already created manually) and load the transformed data into it.
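The three stages can be sketched end to end with only the standard library. To keep the example runnable without a Spark cluster or a MySQL server, this sketch swaps in in-memory SQLite databases for the two MySQL databases and lists of dicts for DataFrames; the `customers` table, its columns and the two rules are hypothetical, not the repository's actual schema or rules.

```python
import sqlite3

# Stand-in sketch: sqlite3 replaces MySQL and lists of dicts replace Spark
# DataFrames. The "customers" table and its rules are hypothetical.


def extract(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Stage 1: read every row of `table` into a list of dicts."""
    # The table name comes from trusted code, never from user input.
    cur = conn.execute(f"SELECT id, name, email FROM {table}")
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def transform(rows: list[dict]) -> list[dict]:
    """Stage 2a: apply the rules (trim names, trim and lowercase emails)."""
    return [
        {"id": r["id"], "name": r["name"].strip(), "email": r["email"].strip().lower()}
        for r in rows
    ]


def validate(rows: list[dict]) -> list[dict]:
    """Stage 2b: check that the rules hold before anything is loaded."""
    ids = [r["id"] for r in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids in transformed data")
    bad = [r["id"] for r in rows if "@" not in r["email"]]
    if bad:
        raise ValueError(f"rows with an invalid email: {bad}")
    return rows


def load(conn: sqlite3.Connection, table: str, rows: list[dict]) -> None:
    """Stage 3: insert into a destination table whose schema already exists."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            f"INSERT INTO {table} (id, name, email) VALUES (:id, :name, :email)", rows
        )


def run_etl(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Run extract -> transform -> validate -> load; return the number of rows loaded."""
    rows = validate(transform(extract(src, table)))
    load(dst, table, rows)
    return len(rows)
```

Validating before loading means a rule violation aborts the run with nothing written to the destination. In the PySpark job the same stages would map onto `spark.read.jdbc(...)`, DataFrame transformations such as `withColumn` and `filter`, and `df.write.jdbc(url, table, mode="append", properties=...)`; `mode="append"` matters here because the destination tables already exist, and Spark's default save mode raises an error when the target table is present.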