https://github.com/sharifah-malhan/etl-with-pyspark
Python script for an ETL job with PySpark; both the source and destination databases are MySQL (the Spark job is embedded in Flask for the sake of deployment)
- Host: GitHub
- URL: https://github.com/sharifah-malhan/etl-with-pyspark
- Owner: Sharifah-Malhan
- Created: 2024-08-21T13:29:09.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-21T20:24:37.000Z (about 1 year ago)
- Last Synced: 2025-03-06T17:15:33.229Z (7 months ago)
- Topics: etl, etl-automation, etl-job, etl-pipeline, flask, pyspark, python, spark
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
In this project I developed an ETL job using PySpark.
After the connection to the source database (MySQL) is established, the ETL job starts.
The ETL has three stages:
1- Extract the data from the tables in the source database into DataFrames.
2- Apply the transformation rules to these DataFrames and check that each rule is valid.
3- Connect to the destination database (also MySQL, with its structure already created manually) and load the transformed data into it.
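
The three stages can be sketched roughly as follows. This is a minimal illustration and not the repository's actual script: the function names `jdbc_options` and `run_etl`, the `transform` and `is_valid` callables, and the choice of `append` as the write mode are all assumptions made for the example. The rule check is modelled here as a caller-supplied predicate on the transformed DataFrame, which may differ from how the project validates its rules. It also assumes the MySQL Connector/J JAR is available to Spark.

```python
from typing import Callable, Dict


def jdbc_options(host: str, database: str, table: str,
                 user: str, password: str, port: int = 3306) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",  # driver class of MySQL Connector/J 8.x
        "dbtable": table,
        "user": user,
        "password": password,
    }


def run_etl(source: Dict[str, str], destination: Dict[str, str],
            transform: Callable, is_valid: Callable) -> None:
    """Extract one table, transform it, validate the result, then load it."""
    # Imported lazily so jdbc_options can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
    try:
        # Stage 1: extract the source table into a DataFrame.
        df = spark.read.format("jdbc").options(**source).load()

        # Stage 2: apply the transformation and check the result
        # (assumption: validity is a predicate on the transformed DataFrame).
        transformed = transform(df)
        if not is_valid(transformed):
            raise ValueError("transformation check failed; nothing was loaded")

        # Stage 3: load into the destination table, which must already exist
        # (assumption: rows are appended rather than overwritten).
        (transformed.write.format("jdbc")
            .options(**destination)
            .mode("append")
            .save())
    finally:
        spark.stop()
```

A call might look like `run_etl(jdbc_options("localhost", "src_db", "orders", "etl", "secret"), jdbc_options("localhost", "dst_db", "orders_clean", "etl", "secret"), transform=lambda df: df.dropDuplicates(), is_valid=lambda df: len(df.columns) > 0)`, where the hosts, database names, table names and credentials are placeholders.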