https://github.com/sharifah-malhan/etl-with-pyspark

Python script for an ETL job with PySpark; both the source and destination databases are MySQL (the Spark job is embedded in Flask for the sake of deployment)
etl etl-automation etl-job etl-pipeline flask pyspark python spark

Last synced: 7 months ago

Host: GitHub
URL: https://github.com/sharifah-malhan/etl-with-pyspark
Owner: Sharifah-Malhan
Created: 2024-08-21T13:29:09.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-08-21T20:24:37.000Z (about 1 year ago)
Last Synced: 2025-03-06T17:15:33.229Z (7 months ago)
Topics: etl, etl-automation, etl-job, etl-pipeline, flask, pyspark, python, spark
Language: Python
Homepage:
Size: 3.91 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

In this project I developed an ETL job using PySpark.

After the connection to the source database (MySQL) is established, the ETL job starts.

The ETL has three stages:

1- Extract the data from the tables in the source database into DataFrames.

2- Apply transformation rules to these DataFrames and check that each rule is valid.

3- Connect to the destination database (also MySQL, with its schema already created manually) and load the transformed data into it.
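
The three stages above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper names (`jdbc_options`, `extract`, `transform`, `load`, `run_etl`), the host, database, user, and table names, the `id` key column, and the example rule (drop duplicates and rows with a null key) are all assumptions made for the example. It also assumes the MySQL Connector/J jar is available on Spark's classpath.

```python
from typing import Dict


def jdbc_options(host: str, port: int, database: str,
                 user: str, password: str, table: str) -> Dict[str, str]:
    """Build the option map Spark's JDBC data source expects for MySQL."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",  # MySQL Connector/J 8+ driver class
        "dbtable": table,
        "user": user,
        "password": password,
    }


def extract(spark, options):
    """Stage 1: read one source table into a DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def transform(df, key_column: str = "id"):
    """Stage 2: apply an example rule, then check that it holds."""
    from pyspark.sql import functions as F  # imported lazily; needs pyspark

    # Hypothetical rule: drop exact duplicates and rows without a key.
    cleaned = df.dropDuplicates().filter(F.col(key_column).isNotNull())

    # Check: no null keys may remain once the rule has been applied.
    if cleaned.filter(F.col(key_column).isNull()).count() != 0:
        raise ValueError(f"rule violated: null values left in {key_column!r}")
    return cleaned


def load(df, options):
    """Stage 3: append the transformed rows to the existing destination table."""
    df.write.format("jdbc").options(**options).mode("append").save()


def run_etl():
    """Wire the three stages together (all connection values are placeholders)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
    source = jdbc_options("localhost", 3306, "source_db",
                          "etl_user", "secret", "customers")
    target = jdbc_options("localhost", 3306, "target_db",
                          "etl_user", "secret", "customers")
    try:
        load(transform(extract(spark, source)), target)
    finally:
        spark.stop()
```

The sketch uses `mode("append")` because the destination tables are described as already structured by hand, so the job only adds rows rather than creating or replacing tables. A Flask route could call `run_etl()` to trigger the job on request, which is one plausible way to embed it for deployment.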
