Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/undisputed-jay/airflow-etl-pipeline-with-pyspark-and-google-cloud-dataproc
This project automates daily vehicle data processing on Google Cloud with Apache Airflow. It uploads the PySpark scripts to Google Cloud Storage, runs the PySpark job that corresponds to the current day on Dataproc, and shuts down the resources once the job finishes to keep costs down.
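The day-based job selection described above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code. The weekday/weekend split, the bucket path, the script names, and the step names are all assumptions made for the example, since the description only says the job depends on the day.

```python
from datetime import date

# Hypothetical script locations; the real bucket and file names are not given
# in the project description.
WEEKDAY_JOB = "gs://example-bucket/scripts/weekday_job.py"
WEEKEND_JOB = "gs://example-bucket/scripts/weekend_job.py"


def select_pyspark_job(run_date: date) -> str:
    """Return the URI of the PySpark script to run for run_date.

    Assumption: weekdays and weekends use different scripts.
    """
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    if run_date.weekday() < 5:
        return WEEKDAY_JOB
    return WEEKEND_JOB


def plan_run(run_date: date) -> list[tuple[str, str]]:
    """List the pipeline steps in order as (step name, detail) pairs."""
    return [
        ("upload_scripts", "copy PySpark scripts to Google Cloud Storage"),
        ("run_pyspark_job", select_pyspark_job(run_date)),
        ("shut_down_resources", "release Dataproc resources after the job"),
    ]


if __name__ == "__main__":
    for step, detail in plan_run(date.today()):
        print(f"{step}: {detail}")
```

In an Airflow DAG, a function like `select_pyspark_job` could plausibly back a branching task (for example via `BranchPythonOperator`), with the upload and shutdown steps as separate tasks before and after it. Whether the repository is structured that way is not stated in the description.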
- Host: GitHub
- URL: https://github.com/undisputed-jay/airflow-etl-pipeline-with-pyspark-and-google-cloud-dataproc
- Owner: Undisputed-jay
- Created: 2024-11-01T00:38:49.000Z
- Default Branch: main
- Last Pushed: 2024-11-01T00:47:05.000Z
- Last Synced: 2024-12-20T07:15:19.377Z
- Topics: automated-etl-airflow-dataproc, cost-effective-data-processing, daily-data-analysis-airflow-pyspark
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0