https://github.com/dodat-12/airflow-spark-job
A workspace to experiment with Apache Spark and Airflow in a Docker environment
airflow docker rdbms spark
- Host: GitHub
- URL: https://github.com/dodat-12/airflow-spark-job
- Owner: DoDat-12
- Created: 2024-10-07T04:35:27.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-29T09:28:07.000Z (2 months ago)
- Last Synced: 2024-10-29T11:42:31.126Z (2 months ago)
- Topics: airflow, docker, rdbms, spark
- Language: Python
- Homepage:
- Size: 1.03 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# A workspace to experiment with Apache Spark and Airflow in a Docker environment
## Prerequisites Setup
- PostgreSQL (pgAdmin 4)
- MySQL
- SQLite

Remember to replace the username, password, database, and table names with your own.
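
The username, password, database, and table values mentioned above typically end up in the options of a Spark JDBC read or write. As a rough sketch only, the helper below builds JDBC URLs for the three databases. The function name `jdbc_url`, the host, the database name, and all credential values are illustrative placeholders and are not taken from this repository; the ports are the usual PostgreSQL and MySQL defaults.

```python
from typing import Optional

# Usual default ports for the server-based databases.
DEFAULT_PORTS = {"postgresql": 5432, "mysql": 3306}


def jdbc_url(kind: str, database: str, host: str = "localhost",
             port: Optional[int] = None) -> str:
    """Build a JDBC URL for one of the prerequisite databases (hypothetical helper)."""
    if kind == "sqlite":
        # SQLite is file-based, so `database` is the path to the .db file.
        return f"jdbc:sqlite:{database}"
    if kind not in DEFAULT_PORTS:
        raise ValueError(f"unsupported database kind: {kind}")
    return f"jdbc:{kind}://{host}:{port or DEFAULT_PORTS[kind]}/{database}"


# Placeholder values: replace with your own username, password, database, and table.
options = {
    "url": jdbc_url("postgresql", "mydb"),
    "dbtable": "my_table",
    "user": "my_user",
    "password": "my_password",
}
```

A dictionary like `options` could then be passed to `spark.read.format("jdbc").options(**options).load()`, provided the matching JDBC driver jar is available to Spark.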
## Quick Setup
Create the airflow-spark cluster:

    docker compose up -d
    # run this a second time to start the webserver after the database has been initialized

Access the webserver at `localhost:8080/home` with username `admin` and password `admin`.
Set up the Spark connection in Airflow:
![spark-conn.png](./doc/spark-conn.png)
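
As an alternative to filling in the form by hand, Airflow can also read a connection from an `AIRFLOW_CONN_<CONN_ID>` environment variable. The sketch below assumes Airflow 2.3 or later (which accepts JSON-serialized connections) and builds such a variable. The connection id `spark_default`, the host `spark://spark-master`, and the port `7077` are assumptions here and should be replaced with the values shown in the screenshot; `spark_conn_env` is a hypothetical helper name.

```python
import json


def spark_conn_env(host: str = "spark://spark-master", port: int = 7077,
                   conn_id: str = "spark_default"):
    """Return (name, value) for an Airflow connection environment variable.

    All default values are assumptions; match them to your own cluster.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps({"conn_type": "spark", "host": host, "port": port})
    return name, value


name, value = spark_conn_env()
# Print a line that could be copied into an env file or a compose `environment` entry.
print(f"{name}={value}")
```

Connections defined through environment variables do not appear in the Airflow UI, so the form shown above remains the easier option when you want to inspect the connection later.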
## TikiJob
![airflow_tiki_v3.png](./doc/airflow_tiki_v3.png)
## Spark Job Optimization
https://towardsdatascience.com/6-recommendations-for-optimizing-a-spark-job-5899ec269b4b
> `.\env\Scripts\activate`