Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/20cent16/airflow-spark
If you want to use airflow with spark, ready to use ;-)
- Host: GitHub
- URL: https://github.com/20cent16/airflow-spark
- Owner: 20cent16
- Created: 2024-09-08T11:18:28.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-27T17:18:21.000Z (5 months ago)
- Last Synced: 2024-09-29T07:01:37.065Z (4 months ago)
- Topics: airflow, spark
- Language: Python
- Homepage:
- Size: 254 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# airflow-spark : if you want to use airflow with spark
## ready to use, enjoy ;-)
### Versions
Airflow : `2.10.1`

Spark : `bitnami/spark:latest`
### Initialize
Get the source files:

`git clone https://github.com/20cent16/airflow-spark.git`

Navigate to the docker directory and execute:
`mkdir -p ./apps ./dags ./logs ./plugins ./config`
**apps** = directory containing your programs
**dags** = directory containing the DAGs that call your programs
`echo -e "AIRFLOW_UID=$(id -u)" > .env`
Setting AIRFLOW_UID manually avoids a warning when Airflow starts.
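
For anyone who prefers to script the initialization, the two shell commands above could be reproduced in Python roughly as follows. This is a minimal sketch, not part of the repository: the function name `init_workspace` and the `DIRS` constant are hypothetical, and `os.getuid()` is only available on POSIX systems.

```python
import os
from pathlib import Path

# Directories the README creates with `mkdir -p`.
DIRS = ("apps", "dags", "logs", "plugins", "config")


def init_workspace(base: Path) -> Path:
    """Create the working directories and write .env under `base`.

    Hypothetical helper: it mirrors the README's shell commands.
    """
    for name in DIRS:
        # parents/exist_ok give the same behaviour as `mkdir -p`.
        (base / name).mkdir(parents=True, exist_ok=True)
    env_file = base / ".env"
    # os.getuid() is the Python counterpart of `id -u` (POSIX only).
    env_file.write_text(f"AIRFLOW_UID={os.getuid()}\n")
    return env_file
```

Calling `init_workspace(Path("."))` from the docker directory should leave the same layout as the shell commands; like `>` in the shell version, it overwrites any existing `.env`.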
### Start containers
Navigate to the docker directory and execute:

`docker compose up`
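
The containers can take a while to become ready. One way to wait for the web UI before configuring it is to poll the Airflow webserver's `/health` endpoint. The sketch below assumes the UI is published on `localhost:8080` as in the next section; the helper names `http_status` and `wait_until_up` are hypothetical, and the retry count and delay are arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for `url`, or 0 if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, or timed out.
        return 0


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0,
                  probe: Callable[[str], int] = http_status) -> bool:
    """Poll `url` until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_up("http://localhost:8080/health")` returns `True` once the webserver responds with status 200. The `probe` parameter exists only so the loop can be exercised without a running server.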
### Configure Airflow
UI : http://localhost:8080/connection/list/

Add a Spark connection with the following configuration:
**host** : spark://spark-master
**port** : 7077
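
As an alternative to the UI, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a JSON document. The sketch below builds that value from the host and port above. The connection id `spark_default` is an assumption (the README does not name the connection), the function name is hypothetical, and how the variable reaches the Airflow containers depends on the compose file, which is not shown here.

```python
import json
from typing import Tuple


def spark_connection_env(conn_id: str = "spark_default",
                         host: str = "spark://spark-master",
                         port: int = 7077) -> Tuple[str, str]:
    """Return the (variable name, JSON value) pair for the connection.

    conn_id is an assumed value; host and port come from the README.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps({"conn_type": "spark", "host": host, "port": port})
    return name, value


name, value = spark_connection_env()
# Print a line that could be pasted into a shell or adapted for an env file.
print(f"{name}='{value}'")
```

A connection defined this way is not stored in the metadata database, so it may not appear in the connection list of the UI.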
### Useful and optional : clean up existing Docker files
#### Stop all running containers
`sudo docker stop $(sudo docker ps -aq)`
#### Remove all containers
`sudo docker rm $(sudo docker ps -aq)`
#### Remove all images
`sudo docker rmi $(sudo docker images -q)`