Developing a Flow with EMR and Airflow
- Host: GitHub
- URL: https://github.com/matbragan/emr-airflow
- Owner: matbragan
- Created: 2024-07-26T11:41:20.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-08-07T18:31:25.000Z (about 2 months ago)
- Last Synced: 2024-09-25T08:33:32.869Z (3 days ago)
- Topics: airflow, aws, aws-emr-clusters, emr, emr-cluster, spark
- Language: Python
- Size: 33.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# emr-airflow
en-us
### Developing a Flow with EMR and Airflow
Airflow routines that create EMR clusters, run jobs on them, and terminate them.
### Airflow
Use the [`Makefile`](https://github.com/matbragan/emr-airflow/blob/main/Makefile) to start a local Airflow server. Before doing so, configure the host machine with [AWS credentials](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html), so that when the make command runs, the Airflow server has what it needs to call AWS services.
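Before starting the server, it can help to confirm that the host actually has credentials in place. The sketch below is not part of this repository; it assumes the credentials live in the shared AWS CLI credentials file (`~/.aws/credentials`) under the `default` profile, and the helper name `has_aws_credentials` is hypothetical. Credentials supplied another way (for example, environment variables) are not detected by it.

```python
import configparser
from pathlib import Path


def has_aws_credentials(path: Path, profile: str = "default") -> bool:
    """Return True if `profile` in the credentials file defines both key fields."""
    # interpolation=None keeps '%' characters in secrets from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files that do not exist, leaving the parser empty.
    parser.read(path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    required = ("aws_access_key_id", "aws_secret_access_key")
    return all(section.get(key, "").strip() for key in required)


if __name__ == "__main__":
    # Assumption: the AWS CLI default location for the shared credentials file.
    credentials_file = Path.home() / ".aws" / "credentials"
    print(f"{credentials_file}: usable = {has_aws_credentials(credentials_file)}")
```

This only checks that the fields are present; it does not verify that AWS accepts the keys.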
Make command to start the local Airflow server:
~~~sh
make airflow-up
~~~

### EMR
Edit [`constants.py`](https://github.com/matbragan/emr-airflow/blob/main/dags/emr_development/constants.py) to set the correct application bucket.
Replace the scripts in [dags/emr_development/scripts](https://github.com/matbragan/emr-airflow/tree/main/dags/emr_development/scripts) with the scripts your project needs to run.
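To illustrate how a bucket constant and a script might come together, here is a minimal sketch of an EMR step definition. It does not reproduce the repository's actual `constants.py` or DAG code: `APPLICATION_BUCKET`, the `scripts/` prefix, and the helper names are all assumptions made for the example. The dictionary shape itself follows the EMR step structure (`Name`, `ActionOnFailure`, `HadoopJarStep`).

```python
# Hypothetical stand-in for a value that constants.py might define.
APPLICATION_BUCKET = "my-application-bucket"  # assumption: replace with your bucket


def script_uri(script_name: str, bucket: str = APPLICATION_BUCKET) -> str:
    """Build the S3 URI of a script, assuming scripts sit under a scripts/ prefix."""
    return f"s3://{bucket}/scripts/{script_name}"


def spark_step(name: str, script_name: str) -> dict:
    """Describe one EMR step that runs a PySpark script through spark-submit."""
    return {
        "Name": name,
        # CONTINUE leaves the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode",
                "cluster",
                script_uri(script_name),
            ],
        },
    }


if __name__ == "__main__":
    print(spark_step("example job", "job.py"))
```

A list of such dictionaries is the kind of value that boto3's `add_job_flow_steps` (`Steps`) or Airflow's `EmrAddStepsOperator` (`steps`) accepts; whether this repository builds its steps the same way is not confirmed here.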
If necessary, edit [`emr_config.py`](https://github.com/matbragan/emr-airflow/blob/main/dags/emr_development/emr_config.py) to set the cluster settings your project requires.

If EMR is being run for the first time in this AWS account, its default roles must be created first, which can be done with the command below:
~~~sh
aws emr create-default-roles
~~~
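
The command above creates the roles `EMR_DefaultRole` and `EMR_EC2_DefaultRole`, which a cluster definition then references. As a rough sketch of what cluster settings of this kind can look like, the function below builds a dictionary in the shape expected by the EMR `RunJobFlow` API. It is not the content of this repository's `emr_config.py`: the function name, release label, instance types, and log prefix are example values chosen for illustration.

```python
def cluster_config(name: str, log_bucket: str, core_nodes: int = 2) -> dict:
    """Build an example EMR cluster definition (all values are placeholders)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-7.1.0",  # assumption: pick the release you need
        "LogUri": f"s3://{log_bucket}/logs/",  # assumption: hypothetical log prefix
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_nodes,
                },
            ],
            # Keep the cluster alive between steps; a later task terminates it.
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        # Roles created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    print(cluster_config("emr-development", "my-application-bucket"))
```

Setting `KeepJobFlowAliveWhenNoSteps` to `True` suits a flow in which a separate task terminates the cluster explicitly; a cluster that should shut itself down after its last step would set it to `False` instead.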