https://github.com/outerbounds/airflow-on-minikube
- Host: GitHub
- URL: https://github.com/outerbounds/airflow-on-minikube
- Owner: outerbounds
- Created: 2022-04-18T02:22:35.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-08-16T23:34:26.000Z (almost 4 years ago)
- Last Synced: 2025-01-06T08:17:52.236Z (over 1 year ago)
- Language: Python
- Size: 25.4 KB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Metaflow with Airflow on Minikube
Before proceeding, configure an Amazon S3 bucket and make sure your environment has AWS credentials (access key, secret key, etc.) that allow get/put/list requests against this bucket.
To run Metaflow flows with Airflow on Minikube, follow these steps in order:
1. [Set up Helm, Minikube, and a DAG folder mount point on the Minikube cluster](#setup-helm-and-minikube)
2. [Create Kubernetes namespaces and set AWS-related Kubernetes secrets](#namespace-and-authentication-setup)
3. [Set up Metaflow and Airflow on the Minikube cluster](#setup-metaflow-and-airflow-in-minikube-cluster)
4. [Set up Metaflow-related configurations](#metaflow-configurations)
    1. [Set up the Metaflow configuration file](#metaflow-configuration-setup)
    2. [Run a test flow after setting up the Metaflow configuration file](#creating-a-test-airflow-dag-from-a-metaflow-flow)
    3. Access the [Airflow UI](#getting-access-to-airflow-ui) and [Metaflow UI](#getting-access-to-metaflow-ui)
    4. [Access the artifacts generated by your flow programmatically](#accessing-artifacts-generated-from-the-flow)
## Setup Helm and Minikube
1. Install Helm and Minikube. You can get both with `brew`: `brew install minikube helm`. We recommend helm version >= 2.5.0.
2. Start Minikube: `minikube start --cpus 6 --memory 10240`. We recommend at least 6 CPUs for the entire deployment; the more resources you allocate, the smoother the overall experience.
3. Add the Airflow chart repository to Helm: `helm repo add apache-airflow https://airflow.apache.org`
4. In a separate terminal window, run `minikube mount ./dags:/data/dags` and leave it running. This creates a volume on Minikube that points to the `dags` folder in this directory. Any file added to the `dags` folder becomes available to containers that mount this volume. This step is required for [setting up Airflow on the Minikube cluster](#setup-metaflow-and-airflow-in-minikube-cluster). You can change `./dags` to any directory that holds your Airflow DAGs.
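
Before moving on, it can help to confirm that the command-line tools used throughout this guide are on your `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of this repository.

```python
import shutil

# Command-line tools that the steps in this guide invoke.
REQUIRED_TOOLS = ["minikube", "helm", "kubectl"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only checks that each executable exists; it does not verify versions or that the Minikube cluster is running.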
## Namespace and Authentication Setup
1. Install the dependencies listed in this repository's `requirements.txt` with `pip`.
```bash
pip install -r requirements.txt
```
2. Create namespaces for the `metaflow` and `airflow` deployments.
```bash
kubectl create namespace metaflow
kubectl create namespace airflow
```
3. Once the namespaces exist, set up AWS credentials in the `metaflow` and `airflow` namespaces. The commands below create secrets from every environment variable in your local shell whose name starts with `AWS`. These secrets are required to run Metaflow flows via Airflow in this setup. A production deployment of Metaflow on Airflow would not necessarily need them.
```bash
python metaflow_configure.py setup-aws-secrets afsecret airflow
python metaflow_configure.py setup-aws-secrets afsecret metaflow
```
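
To make the idea behind `setup-aws-secrets` concrete, here is a minimal sketch of how environment variables starting with `AWS` could be gathered and turned into a Kubernetes Secret manifest. This is not the implementation in `metaflow_configure.py`; the function names `collect_aws_env` and `build_secret_manifest` are hypothetical, and the sketch only builds the manifest as a dictionary without contacting the cluster.

```python
import base64
import os


def collect_aws_env(environ=None):
    """Return every environment variable whose name starts with 'AWS'."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items() if key.startswith("AWS")}


def build_secret_manifest(name, namespace, values):
    """Build a Kubernetes Secret manifest (as a dict) from string values.

    Kubernetes expects the entries under `data` to be base64-encoded.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    manifest = build_secret_manifest("afsecret", "airflow", collect_aws_env())
    # Print only the key names so that credentials never reach the terminal.
    print(sorted(manifest["data"]))
```

Running the sketch lists which variables would end up in the secret, which is a quick way to spot a missing `AWS_*` variable before creating the real secrets.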
## Setup Metaflow and Airflow in Minikube Cluster
1. The command below deploys Airflow using the Helm chart configuration in [airflow-minikube-config.yml](./airflow-minikube-config.yml). The configuration mounts a volume at the DAGs folder of the Airflow containers. Currently, the [dags](./dags) folder in the root of this repository is attached as that common volume, so any new file added to it automatically becomes available inside the Airflow containers.
```bash
helm upgrade --install airflow apache-airflow/airflow \
-f airflow-minikube-config.yml \
--timeout 10m0s \
--namespace airflow --create-namespace
```
2. Wait about 10 minutes after the Helm deployment finishes, since not all components may be ready immediately.
3. Install `nginx-ingress` in the Minikube cluster: `minikube addons enable ingress`
4. Clone the metaflow-tools repository: `git clone git@github.com:outerbounds/metaflow-tools.git`
5. Deploy the Helm chart for Metaflow from the metaflow-tools repository using the command below. The deployment uses the `metaflow` namespace. Replace `s3://mybucket` with the path of your Amazon S3 bucket. Change `metaflow-ui.envFrom[0].secretRef.name` from `afsecret` only if you chose a different secret name.
```bash
helm upgrade --install metaflow metaflow-tools/k8s/helm/metaflow \
--timeout 15m0s \
--namespace metaflow \
--create-namespace \
--set metaflow-ui.METAFLOW_DATASTORE_SYSROOT_S3=s3://mybucket/metaflow \
--set "metaflow-ui.envFrom[0].secretRef.name=afsecret" \
--set metaflow-ui.ingress.className=nginx \
--set metaflow-ui.ingress.enabled=true \
--set metaflow-ui.image.tag=latest
```
- Wait a few minutes for the Metaflow deployment to finish.
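
Rather than waiting a fixed amount of time after each Helm deployment, you can check whether the pods in a namespace are actually ready. The sketch below parses the JSON printed by `kubectl get pods -o json` and reports pods that are neither ready nor completed. The helper names (`pod_is_ready`, `unready_pods`, `fetch_pods`) are hypothetical and not part of this repository.

```python
import json
import subprocess


def pod_is_ready(pod):
    """Return True if a pod has completed or reports the Ready condition."""
    status = pod.get("status", {})
    if status.get("phase") == "Succeeded":  # e.g. pods of finished Jobs
        return True
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )


def unready_pods(pod_list):
    """Return the names of pods in a parsed pod list that are not ready."""
    return [
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if not pod_is_ready(pod)
    ]


def fetch_pods(namespace):
    """Run kubectl and parse its JSON output (requires kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example usage against a running cluster:
#   print(unready_pods(fetch_pods("airflow")))
#   print(unready_pods(fetch_pods("metaflow")))
```

An empty list for both namespaces suggests the deployments have settled and you can continue with the configuration steps.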
## Metaflow Configurations
### Metaflow Configuration Setup
Create the `~/.metaflowconfig` folder if it doesn't exist, then run the command below to export a Metaflow configuration for the Minikube cluster and store it in that folder. The command requires an Amazon S3 bucket path. If you changed the name of the secret when [setting up Metaflow](#setup-metaflow-and-airflow-in-minikube-cluster), add the `--metaflow-secret` option to the command below.
```bash
python metaflow_configure.py export-metaflow-config s3://mybucket > ~/.metaflowconfig/config.json
```
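
Because the export command redirects its output into `config.json`, a failed export can leave behind an empty or partial file. The following sketch is one way to sanity-check the result. It assumes that the exported configuration is a JSON object and that it sets `METAFLOW_DATASTORE_SYSROOT_S3`; the exact set of keys written by `export-metaflow-config` is not documented here, so treat the key check as an assumption. The function name `check_metaflow_config` is hypothetical.

```python
import json
from pathlib import Path


def check_metaflow_config(path="~/.metaflowconfig/config.json"):
    """Load a Metaflow config file and return a list of problems found."""
    path = Path(path).expanduser()
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        config = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    # Assumption: the exported config sets the S3 datastore root under this key.
    sysroot = config.get("METAFLOW_DATASTORE_SYSROOT_S3", "")
    if not str(sysroot).startswith("s3://"):
        problems.append(
            "METAFLOW_DATASTORE_SYSROOT_S3 is missing or is not an s3:// path"
        )
    return problems


# Example usage:
#   for problem in check_metaflow_config():
#       print("config problem:", problem)
```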
### Creating a Test Airflow Dag from a Metaflow flow
1. Install Metaflow with Airflow support:
```bash
pip install metaflow
```
2. Since we have [added the AWS-related environment variables](#namespace-and-authentication-setup) to `afsecret`, we can run the command below to create `firstdag.py`:
```bash
python flows/card_flow.py --with kubernetes airflow create dags/firstdag.py
```
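
Once the command finishes, you may want to confirm that the generated DAG file exists and is at least syntactically valid Python before looking for it in Airflow. The sketch below does only that; it does not import Airflow or validate the DAG itself. The function name `dag_file_problem` is hypothetical.

```python
import ast
from pathlib import Path


def dag_file_problem(path):
    """Return None if `path` is a parseable Python file, else a short message."""
    path = Path(path)
    if not path.is_file():
        return f"{path} does not exist"
    try:
        ast.parse(path.read_text(), filename=str(path))
    except SyntaxError as exc:
        return f"{path} has a syntax error on line {exc.lineno}: {exc.msg}"
    return None


# Example usage from the repository root:
#   print(dag_file_problem("dags/firstdag.py") or "dags/firstdag.py looks fine")
```

Because the `dags` folder is mounted into the Airflow containers, a file that passes this check should be picked up by Airflow without any further copying.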
### Getting Access to Airflow UI
The command below forwards port 8080 of the `airflow-webserver` service to port 8080 on the local machine. After running it, you can access the Airflow UI at `http://localhost:8080`.
```bash
kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
```
### Getting Access to Metaflow UI
The command below opens a tunnel to the Metaflow UI. After running it, you can access the Metaflow UI at `http://localhost`.
```bash
minikube tunnel
```
If `minikube tunnel` doesn't work, you can instead forward port 80 of the Minikube node over SSH:
```bash
# With this running, the UI is available at http://localhost:8008
ssh -i $(minikube ssh-key) docker@$(minikube ip) -L 8008:localhost:80
```
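
To check that a port-forward or tunnel is actually serving traffic, you can probe the UI URLs from Python. The sketch below uses only the standard library and treats any HTTP response, including error statuses, as a sign that something is listening. The function name `is_reachable` is hypothetical.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP request to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or name resolution failed.
        return False


# Example usage with the URLs from this guide:
#   print("Airflow UI:", is_reachable("http://localhost:8080"))
#   print("Metaflow UI:", is_reachable("http://localhost"))
#   print("Metaflow UI via SSH forward:", is_reachable("http://localhost:8008"))
```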
### Accessing Artifacts generated from the flow
The below command forwards traffic from port 8080 of the metaflow-service to port 8089 on the local machine.
```sh
kubectl port-forward svc/metaflow-metaflow-service 8089:8080 --namespace metaflow
```
After running this command, you can access the artifacts of your flows as follows:
```python
from metaflow import Flow, metadata

# Point the Metaflow client at the port-forwarded metadata service.
metadata('http://localhost:8089')
flow = Flow('CardFlow')  # Change the flow name to inspect a different flow.
run = flow.latest_run  # Most recent run of the flow
steps = list(run.steps())
task = steps[-1].task  # A task of the last step in the list
data_artifacts = list(task)  # Data artifacts produced by that task
```
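
Building on the snippet above, the following sketch collects artifact names and values into plain dictionaries, one per step. It relies on the `id` and `data` attributes of Metaflow's `DataArtifact` objects; the helper names `artifacts_to_dict` and `latest_run_artifacts` are hypothetical. Note that reading `data` loads each artifact from the datastore, so your local AWS credentials must allow access to the S3 bucket.

```python
def artifacts_to_dict(task):
    """Map artifact names to values for a Metaflow Task.

    Works with any iterable of objects that expose `.id` and `.data`.
    """
    return {artifact.id: artifact.data for artifact in task}


def latest_run_artifacts(flow_name, service_url="http://localhost:8089"):
    """Return {step name: {artifact name: value}} for the latest run of a flow."""
    # Imported here so the helper above stays usable without Metaflow installed.
    from metaflow import Flow, metadata

    metadata(service_url)
    run = Flow(flow_name).latest_run
    if run is None:  # The flow has no runs yet.
        return {}
    # `step.task` returns a single task, which is enough for steps without foreach.
    return {step.id: artifacts_to_dict(step.task) for step in run.steps()}


# Example usage, with the port-forward from this section running:
#   for step_name, artifacts in latest_run_artifacts("CardFlow").items():
#       print(step_name, sorted(artifacts))
```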
## [User API Documentation](./user-api.md)