{"id":19329742,"url":"https://github.com/outerbounds/metaflow-with-airflow-minio","last_synced_at":"2025-04-22T21:32:02.703Z","repository":{"id":160212368,"uuid":"632166681","full_name":"outerbounds/metaflow-with-airflow-minio","owner":"outerbounds","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-22T22:04:45.000Z","size":13,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-04-22T23:25:47.236Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outerbounds.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-24T21:12:04.000Z","updated_at":"2024-04-22T23:25:49.296Z","dependencies_parsed_at":null,"dependency_job_id":"340f9473-2986-4059-8dce-cd4c1a528c80","html_url":"https://github.com/outerbounds/metaflow-with-airflow-minio","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-with-airflow-minio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-with-airflow-minio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-with-airflow-minio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-with-airflow-minio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/outerbounds","downl
oad_url":"https://codeload.github.com/outerbounds/metaflow-with-airflow-minio/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223905738,"owners_count":17222945,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T02:29:44.273Z","updated_at":"2024-11-10T02:29:44.752Z","avatar_url":"https://github.com/outerbounds.png","language":"Python","funding_links":[],"categories":["Infrastructure \u0026 IaC"],"sub_categories":[],"readme":"## Running Metaflow with Airflow and MinIO on Minikube\n\nThis guide explains how to set up Metaflow with Airflow and MinIO on Minikube.\n\n### Prerequisites\n\nEnsure the following software is installed on your system:\n\n- Minikube: Download and install from [Minikube's official website](https://minikube.sigs.k8s.io/docs/start/)\n- kubectl: Download and install from [Kubernetes' official website](https://kubernetes.io/docs/tasks/tools/)\n- Helm: Download and install from [Helm's official website](https://helm.sh/docs/intro/install/)\n- ngrok: Download and install from [ngrok's official website](https://ngrok.com/downloads)\n- An ngrok key created from the ngrok dashboard\n\n### Step-by-step Instructions\n\n1. **Start Minikube** by executing the following command:\n\n   ```bash\n   minikube start --cpus 6 --memory 10240\n   ```\n\n2. **Add the MinIO Helm repository** and update it:\n\n   ```bash\n   helm repo add minio https://charts.min.io/\n   helm repo update\n   ```\n\n3. 
**Install MinIO using Helm**:\n\n   ```bash\n   helm install --set resources.requests.memory=512Mi --set replicas=1 --set persistence.enabled=false --set mode=standalone --set rootUser=rootuser,rootPassword=rootpass123 minio-s3 minio/minio\n   ```\n\n4. **Set up port forwarding for the MinIO service**:\n\n   ```bash\n   kubectl port-forward svc/minio-s3 9000 --namespace default\n   ```\n\n   The MinIO service will now be accessible at http://localhost:9000.\n\n5. **Install Metaflow and the Kubernetes Python client**:\n\n   ```bash\n   pip install metaflow kubernetes\n   ```\n\n6. **Create a Metaflow bucket** named `metaflow-test` in MinIO using the [create-bucket.py](./create-bucket.py) Python script. The `--access-key` and `--secret-key` values correspond to the `rootUser` and `rootPassword` set in Step 3.\n\n   ```bash\n   python create_bucket.py --access-key rootuser --secret-key rootpass123 --bucket-name metaflow-test\n   ```\n\n7. **Create an ngrok tunnel** to the port-forwarded MinIO service in a separate terminal window:\n\n   ```\n   ngrok http 9000\n   ```\n\n8. **Create a Kubernetes secret for MinIO.** Metaflow tasks and the Metaflow UI running on Kubernetes use this secret to access data stored in MinIO:\n\n   ```sh\n   kubectl create secret generic minio-secret --from-literal=AWS_ACCESS_KEY_ID=rootuser --from-literal=AWS_SECRET_ACCESS_KEY=rootpass123\n   ```\n\n9. 
**Enable ingress on Minikube and install Metaflow in the `default` namespace**:\n\n   ```sh\n   minikube addons enable ingress\n   git clone git@github.com:outerbounds/metaflow-tools.git .mf-tools\n   helm upgrade --install metaflow .mf-tools/k8s/helm/metaflow \\\n       --timeout 15m0s \\\n       --namespace default \\\n       --set metaflow-ui.uiBackend.metaflowDatastoreSysRootS3=s3://metaflow-test/metaflow \\\n       --set metaflow-ui.uiBackend.metaflowS3EndpointURL=\"\u003cNGROK-TUNNEL-URL-COMES-HERE\u003e\" \\\n       --set \"metaflow-ui.envFrom[0].secretRef.name=minio-secret\" \\\n       --set metaflow-ui.ingress.className=nginx \\\n       --set metaflow-ui.ingress.enabled=true\n   ```\n\n10. **Create a Metaflow configuration file** named `config_airflow_minio.json` under `~/.metaflowconfig/`. The `airflow_minio` suffix must match the `METAFLOW_PROFILE` value exported in Step 12:\n\n    ```json\n    {\n        \"METAFLOW_S3_ENDPOINT_URL\": \"\u003cNGROK TUNNEL URL COMES HERE\u003e\",\n        \"METAFLOW_DEFAULT_DATASTORE\": \"s3\",\n        \"METAFLOW_DATASTORE_SYSROOT_S3\":\"s3://metaflow-test/metaflow\",\n        \"METAFLOW_DATATOOLS_S3ROOT\": \"s3://metaflow-test/data\",\n        \"METAFLOW_DEFAULT_METADATA\" : \"service\",\n        \"METAFLOW_KUBERNETES_SECRETS\": \"minio-secret\",\n        \"METAFLOW_SERVICE_INTERNAL_URL\": \"http://metaflow-metaflow-service.default.svc.cluster.local:8080\",\n        \"METAFLOW_AIRFLOW_KUBERNETES_KUBECONFIG_CONTEXT\": \"minikube\"\n    }\n    ```\n\n11. **Start the Airflow installation** in a separate terminal window:\n\n    ```sh\n    pip install apache-airflow apache-airflow-providers-cncf-kubernetes\n    mkdir -p ~/airflow/dags \u0026\u0026 airflow standalone\n    ```\n\n12. **Create the Airflow DAG file** for the [helloflow.py](./helloflow.py) flow:\n\n    ```sh\n    export METAFLOW_PROFILE=airflow_minio\n    python helloflow.py airflow create minio-test-dag.py\n    cp minio-test-dag.py ~/airflow/dags/minio-test-dag.py\n    ```\n\n13. 
**Ensure the DAGs get loaded** by running `airflow dags reserialize`.\n\n14. **Trigger the DAG** from the Airflow UI, where it appears under the name `HelloFlow`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fmetaflow-with-airflow-minio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fouterbounds%2Fmetaflow-with-airflow-minio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fmetaflow-with-airflow-minio/lists"}