# Airflow ETL
Running an ECS task for ML prediction orchestrated by Airflow

## Building Airflow on Docker
Pull the Airflow image:

```bash
docker pull puckel/docker-airflow
```

Build the project image (this installs *boto3* for the AWS integration):

```bash
docker build -t ml-pipeline .
```

Then run the container, mounting the local directory that holds your DAG definitions as a volume at the location where Airflow reads them inside the container:

```bash
docker run -d -p 8080:8080 -v /Users/danieldacosta/Documents/GitHub/airflow-etl/dags:/usr/local/airflow/dags ml-pipeline
```

Replace the host path before the `:` with the `dags` directory on your own machine. The Airflow web UI is then available at http://localhost:8080.

## S3
In this example we use two buckets: one stores the model (`.sav`) and the inputs (`.csv`), the other stores the model output. The `*_PATH` values are key prefixes inside the corresponding bucket.

- `READ_BUCKET=ml-sls-deploy-prd`
- `READ_DATA_PATH=data`
- `READ_MODELS_PATH=models`
- `WRITE_BUCKET=ml-sls-deploy-prd-results`
- `WRITE_DATA_PATH=results`

## Deploy your ECS cluster
You will need to create the following objects:

- **Create a Cluster:** Choose `Network only`. This configuration uses Fargate tasks: *the Fargate launch type allows you to run your containerized applications without the need to provision and manage the backend infrastructure. When you run a task with a Fargate-compatible task definition, Fargate launches the containers for you.*

- **Task Definition:** The blueprint for your container. You'll need to create a `Task Role`, an IAM role that tasks use to make API requests to authorized AWS services; since our container reads from and writes to S3, this role needs those permissions. You will also need a `Task Execution Role`, an IAM role that allows ECS to pull images from your Docker registry (we are using ECR here).

- **Add a Container:** Add your container image to the task definition so it can run on ECS Fargate. You can use the Docker image in the `ml-pipeline` folder as an example.

I recommend following this tutorial: https://towardsdatascience.com/step-by-step-guide-of-aws-elastic-container-service-with-images-c258078130ce

## Setting variables in Airflow
You will need to set up your AWS credentials and ECS variables as Variables in the Airflow web UI (**Admin > Variables**):

![Airflow variables](Images/Airflow_Variables.png)

## Run DAG
Once everything is set up, you can trigger your DAG manually and check that everything went well.

# References

- http://www.marknagelberg.com/getting-started-with-airflow-using-docker/
- https://towardsdatascience.com/step-by-step-guide-of-aws-elastic-container-service-with-images-c258078130ce
- https://headspring.com/2020/06/17/orchestrating-and-running-multiple-tasks-in-aws-via-airflow/
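The S3 section lists the bucket and prefix settings as `KEY=value` pairs. Here is a minimal sketch of how the prediction container could turn them into bucket/key pairs, assuming the values reach the container as environment variables (for example via the task definition). The class name `S3Layout` and the file name `model.sav` are hypothetical, not taken from the repository:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class S3Layout:
    """Bucket and key-prefix settings from the S3 section (hypothetical helper)."""

    read_bucket: str
    read_data_path: str
    read_models_path: str
    write_bucket: str
    write_data_path: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "S3Layout":
        # Defaults to the process environment; a missing variable raises KeyError.
        env = os.environ if env is None else env
        return cls(
            read_bucket=env["READ_BUCKET"],
            read_data_path=env["READ_DATA_PATH"],
            read_models_path=env["READ_MODELS_PATH"],
            write_bucket=env["WRITE_BUCKET"],
            write_data_path=env["WRITE_DATA_PATH"],
        )

    def model_location(self, filename: str) -> Tuple[str, str]:
        # Serialized model (.sav) under the models prefix of the read bucket.
        return self.read_bucket, f"{self.read_models_path}/{filename}"

    def input_location(self, filename: str) -> Tuple[str, str]:
        # Input data (.csv) under the data prefix of the read bucket.
        return self.read_bucket, f"{self.read_data_path}/{filename}"

    def output_location(self, filename: str) -> Tuple[str, str]:
        # Predictions go to the separate results bucket.
        return self.write_bucket, f"{self.write_data_path}/{filename}"


if __name__ == "__main__":
    layout = S3Layout.from_env({
        "READ_BUCKET": "ml-sls-deploy-prd",
        "READ_DATA_PATH": "data",
        "READ_MODELS_PATH": "models",
        "WRITE_BUCKET": "ml-sls-deploy-prd-results",
        "WRITE_DATA_PATH": "results",
    })
    print(layout.model_location("model.sav"))  # ('ml-sls-deploy-prd', 'models/model.sav')
```

Returning `(bucket, key)` tuples matches how boto3's S3 client takes its arguments, e.g. `s3.download_file(*layout.model_location("model.sav"), "/tmp/model.sav")` or `s3.upload_file("/tmp/predictions.csv", *layout.output_location("predictions.csv"))`.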
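For the ECS part, Airflow ultimately has to start the Fargate task. Here is a minimal sketch of that request, built as keyword arguments for boto3's `ECS.Client.run_task`. The helper `build_run_task_kwargs`, the cluster, task definition and container names, and the subnet and security-group IDs are placeholders; passing the S3 settings through `containerOverrides` is an assumption, not necessarily what this repository's DAG does:

```python
from typing import Dict, List, Optional


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    container_name: str,
    subnets: List[str],
    security_groups: List[str],
    environment: Optional[Dict[str, str]] = None,
    assign_public_ip: bool = True,
) -> dict:
    """Keyword arguments for boto3's ECS client run_task() on Fargate (hypothetical helper)."""
    kwargs = {
        "cluster": cluster,
        "taskDefinition": task_definition,  # "family" or "family:revision"
        "launchType": "FARGATE",
        "count": 1,
        # Fargate tasks use the awsvpc network mode, so a VPC configuration is required.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # In a public subnet the task typically needs a public IP to pull from ECR.
                "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
            }
        },
    }
    if environment:
        # Add or override environment variables of the named container for this run only.
        kwargs["overrides"] = {
            "containerOverrides": [
                {
                    "name": container_name,
                    "environment": [
                        {"name": key, "value": value}
                        for key, value in sorted(environment.items())
                    ],
                }
            ]
        }
    return kwargs
```

Inside a DAG, a `PythonOperator` callable could then call `boto3.client("ecs").run_task(**kwargs)`. Airflow 1.10 also ships an `ECSOperator` (`airflow.contrib.operators.ecs_operator`) that accepts similar fields, which avoids writing the boto3 call yourself.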
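The Task Definition bullet says the `Task Role` needs S3 access. One way to scope that role is sketched below as a Python function that emits the IAM policy JSON. Granting only `s3:GetObject` on the read bucket and `s3:PutObject` on the results bucket is an assumption about what the container does, and `task_role_policy` is a hypothetical helper:

```python
import json


def task_role_policy(read_bucket: str, write_bucket: str) -> dict:
    """IAM policy document: read model and inputs, write predictions (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Download the .sav model and .csv inputs from the read bucket.
                "Sid": "ReadModelAndInputs",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{read_bucket}/*"],
            },
            {
                # Upload predictions to the results bucket.
                "Sid": "WritePredictions",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{write_bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    policy = task_role_policy("ml-sls-deploy-prd", "ml-sls-deploy-prd-results")
    print(json.dumps(policy, indent=2))
```

If the container also lists objects (for example to pick the newest input file), it would additionally need `s3:ListBucket` on the bucket ARN itself (`arn:aws:s3:::<bucket>`, without `/*`). The `Task Execution Role` is separate: the AWS managed policy `AmazonECSTaskExecutionRolePolicy` covers pulling images from ECR and writing logs to CloudWatch.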
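The last step, checking whether everything went well, can also be automated. Here is a sketch that inspects the response of boto3's `ECS.Client.describe_tasks` once the task has stopped. Treating a missing or non-zero container `exitCode` as a failure is an assumption about how the prediction script reports errors, and `failed_containers` is a hypothetical helper:

```python
from typing import List


def failed_containers(describe_tasks_response: dict) -> List[str]:
    """Return "taskArn/containerName" for every container that did not exit with 0.

    Raises ValueError if a task has not stopped yet, so that "still running"
    is never mistaken for success.
    """
    failures = []
    for task in describe_tasks_response.get("tasks", []):
        status = task.get("lastStatus")
        if status != "STOPPED":
            raise ValueError(f"task {task.get('taskArn')} is still {status}")
        for container in task.get("containers", []):
            # exitCode is absent when the container never started (e.g. image pull failure).
            if container.get("exitCode") != 0:
                failures.append(f"{task.get('taskArn', '?')}/{container.get('name', '?')}")
    return failures
```

To avoid polling by hand, boto3's `tasks_stopped` waiter (`ecs.get_waiter("tasks_stopped").wait(cluster=..., tasks=[task_arn])`) blocks until the task stops; the DAG can then call `describe_tasks` and fail the Airflow task if the returned list is non-empty.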