https://github.com/thomd/machine-learning-ops-with-airflow
Train a Machine Learning Model using Apache Airflow
airflow mlops python sklearn
- Host: GitHub
- URL: https://github.com/thomd/machine-learning-ops-with-airflow
- Owner: thomd
- Created: 2023-04-05T11:49:15.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-04-05T13:20:56.000Z (about 3 years ago)
- Last Synced: 2025-03-30T19:14:58.119Z (about 1 year ago)
- Topics: airflow, mlops, python, sklearn
- Language: Python
- Homepage:
- Size: 5.86 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Machine Learning Training using Apache Airflow
This is an **educational project** and **proof-of-concept**.
The pipeline trains a **Random Forest classifier** and a **Logistic Regression classifier** on the
[Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) and selects the model with the higher accuracy.
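
The comparison described above can be sketched outside of Airflow in a few lines of scikit-learn. This is a minimal illustration and not the repository's code: it assumes the copy of Iris bundled with scikit-learn (the pipeline downloads the dataset instead), and the function name `compare_models`, the 70/30 split, and the seed are hypothetical choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_models(test_size=0.3, seed=42):
    """Train both classifiers on Iris and return (best_name, accuracies)."""
    # Assumption: use the bundled Iris data rather than a downloaded file.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Keys are illustrative labels, chosen to mirror the task names.
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "randomforest": RandomForestClassifier(random_state=seed),
    }

    # Fit each model on the training split and score it on the held-out split.
    accuracies = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        accuracies[name] = accuracy_score(y_test, model.predict(X_test))

    # The best model is the one with the highest test accuracy.
    best_name = max(accuracies, key=accuracies.get)
    return best_name, accuracies


if __name__ == "__main__":
    best, scores = compare_models()
    print(best, scores)
```

Both models are scored on the same held-out split so that their accuracies are directly comparable.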
## ML Pipeline
```
airflow dags show ml | sed 1d | graph-easy --as=boxart
╭──────────────────╮     ╭────────────────────╮     ╭────────────────╮     ╭─────────────────────╮
│ download_dataset │ ──▶ │  data_processing   │ ──▶ │ train_logistic │ ──▶ │ identify_best_model │
╰──────────────────╯     ╰────────────────────╯     ╰────────────────╯     ╰─────────────────────╯
                           │                                                 ▲
                           │                                                 │
                           ▼                                                 │
                         ╭────────────────────╮                              │
                         │ train_randomforest │ ─────────────────────────────┘
                         ╰────────────────────╯
```
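
The graph above fans out after `data_processing` and joins again at `identify_best_model`. The sketch below models only that ordering with Python's standard-library `graphlib`; it does not use Airflow, and the `UPSTREAM` mapping and the `execution_waves` helper are hypothetical names, with the edges read off the diagram.

```python
from graphlib import TopologicalSorter

# Task name -> set of upstream tasks it waits for (edges taken from the diagram).
UPSTREAM = {
    "download_dataset": set(),
    "data_processing": {"download_dataset"},
    "train_logistic": {"data_processing"},
    "train_randomforest": {"data_processing"},
    "identify_best_model": {"train_logistic", "train_randomforest"},
}


def execution_waves(upstream):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(upstream)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        # Everything whose upstream tasks are all done is ready to run now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(UPSTREAM), start=1):
        print(number, ", ".join(wave))
```

The two training tasks land in the same wave, which is why a scheduler is free to run them in parallel, while `identify_best_model` waits for both.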
## Setup
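    # Create and activate an isolated Python 3.10.9 environment
    pyenv shell 3.10.9
    python -m venv .venv
    source .venv/bin/activate
    # Use the current directory as the Airflow home and skip the bundled example DAGs
    export AIRFLOW_HOME=$(pwd)
    export SQLALCHEMY_SILENCE_UBER_WARNING=1
    export AIRFLOW__CORE__LOAD_EXAMPLES=False
    pip install apache-airflow numpy pandas scikit-learn
    # Initialize the metadata database, then start the scheduler (it runs in the foreground)
    airflow db init
    airflow scheduler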
## Train
    airflow dags unpause ml
    airflow dags trigger ml
    cat accuracy.txt