# wine-prediction
Wine-Prediction classifies the wine label (red or white) based on the following features:
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- chlorides
- free sulfur dioxide
- total sulfur dioxide
- density
- pH
- sulphates
- alcohol
- quality
- label

This application is built to demonstrate a machine learning pipeline using widely adopted technologies.

# Dataset
The dataset is taken from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/wine).
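A minimal, self-contained sketch of the classification task is shown below. It assumes the two UCI wine-quality CSV files (`winequality-red.csv` and `winequality-white.csv`, semicolon-separated) have been downloaded locally; the file paths, the added `label` column, and the train/test split are illustrative only and are not the project's actual pipeline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the UCI wine-quality CSVs were downloaded next to this script.
red = pd.read_csv("winequality-red.csv", sep=";")
white = pd.read_csv("winequality-white.csv", sep=";")
red["label"] = "red"
white["label"] = "white"

data = pd.concat([red, white], ignore_index=True)
X = data.drop(columns=["label"])   # the feature columns listed above
y = data["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```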

## Architecture Diagram

![architecture_diagram](/media/architecture.png)

## Used Technologies

* Flask
* Python
* Streamlit
* PostgreSQL
* Airflow 2.2
* Grafana

## Steps to Run Application

1. [Install Dependencies](#install-dependencies)
2. [Run API](#run-api)
3. [Run Airflow](#run-airflow)
4. [Run Frontend](#run-frontend)

### Install Dependencies

1. Create a virtual environment with python3
```shell
python3 -m venv wine_prediction
```
2. Activate the virtual environment:
```shell
source wine_prediction/bin/activate
```
3. Install dependencies
```shell
pip install -r requirements.txt
```

### Run API

1. Create a database and add a ```.env``` file at ```api/.env```. The template of ```.env``` is as follows (a sketch of how these values might be consumed appears after these steps):
```shell
DATABASE_NAME = YOUR_DATABASE
DATABASE_PORT = 5432
USER_NAME = YOUR_DATABASE_USER
USER_PASSWORD = YOUR_DATABASE_USER_PASSWORD
```
2. Navigate to the root of the project
3. Set the environment variables
```bash
export FLASK_APP=app:create_app
export APP_SETTINGS="api.config.DevelopmentConfig"
```
4. Run Flask
```bash
flask run
```
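The actual `api.config.DevelopmentConfig` is defined in the repository; purely as an illustration, a minimal sketch of how the `.env` values above could be consumed to build the database URI might look like the following (assuming `python-dotenv` is installed; the class and attribute names here are assumptions, not the project's real config):

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

# Load api/.env so the values below are available as environment variables.
load_dotenv("api/.env")


class DevelopmentConfig:
    # Illustrative only; the real api.config.DevelopmentConfig may differ.
    DEBUG = True
    SQLALCHEMY_DATABASE_URI = (
        "postgresql://{user}:{password}@localhost:{port}/{db}".format(
            user=os.getenv("USER_NAME"),
            password=os.getenv("USER_PASSWORD"),
            port=os.getenv("DATABASE_PORT", "5432"),
            db=os.getenv("DATABASE_NAME"),
        )
    )
```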

### Run Frontend

1. Navigate to the ```/frontend``` directory of the application
2. Run the Streamlit application (a minimal sketch of what ```run.py``` might contain follows the command):

```bash
streamlit run run.py
```
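The real `run.py` ships with the repository; purely as an illustration, a minimal Streamlit front end that collects a few of the features and posts them to the Flask API might look like the sketch below (the `/predict` endpoint, the port, and the field names are assumptions):

```python
import requests
import streamlit as st

st.title("Wine Prediction")

# Collect a few of the input features; the full form would cover all of them.
fixed_acidity = st.number_input("Fixed acidity", value=7.0)
volatile_acidity = st.number_input("Volatile acidity", value=0.3)
alcohol = st.number_input("Alcohol", value=10.0)

if st.button("Predict"):
    # Assumption: the Flask API exposes a /predict endpoint on port 5000.
    response = requests.post(
        "http://localhost:5000/predict",
        json={
            "fixed_acidity": fixed_acidity,
            "volatile_acidity": volatile_acidity,
            "alcohol": alcohol,
        },
    )
    st.write(response.json())
```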

### Run Airflow

1. Create a database and a database user, and grant that user all privileges on the database; Airflow will use it to store its metadata.

Create them using the psql shell:
```psql
CREATE DATABASE wine_airflow;
CREATE USER airflow_user WITH ENCRYPTED PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE wine_airflow TO airflow_user;
```

2. Go to the root directory of the project and set the environment variable ```AIRFLOW_HOME```:
```bash
export AIRFLOW_HOME=$PWD/airflow
```
3. Initialize the Airflow metadata database
```bash
airflow db init
```
4. Create a user (username: admin, password: admin) to access the Airflow web application, which will run on ```http://localhost:8080```
```bash
airflow users create --username admin --firstname admin --lastname admin --role Admin --email [email protected] --password admin
```
5. Start the Airflow Scheduler
```bash
# Use PostgreSQL as the Airflow metadata database
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow

airflow scheduler
```
6. Start the Web Server
```bash
# Use PostgreSQL as the Airflow metadata database
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow

airflow webserver
```

Once the webserver is running, you can access the Airflow dashboard at ```http://localhost:8080```.

Airflow has the following data ingestion pipeline:

![airflow_diagram](/media/airflow.png)

When data validation fails, Airflow sends an email to the responsible member; the recipient can be configured by adding the following variables in Airflow. To test this scenario, enable the ```mimic_validation_fail``` Airflow variable. A minimal DAG sketch illustrating this flow follows the screenshot below.

![airflow_variables](/media/airflow_variable.png)
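The actual DAG is defined in the project's `airflow` directory; purely as an illustration of the ingest, validate, and alert flow described above (the DAG id, schedule, task names, and recipient address are assumptions), a minimal Airflow 2.x DAG could look like:

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.email import EmailOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def ingest_data(**_):
    # Placeholder for the real ingestion logic (e.g. loading wine data into PostgreSQL).
    pass


def validate_data(**_):
    # If the mimic_validation_fail Airflow variable is enabled, force the failure path.
    if Variable.get("mimic_validation_fail", default_var="false") == "true":
        return "send_failure_email"
    return "load_data"


def load_data(**_):
    # Placeholder for loading the validated data downstream.
    pass


with DAG(
    dag_id="wine_data_ingestion",      # illustrative name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_data", python_callable=ingest_data)
    validate = BranchPythonOperator(task_id="validate_data", python_callable=validate_data)
    load = PythonOperator(task_id="load_data", python_callable=load_data)
    alert = EmailOperator(
        task_id="send_failure_email",
        to="[email protected]",     # illustrative recipient
        subject="Wine data validation failed",
        html_content="Data validation failed for the wine ingestion pipeline.",
    )

    ingest >> validate >> [load, alert]
```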

## Data Drift Report

The data drift report can be generated by running the Jupyter notebook at `/notebooks/data_drift_report.ipynb`. If there is drift in the data, the report will be of the following format (a rough Evidently sketch follows the screenshot):

![data_drift_report](/media/data_drift_report.png)
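The notebook itself produces the report; as a rough sketch only, generating a similar data drift report with Evidently might look like the following (assuming an Evidently release that provides the `Report` API, and using `reference.csv` / `current.csv` as stand-ins for the actual reference and incoming data):

```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Assumption: reference.csv holds the training-time data, current.csv the new batch.
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

# Compute drift metrics for all columns and write an HTML report.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")
```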