Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/khadkarajesh/wine-prediction
White and Red Wine classification using logistic regression
https://github.com/khadkarajesh/wine-prediction
airflow airflow-dags classification classification-algorithm data-science dataingestion evidently flask logistic-regression logistic-regression-algorithm machine-learning machine-learning-pipeline mlflow numpy pandas pipeline postgresql python scikit-learn supervised-learning
Last synced: 26 days ago
JSON representation
White and Red Wine classification using logistic regression
- Host: GitHub
- URL: https://github.com/khadkarajesh/wine-prediction
- Owner: khadkarajesh
- Created: 2021-11-04T21:10:23.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-24T14:04:10.000Z (almost 3 years ago)
- Last Synced: 2024-05-05T15:34:58.688Z (6 months ago)
- Topics: airflow, airflow-dags, classification, classification-algorithm, data-science, dataingestion, evidently, flask, logistic-regression, logistic-regression-algorithm, machine-learning, machine-learning-pipeline, mlflow, numpy, pandas, pipeline, postgresql, python, scikit-learn, supervised-learning
- Language: HTML
- Homepage:
- Size: 2.74 MB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# wine-prediction
Wine-Prediction classifies the wine label based upon following features:
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- chlorides
- free sulfur dioxide
- total sulfur dioxide
- density
- pH
- sulphates
- alcohol
- quality
- labelThis application is built to demonstrate the machine learning pipeline using widely used technologies.
# Dataset
Dataset is extracted from the [UCI](https://archive.ics.uci.edu/ml/datasets/wine).## Architecture Diagram
![airflow_diagram](/media/architecture.png)
## Used Technologies
* Flask
* Python
* Streamlit
* Postgresql
* AirFlow 2.2
* Grafana## Steps to Run Application
1. [Install Dependencies](#install-dependencies)
2. [Run API](#run-api)
3. [Run Airflow](#run-airflow)
4. [Run Frontend](#run-frontend)### Install Dependencies
1. Create a virtual environment with python3
```shell
python3 -m venv wine_prediction
```
2. Activate the virtual environment:
```shell
cd wine_prediction
source /bin/activate
```
2. Install dependencies
```shell
pip install -r requirements.txt
```### Run API
1. Create database and add .env file in ```api/.env```. template of ```.env``` is as follows:
```shell
DATABASE_NAME = YOUR_DATABASE
DATABASE_PORT = 5432
USER_NAME = YOUR_DATABASE_USER
USER_PASSWORD = YOUR_DATABASE_USER_PASSWORD
```
2. Navigate to root of the project
3. Set environment variables
```bash
export FLASK_APP=app:create_app
export APP_SETTINGS="api.config.DevelopmentConfig"
```
4. Run Flask
```bash
flask run
```### Run Frontend
1. Navigate to the ```/frontend``` directory of application
2. Run streamlit application as:```bash
streamlit run run.py
```### Run Airflow
1. Create database user and grant all permission to that user which will be used to store the logs of airflow
Create user using psql shell.
```psql
CREATE DATABASE wine_airflow;
CREATE USER airflow_user WITH ENCRYPTED PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE wine_airflow TO airflow_user;
```2. Go to root directory of project and set env variable ```AIRFLOW_HOME``` as:
```bash
export AIRFLOW_HOME=$PWD/airflow
```
3. Initialize database
```bash
airflow db init
```
4. Create User (username:admin, password:admin) to access the airflow web application which will be run
on ```http://localhost:8080```
```bash
airflow users create --username admin --firstname admin --lastname admin --role Admin --email [email protected] --password admin
```
5. Start Airflow Scheduler
```bash
# Set Environment variable to use postgresql as database to store airflow log
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow
airflow scheduler
```
6. Start Web Server
```bash
# Set Environment variable to use postgresql as database to store airflow log
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow
airflow webserver
```Once you run the webserver you can access airflow dashboard on ```http://localhost:8080```.
Airflow has the following data ingestion pipeline:
![airflow_diagram](/media/airflow.png)
When the data validation fails, airflow sends email to the respective member which can be configured by adding following
variables in airflow. To check this scenario we can enable ```mimic_validation_fail``` in airflow variable.![airflow_diagram](/media/airflow_variable.png)
## Data Drift Report
Data Drift report can be generated by running the jupyter notebook available in the
directory `/notebooks/data_drift_report.ipynb`. If there is drift in data reporting will be of the following format.![airflow_diagram](/media/data_drift_report.png)