Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/ivangolt/mlops_flask_mlflow_grafana_streamlit_aws

This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning project containing feature engineering, training, validation, tracking, model deployment, hosting, and general engineering best practices aimed at making house price predictions.

aws evidently flask mlflow s3-bucket streamlit


Host: GitHub
URL: https://github.com/ivangolt/mlops_flask_mlflow_grafana_streamlit_aws
Owner: ivangolt
Created: 2024-09-09T14:04:10.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-09-09T16:55:35.000Z (2 months ago)
Last Synced: 2024-10-16T12:59:19.639Z (about 1 month ago)
Topics: aws, evidently, flask, mlflow, s3-bucket, streamlit
Language: Python
Homepage:
Size: 1010 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Capstone Project (MLOps Zoomcamp) - House Price Prediction

![Architecture](./images/house-price-497112-KhCJQICS.jpg)

## Problem Statement

This is a capstone project associated with [MLOps Zoomcamp](https://github.com/DataTalksClub/mlops-zoomcamp), and it will be peer reviewed and scored.

The end goal of the project is to build an end-to-end machine learning project containing feature engineering, training, validation, tracking, model deployment, hosting, and general engineering best practices aimed at making house price predictions.

## Dataset

This data set has 414 rows and 7 columns.

It provides the market historical data set of real estate valuations which are collected from Sindian Dist., New Taipei City, Taiwan.

This data set is recommended for learning and practicing your skills in **exploratory data analysis**, **data visualization**, and **regression modelling techniques**.

Feel free to explore the data set with multiple **supervised** and **unsupervised** learning techniques.

The following data dictionary gives more details on this data set:

---

### Data Dictionary

| Column Position | Attribute Name | Definition | Data Type | Example | % Null Ratios |
|---|---|---|---|---|---|
| 1 | X1 transaction date | The transaction date (for example, 2013.250 = 2013 March, 2013.500 = 2013 June, etc.) | Qualitative | 2013.500, 2013.500, 2013.333 | 0 |
| 2 | X2 house age | The house age (unit: year) | Quantitative | 19.5, 13.3, 5.0 | 0 |
| 3 | X3 distance to the nearest MRT station | The distance to the nearest MRT station (unit: meter) | Quantitative | 390.5684, 405.21340, 23.38284 | 0 |
| 4 | X4 number of convenience stores | The number of convenience stores in the living circle on foot | Quantitative | 6, 8, 1 | 0 |
| 5 | X5 latitude | The geographic coordinate, latitude (unit: degree) | Quantitative | 24.97937, 24.97544, 24.94925 | 0 |
| 6 | X6 longitude | The geographic coordinate, longitude (unit: degree) | Quantitative | 121.54243, 121.49587, 121.51151 | 0 |
| 7 | Y house price of unit area | The house price of unit area (10000 New Taiwan Dollar/Ping, where Ping is a local unit, 1 Ping = 3.3 meter squared); for example, 29.3 = 293,000 New Taiwan Dollar/Ping | Quantitative | 29.3, 33.6, 47.7 | |
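The encodings in the data dictionary can be made concrete with a small Python sketch. It decodes the fractional-year transaction date and converts the target from units of 10,000 New Taiwan Dollar per Ping to New Taiwan Dollar per square meter. The function names are hypothetical, and the month mapping assumes the convention given in the table (2013.250 is March, 2013.500 is June). How a zero fraction such as 2013.000 should be read is not stated, so the sketch rejects it rather than guessing.

```python
from typing import Tuple

PING_IN_SQUARE_METERS = 3.3  # 1 Ping = 3.3 square meters (from the data dictionary)
NTD_PER_PRICE_UNIT = 10_000  # Y is expressed in units of 10,000 NTD per Ping


def decode_transaction_date(value: float) -> Tuple[int, int]:
    """Split a fractional year such as 2013.250 into (year, month)."""
    year = int(value)
    # Assumption: the fraction counts twelfths of a year, so 0.250 -> month 3.
    month = round((value - year) * 12)
    if month == 0:
        raise ValueError(f"cannot map {value} to a month under this convention")
    return year, month


def price_per_ping_ntd(y: float) -> float:
    """Convert the target Y to New Taiwan Dollar per Ping."""
    return y * NTD_PER_PRICE_UNIT


def price_per_square_meter_ntd(y: float) -> float:
    """Convert the target Y to New Taiwan Dollar per square meter."""
    return price_per_ping_ntd(y) / PING_IN_SQUARE_METERS


print(decode_transaction_date(2013.250))  # (2013, 3)
print(decode_transaction_date(2013.500))  # (2013, 6)
print(round(price_per_ping_ntd(29.3)))    # 293000
```

The last line reproduces the example from the table: a value of 29.3 corresponds to 293,000 New Taiwan Dollar per Ping.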

## Design & flow architecture

The architecture below depicts the system design:

![Architecture](./images/project_design.png)

Languages, frameworks, libraries, services, and tools used to bootstrap this project:

* ![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)

* ![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white)

* ![scikit-learn](https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white)

* ![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white) : Container

* ![Flask](https://img.shields.io/badge/flask-%23000.svg?style=for-the-badge&logo=flask&logoColor=white) : Prediction service (web app)

* ![Gunicorn](https://img.shields.io/badge/gunicorn-%298729.svg?style=for-the-badge&logo=gunicorn&logoColor=white)

* ![AWS](https://img.shields.io/badge/AWS-%23FF9900.svg?style=for-the-badge&logo=amazon-aws&logoColor=white) : S3 for storage, RDS as database, EC2 as virtual machine

* MLflow : Experiment tracking and model registry

* Workflow orchestration

* Streamlit : Open source app framework in Python

* Evidently : Monitoring

* Grafana : Monitoring dashboard

* Monitoring database

* ![Git](https://img.shields.io/badge/git-%23F05033.svg?style=for-the-badge&logo=git&logoColor=white)

* ![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)

* ![MongoDB](https://img.shields.io/badge/MongoDB-%234ea94b.svg?style=for-the-badge&logo=mongodb&logoColor=white)

* ![Linux](https://img.shields.io/badge/Linux_Bash_Scripting-FCC624?style=for-the-badge&logo=linux&logoColor=black)

* Pylint + Black + isort : Linter and code formatters

## Flow

- [x] Training, orchestration, tracking, model registry, and deployment

```bash

make train

```

![experiment](./images/MLFLOW_EXPER.PNG)

![model](./images/mlflow_model.PNG)

![train](./images/train.PNG)

![deploy](./images/deploy.PNG)

- [x] Prediction service setup, monitoring service setup, integration test, Streamlit provisioning

```bash

make build

```

![docker](./images/docker.PNG)
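Flask is listed above as the prediction service, but this README does not show its code. As a rough illustration only, a minimal service of that kind might look like the sketch below. The `/predict` route, the feature keys, the response field, and `PlaceholderModel` are all hypothetical; in the real project the model would presumably be loaded from the MLflow model registry.

```python
from flask import Flask, jsonify, request

# Hypothetical feature keys; the real service may expect different names.
FEATURES = [
    "house_age",
    "distance_to_mrt",
    "convenience_stores",
    "latitude",
    "longitude",
]


class PlaceholderModel:
    """Stand-in for a trained regressor; returns 0.0 so the sketch runs."""

    def predict(self, rows):
        return [0.0 for _ in rows]


def create_app(model) -> Flask:
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify({"error": f"missing features: {missing}"}), 400
        try:
            row = [float(payload[name]) for name in FEATURES]
        except (TypeError, ValueError):
            return jsonify({"error": "all features must be numeric"}), 400
        prediction = model.predict([row])[0]
        return jsonify({"price_per_unit_area": float(prediction)})

    return app


app = create_app(PlaceholderModel())
```

Assuming the file is saved as `app.py`, a command such as `gunicorn app:app` would serve it. Passing the model into `create_app` keeps the route logic testable with Flask's built-in test client, without a running server.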

- [x] Batch Prediction

```bash

python stream_send.py

```

![drift](./images/drift.PNG)
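The contents of `stream_send.py` are not shown in this README. A script of this kind typically reads records from a CSV file and posts them one by one to the prediction service. The sketch below illustrates that idea using only the Python standard library. `PREDICT_URL`, `rows_to_payloads`, and `send` are hypothetical names, and the sketch assumes every CSV column is numeric.

```python
import csv
import io
import json
import urllib.request
from typing import Dict, Iterator

# Assumed endpoint; the actual host, port, and route are not given in this README.
PREDICT_URL = "http://localhost:9696/predict"


def rows_to_payloads(csv_text: str) -> Iterator[Dict[str, float]]:
    """Yield one JSON-ready dict per CSV row, converting every value to float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {name: float(value) for name, value in row.items()}


def send(payload: Dict[str, float], url: str = PREDICT_URL) -> dict:
    """POST one record as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a service running at the assumed URL, looping over `rows_to_payloads(open("data/batch_test.csv").read())` and calling `send` on each payload would submit the batch one record at a time.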

- [x] Prediction

```bash

http://localhost:8501

```

![streamlit](./images/web_page_STREAMLIT.PNG)

## Project Tree Structure

The following is the resulting repo structure:

```bash
|-- Makefile
|-- README.md
|-- Test
|   `-- integration_test
|       `-- run.sh
|-- Tracking_Orchestration
|   |-- Pipfile
|   |-- Pipfile.lock
|   |-- test.py
|   |-- track.sh
|   `-- train.py
|-- data
|   |-- batch_test.csv
|   |-- data.xlsx
|   `-- train.csv
|-- images
|   |-- MLFLOW_EXPER.PNG
|   |-- deploy.PNG
|   |-- docker.PNG
|   |-- drift.PNG
|   |-- mlflow_model.PNG
|   |-- train.PNG
|   `-- web_page_STREAMLIT.PNG
|-- pyproject.toml
|-- streamlit
|   |-- Dockerfile
|   |-- Pipfile
|   |-- Pipfile.lock
|   |-- frontend.py
|   `-- images
|       `-- house.jpg
`-- web_service_monitoring
    |-- Pipfile
    |-- Pipfile.lock
    |-- docker-compose.yml
    |-- evidently_service
    |   |-- Dockerfile
    |   |-- app.py
    |   |-- config
    |   |   |-- grafana_dashboards.yaml
    |   |   |-- grafana_datasources.yaml
    |   |   `-- prometheus.yml
    |   |-- config.yaml
    |   |-- dashboards
    |   |   |-- cat_target_drift.json
    |   |   |-- classification_performance.json
    |   |   |-- data_drift.json
    |   |   |-- num_target_drift.json
    |   |   `-- regression_performance.json
    |   |-- datasets
    |   |   `-- train.csv
    |   `-- requirements.txt
    |-- prediction_service
    |   |-- Dockerfile
    |   |-- app.py
    |   `-- requirements.txt
    |-- requirements.txt
    |-- stream_send.py
    `-- test.py

   13 directories, 46 files
```