https://github.com/mpolinowski/i-know-flowers
Serve SKLearn Image Classifier using Flask & Docker
https://github.com/mpolinowski/i-know-flowers
model-deployment python sklearn
Last synced: about 2 months ago
JSON representation
Serve SKLearn Image Classifier using Flask & Docker
- Host: GitHub
- URL: https://github.com/mpolinowski/i-know-flowers
- Owner: mpolinowski
- Created: 2023-07-23T09:26:58.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-07-23T09:31:08.000Z (almost 3 years ago)
- Last Synced: 2025-01-28T19:17:43.780Z (over 1 year ago)
- Topics: model-deployment, python, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 11.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# SKLearn ML Modelserver
- [SKLearn ML Modelserver](#sklearn-ml-modelserver)
- [Quickstart](#quickstart)
- [SGD Classifier](#sgd-classifier)
- [kNN Classifier](#knn-classifier)
- [Dataset](#dataset)
- [Build your own Classifier](#build-your-own-classifier)
- [Step 0 - Dataset Preprocessing](#step-0---dataset-preprocessing)
- [Step 1 - SGD Model Preparation](#step-1---sgd-model-preparation)
- [Step 1 - SGD Model Prediction Pipeline](#step-1---sgd-model-prediction-pipeline)
- [Step 2 - And so on...](#step-2---and-so-on)
- [Step 3 - Deployment](#step-3---deployment)
- [Potential Error](#potential-error)
## Quickstart
> __Update__: I had to remove the model and dataset as they were to big for Github. So this Quickstart does not work unless you download the Flower102 dataset, label it and run the scripts as explained below to generate the models needed for this container ¯\\_ (ツ)_/¯
```bash
cd ./STEP3_Webfrontend_Deployment
docker build -t modelserver .
docker run -p 5000:5000 modelserver:latest
```
Visit `localhost:5000` to access the web interface:

Select the model you want to use and upload a close-up photo of a flower contained in the map classes the model was trained on.
### SGD Classifier
The Stochastic Gradient Descent classifier was trained to an accuracy of `29.4%` over all 45 class labels:

### kNN Classifier
The k-Nearest-Neighbors classifier was trained to an accuracy of `36.7%` over all 45 class labels. As seen from the confusion matrix the confusion is much more localized to a handful of classes and could probably be remedied by cleaning up / extending the dataset for those classes:

### Dataset
Under the __About__ tab you see an overview of all included classes and samples of the images used to train the classifiers - all images used were taken from the [Flowers102 Dataset](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html):

## Build your own Classifier
### Step 0 - Dataset Preprocessing
Collect images and place them into sub folders in `/STEP0_Dataset_Preprocessing/labelled_data` based on their classes. Go though the [Python notebook](/STEP0_Dataset_Preprocessing) to prepare the dataset pickle.
### Step 1 - SGD Model Preparation
The [next step](/STEP1_SGD_01_Model_Preparation) is to prepare your data by performing a train/test-split and extract a feature vector from each image using the Hog transformer. Prepare the SGD Classifier (can be replaced with any classifier from SKLearn that you want to use). Fit it to the feature vectors from your dataset.
Once you have a working model perform a grid search to find the optimal hyperparameter. Re-fit the model using those parameter and export the trained model ready for deployment.
### Step 1 - SGD Model Prediction Pipeline
In the [next step](/STEP1_SGD_02_Model_Prediction_Pipeline) you need to develop the prediction pipeline that you will use for the Flask App Prediction API later on.
### Step 2 - And so on...
Add as many classifiers as you need to find one that performs the best - I added the k-Nearest-Neighbors classifier that outperformed the SGD model. Depending on your dataset you might experience the opposite or need another classifier all-together.
### Step 3 - Deployment
With those models prepared we need a [Flask Backend](/STEP3_Webfrontend_Deployment) to serve them as a prediction API with a simple HTML user interface. Add routes for your own classifiers - if you added them above. And jump back up to [Quickstart](#quickstart) to build and deploy your service container!
#### Potential Error
If you see error messages like the following when trying to start your container:
> `/usr/local/lib/python3.11/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator SGDClassifier from version 1.2.2 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations`
> `AttributeError: Can't get attribute 'ManhattanDistance' on `
Make sure that the version of Scikit-Learn on your system is identical to the version installed inside the container! Edit the version used in `requirements.txt` accordingly.