https://github.com/anirudhg07/mnist_multidigitevaluator
- Host: GitHub
- URL: https://github.com/anirudhg07/mnist_multidigitevaluator
- Owner: AnirudhG07
- License: gpl-3.0
- Created: 2023-12-27T19:27:36.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-04T18:05:30.000Z (10 months ago)
- Last Synced: 2024-02-04T21:00:31.846Z (10 months ago)
- Language: Jupyter Notebook
- Size: 24.1 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# MNIST LINEAR MULTI DIGIT EVALUATION
MNIST is a well-known dataset that is widely used to train models to recognize handwritten digits.
However, each image contains only a single digit rather than several.
Hence, this repository contains code showing how to make your own multi-digit dataset and how to evaluate your model on it.
We create our own `Dataset` and `train` our model to identify it with the following processes and ideas. Visit Kaggle to download more pre-made data at MNIST Linear MULTIDIGIT DATASET, created by me.
# Image Classifications and Dataset
Two datasets of 10000 images have been uploaded, one containing images of 2-digit numbers and the other images of 3-digit numbers. You can unzip them to see the JPEG images.
You can also create your own dataset of n-digit numbers using `data_creator_n.py`, which loads the MNIST dataset, randomly selects n of the 60000 training images (or testing images, depending on what you would like) and concatenates them into a single 28 x 28*n image, saving everything in a matter of seconds.
A CSV file containing the number labels is also created and automatically saved in your folder.
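The concatenation step described above can be sketched roughly as follows. This is not the code in `data_creator_n.py`: the function name `make_multidigit_image`, the CSV column names and the random stand-in arrays are assumptions made for illustration, and the real script would load the actual MNIST images instead.

```python
import csv
import io

import numpy as np


def make_multidigit_image(digit_images, digit_labels, n, rng):
    """Pick n random digits and join them side by side into one 28 x 28*n image."""
    idx = rng.integers(0, len(digit_images), size=n)
    # Concatenate along the width axis: n arrays of (28, 28) -> one (28, 28*n)
    combined = np.concatenate([digit_images[i] for i in idx], axis=1)
    # Keep the label as a string so that leading zeros are not lost
    label = "".join(str(int(digit_labels[i])) for i in idx)
    return combined, label


# Stand-in data (assumption): replace with the real MNIST images and labels
rng = np.random.default_rng(0)
fake_images = rng.random((100, 28, 28), dtype=np.float32)
fake_labels = rng.integers(0, 10, size=100)

combined, label = make_multidigit_image(fake_images, fake_labels, n=3, rng=rng)

# Write one CSV row per generated image (here into an in-memory buffer)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["filename", "label"])  # hypothetical column names
writer.writerow(["image_0.jpg", label])
```

Storing the label as text rather than as an integer is a deliberate choice in this sketch: a number such as `007` would otherwise lose its leading zeros.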
For example:
See the image below of the number `726766291`, a nine-digit number.
For a human it is trivial to identify, but for a machine we need to segregate the image into pieces so that each piece can be evaluated by our trained model.
**IMPORTANT NOTE:**
When saving the image, matplotlib may change its dimensions, for example from 28x252 to 55x496, which is dangerous because the model is trained on an input size of 28*28=784.
Hence PIL is used to save the image.
```python
import numpy as np
from PIL import Image

# How to save the image so that its size is preserved
combined_image_np = combined_image.numpy()  # combined_image is the final concatenated image tensor
# Scale to 0-255 and convert to uint8 (assumes pixel values in [0, 1])
combined_image_pil = Image.fromarray((combined_image_np * 255).astype(np.uint8))
combined_image_pil.save("image.jpg")
```
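A quick way to confirm that the saved file kept its dimensions is to read it back and compare sizes. The sketch below is an illustration only: the helper name `saved_size` is hypothetical, the image is written to an in-memory buffer instead of a file, and the input is assumed to be a 2-D array with values in [0, 1].

```python
import io

import numpy as np
from PIL import Image


def saved_size(image_array, fmt="JPEG"):
    """Save a 2-D float array in [0, 1] with PIL and return the (width, height) read back."""
    pil_image = Image.fromarray((image_array * 255).astype(np.uint8))
    buffer = io.BytesIO()
    pil_image.save(buffer, format=fmt)
    buffer.seek(0)
    return Image.open(buffer).size


example = np.zeros((28, 252), dtype=np.float32)  # nine digits wide
width, height = saved_size(example)
```

Note that PIL reports the size as (width, height), the reverse of the (height, width) order of the array shape.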
# Segregation of image
The most important part of the process is proper segregation (splitting) of the image and feeding each image tensor to the model. The input size of `mnist_model.pt` is usually 28*28=784, so each image tensor must have shape (1,28,28), where 1 is the number of channels (grayscale in this case; it would be 3 for RGB).
We build a list of tensors holding the 28x28 chunks of the image and pass them to the model later on.
```python
image_seg = []  # list that collects the 28x28 digit tensors
chnl, height, width = image.shape  # image is a tensor of shape (channels, height, width)
for i in range(width // height):
    # split image into 28x28 chunks
    img_tensor = image[:, :, i*height:(i+1)*height]  # consecutive height-wide slices along the width
    image_seg.append(img_tensor)
```
If you use a library that does not produce 28x28 chunks, or that resizes the image after saving, make sure to change the input size of the model and the segregation accordingly. The segregation loop runs width//height times, and this value should correspond to the number of digits.
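Since the loop count depends on width//height, it can help to fail early when the width is not an exact multiple of the height. The following is a minimal sketch using NumPy arrays in place of tensors; the function name `split_into_digits` and the constant-valued test images are assumptions for illustration, not part of the repository.

```python
import numpy as np


def split_into_digits(image):
    """Split a (channels, height, width) array into square (channels, height, height) chunks."""
    channels, height, width = image.shape
    if width % height != 0:
        # A resized image (e.g. 55x496) would land here instead of being split silently
        raise ValueError(f"width {width} is not a multiple of height {height}")
    n_digits = width // height
    return [image[:, :, i * height:(i + 1) * height] for i in range(n_digits)]


# Three constant-valued stand-in "digits", joined along the width axis
digits = [np.full((1, 28, 28), d, dtype=np.float32) for d in (7, 2, 6)]
combined = np.concatenate(digits, axis=2)
chunks = split_into_digits(combined)
```

The same slicing expression works unchanged on a PyTorch tensor, since both libraries index the last axis in the same way.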
# Model Evaluation
The model is built using PyTorch or TensorFlow. It detects the digit in each 28x28 image, and the detected digits are combined to produce the final result.
The code `evaluator.py` loads the data, trains the model, segregates the n-digit image and finally outputs the predicted n-digit number.
**= 726766291**
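The final step, joining the per-digit predictions into one number, might look like the sketch below. It is not taken from `evaluator.py`: `predict_number` is a hypothetical helper, and `dummy_predict_digit` is a stand-in for the trained model that simply reads the digit from a constant pixel value.

```python
import numpy as np


def predict_number(chunks, predict_digit):
    """Run a per-digit classifier on each chunk and join the results left to right."""
    return "".join(str(int(predict_digit(chunk))) for chunk in chunks)


def dummy_predict_digit(chunk):
    # Hypothetical stand-in for the trained model: the test chunks are filled
    # with their own digit value, so the first pixel is the "prediction"
    return int(chunk[0, 0, 0])


chunks = [np.full((1, 28, 28), d, dtype=np.float32) for d in (7, 2, 6)]
number = predict_number(chunks, dummy_predict_digit)
```

In practice `predict_digit` would wrap the real model, for instance by taking the argmax over its ten output scores for one 28x28 chunk.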
Thus we have successfully made our dataset and a model that gives us the required values. This idea is very general and basic; other techniques are needed for different kinds of input, for example language character identification. In the same way, whole images of text can be cut up so that words and sentences can be output. The main and most important step is how to segregate the image!