https://github.com/nithin-sudarsan/handwritten-digit-recognizer
A Convolutional Neural Networks model used to recognize handwritten numbers.
cnn-classification deep-learning deep-neural-networks digit-recognition handwritten-digit-recognition machine-learning neural-network
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/nithin-sudarsan/handwritten-digit-recognizer
- Owner: nithin-sudarsan
- Created: 2022-03-24T16:53:04.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-08T18:04:20.000Z (over 3 years ago)
- Last Synced: 2025-02-16T17:43:53.494Z (8 months ago)
- Topics: cnn-classification, deep-learning, deep-neural-networks, digit-recognition, handwritten-digit-recognition, machine-learning, neural-network
- Language: Jupyter Notebook
- Homepage:
- Size: 29.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Handwritten-digit-recognizer
Different machine learning models were built and compared to recognize the handwritten digits.

## Input format fed to the models
### Training data
The `train.csv` file contains the training data: 42,000 images of 28 x 28 pixels each. Each pixel value represents the brightness of that pixel and ranges from 0 to 255, with 255 being the brightest. The first column holds the target variable (the digit label), and the remaining columns hold the feature variables (the pixel values).
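
As a rough illustration of this layout, the sketch below splits one row of `train.csv` into its label and a 28 x 28 grid of pixel values scaled to the range 0 to 1. It assumes each row arrives as a list of 785 numeric strings (the label followed by 784 pixels stored row by row). The helper name `parse_training_row`, the row-major ordering, and the scaling step are assumptions made for illustration, not something taken from the notebook.

```python
IMAGE_SIDE = 28  # images are 28 x 28 pixels


def parse_training_row(row):
    """Split one train.csv row into (label, 28x28 image scaled to [0, 1])."""
    expected = 1 + IMAGE_SIDE * IMAGE_SIDE
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    label = int(row[0])  # first column is the target digit
    pixels = [int(v) / 255.0 for v in row[1:]]  # brightness 0..255 -> 0..1
    # Assumption: pixels are stored row by row (row-major order).
    image = [
        pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]
        for r in range(IMAGE_SIDE)
    ]
    return label, image


# Synthetic row for demonstration: label 7, every pixel at full brightness.
label, image = parse_training_row(["7"] + ["255"] * 784)
print(label, len(image), len(image[0]), image[0][0])  # 7 28 28 1.0
```

A real row could come from Python's `csv.reader`, skipping the header line if the file has one.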
### Testing data
The `test.csv` file contains the testing data: 28,000 images of 28 x 28 pixels each. Each value in the testing data represents the brightness of the corresponding pixel and ranges from 0 to 255, with 255 being the brightest.
## Models built and their respective accuracy scores:
| Model | Accuracy Score |
|-------|----------------|
| Logistic Regression | 0.8 (80%) |
| Random Forest Classifier | 0.8 (80%) |
| Decision Tree Classifier | 0.8 (80%) |
| Naive Bayes Classifier | 0.8 (80%) |

From the table above we can see that a Convolutional Neural Network model has the best performance in recognizing handwritten numbers.
Let us now understand the working of a convolutional neural network.

## Working of Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a class of deep learning models that are very effective for media processing tasks such as image recognition and audio/video recognition.
The dimensionality of the image is reduced before it is fed to a fully connected neural network, in such a way that the important features of the image are retained.
The important processes in a Convolutional Neural Network for image processing are as follows:
1. [Convolution](#convolution)
2. [Padding](#padding)
3. [Pooling](#pooling)
4. [Flatten](#flatten)

### Convolution
In convolution, a kernel (a small matrix used to extract features from the image) moves over the input image and computes the dot product with each sub-region it covers. The output of convolution is a matrix of these dot products, known as a feature map.
The kernel moves across the image from left to right and top to bottom, advancing by the number of pixels given by the stride value.
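
The sliding dot product described here can be sketched in plain Python. This is a minimal illustration rather than the notebook's implementation: the function name `convolve2d`, the square-kernel assumption, and the sample values are all hypothetical. As in most deep learning libraries, the kernel is applied without flipping it.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k = len(kernel)  # assumes a square k x k kernel
    rows, cols = len(image), len(image[0])
    output = []
    for top in range(0, rows - k + 1, stride):        # top to bottom
        out_row = []
        for left in range(0, cols - k + 1, stride):   # left to right
            # Dot product of the kernel with the k x k sub-region.
            total = sum(
                kernel[i][j] * image[top + i][left + j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel, stride=1)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

With `stride=2` the same call visits only every second position and returns a 2 x 2 feature map.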
The dimension of the image after convolution is given by

`O = ((I - K) / S) + 1`

where

* `O` is the output image dimension
* `I` is the input image dimension
* `K` is the kernel size
* `S` is the stride
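
A small worked example of this formula, applied to the 28 x 28 input images, is sketched below. The function name `conv_output_size` and the kernel sizes are illustrative. Integer division is an assumption here: it covers the case where `I - K` is not evenly divisible by `S`, in which case the last partial kernel position is dropped.

```python
def conv_output_size(i, k, s):
    """Output dimension O = ((I - K) / S) + 1 for a convolution without padding."""
    if k > i:
        raise ValueError("kernel cannot be larger than the input")
    # Integer division: a kernel position that would run past the edge is dropped.
    return (i - k) // s + 1


print(conv_output_size(28, 3, 1))  # 26
print(conv_output_size(28, 5, 1))  # 24
print(conv_output_size(28, 2, 2))  # 14
```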
### Padding
Padding adds an extra layer of pixels around the image, copied from the pixels at the edge of the image, so that convolution also works effectively on the edge pixels.
It resolves the border effect, which occurs when the edge pixels are not fully processed during convolution.
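
The edge-copying padding described above can be sketched as follows. The helper name `pad_edge` and the tiny 2 x 2 input are hypothetical and chosen only to keep the output readable. Note that many deep learning libraries pad with zeros by default instead of copying edge pixels.

```python
def pad_edge(image, p):
    """Return a copy of `image` with `p` extra pixels on every side,
    each copied from the nearest edge pixel."""
    padded_rows = []
    for row in image:
        # Repeat the first and last pixel of the row p times.
        padded_rows.append([row[0]] * p + list(row) + [row[-1]] * p)
    # Repeat the first and last (already widened) rows p times.
    top = [list(padded_rows[0]) for _ in range(p)]
    bottom = [list(padded_rows[-1]) for _ in range(p)]
    return top + padded_rows + bottom


image = [
    [1, 2],
    [3, 4],
]
padded = pad_edge(image, 1)
for row in padded:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```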
The dimension of the image after convolution with padding is given by

`O = ((I - K + 2P) / S) + 1`

where

* `O` is the output image dimension
* `I` is the input image dimension
* `K` is the kernel size
* `P` is the padding size
* `S` is the stride
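
To see the effect of padding on the output size, the formula can be evaluated for a 28 x 28 input. The function name `padded_output_size` and the kernel and padding values are illustrative assumptions. Integer division is used for cases that do not divide evenly.

```python
def padded_output_size(i, k, p, s):
    """Output dimension O = ((I - K + 2P) / S) + 1 for a padded convolution."""
    return (i - k + 2 * p) // s + 1


print(padded_output_size(28, 3, 0, 1))  # 26: no padding, the image shrinks
print(padded_output_size(28, 3, 1, 1))  # 28: one pixel of padding keeps the size
print(padded_output_size(28, 5, 2, 1))  # 28: a 5 x 5 kernel needs two pixels
```

With a stride of 1, choosing `P = (K - 1) / 2` for an odd kernel size keeps the output the same size as the input.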
### Pooling
Pooling scales down the dimensions of the image while retaining the important features in the feature map. The features are scaled down by summarizing the presence of features in patches of the feature map. Each such patch is generally known as a pool window.
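
A minimal sketch of this summarizing step is shown below. It applies a 2 x 2 pool window with a stride of 2 to a small feature map, and the summarizing function is passed in so that the maximum, minimum, or average of each window can be taken. The names `pool2d` and `mean`, the window size, and the sample values are assumptions for illustration.

```python
def pool2d(feature_map, window=2, stride=2, summarize=max):
    """Summarize each `window` x `window` patch of `feature_map` with `summarize`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - window + 1, stride):
        out_row = []
        for left in range(0, cols - window + 1, stride):
            # Collect the values inside the current pool window.
            patch = [
                feature_map[top + i][left + j]
                for i in range(window)
                for j in range(window)
            ]
            out_row.append(summarize(patch))
        pooled.append(out_row)
    return pooled


def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
print(pool2d(feature_map, summarize=max))   # [[7, 8], [9, 6]]
print(pool2d(feature_map, summarize=min))   # [[1, 2], [2, 0]]
print(pool2d(feature_map, summarize=mean))  # [[4.0, 5.0], [4.5, 3.0]]
```
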
The common pooling methods are:
* **Max Pooling**
Used to summarize the most activated presence of a feature in the pooling window.
* **Min Pooling**
Used to summarize the least activated presence of a feature in the pooling window.
* **Average Pooling**
Used to summarize the average presence of a feature in the pooling window.

### Flatten
Once the pooled feature map is obtained, the next step is to flatten it into a single column before feeding it to the fully connected neural network. This step makes the computation of the neural network more efficient and less expensive.
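
Flattening can be sketched in one line of plain Python. The function name `flatten` and the 2 x 2 pooled feature map are illustrative, and the values are read out row by row, which is an assumption about ordering.

```python
def flatten(feature_map):
    """Turn a 2D feature map into a single 1D list, row by row."""
    return [value for row in feature_map for value in row]


pooled = [
    [7, 8],
    [9, 6],
]
print(flatten(pooled))  # [7, 8, 9, 6]
```
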
### The diagram below shows the processes that an image undergoes before it is fed to a neural network