https://github.com/dragonxdev/pytorch-mnist-nn
A neural network designed to convert handwritten digits into their numerical value.
machine-learning mnist neural-network pytorch
- Host: GitHub
- URL: https://github.com/dragonxdev/pytorch-mnist-nn
- Owner: DragonXDev
- Created: 2024-05-03T02:52:29.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-09T17:14:36.000Z (about 2 years ago)
- Last Synced: 2025-02-25T00:35:23.818Z (over 1 year ago)
- Topics: machine-learning, mnist, neural-network, pytorch
- Language: Python
- Homepage:
- Size: 24.3 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# MNIST PyTorch Neural Network
A CNN trained on MNIST to recognize handwritten digits. It accepts a single grayscale input channel, i.e. a ```1x28x28``` input.

## Usage & Information
The CNN is trained in batches of ```32``` images from the MNIST database. Each of the 32 convolutional filters is a 2D ```3x3``` kernel applied to the ```1x28x28``` input.
Figure: a visual of an MNIST CNN.
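
As a rough check of the shapes described above, the sketch below computes the output size and parameter count of one convolution over the `1x28x28` input. It assumes stride 1 and no padding (the defaults of PyTorch's `nn.Conv2d`), which this README does not state. Both helper functions are hypothetical and are not part of the repository.

```python
def conv2d_output_shape(in_shape, out_channels, kernel_size, stride=1, padding=0):
    """Return (channels, height, width) after one 2D convolution.

    stride=1 and padding=0 are assumptions, not values taken from the project.
    """
    _, height, width = in_shape
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    return (out_channels, out_h, out_w)


def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Count learnable weights (plus one bias per filter if bias=True)."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# 32 filters of size 3x3 applied to a single-channel 28x28 image
print(conv2d_output_shape((1, 28, 28), out_channels=32, kernel_size=3))  # (32, 26, 26)
print(conv2d_param_count(in_channels=1, out_channels=32, kernel_size=3))  # 320
```

Under these assumptions each filter produces a `26x26` feature map, so the layer maps `1x28x28` to `32x26x26`.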

To train the CNN, run:
```bash
poetry run python src/train_model.py
```
This downloads the MNIST database and trains the model for 10 epochs. When training finishes, the model's state is saved as the ```digital_model.pt``` file.
Once training has completed, run the model with the generated ```digital_model.pt``` file using:
```bash
poetry run python src/main.py
```
## Optimizers
For MNIST, the Adam optimizer with ```lr=1e-3``` performed best of the optimizers tried.

### Adam
Starting with stochastic gradient descent (SGD), the gradient over a mini-batch is:
```math
g = \frac{1}{m} \nabla_{\theta}\sum_{i}L(f(x^{(i)};\theta),y^{(i)})
```
Here $\theta$ denotes the model's parameters, $g$ is the gradient estimate (SGD steps in the direction of $-g$), $m$ is the mini-batch size, $f(x^{(i)}; \theta)$ is the neural network, $x^{(i)}$ are the training inputs, $y^{(i)}$ are the training labels, and $L()$ is the loss function.
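
To make the mini-batch gradient concrete, here is a minimal sketch on a toy problem. It assumes a one-parameter model $f(x;\theta) = \theta x$ with squared-error loss and a learning rate of `0.1`; none of these come from the project, and the function names are hypothetical.

```python
def minibatch_gradient(theta, xs, ys):
    """g = (1/m) * d/dtheta sum_i L(f(x_i; theta), y_i)
    for the toy model f(x; theta) = theta * x and L = squared error."""
    m = len(xs)  # mini-batch size
    return sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / m


def sgd_step(theta, xs, ys, lr=0.1):
    """Plain SGD: move against the gradient, theta <- theta - lr * g."""
    return theta - lr * minibatch_gradient(theta, xs, ys)


xs, ys = [1.0, 2.0], [2.0, 4.0]  # toy data generated by y = 2x
theta = 0.0
for _ in range(50):
    theta = sgd_step(theta, xs, ys)
print(round(theta, 4))  # 2.0
```

The gradient at the starting point is negative, so subtracting it moves $\theta$ upward toward the value that fits the data.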
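The Adam optimizer replaces the plain SGD step with running averages of the gradient and of its square. From here on, $m$ denotes the first-moment estimate (no longer the batch size) and $s$ the second-moment estimate: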
```math
m = \beta_{1}m + (1-\beta_{1})g
```
```math
s = \beta_{2}s + (1-\beta_{2})g \odot g
```
```math
\theta = \theta - \epsilon_{k}\cdot\frac{m}{\sqrt{s+eps}}
```
For this project, $\beta_{1} = 0.9$, $\beta_{2} = 0.999$, and the learning rate $\epsilon_{k} =$ 1e-3. $eps$ is a small constant that keeps the denominator away from zero.
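
The three Adam equations above can be sketched in plain Python as a single update step. This follows the simplified form written here (without the bias-correction terms of the full Adam algorithm) and squares the gradient element-wise. The function name `adam_step` is hypothetical, `lr` plays the role of $\epsilon_{k}$, and the value `eps=1e-8` is an assumption, since the README does not give one.

```python
import math


def adam_step(theta, g, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Adam update on lists of floats.

    theta: parameters, g: gradient, m: first-moment estimate,
    s: second-moment estimate. eps=1e-8 is an assumed value.
    """
    # m = beta1 * m + (1 - beta1) * g
    new_m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    # s = beta2 * s + (1 - beta2) * g * g   (element-wise square)
    new_s = [beta2 * si + (1 - beta2) * gi * gi for si, gi in zip(s, g)]
    # theta = theta - lr * m / sqrt(s + eps)
    new_theta = [
        t - lr * mi / math.sqrt(si + eps)
        for t, mi, si in zip(theta, new_m, new_s)
    ]
    return new_theta, new_m, new_s


theta, m, s = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
g = [2.0, -0.5]  # an illustrative gradient, not real training data
theta, m, s = adam_step(theta, g, m, s)
print(theta, m, s)
```

Because each parameter is divided by the root of its own running squared gradient, the step size adapts per parameter rather than being shared as in plain SGD.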
### Nesterov Momentum $\nabla$
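Nesterov's Accelerated Gradient (NAG) performs significantly worse than Adam on this task. Below, $\nu$ is the velocity, $\alpha$ the momentum coefficient, and $\epsilon$ the learning rate; the gradient is evaluated at the look-ahead point $\theta+\alpha\cdot\nu$.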
```math
\nu = \alpha\nu - \epsilon\nabla_{\theta}\left(\frac{1}{m}\sum_{i}L(f(x^{(i)};\theta+\alpha\cdot\nu),y^{(i)})\right)
```
```math
\theta = \theta + \nu
```
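
The Nesterov update above can be sketched the same way. The example minimizes the toy function $(\theta - 3)^2$, and the values `lr=0.01` and `alpha=0.9` are assumptions chosen for illustration, not settings taken from the project. The names `nesterov_step` and `grad_fn` are hypothetical.

```python
def nesterov_step(theta, velocity, grad_fn, lr=0.01, alpha=0.9):
    """One Nesterov momentum update for a scalar parameter.

    The gradient is evaluated at the look-ahead point theta + alpha * velocity,
    which is what distinguishes it from classical momentum.
    """
    lookahead = theta + alpha * velocity
    g = grad_fn(lookahead)
    new_velocity = alpha * velocity - lr * g   # nu = alpha * nu - lr * grad
    new_theta = theta + new_velocity           # theta = theta + nu
    return new_theta, new_velocity


def grad_fn(theta):
    """Gradient of the toy loss (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


theta, velocity = 0.0, 0.0
for _ in range(500):
    theta, velocity = nesterov_step(theta, velocity, grad_fn)
print(round(theta, 4))  # 3.0
```
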
The loss (cost) after ```Epoch 10``` ended at roughly ```0.01253``` on average.
## Contributors
I am the sole contributor of this project.
## License
This project is licensed under the MIT License; see [LICENSE.md](LICENSE.md) for details.