forward forward algorithm

https://github.com/jloveric/forward-forward
- Host: GitHub
- URL: https://github.com/jloveric/forward-forward
- Owner: jloveric
- License: MIT
- Created: 2022-12-30T00:38:40.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-30T23:18:16.000Z (over 2 years ago)
- Last Synced: 2025-02-12T19:47:49.651Z (4 months ago)
- Size: 231 KB
- Stars: 8
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# About Repo
This repo was originally copied from https://github.com/mohammadpz/pytorch_forward_forward. I've extended it and added Poetry, Hydra, TensorBoard, etc. Work in progress.
## Installation
```
poetry install
```
## Running the example
```
python examples/mnist.py
```
# Notes from the original author (pytorch_forward_forward)
Implementation of the forward-forward (FF) training algorithm, an alternative to back-propagation.
---
Below is my understanding of the FF algorithm presented at [Geoffrey Hinton's talk at NeurIPS 2022](https://www.cs.toronto.edu/~hinton/FFA13.pdf).
Conventional backprop computes gradients by successively applying the chain rule, from the objective function back to the parameters. FF instead computes gradients locally, with a separate objective function per layer, so there is no need to backpropagate errors.
The local objective is designed to push a layer's output to values larger than a threshold for positive samples and to values smaller than that threshold for negative samples.
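As a concrete illustration, here is a minimal PyTorch sketch of such a per-layer objective, using the "goodness" measure from Hinton's paper (the sum of squared activations) with a softplus loss. The function name and the threshold value of 2.0 are illustrative assumptions, not taken from this repo:
```python
import torch
import torch.nn.functional as F

def ff_layer_loss(h_pos: torch.Tensor, h_neg: torch.Tensor,
                  threshold: float = 2.0) -> torch.Tensor:
    # "Goodness" of a layer's output: the sum of squared activations
    # per sample, as in Hinton's paper.
    g_pos = h_pos.pow(2).sum(dim=1)
    g_neg = h_neg.pow(2).sum(dim=1)
    # Softplus smoothly penalizes positive goodness below the threshold
    # and negative goodness above it.
    return (F.softplus(threshold - g_pos) + F.softplus(g_neg - threshold)).mean()
```
Because each layer is trained with its own copy of this loss on (detached) inputs from the layer below, no gradient ever has to flow between layers.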
A positive sample $s$ is a real datapoint with a large $P(s)$ under the training distribution.\
A negative sample $s'$ is a fake datapoint with a small $P(s')$ under the training distribution.
Among the many ways of generating the positive/negative samples, for MNIST, we have:\
Positive sample $s = \mathrm{merge}(x, y)$, the image and its label\
Negative sample $s' = \mathrm{merge}(x, y_{\text{random}})$, the image and a random label
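One concrete way to implement $\mathrm{merge}$ for MNIST, used by the upstream pytorch_forward_forward code this repo started from, is to overwrite the first ten pixels of the flattened image with a one-hot encoding of the label. A minimal sketch (the shapes and function name are assumptions for illustration):
```python
import torch

def merge(x: torch.Tensor, y: torch.Tensor, num_classes: int = 10) -> torch.Tensor:
    # x: (batch, 784) flattened MNIST images; y: (batch,) integer labels.
    x = x.clone()
    x[:, :num_classes] = 0.0                  # reserve the first 10 pixels
    x[torch.arange(x.shape[0]), y] = x.max()  # write the label as a one-hot code
    return x
```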
After training all the layers, to make a prediction for a test image $x$, we evaluate the pair $s = (x, y)$ for every label $0 \leq y < 10$ and predict the $y$ that maximizes the network's overall activation.
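A minimal sketch of that prediction loop, reusing merge from the sketch above and summing the goodness over layers. It assumes each trained layer is a callable returning its activations; the names are illustrative:
```python
import torch

def predict(layers, x: torch.Tensor, num_classes: int = 10) -> torch.Tensor:
    # layers: the trained FF layers, each a callable returning its activations.
    goodness_per_label = []
    for label in range(num_classes):
        y = torch.full((x.shape[0],), label, dtype=torch.long)
        h = merge(x, y)  # merge() as in the sketch above
        total = torch.zeros(x.shape[0])
        for layer in layers:
            h = layer(h)
            total = total + h.pow(2).sum(dim=1)  # accumulate goodness per layer
        goodness_per_label.append(total)
    # Predict the label whose merged input produced the highest total goodness.
    return torch.stack(goodness_per_label, dim=1).argmax(dim=1)
```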
With this implementation, the training and test errors on MNIST are:
```
> python main.py
train error: 0.06754004955291748
test error: 0.06840002536773682
```