Experiments with Direct Feedback Alignment training scheme for DNNs
https://github.com/iacolippo/direct-feedback-alignment


# Direct-Feedback-Alignment

## Understanding the general framework

In [dfa-linear-net.ipynb](https://github.com/iacolippo/Direct-Feedback-Alignment/blob/master/dfa-linear-net.ipynb), I show how a neural network without activation functions can learn a linear function (multiplication by a matrix) using direct feedback alignment (DFA), as in [Nøkland, 2016](https://arxiv.org/pdf/1609.01596.pdf). The notebook also covers some of the theory behind it.
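The update rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than the notebook's code: the layer sizes, learning rate, zero initialization, and the names `T`, `W1`, `W2`, and `B1` are all assumptions. The only difference from backpropagation is that the hidden layer receives the output error through a fixed random matrix `B1` instead of `W2.T`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_in, n_hidden, n_out, batch = 10, 20, 5, 64
T = rng.standard_normal((n_out, n_in))  # target linear map the network should learn

# Forward weights are trained; the feedback matrix B1 is random and never updated.
W1 = np.zeros((n_hidden, n_in))
W2 = np.zeros((n_out, n_hidden))
B1 = rng.standard_normal((n_hidden, n_out)) / np.sqrt(n_hidden)

lr = 0.01
losses = []
for step in range(2000):
    x = rng.standard_normal((n_in, batch))
    y = T @ x

    h = W1 @ x       # hidden layer, no activation function
    y_hat = W2 @ h   # linear output
    e = y_hat - y    # output error (gradient of 0.5 * ||e||^2 w.r.t. y_hat)
    losses.append(0.5 * np.mean(np.sum(e**2, axis=0)))

    # DFA: project the output error straight to the hidden layer through B1.
    # Backpropagation would use W2.T @ e here instead.
    dh = B1 @ e

    W2 -= lr * (e @ h.T) / batch
    W1 -= lr * (dh @ x.T) / batch
```

Comparing `losses[0]` with `losses[-1]` shows whether the product `W2 @ W1` has moved toward `T`, even though `W1` never saw the true gradient.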

In [dfa-mnist.ipynb](https://github.com/iacolippo/Direct-Feedback-Alignment/blob/master/dfa-mnist.ipynb), I show how a neural network trained with DFA achieves results very similar to one trained with backpropagation. The architecture is very simple: one hidden layer of 800 Tanh units, a sigmoid output layer, and binary cross-entropy loss.
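A rough sketch of one training step for this kind of architecture is given below. It is not the notebook's code: it uses a small synthetic dataset in place of MNIST, 32 hidden units instead of 800, and hypothetical names (`params`, `B`, `train_step`). Passing `B` as the feedback gives DFA; passing `params["W2"].T` would give ordinary backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for MNIST (an assumption to keep the sketch self-contained):
# three Gaussian clusters in 20 dimensions with one-hot targets.
n_in, n_hidden, n_out, n_per_class = 20, 32, 3, 100
centers = rng.standard_normal((n_out, n_in))
labels = np.repeat(np.arange(n_out), n_per_class)
X = centers[labels] + 0.3 * rng.standard_normal((labels.size, n_in))
Y = np.eye(n_out)[labels]

params = {
    "W1": 0.1 * rng.standard_normal((n_in, n_hidden)), "b1": np.zeros(n_hidden),
    "W2": 0.1 * rng.standard_normal((n_hidden, n_out)), "b2": np.zeros(n_out),
}
B = rng.uniform(-0.5, 0.5, size=(n_out, n_hidden))  # fixed random feedback matrix

def forward(X, p):
    h = np.tanh(X @ p["W1"] + p["b1"])                       # Tanh hidden layer
    y_hat = 1.0 / (1.0 + np.exp(-(h @ p["W2"] + p["b2"])))   # sigmoid output
    return h, y_hat

def bce(Y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(Y * np.log(y_hat) + (1.0 - Y) * np.log(1.0 - y_hat))

def train_step(X, Y, p, feedback, lr=0.1):
    h, y_hat = forward(X, p)
    # With sigmoid + binary cross-entropy, the error at the pre-sigmoid
    # outputs is simply y_hat - Y (averaged over the batch here).
    e = (y_hat - Y) / len(X)
    # Hidden-layer signal: error sent through `feedback`, times tanh'(.).
    dh = (e @ feedback) * (1.0 - h**2)
    p["W2"] -= lr * (h.T @ e)
    p["b2"] -= lr * e.sum(axis=0)
    p["W1"] -= lr * (X.T @ dh)
    p["b1"] -= lr * dh.sum(axis=0)
    return bce(Y, y_hat)

losses = [train_step(X, Y, params, B) for _ in range(500)]
_, y_hat = forward(X, params)
accuracy = np.mean(np.argmax(y_hat, axis=1) == labels)
```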

To see the results of the same architecture using the Torch code provided by Nøkland, go to the last lines of [mlp-torch-results.txt](https://github.com/iacolippo/Direct-Feedback-Alignment/blob/master/mlp-torch-results.txt).

## Stacking neural networks

Do networks with different feedback matrices learn different features, at least in the first few training steps? Apparently yes. **Stacking** works by training many weak learners to recognize different features and using their outputs as inputs to a new model, which learns how to combine the weak learners and gives a performance boost.
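The idea can be sketched on toy data as follows. This is an illustration under several assumptions, not the code in this repository: the data is synthetic, each weak learner is a tiny Tanh network trained for a few DFA steps from a shared initialization with its own feedback matrix, the stacked features are taken to be the pre-sigmoid outputs, and the combiner is a linear map fitted by least squares as a stand-in for a trained dense layer. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes, n_learners = 20, 16, 3, 10
centers = rng.standard_normal((n_classes, n_in))

def sample(n_per_class):
    """Draw a toy dataset: one Gaussian cluster per class."""
    labels = np.repeat(np.arange(n_classes), n_per_class)
    X = centers[labels] + 0.3 * rng.standard_normal((labels.size, n_in))
    return X, labels

X_train, y_train = sample(100)
X_test, y_test = sample(50)
Y_train = np.eye(n_classes)[y_train]

def train_weak_learner(X, Y, W1, W2, B, steps=10, lr=0.2):
    """Run a few DFA steps from the given initialization; return the new weights."""
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        h = np.tanh(X @ W1)
        e = (1.0 / (1.0 + np.exp(-(h @ W2))) - Y) / len(X)
        dh = (e @ B) * (1.0 - h**2)  # error routed through the fixed matrix B
        W2 -= lr * (h.T @ e)
        W1 -= lr * (X.T @ dh)
    return W1, W2

# Shared initialization, but a different feedback matrix for every learner.
W1_init = 0.1 * rng.standard_normal((n_in, n_hidden))
W2_init = 0.1 * rng.standard_normal((n_hidden, n_classes))
learners = []
for _ in range(n_learners):
    B = rng.uniform(-0.5, 0.5, size=(n_classes, n_hidden))
    learners.append(train_weak_learner(X_train, Y_train, W1_init, W2_init, B))

def stack_features(X):
    """Concatenate the pre-sigmoid outputs of all weak learners, plus a bias column."""
    feats = [np.tanh(X @ W1) @ W2 for W1, W2 in learners]
    return np.hstack(feats + [np.ones((len(X), 1))])

F_train, F_test = stack_features(X_train), stack_features(X_test)

# Combiner: least-squares fit from stacked features to one-hot targets.
V, *_ = np.linalg.lstsq(F_train, Y_train, rcond=None)
test_acc = np.mean(np.argmax(F_test @ V, axis=1) == y_test)
```

Here `F_train` has one block of `n_classes` columns per weak learner, so the combiner sees every learner's opinion about every class at once.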

The [Stacking-dfa-nets](https://github.com/iacolippo/Direct-Feedback-Alignment/tree/master/Stacking-dfa-nets) folder contains the following files, which must be executed in this order:

1. *create_dataset.py*: preprocesses the MNIST data loaded from Keras and saves it to a NumPy file, *mnist.npz*, ready to be used.
2. *weak-learners.py* or *diff-weak-learners.py*: trains as many weak learners as you want (NNs with one hidden layer of 800 Tanh units). The first script trains all of them starting from the same initialization, while the second initializes each one in a different state. They generate, respectively, *train_linouts.npz* & *test_linouts.npz*, and *diff-train_linouts.npz* & *diff-test_linouts.npz*.
3. *stacked-model.py* or *RD-stacked-model.py*: trains, respectively, a dense or an RD layer on top of the features extracted by each weak learner. The program takes as input the names of the files generated by the previous step and the number of weak learners trained in it.
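The scripts hand data to each other through `.npz` files. The sketch below shows how such a hand-off could look with `np.savez` and `np.load`. The array keys (`linouts`, `labels`), the array layout, and the random placeholder data are assumptions made for illustration; the actual files may be organized differently.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, n_learners = 8, 10, 3

# Placeholder for the per-learner outputs that a training script would produce.
outputs = [rng.standard_normal((n_samples, n_classes)) for _ in range(n_learners)]
labels = rng.integers(0, n_classes, size=n_samples)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_linouts.npz")

    # Producer side: store all arrays in a single .npz archive.
    np.savez(path, linouts=np.stack(outputs), labels=labels)

    # Consumer side: load the arrays back by key.
    with np.load(path) as data:
        linouts = data["linouts"]        # shape: (n_learners, n_samples, n_classes)
        loaded_labels = data["labels"]

# One row per sample, with the outputs of all learners side by side.
features = linouts.transpose(1, 0, 2).reshape(n_samples, -1)
```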

Example call to train 50 weak learners:
```bash
python weak-learners.py 50
```

Example call to train a stacked model on top of 50 weak learners:
```bash
python stacked-model.py 50 train_linouts.npz test_linouts.npz
```

## RD Layers

RD layers have a number of parameters that grows linearly with the layer size and are loosely inspired by [ACDC](https://arxiv.org/abs/1511.05946). They apply the following operations:

![Equation](http://latex.codecogs.com/gif.latex?a_1%20%3D%20D_1%20x%5Cmbox%7B%2C%20%7D%20a_2%20%3D%20R%20a_1%5Cmbox%7B%2C%20%7D%20a_3%20%3D%20D_2%20a_2)

where D1 and D2 are diagonal matrices and R is a random matrix.
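A minimal NumPy sketch of this forward pass follows. It assumes a square layer and that R is drawn once and kept fixed, so that only the two diagonals would be trained; the function name `rd_forward` and the scaling of `R` are hypothetical choices, not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # layer width (square layer assumed)

d1 = rng.standard_normal(n)                    # diagonal of D_1
d2 = rng.standard_normal(n)                    # diagonal of D_2
R = rng.standard_normal((n, n)) / np.sqrt(n)   # random matrix, assumed fixed

def rd_forward(x, d1, R, d2):
    """Compute a_3 = D_2 R D_1 x without building the diagonal matrices."""
    a1 = d1 * x   # a_1 = D_1 x (elementwise product with the diagonal)
    a2 = R @ a1   # a_2 = R a_1
    a3 = d2 * a2  # a_3 = D_2 a_2
    return a3

x = rng.standard_normal(n)
out = rd_forward(x, d1, R, d2)

# Same result with explicit diagonal matrices, for comparison.
dense = np.diag(d2) @ R @ np.diag(d1) @ x

n_trainable = d1.size + d2.size  # 2n, versus n * n for a dense layer
```

Storing the diagonals as vectors keeps the trainable parameter count at `2 * n`, which is where the linear scaling comes from under the fixed-R assumption.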

## Requirements

- numpy
- matplotlib
- scipy
- keras
- scikit-learn