Edge Trainable Hardware repo.
https://github.com/manuelblancovalentin/ethw
Host: GitHub
URL: https://github.com/manuelblancovalentin/ethw
Owner: manuelblancovalentin
Created: 2023-01-31T19:41:59.000Z (almost 2 years ago)
Default Branch: master
Last Pushed: 2023-03-16T05:30:57.000Z (almost 2 years ago)
Last Synced: 2024-11-08T10:34:32.317Z (3 months ago)
Language: C++
Size: 67.7 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Edge Trainable Hardware (ETHW)

Author: Manuel Blanco Valentin

Supervisor: Seda Memik

## Structure of this repo

* [README.md](README.md): This file. Contains information regarding this repo, the research and the files found in this directory.

* **datasets**: Folder containing dummy datasets created just to 

* **docs**: Folder containing documentation and literature regarding the research. 

* **tutorials**: Folder containing tutorials and external repos regarding hls4ml, ML, conversion of models, etc.

    * hls4ml-tutorial: Cloned from the [hls4ml-tutorial repo](https://github.com/fastmachinelearning/hls4ml-tutorial). Contains some tutorials on how to use hls4ml to create an ML model and convert it into synthesizable code.

    * custom: These are custom tutorials created in the process and research of turning hls4ml models into fully on-edge trainable models.

## 1. Tutorials 

Before diving into the creation of an ML model for an actual application, let's start with the tutorials. 

## 1. Getting used to hls4ml 

### 1.1. hls4ml-tutorial

For this, the first thing to do is to create a conda environment just for hls4ml. Make sure you have conda with python3 installed and create a new environment using the *environment.yml* file in *tutorials/hls4ml-tutorial/environment.yml* (run the commands below from that directory).

```shell

conda env create -f environment.yml

conda activate hls4ml-tutorial

```

### 1.2. Compile the neural network using hls4ml and qkeras

Follow `tutorials/custom/part1_custom_dummy_forward_network.ipynb`.

### 1.3. Run vivado hls directly from the cpp generated by hls4ml

```bash 

cd /home/manuelbv/ETHW/tutorials/custom/model_dummy_forward/hls4ml_prj

vivado_hls build_prj.tcl

```

This should generate a folder named `myproject_prj`. Open that project in the Vivado HLS GUI:

```bash

vivado_hls -p "myproject_prj"

```

GIUSEPPE'S COMMENT:

> We might want to start using floating point instead of fixed or fixed with a lot of bits <128,64>, to make sure that we actually "force" the RTL synthesis to generate MACs, cause for very simple models with very simple arithmetic it might happen that vivado just creates other logic instead of MACs.

Take a look at the documentation/paper of QKeras to see how they implemented backprop in there. Did they use floating point for backprop or fixed point?

Okay, so let's implement Giuseppe's suggestion for now. Open `myproject.cpp`, follow its `#include` of `myproject.h`, and from there the `#include` of `defines.h`. In `defines.h`, change the typedefs of all variables to something ridiculous like `<128,64>`, like so:

```cpp

// [@manuelbv]: Changed this to a very large precision for the manual testing/computation of loss

typedef ap_fixed<128,64> model_default_t;

typedef ap_fixed<128,64> input_t;

typedef ap_fixed<128,64> layer2_t;

typedef ap_fixed<128,64> weight2_t;

typedef ap_uint<1> bias2_t;

typedef ap_fixed<128,64,AP_RND,AP_SAT> result_t;

```
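To get a feeling for what `<128,64>` buys compared to the default `<16,6>`, the sketch below computes the representable range and resolution of a signed `ap_fixed<W,I>` (W total bits, I integer bits including the sign bit). The helper name `ap_fixed_range` is made up for this illustration and is not part of hls4ml or Vivado HLS.

```python
from fractions import Fraction


def ap_fixed_range(width, int_bits):
    """Return (min, max, step) of a signed ap_fixed<width, int_bits> as exact fractions."""
    frac_bits = width - int_bits
    step = Fraction(1, 2 ** frac_bits)            # smallest representable increment
    lo = -(Fraction(2) ** (int_bits - 1))         # most negative value
    hi = Fraction(2) ** (int_bits - 1) - step     # most positive value
    return lo, hi, step


for w, i in [(16, 6), (128, 64)]:
    lo, hi, step = ap_fixed_range(w, i)
    print(f"ap_fixed<{w},{i}>: min={float(lo)}, max={float(hi)}, step={float(step)}")
```

With 64 fractional bits the rounding error is far below anything we will compare by hand, which is the point of using such a wide type for manual testing.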

Save the file and now run the simulation. To run the simulation, simply click the button highlighted in the following screenshot. 

![a](docs/imgs/custom_tutorial0_running_csim_tb.png)

The simulation should run, but it should tell you that it wasn't able to find the tb_input_data. That's fine, let's now create two files in 

`/home/manuelbv/ETHW/tutorials/custom/model_dummy_forward/hls4ml_prj/tb_data/`

One will be called `tb_input_features.dat` and will contain simply the number `1.0`:

```text

1.0

```

The second will be called `tb_output_predictions.dat` and will contain simply the number `0.5`:

```text

0.5

```
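If you prefer to generate these two files from a script (handy once there are more samples), something like the following sketch works. The function name `write_tb_data` is hypothetical, and the layout (one sample per line, values separated by spaces) is an assumption based on the single-value files above. The demo writes into a temporary directory; in practice you would pass the `tb_data` path shown earlier.

```python
import tempfile
from pathlib import Path


def write_tb_data(tb_dir, inputs, predictions):
    """Write one sample (one line per file) into the two test bench data files."""
    tb_dir = Path(tb_dir)
    tb_dir.mkdir(parents=True, exist_ok=True)
    in_file = tb_dir / "tb_input_features.dat"
    out_file = tb_dir / "tb_output_predictions.dat"
    in_file.write_text(" ".join(str(v) for v in inputs) + "\n")
    out_file.write_text(" ".join(str(v) for v in predictions) + "\n")
    return in_file, out_file


# Demo in a throwaway directory; replace it with your hls4ml_prj/tb_data path.
demo_dir = tempfile.mkdtemp()
for path in write_tb_data(demo_dir, [1.0], [0.5]):
    print(path.name, "->", path.read_text().strip())
```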

Now re-run the simulation using Vivado's GUI. A message like the following should appear in the log:

```text

INFO: [SIM 211-4] CSIM will launch GCC as the compiler.

   Compiling ../../../../myproject_test.cpp in debug mode

   Compiling ../../../../firmware/myproject.cpp in debug mode

   Generating csim.exe

Processing input 0

Predictions

0.5 

Quantized predictions

0.5 

INFO: Saved inference results to file: tb_data/csim_results.log

INFO: [SIM 211-1] CSim done with 0 errors.

INFO: [SIM 211-3] *************** CSIM finish ***************

Finished C simulation.

```

Beautiful, this means our simulation is working and it's taking the predictions, as expected. Let's now move on and start implementing backprop.

## 2. Implementing backprop in c++

We are going to modify the C++ sources generated automatically by hls4ml, so it's a good idea to get acquainted with whatever hls4ml generates (the translation from qkeras/keras to C++).

### 2.0. Backprop recap

This section is under construction. I'll add any further info about backpropagation and how it works as I need to implement each step of the process. 

These are the steps I hope to divide this section (and the implementation) into:

* Computation of the loss at the final layer

* Computation of the gradient at the final layer

* Propagation of the gradient for previous layers

* Update of the weights & biases
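As a reference for these four steps, here is a minimal sketch for the dummy model used in the tutorials: a single dense neuron `y = w * x + b` trained on one sample with a squared-error loss. It uses plain Python floats rather than fixed point, and the names (`train_step`, `lr`) are made up for this illustration. It is a sanity-check model, not the hardware implementation.

```python
def train_step(w, b, x, y_true, lr=0.1):
    """One forward/backward/update pass for y = w * x + b with squared-error loss."""
    # Forward pass
    y_pred = w * x + b
    # Step 1: loss at the final layer
    loss = (y_pred - y_true) ** 2
    # Step 2: gradient of the loss with respect to the network output
    dloss_dy = 2.0 * (y_pred - y_true)
    # Step 3: propagate the gradient to the parameters and to the layer input
    dloss_dw = dloss_dy * x
    dloss_db = dloss_dy
    dloss_dx = dloss_dy * w  # what a previous layer would receive
    # Step 4: update weights and biases (plain gradient descent)
    w_new = w - lr * dloss_dw
    b_new = b - lr * dloss_db
    return w_new, b_new, loss, dloss_dx


w, b = 0.5, 0.0
for step in range(3):
    w, b, loss, _ = train_step(w, b, x=1.0, y_true=1.0)
    print(f"step {step}: loss={loss:.4f} w={w:.4f} b={b:.4f}")
```

The loss should shrink from one step to the next; values like these can later be compared against what the C++ implementation produces.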

### 2.1. Computation of losses at the final layer

To see how to integrate these losses to the cpp code go to `2.1.x. Integration`

#### 2.1.1. MSE/MAE

Let's create the cpp code that computes the MSE and MAE losses:

```cpp

```
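The C++ block above is still empty. Until it is filled in, a small Python reference of both losses can serve as a golden model for the values the hardware should produce. This is only a sketch in floating point; the function names `mse` and `mae` are chosen here for illustration and say nothing about how the C++ version will be structured.

```python
def mse(predictions, truths):
    """Mean squared error over two equally long, non-empty sequences."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(predictions)


def mae(predictions, truths):
    """Mean absolute error over two equally long, non-empty sequences."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predictions, truths)) / len(predictions)


# Prediction 0.5 (the test bench value used earlier) against a made-up ground truth of 1.0
print("MSE:", mse([0.5], [1.0]))
print("MAE:", mae([0.5], [1.0]))
```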

### 2.1.x. Integration 

Let's now integrate the computation of the losses with the cpp code we got from hls4ml:

```cpp

void myproject(

    input_t fc1_input[N_INPUT_1_1],

    result_t layer3_out[N_LAYER_2],

    unsigned short &const_size_in_1,

    unsigned short &const_size_out_1

) {

    ...

}

```

### 3. Integration to hls4ml

Here I present the changes I needed to apply to hls4ml to adapt it to generate models with training capabilities.

#### 3.1. Setting up the env

The first thing I tried was cloning the hls4ml main branch and modifying that. However, this caused an error down the line, because apparently we need to use `hls4ml[profiling]` instead of hls4ml. I couldn't find a git repo for `hls4ml[profiling]` (in pip syntax, `[profiling]` is an "extra": the same hls4ml package plus optional profiling dependencies, so there is no separate repo), so what I did is the following:

- First, install all the dependencies by using the environment.yml in the hls4ml-tutorial dir/repo. 

- Then, copy the hls4ml folder from `~/anaconda3/envs/hls4ml-tutorial/lib/python3.8/site-packages/hls4ml` to `~/ETHW/`. 

- After that, simply uninstall `hls4ml` using the `--force` flag (this will uninstall only hls4ml, without the dependencies): `conda uninstall --force hls4ml[profiling]`

- Now, in your code, add the following lines to import the local hls4ml code: 

```python

    import sys

    sys.path.append('/home/manuelbv/ETHW')

    import hls4ml

```

Note: After doing this, you might get dependency errors (for example, complaints about numpy). If so, reinstalling the offending package, e.g. `conda install numpy`, should fix it.
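One thing worth checking: `sys.path.append` puts the directory at the end of the search path, so if an installed copy of hls4ml is still around, Python will import that one instead of the local copy. The sketch below reports which file a module would be loaded from; the helper name `module_origin` is made up for this example.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Should point inside your local copy after the steps above (None if hls4ml is not found).
print(module_origin("hls4ml"))
```

If the printed path still points into `site-packages`, the uninstall step did not take effect, or the local directory needs to come earlier in `sys.path` (for instance with `sys.path.insert(0, ...)`).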

Now let's go into the repo and create a new branch for this project:

```shell

cd ~/ETHW/hls4ml && git checkout -b ethw

```

#### 3.2. hls4ml reverse engineering

As shown previously, the way we organized our project is by creating custom extra C++ headers implementing the backprop layer by layer, which are then used as templates "pulled" by hls4ml and instantiated in the final neural network C++ code. This means we basically need to add those templates somewhere in the hls4ml source dir structure, and ask it to pull them when generating the neural network. 

That is, of course, if the user actually wants the final neural network to have training capabilities. We want to retain the original behavior of hls4ml, so the first thing we need to do is to allow the user to decide whether the final network should be trainable or not. This yells "flag". 

If we open the jupyter notebook in `tutorials/custom/part1_custom_dummy_forward_network.ipynb` and look at the instruction that starts the building process (translation from keras to c++), we can find something like the following instruction at some point:

```python

hls_model = hls4ml.converters.convert_from_keras_model(model,

                                                       hls_config=config,

                                                       output_dir=f'{model_name}/hls4ml_prj',

                                                       part='xcu250-figd2104-2L-e')

```

Alright, here comes the first decision we need to make. It's clear that we need to integrate the "trainable" flag in this instruction, but how should we pass it? One option would be to pass it as a global flag directly to "convert_from_keras_model", something like:

```python

hls_model = hls4ml.converters.convert_from_keras_model(model,

                                                       trainable=True,

                                                       hls_config=config,

                                                       output_dir=f'{model_name}/hls4ml_prj',

                                                       part='xcu250-figd2104-2L-e')

```

However, this wouldn't give the user much control over the trainability of the network. This would be a global setting that would make ALL the layers in the structure trainable. We know that adding the backprop structure results in a big area overhead, so what if the user only wants to train a specific part of the network and freeze the rest of it? Wouldn't it be better if the user could specify the trainability of individual layers? 

If we inspect the instruction above we see we are also passing a variable `config`. Pretty self-explanatory variable name, but let's take a look at it and see what it contains. We see this config is generated by a previous instruction:

```python

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

```

Now if we print this we get something like:

```

-----------------------------------

Model

  Precision:         ap_fixed<16,6>

  ReuseFactor:       1

  Strategy:          Latency

LayerName

  fc1_input

    Precision

      result:        ap_fixed<16,6>

  fc1

    Precision

      weight:        ap_fixed<6,1>

      bias:          ap_fixed<6,1>

    ReuseFactor:     1

-----------------------------------

```
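The printout maps one-to-one onto a nested Python dictionary. The sketch below rebuilds that dictionary by hand (in the notebook it comes from `hls4ml.utils.config_from_keras_model`) and walks it, which makes it easier to see the key path needed to reach a given setting. The `flatten` helper is written for this example only.

```python
# Hand-written copy of the printed config; not generated by hls4ml here.
config = {
    "Model": {"Precision": "ap_fixed<16,6>", "ReuseFactor": 1, "Strategy": "Latency"},
    "LayerName": {
        "fc1_input": {"Precision": {"result": "ap_fixed<16,6>"}},
        "fc1": {
            "Precision": {"weight": "ap_fixed<6,1>", "bias": "ap_fixed<6,1>"},
            "ReuseFactor": 1,
        },
    },
}


def flatten(d, prefix=()):
    """Yield (key_path, value) pairs for every leaf of a nested dict."""
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


for path, value in flatten(config):
    print(" / ".join(path), "=", value)
```

Every setting is reachable as `config[...][...]...` along its key path, e.g. `config['LayerName']['fc1']['Precision']['weight']`.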

This is cool. The config contains info about specific properties of each layer. For instance, you can see the fc1 layer's precision for weights and biases, as well as a parameter called ReuseFactor. We could add our trainability there: an extra per-layer configuration entry for the trainability of our network. 

What's more, if we investigate the tutorial code a bit further, we see a line similar to:

```python

#config['LayerName']['softmax']['exp_table_t'] = 'ap_fixed<18,8>'

```

This is telling us we have the ability to access and set specific configuration parameters of our model. In this (commented-out) line, the `exp_table_t` type of the layer named `softmax`, found under the `LayerName` section of the config, is set to `ap_fixed<18,8>`. We could use the same mechanism to set the trainability of specific layers of our model.

Furthermore, we could add a global flag that we pass to `hls4ml.utils.config_from_keras_model(...)` which enables global trainability, something like `hls4ml.utils.config_from_keras_model(..., trainable = True, ...)`, which would make all the layers trainable. In case the user wants to control the trainability of specific layers, they can simply leave that flag as `False` and set `config['LayerName']['...']['Trainable'] = True` layer by layer. (Use the boolean `True`, not the string `'True'`: any non-empty string, including `'False'`, is truthy in Python.)

So let's do that.

#### 3.2.1. Hacking the hls4ml config structure to add global/local trainability of layers

Let's find the definition of function `hls4ml.utils.config_from_keras_model`. Open `~/ETHW/hls4ml/hls4ml/utils/config.py` and take a look at it. Search for `config_from_keras_model`: that's the definition we were looking for.

If we go a bit down in that code we'll find the following definition for the `make_layer_config` function. This method sets specific configs per layer. 

```python

    ...

    def make_layer_config(layer):

        layer_config = {}

        if layer['class_name'] in dense_layers + conv_layers:

            layer_config['Precision'] = {}

            layer_config['Precision']['weight'] = default_precision

            layer_config['Precision']['bias'] = default_precision

            layer_config['Precision']['result'] = default_precision

            layer_config['ReuseFactor'] = default_reuse_factor

        elif layer['class_name'] in activation_layers:

            layer_config['Precision'] = default_precision

            layer_config['ReuseFactor'] = default_reuse_factor

            layer_config['table_size'] = 1024

    ...

```

This is where we can integrate our global flag `trainable`. So the first thing we want to do is to go to the definition of this function `config_from_keras_model` and change it to:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.1` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

# [@manuelbv]: CHANGELOG_a.1 I added the flag "trainable = False" to allow the user to implement trainable layers

def config_from_keras_model(

    model, granularity='model', backend=None, default_precision='fixed<16,6>', default_reuse_factor=1,

    trainable=False

):

...

```

Now let's implement the point where we use this variable to specify whether layers are trainable or not. So let's go back to the `make_layer_config` definition, add the trainability def for dense layers, conv layers and activation layers (why activation layers? Because even if there's no trainable weight, we need to create the structure that propagates the gradients thru them):

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.2` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

        if layer['class_name'] in dense_layers + conv_layers:

            layer_config['Precision'] = {}

            layer_config['Precision']['weight'] = default_precision

            layer_config['Precision']['bias'] = default_precision

            layer_config['Precision']['result'] = default_precision

            layer_config['ReuseFactor'] = default_reuse_factor

            # [@manuelbv]: CHANGELOG_a.2 Setting the trainability of specific layers

            layer_config['Trainable'] = trainable

```

Now let's do the same for activation layers:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.3` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

        elif layer['class_name'] in activation_layers:

            layer_config['Precision'] = default_precision

            layer_config['ReuseFactor'] = default_reuse_factor

            layer_config['table_size'] = 1024

            is_softmax = layer['class_name'] == 'Softmax'

            if 'config' in layer.keys():

                if 'activation' in layer['config'].keys():

                    is_softmax = is_softmax or (layer['config']['activation'] == 'softmax')

            if is_softmax:

                layer_config['exp_table_t'] = 'ap_fixed<18,8,AP_RND,AP_SAT>'

                layer_config['inv_table_t'] = 'ap_fixed<18,8,AP_RND,AP_SAT>'

            else:

                layer_config['table_t'] = 'ap_fixed<18,8>'

            

            # [@manuelbv]: CHANGELOG_a.3 Setting the trainability of specific layers

            layer_config['Trainable'] = trainable

```

And for the qkeras_layers:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.4` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

        elif layer['class_name'] in qkeras_layers:

            if 'precision' in layer:

                layer_config['Precision'] = {}

                for name, precision in layer['precision'].items():

                    layer_config['Precision'][name] = precision

            else:

                print('WARNING: Found no precision information in QKeras layer {} ({})'.format(layer['name'], layer['class_name']))

                layer_config['Precision'] = default_precision

            layer_config['ReuseFactor'] = default_reuse_factor

            # [@manuelbv]: CHANGELOG_a.4 Setting the trainability of specific layers

            layer_config['Trainable'] = trainable

```

Finally, let's also add a global trainability flag. I'm anticipating that this will help us further down the line, to manage whether we need to initialize the global structure for trainable networks or not. So go to the part of the code where the config is initialized, something like `config = {}`, and change it to the following (note that only the last two lines are new; `model_config['Trainable'] = trainable` is the important part, and the rest is shown for reference):

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.5` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    model_config = {}

    model_config['Precision'] = default_precision

    model_config['ReuseFactor'] = default_reuse_factor

    model_config['Strategy'] = 'Latency'

    #model_config['Compression'] = False

    #model_config['Trace'] = False

    # [@manuelbv]: CHANGELOG_a.5 Setting the trainability of global model

    model_config['Trainable'] = trainable

```

Let's test this out. Restart the jupyter notebook kernel where you were running `part2_custom_dummy_2weights_forward_network.ipynb` and re-run everything until you reach the config part. 

Now if we print the config, we should see the trainable flags in there. Beautiful!

```

-----------------------------------

Model

  Precision:         ap_fixed<16,6>

  ReuseFactor:       1

  Strategy:          Latency

  Trainable:         True

LayerName

  fc1_input

    Precision

      result:        ap_fixed<16,6>

  fc1

    Precision

      weight:        ap_fixed<6,1>

      bias:          ap_fixed<6,1>

    ReuseFactor:     1

    Trainable:       True

-----------------------------------

```

Note that we could specify the trainability of a specific layer now by running something like:

```python

config['LayerName']['fc1']['Trainable'] = False

```
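For networks with more than a couple of layers, setting the flags one by one gets tedious. Here is a sketch of two helpers working on the config dictionary: one that freezes every layer except a chosen set, and one that reads the flag safely (in the printout above, input layers such as `fc1_input` carry no `Trainable` key at all). The helper names and the `fc2` layer are made up for the example. Keeping the model-level flag at `True` whenever at least one layer is trainable is an assumption about how that flag will be consumed later.

```python
def freeze_all_except(config, trainable_layers):
    """Set 'Trainable' to True only for the listed layers; leave layers without the key untouched."""
    for name, layer_cfg in config["LayerName"].items():
        if "Trainable" in layer_cfg:
            layer_cfg["Trainable"] = name in trainable_layers
    # Assumption: the global flag should stay on if any layer still needs backprop.
    config["Model"]["Trainable"] = any(
        cfg.get("Trainable", False) for cfg in config["LayerName"].values()
    )
    return config


def is_trainable(config, layer_name):
    """Return the layer's flag, defaulting to False when the layer or key is missing."""
    return bool(config["LayerName"].get(layer_name, {}).get("Trainable", False))


# Hand-written config in the same shape as the printout above (fc2 is hypothetical).
config = {
    "Model": {"Precision": "ap_fixed<16,6>", "ReuseFactor": 1, "Trainable": True},
    "LayerName": {
        "fc1_input": {"Precision": {"result": "ap_fixed<16,6>"}},
        "fc1": {"ReuseFactor": 1, "Trainable": True},
        "fc2": {"ReuseFactor": 1, "Trainable": True},
    },
}
freeze_all_except(config, {"fc2"})
for layer in config["LayerName"]:
    print(layer, is_trainable(config, layer))
```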

#### 3.2.2. Hacking the converter method

The function that effectively converts our keras/qkeras model into actual c++ code is the `hls4ml.converters.convert_from_keras_model` method. So let's open the file `~/ETHW/hls4ml/converters/__init__.py`. We can see the definition for `convert_from_keras_model` is there. 

In our tutorial (jupyter notebook) we see that we invoke this instruction and pass the config dictionary we generated in the previous step through the `hls_config` argument, like so:

```python

hls_model = hls4ml.converters.convert_from_keras_model(model,

                                                       hls_config=config,

                                                       output_dir=f'{model_name}/hls4ml_prj',

                                                       part='xcu250-figd2104-2L-e')

```

Now, looking at the definition of this method `convert_from_keras_model` we see that apart from checking some stuff, the bulk of the conversion is actually run at the last line, in the return statement itself, which invokes `keras_to_hls(config)`. This is what we want to check, so let's fetch the definition for that function. Open file `~/ETHW/hls4ml/converters/keras_to_hls.py` and search for `keras_to_hls`. 

The part that actually writes out the files (cpp files) is `hls_model.compile()`. We see that our hls_model object is generated when we invoke `convert_from_keras_model`. Thus, let's actually open `~/ETHW/hls4ml/model/hls_model.py` and look at the compile method for HLSModel. In here you will see a `self.write()` method. Follow that path. Inside the `write` method you will see we are calling a `self.config.writer.write_hls`. We can see that the config object (HLSConfig) is defined in the `__init__` method when we create the HLSModel object (`self.config = HLSConfig(config)`), so let's go to the definition of the `HLSConfig` class. 

In the `HLSConfig` `__init__` method we see we initialize the writer attribute by calling a function `get_writer`. This function is imported from `hls4ml.writer.get_writer` so let's open file `~/ETHW/hls4ml/writer/__init__.py` and take a look at the definitions of the writers. Here we see that we are registering the class `VivadoWriter` as `Vivado`. Cool. In here we are also importing the `get_writer` method from `~/ETHW/hls4ml/writer/writers.py`. Let's open this file. We can see that get_writer points to `VivadoWriter` which is imported from `~/ETHW/hls4ml/writer/vivado_writer.py`. And finally, here, in this file, is where we have everything we will need to tweak in the `VivadoWriter` to make sure we pull the right templates we modified to add the backprop functionalities. Keep this file open, cause we will need it in a second.
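Following a call chain like this by hand is easy to get lost in. As an alternative, Python's standard `inspect` module can report where any function, method or class is defined. The sketch below wraps that in a tiny helper (`where_defined` is a name made up here) and demonstrates it on a standard library function so it runs anywhere.

```python
import inspect
import json


def where_defined(obj):
    """Return (source file, first line number) for a function, method, or class."""
    path = inspect.getsourcefile(obj)
    _, lineno = inspect.getsourcelines(obj)
    return path, lineno


# Demo on the standard library; in the notebook you could pass, for example,
# hls4ml.converters.convert_from_keras_model or type(hls_model.config.writer).
print(where_defined(json.dumps))
```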

Now, let's go back to the `compile` method inside `hls_model.py`. We can see that inside this method most of the work is executed by invoking the `write` method, which in turn invokes the `writer` (remember, the `VivadoWriter` object we just saw) and its `write_hls` method. So let's go back to the `VivadoWriter` class definition and search for this `write_hls` method. You should see something like this:

```python

    ...

    def write_hls(self, model):

        print('Writing HLS project')

        self.write_project_dir(model)

        self.write_project_cpp(model)

        self.write_project_header(model)

        self.write_weights(model)

        self.write_defines(model)

        self.write_parameters(model)

        self.write_test_bench(model)

        self.write_bridge(model)

        self.write_build_script(model)

        self.write_nnet_utils(model)

        self.write_yml(model)

        self.write_tar(model)

        print('Done')

    ...

```

Let's see what these steps are doing, one by one. We might skip those that are irrelevant for us. 

**write_project_dir**:

This method simply creates the folder structure `{}/firmware/weights` in the output directory. We don't need to modify this.

**write_project_cpp**:

This method, on the other hand, is prob gonna take us quite some time to analyze. So let's take it easy and analyze it step by step so we make sure we aren't skipping anything important.

The method starts with something very simple: 

- We open a template file for reading (f)

- We open the output cpp code we will generate for writing (fout).

```python

    def write_project_cpp(self, model):

        ###################

        ## myproject.cpp

        ###################

        filedir = os.path.dirname(os.path.abspath(__file__))

        f = open(os.path.join(filedir,'../templates/vivado/firmware/myproject.cpp'),'r')

        fout = open('{}/firmware/{}.cpp'.format(model.config.get_output_dir(), model.config.get_project_name()),'w')

    ...

```

Let's take a look at this template we are picking and see what's defined in there. Open `~/ETHW/hls4ml/templates/vivado/firmware/myproject.cpp`. First of all, this file looks pretty empty for a template, but something interesting can be seen in it. We can see some interesting comments like:

```cpp

//hls-fpga-machine-learning insert header

```

I'm anticipating that these will be picked up by the writer, and then stuff will be inserted in place. This is a good method to populate template files. Keep this in mind, because in the future we might have to add our own comments to insert parts of our backpropagating structure.
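The marker-comment technique can be sketched in a few lines: walk the template line by line, and whenever a line contains a known marker, emit generated text instead of the line itself. The function `fill_template` and the four-line mini template are made up for this illustration and are not hls4ml code.

```python
def fill_template(template_lines, insertions):
    """Copy template lines, replacing any line that contains a marker with generated text."""
    out = []
    for line in template_lines:
        for marker, generated in insertions.items():
            if marker in line:
                out.append(generated)
                break
        else:
            # No marker matched: keep the template line unchanged
            out.append(line)
    return out


template = [
    '#include "myproject.h"\n',
    "void myproject(\n",
    "    //hls-fpga-machine-learning insert header\n",
    ") {\n",
]
result = fill_template(
    template,
    {"//hls-fpga-machine-learning insert header": "    input_t fc1_input[N_INPUT_1_1]\n"},
)
print("".join(result), end="")
```

Adding a new insertion point then only takes a new marker comment in the template and a matching entry on the Python side.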

Let's go back to the `write_project_cpp` method and keep going. After the previous block, we basically fetch the model inputs, outputs and brams.

```python

    ... 

        model_inputs = model.get_input_variables()

        model_outputs = model.get_output_variables()

        model_brams   = model.get_bram_variables()

        indent = '    '

    ...

```

Now, after this, we basically start looping thru each one of the lines in our template file and parsing them. As we expected, depending on what that line contains (or, more specifically, if that line contains a specific comment), we might write some specific definition for our neural network or just copy the original one. 

First check: if we find `myproject` in the line, we replace that with the project name (which, to be fair, will be `myproject` most of the time, unless the user changed it). We don't need to change anything here.

```python

    ...

        for line in f.readlines():

            #Add headers to weights and biases

            if 'myproject' in line:

                newline = line.replace('myproject', model.config.get_project_name())

    ...     

```

Second check: if we find the comment `//hls-fpga-machine-learning insert header` we will insert the header. 

```python

    ...

            elif '//hls-fpga-machine-learning insert header' in line:

                inputs_str = ', '.join([self.variable_definition_cpp(model, i, as_reference=True) for i in model_inputs])

                outputs_str = ', '.join([self.variable_definition_cpp(model, o, as_reference=True) for o in model_outputs])

                brams_str  = ', \n'.join([indent + self.variable_definition_cpp(model, b, as_reference=False) for b in model_brams])

                insize_str = ', '.join(['unsigned short &const_size_in_{}'.format(i) for i in range(1, len(model_inputs) + 1)])

                outsize_str = ', '.join(['unsigned short &const_size_out_{}'.format(i) for i in range(1, len(model_outputs) + 1)])

                newline = ''

                newline += indent + inputs_str + ',\n'

                newline += indent + outputs_str + ',\n'

                if len(model_brams) > 0: 

                    newline += brams_str + ',\n'

                newline += indent + insize_str + ',\n'

                newline += indent + outsize_str + '\n'

    ...

```

What does the header consist of? Let's take a closer look at the changes we are applying. Here we can see we are basically defining the ports of our cpp module. For instance, for each model_input we call the `self.variable_definition_cpp` method. Let's find the definition of this method and see what it does:

```python

    ...

    def variable_definition_cpp(self, model, var, name_suffix='', as_reference=False):

        var_class = var.__class__.__name__

        if var_class == 'ArrayVariable':

            return '{type} {name}{suffix}[{shape}]'.format(type=var.type.name, name=var.cppname, suffix=name_suffix, shape=var.size_cpp())

        elif var_class == 'StreamVariable':

            if as_reference: # Function parameter

                return 'hls::stream<{type}> &{name}{suffix}'.format(type=var.type.name, name=var.cppname, suffix=name_suffix)

            else: # Declaration

                return 'hls::stream<{type}> {name}{suffix}("{name}")'.format(type=var.type.name, name=var.cppname, suffix=name_suffix)

        elif var_class == 'WeightVariable':

            return '{type} {name}{suffix}[{size}]'.format(type=var.type.name, name=var.cppname, suffix=name_suffix, size=var.data_length)

        elif var_class == 'InplaceVariable':

            return None

        else:

            raise Exception('Unknown variable class "{}"'.format(var_class))

    ...

```

For instance, for the simple linear NN we created with 1 input and 1 output, and 2 weights, the fc1_input layer of the network will be represented as an "ArrayVariable" in the model_inputs var. Its type is `input_t`, which was defined previously. Cppname will be `fc1_input`.  Name suffix should be empty. Size_cpp() should return `N_INPUT_1_1`. We might have to go back to the definition of these types, as well as the sizes, but for now, let's skip this. In any case, we see that the definition for this variable should be `input_t fc1_input[N_INPUT_1_1]`, which is exactly what we get if we execute `self.variable_definition_cpp(model, model_inputs[0], as_reference=True)` in our python debugger.

The same thing applies to the outputs and the brams. 
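To try the `ArrayVariable` branch without a full hls4ml model at hand, it can be reproduced with a small stand-in object. Everything in the sketch below (`FakeArrayVariable`, `array_definition`) is a hypothetical mock that only mirrors the attributes used by that branch: `type.name`, `cppname` and `size_cpp()`.

```python
from collections import namedtuple

# Minimal stand-in for the variable's type object (only .name is used)
VarType = namedtuple("VarType", "name")


class FakeArrayVariable:
    """Mock exposing just the attributes the ArrayVariable branch reads."""

    def __init__(self, type_name, cppname, size):
        self.type = VarType(type_name)
        self.cppname = cppname
        self._size = size

    def size_cpp(self):
        return self._size


def array_definition(var, name_suffix=""):
    """Same format string as the ArrayVariable branch shown above."""
    return "{type} {name}{suffix}[{shape}]".format(
        type=var.type.name, name=var.cppname, suffix=name_suffix, shape=var.size_cpp()
    )


fc1_input = FakeArrayVariable("input_t", "fc1_input", "N_INPUT_1_1")
print(array_definition(fc1_input))
print(array_definition(fc1_input, name_suffix="_ground_truth"))
```

The second call shows what the `name_suffix` argument does, which becomes useful when a copy of a port is needed under a different name.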

Okay, so the first thing we need to do is to modify the template for myproject.cpp that we opened before. Open it again and go to the part that has the following definition.

```cpp

    ...

#include "myproject.h"

#include "parameters.h"

void myproject(

    ...

```

In here, we basically want to add the following comment: `//hls-fpga-machine-learning insert autograd-def`

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.6` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```cpp

    ...

#include "myproject.h"

#include "parameters.h"

//hls-fpga-machine-learning insert autograd-def

void myproject(

    ...

```

This is because, in case our network needs to be trainable, we will need to replace this comment with the following block, which, as you can see, simply includes the definition of the losses. 

```cpp

// -------------------------- AUTOGRAD --------------------------

// [@manuelbv]: Manually including parameters for autograd

#include "losses/losses_parameters.h"

// --------------------------------------------------------------

```

Great. Now that we have that, let's go back to the `vivado_writer.py` file and handle this new marker in the `write_project_cpp` method. We will add the check right below the `if 'myproject' in line:` check. 

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.7` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...
            elif '//hls-fpga-machine-learning insert autograd-def' in line:
                # [@manuelbv]: CHANGELOG_a.7 If this is a trainable network, include autograd losses definition
                if model.config.config['HLSConfig']['Model']['Trainable']:
                    newline =  "// -------------------------- AUTOGRAD --------------------------\n"
                    newline += "// [@manuelbv]: Manually including parameters for autograd\n"
                    newline += '#include "losses/losses_parameters.h"\n'
                    newline += "// --------------------------------------------------------------\n"
                else:
                    # Not trainable: emit nothing. Without this branch, `newline` would
                    # keep its value from the previous loop iteration.
                    newline = ''
    ...

```

The next change we have to make to hls4ml is that, in case this is a trainable network, we must declare two new ports: the loss and the ground truth value. The loss will hold the loss value computed after the forward pass for whatever data we fed to the network through its inputs. It's of the same type as the result (output of the network), except that its size will always be 1, because it's a single scalar. We might want to change this in the future, since using the output type for the loss might not be ideal; maybe we want to use something like a predefined fixed-point type. But for now, this just makes everything easier. The ground truth is very easy to implement too, because it's basically a clone (type-wise) of the output(s) of the network.

Thus, let's go back to the `elif '//hls-fpga-machine-learning insert header' in line:` check in the `vivado_writer.py` file and let's modify it to the following code:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.8` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...

            elif '//hls-fpga-machine-learning insert header' in line:

                inputs_str = ', '.join([self.variable_definition_cpp(model, i, as_reference=True) for i in model_inputs])

                outputs_str = ', '.join([self.variable_definition_cpp(model, o, as_reference=True) for o in model_outputs])

                brams_str  = ', \n'.join([indent + self.variable_definition_cpp(model, b, as_reference=False) for b in model_brams])

                #[@manuelbv]: CHANGELOG_a.8 If this is a trainable network, then we need to add the loss definitions as references

                _autograd_loss_definition = lambda var: '{type} &loss_{name}'.format(type=var.type.name, name=var.cppname)

                loss_str = ', '.join([_autograd_loss_definition(o) for o in model_outputs])

                ground_truth_str = ', '.join([self.variable_definition_cpp(model, o, as_reference=True, name_suffix="_ground_truth") for o in model_outputs])

                insize_str = ', '.join(['unsigned short &const_size_in_{}'.format(i) for i in range(1, len(model_inputs) + 1)])

                outsize_str = ', '.join(['unsigned short &const_size_out_{}'.format(i) for i in range(1, len(model_outputs) + 1)])

                newline = ''

                newline += indent + inputs_str + ',\n'

                newline += indent + outputs_str + ',\n'

                if len(model_brams) > 0: 

                    newline += brams_str + ',\n'

                

                # [@manuelbv]: If model is trainable, add loss + ground truth IOs

                if model.config.config['HLSConfig']['Model']['Trainable']:

                    newline += indent + loss_str + ',\n'

                    newline += indent + insize_str + ',\n'

                    newline += indent + outsize_str + ',\n'

                    newline += indent + ground_truth_str + '\n'

                else:

                    newline += indent + insize_str + ',\n'

                    newline += indent + outsize_str + '\n'

    ...

```
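
To make the effect of this change concrete, here is a minimal standalone sketch of the port list the writer ends up assembling. `Var` and `build_port_list` are hypothetical stand-ins (the real code works on hls4ml variable objects through `self.variable_definition_cpp`), and the array-style declarations are an assumption about what that helper emits.

```python
from collections import namedtuple

# Hypothetical stand-in for an hls4ml variable (not the real class)
Var = namedtuple("Var", ["type_name", "cppname", "size"])


def build_port_list(inputs, outputs, trainable, indent="    "):
    """Return the top-level function ports, one group of ports per line."""

    def decl(var, suffix=""):
        return f"{var.type_name} {var.cppname}{suffix}[{var.size}]"

    sizes = [
        ", ".join(f"unsigned short &const_size_in_{n}" for n in range(1, len(inputs) + 1)),
        ", ".join(f"unsigned short &const_size_out_{n}" for n in range(1, len(outputs) + 1)),
    ]
    groups = [", ".join(decl(i) for i in inputs), ", ".join(decl(o) for o in outputs)]
    if trainable:
        # One scalar loss per output, then the sizes, then the ground truth
        groups.append(", ".join(f"{o.type_name} &loss_{o.cppname}" for o in outputs))
        groups += sizes
        groups.append(", ".join(decl(o, "_ground_truth") for o in outputs))
    else:
        groups += sizes
    return ",\n".join(indent + g for g in groups) + "\n"


if __name__ == "__main__":
    x = Var("input_t", "fc1_input", "N_INPUT_1_1")  # illustrative names only
    y = Var("result_t", "layer3_out", "N_LAYER_2")
    print(build_port_list([x], [y], trainable=True))
```

With `trainable=False` the loss and ground-truth ports disappear and the list ends with the size arguments, which mirrors the `else` branch in the writer.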

Let's go back to the template for a second and add a second comment that will allow us to include some definitions for the backprop layer wrappers. We will change the following section:

```cpp

    ...

#endif

    // ****************************************

    // NETWORK INSTANTIATION

    // ****************************************

    //hls-fpga-machine-learning insert layers

}

    ...

```

and we will add the following comment: `//hls-fpga-machine-learning autograd-layer-wrappers`

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.9` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```cpp

    ...

#endif

    // ****************************************

    // NETWORK INSTANTIATION

    // ****************************************

    //hls-fpga-machine-learning insert layers

    //hls-fpga-machine-learning autograd-layer-wrappers

}

    ...

```

At this point I realized that we forgot to add the loss definition to the configuration built in `hls4ml/utils/config.py`. Open `~/ETHW/hls4ml/utils/config.py` and go to where we defined `model_config['Trainable'] = trainable` (CHANGELOG_a.5). Add the following right after it:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.10` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    # [@manuelbv]: CHANGELOG_a.10 Add definition of the losses for future use when instantiating the grads and losses

    model_config['Losses'] = list(model.loss)

```
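
One thing to watch here: if `model.loss` is a plain string (this assumes a Keras model compiled with something like `loss="mse"`), then `list("mse")` yields `['m', 's', 'e']` rather than `['mse']`. A small normalizer avoids that. The sketch below uses a hypothetical helper name, `normalize_losses`, and only covers names given as a string, a list/tuple, or a dict; loss objects and callables are not handled.

```python
def normalize_losses(loss):
    """Return the configured loss(es) as a list of names.

    Hypothetical helper: accepts a single name, a list/tuple of names,
    or a dict mapping output names to loss names.
    """
    if loss is None:
        return []
    if isinstance(loss, str):
        return [loss]  # avoid splitting "mse" into characters
    if isinstance(loss, dict):
        return list(loss.values())
    return list(loss)
```

Under those assumptions, the config line could become `model_config['Losses'] = normalize_losses(model.loss)`.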

Now that we have that comment in the template, let's modify the writer again and add a new condition that parses this comment and inserts whatever we need. Go back to the `vivado_writer.py` file and add the following code right after the `//hls-fpga-machine-learning insert layers` check:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.11` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...

            elif '//hls-fpga-machine-learning autograd-layer-wrappers' in line:

                # [@manuelbv]: CHANGELOG_a.11 If this is a trainable network, include the definition of the autograd layer wrappers

                if model.config.config['HLSConfig']['Model']['Trainable']:

                    

                    # [@manuelbv]: Add a comment for later change traceability 

                    newline =  '    ' + "// -------------------------------- AUTOGRAD ---------------------------------\n"

                    newline += '    ' + '// [@manuelbv]: Instantiation of grads and computation of loss for each output\n'

                    newline += '    ' + "// ---------------------------------------------------------------------------\n"

                    # [@manuelbv]: Get outputs

                    outputs = model.get_output_variables()

                    # [@manuelbv]: Get losses

                    losses = model.config.config['HLSConfig']['Model']['Losses']

                    # [@manuelbv]: Loop thru outputs

                    for no,(o,lo) in enumerate(zip(outputs,losses)):

                        grad_str = self.variable_definition_cpp(model, o, as_reference=True, name_suffix="_grads")

                        # [@manuelbv]: Add definition of grad var

                        newline += '    ' + f'// [@manuelbv]: Definition of the gradient for output {o.cppname}\n'

                        newline += '    ' + f'{grad_str};\n'

                        # [@manuelbv]: Now check if ground truth is a valid pointer (valid data), if not, loss will be zero and grad will not be applied

                        ground_truth_str = f'{o.cppname}_ground_truth'

                        newline += '    ' + f'if ({ground_truth_str} != nullptr) ' + "{\n"

                        # [@manuelbv]: Add a placeholder for printing out information in case we want it

                        newline += ''.join(['    ']*2) + f'// [@manuelbv]: Uncomment this for debugging\n'

                        newline += ''.join(['    ']*2) + f'//std::cout <<  "Ground truth passed to nnet thru output {o.cppname} seems valid, computing loss" << std::endl ;\n'

                        # [@manuelbv]: Instantiation of the actual loss computation

                        if lo == "mse":

                            newline += ''.join(['    ']*2) + f'losses::mse<{o.type.name}, mse_config>({o.cppname}, {ground_truth_str}, loss_{o.cppname}, {o.cppname}_grads);\n'

                        else:

                            raise ValueError(f"Loss {lo} not implemented yet.")

                        newline += '    ' + '} else {\n'

                        newline += ''.join(['    ']*2) + f'// [@manuelbv]: Uncomment this for debugging\n'

                        newline += ''.join(['    ']*2) + f'//std::cout <<  "Ground truth passed to nnet thru output {o.cppname} is invalid (nullptr). Loss=0. Not performing backprop." << std::endl ;\n'

                        if lo == "mse":

                            newline += ''.join(['    ']*2) + f'losses::mse<{o.type.name}, mse_config>({o.cppname}, {o.cppname}, loss_{o.cppname}, {o.cppname}_grads);\n'

                        else:

                            raise ValueError(f"Loss {lo} not implemented yet.")

                        newline += '    ' + '}\n'

                        # [@manuelbv] At the end we want something like this:

                        """

                        //[@manuelbv]: Instantiate the loss and grads

                        result_t grads_layer3_out[N_LAYER_2];

                        if(layer3_ground_truth != nullptr) {

                            //std::cout <<  "Ground truth passed to nnet seems valid, computing loss" << std::endl ;

                            losses::mse(layer3_out,layer3_ground_truth,loss,grads_layer3_out);

                        } else {

                            //std::cout <<  "Ground truth passed to nnet is nullptr, invalid loss" << std::endl ;

                            losses::mse(layer3_out,layer3_out,loss,grads_layer3_out);

                        }

                        """

                        # [@manuelbv]: Now we need to add the actual backpropagation

                        inputs = model.get_input_variables()

                        outputs = model.get_output_variables()

                        grads = []

                        newline += "\n" + '    ' + "// [@manuelbv]: Backpass\n"

                        #print("----")

                        #newline = ""

                        for layer in reversed(model.get_layers()):

                            layer_type = layer.attributes['class_name']

                            layer_name = layer.attributes['name']

                            vars = layer.get_variables()

                            layer_config_cpp = layer.config_cpp()

                            layer_config = dict()

                            if layer_config_cpp:

                                layer_config_cpp = layer_config_cpp.split("struct ")[1].split(" :")[0]

                                print(layer_config_cpp)

                            if layer_type.lower() == 'activation':

                                layer_config['activation'] = layer.attributes['activation']

                            elif layer_type.lower() == 'dense' or layer_type.lower() == 'qdense':

                                layer_config['WnB'] = [layer.weights['weight'].cppname, layer.weights['bias'].cppname]

                            for var in vars:

                                grads.append([layer_name, layer_type, var.cppname, var.type.name, var.size_cpp(), layer_config_cpp, layer_config])

                                # if var not in inputs and var not in outputs:

                                #     print(var.type.name)

                                # else: 

                                #     print(f"Input/Output: {var.type.name}")

                            if len(grads) > 1:

                                # Make declaration with previous grads

                                prev_grad = grads[-2]

                                prev_grad_layer_type = prev_grad[1]

                                prev_grad_outvar_name = prev_grad[2]

                                prev_grad_outvar_type = prev_grad[3]

                                prev_grad_config = prev_grad[5]

                                prev_grad_layer_config = prev_grad[6]

                                this_grad_name = f"{var.cppname}_grads"

                                this_grad_size = var.size_cpp()

                                this_grad_type = var.type.name

                                this_grad_def = f"{this_grad_type} {this_grad_name}[{this_grad_size}];"

                                #print(this_grad_def)

                                newline += '    ' + this_grad_def + '\n'

                                if prev_grad_layer_type.lower() == 'activation':

                                    if prev_grad_layer_config['activation'] == 'linear':

                                        backpass_def = 'linear_backpass'

                                    else:

                                        raise ValueError(f"Activation {prev_grad_layer_config['activation']} not implemented yet")

                                    backpass = f"nnet::{backpass_def}<{prev_grad_outvar_type}, {this_grad_type}, {prev_grad_config}>({prev_grad_outvar_name}_grads, {this_grad_name}); // {prev_grad[0]}"

                                elif prev_grad_layer_type.lower() == 'dense' or prev_grad_layer_type.lower() == 'qdense':

                                    print(prev_grad_layer_config)

                                    WnB = prev_grad_layer_config['WnB']

                                    backpass = f"nnet::dense_backpass<{prev_grad_outvar_type}, {this_grad_type}, {prev_grad_config}>({prev_grad_outvar_name}_grads, {this_grad_name}, {var.cppname}, {', '.join(WnB)}); // {prev_grad[0]}"

                                else:

                                    raise ValueError(f"Unknown type of layer {prev_grad_layer_type.lower()}")

                                newline += '    ' + backpass + '\n\n'

                            # print(prev_grad, layer_type,layer_name)

                            #print("----")

                        newline += "\n\n"

                        #print(newline)

                        """

                        // [@manuelbv]: Backpass

                        // see: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

                        // linear

                        result_t grads_layer2_out[N_LAYER_2];

                        nnet::linear_backpass(grads_layer3_out, grads_layer2_out); // fc1_linear

                        // fc1 

                        result_t grads_fc1_out[N_LAYER_2];

                        nnet::dense_backpass(grads_layer2_out, grads_fc1_out, fc1_input, w2, b2); // fc1

                        """

    ...

```
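
As a sanity reference for what the generated `losses::mse` call is expected to produce, here is a plain-Python sketch of a mean squared error and its gradient with respect to the prediction. The 2/N scaling is an assumption (the actual `losses/mse.h` is not shown here and may scale differently). It also illustrates why the `else` branch passes the output as its own ground truth: loss and gradients both come out as zero, so nothing gets backpropagated.

```python
def mse_with_grads(pred, truth):
    """Return (loss, grads) for a mean squared error.

    loss  = mean((pred - truth)^2)
    grads = d(loss)/d(pred) = 2 * (pred - truth) / N
    """
    n = len(pred)
    diffs = [p - t for p, t in zip(pred, truth)]
    loss = sum(d * d for d in diffs) / n
    grads = [2.0 * d / n for d in diffs]
    return loss, grads


if __name__ == "__main__":
    print(mse_with_grads([1.0, 3.0], [0.0, 1.0]))  # regular training sample
    print(mse_with_grads([1.0, 3.0], [1.0, 3.0]))  # output used as ground truth
```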

Now, if we try to run the previous code, we will hit an error telling us that the file `losses/losses_parameters.h` was not found, which is true: it's a custom file we created, and hls4ml is not aware of it. So we need to create a couple of soft links (or simply copy the folders; I prefer links) in the vivado templates folder inside hls4ml, so these dependencies can be used while compiling our model.

```shell

ln -sf /mnt/raid0/asic/projects/NU/ETHW/manuelbv/include/losses ~/ETHW/hls4ml/templates/vivado/losses

ln -sf /mnt/raid0/asic/projects/NU/ETHW/manuelbv/include/autograd ~/ETHW/hls4ml/templates/vivado/autograd

```
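
Before regenerating a project, it can be worth checking that both links actually resolve. A minimal sketch, where `missing_template_deps` is a hypothetical helper and the directory you pass in is your own hls4ml `templates/vivado` folder:

```python
import os


def missing_template_deps(templates_dir, deps=("losses", "autograd")):
    """Return the dependency folders that do not resolve to a directory."""
    # os.path.isdir follows symlinks, so a dangling link counts as missing
    return [d for d in deps if not os.path.isdir(os.path.join(templates_dir, d))]
```

For the setup above, `missing_template_deps(os.path.expanduser("~/ETHW/hls4ml/templates/vivado"))` should return an empty list once both links are in place.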

Alright, let's keep moving. We need to do something else before we try to compile this, because right now the autograd dependencies are not copied automatically. First, let's make sure the generated `build_lib.sh` script knows about them. Open `~/ETHW/hls4ml/writer/vivado_writer.py` and go to the `write_build_script` method definition. Find the part that creates the `build_lib.sh` file and add the two lines below my comment, which add the `autograd` and `losses` folders to `INCFLAGS` so the compiler can find those headers.

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.12` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...

        ###################

        # build_lib.sh

        ###################

        f = open(os.path.join(filedir,'../templates/vivado/build_lib.sh'),'r')

        fout = open('{}/build_lib.sh'.format(model.config.get_output_dir()),'w')

        for line in f.readlines():

            line = line.replace('myproject', model.config.get_project_name())

            line = line.replace('mystamp', model.config.get_config_value('Stamp'))

            # [@manuelbv]: CHANGELOG_a.12 -> Make sure we add all dependencies to the output build script, if trainable

            if model.config.config['HLSConfig']['Model']['Trainable']:

                line = line.replace('INCFLAGS="-Ifirmware/ap_types/"', 'INCFLAGS="-Ifirmware/ap_types/ -Ifirmware/autograd/ -Ifirmware/losses/"')

            fout.write(line)

        f.close()

        fout.close()

    ...

```
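
Note that `str.replace` does nothing when the search string is absent, so if the upstream template ever changes its `INCFLAGS` line, the extra include paths would be dropped silently. The sketch below isolates the substitution so it can be checked on its own; `patch_build_line` and the two constants are hypothetical names, while the flag strings are the ones used above.

```python
BASE_FLAGS = 'INCFLAGS="-Ifirmware/ap_types/"'
TRAINABLE_FLAGS = 'INCFLAGS="-Ifirmware/ap_types/ -Ifirmware/autograd/ -Ifirmware/losses/"'


def patch_build_line(line, trainable):
    """Add the autograd/losses include paths to an INCFLAGS line, if trainable."""
    if trainable:
        return line.replace(BASE_FLAGS, TRAINABLE_FLAGS)
    return line
```

A quick way to catch a template mismatch is to check that at least one line of the generated `build_lib.sh` contains `-Ifirmware/autograd/` after writing a trainable project.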

Let's now create a new method to copy over the autograd dependencies to the output firmware, in case we need it. In `~/ETHW/hls4ml/writer/vivado_writer.py`, create this method after `write_nnet_utils`:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.13` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    # [@manuelbv]: CHANGELOG_a.13 - > Adding the autograd dependencies to the output folder

    def write_autograd_utils(self, model):

        ###################

        ## autograd utilities

        ###################

        filedir = os.path.dirname(os.path.abspath(__file__))

        

        if model.config.config['HLSConfig']['Model']['Trainable']:

            for ffdep in ["autograd","losses"]:

                srcpath = os.path.join(filedir,f'../templates/vivado/{ffdep}/')

                dstpath = f'{model.config.get_output_dir()}/firmware/{ffdep}/'

                if os.path.exists(dstpath):

                    rmtree(dstpath)

                copytree(srcpath, dstpath)

```
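
The same copy logic can be tried outside hls4ml. This is a standalone sketch under assumed names (`copy_deps`, `templates_dir` and `output_dir` are hypothetical); it keeps the remove-then-copy behaviour of the method above so that running it twice does not fail.

```python
import os
from shutil import copytree, rmtree


def copy_deps(templates_dir, output_dir, deps=("autograd", "losses")):
    """Copy each dependency folder into <output_dir>/firmware/, replacing old copies."""
    copied = []
    for dep in deps:
        src = os.path.join(templates_dir, dep)
        dst = os.path.join(output_dir, "firmware", dep)
        if os.path.exists(dst):
            rmtree(dst)  # start from a clean copy, as in write_autograd_utils
        copytree(src, dst)  # also creates missing parent directories
        copied.append(dst)
    return copied
```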

One more change in the `vivado_writer.py` file. We need to invoke this method when we call `write_hls`. The final block should look like this:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.14` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    def write_hls(self, model):

        print('Writing HLS project')

        self.write_project_dir(model)

        self.write_project_cpp(model)

        self.write_project_header(model)

        self.write_weights(model)

        self.write_defines(model)

        self.write_parameters(model)

        self.write_test_bench(model)

        self.write_bridge(model)

        self.write_build_script(model)

        self.write_nnet_utils(model)

        #[@manuelbv]: CHANGELOG_a.14: Added newly defined method to copy autograd/losses definitions to output dir

        self.write_autograd_utils(model)

        self.write_yml(model)

        self.write_tar(model)

        print('Done')

```

Let's open the `~/ETHW/hls4ml/templates/vivado/firmware/myproject.h` template and add the `//hls-fpga-machine-learning insert autograd-headers` line under `#include "defines.h"` so we can add the header files for backprop.

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.15` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```cpp

    ...

    #include "defines.h"

    //hls-fpga-machine-learning insert autograd-headers

    ...

```

Inside `write_project_header` in `vivado_writer.py`, let's add another check that looks for `//hls-fpga-machine-learning insert autograd-headers`. If we find it, we parse the different types of losses and layers in our design and add the headers for the required backprop wrappers.

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.16` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...

            elif '//hls-fpga-machine-learning insert autograd-headers' in line:

                #[@manuelbv]: CHANGELOG_a.16 If we find the "//hls-fpga-machine-learning insert autograd-headers" comment, and this is a trainable network

                #               then include the autograd headers.

                newline = ''

                if model.config.config['HLSConfig']['Model']['Trainable']:

                    newline =  '// -------------------------- AUTOGRAD --------------------------\n'

                    newline += '// [@manuelbv]: Manually including definitions for autograd\n'

                    newline += '#include "autograd/autograd_defines.h"\n\n'

                    # [@manuelbv]: Parse the different types of losses used

                    losses = model.config.config['HLSConfig']['Model']['Losses']

                    losses = list(np.unique(losses))

                    if len(losses) > 0:

                        newline += '// [@manuelbv]: Manually importing losses\n'

                        for l in losses:

                            if l == 'mse':

                                newline += '#include "losses/mse.h"\n'

                            else:

                                raise ValueError(f"Unknown loss {l}")

                        newline += "\n"

                    

                    # [@manuelbv]: Import backprop implementations for diff layer types

                    newline += "// [@manuelbv]: Import backprop implementations\n"

                    inputs = model.get_input_variables()

                    outputs = model.get_output_variables()

                    # [@manuelbv]: List with all types of layers we are using in the model 

                    unique_layer_types = ['activation']

                    for layer in model.get_layers():

                        vars = layer.get_variables()

                        for var in vars:

                            if var not in inputs and var not in outputs:

                                layer_type = layer.attributes['class_name']

                                if layer_type not in unique_layer_types:

                                    unique_layer_types.append(layer_type)

                    for ut in unique_layer_types:

                        if ut.lower() == "qdense" or ut.lower() == "dense":

                            newline += '#include "autograd/nnet_dense_backprop.h"\n'

                        elif ut.lower() == "activation" or ut.lower() == "relu":

                            newline += '#include "autograd/nnet_activation_backprop.h"\n'

                        else:

                            raise ValueError(f"Unknown type of layer {ut}")

                    

                    # [@manuelbv]: Finally, close 

                    newline += "// --------------------------------------------------------------\n"

    ...

```
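
The `if`/`elif` chain above emits the same header more than once when a model mixes, say, `Dense` and `QDense` layers. That is harmless if the headers have include guards (an assumption, since they are not shown here), but a table-driven variant avoids it. This is a sketch with hypothetical names (`BACKPROP_HEADERS`, `backprop_includes`); the header paths are the ones used in the writer.

```python
# Layer class name (lower-case) -> backprop header, as used in the writer above
BACKPROP_HEADERS = {
    "dense": "autograd/nnet_dense_backprop.h",
    "qdense": "autograd/nnet_dense_backprop.h",
    "activation": "autograd/nnet_activation_backprop.h",
    "relu": "autograd/nnet_activation_backprop.h",
}


def backprop_includes(layer_types):
    """Return one #include line per required header, without duplicates."""
    headers = []
    for layer_type in layer_types:
        key = layer_type.lower()
        if key not in BACKPROP_HEADERS:
            raise ValueError(f"Unknown type of layer {layer_type}")
        if BACKPROP_HEADERS[key] not in headers:
            headers.append(BACKPROP_HEADERS[key])
    return "".join(f'#include "{h}"\n' for h in headers)
```

Supporting a new layer type then means adding one entry to the table instead of another `elif` branch.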

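The function declaration in the header needs the same extra ports as the definition in `myproject.cpp`. Still in the header writer, update the `//hls-fpga-machine-learning insert header` check so it mirrors CHANGELOG_a.8, with the ground truth defaulting to `nullptr`:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.17` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️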

```python

    ...

            elif '//hls-fpga-machine-learning insert header' in line:

                inputs_str = ', '.join([self.variable_definition_cpp(model, i, as_reference=True) for i in model_inputs])

                outputs_str = ', '.join([self.variable_definition_cpp(model, o, as_reference=True) for o in model_outputs])

                brams_str  = ', \n'.join([indent + self.variable_definition_cpp(model, b, as_reference=False) for b in model_brams])

                

                #[@manuelbv]: CHANGELOG_a.17 If this is a trainable network, then add loss + ground truth IOs in header

                _autograd_loss_definition = lambda var: '{type} &loss_{name}'.format(type=var.type.name, name=var.cppname)

                loss_str = ', '.join([_autograd_loss_definition(o) for o in model_outputs])

                ground_truth_str = ', '.join([self.variable_definition_cpp(model, o, as_reference=True, name_suffix="_ground_truth") for o in model_outputs])

                

                insize_str = ', '.join(['unsigned short &const_size_in_{}'.format(i) for i in range(1, len(model_inputs) + 1)])

                outsize_str = ', '.join(['unsigned short &const_size_out_{}'.format(o) for o in range(1, len(model_outputs) + 1)])

                newline = ''

                newline += indent + inputs_str + ',\n'

                newline += indent + outputs_str + ',\n'

                if len(model_brams) > 0: 

                    newline += brams_str + ',\n'

                

                # [@manuelbv]: If model is trainable, add loss + ground truth IOs

                if model.config.config['HLSConfig']['Model']['Trainable']:

                    newline += indent + loss_str + ',\n'

                    newline += indent + insize_str + ',\n'

                    newline += indent + outsize_str + ',\n'

                    newline += indent + ground_truth_str + ' = nullptr\n'

                else:

                    newline += indent + insize_str + ',\n'

                    newline += indent + outsize_str + '\n'

    ...

```

Let's continue modifying the templates. Two other files are generated and used when compiling the model. These are `myproject_bridge.cpp` and `myproject_test.cpp`. So let's modify them to incorporate the new definitions for the trainable networks. Let's start with the bridge. 

Open the file `~/ETHW/hls4ml/templates/vivado/myproject_bridge.cpp` and take a look at it. Now go back to the `vivado_writer.py` file and go to the function `write_bridge` definition. The changes we need to apply here are pretty similar to the ones we've applied before to create `myproject.cpp`. Take a look at the check statement `elif '//hls-fpga-machine-learning insert header' in line:`. We will modify this to look like the following:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.18` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...

            elif '//hls-fpga-machine-learning insert header' in line:

                dtype = line.split('#', 1)[1].strip()

                inputs_str = ', '.join(['{type} {name}[{shape}]'.format(type=dtype, name=i.cppname, shape=i.size_cpp()) for i in model_inputs])

                outputs_str = ', '.join(['{type} {name}[{shape}]'.format(type=dtype, name=o.cppname, shape=o.size_cpp()) for o in model_outputs])

                

                #[@manuelbv]: CHANGELOG_a.18 If this is a trainable network, then add loss + ground truth IOs in header

                _autograd_loss_definition = lambda var: '{type} &loss_{name}'.format(type=dtype, name=var.cppname)

                loss_str = ', '.join([_autograd_loss_definition(o) for o in model_outputs])

                ground_truth_str = ', '.join([f'{dtype} {o.cppname}_ground_truth[{o.size_cpp()}]' for o in model_outputs])

                

                insize_str = ', '.join(['unsigned short &const_size_in_{}'.format(i) for i in range(1, len(model_inputs) + 1)])

                outsize_str = ', '.join(['unsigned short &const_size_out_{}'.format(o) for o in range(1, len(model_outputs) + 1)])

                newline = ''

                newline += indent + inputs_str + ',\n'

                newline += indent + outputs_str + ',\n'

                # [@manuelbv]: If model is trainable, add loss + ground truth IOs

                if model.config.config['HLSConfig']['Model']['Trainable']:

                    newline += indent + loss_str + ',\n'

                    newline += indent + insize_str + ',\n'

                    newline += indent + outsize_str + ',\n'

                    newline += indent + ground_truth_str + '\n'

                else:

                    newline += indent + insize_str + ',\n'

                    newline += indent + outsize_str + '\n'

    ...

```

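The wrapper body in the same `write_bridge` function needs the matching change: declare the `_ap` loss and ground-truth variables, pass them to the top-level function, and convert them back afterwards. Update the `//hls-fpga-machine-learning insert wrapper` check as follows:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.19` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️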

```python

    ...

            elif '//hls-fpga-machine-learning insert wrapper' in line:

                dtype = line.split('#', 1)[1].strip()

                newline = ''

                for i in model_inputs:

                    newline += indent + '{var};\n'.format(var=self.variable_definition_cpp(model, i, name_suffix='_ap'))

                    newline += indent + 'nnet::convert_data<{}, {}, {}>({}, {}_ap);\n'.format(dtype, i.type.name, i.size_cpp(), i.cppname, i.cppname)

                newline += '\n'

                

                for o in model_outputs:

                    newline += indent + '{var};\n'.format(var=self.variable_definition_cpp(model, o, name_suffix='_ap'))

                    if model.config.config['HLSConfig']['Model']['Trainable']:

                        newline += indent + '{type} loss_{name}_{name_suffix};\n'.format(type=o.type.name, name=o.cppname, name_suffix="ap")

                        newline += indent + '{type} {name}_ground_truth_{name_suffix}[{thisshape}];\n'.format(type=o.type.name, name=o.cppname, name_suffix="ap", thisshape=o.size_cpp())

                newline += '\n'

                input_size_vars = ','.join(['const_size_in_{}'.format(i) for i in range(1, len(model.get_input_variables()) + 1)])

                output_size_vars = ','.join(['const_size_out_{}'.format(o) for o in range(1, len(model.get_output_variables()) + 1)])

                input_vars = ','.join([i.cppname + '_ap' for i in model.get_input_variables()])

                bram_vars   =','.join([b.cppname for b in model.get_bram_variables()]) 

                output_vars = ','.join([o.cppname + '_ap' for o in model.get_output_variables()])

                

                # Concatenate the input, output, and bram variables. Filter out empty/null values

                all_vars = ','.join(filter(None, [input_vars, output_vars, bram_vars]))

                #[@manuelbv]: CHANGELOG_a.19 If this is a trainable network, then add loss + ground truth IOs in instantiation of top module

                _autograd_loss_definition = lambda var: 'loss_{name}_{name_suffix}'.format(name=var.cppname, name_suffix="ap")

                loss_str = ', '.join([_autograd_loss_definition(o) for o in model_outputs])

                ground_truth_str = ', '.join([f'{o.cppname}_ground_truth_ap' for o in model_outputs])

                

                if model.config.config['HLSConfig']['Model']['Trainable']:

                    top_level = indent + '{}({},{},{},{},{});\n'.format(model.config.get_project_name(), all_vars, loss_str, input_size_vars, output_size_vars, ground_truth_str)

                else:

                    top_level = indent + '{}({},{},{});\n'.format(model.config.get_project_name(), all_vars, input_size_vars, output_size_vars)

                newline += top_level

                newline += '\n'

                for o in model_outputs:

                    newline += indent + 'nnet::convert_data<{}, {}, {}>({}_ap, {});\n'.format(o.type.name, dtype, o.size_cpp(), o.cppname, o.cppname)

                    if model.config.config['HLSConfig']['Model']['Trainable']:

                        newline += indent + 'nnet::convert_data<{}, {}, {}>({}_ground_truth_ap, {}_ground_truth);\n'.format(o.type.name, dtype, o.size_cpp(), o.cppname, o.cppname)

                        newline += indent + 'nnet::convert_data<{}, {}, 1>(&loss_{}_ap, &loss_{});\n'.format(o.type.name, dtype, o.cppname, o.cppname)

            

            

    ...

```
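
The important detail in the wrapper is that the argument order of the call must match the port order declared in CHANGELOG_a.8: inputs, outputs, brams (if any), losses, sizes, ground truth. The sketch below rebuilds only that call string from plain names; `top_level_call` is a hypothetical helper, not part of hls4ml.

```python
def top_level_call(project, in_names, out_names, trainable, bram_names=()):
    """Build the top-function call string used inside the bridge wrapper."""
    data_args = ",".join(
        [f"{n}_ap" for n in in_names]
        + [f"{n}_ap" for n in out_names]
        + list(bram_names)
    )
    size_args = ",".join(
        [f"const_size_in_{i}" for i in range(1, len(in_names) + 1)]
        + [f"const_size_out_{o}" for o in range(1, len(out_names) + 1)]
    )
    args = [data_args]
    if trainable:
        args.append(", ".join(f"loss_{n}_ap" for n in out_names))
    args.append(size_args)
    if trainable:
        args.append(", ".join(f"{n}_ground_truth_ap" for n in out_names))
    return f"{project}({','.join(args)});"


if __name__ == "__main__":
    print(top_level_call("myproject", ["fc1_input"], ["layer3_out"], trainable=True))
```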

Let's now modify the testbench generated by hls4ml so we can add the parts that train the network automatically. I'll do this in one go: instead of listing individual changes, here is the whole `myproject_test.cpp` template.

This is what your `~/ETHW/hls4ml/templates/vivado/myproject_test.cpp` needs to look like:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.20` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```cpp

    //

    //    rfnoc-hls-neuralnet: Vivado HLS code for neural-net building blocks

    //

    //    Copyright (C) 2017 EJ Kreinar

    //

    //    This program is free software: you can redistribute it and/or modify

    //    it under the terms of the GNU General Public License as published by

    //    the Free Software Foundation, either version 3 of the License, or

    //    (at your option) any later version.

    //

    //    This program is distributed in the hope that it will be useful,

    //    but WITHOUT ANY WARRANTY; without even the implied warranty of

    //    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

    //    GNU General Public License for more details.

    //

    //    You should have received a copy of the GNU General Public License

    //    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    //

    #include <fstream>

    #include <iostream>

    #include <algorithm>

    #include <vector>

    #include <map>

    #include <stdio.h>

    #include <stdlib.h>

    #include <math.h>

    #include "firmware/myproject.h"

    #include "firmware/nnet_utils/nnet_helpers.h"

    //hls-fpga-machine-learning insert bram

    //hls-fpga-machine-learning insert autograd-helpers-include

    #define CHECKPOINT 5000

    namespace nnet {

        bool trace_enabled = true;

        std::map<std::string, void *> *trace_outputs = NULL;

        size_t trace_type_size = sizeof(double);

    }

    int main(int argc, char **argv)

    {

    //load input data from text file

    std::ifstream fin("tb_data/tb_input_features.dat");

    //load predictions from text file

    std::ifstream fpr("tb_data/tb_output_predictions.dat");

    #ifdef RTL_SIM

    std::string RESULTS_LOG = "tb_data/rtl_cosim_results.log";

    #else

    std::string RESULTS_LOG = "tb_data/csim_results.log";

    #endif

    std::ofstream fout(RESULTS_LOG);

    //hls-fpga-machine-learning insert autograd-output-file-declaration

    std::string iline;

    std::string pline;

    int e = 0;

    if (fin.is_open() && fpr.is_open()) {

        while ( std::getline(fin,iline) && std::getline (fpr,pline) ) {

        if (e % CHECKPOINT == 0) std::cout << "Processing input " << e << std::endl;

        char* cstr=const_cast<char*>(iline.c_str());

        char* current;

        std::vector<float> in;

        current=strtok(cstr," ");

        while(current!=NULL) {

            in.push_back(atof(current));

            current=strtok(NULL," ");

        }

        cstr=const_cast<char*>(pline.c_str());

        std::vector<float> pr;

        current=strtok(cstr," ");

        while(current!=NULL) {

            pr.push_back(atof(current));

            current=strtok(NULL," ");

        }

        //hls-fpga-machine-learning insert data

        //hls-fpga-machine-learning insert top-level-function

        if (e % CHECKPOINT == 0) {

            //hls-fpga-machine-learning insert autograd_custom_printing

            std::cout << "Predictions" << std::endl;

            //hls-fpga-machine-learning insert predictions

            std::cout << "Quantized predictions" << std::endl;

            //hls-fpga-machine-learning insert quantized

        }

        e++;

        //hls-fpga-machine-learning insert tb-output

        }

        fin.close();

        fpr.close();

        //hls-fpga-machine-learning insert autograd-output-file-closure

    } else {

        std::cout << "INFO: Unable to open input/predictions file, using default input." << std::endl;

        //hls-fpga-machine-learning insert zero

        //hls-fpga-machine-learning insert top-level-function

        //hls-fpga-machine-learning insert output

        //hls-fpga-machine-learning insert tb-output

    }

    fout.close();

    std::cout << "INFO: Saved inference results to file: " << RESULTS_LOG << std::endl;

    return 0;

    }

```

By now you know the procedure. This is how the function `write_test_bench` inside the `vivado_writer.py` file needs to look:

⚠️⚠️⚠️⚠️⚠️⚠️⚠️ **CHANGE IN hls4ml!!!!!**: `CHANGELOG_a.21` ⚠️⚠️⚠️⚠️⚠️⚠️⚠️

```python

    ...

        elif '//hls-fpga-machine-learning insert bram' in line:

            newline = line

            for bram in model.get_bram_variables():

                newline += '#include \"firmware/weights/{}.h\"\n'.format(bram.cppname)

        # [@manuelbv]: We added the following extra check to add the helpers for backprop

        elif '//hls-fpga-machine-learning insert autograd-helpers-include' in line:

            newline = line

            newline += '#include "firmware/autograd/trainer_helpers.h"\n'

    ...

```
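The marker mechanism used in this check can be tried in isolation. The following is a minimal sketch, not the actual hls4ml writer: `fill_template` and its arguments are hypothetical names introduced here for illustration. It only mirrors the two branches above, keeping each marker line and appending the generated `#include` lines right after it.

```python
def fill_template(template_lines, bram_names):
    """Hypothetical stand-in for the marker loop inside write_test_bench."""
    out = []
    for line in template_lines:
        if '//hls-fpga-machine-learning insert bram' in line:
            # Keep the marker and add one include per BRAM variable
            newline = line
            for name in bram_names:
                newline += '#include "firmware/weights/{}.h"\n'.format(name)
        elif '//hls-fpga-machine-learning insert autograd-helpers-include' in line:
            # Keep the marker and add the backprop helpers header
            newline = line
            newline += '#include "firmware/autograd/trainer_helpers.h"\n'
        else:
            # Any other template line is copied unchanged
            newline = line
        out.append(newline)
    return ''.join(out)


template = [
    '#include "firmware/myproject.h"\n',
    '//hls-fpga-machine-learning insert bram\n',
    '//hls-fpga-machine-learning insert autograd-helpers-include\n',
]
result = fill_template(template, bram_names=['w2'])
print(result)
```

Because the marker line itself is kept in the output, the generated testbench still shows where each include came from.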

## Complete CHANGELOG: hls4ml

This list should contain ALL the changes I made to hls4ml. In case I forgot something, every time I change something I add a comment with the `#[@manuelbv]` tag before it, so grepping for that tag in the hls4ml directory should show every change I made.
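If `grep` is not at hand, the same search can be done from Python. This is a minimal sketch using only the standard library; `find_tagged_lines` is a hypothetical helper name, and the `hls4ml` path at the bottom is an assumption about where the fork is checked out. It looks for the substring `[@manuelbv]`, so both the `#[@manuelbv]` and `# [@manuelbv]` spellings match.

```python
import os

TAG = '[@manuelbv]'


def find_tagged_lines(root, tag=TAG):
    """Return (path, line_number, text) for every line under root that contains tag."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors='ignore' lets us skip over bytes that are not valid UTF-8
                with open(path, encoding='utf-8', errors='ignore') as handle:
                    for number, text in enumerate(handle, start=1):
                        if tag in text:
                            hits.append((path, number, text.strip()))
            except OSError:
                # Unreadable files are skipped rather than aborting the search
                continue
    return hits


if __name__ == '__main__':
    # Assumed location of the modified hls4ml checkout
    for path, number, text in find_tagged_lines('hls4ml'):
        print('{}:{}: {}'.format(path, number, text))
```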

* `CHANGELOG_a.1`: In `~/ETHW/hls4ml/hls4ml/utils/config.py`, on method `config_from_keras_model`, I added the flag "trainable = False" to allow the user to implement trainable layers.

* `CHANGELOG_a.2`: Inside file `~/ETHW/hls4ml/hls4ml/utils/config.py`, on method `config_from_keras_model`, inside the condition `if layer['class_name'] in dense_layers + conv_layers:` I am passing the `trainable` flag to the configuration of each specific layer, so that at the end the configuration dictionary contains the trainable flag.

* `CHANGELOG_a.3`: The same as `_a.2` but for activation layers.

* `CHANGELOG_a.4`: The same as `_a.2` and `_a.3` but for qkeras_layers.

* `CHANGELOG_a.5`: Inside file `~/ETHW/hls4ml/hls4ml/utils/config.py`, on method `config_from_keras_model`, I'm also adding the trainable flag to the global configuration of the whole network itself (apart from layer by layer).

* `CHANGELOG_a.6`: Change the vivado template for the top myproject.cpp definition in `~/ETHW/hls4ml/templates/vivado/firmware/myproject.cpp` and add the `//hls-fpga-machine-learning insert autograd-def` comment so we can later on add the definition of the losses.

* `CHANGELOG_a.7`: Added a check for `//hls-fpga-machine-learning insert autograd-def` in the `~/ETHW/hls4ml/writer/vivado_writer.py` file. If the model is trainable, we include the losses header definition in the final cpp file.

* `CHANGELOG_a.8`: Modified the `//hls-fpga-machine-learning insert header` check statement in `~/ETHW/hls4ml/writer/vivado_writer.py` file to make sure we are adding the loss + ground truth IO definitions to the top module definition.

* `CHANGELOG_a.9`: Change the vivado template for the top myproject.cpp definition in `~/ETHW/hls4ml/templates/vivado/firmware/myproject.cpp` and add the `//hls-fpga-machine-learning autograd-layer-wrappers` comment so we can later on add the definition of the auto-grad layer wrappers.

* `CHANGELOG_a.10`: I added the definition of the losses in `~/ETHW/hls4ml/hls4ml/utils/config.py`, on method `config_from_keras_model`.

* `CHANGELOG_a.11`: Implementation of the autograd in the main myproject.cpp output file. This is defined in the `//hls-fpga-machine-learning autograd-layer-wrappers` condition in `~/ETHW/hls4ml/writer/vivado_writer.py`. This is basically instantiating the loss layer + backprop layers (the propagation of the gradient from last layer back).

* `CHANGELOG_a.12`: Modified the `~/ETHW/hls4ml/writer/vivado_writer.py`, method `write_build_script`, to make sure we modify the build_lib.sh script to include the `autograd` and `losses` libraries when compiling.

* `CHANGELOG_a.13`: Created a new method `write_autograd_utils` in `~/ETHW/hls4ml/writer/vivado_writer.py`, to make sure we copy the autograd/losses dependencies to the project output dir.

* `CHANGELOG_a.14`: In the `write_hls` method, we need to invoke the `write_autograd_utils` we created in `~/ETHW/hls4ml/writer/vivado_writer.py`. Let's do this after `write_nnet_utils` so the autograd/losses dependencies are copied out.

* `CHANGELOG_a.15`: Modify the `~/ETHW/hls4ml/templates/vivado/firmware/myproject.h` template and add the `hls-fpga-machine-learning insert autograd-headers` line under `#include "defines.h"` so we can add the header files for backprop.

* `CHANGELOG_a.16`: Inside the `vivado_writer`, function `write_project_header`, let's add another check and try to search for  `//hls-fpga-machine-learning insert autograd-headers`. If we find it, then we parse the different types of losses and layers in our design, and add the headers for the required backprop wrappers.

* `CHANGELOG_a.17`: Modified the `//hls-fpga-machine-learning insert header` check statement in `~/ETHW/hls4ml/writer/vivado_writer.py` in the `write_project_header` so we add the IO ports for Loss & ground truth to the header definition.

* `CHANGELOG_a.18`: Modified the `elif '//hls-fpga-machine-learning insert header' in line:` check statement in `~/ETHW/hls4ml/writer/vivado_writer.py` in the `write_bridge` to add the loss and ground truth IO defs.

* `CHANGELOG_a.19`: 

* `CHANGELOG_a.20`: Added `//hls-fpga-machine-learning insert autograd-helpers-include` to the `myproject_test.cpp` template in hls4ml, so we can add the include headers to the testbench automatically generated by hls4ml.

* `CHANGELOG_a.21`: Added the check for `//hls-fpga-machine-learning insert autograd-helpers-include` in `vivado_writer.py`.
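Entries `CHANGELOG_a.1` to `CHANGELOG_a.5` describe passing a single trainable flag into both the per-layer and the global configuration. The idea can be sketched on a plain dictionary as follows. This is an illustration only, not the code in `config.py`: the helper name `add_trainable_flag`, the `'Model'` and `'LayerName'` keys, and the `'Trainable'` key are all assumptions made for this example.

```python
def add_trainable_flag(config, layer_names, trainable=False):
    """Hypothetical helper: copy one trainable flag into a plain config dict."""
    # Global entry for the whole network (compare CHANGELOG_a.5)
    config.setdefault('Model', {})['Trainable'] = trainable
    # One entry per layer (compare CHANGELOG_a.2 to CHANGELOG_a.4)
    layers = config.setdefault('LayerName', {})
    for name in layer_names:
        layers.setdefault(name, {})['Trainable'] = trainable
    return config


config = add_trainable_flag({'Model': {'ReuseFactor': 1}}, ['fc1', 'relu1'], trainable=True)
print(config)
```

Keeping the flag at both levels means later stages, such as the writer checks listed above, can test either the whole model or a single layer without recomputing anything.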