
[![License](https://img.shields.io/badge/License-GPL_3.0-red.svg)](https://opensource.org/license/gpl-3-0)
[![PyPI version](https://badge.fury.io/py/rule4ml.svg)](https://badge.fury.io/py/rule4ml)

# rule4ml: Resource Utilization and Latency Estimation for ML

`rule4ml` is a tool designed for pre-synthesis estimation of FPGA resource utilization and inference latency for machine learning models.

## Installation

`rule4ml` releases are uploaded to the Python Package Index for easy installation via `pip`.

```bash
pip install rule4ml
```

This installs only the [base package](https://github.com/IMPETUS-UdeS/rule4ml/tree/main/rule4ml) and its dependencies for resource and latency prediction. The [data_gen](https://github.com/IMPETUS-UdeS/rule4ml/tree/main/data_gen/) scripts and the [Jupyter notebooks](https://github.com/IMPETUS-UdeS/rule4ml/tree/main/notebooks) are not included; clone the repo if you need them.

The data generation dependencies are listed separately in [data_gen/requirements.txt](https://github.com/IMPETUS-UdeS/rule4ml/tree/main/data_gen/requirements.txt), or can be installed with:

```bash
pip install rule4ml[datagen]
```

## Getting Started

### Tutorial
To get started with `rule4ml`, please refer to the detailed Jupyter Notebook [tutorial](https://github.com/IMPETUS-UdeS/rule4ml/tree/main/notebooks/tutorial.ipynb). This tutorial covers:

- Using pre-trained estimators for resource and latency predictions.
- Generating synthetic datasets.
- Training and testing your own predictors.

### Usage
Here's a quick example of how to use `rule4ml` to estimate resources and latency for a given model:

```python
import keras
from keras.layers import Input, Dense, Activation

from rule4ml.models.wrappers import MultiModelWrapper

# Example of a simple keras Model
input_size = 16
inputs = Input(shape=(input_size,))
x = Dense(32, activation="relu")(inputs)
x = Dense(32, activation="relu")(x)
x = Dense(32, activation="relu")(x)
outputs = Dense(5, activation="softmax")(x)

model_to_predict = keras.Model(inputs=inputs, outputs=outputs, name="Jet Classifier")
model_to_predict.build((None, input_size)) # building keras models is required

# Loading default predictors
estimator = MultiModelWrapper()
estimator.load_default_models()

# MultiModelWrapper predictions are formatted as a pandas DataFrame
prediction_df = estimator.predict(model_to_predict)

# Further formatting can be applied to organize the DataFrame
if not prediction_df.empty:
    # Each row is already unique within the groupby; mean() is only called
    # to convert the DataFrameGroupBy back into a DataFrame
    prediction_df = prediction_df.groupby(
        ["Model", "Board", "Strategy", "Precision", "Reuse Factor", "HLS4ML Version", "Vivado Version"],
        observed=True,
    ).mean()

# Outside of Jupyter notebooks, we recommend saving the DataFrame as HTML for better readability
prediction_df.to_html("keras_example.html")
```

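**keras_example.html** (truncated)

| Model | Board | Strategy | Precision | Reuse Factor | HLS4ML Version | Vivado Version | BRAM | BRAM (%) | DSP | DSP (%) | FF | FF (%) | LUT | LUT (%) | CYCLES | INTERVAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2019.1 | 2.52 | 0.90 | 0.32 | 0.14 | 1265.02 | 1.19 | 3564.90 | 6.70 | 125.77 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2019.2 | 2.47 | 0.88 | 0.48 | 0.22 | 1262.29 | 1.19 | 3380.57 | 6.35 | 115.48 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2020.1 | 2.29 | 0.82 | 0.49 | 0.22 | 1109.34 | 1.04 | 3279.37 | 6.16 | 115.62 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2020.2 | 2.55 | 0.91 | 0.53 | 0.24 | 1490.04 | 1.40 | 3457.23 | 6.50 | 118.07 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2021.1 | 2.31 | 0.83 | 0.44 | 0.20 | 1054.50 | 0.99 | 2915.67 | 5.48 | 118.99 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2021.2 | 2.48 | 0.89 | 0.58 | 0.26 | 1085.17 | 1.02 | 3072.19 | 5.77 | 117.91 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2022.1 | 2.53 | 0.90 | 0.47 | 0.21 | 1301.50 | 1.22 | 3093.67 | 5.82 | 119.36 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2022.2 | 2.43 | 0.87 | 0.57 | 0.26 | 1150.09 | 1.08 | 3032.74 | 5.70 | 119.39 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2023.1 | 2.51 | 0.90 | 0.59 | 0.27 | 1357.55 | 1.28 | 3327.19 | 6.25 | 118.30 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2023.2 | 2.39 | 0.85 | 0.29 | 0.13 | 304.04 | 0.29 | 2689.27 | 5.06 | 108.34 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2024.1 | 2.41 | 0.86 | 0.54 | 0.25 | 1574.28 | 1.48 | 3517.61 | 6.61 | 116.26 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 0.8.1 | 2024.2 | 2.08 | 0.74 | 0.77 | 0.35 | 936.16 | 0.88 | 2780.73 | 5.23 | 110.77 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2019.1 | 2.57 | 0.92 | 1.16 | 0.53 | 1237.20 | 1.16 | 2434.88 | 4.58 | 37.70 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2019.2 | 2.53 | 0.90 | 1.39 | 0.63 | 1273.41 | 1.20 | 2317.88 | 4.36 | 28.73 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2020.1 | 2.35 | 0.84 | 1.42 | 0.65 | 1023.07 | 0.96 | 2275.59 | 4.28 | 28.97 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2020.2 | 2.64 | 0.94 | 1.45 | 0.66 | 1314.61 | 1.24 | 2359.94 | 4.44 | 30.62 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2021.1 | 2.34 | 0.84 | 1.35 | 0.61 | 983.35 | 0.92 | 2025.47 | 3.81 | 31.37 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2021.2 | 2.56 | 0.91 | 1.50 | 0.68 | 1149.12 | 1.08 | 2167.54 | 4.07 | 30.66 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2022.1 | 2.65 | 0.95 | 1.39 | 0.63 | 1104.21 | 1.04 | 2131.50 | 4.01 | 31.74 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2022.2 | 2.47 | 0.88 | 1.49 | 0.68 | 1200.66 | 1.13 | 2120.53 | 3.99 | 31.79 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2023.1 | 2.58 | 0.92 | 1.64 | 0.74 | 1247.67 | 1.17 | 2301.45 | 4.33 | 30.79 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2023.2 | 2.49 | 0.89 | 1.14 | 0.52 | 499.64 | 0.47 | 1795.66 | 3.38 | 25.01 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2024.1 | 2.46 | 0.88 | 1.45 | 0.66 | 1373.96 | 1.29 | 2405.98 | 4.52 | 29.38 | 1.35 |
| Jet Classifier | pynq-z2 | latency | `ap_fixed<2, 1>` | 1 | 1.1.0 | 2024.2 | 2.09 | 0.75 | 1.99 | 0.91 | 1059.89 | 1.00 | 2089.47 | 3.93 | 26.71 | 1.35 |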
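
The percentage and cycle columns in the table can be tied back to physical quantities with a little arithmetic. The following is a minimal sketch and not part of `rule4ml`: the helper names are hypothetical, the `PYNQ_Z2_TOTALS` figures are the commonly cited resource counts for the Zynq-7020 device on the PYNQ-Z2 (an assumption here, with BRAM counted in 18 Kb blocks), and the 10 ns clock period is the one stated under Limitations.

```python
# Assumed totals for the Zynq-7020 (xc7z020) on the PYNQ-Z2 board.
# BRAM is counted in 18 Kb blocks. These figures are NOT read from rule4ml.
PYNQ_Z2_TOTALS = {"BRAM": 280, "DSP": 220, "FF": 106_400, "LUT": 53_200}


def utilization_percent(resource, amount, totals=PYNQ_Z2_TOTALS):
    """Express an absolute resource estimate as a percentage of the board total."""
    return 100.0 * amount / totals[resource]


def latency_ns(cycles, clock_period_ns=10.0):
    """Convert a clock-cycle estimate to nanoseconds for a given clock period."""
    return cycles * clock_period_ns


# Values copied from the first table row (hls4ml 0.8.1, Vivado 2019.1)
row = {"BRAM": 2.52, "DSP": 0.32, "FF": 1265.02, "LUT": 3564.90, "CYCLES": 125.77}

for name in ("BRAM", "FF", "LUT"):
    print(f"{name}: {utilization_percent(name, row[name]):.2f} %")
print(f"latency: {latency_ns(row['CYCLES']):.1f} ns")
```

Because the table values are already rounded to two decimals, recomputing a percentage from them can differ from the printed column in the last digit, most visibly for small numbers such as DSP.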

## Datasets
Training accurate predictors requires large datasets of synthesized neural networks. We used [hls4ml](https://github.com/fastmachinelearning/hls4ml) to synthesize neural networks generated with parameters randomly sampled from predefined ranges (the defaults of the data classes in the code). Our models' training data is publicly available at [https://borealisdata.ca/dataverse/rule4ml](https://borealisdata.ca/dataverse/rule4ml).

Newer predictors were trained on `wa-hls4ml`, a larger dataset that covers more architectures and wider parameter ranges. The dataset and its HLS project files can be found at [https://huggingface.co/datasets/fastmachinelearning/wa-hls4ml](https://huggingface.co/datasets/fastmachinelearning/wa-hls4ml) and [https://huggingface.co/datasets/fastmachinelearning/wa-hls4ml-projects](https://huggingface.co/datasets/fastmachinelearning/wa-hls4ml-projects).

## Limitations
In their current iteration, the predictors can process [Keras](https://keras.io/about/) or [PyTorch](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) models and estimate FPGA resources (**BRAM**, **DSP**, **FF**, **LUT**) and latency (**Clock Cycles**) for various synthesis configurations. However, the models used for training are limited to specific layers: **Dense/Linear**, **ReLU**, **Tanh**, **Sigmoid**, **Softmax**, **BatchNorm**, **Add**, **Concatenate**, and **Dropout**. They are also constrained by synthesis parameters, notably **clock_period** (10 ns) and **io_type** (io_parallel). Inputs outside these configurations may result in inaccurate predictions.
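
Before requesting a prediction, it can help to check whether a model stays within the layer list above. The snippet below is a minimal sketch under stated assumptions, not a `rule4ml` API: `unsupported_layers` is a hypothetical helper, the extra class-name aliases (`BatchNormalization`, `BatchNorm1d`) and the ignored `InputLayer` are assumptions of this sketch, and it compares class names only.

```python
# Layer types named in the Limitations paragraph, plus common Keras/PyTorch
# class-name spellings for them (the aliases are an assumption of this sketch).
SUPPORTED_LAYERS = {
    "Dense", "Linear", "ReLU", "Tanh", "Sigmoid", "Softmax",
    "BatchNorm", "BatchNormalization", "BatchNorm1d",
    "Add", "Concatenate", "Dropout",
}
# Layers that only declare the model input; assumed harmless and skipped here.
IGNORED_LAYERS = {"InputLayer"}


def unsupported_layers(layer_names):
    """Return layer type names outside the supported list (ordered, no duplicates)."""
    flagged = []
    for name in layer_names:
        if name in SUPPORTED_LAYERS or name in IGNORED_LAYERS:
            continue
        if name not in flagged:
            flagged.append(name)
    return flagged


# Hypothetical list of layer class names for a small convolutional model
names = ["InputLayer", "Conv2D", "ReLU", "Flatten", "Dense", "Conv2D", "Softmax"]
print(unsupported_layers(names))
```

For a Keras model, the names could be collected with `[type(layer).__name__ for layer in model.layers]`. A non-empty result does not prevent a prediction, but it signals that the estimate falls outside the configurations the predictors were trained on.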

## License
This project is licensed under the GPL-3.0 License. See the [LICENSE](https://github.com/IMPETUS-UdeS/rule4ml/tree/main/LICENSE) file for details.