https://github.com/icarogabryel/cnn-accelerator
CNN accelerator using radix-4 Booth's algorithm, described in VHDL. It multiplies a 32-bit integer by a 7-bit constant from a 3x3 kernel and accumulates the results.
accelerator booths-algorithm cnn compressor computer-architecture computer-organization hardware hardware-acceleration hardware-designs hdl ia integrated-circuits multiplier
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/icarogabryel/cnn-accelerator
- Owner: icarogabryel
- License: gpl-3.0
- Created: 2025-03-23T01:01:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-04T20:22:53.000Z (12 months ago)
- Last Synced: 2025-07-04T20:34:30.779Z (12 months ago)
- Topics: accelerator, booths-algorithm, cnn, compressor, computer-architecture, computer-organization, hardware, hardware-acceleration, hardware-designs, hdl, ia, integrated-circuits, multiplier
- Language: VHDL
- Homepage:
- Size: 249 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# CNN Accelerator
This repository contains the hardware design for a Convolutional Neural Network (CNN) accelerator described in VHDL. It multiplies a 32-bit integer by a 7-bit constant from a 3x3 kernel and accumulates the results.
## The multiplier
The multiplier uses a parallel implementation of the radix-4 Booth's algorithm. The 7-bit constant acts as the multiplier operand, so only 4 partial products are generated. The architecture was originally meant to use a Wallace tree to sum the partial products. However, because that tree would have only two levels, a 4-way adder built around a 4:2 compressor was chosen instead to reduce the size of the circuit.
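As a worked check of why four partial products suffice, assume the 7-bit constant `y` is a signed two's-complement value (an assumption; the text does not state the encoding) and call the 32-bit operand `x`. Sign-extending `y` to 8 bits and appending a zero below bit 0 yields four overlapping 3-bit blocks, each contributing one radix-4 digit `d_i`:

$$
\begin{aligned}
y &= -2^{7} y_7 + \sum_{j=0}^{6} 2^{j} y_j, \qquad y_7 = y_6,\; y_{-1} = 0 \\
  &= \sum_{i=0}^{3} 4^{i} d_i, \qquad d_i = -2\,y_{2i+1} + y_{2i} + y_{2i-1} \in \{-2,-1,0,1,2\} \\
x \cdot y &= \sum_{i=0}^{3} 4^{i}\,(d_i \cdot x)
\end{aligned}
$$

For example, `y = 45` is `0101101` in 7 bits. Its blocks, from least to most significant, are `010`, `110`, `101` and `001`, giving the digits +1, -1, -1 and +1, and indeed 1 - 4 - 16 + 64 = 45.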
Figure 1
As seen in Figure 1, the multiplier has 4 components that generate the partial products and a 4-way adder that sums them. Inside each partial product generator, Booth's algorithm is implemented as shown in Figure 2. A multiplexer selects the value of the partial product based on a 3-bit block taken from the multiplier operand.
Figure 2
The partial products are generated according to the following radix-4 table, where `mtpcd` is the multiplicand:
| Operation | Block |
|------------|-------|
| mtpcd * 0 | 000 |
| mtpcd * 1 | 001 |
| mtpcd * 1 | 010 |
| mtpcd * 2 | 011 |
| mtpcd * -2 | 100 |
| mtpcd * -1 | 101 |
| mtpcd * -1 | 110 |
| mtpcd * 0 | 111 |
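
The table can be read as a five-way multiplexer. The following is a minimal sketch of such a selector, assuming a signed 32-bit multiplicand. The entity name `booth_pp_sketch` and the port names `blk` and `pp` are hypothetical and are not taken from the repository. In this sketch, `mtpcd * -2` wraps around for the most negative multiplicand, because 2^32 does not fit in 33 signed bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: names are illustrative, not the repository's.
entity booth_pp_sketch is
    port (
        mtpcd : in  signed(31 downto 0);           -- 32-bit multiplicand (assumed signed)
        blk   : in  std_logic_vector(2 downto 0);  -- 3-bit Booth block
        pp    : out signed(32 downto 0)            -- 33-bit partial product
    );
end entity booth_pp_sketch;

architecture rtl of booth_pp_sketch is
    signal x1 : signed(32 downto 0);  -- mtpcd * 1, sign-extended by one bit
    signal x2 : signed(32 downto 0);  -- mtpcd * 2
begin
    x1 <= resize(mtpcd, 33);
    x2 <= shift_left(resize(mtpcd, 33), 1);

    -- Selection follows the radix-4 table row by row.
    with blk select pp <=
        x1              when "001" | "010",
        x2              when "011",
        -x2             when "100",
        -x1             when "101" | "110",
        (others => '0') when others;  -- "000", "111" and metavalues
end architecture rtl;
```

One extra bit over the multiplicand width is enough here because the largest magnitude selected is `mtpcd * 2`.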
The multiplexer output is only 33 bits wide, because sign extension is needed only when the partial products are summed. The left shift that aligns each partial product, as Booth's algorithm requires, is applied together with the sign extension.
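The extension and alignment step might look like the sketch below. It assumes a 39-bit adder input (matching the 39 compressor positions in the adder) and a shift of two bit positions per partial product index, as radix-4 requires. The entity `pp_align_sketch`, its generic `INDEX` and its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: names are illustrative, not the repository's.
entity pp_align_sketch is
    generic (
        INDEX : natural := 0  -- partial product index, 0 to 3
    );
    port (
        pp      : in  signed(32 downto 0);  -- 33-bit partial product
        aligned : out signed(38 downto 0)   -- extended and shifted adder input
    );
end entity pp_align_sketch;

architecture rtl of pp_align_sketch is
begin
    -- resize sign-extends a signed value; the shift weights it by 4**INDEX.
    aligned <= shift_left(resize(pp, 39), 2 * INDEX);
end architecture rtl;
```

With `INDEX = 3` the shift is 6 bits, so the 33-bit partial product exactly fills the 39-bit word.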
The 4-way adder is implemented as a chain of 4:2 compressors, one per bit position:
```vhdl
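-- One 4:2 compressor per bit position (bits 0 to 38).
compressor_gen : for i in 0 to 38 generate
    -- Carry into bit i: OR of the two carry outputs routed to this position.
    -- Index 0 of both buses is expected to be driven outside this loop.
    p_carry(i) <= carry_bus(i) or c_out_bus(i);

    compressor_inst : compressor
        port map(
            b0    => a(i),  -- four operand bits of equal weight
            b1    => b(i),
            b2    => c(i),
            b3    => d(i),
            c_in  => p_carry(i),
            c_out => c_out_bus(i + 1),  -- both carry outputs feed bit i + 1
            carry => carry_bus(i + 1),
            sum   => sum(i)
        );
end generate compressor_gen;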
```
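
The internals of the `compressor` component are not shown above. For reference, a textbook 4:2 compressor cell can be built from two full adders, as in the sketch below. The port names mirror the port map, but the entity name `compressor_sketch` and the architecture are assumptions, so the repository's cell may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical textbook cell; not taken from the repository.
entity compressor_sketch is
    port (
        b0, b1, b2, b3 : in  std_logic;
        c_in           : in  std_logic;
        c_out          : out std_logic;
        carry          : out std_logic;
        sum            : out std_logic
    );
end entity compressor_sketch;

architecture rtl of compressor_sketch is
    signal s1 : std_logic;  -- sum bit of the first full adder
begin
    -- First full adder: b0 + b1 + b2 = s1 + 2*c_out
    s1    <= b0 xor b1 xor b2;
    c_out <= (b0 and b1) or (b0 and b2) or (b1 and b2);

    -- Second full adder: s1 + b3 + c_in = sum + 2*carry
    sum   <= s1 xor b3 xor c_in;
    carry <= (s1 and b3) or (s1 and c_in) or (b3 and c_in);
end architecture rtl;
```

In this construction `b0 + b1 + b2 + b3 + c_in = sum + 2 * (carry + c_out)`, and `c_out` does not depend on `c_in`, which keeps a carry from rippling through more than one cell.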