https://github.com/daschr/cuda_firewall
Implementing a Firewall using dpdk and CUDA
- Host: GitHub
- URL: https://github.com/daschr/cuda_firewall
- Owner: daschr
- Created: 2021-10-04T08:31:33.000Z (over 4 years ago)
- Default Branch: async
- Last Pushed: 2022-06-07T12:07:45.000Z (about 4 years ago)
- Last Synced: 2025-03-24T16:24:40.299Z (about 1 year ago)
- Topics: cuda, dpdk, firewall
- Language: C
- Homepage:
- Size: 6.11 MB
- Stars: 10
- Watchers: 1
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Thesis - *Offloading network packet classification to GPUs using CUDA and dpdk*
See [this PDF](https://github.com/daschr/cuda_firewall/blob/e4e9b63af005667067a52d3d302dcd39398bfcf8/thesis.pdf).
# cuda_firewall
Implementing a Firewall using dpdk and CUDA
# current stats

## Line rate*
| line rate | 100 Mbit/s | 500 Mbit/s | 1 Gbit/s | 5 Gbit/s | 10 Gbit/s | 20 Gbit/s | 40 Gbit/s |
|-----------|:----------:|:----------:|:--------:|:--------:|:---------:|:---------:|:---------:|
| |**reached**|**reached**|**reached**|**reached**|**reached**|*pending*|*pending*|

*tested using iperf3 and two Mellanox ConnectX-3 NICs (40GigE)
## Packet rate**
| packet rate | with tap forward | without tap forward |
|-----------|:----------:|:-----------:|
| |~2.8 Mpps|~12.5 Mpps|

**tested with pktgen-dpdk, using the asynchronous execution model (*async* branch) and two Mellanox ConnectX-3 NICs (40GigE)
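
To relate the packet-rate figures to the line-rate table, the following is a small sketch of the usual Ethernet arithmetic. The README does not state which frame size was used with pktgen-dpdk, so the 64-byte minimum frame in the usage note is an assumption, and the function names are made up for this example.

```c
#include <stdint.h>

/* Bytes that occupy the wire per frame but are not part of the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Bits per second occupied on the wire by `pps` frames of `frame_bytes` each.
 * `frame_bytes` includes the Ethernet header and FCS (64 for a minimum frame). */
static uint64_t wire_bits_per_second(uint64_t pps, uint32_t frame_bytes)
{
    return pps * (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
}

/* Maximum number of frames per second a link of `link_bps` can carry
 * at the given frame size (rounded down). */
static uint64_t max_packets_per_second(uint64_t link_bps, uint32_t frame_bytes)
{
    return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u);
}
```

Under the 64-byte assumption, `wire_bits_per_second(12500000, 64)` gives 8,400,000,000, so ~12.5 Mpps would correspond to roughly 8.4 Gbit/s on the wire, and `max_packets_per_second(10000000000, 64)` gives 14,880,952, the familiar 14.88 Mpps ceiling of a 10 Gbit/s link.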
# current progress
- [x] working bitvector search using CUDA
- [x] make use of dpdk table api
- [x] simple 5 tuple rule syntax with DROP/ACCEPT actions
- [x] l2 polling on trunk port and l2 forward to the corresponding tap iface, if the lookup succeeds and the highest-priority matching rule has an ACCEPT action
- [x] simple l2 forward of incoming packet from tap to trunk port
- [ ] switch from tap to kni
- [ ] add better stats collection to firewall
- [x] improving speed of bitvector search
- [ ] misc. refactoring
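
The bitvector search and the DROP/ACCEPT decision listed above can be illustrated with a CPU-only sketch of the general bit-vector classification idea. This is not the repository's CUDA kernel: the range-based rule representation, the 64-rule limit (one `uint64_t` per bitvector), the convention that a lower rule index means higher priority, and all names (`struct rule`, `classifier_build`, `classify`) are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_FIELDS 5   /* src IP, dst IP, src port, dst port, protocol */
#define MAX_RULES  64  /* one bit per rule in a uint64_t bitvector */

enum action { ACTION_DROP = 0, ACTION_ACCEPT = 1 };

/* A rule matches if every field value lies in its inclusive range [lo, hi]. */
struct rule {
    uint32_t lo[NUM_FIELDS];
    uint32_t hi[NUM_FIELDS];
    enum action action;
};

/* Per field: sorted interval start points and, for each interval,
 * the bitvector of rules whose range covers that interval. */
struct field_index {
    uint64_t start[2 * MAX_RULES];
    uint64_t match[2 * MAX_RULES];
    size_t n;
};

struct classifier {
    struct field_index field[NUM_FIELDS];
    const struct rule *rules;
    size_t num_rules;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Precompute the per-field interval bitvectors. Returns 0, or -1 if too many rules. */
static int classifier_build(struct classifier *c, const struct rule *rules, size_t num_rules)
{
    if (num_rules > MAX_RULES)
        return -1;
    c->rules = rules;
    c->num_rules = num_rules;
    for (int f = 0; f < NUM_FIELDS; f++) {
        struct field_index *fi = &c->field[f];
        size_t n = 0, u = 0;
        /* Every range start and every (range end + 1) begins a new interval. */
        for (size_t r = 0; r < num_rules; r++) {
            fi->start[n++] = rules[r].lo[f];
            fi->start[n++] = (uint64_t)rules[r].hi[f] + 1;
        }
        qsort(fi->start, n, sizeof(fi->start[0]), cmp_u64);
        for (size_t i = 0; i < n; i++)      /* drop duplicate start points */
            if (u == 0 || fi->start[u - 1] != fi->start[i])
                fi->start[u++] = fi->start[i];
        fi->n = u;
        for (size_t i = 0; i < u; i++) {    /* which rules cover interval i? */
            fi->match[i] = 0;
            for (size_t r = 0; r < num_rules; r++)
                if (rules[r].lo[f] <= fi->start[i] && fi->start[i] <= rules[r].hi[f])
                    fi->match[i] |= UINT64_C(1) << r;
        }
    }
    return 0;
}

/* Binary search for the interval containing v; returns its rule bitvector. */
static uint64_t field_lookup(const struct field_index *fi, uint32_t v)
{
    size_t lo = 0, hi = fi->n;
    while (lo < hi) {                       /* find first start point > v */
        size_t mid = lo + (hi - lo) / 2;
        if (fi->start[mid] <= v) lo = mid + 1; else hi = mid;
    }
    return lo == 0 ? 0 : fi->match[lo - 1];
}

/* AND the five per-field bitvectors; the lowest set bit is the winning rule. */
static enum action classify(const struct classifier *c, const uint32_t pkt[NUM_FIELDS],
                            enum action default_action)
{
    uint64_t bv = ~UINT64_C(0);
    for (int f = 0; f < NUM_FIELDS; f++)
        bv &= field_lookup(&c->field[f], pkt[f]);
    if (bv == 0)
        return default_action;              /* no rule matches all five fields */
    size_t r = 0;
    while (!((bv >> r) & 1))
        r++;
    return c->rules[r].action;
}
```

The per-field lookups are independent of each other and each packet is independent of the others, which is the property that makes this kind of search a natural candidate for parallel execution on a GPU.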
# settings
* use `isolcpus` to isolate at least two adjacent logical cores
* force device-managed flow steering, e.g. for Mellanox ConnectX-3: `mlx4_core.log_num_mgm_entry_size=-1`
* example: `GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=2,3 mlx4_core.log_num_mgm_entry_size=-1"`
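
After rebooting with these settings, one way to confirm that the isolation took effect is to read the CPU list the kernel exposes in `/sys/devices/system/cpu/isolated`. The sketch below parses such a list (for example `2,3` or `2-3`) and checks for two numerically consecutive CPU ids. Reading "adjacent" as "consecutive ids" is an assumption, and the helper names are hypothetical and not part of this project.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_CPUS 1024

/* Parses a kernel-style CPU list such as "2,3" or "2-3,8-11".
 * Returns 1 if it contains two consecutive CPU ids, 0 if not, -1 on a parse error. */
static int cpulist_has_adjacent_pair(const char *cpulist)
{
    unsigned char isolated[MAX_CPUS] = {0};
    const char *p = cpulist;

    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p)
            return -1;                  /* no number where one was expected */
        long last = first;
        if (*end == '-') {              /* range entry "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p)
                return -1;
        }
        if (first < 0 || last < first || last >= MAX_CPUS)
            return -1;
        for (long c = first; c <= last; c++)
            isolated[c] = 1;
        p = end;
        if (*p == ',')
            p++;
    }
    for (int c = 0; c + 1 < MAX_CPUS; c++)
        if (isolated[c] && isolated[c + 1])
            return 1;
    return 0;
}

/* Reads the first line of `path` and applies the check above; -1 if unreadable. */
static int check_isolated_cpus(const char *path)
{
    char line[4096];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof(line), f);
    fclose(f);
    return ok ? cpulist_has_adjacent_pair(line) : 0;   /* empty file: nothing isolated */
}
```

With the example kernel command line above, `check_isolated_cpus("/sys/devices/system/cpu/isolated")` would be expected to return 1, since the file should then list CPUs 2 and 3.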
# usage
* build dpdk (>=21.08)
* `make all`
* run:
1. `sudo ./firewall -l0-1 --vdev=net_tap0,iface=fw0 rules.txt`
2. `ip a add dev fw0`
3. on second host: `ip a add `
4. now test