https://github.com/Natsu-Akatsuki/RangeNet-TensorRT
Rangenet++ with high-version TensorRT (e.g.8~10), libtorch, CUDA programming.
https://github.com/Natsu-Akatsuki/RangeNet-TensorRT
cuda libtorch semantic-segmentation tensorrt
Last synced: 11 months ago
JSON representation
Rangenet++ with high-version TensorRT (e.g.8~10), libtorch, CUDA programming.
- Host: GitHub
- URL: https://github.com/Natsu-Akatsuki/RangeNet-TensorRT
- Owner: Natsu-Akatsuki
- License: mit
- Created: 2022-02-11T11:42:52.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-12-04T07:58:23.000Z (over 1 year ago)
- Last Synced: 2025-03-29T03:12:45.109Z (about 1 year ago)
- Topics: cuda, libtorch, semantic-segmentation, tensorrt
- Language: C++
- Homepage:
- Size: 5.71 MB
- Stars: 54
- Watchers: 2
- Forks: 10
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# [RangeNet-TensorRT](https://github.com/Natsu-Akatsuki/RangeNet-TensorRT)
🎉 This project has been a pleasure, allowing me to repay technical debt, learn how to locate bugs during model deployment, gain experience with GitHub Actions, and explore CUDA programming. I greatly appreciate the valuable feedback from others that has contributed to improving the project. I hope that this project will be of use to you.
[English](README.md) | [ç®€ä½“ä¸æ–‡](README_cn.md)
## 1. Purpose
1. **Use more newer dependencies and APIs**. Specifically, we deploy the [RangeNet repository](https://github.com/PRBonn/rangenet_lib) in an environment with TensorRT 8+, Ubuntu 20.04+, remove Boost dependency, manage TensorRT objects and GPU memory with smart pointers, and provide ROS demo.
2. Faster Performance. Resolve the issue of reduced segmentation accuracy when using FP16 ([issue#9](https://github.com/PRBonn/rangenet_lib/issues/9)), achieving a significant speed boost without sacrificing accuracy. Preprocess data using CUDA. Perform KNN post-processing with libtorch (
refer to [here](https://github.com/PRBonn/lidar-bonnetal/blob/master/train/tasks/semantic/postproc/KNN.py)).
## 2. Installation
### 2.1 Docker installation
We provide a Docker installation, please see more in [docker/README.md](docker/README.md)
### 2.2 Source installation
Step 1: Download and Extract libtorch
> [!note]
> Using the Torch library from Conda was observed to slow down the post-processing stage from 6 ms to 30 ms.
```bash
$ wget -c https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.10.2%2Bcu113.zip -O libtorch.zip
$ unzip libtorch.zip
```
Step 2: Set up the deep learning environment (install NVIDIA driver, CUDA, TensorRT, cuDNN). The tested configurations are listed below. At least 3000 MB of GPU memory is required.
| Ubuntu | GPU | TensorRT | CUDA | cuDNN | — |
|:------:|:----------------------------------------------------:|:---------:|:---------------:|:----------------:|:------------------:|
| 20.04 | TITAN RTX | 8.2.3 | CUDA 11.4.r11.4 | cuDNN 8.2.4 | :heavy_check_mark: |
| 20.04 | NVIDIA GeForce RTX 3060 | 8.4.1.5 | CUDA 11.3.r11.3 | cuDNN 8.0.5 | :heavy_check_mark: |
| 20.04 | NVIDIA GeForce RTX 3060
NVIDIA GeForce RTX 4070 | 10.6.0.26 | CUDA 11.1.105 | cuDNN 8.0.5.39 | :heavy_check_mark: |
| 20.04 | NVIDIA GeForce RTX 3060
NVIDIA GeForce RTX 4070 | 10.6.0.26 | CUDA 12.4.r12.4 | cuDNN 9.1.0.70-1 | :heavy_check_mark: |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.2.5.1 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | :heavy_check_mark: |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.4.1.5 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | :heavy_check_mark: |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.4.3.1 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | :heavy_check_mark: |
| 22.04 | NVIDIA GeForce RTX 3060 | 8.6.1.6 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | :heavy_check_mark: |
| 22.04 | NVIDIA GeForce RTX 3060 | 10.6.0.26 | CUDA 11.3.r11.3 | cuDNN 8.8.0 | :heavy_check_mark: |
> [!note]
>
> You must choose the appropriate version of CUDA based on your Compute Capability. For example, if your want to use Compute Capability 89, you must choose CUDA 11.8+.
>
> You can see `Compute Capability` in https://developer.nvidia.com/cuda-gpus#compute.
| GPU Hardware Architecture | Compute Capability | Relevant GPUs | Minimum CUDA Version |
|:-------------------------:|:------------------:|:----------------------------------:|:--------------------:|
| Ampere Architecture | 86 | RTX 3060,RTX3070,RTX 3080,RTX 3090 | CUDA 11.1 |
| Ada Lovelace Architecture | 89 | RTX 4090, RTX 4080 | CUDA 11.8 |
> [!note]
>
> You must choose the appropriate version of CUDA based on your nvidia-driver.
| nvidia-driver Version | Maximum CUDA Version |
|:---------------------:|:--------------------:|
| 545 | CUDA 12.3 |
| 550 | CUDA 12.4 |
Add the following environment variables to ~/.bashrc:
```bash
# Example configuration:
# >>> Deep Learning Configuration >>>
# Import CUDA environment
CUDA_PATH=/usr/local/cuda/bin
CUDA_LIB_PATH=/usr/local/cuda/lib64
# Import TensorRT environment
export TENSORRT_DIR=${HOME}/Application/TensorRT-8.4.1.5/
TENSORRT_PATH=${TENSORRT_DIR}/bin
TENSORRT_LIB_PATH=${TENSORRT_DIR}/lib
# Import libtorch environment
export Torch_DIR=${HOME}/Application/libtorch/share/cmake/Torch
export PATH=${PATH}:${CUDA_PATH}:${TENSORRT_PATH}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_LIB_PATH}:${TENSORRT_LIB_PATH}
```
Step 3: (Optional, if ROS components are needed). Please install ROS1 (Noetic) or ROS2 (Humble).
```bash
# Install ROS
$ ...
# Install extra dependency
$ sudo apt install ros-${ROS_DISTRO}-pcl-ros
```
Step 4: Install apt-related and Python packages
```bash
$ sudo apt install build-essential python3-dev python3-pip apt-utils git cmake libboost-all-dev libyaml-cpp-dev libopencv-dev python3-empy libfmt-dev
$ pip install catkin_tools trollius numpy
```
Step 5: Clone the Repository
```bash
$ git clone https://github.com/Natsu-Akatsuki/RangeNet-TensorRT ~/rangenet/src/rangenet/
```
Step 6: Import model files and datasets.
```bash
# Download model files
$ wget -c https://github.com/Natsu-Akatsuki/RangeNet-TensorRT/releases/download/v0.0.0-alpha/model.onnx -O ~/rangenet/src/rangenet/model/model.onnx
```
Download datasets: see [Baidu Cloud](https://pan.baidu.com/s/1iXSWaEfZsfpRps1yvqMOrA?pwd=9394).
Directory Structure
```bash
.
├── model
│  ├── arch_cfg.yaml
│  ├── data_cfg.yaml
│  └── model.onnx
├── data
└── ├── 000000.pcd
├── kitti_2011_09_30_drive_0027_synced
└── kitti_2011_09_30_drive_0027_synced.bag
```
## 3. Usage
> [!note]
>
> The first run may take some time to generate the TensorRT optimized engine.
> [!note]
>
> Since we use set(CMAKE_CUDA_STANDARD 17), a feature introduced in [CMake 3.18](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_STANDARD.html), it requires at least version 3.18. Unfortunately, the default CMake version in Ubuntu 20.04 is 3.16.3. Therefore, we provide a workaround to use a higher version of CMake with minimal effort.
> ```bash
> $ pip3 install --user cmake==3.18
> $ echo 'export PATH=${HOME}/.local/bin:${PATH}' >> ~/.bashrc
:wrench: Usage 1:
Run data in ROS1 or ROS2
```bash
# >>> ROS1 >>>
$ cd ~/rangetnet/
# USE -Wno-dev to suppress PCL WARNING
$ catkin build --cmake-args -Wno-dev
$ source devel/setup.bash
$ roslaunch rangenet_pp ros1_rangenet.launch
$ roslaunch rangenet_pp ros1_bag.launch
# >>> ROS2 >>>
$ cd ~/rangetnet/
$ colcon build --symlink-install
$ source install/setup.bash
$ ros2 launch rangenet_pp ros2_rangenet.launch
$ ros2 launch rangenet_pp ros2_bag.launch
```
:wrench: Usage 2:
Predict single-frame point clouds (PCD format)
> [!note]
> PCD point cloud fields must be xyzi, and the intensity field should be normalized (0-1).
```bash
# Modify the parameters in config/infer.yaml
$ cd ~/rangenet/src/rangenet/
$ mkdir build
$ cd build
# To display inference time: cmake -DPERFORMANCE_LOG=ON .. && make
$ unset ROS_VERSION && cmake -Wno-dev .. && make -j4
$ ./demo
```
| Step | Time |
|:--------------:|:----------:|
| Preprocessing | 1.51363 ms |
| Inference | 21.8513 ms |
| Postprocessing | 4.98176 ms |
## 4. FAQ
:question: Issue 1:
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1:
The ONNX model is incomplete. Please Re-download the model.
:question: Issue 2:
Segmentation fault [Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)] when visualizing single point cloud frames in Ubuntu 22.04 using PCL.
Use PCL library version 1.13.0+. Please provide variable `PCL_DIR` in `cmake/ThirdParty.cmake`. See more in [Here](https://github.com/PointCloudLibrary/pcl/pull/5252).
## Roadmap
- [x] Test ROS1 demo
- [x] Resolve [issue#8](https://github.com/Natsu-Akatsuki/RangeNetTrt8/issues/8) (2023.07.01)
- [x] Add English documentation (2024.11.19)
- [x] Explain why using FP16 leads to precision degradation [See more in [Here](docs/the_reason_for_why_using__FP16_can_cause_accuracy_degradation.md)] (2024.11.28)
- [x] Provide a Docker environment (2024.11.30)
- [ ] Add Pybind11 implementation
- [ ] Resolve non-reproducibility
- [ ] Refactor code to follow coding standards and improve readability