https://github.com/Natsu-Akatsuki/RangeNet-TensorRT

Rangenet++ with high-version TensorRT (e.g.8~10), libtorch, CUDA programming.

Host: GitHub
URL: https://github.com/Natsu-Akatsuki/RangeNet-TensorRT
Owner: Natsu-Akatsuki
License: mit
Created: 2022-02-11T11:42:52.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-12-04T07:58:23.000Z (over 1 year ago)
Last Synced: 2025-03-29T03:12:45.109Z (about 1 year ago)
Topics: cuda, libtorch, semantic-segmentation, tensorrt
Language: C++
Homepage:
Size: 5.71 MB
Stars: 54
Watchers: 2
Forks: 10
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE


# [RangeNet-TensorRT](https://github.com/Natsu-Akatsuki/RangeNet-TensorRT)

🎉 This project has been a pleasure, allowing me to repay technical debt, learn how to locate bugs during model deployment, gain experience with GitHub Actions, and explore CUDA programming. I greatly appreciate the valuable feedback from others that has contributed to improving the project. I hope that this project will be of use to you.



[English](README.md) | [简体中文](README_cn.md)



## 1. Purpose

1. **Use newer dependencies and APIs**. Specifically, we deploy the [RangeNet repository](https://github.com/PRBonn/rangenet_lib) in an environment with TensorRT 8+ and Ubuntu 20.04+, remove the Boost dependency, manage TensorRT objects and GPU memory with smart pointers, and provide a ROS demo.

2. **Faster performance**. We resolve the reduced segmentation accuracy seen when using FP16 ([issue#9](https://github.com/PRBonn/rangenet_lib/issues/9)), which gives a significant speed boost without sacrificing accuracy. Data is preprocessed with CUDA, and KNN post-processing is performed with libtorch (see the [reference implementation](https://github.com/PRBonn/lidar-bonnetal/blob/master/train/tasks/semantic/postproc/KNN.py)).



	



## 2. Installation

### 2.1 Docker installation

A Docker installation is provided; see [docker/README.md](docker/README.md) for details.

### 2.2 Source installation

Step 1: Download and Extract libtorch

> [!note]
> Using the Torch library from Conda was observed to slow down the post-processing stage from 6 ms to 30 ms.

```bash
$ wget -c https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.10.2%2Bcu113.zip -O libtorch.zip
$ unzip libtorch.zip
```

Step 2: Set up the deep learning environment (install NVIDIA driver, CUDA, TensorRT, cuDNN). The tested configurations are listed below. At least 3000 MB of GPU memory is required.

| Ubuntu |                           GPU                           | TensorRT  |      CUDA       |      cuDNN       |       Tested       |
|:------:|:-------------------------------------------------------:|:---------:|:---------------:|:----------------:|:------------------:|
| 20.04  |                        TITAN RTX                        |   8.2.3   | CUDA 11.4.r11.4 |   cuDNN 8.2.4    | :heavy_check_mark: |
| 20.04  |                 NVIDIA GeForce RTX 3060                 |  8.4.1.5  | CUDA 11.3.r11.3 |   cuDNN 8.0.5    | :heavy_check_mark: |
| 20.04  | NVIDIA GeForce RTX 3060<br>NVIDIA GeForce RTX 4070 | 10.6.0.26 |  CUDA 11.1.105  | cuDNN  8.0.5.39  | :heavy_check_mark: |
| 20.04  | NVIDIA GeForce RTX 3060<br>NVIDIA GeForce RTX 4070 | 10.6.0.26 | CUDA 12.4.r12.4 | cuDNN 9.1.0.70-1 | :heavy_check_mark: |
| 22.04  |                 NVIDIA GeForce RTX 3060                 |  8.2.5.1  | CUDA 11.3.r11.3 |   cuDNN 8.8.0    | :heavy_check_mark: |
| 22.04  |                 NVIDIA GeForce RTX 3060                 |  8.4.1.5  | CUDA 11.3.r11.3 |   cuDNN 8.8.0    | :heavy_check_mark: |
| 22.04  |                 NVIDIA GeForce RTX 3060                 |  8.4.3.1  | CUDA 11.3.r11.3 |   cuDNN 8.8.0    | :heavy_check_mark: |
| 22.04  |                 NVIDIA GeForce RTX 3060                 |  8.6.1.6  | CUDA 11.3.r11.3 |   cuDNN 8.8.0    | :heavy_check_mark: |
| 22.04  |                 NVIDIA GeForce RTX 3060                 | 10.6.0.26 | CUDA 11.3.r11.3 |   cuDNN 8.8.0    | :heavy_check_mark: |

> [!note]
> Choose a CUDA version that supports your GPU's Compute Capability. For example, to target Compute Capability 89 you need CUDA 11.8 or later.
>
> You can look up the Compute Capability of your GPU at https://developer.nvidia.com/cuda-gpus#compute.

| GPU Hardware Architecture | Compute Capability |             Relevant GPUs              | Minimum CUDA Version |
|:-------------------------:|:------------------:|:--------------------------------------:|:--------------------:|
|    Ampere Architecture    |         86         | RTX 3060, RTX 3070, RTX 3080, RTX 3090 |      CUDA 11.1       |
| Ada Lovelace Architecture |         89         |           RTX 4090, RTX 4080           |      CUDA 11.8       |
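
The lookup in the table above can be expressed as a small check. The following is only a sketch: the function name `cuda_supports` and the dictionary are illustrative and not part of this repository, and the data covers only the two rows listed in the table.

```python
# Minimum CUDA version per Compute Capability, copied from the table above.
# Only the two listed architectures are covered (assumption: nothing else is known).
MIN_CUDA_FOR_CC = {
    86: (11, 1),  # Ampere
    89: (11, 8),  # Ada Lovelace
}


def cuda_supports(compute_capability, cuda_version):
    """Return True/False for a known Compute Capability, or None if it is not in the table."""
    minimum = MIN_CUDA_FOR_CC.get(compute_capability)
    if minimum is None:
        return None
    # Compare only the major and minor components, e.g. "11.3" -> (11, 3).
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= minimum


print(cuda_supports(89, "11.3"))  # False: Ada Lovelace needs CUDA 11.8 or later
print(cuda_supports(86, "11.3"))  # True
```

A result of `None` means the table gives no information for that Compute Capability, so consult the NVIDIA page linked above.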

> [!note]
> Choose a CUDA version that your installed NVIDIA driver supports.

| nvidia-driver Version | Maximum CUDA Version |
|:---------------------:|:--------------------:|
|          545          |      CUDA 12.3       |
|          550          |      CUDA 12.4       |
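
The driver constraint can be checked the same way. This is a sketch under assumptions: `driver_allows` is a hypothetical helper, the driver version strings in the example are only illustrative inputs, and the mapping contains just the two driver branches listed above.

```python
# Maximum CUDA version per driver branch, copied from the table above.
MAX_CUDA_FOR_DRIVER = {
    545: (12, 3),
    550: (12, 4),
}


def driver_allows(driver_version, cuda_version):
    """Return True/False for a listed driver branch, or None if the branch is not in the table."""
    driver_major = int(driver_version.split(".")[0])
    maximum = MAX_CUDA_FOR_DRIVER.get(driver_major)
    if maximum is None:
        return None
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) <= maximum


print(driver_allows("545.23.08", "12.4"))  # False: the 545 branch tops out at CUDA 12.3
print(driver_allows("550.54.14", "12.4"))  # True
```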

Add the following environment variables to ~/.bashrc:

```bash
# Example configuration:
# >>> Deep Learning Configuration >>>
# Import CUDA environment
CUDA_PATH=/usr/local/cuda/bin
CUDA_LIB_PATH=/usr/local/cuda/lib64

# Import TensorRT environment
export TENSORRT_DIR=${HOME}/Application/TensorRT-8.4.1.5/
TENSORRT_PATH=${TENSORRT_DIR}/bin
TENSORRT_LIB_PATH=${TENSORRT_DIR}/lib

# Import libtorch environment
export Torch_DIR=${HOME}/Application/libtorch/share/cmake/Torch

export PATH=${PATH}:${CUDA_PATH}:${TENSORRT_PATH}
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_LIB_PATH}:${TENSORRT_LIB_PATH}
```
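
After opening a new shell, it may help to confirm that the exported variables are visible before building. The sketch below is not part of the repository; `missing_settings` is a hypothetical helper that only checks the two exported variables from the example configuration and whether the TensorRT `lib` directory appears in `LD_LIBRARY_PATH`.

```python
import os

# Variables exported by the example configuration above.
REQUIRED_VARS = ("TENSORRT_DIR", "Torch_DIR")


def missing_settings(env):
    """Return a list of problems found in an environment mapping (empty list means OK)."""
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    tensorrt_dir = env.get("TENSORRT_DIR", "")
    if tensorrt_dir:
        expected_lib = os.path.normpath(os.path.join(tensorrt_dir, "lib"))
        # normpath collapses the double slash produced by a trailing "/" in TENSORRT_DIR.
        lib_dirs = [
            os.path.normpath(path)
            for path in env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
            if path
        ]
        if expected_lib not in lib_dirs:
            problems.append("LD_LIBRARY_PATH (missing TensorRT lib directory)")
    return problems


if __name__ == "__main__":
    print(missing_settings(os.environ))
```

The check only inspects strings; it does not verify that the directories exist or that the libraries inside them load.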

Step 3 (optional, only if ROS components are needed): Install ROS1 (Noetic) or ROS2 (Humble).

```bash
# Install ROS
$ ...
# Install extra dependency
$ sudo apt install ros-${ROS_DISTRO}-pcl-ros
```

Step 4: Install the required apt and Python packages

```bash
$ sudo apt install build-essential python3-dev python3-pip apt-utils git cmake libboost-all-dev libyaml-cpp-dev libopencv-dev python3-empy libfmt-dev
$ pip install catkin_tools trollius numpy
```

Step 5: Clone the Repository

```bash
$ git clone https://github.com/Natsu-Akatsuki/RangeNet-TensorRT ~/rangenet/src/rangenet/
```

Step 6: Import model files and datasets.

```bash
# Download model files
$ wget -c https://github.com/Natsu-Akatsuki/RangeNet-TensorRT/releases/download/v0.0.0-alpha/model.onnx -O ~/rangenet/src/rangenet/model/model.onnx
```

Download datasets: see [Baidu Cloud](https://pan.baidu.com/s/1iXSWaEfZsfpRps1yvqMOrA?pwd=9394).

Directory structure:

```bash
.
├── model
│   ├── arch_cfg.yaml
│   ├── data_cfg.yaml
│   └── model.onnx
└── data
    ├── 000000.pcd
    ├── kitti_2011_09_30_drive_0027_synced
    └── kitti_2011_09_30_drive_0027_synced.bag
```
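
A quick way to confirm that the layout matches is to check for the individual files. The sketch below is illustrative only: `missing_files` is a hypothetical helper, and the default root assumes the clone location used in Step 5.

```python
from pathlib import Path

# Files taken from the directory structure above (the KITTI entries are left out
# because they are only needed for the ROS bag demo).
EXPECTED_FILES = (
    "model/arch_cfg.yaml",
    "model/data_cfg.yaml",
    "model/model.onnx",
    "data/000000.pcd",
)


def missing_files(root):
    """Return the expected files that are not present under root."""
    root = Path(root).expanduser()
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the repository was cloned to the path used in Step 5.
    for rel in missing_files("~/rangenet/src/rangenet"):
        print("missing:", rel)
```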

## 3. Usage

> [!note]
> The first run may take some time to generate the TensorRT optimized engine.

> [!note]
> This project uses `set(CMAKE_CUDA_STANDARD 17)`, which is supported from [CMake 3.18](https://cmake.org/cmake/help/latest/prop_tgt/CUDA_STANDARD.html) onward, so CMake 3.18 or later is required. The default CMake version in Ubuntu 20.04 is 3.16.3, so the following workaround installs a newer CMake with minimal effort:
>
> ```bash
> $ pip3 install --user cmake==3.18
> $ echo 'export PATH=${HOME}/.local/bin:${PATH}' >> ~/.bashrc
> ```
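
To see whether the workaround is needed, the version reported by `cmake --version` can be compared against 3.18. This is a minimal sketch; `cmake_is_new_enough` is a hypothetical helper that expects the text printed by `cmake --version`, whose first line has the form `cmake version X.Y.Z`.

```python
import re

# Minimum CMake version required by set(CMAKE_CUDA_STANDARD 17).
MIN_CMAKE = (3, 18)


def cmake_is_new_enough(version_output):
    """Parse the output of `cmake --version` and compare it with MIN_CMAKE."""
    match = re.search(r"cmake version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a CMake version in the given text")
    return (int(match.group(1)), int(match.group(2))) >= MIN_CMAKE


print(cmake_is_new_enough("cmake version 3.16.3"))  # False: Ubuntu 20.04 default
print(cmake_is_new_enough("cmake version 3.18.0"))  # True
```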

:wrench: Usage 1: Run data in ROS1 or ROS2

```bash
# >>> ROS1 >>>
$ cd ~/rangenet/
# Use -Wno-dev to suppress PCL warnings
$ catkin build --cmake-args -Wno-dev
$ source devel/setup.bash
$ roslaunch rangenet_pp ros1_rangenet.launch
$ roslaunch rangenet_pp ros1_bag.launch

# >>> ROS2 >>>
$ cd ~/rangenet/
$ colcon build --symlink-install
$ source install/setup.bash
$ ros2 launch rangenet_pp ros2_rangenet.launch
$ ros2 launch rangenet_pp ros2_bag.launch
```

:wrench: Usage 2: Predict single-frame point clouds (PCD format)

> [!note]
> PCD point cloud fields must be xyzi, and the intensity field should be normalized (0-1).
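
If your sensor reports raw intensities outside the 0-1 range, they need to be rescaled before the point cloud is saved. The sketch below assumes a raw range of 0-255, which is only an example; check your sensor's documentation for the actual range. `normalize_intensity` is a hypothetical helper, not a function from this repository.

```python
def normalize_intensity(points, max_intensity=255.0):
    """Scale the intensity of (x, y, z, intensity) tuples into the range [0, 1].

    max_intensity is the largest raw value the sensor can report
    (255.0 is an assumed example, not a property of this project).
    """
    if max_intensity <= 0:
        raise ValueError("max_intensity must be positive")
    normalized = []
    for x, y, z, intensity in points:
        # Clamp so that out-of-range readings still end up inside [0, 1].
        scaled = min(max(intensity / max_intensity, 0.0), 1.0)
        normalized.append((x, y, z, scaled))
    return normalized


raw_points = [(1.0, 2.0, 0.5, 0.0), (3.0, -1.0, 0.2, 255.0)]
print(normalize_intensity(raw_points))
# [(1.0, 2.0, 0.5, 0.0), (3.0, -1.0, 0.2, 1.0)]
```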

```bash
# Modify the parameters in config/infer.yaml
$ cd ~/rangenet/src/rangenet/
$ mkdir build
$ cd build
# To display inference time: cmake -DPERFORMANCE_LOG=ON .. && make
$ unset ROS_VERSION && cmake -Wno-dev .. && make -j4
$ ./demo
```

|      Step      |    Time    |
|:--------------:|:----------:|
| Preprocessing  | 1.51363 ms |
|   Inference    | 21.8513 ms |
| Postprocessing | 4.98176 ms |
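
For a rough sense of end-to-end throughput, the stage times above can be summed. This sketch assumes the three stages run one after another with no other overhead, which the table itself does not state, so treat the result as an estimate.

```python
# Per-stage timings in milliseconds, copied from the table above.
STAGE_MS = {
    "Preprocessing": 1.51363,
    "Inference": 21.8513,
    "Postprocessing": 4.98176,
}

# Assumption: stages execute sequentially, so the frame time is their sum.
total_ms = sum(STAGE_MS.values())
frames_per_second = 1000.0 / total_ms

print(f"total: {total_ms:.2f} ms, about {frames_per_second:.1f} frames/s")
# total: 28.35 ms, about 35.3 frames/s
```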

## 4. FAQ

 

:question: Issue 1: `[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1:`

The ONNX model file is incomplete. Please re-download the model.

 

:question: Issue 2: Segmentation fault (`Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)`) when visualizing single point cloud frames on Ubuntu 22.04 using PCL.

Use PCL 1.13.0 or later and set the `PCL_DIR` variable in `cmake/ThirdParty.cmake`. See [this PCL pull request](https://github.com/PointCloudLibrary/pcl/pull/5252) for details.

## Roadmap

- [x] Test ROS1 demo

- [x] Resolve [issue#8](https://github.com/Natsu-Akatsuki/RangeNetTrt8/issues/8) (2023.07.01)

- [x] Add English documentation (2024.11.19)

- [x] Explain why using FP16 leads to precision degradation (see [the explanation](docs/the_reason_for_why_using__FP16_can_cause_accuracy_degradation.md)) (2024.11.28)

- [x] Provide a Docker environment (2024.11.30)

- [ ] Add Pybind11 implementation

- [ ] Resolve non-reproducibility

- [ ] Refactor code to follow coding standards and improve readability
