
[![GitHub](https://img.shields.io/github/license/fabio-sim/DeDoDe-ONNX-TensorRT)](/LICENSE)
[![ONNX](https://img.shields.io/badge/ONNX-grey)](https://onnx.ai/)
[![TensorRT](https://img.shields.io/badge/TensorRT-76B900)](https://developer.nvidia.com/tensorrt)
[![GitHub Repo stars](https://img.shields.io/github/stars/fabio-sim/DeDoDe-ONNX-TensorRT)](https://github.com/fabio-sim/DeDoDe-ONNX-TensorRT/stargazers)
[![GitHub all releases](https://img.shields.io/github/downloads/fabio-sim/DeDoDe-ONNX-TensorRT/total)](https://github.com/fabio-sim/DeDoDe-ONNX-TensorRT/releases)

# DeDoDe-ONNX-TensorRT
Open Neural Network Exchange (ONNX) compatible implementation of [DeDoDe 🎶 Detect, Don't Describe - Describe, Don't Detect, for Local Feature Matching](https://github.com/Parskatt/DeDoDe). Supports TensorRT 🚀.

*Figure: DeDoDe method overview.* The DeDoDe detector learns to detect 3D consistent repeatable keypoints, which the DeDoDe descriptor learns to match. The result is a powerful decoupled local feature matcher.
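
To make the decoupled design concrete, here is a schematic sketch (the function names are illustrative, not this repo's API; DeDoDe's own matcher uses a dual-softmax formulation, while mutual nearest neighbours is used below for simplicity):

```python
import torch

def match_decoupled(detector, descriptor, image_A, image_B, num_keypoints=1024):
    # Detection and description are independent models.
    kpts_A = detector(image_A, num_keypoints)  # (N, 2) keypoint coordinates
    kpts_B = detector(image_B, num_keypoints)
    desc_A = descriptor(image_A, kpts_A)       # (N, D) descriptors
    desc_B = descriptor(image_B, kpts_B)

    # Matching is a separate, third step: mutual nearest neighbours
    # on the cosine similarity between the two descriptor sets.
    sim = torch.nn.functional.normalize(desc_A, dim=1) @ torch.nn.functional.normalize(desc_B, dim=1).T
    nn_AB = sim.argmax(dim=1)  # best match in B for each keypoint in A
    nn_BA = sim.argmax(dim=0)  # best match in A for each keypoint in B
    mutual = nn_BA[nn_AB] == torch.arange(kpts_A.shape[0])
    return kpts_A[mutual], kpts_B[nn_AB[mutual]]
```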

*Figure: latency comparison.* DeDoDe ONNX TensorRT provides a 2x speedup over PyTorch.

## 🔥 ONNX Export

Prior to exporting the ONNX models, please install the [requirements](/requirements.txt).

To convert the DeDoDe models to ONNX, run [`export.py`](/export.py). We provide two types of ONNX exports: individual standalone models, and a combined end-to-end pipeline (recommended for convenience), which is enabled via the `--end2end` flag.

Export example:

```bash
python export.py \
    --img_size 256 256 \
    --end2end \
    --dynamic_img_size --dynamic_batch \
    --fp16
```
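
After export, the resulting model can be sanity-checked with the `onnx` package (a minimal sketch; the file name assumes the end-to-end export from the example above):

```python
import onnx

# Load the exported model and validate its graph structure.
model = onnx.load("weights/dedode_end2end_1024.onnx")
onnx.checker.check_model(model)

# Inspect the declared inputs/outputs and their (possibly dynamic) shapes.
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)
```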

If you would like to try out inference right away, you can download ONNX models that have already been exported [here](https://github.com/fabio-sim/DeDoDe-ONNX-TensorRT/releases) or run `./weights/download.sh`.

## ⚡ ONNX Inference

With the ONNX models in hand, you can perform inference in Python using ONNX Runtime (see [requirements-onnx.txt](/requirements-onnx.txt)).

The DeDoDe inference pipeline has been encapsulated into a runner class:

```python
from onnx_runner import DeDoDeRunner

images = DeDoDeRunner.preprocess(image_array)
# images.shape == (2B, 3, H, W)

# Create ONNXRuntime runner
runner = DeDoDeRunner(
    end2end_path="weights/dedode_end2end_1024.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    # or: ["TensorrtExecutionProvider", ...] for TensorRT inference
)

# Run inference
matches_A, matches_B, batch_ids = runner.run(images)

matches_A = DeDoDeRunner.postprocess(matches_A, H_A, W_A)
matches_B = DeDoDeRunner.postprocess(matches_B, H_B, W_B)
```
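
For example, `image_array` can be built by stacking the image pair along the batch axis (a sketch assuming OpenCV-loaded images; that `preprocess` accepts stacked HWC RGB arrays is an assumption):

```python
import cv2
import numpy as np

# Hypothetical input preparation: the end-to-end model processes images
# A and B interleaved along the batch axis, hence the (2B, ...) shape.
img_A = cv2.cvtColor(cv2.imread("assets/im_A.jpg"), cv2.COLOR_BGR2RGB)
img_B = cv2.cvtColor(cv2.imread("assets/im_B.jpg"), cv2.COLOR_BGR2RGB)
H_A, W_A = img_A.shape[:2]  # original sizes, used later by postprocess
H_B, W_B = img_B.shape[:2]

# Resize both images to a common inference resolution.
img_A = cv2.resize(img_A, (256, 256))
img_B = cv2.resize(img_B, (256, 256))
image_array = np.stack([img_A, img_B])  # (2, 256, 256, 3)
```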
Alternatively, you can run [`infer.py`](/infer.py).

Inference example:

```bash
python infer.py \
    --img_paths assets/im_A.jpg assets/im_B.jpg \
    --img_size 256 256 \
    --end2end \
    --end2end_path weights/dedode_end2end_1024_fp16.onnx \
    --fp16 \
    --viz
```

## 🚀 TensorRT Support

Among the supported backends, TensorRT offers the lowest latency and the greatest memory efficiency (see the comparison below).

TensorRT inference is supported for the end-to-end model via the TensorRT Execution Provider in ONNXRuntime. Please follow the [official documentation](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) to install TensorRT. The exported ONNX models must undergo [shape inference](/tools/symbolic_shape_infer.py) for compatibility with TensorRT.

TensorRT example:

```bash
python tools/symbolic_shape_infer.py \
    --input weights/dedode_end2end_1024.onnx \
    --output weights/dedode_end2end_1024_trt.onnx \
    --auto_merge

CUDA_MODULE_LOADING=LAZY python infer.py \
    --img_paths assets/DSC_0410.JPG assets/DSC_0411.JPG \
    --img_size 256 256 \
    --end2end \
    --end2end_path weights/dedode_end2end_1024_trt.onnx \
    --trt \
    --viz
```

The first run will take longer because TensorRT needs to build the `.engine` and `.profile` files; subsequent runs use the cached files. Only static input shapes are supported, and TensorRT will rebuild the cache whenever it encounters a different input shape.
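
If you use ONNX Runtime directly, the engine cache can be made persistent through TensorRT Execution Provider options (a minimal sketch using standard ONNX Runtime provider options; whether `DeDoDeRunner` forwards such options is not assumed here):

```python
import onnxruntime as ort

# Enable TensorRT engine caching so subsequent runs reuse the built engine.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": "weights/trt_cache",  # where .engine/.profile files go
        "trt_fp16_enable": True,  # optional: build an FP16 engine
    }),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession(
    "weights/dedode_end2end_1024_trt.onnx", providers=providers
)
```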

## ⏱️ Inference Time Comparison

The inference times of the end-to-end DeDoDe pipelines are shown below.

Latency (ms) on an RTX 4080 12GB:

| # Keypoints   | 1024   | 2048   | 3840  | 4096   | 8192   |
|---------------|--------|--------|-------|--------|--------|
| PyTorch       | 169.72 | 170.42 | N/A   | 176.18 | 189.53 |
| PyTorch-MP    | 79.42  | 80.09  | N/A   | 83.8   | 96.93  |
| ONNX          | 170.84 | 171.83 | N/A   | 180.18 | 203.37 |
| TensorRT      | 78.12  | 79.59  | 94.88 | N/A    | N/A    |
| TensorRT-FP16 | 33.9   | 35.45  | 42.35 | N/A    | N/A    |

**Evaluation details:** Only the latency of the end-to-end DeDoDe pipeline is reported; the time taken for image preprocessing, postprocessing, copying data between the host and device, and finding inliers (e.g., CONSAC/MAGSAC) is not measured. The inference time is defined as the median over all samples in the MegaDepth test dataset. We use the data provided by LoFTR, a total of 403 image pairs.


Each image is resized to `512x512` before being fed into the pipeline. The latency of the DeDoDe pipeline is then measured for different values of the detector's `num_keypoints` parameter: 1024, 2048, 3840, 4096, and 8192. Note that TensorRT has a hard limit of 3840 keypoints.

For reproducibility, the evaluation script [`eval.py`](/eval.py) is provided.
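
As a reference for what `eval.py` measures, here is a simplified sketch of a median-latency loop (`runner` and the preprocessed `image_pairs` are assumed to be set up as in the inference section; this is not the actual evaluation script):

```python
import time
import numpy as np

# Time only the end-to-end pipeline call, then report the median.
latencies = []
for images in image_pairs:  # preprocessed (2, 3, H, W) batches
    start = time.perf_counter()
    runner.run(images)
    latencies.append((time.perf_counter() - start) * 1000)  # ms

print(f"Median latency: {np.median(latencies):.2f} ms")
```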

*Figure: latency comparison across the measured configurations.*

## Credits
If you use any ideas from the papers or code in this repo, please consider citing the authors of [DeDoDe](https://arxiv.org/abs/2308.08479). Lastly, if the ONNX or TensorRT versions helped you in any way, please also consider starring this repository.

```txt
@article{edstedt2023dedode,
  title={DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local Feature Matching},
  author={Johan Edstedt and Georg Bökman and Mårten Wadenbäck and Michael Felsberg},
  year={2023},
  eprint={2308.08479},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```