https://github.com/saronic-technologies/libinfer
https://github.com/saronic-technologies/libinfer
Last synced: 9 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/saronic-technologies/libinfer
- Owner: saronic-technologies
- License: mpl-2.0
- Created: 2024-02-15T14:06:01.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-09-04T17:12:58.000Z (9 months ago)
- Last Synced: 2025-09-04T19:16:16.607Z (9 months ago)
- Language: C++
- Size: 11.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# `libinfer`
This library provides a simple Rust interface to a TensorRT engine using [cxx](https://cxx.rs/)
## Overview
`libinfer` allows for seamless integration of TensorRT models into Rust applications with minimal overhead. The library handles the complex C++ interaction with TensorRT while exposing a simple, idiomatic Rust API.
## Installation
To use this library, you'll need:
- CUDA and TensorRT installed on your system
- Environment variables set properly:
- `TENSORRT_LIBRARIES`: Path to TensorRT libraries
- `CUDA_LIBRARIES`: Path to CUDA libraries
- `CUDA_INCLUDE_DIRS`: Path to CUDA include directories
Add to your `Cargo.toml`:
```toml
[dependencies]
libinfer = "0.0.3"
```
## Usage
The goal of the API is to keep as much processing in Rust land as possible. Here is a sample usage:
```rust
let options = Options {
path: "yolov8n.engine".into(),
device_index: 0,
};
let mut engine = Engine::new(&options).unwrap();
// Get input dimensions of the engine as [Channels, Height, Width]
let dims = engine.get_input_dims();
// Construct a dummy input (uint8 or float32 depending on model)
let input_size = dims.iter().fold(1, |acc, &e| acc * e as usize);
let input = InputTensor {
name: "input".to_string();
data: vec![0u8; input_size];
// Run inference
let output = engine.pin_mut().infer(&input).unwrap();
// Postprocess the output according to your model's output format
// ...
```
This library is intended to be used with pre-built TensorRT engines created by the Python API or the `trtexec` CLI tool for the target device.
## Features
- Support for both fixed and dynamic batch sizes
- Automatic handling of different input data types (UINT8, FP32)
- Direct access to model dimensions and parameters
- Error handling via Rust's `Result` type
- Logging integration with `RUST_LOG` environment variable
## Examples
Check the `examples/` directory for working examples:
- `basic.rs`: Simple inference example
- `benchmark.rs`: Performance benchmarking with various batch sizes
- `dynamic.rs`: Working with dynamic batch sizes
- `functional_test.rs`: Testing correctness of model outputs
Run an example with:
```
cargo run --example basic -- --path /path/to/model.engine
```
### Example Requirements
- You must provide your own TensorRT engine files (.engine)
- For the functional_test example, you'll need input.bin and features.txt files
- To create engine files, use NVIDIA's TensorRT tools such as:
- TensorRT Python API
- trtexec command-line tool
- ONNX -> TensorRT conversion tools
See the documentation in each example file for specific requirements.
## Current Limitations
- The underlying engine code is not threadsafe (and the Rust binding does not implement `Sync`)
- Engine instances are `Send` but not `Sync`
- Input and output data transfers happen on the CPU-GPU boundary
## Future Work
- Allow passing device pointers and CUDA streams for stream synchronization events
- Async execution support
## Credits
Much of the C++ code is based on the [tensorrt-cpp-api](https://github.com/cyrusbehr/tensorrt-cpp-api) repo.