https://github.com/rdcm/triton-ng
Rust SDK for writing custom backends for NVIDIA Triton Inference Server
custom-backend infrence nvidia rust triton-inference-server
- Host: GitHub
- URL: https://github.com/rdcm/triton-ng
- Owner: rdcm
- License: mit
- Created: 2025-10-12T23:10:04.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-04-11T01:39:30.000Z (about 2 months ago)
- Last Synced: 2026-04-11T02:22:04.955Z (about 2 months ago)
- Topics: custom-backend, infrence, nvidia, rust, triton-inference-server
- Language: Rust
- Size: 153 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
> **WIP**: work in progress; the API is unstable.
# triton-ng
Rust SDK for [NVIDIA Triton Inference Server](https://github.com/triton-inference-server/server).
Provides two things:
- A safe Rust API for writing **custom Triton backends** (compiled as `.so` and loaded by Triton)
- A high-level async **gRPC client** for sending inference requests to a running Triton server
## Crates
| Crate | Description |
|---|---|
| `triton-ng-sys` | Raw FFI bindings generated by bindgen from `tritonbackend.h` |
| `triton-ng` | Safe Rust wrapper over `triton-ng-sys` |
| `triton-ng-macros` | Proc-macros for `triton-ng` |
| `triton-ng-client` | High-level async gRPC client |
| `example/custom-backend` | Example custom backend (MNIST; proxies requests to an ONNX model) |
| `example/app` | Example client application |
## Writing a custom backend
Implement the `Backend` trait and register it with `declare_backend!`:
```rust
use triton_ng::backend::Backend;
use triton_ng::{BackendHandle, DataType, Error, InferenceRequest, Response};

struct MyBackend;

impl Backend for MyBackend {
    // Called once when Triton loads the backend.
    fn initialize(_backend: &BackendHandle) -> Result<(), Error> {
        Ok(())
    }

    // Called with a batch of requests for one model instance.
    fn model_instance_execute(
        _model: triton_ng::Model,
        requests: &[triton_ng::Request],
    ) -> Result<(), Error> {
        for request in requests {
            // Read the "INPUT" tensor as a flat Vec<f32>.
            let input = request.get_input("INPUT")?;
            let data = input.as_fp32_vec()?;

            // ... run inference on `data` ...
            // Placeholder output so the example compiles: 10 scores for shape [1, 10].
            let result = vec![0.0f32; 10];

            // Create the response, write the "OUTPUT" tensor, and send it.
            let mut response = Response::new(request)?;
            response
                .create_output("OUTPUT", DataType::Fp32, &[1, 10])?
                .write_fp32_vec(&result)?;
            response.send()?;
        }
        Ok(())
    }
}

triton_ng::declare_backend!(MyBackend);
```
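A tensor buffer is ultimately a run of bytes. The std-only sketch below shows the conversion an accessor like `as_fp32_vec` and a writer like `write_fp32_vec` have to perform conceptually: reinterpreting bytes as `f32` values and back. It assumes little-endian byte order (native on the x86_64 and aarch64 hosts Triton typically runs on); the helper names `bytes_to_fp32` and `fp32_to_bytes` are hypothetical, and this is not `triton-ng`'s actual implementation.

```rust
/// Decode a raw tensor buffer into f32 values (hypothetical helper).
/// Assumes little-endian byte order.
fn bytes_to_fp32(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None; // truncated buffer: not a whole number of f32 values
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Encode f32 values into a little-endian byte buffer (hypothetical helper).
fn fp32_to_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

Returning `None` when the length is not a multiple of four surfaces a truncated input instead of silently dropping trailing bytes, which is what `chunks_exact` alone would do.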
Build as a `cdylib`:
```toml
# Cargo.toml
[lib]
crate-type = ["cdylib"]
```
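Triton locates a backend by name: for a backend called `<name>` it typically loads `libtriton_<name>.so` from `<backend-directory>/<name>/` (by default under `/opt/tritonserver/backends`), and a model selects it through the `backend` field of its `config.pbtxt`. Cargo names a Linux `cdylib` `lib<crate_name>.so`, so a library target named `triton_custom_backend` yields `libtriton_custom_backend.so`, matching a backend named `custom_backend`. The helpers below are a hypothetical sketch of that naming convention, not part of `triton-ng`.

```rust
use std::path::{Path, PathBuf};

/// Expected location of a backend's shared library under Triton's naming
/// convention: <backend_dir>/<name>/libtriton_<name>.so (hypothetical helper).
fn backend_library_path(backend_dir: &Path, backend_name: &str) -> PathBuf {
    backend_dir
        .join(backend_name)
        .join(format!("libtriton_{backend_name}.so"))
}

/// File name Cargo gives a `cdylib` on Linux: "lib" + crate name + ".so",
/// with hyphens in the crate name replaced by underscores.
fn cargo_cdylib_name(crate_name: &str) -> String {
    format!("lib{}.so", crate_name.replace('-', "_"))
}
```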
## Using the gRPC client
```rust
use triton_ng_client::{InferInput, InferOptions, TritonClient, TritonClientConfig};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Connect to Triton's gRPC endpoint (port 8001 by default).
    let client = TritonClient::new(TritonClientConfig::new("http://localhost:8001")).await?;

    // Fetch the model's input shape and count its elements.
    // Assumes all dimensions are fixed (no -1).
    let meta = client.model_metadata("my_model", None, None).await?;
    let n: usize = meta.inputs[0].shape.iter().map(|&d| d as usize).product();

    // Send an all-zeros FP32 input and request the "OUTPUT" tensor.
    let response = client
        .infer(
            "my_model",
            None,
            [InferInput::fp32("INPUT", meta.inputs[0].shape.clone(), vec![0.0f32; n])],
            ["OUTPUT"],
            InferOptions::default(),
        )
        .await?;

    println!("{:?}", response.outputs[0].data);
    Ok(())
}
```
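The example computes the element count as the product of the reported input shape, which only works when every dimension is fixed. Triton reports variable-size dimensions (such as a dynamic batch dimension) as `-1`, and casting `-1` to `usize` produces a huge number rather than an error. A more defensive version, sketched below with std only, substitutes a concrete size for each `-1` and checks the data length against the shape. The helper names are hypothetical, not part of `triton-ng-client`, and shapes are assumed to be `i64` values.

```rust
/// Replace every variable-size dimension (-1) with `fill`, e.g. a batch size.
/// Returns None if the shape contains any other negative value.
fn resolve_shape(shape: &[i64], fill: i64) -> Option<Vec<i64>> {
    shape
        .iter()
        .map(|&d| match d {
            -1 => Some(fill),
            d if d >= 0 => Some(d),
            _ => None,
        })
        .collect()
}

/// Number of elements in a fully resolved shape, or None if a dimension
/// is still negative or the product overflows usize.
fn element_count(shape: &[i64]) -> Option<usize> {
    shape.iter().try_fold(1usize, |acc, &d| {
        if d < 0 {
            return None; // unresolved (-1) or invalid dimension
        }
        acc.checked_mul(d as usize)
    })
}

/// Check that a flat data buffer matches the shape it is sent with.
fn matches_shape<T>(shape: &[i64], data: &[T]) -> bool {
    element_count(shape) == Some(data.len())
}
```

For a model with a dynamic batch dimension, resolving the shape with the intended batch size before calling `element_count` gives both the shape to send and the required buffer length.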
Connecting over TLS:
```rust
use triton_ng_client::{ClientTlsConfig, TritonClientConfig};

let config = TritonClientConfig::new("https://triton.example.com:8001")
    .with_tls(ClientTlsConfig::new()); // uses system roots
```
## Getting started
### Prerequisites
- Rust stable
- NVIDIA driver 570+ (580+ for Blackwell / RTX 50xx)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
- Docker
### First run
```bash
git submodule update --init --recursive
make build # compile custom backend → target/release/libtriton_custom_backend.so
make download-model # download mnist_onnx + create model version dirs
make docker-env-up # start Triton (mounts .so and models/)
```
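`make download-model` prepares a Triton model repository. Triton expects one directory per model, usually holding a `config.pbtxt`, plus one or more numeric version subdirectories containing the model file, e.g. `models/mnist_onnx/1/model.onnx` (`model.onnx` is the default file name for ONNX models). The std-only sketch below creates such version directories; it illustrates the layout only and is not what the Makefile actually runs. The function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Path of one model version directory: <repo>/<model>/<version>/
fn version_dir(repo: &Path, model: &str, version: u64) -> PathBuf {
    repo.join(model).join(version.to_string())
}

/// Create the numeric version directories Triton scans for a model and
/// return their paths. Directories that already exist are left untouched.
fn create_version_dirs(repo: &Path, model: &str, versions: &[u64]) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for &v in versions {
        let dir = version_dir(repo, model, v);
        fs::create_dir_all(&dir)?; // no error if the directory already exists
        dirs.push(dir);
    }
    Ok(dirs)
}
```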
### Run the example app
```bash
cargo run --manifest-path=example/app/Cargo.toml --release
```
Triton must be running, with both models (`mnist_onnx` and the custom backend model) in the `READY` state.
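Since the app and the tests need Triton and both models to be ready first, polling for readiness can be more robust than failing on the first attempt. Triton exposes readiness over HTTP (`GET /v2/health/ready` for the server, `GET /v2/models/<name>/ready` for a model, on port 8000 by default) as well as over gRPC. The helper below is a generic, std-only sketch of the polling loop; the readiness check is passed in as a closure, and the name `wait_until` is hypothetical.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `is_ready` every `interval` until it returns true or `timeout` elapses.
/// Returns true if readiness was observed before the deadline.
fn wait_until(timeout: Duration, interval: Duration, mut is_ready: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

In async code such as the tokio client example, the same loop would await `tokio::time::sleep(interval)` instead of blocking the thread.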
### Run integration tests
```bash
make tests # cargo nextest run --workspace
```
Tests require a running Triton instance (`make docker-env-up`).
### Rebuild after backend changes
```bash
make build
make docker-env-down && make docker-env-up
```
## Features
| Feature | Description |
|---|---|
| `cuda` | Enables GPU and pinned-memory allocation in `ResponseAllocator` |
```toml
triton-ng = { version = "0.1", features = ["cuda"] }
```
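Cargo features are compile-time switches: code guarded by `#[cfg(feature = "cuda")]` is compiled only when the feature is enabled. The sketch below shows that pattern with a hypothetical enum mirroring Triton's `TRITONSERVER_MEMORY_CPU`, `TRITONSERVER_MEMORY_CPU_PINNED` and `TRITONSERVER_MEMORY_GPU` memory types: without `cuda`, any request falls back to plain CPU memory. It illustrates the mechanism only; how `ResponseAllocator` actually behaves without the feature is an assumption here.

```rust
/// Hypothetical mirror of Triton's memory types (CPU, CPU_PINNED, GPU).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryType {
    Cpu,
    CpuPinned,
    Gpu,
}

/// With the `cuda` feature, honour the requested memory type.
#[cfg(feature = "cuda")]
fn resolve_memory_type(requested: MemoryType) -> MemoryType {
    requested
}

/// Without the `cuda` feature, only plain CPU memory is available.
#[cfg(not(feature = "cuda"))]
fn resolve_memory_type(_requested: MemoryType) -> MemoryType {
    MemoryType::Cpu
}
```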
## License
MIT