https://github.com/rdcm/triton-ng

Rust SDK for writing custom backends for NVIDIA Triton Inference Server

custom-backend infrence nvidia rust triton-inference-server

Host: GitHub
URL: https://github.com/rdcm/triton-ng
Owner: rdcm
License: mit
Created: 2025-10-12T23:10:04.000Z (10 months ago)
Default Branch: main
Last Pushed: 2026-04-11T01:39:30.000Z (4 months ago)
Last Synced: 2026-04-11T02:22:04.955Z (4 months ago)
Topics: custom-backend, infrence, nvidia, rust, triton-inference-server
Language: Rust
Homepage:
Size: 153 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

> **WIP**: work in progress, the API is unstable.

# triton-ng

Rust SDK for [NVIDIA Triton Inference Server](https://github.com/triton-inference-server/server).

Provides two things:

- A safe Rust API for writing **custom Triton backends** (compiled as `.so` and loaded by Triton)

- A high-level async **gRPC client** for sending inference requests to a running Triton server

## Crates

| Crate | Description |
|---|---|
| `triton-ng-sys` | Raw FFI bindings generated by bindgen from `tritonbackend.h` |
| `triton-ng` | Safe Rust wrapper over `triton-ng-sys` |
| `triton-ng-macros` | Proc-macros for `triton-ng` |
| `triton-ng-client` | High-level async gRPC client |
| `example/custom-backend` | Example custom backend (MNIST, proxies to ONNX model) |
| `example/app` | Example client application |

## Writing a custom backend

Implement the `Backend` trait and register it with `declare_backend!`:

```rust
use triton_ng::backend::Backend;
use triton_ng::{BackendHandle, DataType, Error, InferenceRequest, Response};

struct MyBackend;

impl Backend for MyBackend {
    fn initialize(backend: &BackendHandle) -> Result<(), Error> {
        Ok(())
    }

    fn model_instance_execute(
        model: triton_ng::Model,
        requests: &[triton_ng::Request],
    ) -> Result<(), Error> {
        for request in requests {
            let input = request.get_input("INPUT")?;
            let data = input.as_fp32_vec()?;

            // ... run inference on `data`, producing `result` (f32 values matching the output shape) ...

            let mut response = Response::new(request)?;
            response
                .create_output("OUTPUT", DataType::Fp32, &[1, 10])?
                .write_fp32_vec(&result)?;
            response.send()?;
        }
        Ok(())
    }
}

triton_ng::declare_backend!(MyBackend);
```

Build as a `cdylib`:

```toml
# Cargo.toml
[lib]
crate-type = ["cdylib"]
```

## Using the gRPC client

```rust
use triton_ng_client::{InferInput, InferOptions, TritonClient, TritonClientConfig};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = TritonClient::new(TritonClientConfig::new("http://localhost:8001")).await?;

    let meta = client.model_metadata("my_model", None, None).await?;
    let n: usize = meta.inputs[0].shape.iter().map(|&d| d as usize).product();

    let response = client
        .infer(
            "my_model",
            None,
            [InferInput::fp32("INPUT", meta.inputs[0].shape.clone(), vec![0.0f32; n])],
            ["OUTPUT"],
            InferOptions::default(),
        )
        .await?;

    println!("{:?}", response.outputs[0].data);
    Ok(())
}
```
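
The `n` computation above casts every dimension with `as usize`. Triton model metadata reports variable-size dimensions as `-1`, and casting `-1` to `usize` produces a huge value that can overflow the product. Below is a minimal sketch of a safer helper, assuming the shape is a slice of `i64` (the element type is not stated in this README). `element_count` and `dynamic_dim` are hypothetical names for illustration, not part of `triton-ng-client`.

```rust
/// Number of elements in a tensor of the given shape.
/// Dimensions reported as -1 (variable size) are replaced by `dynamic_dim`.
/// Returns `None` for any other negative dimension or if the product overflows.
/// Assumption: the shape uses i64 dimensions.
fn element_count(shape: &[i64], dynamic_dim: usize) -> Option<usize> {
    shape.iter().try_fold(1usize, |acc, &d| {
        let d = match d {
            -1 => dynamic_dim,
            d if d >= 0 => usize::try_from(d).ok()?,
            _ => return None,
        };
        acc.checked_mul(d)
    })
}

fn main() {
    // An MNIST-style input whose leading (batch) dimension is variable.
    let shape = [-1, 1, 28, 28];
    // Send a single sample, so the variable dimension becomes 1.
    let n = element_count(&shape, 1).expect("valid shape");
    println!("{n}"); // prints 784
}
```

The shape passed along with the input data would also need the `-1` replaced by the concrete size, since a request carries a fully specified shape.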

TLS:

```rust
use triton_ng_client::{ClientTlsConfig, TritonClientConfig};

let config = TritonClientConfig::new("https://triton.example.com:8001")
    .with_tls(ClientTlsConfig::new()); // uses system roots
```

## Getting started

### Prerequisites

- Rust stable

- NVIDIA driver 570+ (580+ for Blackwell / RTX 50xx)

- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

- Docker

### First run

```bash
git submodule update --init --recursive
make build           # compile custom backend → target/release/libtriton_custom_backend.so
make download-model  # download mnist_onnx + create model version dirs
make docker-env-up   # start Triton (mounts .so and models/)
```

### Run the example app

```bash
cargo run --manifest-path=example/app/Cargo.toml --release
```

Triton must be running with both models in READY state.
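
One way to confirm readiness before running the app is Triton's HTTP endpoint `GET /v2/models/<name>/ready`, which answers with status 200 when the model is ready. Below is a minimal std-only sketch. It assumes the Docker environment publishes Triton's HTTP port at `localhost:8000` (Triton's default, not confirmed by this README) and that the model names are passed as command-line arguments. `status_code` and `model_ready` are hypothetical helpers, not part of `triton-ng`.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Extract the numeric status from an HTTP response such as "HTTP/1.1 200 OK".
fn status_code(response: &str) -> Option<u16> {
    response.lines().next()?.split_whitespace().nth(1)?.parse().ok()
}

/// Ask Triton whether `model` is ready. `addr` is "host:port" of the HTTP endpoint
/// (assumed to be reachable from the host).
fn model_ready(addr: &str, model: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    write!(
        stream,
        "GET /v2/models/{model}/ready HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n"
    )?;
    // `Connection: close` lets us read until the server closes the socket.
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(status_code(&response) == Some(200))
}

fn main() -> std::io::Result<()> {
    // Model names come from the command line; none are hard-coded here.
    for model in std::env::args().skip(1) {
        let ready = model_ready("localhost:8000", &model)?;
        println!("{model}: {}", if ready { "READY" } else { "not ready" });
    }
    Ok(())
}
```

Run it with the names of both models as arguments; each line of output reports one model.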

### Run integration tests

```bash
make tests           # cargo nextest run --workspace
```

Tests require a running Triton instance (`make docker-env-up`).

### Rebuild after backend changes

```bash
make build
make docker-env-down && make docker-env-up
```

## Features

| Feature | Description |
|---|---|
| `cuda` | Enable GPU and pinned memory allocation in `ResponseAllocator` |

```toml
triton-ng = { version = "0.1", features = ["cuda"] }
```

## License

MIT
