https://github.com/deepset-ai/nvidia-triton-inference
This repository contains setup examples for hosting model inference using NVIDIA Triton.
- Host: GitHub
- URL: https://github.com/deepset-ai/nvidia-triton-inference
- Owner: deepset-ai
- License: apache-2.0
- Created: 2024-09-12T12:31:31.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-10-01T12:47:51.000Z (9 months ago)
- Last Synced: 2025-03-27T22:23:32.745Z (3 months ago)
- Language: Python
- Size: 31.3 KB
- Stars: 0
- Watchers: 0
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# nvidia-triton-inference
This repository contains setup examples for hosting model inference using NVIDIA Triton.

## How to build a Triton embedding image
1. Set up your model and tokenizer files (see the example commands below)
   - move `model.onnx` to `hf-embedding-template/onnx_model/1/`
   - move any other model files (model and tokenizer config) to `hf-embedding-template/preprocessing/1/`
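For example, assuming the exported ONNX model and the Hugging Face tokenizer files sit in the current directory (the file names below are placeholders; use whatever files your model actually ships with), the copy step could look like this:
```
cp model.onnx hf-embedding-template/onnx_model/1/
cp config.json tokenizer.json tokenizer_config.json hf-embedding-template/preprocessing/1/
```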
2. Start Triton Server and attach a shell
```
docker run --shm-size=16g --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /hf-embedding-template:/models nvcr.io/nvidia/tritonserver:24.08-py3 bash
```

3. Run inside the Triton container
```
pip install transformers
tritonserver --model-repository=/models
```
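Once `tritonserver` reports the models as ready, a quick sanity check from the host uses Triton's standard HTTP health endpoint (port 8000 was published in step 2):
```
curl -v localhost:8000/v2/health/ready
```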
4. Run the client

```
pip install tritonclient[http]
python client.py
```
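If you prefer calling the server from your own code instead of the bundled `client.py`, a request could look roughly like the sketch below. The model and tensor names (`preprocessing`, `INPUT_TEXT`, `EMBEDDING`) are assumptions for illustration only; take the real names from the template's `config.pbtxt` files.
```
import numpy as np
import tritonclient.http as httpclient

# hypothetical model and tensor names -- check the template's config.pbtxt files
MODEL_NAME = "preprocessing"

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton expects string tensors as numpy object arrays of bytes
texts = np.array([b"Triton makes serving embeddings straightforward."], dtype=object)
infer_input = httpclient.InferInput("INPUT_TEXT", [texts.shape[0]], "BYTES")
infer_input.set_data_from_numpy(texts)

response = client.infer(MODEL_NAME, inputs=[infer_input])
embedding = response.as_numpy("EMBEDDING")
print(embedding.shape)
```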
## Helm charts

This repo comes with ready-to-run Helm charts. They can be found under `/helm`. For example, `text-embedder-trion` is preconfigured to run a Triton embedding server.
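Installing a chart works like any other Helm chart; the release name below is a placeholder, and the command assumes the chart directory sits directly under `helm/`:
```
helm install text-embedder ./helm/text-embedder-trion
```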