https://github.com/deepset-ai/nvidia-triton-inference
This repository contains setup examples for hosting model inference using NVIDIA Triton.
- Host: GitHub
- URL: https://github.com/deepset-ai/nvidia-triton-inference
- Owner: deepset-ai
- License: apache-2.0
- Created: 2024-09-12T12:31:31.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-10-01T12:47:51.000Z (9 months ago)
- Last Synced: 2025-03-27T22:23:32.745Z (3 months ago)
- Language: Python
- Size: 31.3 KB
- Stars: 0
- Watchers: 0
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# nvidia-triton-inference
This repository contains setup examples for hosting model inference using NVIDIA Triton.

## How to build a Triton embedding image
1. Set up your model and tokenizer files (see the example commands below)
   - move `model.onnx` to `hf-embedding-template/onnx_model/1/`
   - move any other model files (model and tokenizer config) to `hf-embedding-template/preprocessing/1/`
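For example, assuming the exported ONNX model and the Hugging Face tokenizer files sit in the current directory (the file names below are placeholders; use whatever files your model actually ships with), the copy step could look like this:
```
cp model.onnx hf-embedding-template/onnx_model/1/
cp config.json tokenizer.json tokenizer_config.json hf-embedding-template/preprocessing/1/
```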
2. Start Triton Server and attach a shell
```
docker run --shm-size=16g --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /hf-embedding-template:/models nvcr.io/nvidia/tritonserver:24.08-py3 bash
```

3. Run inside the Triton container
```
pip install transformers
tritonserver --model-repository=/models
```
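Once `tritonserver` reports the models as ready, a quick sanity check from the host uses Triton's standard HTTP health endpoint (port 8000 was published in step 2):
```
curl -v localhost:8000/v2/health/ready
```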
4. Run the client

```
pip install tritonclient[http]
python client.py
```
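If you prefer calling the server from your own code instead of the bundled `client.py`, a request could look roughly like the sketch below. The model and tensor names (`preprocessing`, `INPUT_TEXT`, `EMBEDDING`) are assumptions for illustration only; take the real names from the template's `config.pbtxt` files.
```
import numpy as np
import tritonclient.http as httpclient

# hypothetical model and tensor names -- check the template's config.pbtxt files
MODEL_NAME = "preprocessing"

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton expects string tensors as numpy object arrays of bytes
texts = np.array([b"Triton makes serving embeddings straightforward."], dtype=object)
infer_input = httpclient.InferInput("INPUT_TEXT", [texts.shape[0]], "BYTES")
infer_input.set_data_from_numpy(texts)

response = client.infer(MODEL_NAME, inputs=[infer_input])
embedding = response.as_numpy("EMBEDDING")
print(embedding.shape)
```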
## Helm charts

This repo comes with ready-to-run Helm charts. They can be found under `/helm`. For example, `text-embedder-trion` is preconfigured to run a Triton embedding server.
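Installing a chart works like any other Helm chart; the release name below is a placeholder, and the command assumes the chart directory sits directly under `helm/`:
```
helm install text-embedder ./helm/text-embedder-trion
```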