https://github.com/softwaremill/triton_playground
Triton Inference Server playground with different features to play around with.
- Host: GitHub
- URL: https://github.com/softwaremill/triton_playground
- Owner: softwaremill
- License: apache-2.0
- Created: 2023-04-25T06:17:21.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-08T12:56:16.000Z (over 2 years ago)
- Last Synced: 2025-01-18T06:42:44.230Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Environment
The recommended way to set up the environment is to create a Python
virtual environment.
```bash
virtualenv -p python3.10 .venv
```
```bash
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
# Prepare model-repository
First, download the model from `torchhub` and save it as TorchScript and ONNX.
```bash
python3 utils/export.py
```
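For orientation, an export script of this kind usually looks roughly like the sketch below; the model (`resnet18`), input shape, and output file names are assumptions for illustration, and the actual `utils/export.py` may differ.
```python
# Illustrative sketch only -- the real utils/export.py may use a different
# model, input shape, and output file names.
import torch

# Download a model from Torch Hub (resnet18 is an assumption here).
model = torch.hub.load("pytorch/vision", "resnet18", pretrained=True)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)

# Save as TorchScript.
torch.jit.trace(model, dummy_input).save("model.pt")

# Save as ONNX.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```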
Next, start a Docker container and convert the ONNX model to TensorRT.
```bash
docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:22.04-py3
bash utils/convert_to_tensorrt.sh
exit
```
Move the exported models into `model_repository`.
```bash
bash utils/move_models.sh
```
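Triton expects one subdirectory per model, with a numeric version directory containing the model file (by default `model.pt`, `model.onnx`, or `model.plan`) and usually a `config.pbtxt` next to it. A quick way to check what ended up in the repository (the exact contents may differ from the comment below):
```python
# Print the model_repository layout; Triton expects roughly:
#   model_repository/<model_name>/config.pbtxt
#   model_repository/<model_name>/<version>/model.{pt,onnx,plan}
# The concrete model names (e.g. model_torchscript, model_onnx) may differ.
from pathlib import Path

repo = Path("model_repository")
for path in sorted(p for p in repo.rglob("*") if p.is_file()):
    print(path.relative_to(repo))
```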
# Run Triton
Start a Docker container with Triton Server and the `model_repository` directory mounted.
```bash
docker run -it --gpus all -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd)/model_repository:/models --shm-size 1024m --net=host nvcr.io/nvidia/tritonserver:22.04-py3
```
Next, install dependencies and run Triton:
```bash
pip install pillow torch transformers
tritonserver --model-repository=/models
```
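Once the server reports the models as ready, you can smoke-test it from the host with the Triton Python client (installable via `pip install "tritonclient[http]"` if it is not already in your environment). The input/output tensor names and the 224x224 input shape below are assumptions and have to match the model's `config.pbtxt`:
```python
# Minimal smoke test against a Triton server running on localhost:8000.
# Tensor names ("input"/"output") and the input shape are assumptions --
# adjust them to whatever the model's config.pbtxt declares.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready()
assert client.is_model_ready("model_torchscript")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer("model_torchscript", inputs=[infer_input])
print(result.as_numpy("output").shape)
```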
# perf_analyzer
With Triton running in another container, run the commands below to enter
the SDK container and run `perf_analyzer`:
```bash
docker run -it --gpus all -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/workspace --net=host nvcr.io/nvidia/tritonserver:22.04-py3-sdk
perf_analyzer -m model_torchscript -b 1 --concurrency-range 1:4
```
# model_analyzer
To use `model-analyzer`, shut down the previously started Triton server,
then enter a Docker container and run the analysis. All parameters for
`model-analyzer` are described here:
https://github.com/triton-inference-server/model_analyzer/blob/main/docs/config.md
```bash
docker run -it --gpus all -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):$(pwd) --shm-size 1024m --net=host nvcr.io/nvidia/tritonserver:22.04-py3-sdk
cd /home/...  # go to the same directory you were in on your host filesystem
model-analyzer profile --model-repository $(pwd)/model_repository --profile-models model_onnx --triton-launch-mode=docker --output-model-repository-path $(pwd)/output/ -f perf.yaml --override-output-model-repository
```
To generate a report, follow the instructions from `model-analyzer`.
# Benchmark
To reproduce the benchmark reported in the blog post, run the `run_benchmark.sh`
script from the `utils` directory. Use the same configuration as in the
`perf_analyzer` section.