{"id":23448301,"url":"https://github.com/triton-inference-server/model_navigator","last_synced_at":"2025-05-15T20:07:32.198Z","repository":{"id":40663141,"uuid":"353778387","full_name":"triton-inference-server/model_navigator","owner":"triton-inference-server","description":"Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.","archived":false,"fork":false,"pushed_at":"2025-04-22T14:25:52.000Z","size":9598,"stargazers_count":200,"open_issues_count":3,"forks_count":26,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-15T14:35:23.199Z","etag":null,"topics":["deep-learning","gpu","inference"],"latest_commit_sha":null,"homepage":"https://triton-inference-server.github.io/model_navigator/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/triton-inference-server.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":"docs/support_matrix.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-04-01T17:35:30.000Z","updated_at":"2025-05-06T09:25:07.000Z","dependencies_parsed_at":"2024-01-29T13:46:25.365Z","dependency_job_id":"9955a4d0-5ce8-458b-835e-1c8c10aa5918","html_url":"https://github.com/triton-inference-server/model_navigator","commit_stats":null,"previous_names":[],"tags_count":51,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triton-inference-server%2Fmodel_navigator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trito
n-inference-server%2Fmodel_navigator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triton-inference-server%2Fmodel_navigator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triton-inference-server%2Fmodel_navigator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/triton-inference-server","download_url":"https://codeload.github.com/triton-inference-server/model_navigator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254364224,"owners_count":22058880,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","gpu","inference"],"created_at":"2024-12-23T22:14:54.163Z","updated_at":"2025-05-15T20:07:27.137Z","avatar_url":"https://github.com/triton-inference-server.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\nCopyright (c) 2021-2024, NVIDIA CORPORATION. 
All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n--\u003e\n\n# Triton Model Navigator\n\nWelcome to [Triton Model Navigator](https://github.com/triton-inference-server/model_navigator), an inference toolkit designed\nfor optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. The Triton Model Navigator streamlines the\nprocess of moving models and pipelines implemented in [PyTorch](https://pytorch.org),\n[TensorFlow](https://www.tensorflow.org), and/or [ONNX](https://onnx.ai)\nto [TensorRT](https://github.com/NVIDIA/TensorRT).\n\nThe Triton Model Navigator automates several critical steps, including model export, conversion, correctness testing, and\nprofiling. By providing a single entry point for various supported frameworks, users can efficiently search for the best\ndeployment option using the per-framework optimize function. 
The resulting optimized models are ready for deployment on\neither [PyTriton](https://github.com/triton-inference-server/pytriton)\nor [Triton Inference Server](https://github.com/triton-inference-server/server).\n\n## Features at a Glance\n\nThe distinct capabilities of Triton Model Navigator are summarized in the feature matrix:\n\n| Feature                     | Description                                                                                                                                      |\n|-----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|\n| Ease-of-use                 | Single line of code to run all possible optimization paths directly from your source code                                                        |\n| Wide Framework Support      | Compatible with various machine learning frameworks including PyTorch, TensorFlow, and ONNX                                                      |\n| Models Optimization         | Enhance the performance of models such as ResNet and BERT for efficient inference deployment                                                     |\n| Pipelines Optimization      | Streamline Python code pipelines for models such as Stable Diffusion and Whisper using Inplace Optimization, exclusive to PyTorch                |\n| Model Export and Conversion | Automate the process of exporting and converting models between various formats with a focus on TensorRT and Torch-TensorRT                      |\n| Correctness Testing         | Ensures the converted model produces correct outputs by validating against the original model                                                    |\n| Performance Profiling       | Profiles models to select the optimal format based on performance metrics such as latency and throughput to optimize target hardware utilization |\n| Models Deployment           | Automates 
deployment of models and pipelines on PyTriton and Triton Inference Server through a dedicated API                                     |\n\n## Documentation\n\nLearn more about Triton Model Navigator features in the [documentation](https://triton-inference-server.github.io/model_navigator).\n\n## Prerequisites\n\nBefore proceeding with the installation of Triton Model Navigator, ensure your system meets the following criteria:\n\n- **Operating System**: Linux (Ubuntu 20.04+ recommended)\n- **Python**: Version `3.8` or newer\n- NVIDIA GPU\n\nYou can use NGC Containers for PyTorch and TensorFlow, which contain all necessary dependencies:\n\n- [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)\n- [TensorFlow](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow)\n\n## Install\n\nThe Triton Model Navigator can be installed from `pypi.org`.\n\n### Installing with PyTorch extras\n\nTo install with PyTorch dependencies, use:\n\n```shell\npip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[torch]\n```\n\n### Installing with TensorFlow extras\n\nTo install with TensorFlow dependencies, use:\n\n```shell\npip install -U --extra-index-url https://pypi.ngc.nvidia.com triton-model-navigator[tensorflow]\n```\n\n### Installing with onnxruntime-gpu for CUDA 11\n\nThe default CUDA version for ONNXRuntime since 1.19.0 is CUDA 12. To install with CUDA 11 support, use the following extra index URL:\n```shell\n.. 
--extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ ..\n```\n\n## Quick Start\n\nThe quick start section presents examples of the optimization and deployment paths available in Triton Model Navigator.\n\n### Optimize Stable Diffusion with Inplace\n\nInplace Optimize allows seamless optimization of models for deployment, such as converting\nthem to TensorRT, without requiring any changes to the original Python pipelines.\n\n\nThe code below presents Stable Diffusion pipeline optimization. Before you run the example, install the required\npackages:\n\n```shell\npip install transformers diffusers torch\n```\n\nThen, initialize the pipeline and wrap the model components with `nav.Module`:\n\n```python\nimport model_navigator as nav\nfrom transformers.modeling_outputs import BaseModelOutputWithPooling\nfrom diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline\n\n\ndef get_pipeline():\n    # Initialize Stable Diffusion pipeline and wrap modules for optimization\n    pipe = StableDiffusionPipeline.from_pretrained(\"stabilityai/stable-diffusion-2-1\")\n    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)\n    pipe = pipe.to(\"cuda\")\n    pipe.text_encoder = nav.Module(\n        pipe.text_encoder,\n        name=\"clip\",\n        output_mapping=lambda output: BaseModelOutputWithPooling(**output),\n    )\n    pipe.unet = nav.Module(\n        pipe.unet,\n        name=\"unet\",\n    )\n    pipe.vae.decoder = nav.Module(\n        pipe.vae.decoder,\n        name=\"vae\",\n    )\n    return pipe\n```\n\nPrepare a simple dataloader:\n\n```python\n# Note: the first element in the tuple needs to be the batch size\ndef get_dataloader():\n    return [(1, \"a photo of an astronaut riding a horse on mars\")]\n```\n\nExecute model optimization:\n\n```python\npipe = get_pipeline()\ndataloader = get_dataloader()\n\nnav.optimize(pipe, dataloader)\n```\nOnce the 
pipeline has been optimized, you can explicitly load the most performant version of the modules by executing:\n\n```python\nnav.load_optimized()\n```\n\nAt this point, you can simply use the original pipeline to generate predictions with optimized models directly in Python:\n```python\npipe.to(\"cuda\")\n\nimages = pipe([\"a photo of an astronaut riding a horse on mars\"])\nimage = images[0][0]\n\nimage.save(\"an_astronaut_riding_a_horse.png\")\n```\n\nAn example of how to serve a Stable Diffusion pipeline through PyTriton can be found [here](https://github.com/triton-inference-server/pytriton/tree/main/examples/huggingface_stable_diffusion).\n\nPlease read [Error isolation when running Python script](#error-isolation-when-running-python-script) if you plan\nto place the code in a Python script.\n\n\n### Optimize ResNet and deploy on Triton\n\nTriton Model Navigator also supports an optimization path for deployment on Triton. This path is supported for nn.Module,\nkeras.Model, or ONNX files whose inputs are tensors.\n\nTo optimize the ResNet50 model from TorchHub, run the following code:\n\n```python\nimport torch\nimport model_navigator as nav\n\n# Initialize the model\nresnet50 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_resnet50', pretrained=True).eval()\n\n# Wrap model in nav.Module\nresnet50 = nav.Module(resnet50, name=\"resnet50\")\n\n# Optimize Torch model loaded from TorchHub\nnav.optimize(resnet50, dataloader=[(1, [torch.randn(1, 3, 256, 256)])])\n```\n\nOnce optimization is done, creating a model store for deployment on Triton is as simple as the following code:\n\n```python\nimport pathlib\n\n# Generate the model store from optimized model\nresnet50.triton_model_store(\n    model_repository_path=pathlib.Path(\"model_repository\"),\n)\n```\n\nPlease read [Error isolation when running Python script](#error-isolation-when-running-python-script) if you plan\nto place the code in a Python script.\n\n### Profile any model or callable in Python\n\nTriton Model Navigator 
enhances models and pipelines and provides a uniform method for profiling any Python\nfunction, callable, or model. At present, support is limited strictly to static batch profiling scenarios.\n\nAs an example, we will use a simple function that sleeps for 50ms:\n\n```python\nimport time\n\n\ndef custom_fn(input_):\n    # wait 50ms\n    time.sleep(0.05)\n    return input_\n```\n\nLet's provide a dataloader we will use for profiling:\n\n```python\n# Tuple of batch size and data sample\ndataloader = [(1, [\"This is example input\"])]\n```\n\nFinally, run the profiling of the function with the prepared dataloader:\n\n```python\nimport model_navigator as nav\n\nnav.profile(custom_fn, dataloader)\n```\n\n## Error isolation when running Python script\n\n**Important**: Please review the section below to prevent unexpected issues when running `optimize`.\n\nFor better error isolation, some conversions and exports are run in separate child processes using multiprocessing in\nthe `spawn` mode. This means that everything in the global scope will be run again in each child process. You can encounter\nunexpected issues when the optimization code is placed in a Python script and executed as:\n```shell\npython optimize.py\n```\nTo prevent nested optimization, you have to put the optimize code in either:\n```python\nif __name__ == \"__main__\":\n    # optimization goes here\n```\nor\n```python\nimport multiprocessing as mp\nif mp.current_process().name == \"MainProcess\":\n    # optimization goes here\n```\n\nIf none of the above works for you, you can run all optimization in a single process at the cost of error isolation by\nsetting the following environment variable:\n```bash\nNAVIGATOR_USE_MULTIPROCESSING=False\n```\n\n## Examples\n\nWe offer comprehensive, step-by-step [guides](examples) that showcase the utilization of the Triton Model Navigator’s\ndiverse features. 
These guides are designed to elucidate the processes of optimization, profiling, testing, and\ndeployment of models using [PyTriton](https://github.com/triton-inference-server/pytriton) and [Triton Inference Server](https://github.com/triton-inference-server/server).\n\n## Useful Links\n\n* [Changelog](CHANGELOG.md)\n* [Support Matrix](docs/support_matrix.md)\n* [Known Issues](docs/known_issues.md)\n* [Contributing](CONTRIBUTING.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftriton-inference-server%2Fmodel_navigator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftriton-inference-server%2Fmodel_navigator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftriton-inference-server%2Fmodel_navigator/lists"}