https://github.com/nvidia/tensorrt-llm
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
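As a brief illustration of the Python API described above, the sketch below generates text with TensorRT-LLM's high-level `LLM` class, following the pattern of the project's quickstart example; the model name and sampling settings are placeholders and assume a supported Hugging Face checkpoint is available locally or downloadable.

```python
# Minimal sketch of the TensorRT-LLM Python LLM API (modeled on the project's
# quickstart); the model name and sampling values are illustrative only.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = ["Hello, my name is", "The capital of France is"]

    # Sampling configuration for generation (values chosen for illustration).
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Builds (or loads) a TensorRT engine for the given Hugging Face model.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Run batched inference on the prompts and print the generated text.
    for output in llm.generate(prompts, sampling_params):
        print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")

if __name__ == "__main__":
    main()
```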
- Host: GitHub
- URL: https://github.com/nvidia/tensorrt-llm
- Owner: NVIDIA
- License: apache-2.0
- Created: 2023-08-16T17:14:27.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-12T03:53:53.000Z (27 days ago)
- Last Synced: 2025-05-12T04:13:53.610Z (27 days ago)
- Language: C++
- Homepage: https://nvidia.github.io/TensorRT-LLM
- Size: 1.25 GB
- Stars: 10,461
- Watchers: 119
- Forks: 1,420
- Open Issues: 800
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS