https://github.com/nvidia/tensorrt-llm
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
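As a brief illustration of the Python API described above, the sketch below generates text with TensorRT-LLM's high-level `LLM` class, following the pattern of the project's quickstart example; the model name and sampling settings are placeholders and assume a supported Hugging Face checkpoint is available locally or downloadable.

```python
# Minimal sketch of the TensorRT-LLM Python LLM API (modeled on the project's
# quickstart); the model name and sampling values are illustrative only.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = ["Hello, my name is", "The capital of France is"]

    # Sampling configuration for generation (values chosen for illustration).
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # Builds (or loads) a TensorRT engine for the given Hugging Face model.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Run batched inference on the prompts and print the generated text.
    for output in llm.generate(prompts, sampling_params):
        print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")

if __name__ == "__main__":
    main()
```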
- Host: GitHub
- URL: https://github.com/nvidia/tensorrt-llm
- Owner: NVIDIA
- License: apache-2.0
- Created: 2023-08-16T17:14:27.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-12T03:53:53.000Z (27 days ago)
- Last Synced: 2025-05-12T04:13:53.610Z (27 days ago)
- Language: C++
- Homepage: https://nvidia.github.io/TensorRT-LLM
- Size: 1.25 GB
- Stars: 10,461
- Watchers: 119
- Forks: 1,420
- Open Issues: 800
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS