Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
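The description refers to TensorRT-LLM's high-level Python `LLM` API. A minimal sketch of that workflow, assuming the `tensorrt_llm` package is installed on a machine with a supported NVIDIA GPU; the model checkpoint name and sampling values are illustrative assumptions, not recommendations:

```python
# Sketch of TensorRT-LLM's high-level Python `LLM` API.
# Requires the `tensorrt_llm` package and a supported NVIDIA GPU,
# so the import is deferred to call time.

def generate_with_trt_llm(prompts):
    """Compile a model into a TensorRT engine and run batched inference."""
    from tensorrt_llm import LLM, SamplingParams  # GPU-only dependency

    # Any supported Hugging Face checkpoint; this name is an illustrative
    # assumption. TensorRT-LLM compiles it into an optimized TensorRT
    # engine before serving requests.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Decoding settings are placeholders for demonstration.
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() takes a list of prompt strings and returns one result
    # per prompt, each holding the generated completions.
    outputs = llm.generate(prompts, sampling)
    return [out.outputs[0].text for out in outputs]
```

The same engine can then be served through the project's C++ runtime; the helper above only covers the Python path.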
- Host: GitHub
- URL: https://github.com/NVIDIA/TensorRT-LLM
- Owner: NVIDIA
- License: apache-2.0
- Created: 2023-08-16T17:14:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-15T07:28:53.000Z (3 months ago)
- Last Synced: 2024-10-16T08:38:06.934Z (3 months ago)
- Language: C++
- Homepage: https://nvidia.github.io/TensorRT-LLM
- Size: 372 MB
- Stars: 8,422
- Watchers: 92
- Forks: 950
- Open Issues: 767
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-LLM-Productization - TensorRT-LLM - an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. (Models and Tools / LLM Deployment)
- awesome-local-ai - TensorRT-LLM - Performs inference efficiently on NVIDIA GPUs | Python / C++ runtimes | Both | ❌ | Python/C++ | Text-Gen | (Inference Engine)
- awesome-llmops - TensorRT-LLM (Serving / Large Model Serving)
- awesome-LLM-resourses - TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. (Inference)
- StarryDivineSky - NVIDIA/TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines that contain state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorRT-LLM also contains components for creating Python runtimes, as well as C++ runtimes that execute those TensorRT engines. (A01_Text Generation_Text Dialogue / Large-language dialogue models and data)
- Awesome-LLM - TensorRT-LLM - Nvidia Framework for LLM Inference (LLM Deployment)
- Awesome-LLM-Compression - TensorRT-LLM - Code
- awesome-local-llms - TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. | 8,543 | 969 | 790 | 16 | 9 | Apache License 2.0 | 2 days, 23 hrs, 33 mins | (Open-Source Local LLM Projects)
- awesome-llm-and-aigc - TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. [nvidia.github.io/TensorRT-LLM](https://nvidia.github.io/TensorRT-LLM) (Summary)
- awesome-yolo-object-detection - NVIDIA/TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. [nvidia.github.io/TensorRT-LLM](https://nvidia.github.io/TensorRT-LLM) (Lighter and Deployment Frameworks)
- awesome-ai-papers - \[[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)\]\[[triton-inference-server/server](https://github.com/triton-inference-server/server)\]\[[GenerativeAIExamples](https://github.com/NVIDIA/GenerativeAIExamples)\]\[[TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer)\]\[[TensorRT](https://github.com/NVIDIA/TensorRT)\]\[[OpenVINO](https://github.com/openvinotoolkit/openvino)\] (NLP / 3. Pretraining)
- awesome-cuda-triton-hpc - TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. [nvidia.github.io/TensorRT-LLM](https://nvidia.github.io/TensorRT-LLM) (Frameworks)
- alan_awesome_llm - TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. (Inference)
- AiTreasureBox - NVIDIA/TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. (Repos)
- Awesome-LLMOps - TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. (Inference)
- awesome-ai-papers - \[[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)\]\[[triton-inference-server/server](https://github.com/triton-inference-server/server)\]\[[GenerativeAIExamples](https://github.com/NVIDIA/GenerativeAIExamples)\]\[[TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer)\]\[[TensorRT](https://github.com/NVIDIA/TensorRT)\]\[[TransformerEngine](https://github.com/NVIDIA/TransformerEngine)\]\[[OpenVINO](https://github.com/openvinotoolkit/openvino)\] (NLP / 3. Pretraining)