https://github.com/dataxujing/tensorrt-llm-chatglm3
:fire: Hands-on large model deployment: TensorRT-LLM, Triton Inference Server, vLLM
- Host: GitHub
- URL: https://github.com/dataxujing/tensorrt-llm-chatglm3
- Owner: DataXujing
- License: apache-2.0
- Created: 2024-02-21T00:50:01.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-26T03:49:33.000Z (over 2 years ago)
- Last Synced: 2025-04-04T13:23:05.838Z (about 1 year ago)
- Language: Python
- Size: 6.2 MB
- Stars: 26
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Accelerated LLM Deployment: TensorRT-LLM, Triton Inference Server, vLLM, LangChain
### Based on ChatGLM3


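+ ChatGLM3-6B: model walkthrough and deployment with Hugging Face (HF), streaming and non-streaming
+ TensorRT-LLM: features, installation, and LLM deployment (streaming and non-streaming)
+ Triton Inference Server: deployment with the trtllm-backend and the vllm-backend
+ vLLM: features, installation, and LLM deployment
+ RAG with LangChain (ChatGLM3-6B)
+ RAG with LangChain + TensorRT-LLM
+ RAG with LangChain + Triton Inference Server
+ RAG with LangChain + vLLM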
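
The four RAG items in the list share one shape: retrieve context, build a prompt, then send it to whichever serving backend is in use. The block below is a minimal sketch of that shape in plain Python. It is not code from this repository and it does not use LangChain. Every name in it (`Backend`, `retrieve`, `build_prompt`, `rag_stream`, `rag`, `echo_backend`, `DOCS`) is hypothetical, the bag-of-words scoring is a toy stand-in for a real embedding model, and `echo_backend` is a placeholder where a ChatGLM3-6B call through HF, TensorRT-LLM, Triton Inference Server, or vLLM would go.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Iterator, List

# Hypothetical backend type: takes a prompt and yields text chunks.
# A real backend would wrap an HF, TensorRT-LLM, Triton, or vLLM client.
Backend = Callable[[str], Iterator[str]]

# Toy corpus used only for illustration.
DOCS = [
    "TensorRT-LLM builds optimized engines for NVIDIA GPUs",
    "vLLM uses PagedAttention to manage the KV cache",
    "Triton Inference Server hosts models behind HTTP and gRPC",
]


def _vectorize(text: str) -> Counter:
    """Lowercased bag-of-words counts (a toy stand-in for embeddings)."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = _vectorize(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble a prompt that places the retrieved context before the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}\nAnswer:"
    )


def rag_stream(query: str, docs: List[str], backend: Backend, k: int = 2) -> Iterator[str]:
    """Streaming mode: yield chunks as the backend produces them."""
    prompt = build_prompt(query, retrieve(query, docs, k))
    yield from backend(prompt)


def rag(query: str, docs: List[str], backend: Backend, k: int = 2) -> str:
    """Non-streaming mode: collect every chunk into one string."""
    return "".join(rag_stream(query, docs, backend, k))


def echo_backend(prompt: str) -> Iterator[str]:
    """Placeholder backend: streams the first context line word by word."""
    first_context = prompt.split("\n")[2]  # the line right after "Context:"
    for word in first_context.split():
        yield word + " "


if __name__ == "__main__":
    for chunk in rag_stream("How does vLLM manage the KV cache", DOCS, echo_backend, k=1):
        print(chunk, end="", flush=True)
    print()
```

Under these assumptions, switching serving stacks means replacing only the `backend` callable, while retrieval and prompt construction stay the same. The streaming and non-streaming modes differ only in whether the caller consumes chunks one at a time or joins them.
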

For the detailed slides, please request them in an issue!