https://github.com/dataxujing/tensorrt-llm-chatglm3

:fire: Hands-on LLM deployment: TensorRT-LLM, Triton Inference Server, vLLM

Host: GitHub
URL: https://github.com/dataxujing/tensorrt-llm-chatglm3
Owner: DataXujing
License: apache-2.0
Created: 2024-02-21T00:50:01.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-26T03:49:33.000Z (over 2 years ago)
Last Synced: 2025-04-04T13:23:05.838Z (about 1 year ago)
Language: Python
Size: 6.2 MB
Stars: 26
Watchers: 1
Forks: 2
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## Accelerated LLM Deployment: TensorRT-LLM, Triton Inference Server, vLLM, LangChain

### Based on ChatGLM3

![](./img/face.jpg)

![](./img/content.jpg)

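+ ChatGLM3-6B model walkthrough and Hugging Face (HF) deployment (streaming and non-streaming)
+ TensorRT-LLM features, installation, and LLM deployment (streaming and non-streaming)
+ Deploying with the Triton Inference Server trtllm-backend and vllm-backend
+ vLLM features, installation, and LLM deployment
+ RAG with LangChain (ChatGLM3-6B)
+ RAG with LangChain + TensorRT-LLM
+ RAG with LangChain + Triton Inference Server
+ RAG with LangChain + vLLM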
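
The four RAG items above share one shape: retrieve relevant passages, put them in the prompt, then call whichever backend serves the model. The sketch below illustrates that shape only. It is not code from this repository and does not use LangChain. The bag-of-words `embed` function, the helper names, and the `llm` callable are all hypothetical stand-ins for a real embedding model and for a ChatGLM3-6B endpoint served by TensorRT-LLM, Triton Inference Server, or vLLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, context):
    """Place the retrieved passages ahead of the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


def rag_answer(query, docs, llm, k=1):
    """Retrieve, build the prompt, then call any text-in/text-out backend.

    `llm` is a hypothetical callable; in practice it would wrap the serving backend.
    """
    return llm(build_prompt(query, retrieve(query, docs, k)))
```

Because `llm` is just a callable, the retrieval and prompt-building steps stay the same when the serving backend changes, which is presumably why the list pairs one RAG pipeline with several backends.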

For the detailed slides, please request them in an issue!
