https://github.com/yinqiwen/lmsf
- Host: GitHub
- URL: https://github.com/yinqiwen/lmsf
- Owner: yinqiwen
- Created: 2023-06-29T08:19:44.000Z (over 2 years ago)
- Default Branch: rust
- Last Pushed: 2024-04-08T11:38:44.000Z (over 1 year ago)
- Last Synced: 2025-03-29T04:25:16.598Z (7 months ago)
- Language: Cuda
- Size: 1.07 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
Rust LLM Serving Framework
## Features
- Paged Attention
- Continuous Batching
- Quantization
  - awq
  - squeezellm
- Models
  - llama
  - gemma
  - chatglm
## Getting Started
**Examples**
```sh
$ cargo run --release --example llm_engine_example -- --model <model_path> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024
```
**API Server**
```sh
$ cargo build --release
$ ./target/release/entrypoints --model <model_path> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024 --host 0.0.0.0 --port 8000
```
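
As a rough illustration of how the `--block-size` and `--max-model-len` values in the commands above relate under paged attention, the sketch below computes how many fixed-size KV-cache blocks a sequence needs and where a given token lands. It assumes these flags carry the usual paged-attention meaning (tokens per cache block and maximum sequence length); `blocks_needed` and `block_slot` are hypothetical helper names for illustration and are not taken from the lmsf source.

```rust
/// Number of fixed-size KV-cache blocks needed to hold `num_tokens` tokens.
/// Hypothetical helper: assumes `block_size` means "tokens per block".
fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be positive");
    // Ceiling division: a partially filled block still occupies a whole block.
    (num_tokens + block_size - 1) / block_size
}

/// Map a zero-based token position to
/// (logical block index, offset inside that block).
fn block_slot(token_pos: usize, block_size: usize) -> (usize, usize) {
    assert!(block_size > 0, "block size must be positive");
    (token_pos / block_size, token_pos % block_size)
}
```

With `--block-size 8` and `--max-model-len 1024`, `blocks_needed(1024, 8)` is 128, so under these assumptions a full-length sequence would occupy at most 128 logical blocks, and the token at position 19 would sit at offset 3 of logical block 2.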