https://github.com/zuellni/llasa-webui
LLaSA WebUI using ExLlamaV2 and FastAPI.
ai exllamav2 fastapi stt tts
- Host: GitHub
- URL: https://github.com/zuellni/llasa-webui
- Owner: Zuellni
- License: mit
- Created: 2025-02-01T21:08:22.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-30T00:39:56.000Z (about 1 year ago)
- Last Synced: 2025-03-30T01:32:14.739Z (about 1 year ago)
- Topics: ai, exllamav2, fastapi, stt, tts
- Language: Python
- Size: 1.09 MB
- Stars: 23
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LLaSA WebUI
A simple web interface for [LLaSA](https://huggingface.co/collections/HKUSTAudio/llasa-679b87dbd06ac556cc0e0f44) using [ExLlamaV2](https://github.com/turboderp-org/exllamav2) with an [OpenAI](https://platform.openai.com/docs/guides/text-to-speech) compatible [FastAPI](https://github.com/fastapi/fastapi) server.
## Installation
Clone the repo:
```sh
git clone https://github.com/zuellni/llasa-webui
cd llasa-webui
```
Create a conda/mamba/python env:
```sh
conda create -n llasa-webui python
conda activate llasa-webui
```
Install the dependencies. Any `xcodec2` errors can be ignored:
```sh
pip install -r requirements.txt
pip install xcodec2 --no-deps
```
Install wheels for [`exllamav2`](https://github.com/turboderp-org/exllamav2/releases/latest) and [`flash-attn`](https://github.com/kingbri1/flash-attention/releases/latest):
```sh
pip install link-to-exllamav2-wheel-goes-here+cu124.torch2.6.0.whl
pip install link-to-flash-attn-wheel-goes-here+cu124.torch2.6.0.whl
```
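The placeholder filenames above end in a tag such as `+cu124.torch2.6.0`, which has to match the CUDA and torch versions in your environment. The snippet below is a minimal sketch for working out which tag to look for on the release pages. The helper names `wheel_tag` and `python_tag` are hypothetical, and the assumption that wheel names also carry a CPython tag such as `cp312` should be checked against the actual release assets.

```python
import sys


def wheel_tag(torch_version: str, cuda_version: str) -> str:
    """Build a tag in the 'cu124.torch2.6.0' style from version strings."""
    # Drop any local suffix such as '+cu124' from the torch version.
    base = torch_version.split("+", 1)[0]
    # '12.4' -> '124'
    cuda = cuda_version.replace(".", "")
    return f"cu{cuda}.torch{base}"


def python_tag(version_info=sys.version_info) -> str:
    """CPython tag as it usually appears in wheel names, e.g. 'cp312'."""
    return f"cp{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if torch.version.cuda is None:
            print("this torch build has no CUDA support")
        else:
            # Print the tags to look for in the wheel filenames.
            print(wheel_tag(torch.__version__, torch.version.cuda), python_tag())
```

Run it inside the activated env, then pick the `exllamav2` and `flash-attn` wheels whose filenames contain the printed tags.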
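## Models
Download one LLaSA model and one X-Codec-2 variant. The commands under each heading are alternatives: they all clone into the same `model` (or `codec`) directory, so run only one of them.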
LLaSA-1B:
```sh
git clone https://huggingface.co/hkustaudio/llasa-1b model # bf16
```
LLaSA-3B:
```sh
git clone https://huggingface.co/annuvin/llasa-3b-8.0bpw-h8-exl2 model # 8bpw
git clone https://huggingface.co/hkustaudio/llasa-3b model # bf16
```
LLaSA-8B:
```sh
git clone https://huggingface.co/annuvin/llasa-8b-4.0bpw-exl2 model # 4bpw
git clone https://huggingface.co/annuvin/llasa-8b-6.0bpw-exl2 model # 6bpw
git clone https://huggingface.co/annuvin/llasa-8b-8.0bpw-h8-exl2 model # 8bpw
git clone https://huggingface.co/hkustaudio/llasa-8b model # bf16
```
X-Codec-2:
```sh
git clone https://huggingface.co/annuvin/xcodec2-bf16 codec # bf16
git clone https://huggingface.co/annuvin/xcodec2-fp32 codec # fp32
```
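To compare the variants above, you can roughly estimate how much space the weights take from the bits per weight (bpw) noted in the comments. This is a back-of-the-envelope sketch, not a measurement: it assumes the nominal parameter counts from the model names (1B, 3B, 8B), treats bf16 as 16 bpw, and ignores the cache, the codec, and activation memory.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    # params * bits / 8 gives bytes; the factors of 10^9 cancel out.
    return params_billions * bits_per_weight / 8


# (label, nominal parameters in billions, bits per weight)
variants = [
    ("llasa-1b bf16", 1, 16.0),
    ("llasa-3b 8bpw", 3, 8.0),
    ("llasa-3b bf16", 3, 16.0),
    ("llasa-8b 4bpw", 8, 4.0),
    ("llasa-8b 6bpw", 8, 6.0),
    ("llasa-8b 8bpw", 8, 8.0),
    ("llasa-8b bf16", 8, 16.0),
]

for label, params, bpw in variants:
    print(f"{label:<14} ~{weight_size_gb(params, bpw):.1f} GB")
```

Actual VRAM usage will be higher than these figures, since the weights are only one part of what has to fit on the GPU.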
## Usage
```sh
python server.py -m model -c codec -v voices
```
Add `--cache q4 --dtype bf16` to reduce [VRAM usage](https://www.canirunthisllm.net). You can specify a Hugging Face repo id for `xcodec2` instead of a local directory, but you still need to download one of the LLaSA models above.
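Since the server is OpenAI compatible, a client can in principle talk to it the same way it would talk to OpenAI's text-to-speech endpoint. The sketch below shows that request shape using only the standard library. Everything server-specific here is an assumption rather than something this README states: the host and port in `BASE_URL`, the `/v1/audio/speech` path (taken from OpenAI's API), the `model` value, and the voice name. Check the server's startup output for the real address.

```python
import json
import urllib.request

# Assumptions, not taken from the README: adjust to match your server.
BASE_URL = "http://127.0.0.1:8000"  # hypothetical host and port
SPEECH_PATH = "/v1/audio/speech"    # path used by OpenAI's speech API


def build_speech_request(text, voice, base_url=BASE_URL, model="model"):
    """Build a POST request in the shape of OpenAI's text-to-speech API."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        base_url.rstrip("/") + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, a call would look like `synthesize("Hello there.", voice="example", out_path="hello.wav")`, where `example` is a placeholder voice name and the file extension should match whatever audio format the server actually returns.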