https://github.com/second-state/qwen3_audio_api

OpenAI compatible API servers for the Qwen3 TTS models
https://github.com/second-state/qwen3_audio_api

Last synced: 4 days ago
JSON representation

OpenAI compatible API servers for the Qwen3 TTS models

Host: GitHub
URL: https://github.com/second-state/qwen3_audio_api
Owner: second-state
License: apache-2.0
Created: 2026-01-26T21:36:25.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-05-19T22:10:58.000Z (12 days ago)
Last Synced: 2026-05-20T01:39:23.837Z (12 days ago)
Language: Rust
Homepage:
Size: 229 KB
Stars: 82
Watchers: 2
Forks: 13
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Qwen3 Audio API

OpenAI-compatible API servers for [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) and [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR), enabling self-hosted text-to-speech and speech-to-text via the standard OpenAI audio endpoints:

- `/v1/audio/speech` — Text-to-speech (TTS)

- `/v1/audio/transcriptions` — Speech-to-text (ASR)

## Implementations

| Language | Directory | Status |

|----------|-----------|--------|

| Python | [python/](python/) | Available |

| Rust | [rust/](rust/) | Available |

The **Python** implementation is a FastAPI server built on the `qwen-tts` and `qwen-asr` Python packages. See [python/README.md](python/README.md) for setup, Docker images, API reference, and usage examples.

The **Rust** implementation is a high-performance axum/tokio server built on the [qwen3_tts](https://github.com/second-state/qwen3_tts_rs) and [qwen3_asr](https://github.com/second-state/qwen3_asr_rs) Rust crates, with libtorch (Linux) and MLX (macOS Apple Silicon) backends. See [rust/README.md](rust/README.md) for setup, pre-built binaries, API reference, and usage examples.

## Features

- **Text-to-Speech (TTS)**: Generate natural speech from text using Qwen3-TTS models

  - Multiple voice presets (Vivian, Ryan, Serena, etc.)

  - Voice cloning from audio samples

  - Multiple languages (English, Chinese, Japanese, Korean, and more)

  - Multiple output formats (WAV, MP3, FLAC, Opus, AAC)

- **Speech-to-Text (ASR)**: Transcribe audio to text using Qwen3-ASR models

  - 30+ language support with auto-detection

  - Accepts various audio formats (WAV, MP3, M4A, etc.)

## Why

The purpose of these API servers is to provide self-hosted, free backend audio services for projects such as:

- [OpenClaw](https://github.com/openclaw/openclaw) — AI agent

- [EchoKit](https://github.com/second-state/echokit_server) — Voice AI device

- [Olares](https://github.com/beclab/Olares) — Personal AI cloud OS

- [GaiaNet](https://github.com/GaiaNet-AI/gaianet-node) — Incentivized AI agent network and marketplace

Any application that speaks the OpenAI audio API can swap in this server as a drop-in replacement.

## License

See [LICENSE](LICENSE).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/second-state/qwen3_audio_api

Awesome Lists containing this project

README