https://github.com/speaches-ai/speaches



Host: GitHub
URL: https://github.com/speaches-ai/speaches
Owner: speaches-ai
License: mit
Created: 2024-05-18T02:26:20.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2025-04-11T22:49:57.000Z (2 months ago)
Last Synced: 2025-04-14T22:09:42.630Z (2 months ago)
Topics: docker, docker-compose, faster-whisper, openai-api, openai-whisper, openai-whisper-translation, transcription, whisper, whisper-ai
Language: Python
Homepage: https://speaches.ai/
Size: 1.97 MB
Stars: 1,695
Watchers: 24
Forks: 218
Open Issues: 79
Metadata Files:
- Readme: README.md
- License: LICENSE


README

> [!NOTE]
> This project was previously named `faster-whisper-server`. I changed the name because the project has evolved to support more than just ASR.

# Speaches

`speaches` is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper), and Text-to-Speech uses [piper](https://github.com/rhasspy/piper) and [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M). This project aims to be Ollama, but for TTS/STT models.

Try it out on the [HuggingFace Space](https://huggingface.co/spaces/speaches-ai/speaches)

See the documentation for installation instructions and usage: [speaches.ai](https://speaches.ai/)

## Features:

- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.

- Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)

  - Generate a spoken audio summary of a body of text (text in, audio out)

  - Perform sentiment analysis on a recording (audio in, text out)

  - Async speech to speech interactions with a model (audio in, audio out)

- Streaming support. Transcription is sent via SSE as the audio is transcribed, so you don't need to wait for the whole recording to be transcribed before receiving results.

- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.

- Text-to-Speech via `kokoro` (ranked #1 in the [TTS Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena)) and `piper` models.

- GPU and CPU support.

- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)

- [Highly configurable](https://speaches.ai/configuration/)

- [Realtime API](https://speaches.ai/configuration/)
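
The dynamic model loading / offloading behaviour listed above can be pictured with a small cache that loads a model on first use and drops it once it has been idle for too long. This is a conceptual sketch only, not `speaches`' actual implementation: the class name `IdleModelCache`, the `loader` callback, and the `ttl_seconds` parameter are all hypothetical names introduced for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class IdleModelCache:
    """Load models on first use and drop them after a period of inactivity.

    Illustrative only: names and behaviour are assumptions, not speaches' code.
    """

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # hypothetical function that loads a model by name
        self._ttl = ttl_seconds    # idle time after which a model is unloaded
        self._clock = clock        # injectable clock, which makes the cache testable
        self._models: Dict[str, Tuple[Any, float]] = {}  # name -> (model, last_used)

    def get(self, name: str) -> Any:
        """Return the named model, loading it if it is not currently resident."""
        self.evict_idle()
        if name in self._models:
            model, _ = self._models[name]
        else:
            model = self._loader(name)
        # Refresh the last-used timestamp on every access.
        self._models[name] = (model, self._clock())
        return model

    def evict_idle(self) -> None:
        """Unload every model that has been idle for longer than the TTL."""
        now = self._clock()
        idle = [n for n, (_, used) in self._models.items() if now - used > self._ttl]
        for name in idle:
            del self._models[name]

    def loaded(self) -> List[str]:
        """Return the names of the models currently held in memory."""
        return sorted(self._models)
```

In a real server the eviction step would more likely run on a background timer than only on access; calling it from `get` keeps the sketch short.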

Please create an issue if you find a bug, have a question, or want to suggest a feature.

## Demo

### Streaming Transcription

TODO
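
As a rough illustration of consuming the SSE stream mentioned under Features, the sketch below parses Server-Sent Events from an iterable of text lines and yields each event's `data` payload. It uses only the standard library and makes no assumption about what `speaches` puts inside each payload (plain text or JSON, depending on the requested response format); the function name `iter_sse_data` is hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of text lines."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
    if data_parts:
        # Lenient choice: flush a trailing event that lacks a final blank line.
        # A strict SSE parser would discard it instead.
        yield "\n".join(data_parts)


if __name__ == "__main__":
    sample = ["data: Hello\n", "\n", ": keep-alive\n", "data: world\n", "\n"]
    for chunk in iter_sse_data(sample):
        print(chunk)
```

With a real HTTP client you would feed the decoded response lines into `iter_sse_data` and handle each chunk as it arrives, rather than waiting for the full transcript.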

### Speech Generation

https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b
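
Because `speaches` is OpenAI API-compatible, a speech-generation request can be sketched against OpenAI's `/v1/audio/speech` route using only the Python standard library. Everything deployment-specific here is an assumption: the base URL `http://localhost:8000`, the helper names, and the model and voice identifiers you pass in (check the documentation for the values your server accepts). Authentication, if your server requires it, is omitted.

```python
import json
import urllib.request

# Assumed address of a locally running server; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_speech_request(
    text: str,
    model: str,
    voice: str,
    base_url: str = BASE_URL,
    response_format: str = "mp3",
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /v1/audio/speech endpoint."""
    payload = {
        "model": model,    # model id, as configured on your server
        "input": text,     # text to be spoken
        "voice": voice,    # voice name supported by the chosen model
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, model: str, voice: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, model, voice)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Splitting request construction from the network call keeps the payload easy to inspect without a running server. Tools built on the official OpenAI SDKs should work the same way once pointed at your server's base URL.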
