https://github.com/speaches-ai/speaches
- Host: GitHub
- URL: https://github.com/speaches-ai/speaches
- Owner: speaches-ai
- License: mit
- Created: 2024-05-18T02:26:20.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-04-11T22:49:57.000Z (6 months ago)
- Last Synced: 2025-04-14T22:09:42.630Z (6 months ago)
- Topics: docker, docker-compose, faster-whisper, openai-api, openai-whisper, openai-whisper-translation, transcription, whisper, whisper-ai
- Language: Python
- Homepage: https://speaches.ai/
- Size: 1.97 MB
- Stars: 1,695
- Watchers: 24
- Forks: 218
- Open Issues: 79
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - speaches-ai/speaches - Speaches is an open-source project that aims to provide an end-to-end solution for speech recognition and speech synthesis. It is built with PyTorch, supports multiple languages, and provides pretrained models. Its main features include high-quality speech recognition and synthesis, an easy-to-use API, and an extensible architecture. At its core, Speaches uses deep-learning models such as Transformer and Conformer to convert speech to text or text to speech. Users can perform speech recognition, speech synthesis, and speech translation through simple commands or Python code. The project also provides practical utilities such as speech data augmentation and model fine-tuning to help users improve model performance. Speaches is suitable for a variety of scenarios, such as voice assistants, voice search, and speech translation. The project encourages community contributions and provides detailed documentation and examples to make it easy to get started. (Speech recognition and synthesis / other; resource transfer and download)
README
> [!NOTE]
> This project was previously named `faster-whisper-server`. I've decided to change the name, as the project has evolved to support more than just ASR.

# Speaches
`speaches` is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper), while Text-to-Speech uses [piper](https://github.com/rhasspy/piper) and [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M). This project aims to be Ollama, but for TTS/STT models.
Try it out on the [HuggingFace Space](https://huggingface.co/spaces/speaches-ai/speaches)
See the documentation for installation instructions and usage: [speaches.ai](https://speaches.ai/)
## Features
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
- Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
- Async speech to speech interactions with a model (audio in, audio out)
- Streaming support (transcription is sent via SSE as the audio is transcribed; you don't have to wait for the whole file to finish before receiving results).
- Dynamic model loading/offloading: specify the model you want in the request and it is loaded automatically, then unloaded after a period of inactivity.
- Text-to-Speech via `kokoro` (ranked #1 in the [TTS Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena)) and `piper` models.
- GPU and CPU support.
- [Deployable via Docker Compose / Docker](https://speaches.ai/installation/)
- [Highly configurable](https://speaches.ai/configuration/)
- [Realtime API](https://speaches.ai/configuration/)

Please create an issue if you find a bug, have a question, or have a feature suggestion.
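On the client side, consuming the streaming transcription means parsing Server-Sent Events: each result arrives on a `data:` line, with comments and blank keep-alive lines interleaved. A minimal parser is sketched below; the JSON payload shape in the example is illustrative, not the server's documented schema.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from an SSE stream, skipping non-data lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # comments (": ..."), event names, and blank keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # conventional OpenAI-style end-of-stream marker
            break
        yield json.loads(data)
```

Feeding it the line iterator of a streaming HTTP response yields one decoded event per transcription chunk as it arrives.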
## Demo
### Streaming Transcription
TODO
### Speech Generation
https://github.com/user-attachments/assets/0021acd9-f480-4bc3-904d-831f54c4d45b