Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fishaudio/fish-speech
Brand new TTS solution
- Host: GitHub
- URL: https://github.com/fishaudio/fish-speech
- Owner: fishaudio
- License: other
- Created: 2023-10-10T03:16:51.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-18T06:38:18.000Z (18 days ago)
- Last Synced: 2024-10-18T22:09:42.764Z (17 days ago)
- Topics: llama, transformer, tts, valle, vits, vqgan, vqvae
- Language: Python
- Homepage: https://speech.fish.audio
- Size: 17.5 MB
- Stars: 13,430
- Watchers: 93
- Forks: 1,005
- Open Issues: 58
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome - fishaudio/fish-speech - Brand new TTS solution (Python)
- AiTreasureBox - fishaudio/fish-speech - Brand new TTS solution (Repos)
- StarryDivineSky - fishaudio/fish-speech
README
# Fish Speech
**English** | [简体中文](docs/README.zh.md) | [Portuguese](docs/README.pt-BR.md) | [日本語](docs/README.ja.md)
This codebase and all models are released under the CC-BY-NC-SA-4.0 license. Please refer to [LICENSE](LICENSE) for more details.
---
## Features
1. **Zero-shot & Few-shot TTS:** Provide a 10- to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box; there is no need to specify the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
4. **Highly Accurate:** Achieves a low CER (Character Error Rate) and WER (Word Error Rate) of around 2% on 5-minute English texts.
5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an NVIDIA RTX 4060 laptop and 1:15 on an NVIDIA RTX 4090.
6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, with minimal loss of speed.
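You can check the 10- to 30-second guideline for reference audio before using a sample for cloning. Below is a minimal sketch that uses Python's standard `wave` module. It assumes the reference is an uncompressed WAV file. The function and constant names are hypothetical and are not part of fish-speech.

```python
import wave

# Bounds taken from the Zero-shot & Few-shot TTS guideline above (assumed inclusive).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str) -> tuple[bool, float]:
    """Hypothetical helper: report whether a clip falls in the suggested range."""
    duration = clip_duration_seconds(path)
    return MIN_SECONDS <= duration <= MAX_SECONDS, duration
```

For compressed formats such as MP3, a decoding library would be needed first. The `wave` module only reads PCM WAV files.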
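CER and WER are conventionally computed in three steps. First, transcribe the synthesized audio, typically with an ASR model. Next, take the Levenshtein edit distance between that transcript and the reference text. Finally, divide by the length of the reference, in characters for CER and in words for WER. The sketch below shows that standard calculation. It is an illustration, not the project's evaluation script. It assumes the ASR step and any text normalization (casing, punctuation) have already happened.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (cost 0 if they match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, transcribing "the cat sat on the mat" as "the cat sat on a mat" gives one substitution out of six reference words, so the WER is about 16.7%.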
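Real-time factor (RTF) is commonly defined as wall-clock synthesis time divided by the duration of the generated audio. Values below 1 mean faster than real time. Below is a minimal measurement sketch. It assumes the "1:5" figure above is read as compute time to audio duration. `synthesize` is a placeholder for any TTS call that returns mono samples and is not a fish-speech API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)  # placeholder TTS call (assumption)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def ratio_to_rtf(compute: float, audio: float) -> float:
    """Convert a 'compute:audio' ratio such as 1:5 into an RTF value."""
    return compute / audio
```

Under this reading, 1:5 corresponds to an RTF of 0.2 and 1:15 to roughly 0.067. On GPUs, a warm-up run before timing usually gives more stable numbers.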
## Disclaimer
We do not hold any responsibility for any illegal use of this codebase. Please refer to your local laws regarding the DMCA and other related legislation.
## Online Demo
[Fish Audio](https://fish.audio)
## Quick Start for Local Inference
[inference.ipynb](/inference.ipynb)
## Videos
#### V1.4 Demo Video: [YouTube](https://www.youtube.com/watch?v=Ghc8cJdQyKQ)
## Documents
- [English](https://speech.fish.audio/)
- [中文](https://speech.fish.audio/zh/)
- [日本語](https://speech.fish.audio/ja/)
- [Portuguese (Brazil)](https://speech.fish.audio/pt/)
## Samples (2024/10/02 V1.4)
- [English](https://speech.fish.audio/samples/)
- [中文](https://speech.fish.audio/zh/samples/)
- [日本語](https://speech.fish.audio/ja/samples/)
- [Portuguese (Brazil)](https://speech.fish.audio/pt/samples/)
## Credits
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
## Sponsor