{"id":13456475,"url":"https://github.com/fishaudio/fish-speech","last_synced_at":"2025-05-13T16:03:42.798Z","repository":{"id":205085992,"uuid":"702796244","full_name":"fishaudio/fish-speech","owner":"fishaudio","description":"SOTA Open Source TTS","archived":false,"fork":false,"pushed_at":"2025-04-12T14:01:16.000Z","size":18899,"stargazers_count":20970,"open_issues_count":34,"forks_count":1676,"subscribers_count":124,"default_branch":"main","last_synced_at":"2025-05-06T16:07:30.356Z","etag":null,"topics":["llama","transformer","tts","valle","vits","vqgan","vqvae"],"latest_commit_sha":null,"homepage":"https://speech.fish.audio","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fishaudio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-10T03:16:51.000Z","updated_at":"2025-05-06T15:38:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"8b48df0f-9ba3-4ec9-b5b9-96c717db79a3","html_url":"https://github.com/fishaudio/fish-speech","commit_stats":{"total_commits":593,"total_committers":49,"mean_commits":12.10204081632653,"dds":0.2613827993254637,"last_synced_commit":"f8a57fb61fec684612d5477705280b644c48a291"},"previous_names":["fishaudio/fish-speech"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishaudio%2Ffish-speech","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishaudio%2Ffish-speech/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fish
audio%2Ffish-speech/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishaudio%2Ffish-speech/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fishaudio","download_url":"https://codeload.github.com/fishaudio/fish-speech/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253504558,"owners_count":21918831,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llama","transformer","tts","valle","vits","vqgan","vqvae"],"created_at":"2024-07-31T08:01:22.726Z","updated_at":"2025-05-13T16:03:42.776Z","avatar_url":"https://github.com/fishaudio.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eFish Speech\u003c/h1\u003e\n\n**English** | [简体中文](docs/README.zh.md) | [Portuguese](docs/README.pt-BR.md) | [日本語](docs/README.ja.md) | [한국어](docs/README.ko.md) \u003cbr\u003e\n\n\u003ca href=\"https://www.producthunt.com/posts/fish-speech-1-4?embed=true\u0026utm_source=badge-featured\u0026utm_medium=badge\u0026utm_souce=badge-fish\u0026#0045;speech\u0026#0045;1\u0026#0045;4\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=488440\u0026theme=light\" alt=\"Fish\u0026#0032;Speech\u0026#0032;1\u0026#0046;4 - Open\u0026#0045;Source\u0026#0032;Multilingual\u0026#0032;Text\u0026#0045;to\u0026#0045;Speech\u0026#0032;with\u0026#0032;Voice\u0026#0032;Cloning | Product Hunt\" style=\"width: 250px; height: 54px;\" width=\"250\" height=\"54\" /\u003e\n\u003c/a\u003e\n\u003ca 
href=\"https://trendshift.io/repositories/7014\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://trendshift.io/api/badge/repositories/7014\" alt=\"fishaudio%2Ffish-speech | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"/\u003e\n\u003c/a\u003e\n\u003cbr\u003e\n\u003c/div\u003e\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"https://count.getloli.com/get/@fish-speech?theme=asoul\" /\u003e\u003cbr\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca target=\"_blank\" href=\"https://discord.gg/Es5qTB9BcN\"\u003e\n        \u003cimg alt=\"Discord\" src=\"https://img.shields.io/discord/1214047546020728892?color=%23738ADB\u0026label=Discord\u0026logo=discord\u0026logoColor=white\u0026style=flat-square\"/\u003e\n    \u003c/a\u003e\n    \u003ca target=\"_blank\" href=\"https://hub.docker.com/r/fishaudio/fish-speech\"\u003e\n        \u003cimg alt=\"Docker\" src=\"https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square\u0026logo=docker\"/\u003e\n    \u003c/a\u003e\n    \u003ca target=\"_blank\" href=\"https://huggingface.co/spaces/fishaudio/fish-speech-1\"\u003e\n        \u003cimg alt=\"Huggingface\" src=\"https://img.shields.io/badge/🤗%20-space%20demo-yellow\"/\u003e\n    \u003c/a\u003e\n    \u003ca target=\"_blank\" href=\"https://pd.qq.com/s/bwxia254o\"\u003e\n      \u003cimg alt=\"QQ Channel\" src=\"https://img.shields.io/badge/QQ-blue?logo=tencentqq\"\u003e\n    \u003c/a\u003e\n\u003c/div\u003e\n\nThis codebase is released under the Apache License, and all model weights are released under the CC BY-NC-SA 4.0 License. 
Please refer to [LICENSE](LICENSE) for more details.\n\n---\n## Fish Agent\nWe are very excited to announce that we have made our self-developed agent demo open source; you can now try it for instant English and Chinese chat locally by following the [docs](https://speech.fish.audio/start_agent/).\n\nPlease note that the content is released under a **CC BY-NC-SA 4.0 license**. The demo is an early alpha test version: the inference speed still needs to be optimized, and there are many bugs waiting to be fixed. If you find a bug or want to fix one, we'd be very happy to receive an issue or a pull request.\n\n## Features\n### Fish Speech\n\n1. **Zero-shot \u0026 Few-shot TTS:** Input a 10- to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**\n\n2. **Multilingual \u0026 Cross-lingual Support:** Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.\n\n3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.\n\n4. **Highly Accurate:** Achieves a low CER (Character Error Rate) and WER (Word Error Rate) of around 2% for 5-minute English texts.\n\n5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.\n\n6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.\n\n7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).\n\n8. 
**Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.\n\n### Fish Agent\n1. **Completely End-to-End:** Automatically integrates the ASR and TTS stages with no need to plug in other models; it is truly end-to-end, not a three-stage pipeline (ASR+LLM+TTS).\n\n2. **Timbre Control:** Can use reference audio to control the speech timbre.\n\n3. **Emotional:** The model can generate speech with strong emotion.\n\n## Disclaimer\n\nWe hold no responsibility for any illegal use of this codebase. Please refer to your local laws regarding the DMCA and other related laws.\n\n## Online Demo\n\n[Fish Audio](https://fish.audio)\n\n[Fish Agent](https://fish.audio/demo/live)\n\n## Quick Start for Local Inference\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/fishaudio/fish-speech/blob/main/inference.ipynb)\n\n## Videos\n\n#### V1.5 Demo Video: [Watch the video on X (Twitter).](https://x.com/FishAudio/status/1864370933496205728)\n\n## Documents\n\n- [English](https://speech.fish.audio/)\n- [中文](https://speech.fish.audio/zh/)\n- [日本語](https://speech.fish.audio/ja/)\n- [Portuguese (Brazil)](https://speech.fish.audio/pt/)\n\n## Samples (2024/10/02 V1.4)\n\n- [English](https://speech.fish.audio/samples/)\n- [中文](https://speech.fish.audio/zh/samples/)\n- [日本語](https://speech.fish.audio/ja/samples/)\n- [Portuguese (Brazil)](https://speech.fish.audio/pt/samples/)\n\n## Credits\n\n- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)\n- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)\n- [GPT VITS](https://github.com/innnky/gpt-vits)\n- [MQTTS](https://github.com/b04901014/MQTTS)\n- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)\n- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)\n\n## Tech Report (V1.4)\n```bibtex\n@misc{fish-speech-v1.4,\n      title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis},\n      
author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},\n      year={2024},\n      eprint={2411.01156},\n      archivePrefix={arXiv},\n      primaryClass={cs.SD},\n      url={https://arxiv.org/abs/2411.01156},\n}\n```\n\n## Sponsor\n\n\u003cdiv\u003e\n  \u003ca href=\"https://6block.com/\"\u003e\n    \u003cimg src=\"https://avatars.githubusercontent.com/u/60573493\" width=\"100\" height=\"100\" alt=\"6Block Avatar\"/\u003e\n  \u003c/a\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://6block.com/\"\u003eData processing sponsored by 6Block\u003c/a\u003e\n\u003c/div\u003e\n","funding_links":[],"categories":["Python","🤖 AI \u0026 Machine Learning","App","Text-to-Speech (TTS)","Projects","语音合成","精选文章","End2End Speech Dialogue System","Repos","📋 Contents","TTS Models"],"sub_categories":["Open-Source Models \u0026 Libraries","Global AI Projects","网络服务_其他","文字转语音","Model","🧠 2. Open Foundation Models"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffishaudio%2Ffish-speech","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffishaudio%2Ffish-speech","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffishaudio%2Ffish-speech/lists"}