{"id":13404238,"url":"https://github.com/neonbjb/tortoise-tts","last_synced_at":"2025-05-12T03:48:08.222Z","repository":{"id":37326196,"uuid":"452939314","full_name":"neonbjb/tortoise-tts","owner":"neonbjb","description":"A multi-voice TTS system trained with an emphasis on quality","archived":false,"fork":false,"pushed_at":"2024-11-19T18:59:13.000Z","size":55498,"stargazers_count":14058,"open_issues_count":340,"forks_count":1958,"subscribers_count":178,"default_branch":"main","last_synced_at":"2025-05-01T13:54:42.437Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neonbjb.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-01-28T04:33:15.000Z","updated_at":"2025-05-01T12:33:50.000Z","dependencies_parsed_at":"2024-12-09T14:20:06.456Z","dependency_job_id":null,"html_url":"https://github.com/neonbjb/tortoise-tts","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonbjb%2Ftortoise-tts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonbjb%2Ftortoise-tts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonbjb%2Ftortoise-tts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neonbjb%2Ftortoise-tts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neonbjb","do
wnload_url":"https://codeload.github.com/neonbjb/tortoise-tts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253672389,"owners_count":21945474,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T19:01:41.321Z","updated_at":"2025-05-12T03:48:08.164Z","avatar_url":"https://github.com/neonbjb.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Audio","Repos","\u003cspan id=\"speech\"\u003eSpeech\u003c/span\u003e","Inbox: Text-to-speech (TTS) and avatars","📦 Legacy \u0026 Inactive Projects","HarmonyOS","语言资源库","Python","语音合成","Text-to-Speech (TTS)","3.4 Further Links on Audio Synthesis and Detection","Learning","Open Source TTS Libraries","🎙 Voice \u0026 Audio Tools","Voice \u0026 Multimodal (local) (16)","Audio \u0026 Voice Assistants","Voice AI"],"sub_categories":["Speech","\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","Creative Uses of Generative AI Image Synthesis Tools","Windows Manager","python","网络服务_其他","Open-Source Models \u0026 Libraries","Difference between Watermarking and Cryptography","Repositories","Text-to-speech","Python Libraries","Text-to-Speech","Human-in-the-Loop Agents"],"readme":"# TorToiSe\n\nTortoise is a text-to-speech program built with the following priorities:\n\n1. Strong multi-voice capabilities.\n2. 
Highly realistic prosody and intonation.\n   \nThis repo contains all the code needed to run Tortoise TTS in inference mode.\n\nManuscript: https://arxiv.org/abs/2305.07243\n## Hugging Face space\n\nA live demo is hosted on Hugging Face Spaces. If you'd like to avoid a queue, please duplicate the Space and add a GPU. Please note that CPU-only spaces do not work for this demo.\n\nhttps://huggingface.co/spaces/Manmay/tortoise-tts\n\n## Install via pip\n```bash\npip install tortoise-tts\n```\n\nIf you would like to install the latest development version, you can also install it directly from the git repository:\n\n```bash\npip install git+https://github.com/neonbjb/tortoise-tts\n```\n\n## What's in a name?\n\nI'm naming my speech-related repos after Mojave desert flora and fauna. Tortoise is a bit tongue in cheek: this model\nis insanely slow. It leverages both an autoregressive decoder **and** a diffusion decoder; both known for their low\nsampling rates. On a K80, expect to generate a medium sized sentence every 2 minutes.\n\nwell..... not so slow anymore now we can get a **0.25-0.3 RTF** on 4GB vram and with streaming we can get \u003c **500 ms** latency !!! \n\n## Demos\n\nSee [this page](http://nonint.com/static/tortoise_v2_examples.html) for a large list of example outputs.\n\nA cool application of Tortoise + GPT-3 (not affiliated with this repository): https://twitter.com/lexman_ai. Unfortunately, this project seems no longer to be active.\n\n## Usage guide\n\n### Local installation\n\nIf you want to use this on your own computer, you must have an NVIDIA GPU.\n\n\u003e [!TIP]\n\u003e On Windows, I **highly** recommend using the Conda installation method. 
I have been told that if you do not do this, you will spend a lot of time chasing dependency problems.\n\nFirst, install Miniconda: https://docs.conda.io/en/latest/miniconda.html\n\nThen run the following commands, using Anaconda Prompt as the terminal (or any other terminal configured to work with conda).\n\nThis will:\n1. create a conda environment with the minimal dependencies specified\n1. activate the environment\n1. install PyTorch with the command provided here: https://pytorch.org/get-started/locally/\n1. clone tortoise-tts\n1. change the current directory to tortoise-tts\n1. run the Tortoise `setup.py` install script\n\n```shell\nconda create --name tortoise python=3.9 numba inflect\nconda activate tortoise\nconda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia\nconda install transformers=4.29.2\ngit clone https://github.com/neonbjb/tortoise-tts.git\ncd tortoise-tts\npython setup.py install\n```\n\nOptionally, PyTorch can be installed in the base environment, so that other conda environments can use it too. To do this, simply run the `conda install pytorch...` line before activating the tortoise environment.\n\n\u003e [!NOTE]  \n\u003e When you want to use tortoise-tts, you will always have to ensure the `tortoise` conda environment is activated.\n\nIf you are on Windows, you may also need to install pysoundfile: `conda install -c conda-forge pysoundfile`\n\n### Docker\n\nAn easy way to hit the ground running and a good jumping off point depending on your use case.\n\n```sh\ngit clone https://github.com/neonbjb/tortoise-tts.git\ncd tortoise-tts\n\ndocker build . 
-t tts\n\ndocker run --gpus all \\\n    -e TORTOISE_MODELS_DIR=/models \\\n    -v /mnt/user/data/tortoise_tts/models:/models \\\n    -v /mnt/user/data/tortoise_tts/results:/results \\\n    -v /mnt/user/data/.cache/huggingface:/root/.cache/huggingface \\\n    -v /root:/work \\\n    -it tts\n```\nThis gives you an interactive terminal in an environment that's ready to do some TTS. Now you can explore the different interfaces that Tortoise exposes for TTS.\n\nFor example:\n\n```sh\ncd app\nconda activate tortoise\ntime python tortoise/do_tts.py \\\n    --output_path /results \\\n    --preset ultra_fast \\\n    --voice geralt \\\n    --text \"Time flies like an arrow; fruit flies like a banana.\"\n```\n\n## Apple Silicon\n\nOn macOS 13+ with M1/M2 chips, you need to install the nightly version of PyTorch. As stated on the official page, you can run:\n\n```shell\npip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu\n```\n\nBe sure to do that after you activate the environment. If you don't use conda, the commands would look like this:\n\n```shell\npython3.10 -m venv .venv\nsource .venv/bin/activate\npip install numba inflect psutil\npip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu\npip install transformers\ngit clone https://github.com/neonbjb/tortoise-tts.git\ncd tortoise-tts\npip install .\n```\n\nBe aware that DeepSpeed is disabled on Apple Silicon since it does not work. 
The flag `--use_deepspeed` is ignored.\nYou may need to prepend `PYTORCH_ENABLE_MPS_FALLBACK=1` to the commands below to make them work, since MPS does not support all the operations in PyTorch.\n\n\n### do_tts.py\n\nThis script allows you to speak a single phrase with one or more voices.\n```shell\npython tortoise/do_tts.py --text \"I'm going to speak this\" --voice random --preset fast\n```\n### Socket streaming\n```shell\npython tortoise/socket_server.py \n```\nThe server will listen on port 5000.\n\n\n### read_fast.py (faster inference)\n\nThis script provides tools for reading large amounts of text.\n\n```shell\npython tortoise/read_fast.py --textfile \u003cyour text to be read\u003e --voice random\n```\n\n### read.py\n\nThis script provides tools for reading large amounts of text.\n\n```shell\npython tortoise/read.py --textfile \u003cyour text to be read\u003e --voice random\n```\n\nThis will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series\nof spoken clips as they are generated. Once all the clips are generated, it will combine them into a single file and\noutput that as well.\n\nSometimes Tortoise screws up an output. 
You can re-generate any bad clips by re-running `read.py` with the `--regenerate`\nargument.\n\n### API\n\nTortoise can be used programmatically, like so:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech()\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nTo use DeepSpeed:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(use_deepspeed=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nTo use the KV cache:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(kv_cache=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nTo run the model in float16:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(half=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\nFor faster runs, use all three:\n\n```python\nreference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]\ntts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)\npcm_audio = tts.tts_with_preset(\"your text here\", voice_samples=reference_clips, preset='fast')\n```\n\n## Acknowledgements\n\nThis project has garnered more praise than I expected. 
I am standing on the shoulders of giants, though, and I want to\ncredit a few of the amazing folks in the community that have helped make this happen:\n\n- Hugging Face, who wrote the GPT model and the generate API used by Tortoise, and who hosts the model weights.\n- [Ramesh et al](https://arxiv.org/pdf/2102.12092.pdf) who authored the DALLE paper, which is the inspiration behind Tortoise.\n- [Nichol and Dhariwal](https://arxiv.org/pdf/2102.09672.pdf) who authored (a revision of) the code that drives the diffusion model.\n- [Jang et al](https://arxiv.org/pdf/2106.07889.pdf) who developed and open-sourced univnet, the vocoder this repo uses.\n- [Kim and Jung](https://github.com/mindslab-ai/univnet) who implemented the univnet PyTorch model.\n- [lucidrains](https://github.com/lucidrains) who writes awesome open source PyTorch models, many of which are used here.\n- [Patrick von Platen](https://huggingface.co/patrickvonplaten) whose guides on setting up wav2vec were invaluable to building my dataset.\n\n## Notice\n\nTortoise was built entirely by the author (James Betker) using their own hardware. Their employer was not involved in any facet of Tortoise's development.\n\n## License\n\nTortoise TTS is licensed under the Apache 2.0 license.\n\nIf you use this repo or the ideas therein for your research, please cite it! A BibTeX entry can be found in the right pane on GitHub.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneonbjb%2Ftortoise-tts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneonbjb%2Ftortoise-tts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneonbjb%2Ftortoise-tts/lists"}