{"id":18936908,"url":"https://github.com/boltzmannentropy/xtts2-ui","last_synced_at":"2025-05-16T08:00:21.714Z","repository":{"id":209165733,"uuid":"723377084","full_name":"BoltzmannEntropy/xtts2-ui","owner":"BoltzmannEntropy","description":"A User Interface for XTTS-2 Text-Based Voice Cloning using only 10 seconds of speech","archived":false,"fork":false,"pushed_at":"2024-12-06T16:51:50.000Z","size":5246,"stargazers_count":337,"open_issues_count":17,"forks_count":52,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-12T04:47:57.714Z","etag":null,"topics":["coqui-tts","streamlit","tts","voice-cloning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BoltzmannEntropy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-25T13:29:03.000Z","updated_at":"2025-04-11T09:02:36.000Z","dependencies_parsed_at":"2024-01-25T17:29:25.650Z","dependency_job_id":"b8f7291b-cc9b-49fb-8467-928de739abf4","html_url":"https://github.com/BoltzmannEntropy/xtts2-ui","commit_stats":null,"previous_names":["boltzmannentropy/xtts2-ui"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BoltzmannEntropy%2Fxtts2-ui","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BoltzmannEntropy%2Fxtts2-ui/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BoltzmannEntropy%2Fxtts2-ui/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/BoltzmannEntropy%2Fxtts2-ui/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BoltzmannEntropy","download_url":"https://codeload.github.com/BoltzmannEntropy/xtts2-ui/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254493382,"owners_count":22080126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coqui-tts","streamlit","tts","voice-cloning"],"created_at":"2024-11-08T12:09:12.774Z","updated_at":"2025-05-16T08:00:21.506Z","avatar_url":"https://github.com/BoltzmannEntropy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# XTTS-2-UI: A User Interface for XTTS-2 Text-Based Voice Cloning\n\nThis repository contains the essential code for cloning any voice using just text and a 10-second audio sample of the target voice. XTTS-2-UI is simple to set up and use. [Example Results 🔊](#sample-audio-examples)\n\nWorks in [16 languages](#language-support) and has built-in voice recording/uploading. \nNote: Don't expect EL-level quality; it is not there yet. \n\n## Model \nThe model used is `tts_models/multilingual/multi-dataset/xtts_v2`. 
For more details, refer to [Hugging Face - XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and its specific version [XTTS-v2 Version 2.0.2](https://huggingface.co/coqui/XTTS-v2/tree/v2.0.2).\n\n\u003ch1 align=\"center\"\u003e    \n  \u003cimg src=\"demo_info/ui.png\" width=\"100%\"\u003e  \n\u003c/h1\u003e\n\n## Table of Contents\n\n- [XTTS-2-UI: A User Interface for XTTS-2 Text-Based Voice Cloning](#xtts-2-ui-a-user-interface-for-xtts-2-text-based-voice-cloning)\n  - [Model](#model)\n  - [Table of Contents](#table-of-contents)\n  - [Setup](#setup)\n  - [Inference](#inference)\n  - [Target Voices Dataset](#target-voices-dataset)\n  - [Sample Audio Examples:](#sample-audio-examples)\n  - [Language Support](#language-support)\n  - [Notes](#notes)\n  - [Credits](#credits)\n\n## Setup\n\nTo set up this project, follow these steps in a terminal:\n\n1. **Clone the Repository**\n\n    - Clone the repository to your local machine.\n      ```bash\n      git clone https://github.com/pbanuru/xtts2-ui.git\n      cd xtts2-ui\n      ```\n\n2. **Create a Virtual Environment:**\n   - Run the following command to create a Python virtual environment:\n     ```bash\n     python -m venv venv\n     ```\n   - Activate the virtual environment:\n     - Windows:\n       ```bash\n       # cmd prompt\n       venv\\Scripts\\activate\n       ```\n       or\n       \n       ```bash\n       # git bash\n       source venv/Scripts/activate\n       ```\n     - Linux/Mac:\n       ```bash\n       source venv/bin/activate\n       ```\n\n3. 
**Install PyTorch:**\n   \n   - If you have an NVIDIA CUDA-enabled GPU, choose the appropriate PyTorch installation command:\n     - Before installing PyTorch, check your CUDA version by running:\n       ```bash\n       nvcc --version\n       ```\n     - For CUDA 12.1:\n       ```bash\n       pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121\n       ```\n     - For CUDA 11.8:\n       ```bash\n       pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n       ```\n   - If you don't have a CUDA-enabled GPU:\n     Follow the instructions on the [PyTorch website](https://pytorch.org/get-started/locally/) to install the appropriate version of PyTorch for your system.\n\n4. **Install Other Required Packages:**\n   - Install direct dependencies:\n     ```bash\n     pip install -r requirements.txt\n     ```\n   - Upgrade the TTS package to the latest version:\n     ```bash\n     pip install --upgrade TTS\n     ```\n\nAfter completing these steps, your setup should be complete and you can start using the project.\n\nModels will be downloaded automatically upon first use.\n\nDownload paths:\n- macOS: `/Users/USR/Library/Application Support/tts/tts_models--multilingual--multi-dataset--xtts_v2`\n- Windows: `C:\\Users\\YOUR-USER-ACCOUNT\\AppData\\Local\\tts\\tts_models--multilingual--multi-dataset--xtts_v2`\n- Linux: `/home/${USER}/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2`\n\n\n\n## Inference\nTo run the application:\n\n```\npython app.py\n# or\nstreamlit run app2.py\n```\nAlternatively, you can run from the terminal: provide sample input texts in `texts.json` to generate multiple audio files with multiple speakers (you may need to adjust `appTerminal.py`).\n```\npython appTerminal.py\n```\nOn initial use, you will need to agree to the terms:\n\n```\n[XTTS] Loading XTTS...\n \u003e tts_models/multilingual/multi-dataset/xtts_v2 has been updated, clearing model 
cache...\n \u003e You must agree to the terms of service to use this model.\n | \u003e Please see the terms of service at https://coqui.ai/cpml.txt\n | \u003e \"I have read, understood and agreed to the Terms and Conditions.\" - [y/n]\n | | \u003e\n ```\n\nIf your model is re-downloaded on each run, please consult [Issue 4723 on GitHub](https://github.com/oobabooga/text-generation-webui/issues/4723#issuecomment-1826120220).\n\n## Target Voices Dataset\nThe dataset consists of a single folder named `targets`, pre-populated with several voices for testing purposes.\n\nTo add more voices without going through the GUI, create a 24 kHz WAV file of approximately 10 seconds and place it under the `targets` folder. \nYou can use yt-dlp to download a voice from YouTube for cloning:\n```\nyt-dlp -x --audio-format wav \"https://www.youtube.com/watch?\"\n```\n\n\n## Sample Audio Examples:\n\n| Language | Audio Sample Link |\n|----------|-------------------|\n| English  | [▶️](demo_info/Rogger_sample_en.wav) |\n| Russian  | [▶️](demo_info/Rogger_sample_ru.wav) |\n| Arabic   | [▶️](demo_info/Rogger_sample_aa.wav) |\n\n## Language Support\nArabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Italian, Japanese [(see Notes)](#notes), Korean, Polish, Portuguese, Russian, Spanish, Turkish\n\n## Notes\nIf you would like to select **Japanese** as the target language, you must install a dictionary.\n```bash\n# Lite version\npip install fugashi[unidic-lite]\n```\nor for more serious processing:\n```bash\n# Full version\npip install fugashi[unidic]\npython -m unidic download\n```\nMore details [here](https://github.com/polm/fugashi#installing-a-dictionary).\n\n\n## Credits\n1. 
Heavily based on https://github.com/kanttouchthis/text_generation_webui_xtts/ \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboltzmannentropy%2Fxtts2-ui","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboltzmannentropy%2Fxtts2-ui","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboltzmannentropy%2Fxtts2-ui/lists"}