Projects in Awesome Lists tagged with voice-cloning

https://github.com/corentinj/real-time-voice-cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

deep-learning python pytorch tensorflow tts voice-cloning

Last synced: 22 Apr 2025

https://github.com/CorentinJ/Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

deep-learning python pytorch tensorflow tts voice-cloning

Last synced: 14 Mar 2025

https://github.com/rvc-boss/gpt-sovits

Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

text-to-speech tts vits voice-clone voice-cloneai voice-cloning

Last synced: 22 Apr 2025

https://github.com/coqui-ai/tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

deep-learning glow-tts hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis tacotron text-to-speech tts tts-model vocoder voice-cloning voice-conversion voice-synthesis

Last synced: 22 Apr 2025

https://github.com/coqui-ai/TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

deep-learning glow-tts hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis tacotron text-to-speech tts tts-model vocoder voice-cloning voice-conversion voice-synthesis

Last synced: 14 Mar 2025

https://github.com/RVC-Boss/GPT-SoVITS

Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

text-to-speech tts vits voice-clone voice-cloneai voice-cloning

Last synced: 24 Mar 2025

https://github.com/funaudiollm/cosyvoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

audio-generation cantonese chatbot chatgpt chinese cosyvoice cross-lingual english fine-grained fine-tuning gpt-4o japanese korean multi-lingual natural-language-generation python text-to-speech tts voice-cloning

Last synced: 22 Apr 2025

https://github.com/huanshere/videolingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click, fully automated AI video subtitle team

ai-translation dubbing localization video-translation voice-cloning

Last synced: 23 Apr 2025

https://github.com/paddlepaddle/paddlespeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper

Last synced: 21 Apr 2025

https://github.com/PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

asr code-switch conformer kws punctuation-restoration self-supervised-learning sound-classification speech-alignment speech-recognition speech-synthesis speech-translation streaming-asr streaming-tts transformer tts vocoder voice-cloning voice-recognition wav2vec2 whisper

Last synced: 24 Mar 2025

https://github.com/drewthomasson/ebook2audiobook

Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages!

audiobooks chinese colab-notebook docker english epub gradio kaggle linux mac multilingual tts voice-cloning windows xtts

Last synced: 23 Apr 2025

https://github.com/FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

audio-generation cantonese chatbot chatgpt chinese cosyvoice cross-lingual english fine-grained fine-tuning gpt-4o japanese korean multi-lingual natural-language-generation python text-to-speech tts voice-cloning

Last synced: 24 Mar 2025

https://github.com/Huanshere/VideoLingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click, fully automated AI video subtitle team

ai-translation dubbing localization video-translation voice-cloning

Last synced: 18 Dec 2024

https://github.com/multimodal-art-projection/yue

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

ai audio-generation deep-learning foundation-models gpt huggingface llama llms music-generation style-transfers voice-cloning

Last synced: 15 Apr 2025

https://github.com/abus-aikorea/voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

audiobook faster-whisper gradio karaoke podcasts speech-recognition speech-synthesis speech-to-text subtitles text-to-speech transcription translator tts voice-cloning voice-conversion webui whisper whisperx yt-dlp

Last synced: 13 Apr 2025

https://github.com/camb-ai/mars5-tts

MARS5 speech model (TTS) from CAMB.AI

prosody speech speech-synthesis text-to-speech voice-cloneai voice-cloning

Last synced: 11 Apr 2025

https://github.com/iahispano/applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

ai applio pytorch rvc speech speech-to-speech text-to-speech tts vc vits voice voice-clone voice-cloning voice-conversion

Last synced: 10 Apr 2025

https://github.com/IAHispano/Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.

ai applio pytorch rvc speech speech-to-speech text-to-speech tts vc vits voice voice-clone voice-cloning voice-conversion

Last synced: 14 Nov 2024

https://github.com/coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

speech-emotion-recognition speech-processing speech-recognition speech-separation speech-synthesis speech-to-text stt text-to-speech tts voice-activity-detection voice-cloning voice-recognition

Last synced: 26 Mar 2025

https://github.com/DrewThomasson/ebook2audiobook

Generates an audiobook with chapters and ebook metadata using Calibre and XTTS from Coqui TTS, with optional voice cloning and support for multiple languages

audiobooks chinese docker english epub gradio linux mac multilingual tts voice-cloning windows xtts

Last synced: 15 Dec 2024

https://github.com/gitmylo/audio-webui

A web UI for various audio-related neural networks

ai aio all-in-one artificial-intelligence audiocraft audioldm bark bark-gui generative-audio generative-music music rvc rvc-gui text-to-audio text-to-speech tts voice-cloning

Last synced: 14 Mar 2025

https://github.com/Tomiinek/Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.

code-switching multilingual speech-synthesis text-to-speech tts voice-cloning

Last synced: 03 Apr 2025

https://github.com/wladradchenko/wunjo.wladradchenko.ru

Wunjo AI: Synthesize & clone voices in English, Russian & Chinese, real-time speech recognition, deepfake face & lips animation, face swap with one photo, change video by text prompts, segmentation, and retouching. Open-source, local & free.

controlnet deepfake deepfake-emotion deepfakes diffusion face-swap face-swapping free image-animation retouching-video segment-anything tacotron2 talking-face talking-face-generation talking-head tts vid2vid voice-cloning voice-recognition wunjo

Last synced: 17 Nov 2024

https://github.com/PlayVoice/lora-svc

Singing voice conversion based on Whisper, with LoRA for singing voice cloning

lora singing-voice-conversion speech-to-sing uni-svc vits vits-svc voice-change voice-cloning voice-conversion whisper

Last synced: 11 Apr 2025

https://github.com/playvoice/lora-svc

Singing voice conversion based on Whisper, with LoRA for singing voice cloning

lora singing-voice-conversion speech-to-sing uni-svc vits vits-svc voice-change voice-cloning voice-conversion whisper

Last synced: 04 Apr 2025

https://github.com/jackaduma/CycleGAN-VC2

Voice Conversion by CycleGAN (voice cloning/voice conversion): CycleGAN-VC2

aigc cyclegan cyclegan-vc cyclegan-vc2 deep-learning deeplearning gan pix2pix pytorch-implementation speech-synthesis voice-cloning voice-conversion

Last synced: 11 Apr 2025

https://github.com/jackaduma/cyclegan-vc2

Voice Conversion by CycleGAN (voice cloning/voice conversion): CycleGAN-VC2

aigc cyclegan cyclegan-vc cyclegan-vc2 deep-learning deeplearning gan pix2pix pytorch-implementation speech-synthesis voice-cloning voice-conversion

Last synced: 05 Apr 2025

https://github.com/vlomme/Multi-Tacotron-Voice-Cloning

Phoneme multilingual (Russian-English) voice cloning based on

deep-learning g2p pytorch russian tacotron tensorflow tts voice-cloning wavernn

Last synced: 27 Nov 2024

https://github.com/HKoon/ChatTTS-OpenVoice

Fuse ChatTTS with OpenVoice, upload a 10-second audio clip, and clone your personalized ChatTTS voice.

chattts openvoice tts voice-clone voice-cloning

Last synced: 11 Apr 2025

https://github.com/boltzmannentropy/xtts2-ui

A User Interface for XTTS-2 Text-Based Voice Cloning using only 10 seconds of speech

coqui-tts streamlit tts voice-cloning

Last synced: 12 Apr 2025

https://github.com/lukaszliniewicz/Pandrator

Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.

audiobook audiobook-creator audiobook-maker audiobooks customtkinterprojects dubbing llm pdf-to-audio rvc silero subtitle-to-speech subtitle-to-voice text-processing text-to-speech tkinter-gui voice-clone voice-cloning voicecraft xtts xttsv2

Last synced: 25 Jan 2025

https://github.com/FlorianEagox/WeeaBlind

A program to dub non-English media with modern AI speech synthesis, diarization, and voice cloning!

a11y accessibility anime blindness diariz dubbing python tts voice-cloning

Last synced: 20 Nov 2024

https://github.com/CMsmartvoice/One-Shot-Voice-Cloning

☺ One Shot Voice Cloning based on Unet-TTS

one-shot style-transfer tts voice-cloning

Last synced: 27 Apr 2025

https://github.com/MiniMax-AI/MiniMax-MCP

Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.

image-generation image-to-video mcp mcp-server mcp-tools text-to-image text-to-speech text-to-video video-generation voice-cloning

Last synced: 27 Apr 2025

https://github.com/dunky11/voicesmith

[WIP] VoiceSmith makes training text to speech models easy.

dataset-manager delightfultts preprocessing speech-synthesis text-to-speech toolkit tts univnet voice-cloning

Last synced: 10 Jan 2025

https://github.com/aifsh/comfyui-gpt_sovits

A ComfyUI custom node for GPT-SoVITS! You can now do voice cloning and TTS in ComfyUI.

gpt-sovits tts voice-cloning

Last synced: 14 Apr 2025

https://github.com/smoke-trees/voice-synthesis

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model trained to generalize to new voices.

keras pytorch-implementation speech-to-text sv2tts tensorflow voice-cloning voice-synthesis

Last synced: 22 Nov 2024

https://github.com/jackaduma/cyclegan-vc3

Voice Conversion by CycleGAN (voice cloning/voice conversion): CycleGAN-VC3

aigc cyclegan cyclegan-vc cyclegan-vc2 cyclegan-vc3 gan pytorch pytorch-implementation voice-cloning voice-conversion

Last synced: 27 Apr 2025

https://github.com/adiksondev/youtranslate

Takes a YouTube video, clones the voice, and re-creates the video in a different language

ai collaborate elevenlabs-api github localization-tool translation voice-cloning voice-recognition youtube

Last synced: 24 Jan 2025

https://github.com/olaviinha/neuraltexttoaudio

Text prompt steered synthetic audio generators

audio audio-generation audio-processing audio-synthesis audioldm colab colab-notebook mubert mubertai music-generation text2audio text2music voice-cloning voice-synthesis

Last synced: 14 Jan 2025

https://github.com/gooofy/zerovox

Zero-shot real-time TTS system, fully offline, free and open source

deep-learning hifigan melgan multi-speaker-tts python pytorch speaker-encoder speaker-encodings speech speech-synthesis text-to-speech tts tts-model voice-cloning voice-synthesis

Last synced: 14 Apr 2025

https://github.com/pnkvalavala/digitaltwin

Using a single image and just 10 seconds of sample audio, our project enables you to create a video where it appears as if you're speaking the desired text.

audio-driven-talking-face deep-fake talking-face-generation text-to-speech voice-cloning

Last synced: 25 Jan 2025

https://github.com/ttop32/coqui_tts_korea

Korean TTS using Coqui TTS (Glow-TTS and Multi-band MelGAN)

coqui coqui-ai deep-learning glow-tts half-life korea korean korean-language korean-letters korean-text-processing korean-tokenizer korean-tts multiband-melgan pytorch speech speech-synthesis text-to-speech tts vocoder voice-cloning

Last synced: 11 Nov 2024

https://github.com/nateraw/voice-cloning

Make Kanye sing any song ya want 🎤🔥

gradio huggingface kanye so-vits-svc voice-cloning voice-conversion

Last synced: 23 Apr 2025

https://github.com/pnkvalavala/multivoice

Multivoice: Enhance your foreign-language movie and TV show experience with personalized dubbed versions. Our project uses voice cloning and TTS to deliver natural and engaging dubbed dialogue for a seamless viewing adventure.

elevenlabs movies openai text-to-speech translation tv-shows voice-cloning

Last synced: 25 Jan 2025

https://github.com/ardagnsrn/elevenlabs-laravel

This is an Open Source PHP Laravel package for ElevenLabs Text to Speech API.

ai ai-speech ai-tts elevenlabs elevenlabs-api elevenlabs-laravel elevenlabs-php laravel text-to-speech tts tts-ai tts-api voice-cloneai voice-cloning

Last synced: 12 Apr 2025

https://github.com/adhadse/deepdubpy

A complete end-to-end Deep Learning system to generate high-quality, human-like speech in English for Korean Drama (WIP)

cross-language deep-learning korean-drama machine-learning speech-sythesis tensorflow text-to-speech voice-cloning

Last synced: 17 Dec 2024

https://github.com/aryanvbw/aivoiceclone

Transform Your Voice: Replicate Your Unique Sound in a Pristine Pre-Trained Model and Cultivate Your Custom Voiceprint

ai ai-tools artificial-intelligence aryanshop aryanvbw audio-processing clonevoice vivek voice-cloning voice-imitation

Last synced: 14 Apr 2025

https://github.com/expectopatronm/Realtime-voice-cloning-as-a-microservice

SV2TTS as a Microservice (FastAPI endpoint)

fast-api voice-cloning

Last synced: 11 Apr 2025

https://github.com/ardagnsrn/elevenlabs-js

This is an Open Source NodeJS package for ElevenLabs Text to Speech API.

ai ai-speech ai-tts elevenlabs elevenlabs-api elevenlabs-js elevenlabs-node text-to-speech tts tts-ai tts-api voice-cloneai voice-cloning

Last synced: 12 Apr 2025

https://github.com/iahispano/applio-api

Robust functionality, focused on granting convenient access to AI models developed using the RVC technology.

ai api applio rvc vc voice voice-clone voice-cloning

Last synced: 12 Apr 2025

https://github.com/mobile-artificial-intelligence/babylon.cpp

Babylon.cpp is a C and C++ library for grapheme-to-phoneme conversion and text-to-speech synthesis. For phonemization, an ONNX Runtime port of the DeepPhonemizer model is used. For speech synthesis, VITS models are used. Piper models are compatible after a conversion script is run.

11labs artificial-intelligence deep-phonemizer elevenlabs g2p grapheme-to-phoneme neural-tts onnx onnx-models onnx-runtime onnxruntime phonemization test-to-speech tts vits voice-cloning

Last synced: 19 Apr 2025

https://github.com/isaiahbjork/csm-voice-cloning

Sesame CSM 1B Voice Cloning

ai modal python voice-cloning

Last synced: 15 Mar 2025

https://github.com/codename0og/codename-rvc-fork-3

Codename's RVC fork, version 3, based on Applio.

ai applio pytorch speech speech-to-speech text-to-speech tts vc vits voice voice-cloning voice-conversion

Last synced: 29 Dec 2024

https://github.com/hrishikesh-gavai/nerv-translate

Problem Statement: Developing A Software For Dubbing Videos.

2024 cloning dubbing imitation problem-statement project python smart-india-hackathon text-to-speech voice-cloning voice-imitation voice-recognition

Last synced: 11 Apr 2025

https://github.com/richardn2002/shizuka-app

Data curation, training and deployment of VITS model(s) of 好本静, Yoshimoto Shizuka, from 君のことが大大大大大好きな100人.

shizuka tts vits voice-cloning yoshimoto

Last synced: 11 Apr 2025

https://github.com/deezer/real-cloned-singer-id

Repository for the ISMIR 2024 Paper "From Real to Cloned Singer Identification".

deepfake-detection music-information-retrieval singer-id source-separation voice-cloning

Last synced: 10 Feb 2025

https://github.com/veeeetzzzz/mars5-tts

Python implementation for the MARS5 TTS repo that allows you to clone a voice with a command line interface.

text-to-speech voice-cloning

Last synced: 11 Jan 2025

https://github.com/thismodernday/f5-tts

F5-TTS is a web application that allows users to clone voices and generate text-to-speech audio using advanced AI models.

ai tts voice-cloning

Last synced: 20 Nov 2024

https://github.com/coffee-expert/intelligent-transspeaker-

A service designed to translate speeches in multimedia using AI and ML voice cloning technology.

landing-page transcription translation voice-cloning

Last synced: 16 Mar 2025

https://github.com/spaceforgets-code/ai-voice-cloning-tool

A script that clones voices using AI and deep learning.

ai artificial-neural-networks clone clonevoice dub flask generative-ai script-generation speech-recognition voice-clone voice-cloneai voice-cloning voice-model voiceclone

Last synced: 07 Mar 2025

https://github.com/flo-bit/youtube-speaker-separation

Simple Python script that outputs separate audio files for each speaker in a YouTube video, using Whisper on Replicate

speaker-diarization speech-to-text text-to-speech voice-cloning whisper youtube

Last synced: 06 Apr 2025

https://github.com/falkyn7/text-toolkit

Advanced MCP server providing comprehensive text transformation and formatting tools. TextToolkit offers over 40 specialized utilities for case conversion, encoding/decoding, formatting, analysis, and text manipulation - all accessible directly within your AI assistant workflow.

bert conformer glow-tts library melgan multi-speaker-tts nltk speech tacotron text-to-speech tts-model tty voice-cloning whisper