https://github.com/isabelleysseric/voice-cloning

Speech synthesis with conditioning on very small dataset. Using Nvidia's Tacotron2 and WaveGlow models with Pytorch.
https://github.com/isabelleysseric/voice-cloning

nvidia signal-processing speech-recognition speech-synthesis speechbrain tacotron2 text-to-speech tts waveglow

Last synced: 5 months ago
JSON representation

Speech synthesis with conditioning on very small dataset. Using Nvidia's Tacotron2 and WaveGlow models with Pytorch.

Host: GitHub
URL: https://github.com/isabelleysseric/voice-cloning
Owner: isabelleysseric
Created: 2023-12-20T07:14:42.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-09-10T21:29:29.000Z (10 months ago)
Last Synced: 2024-12-30T08:44:33.947Z (7 months ago)
Topics: nvidia, signal-processing, speech-recognition, speech-synthesis, speechbrain, tacotron2, text-to-speech, tts, waveglow
Language: Jupyter Notebook
Homepage:
Size: 28 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

Voice Cloning

**Note**: The `Voice_cloning_Training_with_Tacotron2_and_WaveGlow.ipynb` notebook is to be run in *Google Colab*. Once in Colab, you need to import the *data_cleaned.zip* dataset into the current folder `/content/`.
Replace the files in the folder `/content/TTS-TT2/filelists/` with my files that have the same name after installing Tacotron2. LThe rest of the code will take care of unzipping it and putting it in the new folder `/content/TTS-TT2/wavs/`.
The program will then ask you to load your transcription file. You will give it the `list.txt` file.

The files in the `input` folder are needed to give input to the speech synthesis model. They are also found at the root of the project. The wav files correspond to the zip file: `data_cleaned.zip` and the `list.txt`, `ljs_audio_text_val_filelists.txt`, `ljs_audio_text_val_filelists.txt` and `ljs_audio_text_val_filelists.txt`files are also found at the root of the project.
The files in the `output` folder are the results of the model, during and after training.

**TREE**:

[input](https://github.com/isabelleysseric/voice-cloning/tree/main/input)

- [filelists](https://github.com/isabelleysseric/voice-cloning/tree/main/input/wavs)
- `list.txt`
- `ljs_audio_text_test_filelists.txt`
- `ljs_audio_text_train_filelists.txt`
- `ljs_audio_text_val_filelists.txt`

- [wavs](https://github.com/isabelleysseric/voice-cloning/tree/main/input/audio)
- `1.npy`
- `1.wav`
- ...
- `60.npy`
- `60.wav`

[output](https://github.com/isabelleysseric/voice-cloning/tree/main/output)

- [audio](https://github.com/isabelleysseric/voice-cloning/tree/main/output/audio)
- `model_BS_6_0.00003_350epoch_0_original_audio.wav`
- `model_BS_6_0.00003_350epoch_0_predicted_audio.wav`
- ...
- `model_BS_6_0.00003_350epoch_20_original_audio.wav`
- `model_BS_6_0.00003_350epoch_20_predicted_audio.wav`

- `model_BS_6_0.00003_350signals_epoch_0.png`
- ...
- `model_BS_6_0.00003_350signals_epoch_20.png`

- [images](https://github.com/isabelleysseric/voice-cloning/tree/main/output/images)
- `model_BS_6_0.00003_350_Alignment_Epoch_0_Iteration_9_Validation_Loss_1.7767614126205444.png`
- ...
- `model_BS_6_0.00003_350_Alignment_Epoch_20_Iteration_189_Validation_Loss_1.0240533351898193.png`

- [logs](https://github.com/isabelleysseric/voice-cloning/tree/main/output/logs)
- `events.out.tfevents.1703405636.c8a2ca7defbc.1806.11`

- [loss](https://github.com/isabelleysseric/voice-cloning/tree/main/output/loss)
- `model_BS_6_0.00003_350loss_curve_epoch_0.png`
- ...
- `model_BS_6_0.00003_350loss_curve_epoch_22.png`

- [spectrogram](https://github.com/isabelleysseric/voice-cloning/tree/main/output/spectrogram)
- `model_BS_6_0.00003_350spectrograms_epoch_0.png`
- ...
- `model_BS_6_0.00003_350spectrograms_epoch_20.png`

[Voice_cloning_Training_with_Tacotron2_and_WaveGlow.ipynb](https://github.com/isabelleysseric/voice-cloning/blob/main/Voice_cloning_Training_with_Tacotron2_and_WaveGlow.ipynb)
[MLSP Presentation_Clonage_de_la_voix.pdf](https://github.com/isabelleysseric/voice-cloning/blob/main/MLSP%20Presentation_Clonage_de_la_voix.pdf)
[MLSP_Rapport_Clonage_de_la_voix.pdf](https://github.com/isabelleysseric/voice-cloning/blob/main/MLSP_Rapport_Clonage_de_la_voix.pdf)
[README.md](https://github.com/isabelleysseric/voice-cloning/blob/main/README.md)
[data_cleaned.zip](https://github.com/isabelleysseric/voice-cloning/blob/main/data_cleaned.zip)
[list.txt](https://github.com/isabelleysseric/voice-cloning/blob/main/list.txt)
[ljs_audio_text_test_filelist.txt](https://github.com/isabelleysseric/voice-cloning/blob/main/ljs_audio_text_test_filelist.txt)
[ljs_audio_text_train_filelist.txt](https://github.com/isabelleysseric/voice-cloning/blob/main/ljs_audio_text_train_filelist.txt)
[ljs_audio_text_val_filelist.txt](https://github.com/isabelleysseric/voice-cloning/blob/main/ljs_audio_text_val_filelist.txt)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/isabelleysseric/voice-cloning

Awesome Lists containing this project

README

Voice Cloning