https://github.com/darthph0enix7/tts_and_speech_recognition

Host: GitHub
URL: https://github.com/darthph0enix7/tts_and_speech_recognition
Owner: Darthph0enix7
Created: 2024-08-15T16:10:34.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-09-07T19:41:42.000Z (10 months ago)
Last Synced: 2025-02-15T20:51:37.528Z (5 months ago)
Language: Jupyter Notebook
Size: 23.6 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Voice Synthesizer - TTSStreamer

This project is a text-to-speech (TTS) synthesizer that clones a voice from a short audio clip of the target speaker, for example a famous personality such as Morgan Freeman. It generates speech in the style of the target voice using a custom-trained model and streams the generated audio in real time with buffering. The project uses `pydub` for audio processing and `TTS` models for speech synthesis.

## Features

- **Clone Voice:** Synthesize a target voice based on a short audio clip.

- **Streaming Audio Playback:** Play generated audio with real-time buffering.

- **Adjustable Speed & Language:** Modify the speed and language of the generated audio.

- **GPU Support:** Leverage CUDA for faster audio generation using GPUs.
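
The GPU support item above can be illustrated with a small device check. This is only a sketch, assuming PyTorch is the backend (it is listed in the install step); `pick_device` is a hypothetical helper name and is not taken from this project's code.

```python
def pick_device():
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu".

    Hypothetical helper for illustration; the project may choose its
    device differently.
    """
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Audio generation would run on: {pick_device()}")
```

Falling back to `"cpu"` keeps the rest of the pipeline usable on machines without a GPU, at the cost of slower generation.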

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/darthph0enix7/tts_and_speech_recognition.git
   cd tts_and_speech_recognition
   ```

2. Install dependencies:

   ```bash
   pip install pydub soundfile torch TTS
   ```

3. Download or prepare your pre-trained TTS model and place it in the `XTTS-v2` directory.
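
Before running the examples, it may help to confirm that the model directory from step 3 is populated. The sketch below assumes only the two file names that appear in the usage example (`config.json` and `vocab.json`); the checkpoint file name is not stated in this README, so it is not checked. `missing_model_files` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def missing_model_files(model_dir="XTTS-v2", required=("config.json", "vocab.json")):
    """Return the names in `required` that are not regular files in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```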

## Usage

### Initialize TTSStreamer

The `TTSStreamer` class handles the loading of the TTS model and the generation of audio from text.

```python
from tts_streamer import TTSStreamer

tts_streamer = TTSStreamer(
    model_path="XTTS-v2",
    config_path="XTTS-v2/config.json",
    vocab_path="XTTS-v2/vocab.json"
)
```
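
For readers curious what the model loading and audio generation might look like underneath, here is a rough sketch based on the Coqui TTS XTTS interface. It is an assumption that `TTSStreamer` works this way; the helper names `load_xtts` and `stream_chunks` are hypothetical, and the code needs the `TTS` package plus model files in `XTTS-v2` to actually run.

```python
def load_xtts(model_dir="XTTS-v2", device="cpu"):
    """Load an XTTS checkpoint from `model_dir` (requires the `TTS` package)."""
    # Imported lazily so this module can be imported without TTS installed.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(f"{model_dir}/config.json")
    model = Xtts.init_from_config(config)
    model.load_checkpoint(
        config,
        checkpoint_dir=model_dir,
        vocab_path=f"{model_dir}/vocab.json",
        eval=True,
    )
    model.to(device)
    return model


def stream_chunks(model, text, speaker_path, language="en"):
    """Yield audio chunks for `text` in the voice of the clip at `speaker_path`."""
    # Derive the voice conditioning from the reference clip.
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=[speaker_path]
    )
    # inference_stream yields audio tensors as they are generated.
    yield from model.inference_stream(
        text, language, gpt_cond_latent, speaker_embedding
    )
```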

### Stream Text as Audio

Provide text input to stream audio with real-time playback.

```python
text = """
A series of extracts from Middle English texts of different dates and types...
"""

tts_streamer.stream_audio_with_buffering(
    text,
    language="en",
    speed=1.3,
    speaker_path="XTTS-v2/samples/morgan_freeman.wav",
    fireup_delay=5.0
)
```

This generates audio in the cloned voice, buffers it, and starts playback after a 5-second delay (`fireup_delay=5.0`).
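
The buffering idea can be sketched with a producer thread that fills a queue while the main thread waits for the start-up delay and then plays chunks in order. This is an illustration using only the standard library, not the project's implementation; `stream_with_buffering`, `chunks`, and `play` are hypothetical names, and real audio output is replaced by an arbitrary callable.

```python
import queue
import threading
import time


def stream_with_buffering(chunks, play, fireup_delay=0.0):
    """Generate chunks in a background thread and play them in order.

    `chunks` is any iterable of audio chunks (a stand-in for model output);
    `play` is a callable that plays one chunk. Returns the number played.
    """
    buffer = queue.Queue()
    done = object()  # sentinel marking the end of generation

    def producer():
        try:
            for chunk in chunks:
                buffer.put(chunk)
        finally:
            buffer.put(done)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    # Give the producer a head start so playback is less likely to stall.
    time.sleep(fireup_delay)

    played = 0
    while True:
        chunk = buffer.get()  # blocks until the next chunk is ready
        if chunk is done:
            break
        play(chunk)
        played += 1
    worker.join()
    return played


if __name__ == "__main__":
    stream_with_buffering(["chunk-1", "chunk-2", "chunk-3"], print, fireup_delay=0.1)
```

A longer `fireup_delay` trades start-up latency for a larger head start, which matters when generation is slower than playback.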

## Project Structure

- `TTSStreamer`: Main class that handles model loading, text processing, and audio generation.

- `XTTS-v2/`: Directory for storing model files, configuration, and vocabulary.

- `samples/`: Example audio files used to clone voices.

## Requirements

- Python 3.8+

- [PyDub](https://github.com/jiaaro/pydub)

- [TTS](https://github.com/coqui-ai/TTS)

- A GPU with CUDA support (recommended)

## Example

Here’s an example of the generated output using Morgan Freeman’s voice:

- **Original Clip**: `samples/morgan_freeman_original.mp3`

- **Cloned Voice**: `samples/morgan_freeman_cloned.mp3`

## License

This project is licensed under the MIT License.

## Acknowledgements

Special thanks to the developers of the [Coqui TTS](https://github.com/coqui-ai/TTS) framework for providing open-source tools for text-to-speech synthesis.
