https://github.com/skulux/voicetral
This repository contains an amateur implementation of an interface between the Ollama model and Applio's TTS and voice conversion services. It serves as a basic example of integrating speech recognition, text generation, and audio processing for personal or experimental use.
https://github.com/skulux/voicetral
ai applio audio-processing conversational-ai local natural-language ollama rvc speech-to-text stt text-generation text-to-speech tts voicetral
Last synced: 5 months ago
JSON representation
This repository contains an amateur implementation of an interface between the Ollama model and Applio's TTS and voice conversion services. It serves as a basic example of integrating speech recognition, text generation, and audio processing for personal or experimental use.
- Host: GitHub
- URL: https://github.com/skulux/voicetral
- Owner: Skulux
- License: mit
- Created: 2024-09-17T16:24:51.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-06T19:31:56.000Z (over 1 year ago)
- Last Synced: 2025-03-30T19:23:02.612Z (about 1 year ago)
- Topics: ai, applio, audio-processing, conversational-ai, local, natural-language, ollama, rvc, speech-to-text, stt, text-generation, text-to-speech, tts, voicetral
- Language: Python
- Homepage:
- Size: 16.6 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
README
# Voicetral
## Overview
This project provides an interface between the Ollama model and a text-to-speech (TTS) engine using Suno's Bark. It converts user speech input into text, generates responses using Ollama, and then synthesizes and plays back the response using Bark.
## Features
- Speech-to-text conversion using `speech_recognition`.
- Text generation using the Ollama model.
- Text-to-speech conversion using Bark.
- Audio playback using `sounddevice`.
- Audio resampling and processing with `pydub`.
## Requirements
### Software Dependencies
- Python 3.9
- [FFmpeg](https://ffmpeg.org/download.html) (for audio processing)
- **Ollama**: A model service for text generation. [Visit Ollama's website](https://ollama.com) for installation and usage instructions.
- **Bark**: Used for text-to-speech synthesis.
### Python Packages
The required Python packages are listed in `requirements.txt`. To install them, use the following command:
pip install -r requirements.txt
### Configuration
1. **FFmpeg**: Ensure that FFmpeg is installed and accessible in your system's PATH. You can download FFmpeg from [here](https://ffmpeg.org/download.html) and follow the installation instructions for your operating system.
2. **Ollama**: Install and run the Ollama service according to the instructions on their website. Make sure it's accessible at the specified URL.
3. **Bark**: Run the provided `setup_bark.sh` script to download the required models.
4. **Configuration File**: Update the `config.ini` file with the appropriate paths and settings for your environment.
- `START_PROMPT`: Your initial prompt for the Ollama model.
- `OLLAMA_MODEL`: The name of the Ollama model to use.
- `BARK_VOICE_PRESET`: Voice preset to use with Bark.
- `TTS_OUTPUT_PATH`: Path where the TTS output will be saved.
## Installation
1. Clone the repository:
```bash
git clone https://github.com/Skulux/Voicetral
cd Voicetral
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```
3. Install the required packages:
```bash
pip install -r requirements.txt
```
4. Run the Bark setup script to download the models:
```bash
./setup_bark.sh
```
5. Ensure FFmpeg is installed and properly configured in your PATH.
6. Install and start the Ollama service as per its instructions.
## Usage
1. Configure your `config.ini` file with the necessary settings as described in the Configuration section.
2. Run the main script:
```bash
python main.py
```
3. Follow the on-screen prompts. Speak into your microphone to interact with the bot.
4. Say "exit" to stop the program. It is important if you want to save your conversation history.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
Feel free to submit issues or pull requests if you have suggestions or improvements. For significant changes, please open an issue first to discuss what you would like to change.
## Contact
For questions or feedback, please contact github@petrilionis.lt or open an issue on the project's GitHub repository.
## External Services
- **Ollama**: [Installation and usage instructions](https://ollama.com)