Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/KoljaB/LocalAIVoiceChat
Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.
chatbot python realtime
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/KoljaB/LocalAIVoiceChat
- Owner: KoljaB
- License: other
- Created: 2023-11-04T18:55:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-12T08:56:33.000Z (6 months ago)
- Last Synced: 2024-08-12T10:13:23.153Z (6 months ago)
- Topics: chatbot, python, realtime
- Language: Python
- Homepage:
- Size: 1.42 MB
- Stars: 435
- Watchers: 9
- Forks: 44
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-local-llms - LocalAIVoiceChat
README
# Local AI Voice Chat
Talk with an AI in real time, completely locally on your PC, with a customizable AI personality and voice.
> **Hint:** *Anybody interested in state-of-the-art voice solutions please also have a look at [Linguflex](https://github.com/KoljaB/Linguflex). It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.*
> **Note:** If you run into the error 'General synthesis error: isin() received an invalid combination of arguments', it is caused by an incompatibility between newer versions of the transformers library and Coqui TTS (see [here](https://github.com/KoljaB/RealtimeTTS/issues/85)). Either downgrade to an older transformers version with `pip install transformers==4.38.2`, or upgrade RealtimeTTS with `pip install realtimetts==0.4.1`.
## About the Project
Integrates the powerful Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries to create a fast and engaging voice-based local chatbot.
https://github.com/KoljaB/LocalAIVoiceChat/assets/7604638/cebacdad-8a57-4a03-bfd1-a469730dda51
> **Hint:** If you run into problems installing llama.cpp, please also have a look at my [LocalEmotionalAIVoiceChat project](https://github.com/KoljaB/LocalEmotionalAIVoiceChat). It includes emotion-aware real-time text-to-speech output and offers multiple LLM provider options, so you can also use it with different AI models.
## Tech Stack
- **[llama_cpp](https://github.com/ggerganov/llama.cpp)** with Zephyr 7B
    - library interface for llama-based language models
- **[RealtimeSTT](https://github.com/KoljaB/RealtimeSTT)** with faster_whisper
    - real-time speech-to-text transcription library
- **[RealtimeTTS](https://github.com/KoljaB/RealtimeTTS)** with Coqui XTTS
    - real-time text-to-speech synthesis library
## Notes
This software is in an experimental alpha state and does not provide production-ready stability. The current XTTS model used for synthesis still has glitches, and Zephyr, while really good for a 7B model, of course cannot compete with the answer quality of GPT-4, Claude or Perplexity.
Please take this as a first attempt to provide an early version of a local realtime chatbot.
### Updates
- Update to Coqui XTTS 2.0 model
- Bugfix to RealtimeTTS (download of Coqui model did not work properly)
### Prerequisites
You will need a GPU with around 8 GB VRAM to run this in real-time.
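As a rough way to check the VRAM requirement on NVIDIA hardware, the sketch below asks `nvidia-smi` for the total memory of each GPU. The helper names and the threshold are illustrative assumptions and not part of this project. The threshold is set slightly below 8 GiB on the assumption that some cards report a little less than their nominal size. AMD users would need a different tool.

```python
import shutil
import subprocess
from typing import List, Optional

# Assumption: "around 8 GB" is treated as roughly 7.5 GiB to allow for cards
# that report slightly less than their nominal memory size.
APPROX_REQUIRED_MIB = 7680


def parse_vram_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's CSV output."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():  # skip blank lines and entries such as "[N/A]"
            values.append(int(text))
    return values


def query_vram_mib() -> Optional[List[int]]:
    """Return total VRAM per NVIDIA GPU in MiB, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_vram_mib(result.stdout)


if __name__ == "__main__":
    vram = query_vram_mib()
    if vram is None:
        print("nvidia-smi not found or failed; VRAM cannot be checked this way.")
    else:
        for index, mib in enumerate(vram):
            verdict = "ok" if mib >= APPROX_REQUIRED_MIB else "may be too small"
            print(f"GPU {index}: {mib} MiB ({verdict})")
```

The result is only a guide: the memory actually needed depends on how many layers are offloaded to the GPU and on the other models loaded at the same time.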
#### For NVIDIA users
- **NVIDIA CUDA Toolkit 11.8**:
    - Access the [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-11-8-0-download-archive).
    - Choose version 11.x and follow the instructions for downloading and installation.
- **NVIDIA cuDNN 8.7.0 for CUDA 11.x**:
    - Navigate to [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).
    - Locate and download "cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x".
    - Follow the provided installation guide.
#### For AMD users
- **Install ROCm v.5.7.1**
    - Download [ROCm SDK version 5.7.1](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
    - Follow the provided installation guide.
- **FFmpeg**:

    Install FFmpeg according to your operating system:

    - **Ubuntu/Debian**:
        ```shell
        sudo apt update && sudo apt install ffmpeg
        ```
    - **Arch Linux**:
        ```shell
        sudo pacman -S ffmpeg
        ```
    - **macOS (Homebrew)**:
        ```shell
        brew install ffmpeg
        ```
    - **Windows (Chocolatey)**:
        ```shell
        choco install ffmpeg
        ```
    - **Windows (Scoop)**:
        ```shell
        scoop install ffmpeg
        ```

### Installation Steps
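Before working through the steps, it can help to confirm that the FFmpeg prerequisite is actually reachable from your shell. This is a minimal sketch using only the Python standard library. `ffmpeg_version_line` is a hypothetical helper name, not something this project provides.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version_line()
    if version is None:
        print("FFmpeg was not found on PATH; install it as described above.")
    else:
        print(version)
```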
1. Clone the repository or download the source code package.
2. Install llama.cpp
    - (for AMD users) Before the next step, set the environment variable `LLAMA_HIPBLAS` to `on`.
    - Official way:
        ```shell
        pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
        ```
    - If the official installation does not work for you, please install [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which provides some excellent wheels for a lot of platforms and environments.
3. Install realtime libraries
    - Install the main libraries:
        ```shell
        pip install RealtimeSTT==0.1.7
        pip install RealtimeTTS==0.2.7
        ```
4. Download zephyr-7b-beta.Q5_K_M.gguf from [here](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main).
    - Open `creation_params.json` and enter the file path to the downloaded model into `model_path`.
    - Adjust `n_gpu_layers` (0-35, raise if you have more VRAM) and `n_threads` (number of CPU threads; I recommend not using all available cores but leaving some for TTS).
5. If dependency conflicts occur, install specific versions of conflicting libraries:
    ```shell
    pip install networkx==2.8.8
    pip install typing_extensions==4.8.0
    pip install fsspec==2023.6.0
    pip install imageio==2.31.6
    pip install numpy==1.24.3
    pip install requests==2.31.0
    ```
## Running the Application
`python ai_voicetalk_local.py`
## Customize
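The `model_path`, `n_gpu_layers` and `n_threads` settings from installation step 4 can also be written by a small script instead of by hand. The sketch below assumes `creation_params.json` is a flat JSON object containing those three keys. `update_creation_params` and `suggest_n_threads` are hypothetical helpers, and the default reserve of two cores for TTS is an illustrative guess, not a recommendation from the project.

```python
import json
import os
from pathlib import Path
from typing import Optional


def suggest_n_threads(reserved_for_tts: int = 2) -> int:
    """Leave some CPU cores free for TTS; the reserve of 2 is an assumption."""
    total = os.cpu_count() or 1
    return max(1, total - reserved_for_tts)


def update_creation_params(
    params_file: str,
    model_path: str,
    n_gpu_layers: int,
    n_threads: Optional[int] = None,
) -> dict:
    """Update the three settings in the JSON file and keep all other keys unchanged."""
    if not 0 <= n_gpu_layers <= 35:
        raise ValueError("n_gpu_layers is expected to be in the range 0-35")
    path = Path(params_file)
    params = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    params["model_path"] = str(model_path)
    params["n_gpu_layers"] = n_gpu_layers
    params["n_threads"] = n_threads if n_threads is not None else suggest_n_threads()
    path.write_text(json.dumps(params, indent=4), encoding="utf-8")
    return params


if __name__ == "__main__":
    # Example call; replace the model path with the location of your download.
    updated = update_creation_params(
        "creation_params.json", "zephyr-7b-beta.Q5_K_M.gguf", n_gpu_layers=20
    )
    print(updated)
```

Starting with a lower `n_gpu_layers` value and raising it while watching VRAM usage is one way to find a setting that fits your card.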
### Change AI personality
Open `chat_params.json` to change the conversation scenario.
### Change AI Voice
- Open `ai_voicetalk_local.py`.
- Find this line: `coqui_engine = CoquiEngine(cloning_reference_wav="female.wav", language="en")`
- Change `"female.wav"` to the filename of a WAV file (44100 or 22050 Hz, mono, 16-bit) containing the voice to clone.
### Speech end detection
If your first sentence is transcribed before you get to the second one, raise `post_speech_silence_duration` on `AudioToTextRecorder`:
```python
AudioToTextRecorder(model="tiny.en", language="en", spinner=False, post_speech_silence_duration=1.5)
```
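To see why a longer silence window keeps two sentences together, here is a simplified model of end-of-speech detection. It is a conceptual sketch only and not RealtimeSTT's actual implementation. `find_utterance_end`, the per-frame voiced/silent flags and the 0.25 s frame length are all illustrative assumptions.

```python
from typing import List, Optional


def find_utterance_end(
    voiced: List[bool],
    frame_seconds: float,
    post_speech_silence_duration: float,
) -> Optional[int]:
    """Return the index of the frame at which the utterance counts as finished.

    The utterance ends once speech has started and is followed by a run of
    silent frames lasting at least post_speech_silence_duration seconds.
    Returns None if that never happens.
    """
    speech_started = False
    silent_frames = 0
    for index, is_voiced in enumerate(voiced):
        if is_voiced:
            speech_started = True
            silent_frames = 0  # any speech resets the silence counter
        elif speech_started:
            silent_frames += 1
            if silent_frames * frame_seconds >= post_speech_silence_duration:
                return index
    return None


if __name__ == "__main__":
    # Two short sentences separated by a 0.5 s pause, then 1.5 s of silence.
    frames = [True, True, False, False, True, True] + [False] * 6
    print(find_utterance_end(frames, 0.25, 0.5))  # cuts off inside the pause
    print(find_utterance_end(frames, 0.25, 1.5))  # waits until after both sentences
```

With the short window the pause between the sentences already ends the utterance, while the longer window only triggers after the second sentence. The cost is a longer wait before transcription starts.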
## Contributing
Contributions to enhance or improve the project are warmly welcomed. Feel free to open a pull request with your proposed changes or fixes.
## License
The project is under [Coqui Public Model License 1.0.0](https://coqui.ai/cpml).
This license allows only non-commercial use of a machine learning model and its outputs.
## Contact
Kolja Beigel
- Email: [[email protected]](mailto:[email protected])

Feel free to reach out for any queries or support related to this project.