{"id":21019669,"url":"https://github.com/koljab/localaivoicechat","last_synced_at":"2025-04-04T12:09:15.626Z","repository":{"id":205505580,"uuid":"714387282","full_name":"KoljaB/LocalAIVoiceChat","owner":"KoljaB","description":"Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.","archived":false,"fork":false,"pushed_at":"2024-08-12T08:56:33.000Z","size":1488,"stargazers_count":599,"open_issues_count":12,"forks_count":69,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-03-28T11:11:18.161Z","etag":null,"topics":["chatbot","python","realtime"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KoljaB.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-04T18:55:06.000Z","updated_at":"2025-03-28T09:40:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"f5e35554-dbc2-4460-b832-2a0414efcb5c","html_url":"https://github.com/KoljaB/LocalAIVoiceChat","commit_stats":null,"previous_names":["koljab/localaivoicechat"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KoljaB%2FLocalAIVoiceChat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KoljaB%2FLocalAIVoiceChat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KoljaB%2FLocalAIVoiceChat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/KoljaB%2FLocalAIVoiceChat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KoljaB","download_url":"https://codeload.github.com/KoljaB/LocalAIVoiceChat/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247174453,"owners_count":20896078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","python","realtime"],"created_at":"2024-11-19T10:33:30.897Z","updated_at":"2025-04-04T12:09:15.592Z","avatar_url":"https://github.com/KoljaB.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Local AI Voice Chat \n\nTalk with an AI in real time, running completely locally on your PC, with a customizable AI personality and voice.\n\n\u003e **Hint:** *If you are interested in state-of-the-art voice solutions, please also \u003cstrong\u003ehave a look at [Linguflex](https://github.com/KoljaB/Linguflex)\u003c/strong\u003e. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.*\n\n\u003e **Note:** If you run into the 'General synthesis error: isin() received an invalid combination of arguments' error, it is caused by a newer transformers library release that introduced an incompatibility with Coqui TTS (see [here](https://github.com/KoljaB/RealtimeTTS/issues/85)). 
Please either downgrade to an older transformers version with `pip install transformers==4.38.2`, or upgrade RealtimeTTS to the latest version with `pip install realtimetts==0.4.1`.\n\n## About the Project\n\nIntegrates the powerful Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries to create a fast and engaging voice-based local chatbot. \n\nhttps://github.com/KoljaB/LocalAIVoiceChat/assets/7604638/cebacdad-8a57-4a03-bfd1-a469730dda51\n\n\u003e **Hint:** If you run into problems installing llama.cpp, please also have a look at my [LocalEmotionalAIVoiceChat project](https://github.com/KoljaB/LocalEmotionalAIVoiceChat). It includes emotion-aware realtime text-to-speech output and has multiple LLM provider options. You can also use it with different AI models. \n\n## Tech Stack\n\n- **[llama_cpp](https://github.com/ggerganov/llama.cpp)** with Zephyr 7B  \n  - library interface for llama-based language models\n- **[RealtimeSTT](https://github.com/KoljaB/RealtimeSTT)** with faster_whisper  \n  - real-time speech-to-text transcription library\n- **[RealtimeTTS](https://github.com/KoljaB/RealtimeTTS)** with Coqui XTTS  \n  - real-time text-to-speech synthesis library\n\n## Notes\n\nThis software is in an experimental alpha state and does not provide production-ready stability. 
The current XTTS model used for synthesis still has glitches, and Zephyr, while really good for a 7B model, cannot compete with the answer quality of GPT-4, Claude or Perplexity.\n\nPlease take this as a first attempt to provide an early version of a local realtime chatbot.\n\n### Updates\n\n- Update to Coqui XTTS 2.0 model\n- Bugfix to RealtimeTTS (download of Coqui model did not work properly)\n\n### Prerequisites\n\nYou will need a GPU with around 8 GB VRAM to run this in real time.\n\n#### For NVIDIA users\n\n- **NVIDIA CUDA Toolkit 11.8**:\n    - Access the [NVIDIA CUDA Toolkit Archive](https://developer.nvidia.com/cuda-11-8-0-download-archive).\n    - Choose version 11.x and follow the instructions for downloading and installation.\n\n- **NVIDIA cuDNN 8.7.0 for CUDA 11.x**:\n    - Navigate to [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).\n    - Locate and download \"cuDNN v8.7.0 (November 28th, 2022), for CUDA 11.x\".\n    - Follow the provided installation guide.\n\n#### For AMD users\n- **Install ROCm v.5.7.1**\n    - Download [ROCm SDK version 5.7.1](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)\n    - Follow the provided installation guide.\n\n\n- **FFmpeg**:\n\n    Install FFmpeg according to your operating system:\n\n    - **Ubuntu/Debian**:\n        ```shell\n        sudo apt update \u0026\u0026 sudo apt install ffmpeg\n        ```\n\n    - **Arch Linux**:\n        ```shell\n        sudo pacman -S ffmpeg\n        ```\n\n    - **macOS (Homebrew)**:\n        ```shell\n        brew install ffmpeg\n        ```\n\n    - **Windows (Chocolatey)**:\n        ```shell\n        choco install ffmpeg\n        ```\n\n    - **Windows (Scoop)**:\n        ```shell\n        scoop install ffmpeg\n        ```    \n\n\n### Installation Steps \n\n1. Clone the repository or download the source code package.\n\n2. 
Install llama.cpp\n    - (for AMD users) Before the next step, set the environment variable `LLAMA_HIPBLAS` to `on`\n\n    - Official way:\n     ```shell\n     pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose\n     ```\n\n    - If the official installation does not work for you, please install [text-generation-webui](https://github.com/oobabooga/text-generation-webui), which provides some excellent wheels for a lot of platforms and environments\n\n3. Install realtime libraries\n   - Install the main libraries:\n     ```shell\n     pip install RealtimeSTT==0.1.7\n     pip install RealtimeTTS==0.2.7\n     ```\n4. Download zephyr-7b-beta.Q5_K_M.gguf from [here](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main). \n   - Open creation_params.json and enter the filepath to the downloaded model into `model_path`.\n   - Adjust n_gpu_layers (0-35, raise if you have more VRAM) and n_threads (number of CPU threads; I recommend not using all available cores, leaving some for TTS)\n\n5. If dependency conflicts occur, install specific versions of conflicting libraries:\n     ```shell\n     pip install networkx==2.8.8\n     pip install typing_extensions==4.8.0\n     pip install fsspec==2023.6.0\n     pip install imageio==2.31.6\n     pip install numpy==1.24.3\n     pip install requests==2.31.0\n     ```   \n\n## Running the Application\n     python ai_voicetalk_local.py\n\n## Customize\n\n### Change AI personality\n\nOpen chat_params.json to change the talk scenario.\n\n### Change AI Voice\n\n- Open ai_voicetalk_local.py. 
\n- Find this line: coqui_engine = CoquiEngine(cloning_reference_wav=\"female.wav\", language=\"en\")\n- Change \"female.wav\" to the filename of a wave file (44100 or 22050 Hz mono 16-bit) containing the voice to clone\n\n### Speech end detection\n\nIf the first sentence is transcribed before you get to the second one, raise post_speech_silence_duration on AudioToTextRecorder:\n    ```\n    AudioToTextRecorder(model=\"tiny.en\", language=\"en\", spinner=False, post_speech_silence_duration = 1.5) \n    ```\n    \n## Contributing\n\nContributions to enhance or improve the project are warmly welcomed. Feel free to open a pull request with your proposed changes or fixes.\n\n## License\n\nThe project is under [Coqui Public Model License 1.0.0](https://coqui.ai/cpml).\n\nThis license allows only non-commercial use of a machine learning model and its outputs.\n\n\n## Contact\n\nKolja Beigel  \n- Email: [kolja.beigel@web.de](mailto:kolja.beigel@web.de)  \n\nFeel free to reach out for any queries or support related to this project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoljab%2Flocalaivoicechat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkoljab%2Flocalaivoicechat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoljab%2Flocalaivoicechat/lists"}