# Simple Voice Chat

This project provides a flexible voice chat interface that connects to various Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) services.

![Screenshot](screenshot.png)

**Acknowledgement:** This project heavily relies on the fantastic [fastrtc](https://github.com/gradio-app/fastrtc) library, which simplifies real-time audio streaming over WebRTC and provided crucial examples for setting up the various supported backends, making this application possible.

## Motivation

This project aims to provide a versatile and cost-effective voice chat interface. While initially driven by the desire for alternatives to OpenAI's real-time voice API, it has evolved to offer multiple backend options, including direct integration with OpenAI's real-time services.
This allows users to choose the best STT, LLM, and TTS combination for their needs, whether prioritizing cost, performance, self-hosting, or specific provider features.

## Features

*   🚀 **Multiple Backends:** The application supports three primary backend types for voice processing:
    *   **Classic Backend:** This is the most flexible option, offering a modular approach where you connect separate services for:
        *   🗣️ **STT (Speech-to-Text):** Supports API-based services like OpenAI Whisper or self-hosted engines such as [Speaches](https://github.com/speaches-ai/speaches) (which utilizes Faster Whisper).
        *   🧠 **LLM (Large Language Model):** Integrates with [LiteLLM](https://github.com/BerriAI/litellm), providing access to a vast array of models including OpenAI, Anthropic, Google, Mistral, Cohere, Azure, and local models run via services like [Ollama](https://ollama.com/), LiteLLM proxy, vLLM, and more.
        *   🔊 **TTS (Text-to-Speech):** Supports API-based services like OpenAI TTS or alternatives such as [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) (which can use [KokoroTTS](https://github.com/kokorotts/)).
        *   *This backend allows for a fully local setup if desired, using local STT, LLM (e.g., via Ollama), and TTS engines.*
    *   **OpenAI Backend:** Utilizes OpenAI's real-time voice API for a streamlined, all-in-one voice interaction experience, requiring an OpenAI API key. STT language and output voice can be configured.
    *   **Gemini Backend:** Leverages Google's Gemini Live Connect API for real-time voice interactions, requiring a Google Gemini API key.
STT language and output voice can be configured.
*   ⚙️ **Highly Configurable:** Adjust backend type, STT/LLM/TTS hosts, ports, models, API keys, STT confidence thresholds (classic backend), TTS voice/speed (classic backend), system messages, STT language (all backends), and output voice (all backends) via CLI arguments or a `.env` file.
*   🌐 **Web Interface:** Simple and responsive UI built with HTML, CSS, and JavaScript.
*   📊 **Cost Tracking:**
    *   **Classic Backend:** Real-time cost estimation for OpenAI LLM and TTS usage.
    *   **OpenAI Backend:** Real-time cost estimation based on token usage for the selected OpenAI real-time model.
    *   **Gemini Backend:** Real-time cost estimation based on token usage for the selected Gemini model.
*   ⚡ **Real-time Interaction:** Low-latency voice communication powered by [fastrtc](https://github.com/gradio-app/fastrtc) (WebRTC).
*   👂 **STT Confidence Filtering (Classic Backend):** Automatically reject low-confidence transcriptions based on configurable thresholds (no speech probability, average log probability, minimum word count).
*   🎤 **Dynamic Settings Adjustment:**
    *   **Classic Backend:** Change LLM model, TTS voice, TTS speed, and STT language on-the-fly.
    *   **OpenAI Backend:** Change STT language and output voice on-the-fly.
    *   **Gemini Backend:** Change STT language and output voice on-the-fly.
*   🔍 **Fuzzy Search:** Quickly find models and voices using fuzzy search in the UI dropdowns.
*   💬 **System Message Support:** Define a custom system message to guide the LLM's behavior.
*   📝 **Chat History Logging:** Automatically saves conversation history to timestamped JSON files.
*   🔄 **TTS Audio Replay (Classic Backend):** Replay the audio for any assistant message directly from the chat interface.
*   ⌨️ **Keyboard Shortcuts:** Control mute (M), clear chat (Ctrl+R), and toggle options (Shift+S) using keyboard shortcuts.
*   💓 **Connection Monitoring:** Uses a heartbeat
mechanism to detect disconnected clients and potentially shut down the server.
*   🖥️ **Cross-Platform GUI:** Runs as a standalone desktop application using `pywebview` (default) or in a standard web browser (`--browser` flag). The application explicitly uses the QT backend for `pywebview`, as the GTK backend lacks the necessary WebRTC support.

## Known Issues

*   ⚠️ **Cost Calculation:** The cost calculation for the OpenAI real-time API is currently not functional. Documentation for this feature might be out of date in other sections.

## Installation

1.  Clone the repository:

    ```bash
    git clone https://github.com/thiswillbeyourgithub/simple_voice_chat
    cd simple_voice_chat
    ```

2.  Install the Python packages:

    ```bash
    uv pip install -e .
    ```

3.  (Optional) Configure services using environment variables. You can create a `.env` file based on the available options (see `--help` or `utils/env.py`).

## Usage

Run the installed `simple-voice-chat` command:

```bash
simple-voice-chat --help
```

The application will start a web server and attempt to open the interface in a dedicated window (or a browser tab if `--browser` is specified).

### Running from a Python Script

You can also run the application directly from a Python script by importing and calling the `main` function from `simple_voice_chat.simple_voice_chat`.
This allows you to pass arguments programmatically.

Here's an example:

```python
from simple_voice_chat.simple_voice_chat import main

if __name__ == "__main__":
    # Example arguments: replace these with your desired configuration.
    args = [
        "--backend", "classic",
        "--llm-model", "gpt-4o",
        "--tts-voice", "alloy",
        "--stt-language", "en",
        "--browser",  # Launch in browser instead of the pywebview GUI
        # Add other arguments as needed, for example:
        # "--openai-api-key", "YOUR_OPENAI_KEY_HERE",  # If using the OpenAI backend
        # "--llm-api-key", "YOUR_LLM_KEY_HERE",        # If the classic backend LLM needs a key
        # "--stt-api-key", "YOUR_STT_KEY_HERE",        # If the classic backend STT needs a key
        # "--tts-api-key", "YOUR_TTS_KEY_HERE",        # If the classic backend TTS needs a key
    ]

    import os
    # Ensure LiteLLM runs in production mode before the app configures it.
    os.environ["LITELLM_MODE"] = "PRODUCTION"

    # `main` is a Click command, so by default it parses sys.argv itself.
    # To pass arguments programmatically, invoke it via `main.main()` with an
    # explicit argument list; `standalone_mode=False` stops Click from calling
    # sys.exit() when the command finishes.
    try:
        main.main(args=args, standalone_mode=False)
    except SystemExit as e:
        # Click commands may still call sys.exit(); catch it when scripting.
        print(f"Application exited with status: {e.code}")
```

When calling programmatically, `main.main(args=your_list_of_args, standalone_mode=False)` is the recommended way to invoke a Click command and pass arguments. The `standalone_mode=False` flag prevents Click from trying to exit the entire Python interpreter.

You can find all available command-line arguments and their corresponding environment variables by running `simple-voice-chat --help`.

You can choose the backend using the `--backend` option:
*   `--backend classic` (default): Uses separate STT, LLM, and TTS services.
*   `--backend openai`: Uses OpenAI's real-time voice API. Requires `--openai-api-key`.
*   `--backend gemini`: Uses Google's Gemini Live Connect API. Requires `--gemini-api-key`.

**For a detailed list of all configuration options, please use the `--help` flag:**

```bash
simple-voice-chat --help
```

This will provide the most up-to-date information on available arguments and their corresponding environment variables, including options specific to each backend.

## Configuration Details

Simple Voice Chat offers a flexible configuration system. Settings can be managed through command-line arguments or by creating a `.env` file in the project's root directory.

**Priority:** Command-line arguments take precedence over environment variables defined in the `.env` file.
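As a rough illustration of this precedence rule, here is a standalone sketch using only the standard library. The `parse_config` helper and the hard-coded `gpt-4o` fallback are made up for demonstration; the project itself implements this via Click's `envvar` mechanism, not `argparse`:

```python
import argparse
import os

def parse_config(argv: list[str]) -> str:
    """Resolve --llm-model: an explicit CLI value wins, then the environment, then a default."""
    parser = argparse.ArgumentParser()
    # The environment variable (as it would be loaded from .env) is only a fallback default.
    parser.add_argument("--llm-model", default=os.environ.get("LLM_MODEL", "gpt-4o"))
    return parser.parse_args(argv).llm_model

os.environ["LLM_MODEL"] = "ollama/llama3"       # e.g. supplied via .env
print(parse_config([]))                          # no CLI flag: env fallback -> ollama/llama3
print(parse_config(["--llm-model", "gpt-4o"]))  # CLI flag wins -> gpt-4o
```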
Environment variables loaded via `.env` will be available for `click` options that specify an `envvar`.

**Finding All Options:**
*   **Command-Line Help:** The most comprehensive list of all available settings, their default values, and corresponding environment variable names can be found by running:
    ```bash
    simple-voice-chat --help
    ```
*   **Environment Variable Definitions:** You can also inspect the `simple_voice_chat/utils/env.py` file to see how environment variables are loaded as defaults (e.g., `LLM_MODEL_ENV = os.getenv("LLM_MODEL", ...)`). The `envvar` parameter in `click` options in `simple_voice_chat.py` also shows which environment variables are directly checked.

**Common Configuration Areas:**

*   **Backend Selection:** Choose between the `classic`, `openai`, or `gemini` backends using the `--backend` command-line argument. (Note: This specific option is primarily controlled via the CLI argument; most other options can also be set via environment variables, as detailed in `--help`.)
*   **API Keys:** Provide the necessary API keys for services like OpenAI, Gemini, or other LLM/STT/TTS providers (e.g., set `OPENAI_API_KEY="..."`, `GEMINI_API_KEY="..."`, `LLM_API_KEY="..."` in your `.env` file).
*   **Service Endpoints (Classic Backend):** Configure host and port for your STT, LLM, and TTS services (e.g., `STT_HOST="localhost"`, `LLM_PORT="8080"`).
*   **Models and Voices:**
    *   **Classic Backend:** `LLM_MODEL`, `TTS_VOICE`.
    *   **OpenAI Backend:** `OPENAI_REALTIME_MODEL`, `OPENAI_REALTIME_VOICE`.
    *   **Gemini Backend:** `GEMINI_MODEL`, `GEMINI_VOICE`.
*   **STT Behavior:** Adjust the STT language (e.g., `STT_LANGUAGE="en"`).
For the `classic` backend, configure confidence thresholds (e.g., `STT_NO_SPEECH_PROB_THRESHOLD="0.6"`, `STT_AVG_LOGPROB_THRESHOLD="-0.7"`).
*   **TTS Behavior (Classic Backend):** Control TTS speed (e.g., `TTS_SPEED="1.1"`) and specify acronyms to preserve (e.g., `TTS_ACRONYM_PRESERVE_LIST="AI,TTS,ASAP"`).
*   **Gemini Backend Specifics:** Configure context window compression with `GEMINI_CONTEXT_WINDOW_COMPRESSION_THRESHOLD`.
*   **Application Behavior:** Set the system message (e.g., `SYSTEM_MESSAGE="You are a concise assistant."`), configure the application port (e.g., `APP_PORT="7860"`), choose to launch in browser mode (using the `--browser` flag), and disable client-disconnect-based server shutdown with `DISABLE_HEARTBEAT="True"`.

**Example `.env` file:**

```env
# .env Example - uncomment and modify lines as needed

# General Application Settings
# APP_PORT=7861
SYSTEM_MESSAGE="You are a helpful and friendly voice assistant."
STT_LANGUAGE="en" # Language for Speech-to-Text (e.g., "en", "es", "fr"). Affects all backends.
# DISABLE_HEARTBEAT="False" # Set to "True" to prevent server shutdown on client disconnect.

# ---- Backend Specific Configuration ----
# Choose ONE backend via the --backend CLI option ("classic", "openai", or "gemini").
# The environment variables below are relevant based on that choice.

# ======= OpenAI Backend =======
# Used if --backend=openai
OPENAI_API_KEY="sk-yourOpenAIapiKeyGoesHereIfUsingOpenAIBackend"
OPENAI_REALTIME_MODEL="gpt-4o-mini-realtime-preview" # e.g., gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview
OPENAI_REALTIME_VOICE="alloy"                     # e.g., alloy, echo, fable, onyx, nova, shimmer, ash

# ======= Gemini Backend =======
# Used if --backend=gemini
# GEMINI_API_KEY="yourGoogleGeminiApiKeyGoesHere"
# GEMINI_MODEL="gemini-2.0-flash-live-001" # e.g., gemini-2.0-flash-live-001
# GEMINI_VOICE="Puck"                      # e.g., Puck, Charon, Kore, Fenrir, Aoede
# GEMINI_CONTEXT_WINDOW_COMPRESSION_THRESHOLD="16000" # Default threshold for sliding window context compression

# ======= Classic Backend =======
# Used if --backend=classic (or if no --backend is specified, as it's the default)

# --- LLM Configuration (Classic Backend) ---
# LLM_HOST="localhost"               # Optional: Host for LiteLLM proxy
# LLM_PORT="8000"                    # Optional: Port for LiteLLM proxy
LLM_MODEL="openrouter/google/gemini-flash-1.5" # Default LLM model (e.g., "gpt-4o", "ollama/llama3")
LLM_API_KEY=""                     # Optional: API key for your LLM provider or LiteLLM proxy

# --- STT Configuration (Classic Backend) ---
# Defaults to OpenAI STT (api.openai.com:443, model whisper-1)
# STT_HOST="api.openai.com"
# STT_PORT="443"
# STT_MODEL="whisper-1"
# STT_API_KEY="sk-yourOpenAIapiKeyGoesHereIfUsingOpenAI_STT_forClassicBackend" # REQUIRED if using OpenAI STT

# Example for a local STT server like Speaches:
# STT_HOST="localhost"
# STT_PORT="8088" # Default Speaches port
# STT_MODEL="distil-whisper/distil-large-v2" # Example model name for Speaches
# STT_API_KEY="" # If your local STT server requires an API key

# STT Confidence Thresholds (Classic Backend)
# STT_NO_SPEECH_PROB_THRESHOLD="0.6"
# STT_AVG_LOGPROB_THRESHOLD="-0.7"
# STT_MIN_WORDS_THRESHOLD="5"

# --- TTS Configuration (Classic Backend) ---
# Defaults to OpenAI TTS (api.openai.com:443, model tts-1, voice ash)
# TTS_HOST="api.openai.com"
# TTS_PORT="443"
# TTS_MODEL="tts-1" # e.g., tts-1, tts-1-hd
# TTS_VOICE="ash"   # e.g., alloy, echo, fable, onyx, nova, shimmer, ash
# TTS_API_KEY="sk-yourOpenAIapiKeyGoesHereIfUsingOpenAI_TTS_forClassicBackend" # REQUIRED if using OpenAI TTS
# TTS_SPEED="1.0"   # TTS speed (0.1 to 4.0)

# Example for a local TTS server like Kokoro-FastAPI:
# TTS_HOST="localhost"
# TTS_PORT="8002" # Default Kokoro-FastAPI port
# TTS_MODEL="kokoro-multiple-toyunda-en" # Example model name for Kokoro
# TTS_VOICE="ToyundaDesktop" # Example voice for Kokoro
# TTS_API_KEY="" # If your local TTS server requires an API key
# TTS_SPEED="1.2"
# TTS_ACRONYM_PRESERVE_LIST="AI,TTS,LLM,ASAP" # Comma-separated list of acronyms for Kokoro TTS
```

Remember to remove or comment out settings that are not relevant to your chosen backend or setup. The `simple-voice-chat --help` output is your best reference for all available options and their corresponding environment variables.

---

*This README was generated with assistance from [aider.chat](https://aider.chat).*