{"id":15747318,"url":"https://github.com/kristofferv98/voiceprocessingtoolkit","last_synced_at":"2025-09-09T04:37:42.037Z","repository":{"id":217054958,"uuid":"733387301","full_name":"kristofferv98/VoiceProcessingToolkit","owner":"kristofferv98","description":"The VoiceProcessingToolkit is an all-encompassing suite designed for sophisticated voice detection, wake word recognition, text-to-speech synthesis, and advanced audio processing. It offers intuitive interfaces to streamline the integration of voice processing capabilities into your applications","archived":false,"fork":false,"pushed_at":"2025-06-05T11:45:30.000Z","size":35973,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-16T05:28:53.601Z","etag":null,"topics":["api","audio","automation","elevenlabs","gpt-4","multithreading","openai","picovoice","python","speech","text-to-speech","transcription","utility","voice","voice-processing","wake-word","whisper","whisper-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kristofferv98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-12-19T08:05:55.000Z","updated_at":"2025-06-05T11:45:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"dc726b26-3dd6-455e-b683-a05847ab20c3","html_url":"https://github.com/kristofferv98/VoiceProcessingToolkit","commit_stats":null,"previous_names":["kristofferv98/voiceprocessingtoolkit"],"tags_count":3,"template":true,"template_fu
ll_name":null,"purl":"pkg:github/kristofferv98/VoiceProcessingToolkit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristofferv98%2FVoiceProcessingToolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristofferv98%2FVoiceProcessingToolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristofferv98%2FVoiceProcessingToolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristofferv98%2FVoiceProcessingToolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kristofferv98","download_url":"https://codeload.github.com/kristofferv98/VoiceProcessingToolkit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kristofferv98%2FVoiceProcessingToolkit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274244120,"owners_count":25248157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-09T02:00:10.223Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","audio","automation","elevenlabs","gpt-4","multithreading","openai","picovoice","python","speech","text-to-speech","transcription","utility","voice","voice-processing","wake-word","whisper","whisper-api"],"created_at":"2024-10-04T05:04:50.263Z","updated_at":"2025-09-09T04:37:42.029Z",
"avatar_url":"https://github.com/kristofferv98.png","language":"Python","readme":" # VoiceProcessingToolkit\n \n🚨 This repository is no longer maintained. Visit [Realtime_MLx](https://github.com/kristofferv98/Realtime_mlx_STT) for the most up-to-date server with improved capabilities.\n\n ## Introduction\n VoiceProcessingToolkit is a Python library designed for voice processing tasks, including wake word detection, transcription, and synthesis. It aims to streamline the creation of voice-activated applications.\n\n 1. [Introduction](#introduction)\n2. [Features](#features)\n3. [Installation](#installation)\n4. [Usage](#usage)\n   - [Basic Example](#basic-example)\n   - [Example with Autogen](example_usage/Autogen_voice_assistant_example_pyfile.py)\n5. [Configuration](#configuration)\n6. [Example Usage](#example-usage)\n7. [Contributing](#contributing)\n8. [Support](#support)\n9. [License](#license)\n10. [Development Status](#development-status)\n11. [Acknowledgements](#acknowledgements)\n12. [Contact Information](#contact-information)\n\n## Features\n + Wake word detection using Picovoice Porcupine.\n + High-quality voice recording with adjustable settings for voice activity detection.\n + Fast and accurate speech-to-text transcription with OpenAI's Whisper.\n + Customizable text-to-speech synthesis via ElevenLabs' API.\n + Secure API key management with environment variables.\n + Example scripts for easy demonstration and usage.\n + Extensible architecture for feature additions and customization.\n\n ## Installation\n VoiceProcessingToolkit is available on PyPI. To install it, run the following command:\n ```bash\n pip install VoiceProcessingToolkit\n ```\n\n ## Usage\n ### Basic Example\n The following is a quick-start guide to using the toolkit for wake word detection and speech synthesis with auto-translation from any language to English. 
\n\n ```python\nfrom VoiceProcessingToolkit.VoiceProcessingManager import VoiceProcessingManager\nfrom dotenv import load_dotenv\n\nimport logging\nimport os\n\n# logging.basicConfig(level=logging.INFO)\nload_dotenv()\n\n# Verify that the required API keys are set in the environment\nassert os.getenv('PICOVOICE_APIKEY'), 'PICOVOICE_APIKEY is not set'\nassert os.getenv('OPENAI_API_KEY'), 'OPENAI_API_KEY is not set'\nassert os.getenv('ELEVENLABS_API_KEY'), 'ELEVENLABS_API_KEY is not set'\n\n# Create a VoiceProcessingManager instance with default settings\nvpm = VoiceProcessingManager.create_default_instance(wake_word='computer')\n\n# Run the voice processing manager with transcription and text-to-speech\ntext = vpm.run(tts=True)\nprint(text)\n ```\n ### Text-to-Speech Example\n\n For text-to-speech conversion without recording, provide your own text as follows:\n\n ```python\nfrom VoiceProcessingToolkit.VoiceProcessingManager import text_to_speech_stream\nfrom dotenv import load_dotenv\n\nimport logging\nimport os\n\n# logging.basicConfig(level=logging.INFO)\nload_dotenv()\n\n# Verify that the ElevenLabs API key is set in the environment\nassert os.getenv('ELEVENLABS_API_KEY'), 'ELEVENLABS_API_KEY is not set'\n\ntext = \"Hello, welcome to the Voice Processing Toolkit!\"\n\nprint(\"Text to speech conversion in progress...\")\ntext_to_speech_stream(text=text)\n ```\n\n\n The `VoiceProcessingManager` class is the central component of the toolkit, orchestrating the voice processing workflow. It is highly configurable, allowing you to tailor the behavior to your specific needs. 
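\n\n For example, based on the documented `run(tts=False, streaming=False)` signature, the basic example can be extended to speak the transcription back; this is a sketch assuming the same environment setup as the basic example above:\n\n ```python\nfrom VoiceProcessingToolkit.VoiceProcessingManager import VoiceProcessingManager\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\nvpm = VoiceProcessingManager.create_default_instance(wake_word='computer')\n\n# Transcribe the next voice command and speak it back with streaming text-to-speech\ntext = vpm.run(tts=True, streaming=True)\nprint(text)\n ```\n\n Here `streaming=True` is used on the assumption that it streams the synthesized audio as it is generated rather than waiting for the full response.\n\n 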
Below are some of the key attributes and methods provided by this class:\n\n Attributes of `VoiceProcessingManager` include:\n - `wake_word`: The wake word for triggering voice recording.\n - `sensitivity`: Sensitivity for wake word detection.\n - `output_directory`: Directory for saving recorded audio files.\n - `audio_format`, `channels`, `rate`, `frames_per_buffer`: Audio stream parameters.\n - `voice_threshold`, `silence_limit`, `inactivity_limit`, `min_recording_length`, `buffer_length`: Voice recording parameters.\n - `use_wake_word`: Flag to use wake word detection.\n - `save_wake_word_recordings`: Flag to save the audio buffer that triggered wake word detection.\n - `play_notification_sound`: Flag to play a sound on detection.\n\n Methods of `VoiceProcessingManager` include:\n - `run(tts=False, streaming=False)`: Processes a voice command with optional text-to-speech functionality.\n - `setup()`: Initializes the components of the voice processing manager.\n - `process_voice_command()`: Processes a voice command using the configured components.\n\n For a more detailed explanation of these attributes and methods, please refer to the inline documentation within the `VoiceProcessingManager.py` file.\n\n ## Getting Started\n To get started with VoiceProcessingToolkit, follow these steps:\n\n 1. Install the toolkit via pip: `pip install VoiceProcessingToolkit`\n\n 2. Obtain API keys from Picovoice, OpenAI, and ElevenLabs.\n\n 3. Set the API keys as environment variables.\n\n 4. Run an example script from the `example_usage` directory.\n\n 5. Customize `VoiceProcessingManager` settings as needed.\n\n For a more detailed explanation of these steps, please refer to the inline documentation and example usage scripts provided in the toolkit. 
These resources provide detailed instructions on configuration, usage examples, and customization options.\n\n If you have downloaded the GitHub repository, you can run the examples from the `example_usage` directory. \n Rename the `REMOVE_THIS_TEXT.env` file to `.env` and add your API keys.\n ## Example Usage\n The `example_usage` directory contains scripts showcasing various features:\n\n - [Simple Setup](example_usage/Simple_setup.py): Demonstrates the basic setup and usage of the VoiceProcessingManager.\n - [Create Wake Word Data](example_usage/Create_wakeword_data.py): Demonstrates how to create a wake word dataset using the VoiceProcessingManager.\n - [Wake Word Decorators](example_usage/Wakeword_decorators.py): Demonstrates how to register actions with the VoiceProcessingManager that are triggered when the wake word is detected.\n - [Custom Recording Logic](example_usage/Custom_recording_logic.py): Demonstrates custom recording settings and runs the VoiceProcessingManager without the wake word detector.\n - [Text to Speech](example_usage/Text_to_speach.py): Demonstrates the text-to-speech functionality with text as input using the VoiceProcessingManager.\n - [Autogen_voice_assistant](example_usage/Autogen_voice_assistant_example.ipynb): Demonstrates how to use the VoiceProcessingManager to create a voice assistant with custom wake words and instructions.\n\n\n ## Configuration\n Customize the toolkit with settings such as wake word sensitivity and audio sample rate. See the examples for more details.\n\n ## Contributing\n Contributions are welcome! See CONTRIBUTING.md for guidelines.\n\n ## Support\n For issues or questions, please use the [GitHub issue tracker](https://github.com/kristofferv98/VoiceProcessingToolkit/issues).\n\n ## License\n Licensed under the MIT License. See LICENSE for details.\n\n ## Development Status\n The project is in development. 
Feedback and contributions are appreciated.\n\n ## Acknowledgements\n Thanks to OpenAI, ElevenLabs, and Picovoice for their tools that enhance this project.\n\n ## Contact Information\n For help or inquiries, reach out via [GitHub Discussions](https://github.com/kristofferv98/VoiceProcessingToolkit/discussions).\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkristofferv98%2Fvoiceprocessingtoolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkristofferv98%2Fvoiceprocessingtoolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkristofferv98%2Fvoiceprocessingtoolkit/lists"}