{"id":13375509,"url":"https://github.com/willwade/tts-wrapper","last_synced_at":"2025-03-13T01:31:50.983Z","repository":{"id":195314769,"uuid":"692681755","full_name":"willwade/tts-wrapper","owner":"willwade","description":"TTS-Wrapper makes it easier to use text-to-speech APIs by providing a unified and easy-to-use interface.","archived":false,"fork":true,"pushed_at":"2025-02-17T07:27:51.000Z","size":3971,"stargazers_count":19,"open_issues_count":8,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-17T08:22:13.112Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"mediatechlab/tts-wrapper","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/willwade.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-09-17T08:51:46.000Z","updated_at":"2025-02-17T07:27:54.000Z","dependencies_parsed_at":"2023-09-17T12:58:18.668Z","dependency_job_id":null,"html_url":"https://github.com/willwade/tts-wrapper","commit_stats":null,"previous_names":["willwade/tts-wrapper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willwade%2Ftts-wrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willwade%2Ftts-wrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willwade%2Ftts-wrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willwade%2Ftts-wrapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/willwade","download_url":"https://codeload.githu
b.com/willwade/tts-wrapper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243322488,"owners_count":20272887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T05:01:41.596Z","updated_at":"2025-03-13T01:31:50.963Z","avatar_url":"https://github.com/willwade.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# py3-TTS-Wrapper\n\n[![PyPI version](https://badge.fury.io/py/py3-tts-wrapper.svg)](https://badge.fury.io/py/py3-tts-wrapper)\n[![codecov](https://codecov.io/gh/willwade/py3-tts-wrapper/branch/master/graph/badge.svg?token=79IG7GAK0B)](https://codecov.io/gh/willwade/py3-tts-wrapper)\n\n\u003e **Contributions are welcome! Check our [contribution guide](./CONTRIBUTING.md).**\n\n_TTS-Wrapper_ simplifies using text-to-speech APIs by providing a unified interface across multiple services, allowing easy integration and manipulation of TTS capabilities. 
\n\n \u003e ℹ️ Full documentation is available at [https://willwade.github.io/tts-wrapper/](https://willwade.github.io/tts-wrapper/)\n\n## Requirements\n\n- Python 3.10 or higher\n- System dependencies (see below)\n- API credentials for online services\n\n## Supported Services\n\n- AWS Polly\n- Google TTS\n- Microsoft Azure TTS\n- IBM Watson\n- ElevenLabs\n- Wit.Ai \n- eSpeak-NG\n- Play.HT\n- AVSynth (macOS only)\n- SAPI (Windows only)\n- Sherpa-Onnx (NB: Means you can run any ONNX model you want - eg Piper or MMS models)\n\n### Experimental (Not fully featured or in a state of WIP)\n\n- PicoTTS\n- UWP (WinRT) Speech system (win 10+)\n\n## Features\n- **Text to Speech**: Convert text into spoken audio.\n- **SSML Support**: Use Speech Synthesis Markup Language to enhance speech synthesis.\n- **Voice and Language Selection**: Customize the voice and language for speech synthesis.\n- **Streaming and Direct Play**: Stream audio or play it directly.\n- **Pause, Resume, and Stop Controls**: Manage audio playback dynamically.\n- **File Output**: Save spoken audio to files in various formats.\n- **Unified Voice handling** Get Voices across all TTS engines with alike keys\n- **Volume, Pitch, and Rate Controls** Control volume, pitch and rate with unified methods\n\n\n## Feature Matrix\n\n| Engine     | Platform            | Online/Offline | SSML | Word Boundaries | Streaming | Playback Control | Callbacks |\n|------------|--------------------|--------------------|------|-----------------|-----------|------------------|-----------|\n| Polly      | Linux/MacOS/Windows| Online            | Yes  | Yes            | Yes       | Yes              | Full      |\n| Google     | Linux/MacOS/Windows| Online            | Yes  | Yes            | Yes       | Yes              | Full      |\n| Microsoft  | Linux/MacOS/Windows| Online            | Yes  | Yes            | Yes       | Yes              | Full      |\n| Watson     | Linux/MacOS/Windows| Online            | Yes  | Yes            
| Yes       | Yes              | Full      |\n| ElevenLabs | Linux/MacOS/Windows| Online            | No*  | Yes            | Yes       | Yes              | Full      |\n| Play.HT    | Linux/MacOS/Windows| Online            | No*  | No**           | Yes       | Yes              | Basic     |\n| Wit.Ai     | Linux/MacOS/Windows| Online            | No*  | No**           | Yes       | Yes              | Basic     |\n| eSpeak     | Linux/MacOS        | Offline           | Yes  | No**           | Yes       | Yes              | Basic     |\n| AVSynth    | MacOS              | Offline           | No   | No**           | Yes       | Yes              | Basic     |\n| SAPI       | Windows            | Offline           | Yes  | Yes            | Yes       | Yes              | Full      |\n| UWP        | Windows            | Offline           | Yes  | Yes            | Yes       | Yes              | Full      |\n| Sherpa-ONNX| Linux/MacOS/Windows| Offline           | No   | No**           | Yes       | Yes              | Basic     |\n\n**Notes**:\n- **SSML**: Entries marked with No* indicate that while the engine doesn't support SSML natively, the wrapper will automatically strip SSML tags and process the plain text.\n- **Word Boundaries**: Entries marked with No** use an estimation-based timing system that may not be accurate for precise synchronization needs.\n- **Callbacks**: \n  - \"Full\" supports accurate word-level timing callbacks, onStart, and onEnd events\n  - \"Basic\" supports onStart and onEnd events, with estimated word timings\n- **Playback Control**: All engines support pause, resume, and stop functionality through the wrapper's unified interface\n- All engines support the following core features:\n  - Voice selection (`set_voice`)\n  - Property control (rate, volume, pitch)\n  - File output (WAV, with automatic conversion to MP3/other formats)\n  - Streaming playback\n  - Audio device selection\n\n### Core Methods Available\n\n| Method                    | 
Description                                  | Availability |\n|--------------------------|----------------------------------------------|--------------|\n| `speak()`                | Direct speech playback                       | All engines  |\n| `speak_streamed()`       | Streamed speech playback                    | All engines  |\n| `synth_to_file()`        | Save speech to file                         | All engines  |\n| `pause()`, `resume()`    | Playback control                            | All engines  |\n| `stop()`                 | Stop playback                               | All engines  |\n| `set_property()`         | Control rate/volume/pitch                   | All engines  |\n| `get_voices()`           | List available voices                       | All engines  |\n| `set_voice()`           | Select voice                                | All engines  |\n| `connect()`             | Register event callbacks                    | All engines  |\n| `check_credentials()`    | Verify API credentials                      | Online engines|\n| `set_output_device()`    | Select audio output device                  | All engines  |\n\n---\n\n## Installation\n\n### Package Name Note\n\nThis package is published on PyPI as `py3-tts-wrapper` but installs as `tts-wrapper`. 
This is because it's a fork of the original `tts-wrapper` project with Python 3 support and additional features.\n\n### System Dependencies\n\nThis project requires the following system dependencies on Linux:\n\n```sh\nsudo apt-get install portaudio19-dev\n```\n\nor MacOS, using [Homebrew](https://brew.sh)\n\n```sh\nbrew install portaudio\n```\n\nFor PicoTTS on Debian systems:\n\n```sh\nsudo apt-get install libttspico-utils\n```\n\nThe `espeak` TTS functionality requires the `espeak-ng` C library to be installed on your system:\n\n- **Ubuntu/Debian**: `sudo apt install espeak-ng`\n- **macOS**: `brew install espeak-ng`\n- **Windows**: Download the binaries from https://espeak.sourceforge.net/\n\n### Using pip\n\nInstall from PyPI with selected engines:\n```sh\npip install \"py3-tts-wrapper[google,microsoft,sapi,sherpaonnx,googletrans]\"\n```\n\nInstall from GitHub:\n```sh\npip install \"py3-tts-wrapper[google,microsoft,sapi,sherpaonnx,googletrans]@git+https://github.com/willwade/tts-wrapper\"\n```\n\nNote: On macOS/zsh, you may need to use quotes:\n```sh\npip install \"py3-tts-wrapper[google,watson,polly,elevenlabs,microsoft,sherpaonnx]\"\n```\n\n\n\n## Usage Guide\n\n### Basic Usage\n\n```python\nfrom tts_wrapper import PollyClient\npollyClient = PollyClient(credentials=('aws_key_id', 'aws_secret_access_key'))\n\nfrom tts_wrapper import PollyTTS\n\ntts = PollyTTS(pollyClient)\nssml_text = tts.ssml.add('Hello, \u003cbreak time=\"500ms\"/\u003e world!')\ntts.speak(ssml_text)\n```\n\nYou can use SSML or plain text\n\n```python\nfrom tts_wrapper import PollyClient\npollyClient = PollyClient(credentials=('aws_key_id', 'aws_secret_access_key'))\nfrom tts_wrapper import PollyTTS\n\ntts = PollyTTS(pollyClient)\ntts.speak('Hello world')\n```\n\nFor a full demo see the examples folder. You'll need to fill out the credentials.json (or credentials-private.json). Use them from cd'ing into the examples folder. 
\nTips on obtaining keys are below.\n\n### Authorization\n\nEach service uses a different method for authentication:\n\n#### Polly\n\n```python\nfrom tts_wrapper import PollyTTS, PollyClient\nclient = PollyClient(credentials=('aws_region','aws_key_id', 'aws_secret_access_key'))\n\ntts = PollyTTS(client)\n```\n\n#### Google\n\n```python\nfrom tts_wrapper import GoogleTTS, GoogleClient\nclient = GoogleClient(credentials=('path/to/creds.json'))\n\ntts = GoogleTTS(client)\n```\nAlternatively, load the service account file yourself and pass it as a dict, so the credentials stay in memory:\n\n```python\nimport json\nimport os\n\nfrom tts_wrapper import GoogleTTS, GoogleClient\n\nwith open(os.getenv(\"GOOGLE_SA_PATH\"), \"r\") as file:\n    credentials_dict = json.load(file)\n\n# Either pass the path to the service account file...\nclient = GoogleClient(credentials=os.getenv('GOOGLE_SA_PATH'))\n# ...or pass the parsed credentials as a dict\nclient = GoogleClient(credentials=credentials_dict)\n```\n\n#### Microsoft\n\n```python\nfrom tts_wrapper import MicrosoftTTS, MicrosoftClient\nclient = MicrosoftClient(credentials=('subscription_key','subscription_region'))\n\ntts = MicrosoftTTS(client)\n```\n\n#### Watson\n\n```python\nfrom tts_wrapper import WatsonTTS, WatsonClient\nclient = WatsonClient(credentials=('api_key', 'region', 'instance_id'))\n\ntts = WatsonTTS(client)\n```\n\n**Note**: If you run into SSL certificate errors, try disabling SSL verification:\n\n```python\nfrom tts_wrapper import WatsonTTS, WatsonClient\nclient = WatsonClient(credentials=('api_key', 'region', 'instance_id'), disableSSLVerification=True)\n\ntts = WatsonTTS(client)\n```\n\n#### ElevenLabs\n\n```python\nfrom tts_wrapper import ElevenLabsTTS, ElevenLabsClient\nclient = ElevenLabsClient(credentials=('api_key'))\ntts = ElevenLabsTTS(client)\n```\n\n- **Note**: ElevenLabs does not support SSML.\n\n#### Wit.Ai\n\n```python\nfrom tts_wrapper import WitAiTTS, WitAiClient\nclient = WitAiClient(credentials=('token'))\ntts = WitAiTTS(client)\n```\n\n#### Play.HT\n\n```python\nfrom tts_wrapper import PlayHTClient, PlayHTTTS\nclient = PlayHTClient(credentials=('api_key', 'user_id'))\ntts = PlayHTTTS(client)\n```\n\n- 
**Note**: Play.HT does not support SSML, but we automatically strip SSML tags if present.\n\n#### UWP\n\n```python\nfrom tts_wrapper import UWPTTS, UWPClient\nclient = UWPClient()\ntts = UWPTTS(client)\n```\n\n#### eSpeak\n\n```python\nfrom tts_wrapper import eSpeakClient, eSpeakTTS\n\nclient = eSpeakClient()\ntts = eSpeakTTS(client)\n```\n\nNote: Requires espeak-ng to be installed on your system.\n\n#### SAPI (Windows)\n\n```python\nfrom tts_wrapper import SAPIClient, SAPITTS\n\nclient = SAPIClient()\ntts = SAPITTS(client)\n```\n\nNote: Only available on Windows systems.\n\n#### AVSynth (macOS)\n\n```python\nfrom tts_wrapper import AVSynthClient, AVSynthTTS\n\nclient = AVSynthClient()\ntts = AVSynthTTS(client)\n```\n\nNote: Only available on macOS. Provides high-quality speech synthesis with word timing support and voice property control.\n\n#### GoogleTrans\n\nUses the gTTS library. \n\n```python\nfrom tts_wrapper import GoogleTransClient, GoogleTransTTS\nvoice_id = \"en-co.uk\"  # Example voice ID for UK English\nclient = GoogleTransClient(voice_id)\n# Initialize the TTS engine\ntts = GoogleTransTTS(client)\n```\n\n#### Sherpa-ONNX\n\nIf you leave the model path and tokens path blank (`None`), a default location is used.\n\n```python\nfrom tts_wrapper import SherpaOnnxClient, SherpaOnnxTTS\nclient = SherpaOnnxClient(model_path=None, tokens_path=None)\ntts = SherpaOnnxTTS(client)\n```\n\nSet a voice like this:\n\n```python\n# List the available voices/languages\nvoices = tts.get_voices()\nprint(\"Available voices:\", voices)\n\n# Set the voice using its ISO code\niso_code = \"eng\"  # Example ISO code; this is also the id in the voice details\ntts.set_voice(iso_code)\n```\nand then use `speak`, `speak_streamed`, etc. 
\n\nThe following methods are then available.\n\n### Advanced Usage\n\n#### SSML\n\nEven if you rarely use SSML features, it's wise to use the same syntax everywhere, so pass SSML rather than plain text to all engines:\n\n```python\nssml_text = tts.ssml.add('Hello world!')\n```\n\n#### Plain Text\n\nIf you want to keep things simple, each engine converts plain text to SSML for you.\n\n```python\ntts.speak('Hello World!')\n```\n\n#### Speak \n\nThis plays the audio immediately on your device's default audio output:\n\n```python\ntts.speak(ssml_text)\n```\n\n#### Check Credentials\n\nThis checks whether the credentials are valid. The method is only available on the client object, e.g.:\n\n```python\nimport os\n\nfrom tts_wrapper import MicrosoftClient\n\nclient = MicrosoftClient(\n    credentials=(os.getenv(\"MICROSOFT_TOKEN\"), os.getenv(\"MICROSOFT_REGION\"))\n)\nif client.check_credentials():\n    print(\"Credentials are valid.\")\nelse:\n    print(\"Credentials are invalid.\")\n```\n\nNB: Each engine checks credentials in its own way. If an engine has no dedicated check, the parent class falls back to calling `get_voices`. To save API calls, you can simply call `get_voices` yourself.\n\n#### Streaming and Playback Control\n\n#### `pause_audio()`, `resume_audio()`, `stop_audio()`\nThese methods manage audio playback by pausing, resuming, or stopping it.\nNB: Only to be used with `speak_streamed`.\n\nMake sure the optional `controlaudio` dependency is installed, together with the extras for the engines you use:\n\n```sh\npip install \"py3-tts-wrapper[controlaudio,google]\"\n```\n\nthen\n\n```python\nimport time\n\nclient = GoogleClient(..)\ntts = GoogleTTS(client)\ntry:\n    text = \"This is a pause and resume test. 
The text will be longer, depending on where the pause and resume works\"\n    audio_bytes = tts.synth_to_bytes(text)\n    tts.load_audio(audio_bytes)\n    print(\"Play audio for 3 seconds\")\n    tts.play(1)\n    tts.pause(8)\n    tts.resume()\n    time.sleep(6)\nfinally:\n    tts.cleanup()\n\n```\n\n- The pause and resume timings are in seconds from the start of the audio.\n- Always call the `cleanup` method so that the audio is stopped and the audio device is released.\n\nNB: Playback control uses pyaudio. If you run into problems, you may need to install portaudio19-dev, particularly on Linux:\n\n```sh\nsudo apt-get install portaudio19-dev\n```\n\n\n#### File Output\n\n```python\ntts.synth_to_file(ssml_text, 'output.mp3', format='mp3')\n```\nThere is also a legacy `synth` method. Saving as mp3, wav or flac is supported.\n\n```python\ntts.synth('\u003cspeak\u003eHello, world!\u003c/speak\u003e', 'hello.mp3', format='mp3')\n```\nYou can also stream and save at the same time. Note that the file is only written once streaming has finished.\n\n```python\nssml_text = tts.ssml.add('Hello world!')\nfilepath = 'output.wav'  # where the streamed audio is saved\n\ntts.speak_streamed(ssml_text, filepath, 'wav')\n```\n\n\n#### Fetch Available Voices\n\n```python\nvoices = tts.get_voices()\nprint(voices)\n```\n\nNB: Every voice has an id, a dict of language_codes, a name and a gender. 
Note that not all engines provide gender.\n\n#### Voice Selection\n\n```python\ntts.set_voice(voice_id, lang_code='en-US')\n```\n\ne.g.\n\n```python\ntts.set_voice('en-US-JessaNeural','en-US')\n```\n\nUse the voice id, not its name.\n\n#### SSML\n\n```python\nssml_text = tts.ssml.add('Hello, \u003cbreak time=\"500ms\"/\u003e world!')\ntts.speak(ssml_text)\n```\n\n#### Volume, Rate and Pitch Control\n\nSet volume:\n```python\ntts.set_property(\"volume\", \"90\")\ntext_read = \"The current volume is 90\"\ntext_with_prosody = tts.construct_prosody_tag(text_read)\nssml_text = tts.ssml.add(text_with_prosody)\n```\n- Volume is set on a scale of 0 (silent) to 100 (maximum).\n- The default volume is 100 if not explicitly specified.\n\nSet rate:\n\n```python\ntts.set_property(\"rate\", \"slow\")\ntext_read = \"The current rate is SLOW\"\ntext_with_prosody = tts.construct_prosody_tag(text_read)\nssml_text = tts.ssml.add(text_with_prosody)\n```\nSpeech Rate:\n- Rate is controlled using predefined options:\n    - x-slow: Very slow speaking speed.\n    - slow: Slow speaking speed.\n    - medium (default): Normal speaking speed.\n    - fast: Fast speaking speed.\n    - x-fast: Very fast speaking speed.\n- If not specified, the speaking rate defaults to medium.\n\nSet pitch:\n```python\ntts.set_property(\"pitch\", \"high\")\ntext_read = \"The current pitch is HIGH\"\ntext_with_prosody = tts.construct_prosody_tag(text_read)\nssml_text = tts.ssml.add(text_with_prosody)\n```\nPitch Control:\n- Pitch is adjusted using predefined options that affect the vocal tone:\n    - x-low: Very deep pitch.\n    - low: Low pitch.\n    - medium (default): Normal pitch.\n    - high: High pitch.\n    - x-high: Very high pitch.\n- If not explicitly set, the pitch defaults to medium.\n\nUse the `tts.ssml.clear_ssml()` method to clear all entries from the SSML list.\n\n#### `set_property()`\nThis method allows setting properties like `rate`, `volume`, and 
`pitch`.\n\n```python\ntts.set_property(\"rate\", \"fast\")\ntts.set_property(\"volume\", \"80\")\ntts.set_property(\"pitch\", \"high\")\n```\n\n#### `get_property()`\nThis method retrieves the value of properties such as `volume`, `rate`, or `pitch`.\n\n```python\ncurrent_volume = tts.get_property(\"volume\")\nprint(f\"Current volume: {current_volume}\")\n```\n\n\n#### Using callbacks on word-level boundaries\n\nNote that only **Polly, Microsoft, Google, ElevenLabs, UWP, SAPI and Watson** report word timings **accurately**. All other engines (e.g. Wit.Ai and Piper) fall back to estimated timings.\n\n```python\ndef my_callback(word: str, start_time: float, end_time: float):\n    duration = end_time - start_time\n    print(f\"Word: {word}, Duration: {duration:.3f}s\")\n\ndef on_start():\n    print('Speech started')\n\ndef on_end():\n    print('Speech ended')\n\ntry:\n    text = \"Hello, This is a word timing test\"\n    ssml_text = tts.ssml.add(text)\n    tts.connect('onStart', on_start)\n    tts.connect('onEnd', on_end)\n    tts.start_playback_with_callbacks(ssml_text, callback=my_callback)\nexcept Exception as e:\n    print(f\"Error: {e}\")\n```\n\nThis outputs:\n\n```bash\nSpeech started\nWord: Hello, Duration: 0.612s\nWord: , Duration: 0.212s\nWord: This, Duration: 0.364s\nWord: is, Duration: 0.310s\nWord: a, Duration: 0.304s\nWord: word, Duration: 0.412s\nWord: timing, Duration: 0.396s\nWord: test, Duration: 0.424s\nSpeech ended\n```\n\n#### `connect()`\nThis method allows registering callback functions for events like `onStart` or `onEnd`.\n\n```python\ndef on_start():\n    print(\"Speech started\")\n\ntts.connect('onStart', on_start)\n```\n\n\n## Audio Output Methods\n\nThe wrapper provides several methods for audio output, each suited for different use cases:\n\n### 1. Direct Playback\n\nThe simplest method, which plays audio immediately:\n```python\ntts.speak(\"Hello world\")\n```\n\n### 2. 
Streaming Playback\n\nRecommended for longer texts - streams audio as it's being synthesized:\n```python\ntts.speak_streamed(\"This is a long text that will be streamed as it's synthesized\")\n```\n\n### 3. File Output\n\nSave synthesized speech to a file:\n```python\ntts.synth_to_file(\"Hello world\", \"output.wav\")\n```\n\n### 4. Raw Audio Data\n\nFor advanced use cases where you need the raw audio data:\n```python\n# Get raw PCM audio data as bytes\naudio_bytes = tts.synth_to_bytes(\"Hello world\")\n```\n\n### Audio Format Notes\n\n- All engines output WAV format by default\n- For MP3 or other formats, use external conversion libraries like `pydub`:\n  ```python\n  from pydub import AudioSegment\n  import io\n  \n  # Get WAV data\n  audio_bytes = tts.synth_to_bytes(\"Hello world\")\n  \n  # Convert to MP3\n  wav_audio = AudioSegment.from_wav(io.BytesIO(audio_bytes))\n  wav_audio.export(\"output.mp3\", format=\"mp3\")\n  ```\n\n---\n\n### Example Use Cases\n\n#### 1. Saving Audio to a File\n\nYou can use the `synth_to_bytestream` method to synthesize audio in any supported format and save it directly to a file.\n\n```python\n# Synthesize text into a bytestream in MP3 format\nbytestream = tts.synth_to_bytestream(\"Hello, this is a test\", format=\"mp3\")\n\n# Save the audio bytestream to a file\nwith open(\"output.mp3\", \"wb\") as f:\n    f.write(bytestream.read())\n\nprint(\"Audio saved to output.mp3\")\n```\n\n**Explanation**:\n- The method synthesizes the given text into audio in MP3 format.\n- The `BytesIO` object is then written to a file using the `.read()` method of the `BytesIO` class.\n\n#### 2. 
Real-Time Playback Using `sounddevice`\n\nIf you want to play the synthesized audio live without saving it to a file, you can use the `sounddevice` library to directly play the audio from the `BytesIO` bytestream.\n\n```python\nimport sounddevice as sd\nimport numpy as np\n\n# Synthesize text into a bytestream in WAV format\nbytestream = tts.synth_to_bytestream(\"Hello, this is a live playback test\", format=\"wav\")\n\n# Convert the bytestream back to raw PCM audio data for playback\naudio_data = np.frombuffer(bytestream.read(), dtype=np.int16)\n\n# Play the audio using sounddevice\nsd.play(audio_data, samplerate=tts.audio_rate)\nsd.wait()\n\nprint(\"Live playback completed\")\n```\n\n**Explanation**:\n- The method synthesizes the text into a `wav` bytestream.\n- The bytestream is converted to raw PCM data using `np.frombuffer()`, which is then fed into the `sounddevice` library for live playback.\n- `sd.play()` plays the audio in real-time, and `sd.wait()` ensures that the program waits until playback finishes.\n\n### Manual Audio Control\n\nFor advanced use cases where you need direct control over audio playback, you can use the raw audio data methods:\n\n```python\nfrom tts_wrapper import AVSynthClient, AVSynthTTS\nimport numpy as np\nimport sounddevice as sd\n\n# Initialize TTS\nclient = AVSynthClient()\ntts = AVSynthTTS(client)\n\n# Method 1: Direct playback of entire audio\ndef play_audio_stream(tts, text: str):\n    \"\"\"Play entire audio at once.\"\"\"\n    # Get raw audio data\n    audio_data = tts.synth_to_bytes(text)\n    \n    # Convert to numpy array for playback\n    samples = np.frombuffer(audio_data, dtype=np.int16)\n    \n    # Play the audio\n    sd.play(samples, samplerate=tts.audio_rate)\n    sd.wait()\n\n# Method 2: Chunked playback for more control\ndef play_audio_chunked(tts, text: str, chunk_size: int = 4096):\n    \"\"\"Process and play audio in chunks for more control.\"\"\"\n    # Get raw audio data\n    audio_data = 
tts.synth_to_bytes(text)\n    \n    # Create a continuous stream\n    stream = sd.OutputStream(\n        samplerate=tts.audio_rate,\n        channels=1,  # Mono audio\n        dtype=np.int16\n    )\n    \n    with stream:\n        # Process in chunks\n        for i in range(0, len(audio_data), chunk_size):\n            chunk = audio_data[i:i + chunk_size]\n            if len(chunk) % 2 != 0:  # Ensure even size for 16-bit audio\n                chunk = chunk[:-1]\n            samples = np.frombuffer(chunk, dtype=np.int16)\n            stream.write(samples)\n```\n\nThis manual control allows you to:\n- Process audio data in chunks\n- Implement custom audio processing\n- Control playback timing\n- Add effects or modifications to the audio\n- Implement custom buffering strategies\n\nThe chunked playback method is particularly useful for:\n- Real-time audio processing\n- Custom pause/resume functionality\n- Volume adjustment during playback\n- Progress tracking\n- Memory-efficient handling of long audio\n\n**Note**: Manual audio control requires the `sounddevice` and `numpy` packages:\n```sh\npip install sounddevice numpy\n```\n\n\n## Developer's Guide\n\n### Setting up the Development Environment\n\n#### Using Pipenv\n\n\n1. Clone the repository:\n   ```sh\n   git clone https://github.com/willwade/tts-wrapper.git\n   cd tts-wrapper\n   ```\n\n2. Install the package and system dependencies:\n   ```sh\n   pip install .\n   ```\n\n   To install optional dependencies, use:\n   ```sh\n   pip install .[google, watson, polly, elevenlabs, microsoft]\n   ```\n\nThis will install Python dependencies and system dependencies required for this project. Note that system dependencies will only be installed automatically on Linux.\n\n#### Using UV\n\n1. [Install UV](https://docs.astral.sh/uv/#getting-started)\n   ```sh\n   pip install uv\n   ```\n\n2. Clone the repository:\n   ```sh\n   git clone https://github.com/willwade/tts-wrapper.git\n   cd tts-wrapper\n   ```\n\n3. 
Install Python dependencies:\n   ```sh\n   uv sync\n   ```\n\n4. Install system dependencies (Linux only):\n   ```sh\n   uv run postinstall\n   ```\n\n\n**NOTE**: To generate a requirements.txt file for the project, use `uv export --format requirements-txt --all-extras --no-hashes`. Be warned that this includes all dependencies, including dev ones.\n\n## Release a new build\n\n```sh\ngit tag -a v0.1.0 -m \"Release 0.1.0\"\ngit push origin v0.1.0\n```\n\n### Adding a New Engine to TTS Wrapper\n\nThis guide provides a step-by-step approach to adding a new engine to the existing Text-to-Speech (TTS) wrapper system.\n\n#### Step 1: Create Engine Directory Structure\n\n1. **Create a new folder** for your engine within the `engines` directory. Name this folder according to your engine, such as `witai` for Wit.ai.\n\n   Directory structure:\n   \n   ```\n   engines/witai/\n   ```\n\n2. **Create necessary files** within this new folder:\n\n   - `__init__.py` - Makes the directory a Python package.\n   - `client.py` - Handles all interactions with the TTS API.\n   - `engine.py` - Contains the TTS class that integrates with your abstract TTS system.\n   - `ssml.py` - Defines any SSML handling specific to this engine.\n\n   Final directory setup:\n\n   ```\n   engines/\n   └── witai/\n       ├── __init__.py\n       ├── client.py\n       ├── engine.py\n       └── ssml.py\n   ```\n\n#### Step 2: Implement Client Functionality in `client.py`\n\nImplement authentication and necessary setup for API connection. 
This file should manage tasks such as sending synthesis requests and fetching available voices.\n\n```python\nclass TTSClient:\n    def __init__(self, api_key):\n        self.api_key = api_key\n        # Setup other necessary API connection details here\n\n    def synth(self, text, options):\n        # Code to send a synthesis request to the TTS API\n        pass\n\n    def get_voices(self):\n        # Code to retrieve available voices from the TTS API\n        pass\n```\n\n#### Step 3: Define the TTS Engine in `engine.py`\n\nThis class should inherit from the abstract TTS class and implement required methods such as `get_voices` and `synth_to_bytes`.\n\n```python\nfrom .client import TTSClient\nfrom your_tts_module.abstract_tts import AbstractTTS\n\nclass WitTTS(AbstractTTS):\n    def __init__(self, api_key):\n        super().__init__()\n        self.client = TTSClient(api_key)\n\n    def get_voices(self):\n        return self.client.get_voices()\n\n    def synth_to_bytes(self, text, format='wav'):\n        return self.client.synth(text, {'format': format})\n```\n\n#### Step 4: Implement SSML Handling in `ssml.py`\n\nIf the engine has specific SSML requirements or supports certain SSML tags differently, implement this logic here.\n\n```python\nfrom your_tts_module.abstract_ssml import BaseSSMLRoot, SSMLNode\n\nclass EngineSSML(BaseSSMLRoot):\n    def add_break(self, time='500ms'):\n        self.root.add(SSMLNode('break', attrs={'time': time}))\n```\n\n#### Step 5: Update `__init__.py`\n\nMake sure the `__init__.py` file properly imports and exposes the TTS class and any other public classes or functions from your engine.\n\n```python\nfrom .engine import WitTTS\nfrom .ssml import EngineSSML\n```\n\n#### NB: Credentials Files\n\nYou can store your credentials in either:\n- `credentials.json` - For development\n- `credentials-private.json` - For private credentials (should be git-ignored)\n\nExample structure (do NOT commit actual credentials):\n```json\n{\n    
\"Polly\": {\n        \"region\": \"your-region\",\n        \"aws_key_id\": \"your-key-id\",\n        \"aws_access_key\": \"your-access-key\"\n    },\n    \"Microsoft\": {\n        \"token\": \"your-subscription-key\",\n        \"region\": \"your-region\"\n    }\n}\n```\n\n### Service-Specific Setup\n\n#### AWS Polly\n- [Create an AWS account](https://aws.amazon.com/free)\n- [Set up IAM credentials](https://docs.aws.amazon.com/polly/latest/dg/setting-up.html)\n- [Polly API Documentation](https://docs.aws.amazon.com/polly/latest/dg/API_Operations.html)\n\n#### Microsoft Azure\n- [Create an Azure account](https://azure.microsoft.com/free)\n- [Create a Speech Service resource](https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started)\n- [Azure Speech Service Documentation](https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-text-to-speech)\n\n#### Google Cloud\n- [Create a Google Cloud account](https://cloud.google.com/free)\n- [Set up a service account](https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries)\n- [Google TTS Documentation](https://cloud.google.com/text-to-speech/docs)\n\n#### IBM Watson\n- [Create an IBM Cloud account](https://cloud.ibm.com/registration)\n- [Create a Text to Speech service instance](https://cloud.ibm.com/catalog/services/text-to-speech)\n- [Watson TTS Documentation](https://cloud.ibm.com/apidocs/text-to-speech)\n\n#### ElevenLabs\n- [Create an ElevenLabs account](https://elevenlabs.io/)\n- [Get your API key](https://docs.elevenlabs.io/authentication)\n- [ElevenLabs Documentation](https://docs.elevenlabs.io/)\n\n#### Play.HT\n- [Create a Play.HT account](https://play.ht/)\n- [Get your API credentials](https://docs.play.ht/reference/api-getting-started)\n- [Play.HT Documentation](https://docs.play.ht/)\n\n#### Wit.AI\n- [Create a Wit.ai account](https://wit.ai/)\n- [Create a new app and get token](https://wit.ai/docs/quickstart)\n- [Wit.ai 
Documentation](https://wit.ai/docs)\n\n## License\n\nThis project is licensed under the [MIT License](./LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillwade%2Ftts-wrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillwade%2Ftts-wrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillwade%2Ftts-wrapper/lists"}