https://github.com/acecentre/sapi-poc
Use a range of TTS systems in SAPI for Windows. A Pipe service, A register Voice GUI and dll that links with SAPI
https://github.com/acecentre/sapi-poc
sapi tts
Last synced: 9 months ago
JSON representation
Use a range of TTS systems in SAPI for Windows. A Pipe service, A register Voice GUI and dll that links with SAPI
- Host: GitHub
- URL: https://github.com/acecentre/sapi-poc
- Owner: AceCentre
- License: mit
- Created: 2024-06-17T10:15:23.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-03T06:36:03.000Z (over 1 year ago)
- Last Synced: 2025-04-07T02:49:49.110Z (12 months ago)
- Topics: sapi, tts
- Language: C++
- Homepage:
- Size: 548 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## SAPI-Bridge
## What are we trying to do?
- Create a SAPI engine
- No synth directly in the dll. Pass all speak calls - infact all calls for other methods to a pipe service (see VoiceServer)
So to be clear
- Have a pipe service running in the background which deals with all sapi calls
- A GUI "Register your voice" app (see VoiceServer/RegisterVoice)
- A SAPI Dll that is registered in windows which for all purposes to windows works as a sapi system but redirects to a pipe service
## Why like this? Why not build all on the dll?
Because some TTS systems we are using have really slow cold-start times to call each time. Take for example sherpa-onnx models. Its far faster to hold the client object in memory and next synth streaming calls are faster
## Are you sure? There are going to be loads of methods you cant support
For sure - we arent totally sure. Infact we can do lots to improve coldstart times - but we already have most of the service code working and a pretty decent tts-wrapper that allows us to do synth streaming calls across a wide range of speech engines that are on and offline.
For all the methods - you are right - we are going to struggle to do things like wordEvents for all engines. But for some we already have this (https://github.com/willwade/tts-wrapper?tab=readme-ov-file#feature-set-overview)
## What was your code originally doing here
We built a simple proof of concept to build a SAPI dll and integrate it into python. Its a total PITA to build and - we figured why not just abstract this and remove the python headaches. Redirect to a pipe service..
**WE HAVENT MIGRATED AWAY FROM THIS DLL CODE MUCH - INFACT IT JUST DOESNT WORK**
** So what should this dll do?**
1. Register as a SAPI engine
2. When voices call this engine it redirects speak calls - and other callls it can support to a pipe service
3. We should safely pass on calls we cant support
** How could/should we test this?**
Use https://www.cross-plus-a.com/balabolka.htm and see if the voices get registered (they do now) but they dont speak
So what follows below is specific code details on the DLL part..
# Building
Use Visual Studio to build the project. The project is a CMake project, so you can use CMake to generate the Visual Studio solution file
So the build process is as follows:
1. Download opensource jsoncpp.
2. Build jsoncpp.lib and jsoncpp.dll
3. Link jsoncpp.lib and dll to pysapittsengine project.
4. Then build the pysapittsengine.dll
```
cd engine
mkdir build
cd build
cmake ..
cmake --build . --config Debug
cmake --build . --config Release
```
# Registering engine (run as Administrator)
```
regsvr32.exe pysapittsengine.dll
```
# Registering voice (run as Administrator)
```
regvoice.exe --token PYTTS-AzureNeural --name "Azure Neural" --vendor Microsoft --path C:\Work\SAPI-POC;C:\Work\build\venv\Lib\site-packages --module voices --class AzureNeuralVoice
```
Or use the GUI to register voices.
See VoiceServer/README.md for more information.