{"id":13436228,"url":"https://github.com/marty1885/paroli","last_synced_at":"2025-04-09T13:03:50.678Z","repository":{"id":213802610,"uuid":"734960110","full_name":"marty1885/paroli","owner":"marty1885","description":"Streaming TTS based on Piper with optional RK3588 NPU support","archived":false,"fork":false,"pushed_at":"2024-12-19T03:16:01.000Z","size":136,"stargazers_count":75,"open_issues_count":4,"forks_count":12,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-02T11:53:41.643Z","etag":null,"topics":["cpp-web-services","drogon","drogonframework","onnx","rk3588","rknpu2","rockchip","text-to-speech","tts","tts-api"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marty1885.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-23T06:50:14.000Z","updated_at":"2025-03-28T03:20:41.000Z","dependencies_parsed_at":"2024-03-17T04:08:29.076Z","dependency_job_id":"fdc17890-c562-4265-af4f-28b95f1a33c3","html_url":"https://github.com/marty1885/paroli","commit_stats":{"total_commits":44,"total_committers":3,"mean_commits":"14.666666666666666","dds":"0.045454545454545414","last_synced_commit":"2f83955d63d346cff5aa64e4e6648cc7ecd349aa"},"previous_names":["marty1885/paroli"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marty1885%2Fparoli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marty1885%2Fparoli/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/marty1885%2Fparoli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marty1885%2Fparoli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marty1885","download_url":"https://codeload.github.com/marty1885/paroli/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045230,"owners_count":21038553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp-web-services","drogon","drogonframework","onnx","rk3588","rknpu2","rockchip","text-to-speech","tts","tts-api"],"created_at":"2024-07-31T03:00:45.674Z","updated_at":"2025-04-09T13:03:50.661Z","avatar_url":"https://github.com/marty1885.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"# Paroli\n\nStreaming mode implementation of the Piper TTS system in C++ with (optional) RK3588 NPU acceleration support. Named after \"speaking\" in Esperanto.\n\n## How to use\n\nBefore building, you will need to install the following dependencies:\n\n* xtensor\n* spdlog\n* libfmt\n* piper-phonemize\n* onnxruntime (1.14 or 1.15)\n* A C++20 capable compiler\n\n(API/Web server)\n* Drogon\n* libsoxr\n* libopusenc\n    * You'll need to build this from source if on Ubuntu 22.04. A package is available starting with 23.04\n\n(RKNN support)\n* [rknnrt \u003e= 1.6.0](https://github.com/rockchip-linux/rknn-toolkit2/tree/v1.6.0/rknpu2/runtime/Linux/librknn_api)\n\nOf these, the `piper-phonemize` and `onnxruntime` binary releases (not the source, unless you want to build them yourself) 
likely need to be downloaded and decompressed manually. Afterwards, run CMake and point it to the folders you decompressed them into.\n\n```bash\nmkdir build\ncd build\ncmake .. -DORT_ROOT=/path/to/your/onnxruntime-linux-aarch64-1.14.1 -DPIPER_PHONEMIZE_ROOT=/path/to/your/piper-phonemize-2023-11-14 -DCMAKE_BUILD_TYPE=Release\nmake -j\n# IMPORTANT! Copy espeak-ng-data or pass `--espeak_data` CLI flag\ncp -r /path/to/your/piper-phonemize-2023-11-14/share/espeak-ng-data .\n```\n\nAfterwards, run `paroli-cli` and type into the console to synthesize speech. Please refer to later sections for generating the models.\n\n```plaintext\n./paroli-cli --encoder /path/to/your/encoder.onnx --decoder /path/to/your/decoder.onnx -c /path/to/your/model.json\n...\n[2023-12-23 03:13:12.452] [paroli] [info] Wrote /home/marty/Documents/rkpiper/build/./1703301190238261389.wav\n[2023-12-23 03:13:12.452] [paroli] [info] Real-time factor: 0.16085024956315996 (infer=2.201744556427002 sec, audio=13.688163757324219 sec)\n```\n\n### The API server\n\nA web API server is also provided so other applications can easily perform text-to-speech. For details, please refer to the [web API document](paroli-server/docs/web_api.md). By default, a demo UI can be accessed at the root of the URL. The API server supports both responding with compressed audio to reduce bandwidth requirements and streaming audio via WebSocket. 
\n\nTo run it:\n\n```bash\n./paroli-server --encoder /path/to/your/encoder.onnx --decoder /path/to/your/decoder.onnx -c /path/to/your/model.json --ip 0.0.0.0 --port 8848\n```\n\nAnd to invoke TTS:\n\n```bash\ncurl http://your.server.address:8848/api/v1/synthesise -X POST -H 'Content-Type: application/json' -d '{\"text\": \"To be or not to be, that is the question\"}' \u003e test.opus\n```\n\nDemo:\n\n[![Watch the video](https://img.youtube.com/vi/QkIF9FBrAM8/maxresdefault.jpg)](https://youtu.be/QkIF9FBrAM8)\n\n#### Authentication\n\nTo enable use cases where the service is exposed for whatever reason, the API server supports a basic authentication scheme. The `--auth` flag generates a bearer token that is different on every run; when enabled, authentication applies to both the WebSocket and HTTP synthesis APIs. `--auth [YOUR_TOKEN]` sets the token to YOUR_TOKEN. Furthermore, setting the `PAROLI_TOKEN` environment variable sets the bearer token to the value of that variable.\n\n```plaintext\nAuthentication: Bearer \u003cinsert the token\u003e\n```\n\n**The Web UI will not work when authentication is enabled**\n\n## Obtaining models\n\nTo obtain the encoder and decoder models, you'll either need to download them or create them from checkpoints. Checkpoints are the raw trained models that piper generates. Please refer to [piper's TRAINING.md](https://github.com/rhasspy/piper/blob/master/TRAINING.md) for details. To convert checkpoints into ONNX file pairs, you'll need [mush42's piper fork and the streaming branch](https://github.com/mush42/piper/tree/streaming). Run\n\n```bash\npython3 -m piper_train.export_onnx_streaming /path/to/your/training/lightning_logs/version_0/checkpoints/blablablas.ckpt /path/to/output/directory\n```\n\n### Downloading models\n\nSome 100% legal models are provided on [HuggingFace](https://huggingface.co/marty1885/streaming-piper/tree/main).\n\n## Accelerators\n\nBy default, the models run on the CPU, which can be power hungry and slow. 
If you'd like to use a GPU or another accelerator, you can pass a flag such as `--accelerator cuda` in the CLI to enable it. For now, only NVIDIA accelerators are supported. ROCm could easily be supported, but I don't have the hardware to test it. Feel free to contribute.\n\nThis is the list of supported accelerators:\n* `cuda` - NVIDIA CUDA\n* `tensorrt` - NVIDIA TensorRT\n\n\n### Rockchip NPU (RK3588)\n\nAdditionally, on RK3588 based systems, NPU support can be enabled by passing `-DUSE_RKNN=ON` to CMake and passing an RKNN model instead of ONNX as the decoder. This results in a ~4.3x speedup compared to running on the RK3588 CPU cores. Note that the `accelerator` flag has no effect when an RKNN model is used, and only the decoder can run on the RK3588 NPU.\n\nRockchip does not provide a package for installing the libraries and headers, so this has to be done manually.\n\n```bash\ngit clone https://github.com/rockchip-linux/rknn-toolkit2\ncd rknn-toolkit2/rknpu2/runtime/Linux/librknn_api\nsudo cp aarch64/librknnrt.so /usr/lib/\nsudo cp include/* /usr/include/\n```\n\nAlso, converting ONNX to RKNN has to be done on an x64 computer. As of writing this document, you likely want to install the version for Python 3.10, as this is the same version that works with upstream piper. rknn-toolkit2 version 1.6.0 is required.\n\n```bash\n# Install rknn-toolkit2\ngit clone https://github.com/rockchip-linux/rknn-toolkit2\ncd rknn-toolkit2/rknn-toolkit2/packages\npip install rknn_toolkit2-1.6.0+81f21f4d-cp310-cp310-linux_x86_64.whl\n\n# Run the conversion script\npython tools/decoder2rknn.py /path/to/model/decoder.onnx /path/to/model/decoder.rknn\n```\n\nTo use RKNN for inference, simply pass the RKNN model in the CLI. 
An error will appear if an RKNN model is passed in but RKNN support was not enabled at compile time.\n\n```bash\n./paroli-cli --encoder /path/to/your/encoder.onnx --decoder /path/to/your/decoder.rknn -c /path/to/your/model.json\n#                                                                                 ^^^^\n#                                                                           The only change\n```\n\n## Developer notes\n\nTODO:\n\n- [ ] Code cleanup\n- [ ] Investigate ArmNN to accelerate encoder inference\n- [ ] Better handling for authentication\n* RKNN\n    - [ ] Add dynamic shape support when Rockchip fixes them\n    - [ ] Try quantization to see whether the speedup is worth the lower quality\n\n## Notes\n\nThere's no good way to reduce synthesis latency on RK3588 besides Rockchip improving rknnrt and their compiler. The encoder is a dynamic graph, so RKNN won't work for it. And the way they implement multi-NPU co-processing prohibits faster single-batch inference. Multi-batch inference can be made faster, but I don't see the value in it, as it is already fast enough for home use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarty1885%2Fparoli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarty1885%2Fparoli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarty1885%2Fparoli/lists"}