{"id":17311799,"url":"https://github.com/zdisket/tensorvox","last_synced_at":"2026-01-23T20:12:03.675Z","repository":{"id":45642875,"uuid":"287195029","full_name":"ZDisket/TensorVox","owner":"ZDisket","description":"Desktop application for neural speech synthesis written in C++","archived":false,"fork":false,"pushed_at":"2023-03-01T06:26:45.000Z","size":16226,"stargazers_count":211,"open_issues_count":1,"forks_count":20,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-12-19T03:09:31.420Z","etag":null,"topics":["desktop","fastspeech2","mb-melgan","multiband-melgan","phoneme","real-time","speech-synthesis","tacotron2","text-to-speech","tts","voice-synthesis"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZDisket.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-13T06:02:18.000Z","updated_at":"2024-11-22T12:07:22.000Z","dependencies_parsed_at":"2024-10-15T12:41:43.048Z","dependency_job_id":"9987e39e-7a6a-4919-b549-4e0006ddf068","html_url":"https://github.com/ZDisket/TensorVox","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZDisket%2FTensorVox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZDisket%2FTensorVox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZDisket%2FTensorVox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZDis
ket%2FTensorVox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZDisket","download_url":"https://codeload.github.com/ZDisket/TensorVox/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231468021,"owners_count":18381174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["desktop","fastspeech2","mb-melgan","multiband-melgan","phoneme","real-time","speech-synthesis","tacotron2","text-to-speech","tts","voice-synthesis"],"created_at":"2024-10-15T12:41:30.092Z","updated_at":"2026-01-23T20:12:03.670Z","avatar_url":"https://github.com/ZDisket.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TensorVox\n\n\nTensorVox is a desktop application for user-friendly, lightweight neural speech synthesis, aimed at making such technology more accessible. \n\nIt can load models from [TensorFlowTTS](https://github.com/TensorSpeech/TensorFlowTTS), [Coqui-TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), and VITS EVO. It is written in pure C++/Qt and uses ONNX Runtime, with TensorFlow and LibTorch supported as legacy backends.\n\n![Interface with Tac2 model loaded](https://i.imgur.com/wtPzzNh.png)\n\n\n### Try it out\n**System requirements:** Windows 10 64-bit and a CPU that supports the AVX instruction set (pretty much anything made after 2010). 
GPU acceleration requires a GPU that supports DirectX 12 and works only with ONNX models.\n\n\n[Detailed guide in Google Docs](https://docs.google.com/document/d/1OS1kfb19bvpPPkF71Vbak_b735mi7epjUanIfPG671M/edit?usp=sharing)\n\nGrab a copy from the releases, extract the .zip, and check [the Google Drive folder](https://drive.google.com/drive/folders/1atUyxBbstKZpMqQEZMdNmRF2AKrlahKy?usp=sharing) for models and installation instructions.\n\nIf you're interested in using your own model, you first need to train it and then export it. \n\n\n## Supported architectures\n\nTensorVox supports models from four repos. \n - **VITS Evolution:** This is my fully upgraded version of VITS, with ONNX support.\n - **jaywalnut310/VITS:** VITS, a fully end-to-end model (stressed IPA as phonemes). Export notebook: [\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\"\u003e](https://colab.research.google.com/drive/1BSGE5DQYweXBWrwPOmb6CRPUU8H5mBvb?usp=sharing)\n - **TensorFlowTTS**: FastSpeech2 and Tacotron2 (both char- and phoneme-based), and Multi-Band MelGAN. Here's a Colab notebook demonstrating how to export the LJSpeech pretrained, char-based Tacotron2 model: [\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\"\u003e](https://colab.research.google.com/drive/1KLqZ1rkD4Enw7zpTgXGL6if7e5s0UeWa?usp=sharing) \n - **Coqui-TTS:** Tacotron2 (phoneme-based IPA) and Multi-Band MelGAN, after converting from PyTorch to TensorFlow. Here's a notebook showing how to export the LJSpeech DDC model: [\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\"\u003e](https://colab.research.google.com/drive/15CdGEAu_-KezV1XxwzVfQiFSm0tveBkC?usp=sharing)\n\nSupport for more modern TTS models is being actively worked on!\n\nThese examples should provide you with enough guidance to understand what is needed. If you're looking to train a model specifically for this purpose, then stay tuned... 
\n*Or if you’d rather skip the training and export work, you can also get a TensorVox-ready model directly from me. (See the contact details at the bottom of this README.)*\n\nAs for languages, out-of-the-box support is provided for English, German and Spanish (TensorFlowTTS models only); that is, you won't have to do anything. You can add languages without modifying code, as long as the phoneme set is IPA (stressed or unstressed), ARPA, or GlobalPhone (open an issue and I'll explain how).\n\n## Backends\nTensorVox currently supports multiple inference backends.\n\nThe LibTorch (TorchScript) and TensorFlow backends are maintained for compatibility with older models and projects created before ONNX export was mature enough.\n\nNew development and active support are focused on ONNX Runtime, with DirectML used for GPU acceleration on Windows. This backend provides the best portability, long-term stability, and hardware coverage.\n\n\n## Build instructions\nCurrently, only Windows 10 x64 is supported, although I've heard reports of it running on 8.1.\n\n**Requirements:**\n 1. Qt Creator\n 2. MSVC 2017 (v141) compiler\n\n**Primed build (with all provided libraries):**\n\n 1. Download [precompiled binary dependencies and includes](https://drive.google.com/file/d/1N6IxSpsgemS94z_v82toXhiNs2tLXkz6/view?usp=sharing)\n 2. Unzip it so that the `deps` folder is in the same place as the .pro and main source files.\n 3. 
Open the project with Qt Creator, add your compiler, and compile\n\nNote that to try your shiny new executable, you'll need to download a release of the program as described above and replace the executable in that release with your new one, so that all the DLLs are in place.\n\nTODO: Add instructions for compiling from scratch.\n\n## Externals (and thanks)\n\n - **ONNX Runtime**: https://onnxruntime.ai/\n\n - **Tensorflow C API**: [https://www.tensorflow.org/install/lang_c](https://www.tensorflow.org/install/lang_c)\n - **CppFlow** (TF C API -\u003e C++ wrapper): [https://github.com/serizba/cppflow](https://github.com/serizba/cppflow) \n - **AudioFile** (for WAV export): [https://github.com/adamstark/AudioFile](https://github.com/adamstark/AudioFile)\n - **Frameless Dark Style Window**: https://github.com/Jorgen-VikingGod/Qt-Frameless-Window-DarkStyle\n - **JSON for modern C++**: https://github.com/nlohmann/json\n - **r8brain-free-src** (Resampling): https://github.com/avaneev/r8brain-free-src\n - **rnnoise** (CMake version, denoising output): https://github.com/almogh52/rnnoise-cmake\n - **Logitech LED Illumination SDK** (Mouse RGB integration): https://www.logitechg.com/en-us/innovation/developer-lab.html\n - **QCustomPlot**: https://www.qcustomplot.com/index.php/introduction\n - **libnumbertext**: https://github.com/Numbertext/libnumbertext\n\n\n## Contact\nYou can open an issue here or join the [Discord server](https://discord.gg/B9fGwXgz) and discuss/ask anything there.\n\nCustom model training, fine-tuning, and compatible exports are available on request (not free). Email me or DM me on Xitter.\n\nFollow me on X (formerly Twitter): [ZD1908 (@ZDi____) / X](https://x.com/ZDi____)\nFor any formal inquiries, email nika109021@gmail.com\n\n## Note about licensing\n\nThis program itself is MIT licensed, but for the models you use, their license terms apply. 
For example, if you're in Vietnam and using TensorFlowTTS models, you'll have to check [here](https://github.com/TensorSpeech/TensorFlowTTS#license) for some details\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzdisket%2Ftensorvox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzdisket%2Ftensorvox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzdisket%2Ftensorvox/lists"}