https://github.com/allisterb/victor
Voice Interactive Controller
assistive-technology blindness cui dotnet nlu voice-control
- Host: GitHub
- URL: https://github.com/allisterb/victor
- Owner: allisterb
- License: gpl-3.0
- Created: 2019-09-19T21:33:19.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-08-03T21:35:02.000Z (almost 2 years ago)
- Last Synced: 2024-08-03T22:34:31.342Z (almost 2 years ago)
- Topics: assistive-technology, blindness, cui, dotnet, nlu, voice-control
- Language: C#
- Homepage:
- Size: 4.33 MB
- Stars: 6
- Watchers: 5
- Forks: 3
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Victor - Voice Interactive Controller
Victor is a free, cross-platform, programmable voice control framework for desktops. It was started during my entry into the [Mozilla Voice Challenge](https://www.herox.com/voice) to test some ideas for an integrated open-source voice stack, and later served as the base client platform for [Victor CX](https://github.com/allisterb/Victor/tree/master/src/CUI), the auditory CUI that was [my entry](https://devpost.com/software/victorcx) into Red Hat's ReBoot Customer Experience Hackathon.
The following videos demonstrate and document some aspects of Victor:
[Victor Test 1](https://youtu.be/Lvw4WmbTTBk "Victor Test 1")
[Victor Test 2](https://youtu.be/LQLpoyohYtE "Victor Test 2")
## Architecture
Victor currently uses the following open-source projects:
### Julius (ASR)
Julius is a high-speed, accurate, and flexible LVCSR (large-vocabulary continuous speech recognition) library which can decode and recognize speech in real time using a variety of models built for different languages such as Japanese, English, and Polish. Unlike other ASR libraries, including Facebook's wav2letter++, Julius is fully supported on Windows, and unlike Mozilla's own DeepSpeech, Julius can decode speech waveform input from a system mic device in real time with appropriate handling of silences and pauses. Julius is used in the [Simon](https://simon.kde.org/) voice control program for KDE. Leslaw Pawlaczyk has [created](https://discourse.mozilla.org/t/julius-speech-models-based-on-mozilla-corpus/27651) Julius models based on the Mozilla corpus and has modified Julius to support DNN-HMM models as well as GMM-HMM models.
Julius can be built as a statically-linked binary and run as a sub-process of Victor. Victor communicates with Julius by monitoring its stdout stream and detecting the different states the program is in:
[Victor Debug Mode](https://www.youtube.com/watch?v=1PFBRR15F-A "Victor Debug Mode")
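As a rough illustration of state detection from stdout, the sketch below classifies individual lines of Julius console output. It is not Victor's actual implementation: the `JuliusState` enum and `JuliusOutput` class are hypothetical names, and the marker strings (`please speak`, `pass1_best:`, `sentence1:`) are assumptions about what Julius prints, which may differ between versions and configurations.

```csharp
using System;

// Hypothetical states for illustration; not Victor's actual types.
public enum JuliusState { Unknown, Listening, Pass1Result, FinalResult }

public static class JuliusOutput
{
    // Classify one line of Julius stdout.
    // The marker strings are assumptions about Julius's console output.
    public static JuliusState Classify(string line)
    {
        if (line == null) return JuliusState.Unknown;
        string t = line.Trim();
        if (t.Contains("please speak")) return JuliusState.Listening;
        if (t.StartsWith("pass1_best:", StringComparison.Ordinal)) return JuliusState.Pass1Result;
        if (t.StartsWith("sentence1:", StringComparison.Ordinal)) return JuliusState.FinalResult;
        return JuliusState.Unknown;
    }

    // Extract the recognized words from a "sentence1:" line, dropping the
    // <s> and </s> sentence boundary tokens.
    public static string ExtractSentence(string line)
    {
        if (Classify(line) != JuliusState.FinalResult)
            throw new ArgumentException("Not a sentence1 line.", nameof(line));
        string text = line.Trim().Substring("sentence1:".Length);
        return text.Replace("<s>", "").Replace("</s>", "").Trim();
    }
}
```

A caller would feed each stdout line to `Classify` and act only on the transitions it cares about, for example passing the text from `ExtractSentence` on to the NLU component.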
The desired Julius configuration is specified in a plain text file and passed to the Julius executable as a startup argument. In this way Julius can be used by any program on any hardware or operating system platform that Julius supports. Julius's portability and real-time input recognition capabilities make it a good choice for the ASR component of an integrated voice stack.
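Launching Julius as a sub-process with a configuration file might look like the following minimal sketch. It assumes the configuration is passed with Julius's `-C` option; `JuliusLauncher` and its parameter names are hypothetical, and the executable and configuration paths are supplied by the caller.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of Victor.
public static class JuliusLauncher
{
    // Build the start info for a Julius sub-process whose stdout we can read.
    public static ProcessStartInfo BuildStartInfo(string juliusPath, string jconfPath)
    {
        return new ProcessStartInfo
        {
            FileName = juliusPath,
            // -C tells Julius to load its settings from the given file.
            Arguments = "-C \"" + jconfPath + "\"",
            RedirectStandardOutput = true, // needed to monitor stdout
            UseShellExecute = false,       // required for stream redirection
            CreateNoWindow = true
        };
    }

    // Start the process and invoke onLine for every line written to stdout.
    public static Process Start(ProcessStartInfo info, Action<string> onLine)
    {
        var process = new Process { StartInfo = info };
        process.OutputDataReceived += (sender, e) =>
        {
            if (e.Data != null) onLine(e.Data);
        };
        process.Start();
        process.BeginOutputReadLine();
        return process;
    }
}
```

Because the only contract is a command line and a text stream, the same approach should carry over to any platform where a Julius binary is available.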
### SnipsNLU (NLU)
[Snips NLU](https://github.com/snipsco/snips-nlu-rs) is a high-speed, accurate, open-source NLU inference engine which can recognize intents and entities in utterances for a particular domain in real time. It is written in Rust and has an FFI that allows it to be used from any language that can call C libraries.
Victor [interfaces](https://github.com/allisterb/Victor/blob/master/src/NLU/Victor.NLU.Snips/SnipsApi.cs) with the Snips NLU engine using its C FFI. For example, in C#, calling a SnipsNLU function in a native DLL looks like:
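```
// Requires: using System; using System.Runtime.InteropServices;
// Binds the native snips_nlu_engine_create_from_dir export; the engine handle is returned through `client`.
[DllImport("snips_nlu_ffi", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
internal static extern SNIPS_RESULT snips_nlu_engine_create_from_dir
    ([In, MarshalAs(UnmanagedType.LPStr)] string root_dir, [In, Out] ref IntPtr client);
```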
Abstractions over the lower-level Snips functions are built up so that other code does not have to manage the details of calling the library. This is the standard approach used for Snips bindings to other languages like Python. Interfacing with the Snips library directly removes the need for an intermediate Python interpreter or REST API, which makes SnipsNLU a good choice for the NLU component of an integrated voice stack.
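One way such an abstraction could be layered over the raw binding is sketched below. This is an illustrative sketch, not Victor's actual wrapper: the `SnipsNative` and `SnipsEngine` names are hypothetical, the `SNIPS_RESULT` values are assumed to mirror the Snips C header, and the `snips_nlu_engine_destroy_client` declaration is an assumption about the FFI that should be checked against the header for the installed version.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed to mirror the result enum in the Snips C header (0 = OK).
internal enum SNIPS_RESULT { SNIPS_RESULT_OK = 0, SNIPS_RESULT_KO = 1 }

internal static class SnipsNative
{
    [DllImport("snips_nlu_ffi", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    internal static extern SNIPS_RESULT snips_nlu_engine_create_from_dir(
        [In, MarshalAs(UnmanagedType.LPStr)] string root_dir, [In, Out] ref IntPtr client);

    // Assumption: the FFI exposes a matching destroy function with this shape.
    [DllImport("snips_nlu_ffi", CallingConvention = CallingConvention.Cdecl)]
    internal static extern SNIPS_RESULT snips_nlu_engine_destroy_client(IntPtr client);
}

// Hypothetical managed wrapper that owns the native engine handle so that
// callers never touch IntPtr values or result codes directly.
public sealed class SnipsEngine : IDisposable
{
    private IntPtr _client = IntPtr.Zero;

    public SnipsEngine(string modelDir)
    {
        if (modelDir == null) throw new ArgumentNullException(nameof(modelDir));
        SNIPS_RESULT result = SnipsNative.snips_nlu_engine_create_from_dir(modelDir, ref _client);
        if (result != SNIPS_RESULT.SNIPS_RESULT_OK || _client == IntPtr.Zero)
            throw new InvalidOperationException("Could not create Snips NLU engine from " + modelDir);
    }

    // True while the wrapper still holds a native engine handle.
    public bool IsOpen => _client != IntPtr.Zero;

    // Release the native engine exactly once.
    public void Dispose()
    {
        if (_client == IntPtr.Zero) return;
        SnipsNative.snips_nlu_engine_destroy_client(_client);
        _client = IntPtr.Zero;
    }
}
```

With a wrapper of this kind, calling code can write `using (var engine = new SnipsEngine(modelDir)) { ... }` and rely on `Dispose` to free the native engine.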
### Mimic (TTS)
Victor can use the Mimic TTS engine, but it is generally better to rely on the operating system's narrator or TTS capabilities, or on the user's installed screen reader.