{"id":21487201,"url":"https://github.com/dtinth/vxtron","last_synced_at":"2025-10-09T01:13:06.006Z","repository":{"id":139905232,"uuid":"162834641","full_name":"dtinth/vxtron","owner":"dtinth","description":"An electron app that listens to my voice, converts to text, and copies it to the clipboard. Powered by Google Cloud Speech-To-Text API.","archived":false,"fork":false,"pushed_at":"2019-01-08T18:54:16.000Z","size":574,"stargazers_count":20,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-09T01:11:26.918Z","etag":null,"topics":["cloud-speech-api","electron","web-speech-api"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dtinth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-12-22T18:37:08.000Z","updated_at":"2025-08-05T20:32:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"7f71186e-0f35-404b-b9b6-a4e612d2ad40","html_url":"https://github.com/dtinth/vxtron","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dtinth/vxtron","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtinth%2Fvxtron","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtinth%2Fvxtron/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtinth%2Fvxtron/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtinth%2Fvxtron/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dtinth","download_url":"https://codeload.github.com/dtinth/vxtron/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtinth%2Fvxtron/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000725,"owners_count":26082894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-speech-api","electron","web-speech-api"],"created_at":"2024-11-23T13:27:20.960Z","updated_at":"2025-10-09T01:13:06.001Z","avatar_url":"https://github.com/dtinth.png","language":"TypeScript","readme":"# vxtron\n\nA little application listens to my voice and converts to text and copies it to\nthe clipboard. Built using Electron.\n\n## Why\n\n- I suffer from repetitive strain injury (osteoarthritis in the fingers, I\n  guess), so, it helps a lot if I can type using my voice.\n\n- I am not a native English speaker — macOS’s dictation fails to accurately\n  recognize my voice accent.\n\n- macOS’ Dictation does not have a public API that apps can use (not hackable).\n\n- Google Cloud Text to speech enhanced voice models are much more accurate than\n  macOS’ Dictation and the free webkitSpeechRecognition API. 
It also comes with\n  automatic punctuation insertion, which means it can add full stops, commas,\n  and question marks on its own.\n\n## Supported speech recognizers\n\n`vx` supports pluggable speech recognizers. You can choose to use one of the\nfollowing:\n\n- **Google Chrome’s webkitSpeechRecognition** (free, default)\n\n  - Can be used free of charge\n  - No time limit\n  - Requires opening a browser tab in the background\n  - Recognition quality is not so great\n  - No automatic punctuation insertion\n\n- **Google Speech-To-Text API** (paid)\n\n  - Much better recognition accuracy for the English language\n  - Automatically adds punctuation marks\n  - Does not require opening a browser tab in the background\n  - Costs money\n  - Each session is limited to 60 seconds\n\n## Development setup\n\n1. Clone this repository.\n\n2. Install the dependencies for the Electron app:\n\n   ```\n   yarn\n   ```\n\n3. Install the dependencies for the React app, located in the `vxgui` folder:\n\n   ```\n   (cd vxgui \u0026\u0026 yarn)\n   ```\n\n4. Build the React app:\n\n   ```\n   (cd vxgui \u0026\u0026 yarn build)\n   ```\n\n5. Build an `.app` bundle:\n\n   ```\n   yarn build\n   ```\n\n## Configuration with Google Chrome\n\nGoogle Chrome provides the webkitSpeechRecognition API, which is available for\nfree. However, it can only be used inside Google Chrome (which means it is\n[not available in other Chromium-based environments, including Electron](https://stackoverflow.com/questions/36214413/webkitspeechrecognition-returning-network-error-in-electron)).\n`vx` uses a hacky workaround: it launches Google Chrome with a webpage that\nexposes the webkitSpeechRecognition API to the Electron app via socket.io.\n\n1. This is the default behavior; you don't need to configure anything to use\n   this mode.\n\n2. 
You can configure more options by creating `~/.vxrc.yml` in your home\n   directory with the following configuration:\n\n   ```yml\n   speechProvider: chrome\n   speechProviderOptions:\n     port: 5555\n     openBrowser: false # default: true\n     app: Google Chrome # see: https://www.npmjs.com/package/opn#app\n   ```\n\n## Configuration with Google Cloud Speech-To-Text\n\n1. Create a Google Cloud Platform project and enable billing on it.\n\n2. Go to the\n   [Google Cloud API library](https://console.cloud.google.com/apis/library) and\n   enable the Google Cloud Speech API.\n\n3. To get access to enhanced voice models,\n   [turn on data logging](https://console.cloud.google.com/apis/api/speech.googleapis.com/data_logging).\n\n4. [Set up authentication with a service account](https://cloud.google.com/docs/authentication/getting-started).\n   Download a service account file and save it to your computer.\n\n5. Create `~/.vxrc.yml` with the following configuration:\n\n   ```yml\n   speechProvider: google-cloud\n   speechProviderOptions:\n     serviceAccount: /path/to/service-account.json\n     recordProgram: /usr/bin/rec\n   ```\n\n   - `serviceAccount` is the full path to your service account file.\n   - `recordProgram` is the full path to [SoX’s](http://sox.sourceforge.net/)\n     `rec` executable.\n\n## Usage\n\n- Launch the Electron app at `dist/mac/vxtron.app`.\n\n- Press **Cmd+Shift+L** to dictate English text. Press it again to make it stop\n  listening.\n\n- Press **Cmd+Alt+Shift+L** to dictate Thai text. Press it again to make it stop\n  listening.\n\n- As soon as you finish speaking, the recognized text will be copied to the\n  clipboard automatically.\n\n- The app remembers the past texts (not persistent), and you can use\n  **Cmd+Alt+Up** and **Cmd+Alt+Down** to cycle through them. 
As you cycle\n  through the history, the recalled text will also be copied to the clipboard\n  automatically.\n\n## Development\n\n### Developing the GUI\n\nA purely browser-based development environment is available. It doesn't use\nElectron APIs or Google Cloud Speech-To-Text. Instead, it uses the\n`webkitSpeechRecognition` API to recognize your voice.\n\nThis means it doesn't cost anything during development, but recognition accuracy\nwill suffer, and automatic punctuation insertion will not be available.\n\n1. Run `yarn start` in the `vxgui` directory:\n\n   ```\n   (cd vxgui \u0026\u0026 yarn start)\n   ```\n\n2. This will launch the create-react-app development server. A browser should\n   open to `localhost:3000` automatically. Make sure you are using\n   **Google Chrome** (otherwise the speech recognition API will not be\n   available).\n\n3. The key bindings are the same, except that you use the **Ctrl** key instead\n   of **Cmd**. For example, press **Ctrl+Alt+L** to start listening.\n\n   The accelerator key is changed to prevent conflicts between the development\n   version and the Electron version, which may be running at the same time.\n\n4. The copy functionality will not work because a web app may not write to the\n   clipboard without user interaction. However, this can be circumvented by\n   exposing Chrome DevTools’ `copy` function to the webapp. 
You can do that\n   by running the following command in the JavaScript console:\n\n   ```js\n   // `copy` is a DevTools console-only helper; attaching it to `window`\n   // makes it callable from the web app itself\n   Object.assign(window, { copy })\n   ```\n\n### Building the GUI as a static HTML file for Electron\n\nOnce you finish developing, run `yarn build` in the `vxgui` directory.\n\n```\n(cd vxgui \u0026\u0026 yarn build)\n```\n\nThis will build the files into the `vxgui/build` directory.\n\n### Testing the development app in Electron\n\nSometimes, you really need to test some Electron-specific APIs, and having to\nrebuild a bundle every time you want to test them is not ideal.\n\nInstead, with the development server running, you can run the Electron app with\nthe environment variable `VX_DEV=1` to make it load the GUI from\n`localhost:3000` instead of the built files.\n\n```\nVX_DEV=1 yarn start\n```\n\n## Architecture\n\nThere are two main components in this project:\n\n1. The web application, built using React and TypeScript.\n\n   - It contains the core application logic, such as how the transcript from the\n     speech recognition service is handled.\n   - It is designed to run both in a browser environment (for development) and\n     in an Electron environment (for actual use).\n\n   | Environment            | Browser                     | Electron                    |\n   | ---------------------- | --------------------------- | --------------------------- |\n   | Use case               | For development             | For real-world usage        |\n   | Display                | As a web app                | As an overlay HUD           |\n   | Activation             | Only inside web app         | Available system-wide       |\n   | Speech recognition API | webkitSpeechRecognition API | Google Cloud Speech-To-Text |\n   | Recognition quality    | Not so accurate for me      | Very accurate               |\n   | Automatic punctuation  | Not supported               | Supported                   |\n   | Cost of usage          | Free                        
| \\$0.048/min                 |\n\n2. The Electron application\n\n   - Provides the overlay GUI.\n   - Provides access to global hotkeys.\n   - Provides access to Google Cloud Speech APIs.\n\n## Cost\n\nI have to use the premium \"video\" voice model, which is the only model able to\nrecognize my voice with acceptable accuracy. The model is also much better at\nrecognizing speech with a lot of technical terms than the default model.\n\nIt costs USD 0.048 per minute to use. The first 60 minutes per month are free.\n\nWhen the speech API is being used, vx keeps a log of its usage in\n`~/.vx-google-cloud-speech.log`. It is a TSV file with 3 columns:\n\n1. Timestamp.\n2. Usage in seconds, rounded up.\n3. The pricing plan (1: normal speech recognition at $0.024/min, 2: enhanced\n   video speech recognition at $0.048/min).\n\nThere is also a simple Ruby script that displays a summary of how much is spent\non this API per day. You can run it with `ruby cost.rb`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtinth%2Fvxtron","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdtinth%2Fvxtron","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtinth%2Fvxtron/lists"}