{"id":19212340,"url":"https://github.com/leafo/talkxtyper","last_synced_at":"2025-05-12T20:14:57.605Z","repository":{"id":244449312,"uuid":"815276432","full_name":"leafo/talkxtyper","owner":"leafo","description":"Type with your voice","archived":false,"fork":false,"pushed_at":"2024-09-08T17:59:17.000Z","size":191,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-07T20:09:39.150Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leafo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-14T18:36:48.000Z","updated_at":"2024-09-08T17:59:20.000Z","dependencies_parsed_at":"2024-06-14T19:52:18.290Z","dependency_job_id":"0583d198-55cf-4fb3-955a-859bec4cbaee","html_url":"https://github.com/leafo/talkxtyper","commit_stats":null,"previous_names":["leafo/talkxtyper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leafo%2Ftalkxtyper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leafo%2Ftalkxtyper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leafo%2Ftalkxtyper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leafo%2Ftalkxtyper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leafo","download_url":"https://codeload.github.com/leafo/talkxtyper/tar.gz/refs/heads/main","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253814991,"owners_count":21968561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T13:46:34.713Z","updated_at":"2025-05-12T20:14:57.581Z","avatar_url":"https://github.com/leafo.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TalkXTyper\n\nTalkXTyper is a desktop application that will, on command, record your voice,\ntranscribe it using the OpenAI Whisper API, and \"type\" it to your computer. It\nis activated with a global hotkey so that you do not lose focus on the area\nyou're typing into.\n\n## Rationale\n\nThere are a few transcription tools out there, but I wanted to create my own so\nI could explore different ideas based around my own workflow.\n\nAlthough Whisper is very good, it lacks context for what is going on on the\nscreen. For example, if you are coding and want to reference a variable on the\nscreen named `my_variable`, saying \"my variable\" will often produce \"My\nvariable\" instead of the symbol on the screen.\n\n### Attempts\n\n1. **Send screenshot of desktop to gpt-4o**\n   - [x] Idea: take and send screenshot of the desktop while audio is being recorded,\n   send image to gpt-4o to ask it to extract relevant textual features from the\n   image. 
Combine the extracted information with the Whisper output to attempt\n   to fix the transcription to match text on the screen.\n     - Result: gpt-4o with vision is too slow and makes the typing experience sluggish\n   - [ ] Use Claude Sonnet 3.5; it appears to be much faster with image processing\n\n2. **Using the `prompt` parameter with Whisper API**\n   The Whisper API includes a `prompt` parameter that can be used for basic\n   instruction during transcription. The results were poor and the max size is\n   short. I haven't found a use for it.\n\n3. **Extract text from running app**\n   Idea: Query what the currently focused app is, then have custom code to\n   extract the text from the screen.\n   - [x] Implement text extraction from nvim using the `nvim` remote API\n   - [ ] Explore extracting text from browser. (Consider a browser extension)\n\n## Configuration\n\nThe configuration for TalkXTyper is stored in a JSON file located in your user\nconfiguration directory. The file is named `talkxtyper-config.json`.\n\n### Configuration Options\n\n- `OpenAIKey`: Your API key for the OpenAI Whisper API.\n- `IncludeScreen`: A boolean value indicating whether to analyze the screen to augment the transcription. The config file will be updated automatically if you change this value in the program.\n- `IncludeNvim`: A boolean value indicating whether to extract text from nvim to augment the transcription.\n\n## Web interface\n\n`ListenAddress` can be specified in the config file to enable the web\ninterface. The web interface includes some experimental functionality. The web\ninterface is not enabled by default.\n\nE.g., setting `ListenAddress` to `\"localhost:9898\"` will make the web interface\naccessible at `http://localhost:9898`.\n\nSECURITY NOTE: The web interface adds an HTTP API for controlling recording and\ntranscribing, in addition to taking screenshots of the desktop. 
Don't leave it\nrunning if you don't need it.\n\nThe web interface exposes a way to review transcription history via `/history`\nand listen to the audio files that were recorded. You can use this to debug if\nrecording is working as expected.\n\n## Installation\n\nTo install TalkXTyper, you will need to have Go installed. Run the following command:\n\n    go install github.com/leafo/talkxtyper@latest\n\nThis project has only been tested on Linux, but it uses cross-platform libraries, so it should work on other platforms.\n\n## License\n\nThis project is licensed under the MIT License. See the LICENSE file for details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleafo%2Ftalkxtyper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleafo%2Ftalkxtyper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleafo%2Ftalkxtyper/lists"}