{"id":25314142,"url":"https://github.com/perrette/scribe","last_synced_at":"2026-02-17T14:32:23.185Z","repository":{"id":277208909,"uuid":"931691200","full_name":"perrette/scribe","owner":"perrette","description":"scribe is a local speech recognition tool that provides real-time transcription using vosk and whisper AI, with the goal of serving as a virtual keyboard on a computer","archived":false,"fork":false,"pushed_at":"2025-03-10T10:50:43.000Z","size":212,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-06T23:02:44.215Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/perrette.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["perrette"]}},"created_at":"2025-02-12T17:41:06.000Z","updated_at":"2025-03-10T10:50:47.000Z","dependencies_parsed_at":"2025-10-06T23:00:38.131Z","dependency_job_id":"efa062aa-643d-4af2-90c2-3419b845b16c","html_url":"https://github.com/perrette/scribe","commit_stats":null,"previous_names":["perrette/voskrealtime","perrette/scribe"],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/perrette/scribe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fscribe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fscribe/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fscribe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fscribe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/perrette","download_url":"https://codeload.github.com/perrette/scribe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/perrette%2Fscribe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29547404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T13:00:00.370Z","status":"ssl_error","status_checked_at":"2026-02-17T12:57:14.072Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-13T16:54:52.860Z","updated_at":"2026-02-17T14:32:23.158Z","avatar_url":"https://github.com/perrette.png","language":"Python","funding_links":["https://github.com/sponsors/perrette"],"categories":[],"sub_categories":[],"readme":"[![pypi](https://img.shields.io/pypi/v/scribe-cli)](https://pypi.org/project/scribe-cli)\n![](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fperrette%2Fscribe%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)\n\n# Scribe  \u003cimg 
src=\"https://github.com/perrette/bard/raw/main/bard_data/share/icon.png\" width=48px\u003e\n\n`scribe` is a speech recognition tool that provides real-time transcription using cutting-edge AI models, with the goal of serving as a virtual keyboard on a computer.\n\nIt features local, downloadable models with the `vosk` and `whisper` backends, as well as a client for the OpenAI API via the `openaiapi` backend (API key required).\n\n## Compatibility\n\nThe package was initially developed for python 3.12 on Ubuntu 24.04 with GNOME + Wayland, but it should work on other platforms as well (feedback welcome).\nCheck the documentation of the dependencies for more information (e.g. pynput for the keyboard, pystray for the app).\n\n- python 3.13:\n    - at the time of writing, `openai-whisper` does not install.\n\n- Ubuntu:\n    - see caveats in the use of the keyboard under Wayland [keyboard section](#use-the-keyboard-with-wayland).\n- MacOS:\n    - tested on a Macbook Air M1 with 8 GB RAM and python 3.12. It runs, but poorly, presumably because of the low memory: prefer the `openaiapi` backend for such machines\n    - I expect the local models to run fine on machines with more memory\n- Windows:\n    - not tested yet\n\n## Installation\n\nInstall the PortAudio library (required by `sounddevice`) and xclip (required by `pyperclip`). E.g. on Ubuntu:\n\n```bash\nsudo apt-get install portaudio19-dev xclip\n```\n\n(`portaudio19-dev` becomes `portaudio` with homebrew)\n\nSee additional requirements for the [icon tray](#system-tray-icon-experimental-) and [keyboard](#virtual-keyboard-experimental) options. 
The python dependencies should be dealt with automatically:\n\n```bash\npip install scribe-cli[all]\n```\n\n(note the `-cli` suffix for client)\n\nor for local development:\n\n```bash\ngit clone https://github.com/perrette/scribe.git\ncd scribe\npip install -e .[all]\n```\n\nYou can leave the optional dependencies (leave out `[all]`) but must install at least one of `vosk` or `openai-whisper` or `openai` packages (see Usage below).\n\nAt the time of writing `openai-whisper` does not install on `python 3.13`. You can install the packages manually and skip that package. This makes the `whisper` API unavailable.\n\n### Manual selection of the dependencies\n\n```bash\n# language models (at least one must be installed !)\npip install vosk\npip install openai soundfile  # openaiapi\npip install openai-whisper   # FAILS IN PYTHON 3.13 on Ubuntu\n\n# PortAUDIO (sounddevice)\npip install sounddevice # automatically installed as required dependency\nsudo apt-get install portaudio19-dev\n# MAC OS: brew install portaudio\n\n# clipboard\npip install pyperclip  # automatically installed as required dependency\nsudo apt-get install xclip\n\n# keyboard\npip install pynput\n\n# app mode\nsudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1  # Ubuntu ONLY (not needed on MacOS)\npip install PyGObject # Ubuntu ONLY (not needed on MacOS)\npip install pystray\n\n# And finally\npip install scribe-cli\n```\n\nThe language models for local backends `vosk` and `whisper` will download on-the-fly.\nThe default download folder is `$XDG_CACHE_HOME/{backend}` where `$XDG_CACHE_HOME` defaults to `$HOME/.cache`.\n\n## Usage\n\nJust type in the terminal:\n\n```bash\nscribe\n```\nand the script will guide you through the choice of backend (`whisper` or `vosk` or `openaiapi`) and the specific language model.\nAfter this, you will be prompted to start recording your microphone and print the transcribed text in real-time (`vosk`)\nor until after recording is complete 
(`whisper`).\nYou can interrupt the recording via Ctrl + C and start again or change model.\n\nThe default (`whisper`) is excellent at transcribing full-length audio sequences in [many languages](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages). It is really impressive,\nbut it cannot transcribe in real time, and depending on the model it can have a relatively long execution time, especially with the `turbo` model (at least on my laptop with CPU only). The `small` model is also excellent and runs much faster. It is selected as default in `scribe` for that reason.\nWith the `whisper` model (`whisper` and `openaiapi` backends) the recording continues for 2 minutes or until you stop it manually to trigger the transcription (Stop in the app, Ctrl + C in the terminal).\nThese parameters can be changed. It is also possible to stop the recording when a silence is detected. You would do: `--silence -40 --duration-duration 2` to stop the recording when a silence (less than -40 dB recorded) lasts for more than 2 seconds. This is experimental, and the default is an exceedingly low silence threshold of -200 dB and a silence duration of 120 s to effectively disable that feature and keep full manual control.\n\nThe `vosk` backend is much faster and very good at real-time transcription for one language, but it tended to make more mistakes in my tests and it does not do punctuation.\nIt becomes really powerful when used for longer or interactive typing sessions with the [keyboard](#virtual-keyboard-experimental) option, e.g. to make notes or chat with an AI.\nThere are many [vosk models](https://alphacephei.com/vosk/models) available, and here a few are associated with [a handful of languages](scribe/models.toml) `en`, `fr`, `it`, `de` (so far).\n\nThe `openaiapi` backend uses the `whisper-1` model at the time of writing. It requires an API key, best passed as an environment variable, e.g. 
in bash:\n```bash\nexport OPENAI_API_KEY=YOURAPIKEY\nscribe --backend openaiapi\n```\nThe `openaiapi` backend is lightweight and handy if you have an API key (you can create one for free for testing) and a low-spec computer (and don't care too much about privacy, obviously).\n\n## Output media\n\nBy default the transcription is printed on the terminal, but other output media are supported.\n\n### Clipboard\n\nThe most straightforward is the clipboard:\n\n```bash\nscribe --clipboard\n```\nThe content of the (full) transcription is then copied to the clipboard, and it is up to the user to paste it (e.g. Ctrl + V).\n\n### Output file\n\nAlternatively an output file can be specified:\n\n```bash\nscribe -o transcription.txt\n```\n\n### Virtual keyboard (experimental)\n\nWith the `--keyboard` option `scribe` will attempt to simulate a keyboard and send transcribed characters to the application under focus:\n\n```bash\nscribe --keyboard\n```\n\nThis can be extremely useful with the `vosk` backend and its real-time transcription, or alternatively with the `--restart` option with the `whisper` backend.\n\nThe `--keyboard` option relies on the optional `pynput` dependency (installed together with `scribe` if you used the `[all]` or `[keyboard]` option).\nDepending on your operating system, `pynput` may require additional configuration to work around its [limitations](https://pynput.readthedocs.io/en/latest/limitations.html).\n\n#### Use the keyboard with Wayland\n\nIn my Ubuntu 24.04 + Wayland system the keyboard simulation works out-of-the-box in Chromium-based applications (including vscode) but it does not work in Firefox, Sublime Text or anything else (not even in a terminal!). 
I am told this is because Chromium runs an X server emulator and so is compatible with the default pynput backend.\n\nOne workaround is to use the Xorg version of GNOME: in `/etc/gdm3/custom.conf` uncomment `# WaylandEnable=false` and restart your computer.\n\nAnother workaround while staying with Wayland is to use the low-level `uinput` backend of `pynput`, but that requires that `scribe` is run as root (sudo), and likely other configurations like activating the `uinput` system module (`sudo modprobe uinput` for a one-time test, or adding `uinput` to `/etc/modules-load.d/modules.conf` to make that persistent).\nMoreover, the keyboard must be set to an appropriate layout: for example, to type the letter `é` you'd want a French or Italian layout, otherwise the English layout will drop it or replace it with something else. Another caveat I encountered is that the special characters (`é`) were inserted at the wrong place. Adding a small delay with the additional parameter `--latency 0.01` was enough to fix that. Finally, if you run with sudo you may need to reset some environment variables so that the list of audio devices (`XDG_RUNTIME_DIR`) and the download folder remain the same. To sum up, that gives something like:\n```bash\nsudo modprobe uinput\nsudo HOME=$HOME XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR PYNPUT_BACKEND_KEYBOARD=uinput $(which scribe)  --latency 0.01\n```\nYou're on the right path :)\n\n## System tray icon (experimental) \u003cimg src=\"https://github.com/perrette/bard/raw/main/bard_data/share/icon.png\" width=48px\u003e\n\n\u003cimg src=https://github.com/user-attachments/assets/4c97f4b1-1a65-4d49-9f5a-a9f4287cfa5a width=300px\u003e\n\nTo avoid switching back and forth with the terminal, it's possible to interact with the program via a tray icon.\nTo activate it, start with:\n```bash\nscribe --app ...\n```\nor toggle the app option in the interactive menu. The scribe icon will show, with Record and other options. The icon will change based on what the app is doing. 
It is possible to choose from a set\nof predefined models (controlled by `--vosk-models` and `--whisper-models`) and options, or to Quit and choose from the terminal before pressing Enter again.\nFor the vosk model, there are only two states: recording + transcribing, or idle. For the whisper model there are three states visible from the icon: recording/waiting, transcribing and idle.\nThat option requires `pystray` to be installed. This is included with the `pip install ...[all]` option.\n\nThe `--vosk-models` and `--whisper-models` options predefine the set of models available in the app menu. E.g.\n```bash\nscribe --app --vosk-models vosk-model-fr-0.22 --whisper-models small turbo ...\n```\n\n### Ubuntu\n\nIn Ubuntu the following dependencies were required to make the menus appear:\n\n```bash\nsudo apt install libcairo-dev libgirepository1.0-dev gir1.2-appindicator3-0.1\npip install PyGObject\n```\n\n## Start as an application in GNOME\n\nIf you run Ubuntu (or another distribution) with GNOME, the script `scribe-install [...]` will create a `scribe.desktop` file and place it under `$HOME/.local/share/applications`\nto make it available from the quick launch menu. 
Any option will be passed on to `scribe`, with the additional options `--name` and `--no-terminal`.\n`--no-terminal` means no terminal will show up, and it also implies the options `--app --no-prompt`.\n\nConsider the following two flavors:\n```bash\nscribe-install --name \"Scribe Terminal\" --clipboard ...\nscribe-install --name \"Scribe\" --no-terminal --clipboard ...\n```\nThe first will create an app named Scribe Terminal that simply opens a terminal and executes the command `scribe --clipboard ...`.\nThe second will create an app named Scribe (the default) that executes in a hidden terminal: `scribe --no-prompt --app --clipboard ...`, thus leaving the tray icon as the only mode of interaction.\n\n\n## Fine tuning\n\nThere are a number of options to control the silence threshold, duration and more.\nThe best way to see them all is to check the built-in help:\n\n```bash\nscribe --help\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrette%2Fscribe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperrette%2Fscribe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperrette%2Fscribe/lists"}