{"id":26503092,"url":"https://github.com/quantiusbenignus/spoken","last_synced_at":"2025-03-20T18:58:06.627Z","repository":{"id":219372013,"uuid":"595215327","full_name":"QuantiusBenignus/Spoken","owner":"QuantiusBenignus","description":"Joplin text notes and to-dos via OFFLINE speech recognition. To-do reminders set directly from transcribed audio. Input from microphone or audio files. Output to Joplin or clipboard.","archived":false,"fork":false,"pushed_at":"2023-03-29T15:00:41.000Z","size":905,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-01-27T01:58:18.142Z","etag":null,"topics":["bash","command-line","date-time","joplin","joplin-api","knowledge-management","linux","nlp-parsing","note-taking","notes","notes-app","openai","reminders","speech-recognition","speech-to-text","todo","todo-app","todos","whisper","zsh"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QuantiusBenignus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-01-30T16:26:14.000Z","updated_at":"2024-01-27T01:58:20.579Z","dependencies_parsed_at":"2024-01-27T02:15:07.786Z","dependency_job_id":null,"html_url":"https://github.com/QuantiusBenignus/Spoken","commit_stats":null,"previous_names":["quantiusbenignus/spoken"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantiusBenignus%2FSpoken","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantiusBenignus%2FSpoke
n/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantiusBenignus%2FSpoken/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantiusBenignus%2FSpoken/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QuantiusBenignus","download_url":"https://codeload.github.com/QuantiusBenignus/Spoken/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244676317,"owners_count":20491827,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","command-line","date-time","joplin","joplin-api","knowledge-management","linux","nlp-parsing","note-taking","notes","notes-app","openai","reminders","speech-recognition","speech-to-text","todo","todo-app","todos","whisper","zsh"],"created_at":"2025-03-20T18:58:06.003Z","updated_at":"2025-03-20T18:58:06.618Z","avatar_url":"https://github.com/QuantiusBenignus.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# *Joplin Text Notes and To-Dos via Speech*\n##### *Voice memos recorded from the microphone, transcribed offline to text and converted to Joplin notes or To-Do tasks with automatic notifications. Can also transcribe batches of existing voice memos.*\n---\n_(This repository expands on the older [NoteWhispers](https://github.com/QuantiusBenignus/NoteWhispers) by bringing new tools and functionality, such as recording Joplin to-do tasks with automatic alarms and the ability to transcribe multiple voice-memo/to-do files. 
\nThe note transcription utility **vm** is mature and will not see many changes.\nThe to-do utility **td** has the added complexity of parsing fuzzy datetime references and its code is under ongoing development. \nWhile very much a work in progress, the zsh version of **td** is ahead of the bash version in terms of pizzazz.)_\n\n#### DESCRIPTION:  \nThese **two Linux command-line utilities** (with optional GNOME integration) are named **vm** and **td**, respectively, for brevity and quick access from the command line (check your PATH for conflicts and rename them if needed).\n\n**vm** and **td** utilize previously unavailable, high-quality **offline** automatic speech recognition (ASR) technology (a derivative of [Open AI's](https://openai.com/) open-sourced [Whisper ASR models](https://github.com/openai/whisper)) to convert user speech, such as voice memos captured from the microphone (or pre-recorded audio files), into textual notes that are automatically saved in the awesome, **open-source, note-taking application** [Joplin](https://joplinapp.org/). 
At its core, each utility records a voice memo from the default audio input channel (microphone) or uses one or more audio files as the input, transcribes it into text using [whisper.cpp](https://github.com/ggerganov/whisper.cpp) (a C/C++ port of Open AI's Whisper) and either: \n   - **sends it (properly formatted) to the clipboard**, or\n   - **creates a new note in a running instance of the Joplin note-taking app** using Joplin's data API, or \n   - **creates a new to-do task in the running Joplin instance** (in the case of **td**), parsing the transcribed text for a valid datetime to set a notification/alarm (see details below)\n   - if Joplin is not running, **stores the note or to-do task in a file for later collection**.\n   - On the next invocation of either utility, if Joplin is up, **the temporarily stored notes / to-dos are collected** \n\n![vmJoplin-todo.png](resources/vmJoplin-todo.png)\n![whisper-todo.png](resources/whisper-todo.png)\n  \n\nAs CLI scripts relying on built-in Linux tools under the hood (plus a few optional but common utilities such as `sox` and `curl`), **vm**'s and **td**'s feature set is exposed by a few command-line arguments:\n#### SYNOPSIS:\n`vm [-b|-c|-bc|-cb|-h|--help] ... [filenames]`\n\n`td [-b|-c|-bc|-cb|-h|--help|-a datetimespec] ... [filename(s)]`\n\n- `vm` the default: use the 'tiny' whisper.cpp ASR model file and create a note in Joplin \n- `td` the default: create a to-do note in Joplin \n- `vm -b|--base` transcribes to a Joplin note using the larger (more accurate but slower) 'base' model\n- `td -b|--base` transcribes a to-do task using the 'base' model\n- `vm -c|--clip` will transcribe and output to the clipboard\n- `td -c|--clip` will also output to the clipboard, but with the text formatted as an inline to-do task\n- `td -a|--alarm datetimespec` will create a to-do task with an alarm set to trigger on 'datetimespec'\n- `vm -bc` or  `td -bc , td -cb , td -[cb]a datetimespec` etc. 
- valid compound options are acceptable\n- `vm -h|--help`  or `td -h|--help` will print help info\n- any and all non-option arguments are treated as input audio files to be converted\n      * *(tested on Ubuntu 22.04 LTS under GNOME version 42.5, with the English language models)*\n---\n\n![td-mermaid-diagram.png](resources/td-mermaid-diagram.png)\n\nPlease note that these two command-line utilities for **Linux** are written for *zsh*, but a quick (and possibly dirty) translation to bash (see folder For_bash_users) is provided for users of *bash*, who should use those instead of the zsh originals.\n\n\n#### PREPARING THE ENVIRONMENT\n\n##### PREREQUISITES:\n* Joplin Linux desktop installation with the Web Clipper plugin enabled (see https://joplinapp.org/) \n* Whisper.cpp installation (see https://github.com/ggerganov/whisper.cpp)\n* Recent versions of the 'sox', 'curl', 'jq', 'xsel' command-line utilities from your system's repositories.\n* A working microphone (in GNOME one can set a keyboard shortcut to turn it on and off)\n\u003e *DISCLAIMER: Setting up the environment for this to work requires a bit of attention and, quite likely for the novice user, reading about the Linux internals and making informed choices. Some of the proposed actions, if implemented, will alter how your system works internally (e.g. systemwide temporary file storage and memory management). The author neither takes credit nor assumes any responsibility for any outcome that may or may not result from interacting with the contents of this document.*\n##### CONFIGURATION\nInside each script, near the beginning, there is a clearly marked section, named **\"USER CONFIGURATION BLOCK\"**, where all the user-configurable variables (described in the following section) have been collected. Some can be left as is, but others must be set to user-specific values determined by your particular instance of Joplin.\n##### Temporary directory and files\n*(NB. 
Everything in this section is based on the author's choice and opinion and may not fit the taste or the particular situation of everyone; please adjust the script as you like.)*\n\nAudio-to-text transcription is a memory- and CPU-intensive task and fast storage for read and write access can only help. That is why **vm** and **td** are designed to store temporary and resource files in memory, for speed and to reduce SSD/HDD \"grinding\": `TEMPD='/dev/shm'`. \nThis mount point of type \"tmpfs\" is created in RAM (let's assume that you have enough, say, at least 8GB) and is made available by the kernel for user-space applications. When the computer is shut down it is automatically wiped out, which is fine since we do not need the intermediate files.\nIn fact, for Joplin and any other applications (Electron-based or not) that are distributed in AppImage format, it would be beneficial (IMHO) to have the systemwide /tmp mount point also kept in RAM. Every time you start Joplin, it expands itself in /tmp, writing about 500 MB to your SSD or HDD, and moving /tmp to RAM may speed up application startup a bit. A welcome speedup for any Electron app.  In its simplest form, this transition is easy, just run:\n```\necho \"tmpfs /tmp tmpfs rw,nosuid,nodev\" | sudo tee -a /etc/fstab\n```\nand then restart your Linux computer.\nFor the aforementioned reasons, the scripts also expect to find the ASR model files needed by whisper.cpp in the same location (/dev/shm). These are large files that can be transferred to this location at the start of a terminal session (or at system startup). This can be done using your .zshrc (or .bashrc) file by placing something like this in it: \n```\n([ -f /dev/shm/ggml-tiny.en.bin ] || cp /path/to/your/local/whisper.cpp/models/ggml* /dev/shm/)\n```\n\n#### \"INSTALLATION\"\n*(Assuming whisper.cpp is available and the \"main\" executable compiled with 'make' in the cloned whisper.cpp repo. 
See Prerequisites section)*\n* Place the scripts **vm** and **td**  (or, for bash users, the scripts found in the \"For_bash_users\" folder) somewhere in your PATH. \n* Create a symbolic link (the code expects 'transcribe' in your PATH) to the compiled \"main\" executable in the whisper.cpp directory. For example, create it in your $HOME/bin with \n```ln -s /full/path/to/whisper.cpp/main $HOME/bin/transcribe```.\n* Edit your personal NOTEBOOK_ID and AUTH_TOKEN variables in the code using the values from your Joplin app (see next section).  \n\nIf you are using the GNOME integration (recommended), don't forget to:\n* Place `SpokenNotes.desktop` in `$HOME/.local/share/applications/`\n* Replace USERNAME and YOURPROFILENAME in the file with your values.\n* Move the icon referenced in the .desktop file to the specified directory in $HOME/.local/...\n* Find \"Whispers\" in your Activities and click \"Add to Favorites\" to pin it to the dock\n* Create a new profile in gnome-terminal and edit it to suit your taste. Use its name in the .desktop file\n\n##### Other environment variables\n\nIf Joplin is not running while the voice memo is being captured and transcribed, the scripts store transcribed notes or to-dos in the Joplin configuration directory for later processing (you can change this location as needed). In the code this is: `JOPLIND=$HOME'/.config/joplin-desktop/resources' `\n\nThe next two variables are for the Joplin data API. (***Please replace with your own values from the Joplin desktop app for Linux***)\n\nThe first variable is the ID of the Joplin notebook where the new note will be created.\n`NOTEBOOK_ID=\"PLACE_HERE_NOTEBOOK_ID_FROM_JOPLIN_RIGHT_CLICK_ON_NOTEBOOK_NAME\"`\nOf course, one can have this point to two different notebooks in **vm** and **td** to file ordinary notes separately from to-do tasks. \n\nThe second variable is the authentication token generated by the Web Clipper plugin in Joplin. 
(Make sure Web Clipper is enabled; the token is needed to successfully interact with the REST API.)\n`AUTH_TOKEN=\"PLACE_HERE_TOKEN_FROM_JOPLIN_TOOLS_OPTIONS_WEB_CLIPPER_ADVANCED_OPTIONS\"`\n\n---\n#### DETAILS\nSox records in WAV format at a 16 kHz sample rate, the only format currently accepted by whisper.cpp:\n`rec -t wav $ramf rate 16k silence 1 0.1 3% 1 3.0 4% `\nWith these parameters, recording starts once the input rises above a 3% threshold for 0.1 s and stops after 3.0 s of silence below a 4% threshold, but you can always press CTRL-C to stop it manually. This is the only intervention that may be needed. \nAfter the memo is captured, it will be passed to `transcribe` (whisper.cpp) for speech recognition.\nThis will take a couple of seconds (less on a computer with a fast CPU). One can adjust the number of processing threads used by adding  `-t n` to the command-line parameters of transcribe (please see the whisper.cpp documentation). After transcription, the text is stored in a .txt file (-otxt argument in `transcribe -m $model -f $ramf -otxt`), in this case /dev/shm/vmfile.txt . \n\nThe script will then convert the data to the appropriate format (JSON for note creation via the data API) and send it to the desired output. If note creation was requested, a check will be made whether the REST API is exposed by the Web Clipper server (i.e. Joplin is running). If not, the JSON data will be stored in a `{timestamp}.json` file to be picked up on a later invocation of the script, when Joplin is running.\n\nIf a to-do task is being created, there is an additional intermediate step (quite a few steps actually) to be taken before contacting the API:\n\n#### PARSING THE TRANSCRIBED TEXT FOR A TIME REFERENCE (in **td** - to set up a to-do alarm)\n\n\u003e*(N.B. 
Only spoken English time constructs, operation in the user's current locale and time zone.)*\n\nIf an explicit datetime is not supplied, the transcribed text is parsed for a valid notification/alarm datetime.\nIt is quite difficult for computers to parse our spoken time references, and using only built-in tools (e.g. coreutils `date -d`) presents a huge challenge when parsing arbitrary datetime text.\nThere are dedicated, complex NLP tools that work better but they are not perfect either.\n\nThat is why, to make things a bit easier, a keyword is used to separate the note body from the date-time reference to be parsed. This keyword can be used freely in the note body; only its last instance within the text is treated as the separator. If the keyword is **\"notification\"** (this is user-configurable), then the last \"notification\" in the transcribed text is used to isolate the time reference.\nFor example:\n* *\"Need to see my dentist next week. Set **notification** for Tuesday\"*       - this is valid.\n* *\"Scheduled a company meeting with **notification** for 2023/5/24 at 8pm\"*   - also OK.\n* *\"Guests need prior notification. Set one **notification** for March the 3rd in the evening.\"*  - OK\n* *\"...**notification** for next week\"*\n* *\"...**notification** in 3 hours\"*\n* *\"...**notification** tomorrow morning\"*  (see source code for \"morning\" \u0026 other adjustable definitions)\n* *\"...**notification** in 33 hours and 5 minutes\"*\n* *\"...**notification** on the ninth month +1000 seconds\"*\n... are all valid.\n* or even *\"...**notification** at the usual time\"* for some extra customization (see code for ideas).\n\nSpeaking literally \"YYYY/MM/DD\", followed by time (if needed), e.g. *\"2024 slash 5 slash 23 at 1pm\"*, works well too.\nAs a minimum, the month should precede the day and time, e.g. \"March 12\" not \"12 of March\".\nIf parsing is unsuccessful, the utility will not set an alarm in Joplin and it has to be done manually. 
\nA warning will be issued but the to-do task will be created successfully. \nThe failure can be due to errors in the user instructions, errors in the speech recognition, limitations of the simplistic datetime preprocessor etc. With practice (and good diction:-) the error rate can be comparable to the error rate for speech recognition.\nIn some edge cases, successful parsing gives an incorrect datetime; some practice is needed to avoid those.\nFor scheduling critically important stuff with this utility, use the command-line option \"-a\"\nand provide an explicit *datetime specification* or instead, simply set the to-do alarm time in Joplin.\n\n#### GNOME desktop integration\nTo make interaction with this CLI utility more convenient, one can create a GNOME desktop entry (if using GNOME) with a custom profile for the terminal window (small window, custom color, transparent, etc., see `gnome-terminal` documentation on creating named profiles) so that the window will be visible on top of a maximized Joplin window. 
\nOne can also choose whether to keep the terminal window open, or close it after the transcription (see the gnome-terminal settings for your custom profile - YOURPROFILENAME in the code below.)\nSample `SpokenNotes.desktop` (Replace USERNAME and YOURPROFILENAME and place in your `$HOME/.local/share/applications/`):\n```\n[Desktop Entry]\nName=Spoken_Notes_To-Dos\nComment=For use with the vm and td CLI utilities\nExec=gnome-terminal --window-with-profile=YOURPROFILENAME --hide-menubar --geometry=64x6+380+920 --title=Speech-to-Joplin\nIcon=/home/USERNAME/.local/share/icons/hicolor/128x128/apps/mic128.png\nTerminal=true\nType=Application\nCategories=Application\nActions=new-note;new-clip;new-todo;inline-todo;\n\n[Desktop Action new-note]\nName=New Joplin Note\nExec=gnome-terminal --window-with-profile=YOURPROFILENAME --hide-menubar --geometry=64x6+380+920 --title=NewNote -- vm\n\n[Desktop Action new-clip]\nName=Record To Clipboard\nExec=gnome-terminal --window-with-profile=YOURPROFILENAME --hide-menubar --geometry=64x6+380+920 --title=NewClip -- vm -c\n\n[Desktop Action new-todo]\nName=New Joplin ToDo\nExec=gnome-terminal --window-with-profile=YOURPROFILENAME --hide-menubar --geometry=64x6+380+920 --title=NewTodo -- td\n\n[Desktop Action inline-todo]\nName=ToDo to Clipboard\nExec=gnome-terminal --window-with-profile=YOURPROFILENAME --hide-menubar --geometry=64x6+380+920 --title=InlineTodo -- td -c\n\n```\nWith the above `gnome-terminal` desktop entry (please adjust profile and username), the utility will be accessible from the system dock, after you add it to your \"Favorites\" (right mouse click brings up the shown context menu):\n\n\u003e![whispers-menu3.png](resources/whispers-menu3.png)\n\n\nThe .desktop entry is set so that just clicking on the dock icon with the left mouse button will open the terminal and wait for a command (such as `td -ba \"2023 March 23 23:00\"`), while invoking one of the context menu commands will immediately start recording and will close the window when finished 
transcribing the note / to-do. The \"tiny\" ASR model file is used by default in the desktop menu actions but that can be changed as desired in the .desktop file by adding  the `-b` option to the respective command.\n\nIf using X11 (instead of the restrictive Wayland), one can use the `--geometry` command line argument to position the small terminal window  in front of a dead space in the Joplin window and set it to stay on top (screenshots):\n\n\n\n![vmJoplin.png](resources/vmJoplin.png)\n\n\nEven with the default \"tiny\" model, the accuracy  (English language tested) is impressive and on a faster computer (not mine) it takes less than a second to transcribe a 30s audio clip with essentially no errors. As such, this command-line utility, combined with the power of [Whisper](https://github.com/openai/whisper) from Open AI (its [whisper.cpp](https://github.com/ggerganov/whisper.cpp) port, to be more precise), proves quite useful and practical, especially in the context of a note-taking app such as the versatile, customizable [Joplin](https://joplinapp.org/). 
Enjoy!\n\n### Credits\n* Open AI (for [Whisper](https://github.com/openai/whisper))\n* Georgi Gerganov and community (for Whisper's C/C++ port [whisper.cpp](https://github.com/ggerganov/whisper.cpp))\n* Laurent Cozic and community (for the [Joplin](https://github.com/laurent22/joplin) note-taking app)\n* The **curl** developer community (for the versatile and powerful **[curl](https://github.com/curl/curl)**)\n* The **sox** developers (for the venerable \"Swiss Army knife of sound processing tools\")\n* Stephen Dolan and community (for **jq**, *\"the sed for JSON\"*)\n* The creators and maintainers of old and new utilities such as **xsel, xclip**, the heavyweight **ffmpeg** and others that make the Linux environment (CLI and GUI) such a powerful paradigm.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantiusbenignus%2Fspoken","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantiusbenignus%2Fspoken","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantiusbenignus%2Fspoken/lists"}