https://github.com/exectx/dictation
push to talk dictation cli tool for macos
https://github.com/exectx/dictation
Last synced: 9 months ago
JSON representation
push to talk dictation cli tool for macos
- Host: GitHub
- URL: https://github.com/exectx/dictation
- Owner: exectx
- Created: 2025-08-29T04:07:51.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-29T04:11:10.000Z (10 months ago)
- Last Synced: 2025-08-29T08:26:04.751Z (10 months ago)
- Language: Python
- Size: 187 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Dictation POC
Hold-to-record speech-to-text prototype that transcribes audio locally using a Parakeet MLX model and rapidly injects the resulting Unicode text via macOS CoreGraphics events.
## Features
- Press and hold Right Shift (configurable) to record; release to transcribe & type.
- Non-blocking keyboard listener + background transcription.
- Single concurrent transcription (drops overlapping recordings gracefully).
- Rich (optional) color logging with debug timing, frame counts, and preprocessing/model breakdown.
- Plain logger mode for piping to files / CI.
## Installation
This project uses a modern Python (>=3.11). Install dependencies (uv / pip):
```
uv sync
# or
pip install -e .
```
## Usage
```
python main.py # normal run (info level, colored logs)
python main.py --debug # verbose diagnostics & timings
python main.py --timestamps # add HH:MM:SS to log records
python main.py --no-color # disable Rich; plain text logs
python main.py --log-file run.log # also append plain formatted logs to run.log
```
While running, hold Right Shift to capture audio. Release to trigger transcription. The recognized text is injected immediately at the current cursor location.
## Logging Notes
- Rich output omits path for cleanliness; enable timestamps for performance tuning.
- Debug mode reports preprocessing & model inference timings plus sample normalization stats.
- Use `--no-color` when redirecting output, or `--log-file` to simultaneously keep colored console output and a plain file.
## Safety / Edge Cases
- Very short taps (<40ms default) are ignored to avoid stray noise.
- Only one transcription runs at a time; overlapping captures while one is processing are dropped with a warning.
- On shutdown (Ctrl+C) the program waits (up to 5s) for background workers.
## Customization
Change activation key inside `main.py` (`ACTIVATION_KEY`). Minimum duration and data type are constructor parameters of `Recorder`.