https://github.com/appautomaton/tnt-asr
Terminal voice-to-text TUI: Qwen3-ASR-1.7B on the Apple GPU via MLX (mlx-speech). Fully local, no PyTorch, transcribes in ~1s. macOS Apple Silicon.
- Host: GitHub
- URL: https://github.com/appautomaton/tnt-asr
- Owner: appautomaton
- License: mit
- Created: 2026-02-14T15:56:37.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2026-06-22T17:41:48.000Z (13 days ago)
- Last Synced: 2026-06-22T19:24:02.890Z (13 days ago)
- Topics: apple-silicon, asr, dictation, macos, mlx, on-device-ai, python, qwen, speech-recognition, speech-to-text, terminal, tui, voice-to-text, whisper-alternative
- Language: Python
- Homepage: https://pypi.org/project/automaton-tnt/
- Size: 358 KB
- Stars: 12
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
# TNT 🧨
**[appautomaton.github.io/tnt-asr](https://appautomaton.github.io/tnt-asr/)**: the project landing page.
Terminal voice-to-text. Tap Space, speak, tap Space again: your words land in the transcript and on the clipboard.
Qwen3-ASR-1.7B runs in-process on the Apple GPU via [mlx-speech](https://github.com/appautomaton/mlx-speech) as an 8-bit (int8) quantized checkpoint that takes about 2.5 GB of memory. The model loads once, stays resident, and transcribes a short take in a fraction of a second. Fully local: no cloud, no runtime network calls. The microphone is captured natively through AVFoundation by a small Swift helper process, so a misbehaving audio stack can never trap the mic: TNT just kills the helper and macOS releases the device.
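
The helper-process isolation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TNT's actual code: the stand-in helper is a sleeping Python child process rather than the compiled Swift binary, and the names `spawn_helper` and `reclaim` are hypothetical.

```python
import subprocess
import sys

# Stand-in for the compiled Swift mic helper (assumption): a child process
# that would own the audio device until it exits. Here it only sleeps.
HELPER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


def spawn_helper(cmd=HELPER_CMD):
    """Start the capture helper as a separate OS process."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)


def reclaim(proc, grace=0.5):
    """Stop the helper: ask it to terminate, then force-kill if it lingers."""
    proc.terminate()
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    if proc.stdout is not None:
        proc.stdout.close()
    return proc.returncode
```

Because the device handle lives in the child process, ending that process is enough for the operating system to release it, whatever state the audio stack was in.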
> [!NOTE]
> Using Termux on Android? Use the preserved
> `legacy/android-termux-qwen0.6b` branch instead of `master`.
> It is a legacy proot setup and may need device-specific fixes; validate it
> locally and adapt it with your own tools or agentic AI workflow.
>
> ```bash
> git fetch origin
> git switch --track origin/legacy/android-termux-qwen0.6b
> ```
## Features
- **In-process GPU inference**: pure MLX, no PyTorch
- **8-bit quantized**: int8 weights (~2.5 GB), about half the memory of BF16 with a faster decode
- **Resident model**: loads once in the background at startup; every take is warm
- **Native mic capture**: AVFoundation via an isolated Swift helper process; the mic can always be reclaimed
- **English, Chinese, and mixed speech**: language auto-detected, or forced via env var
- **Live braille oscilloscope**: real audio levels while you record
- **Clipboard-first**: new transcriptions auto-copy; click any past entry to copy it again
- **Responsive TUI**: side-rail layout on wide terminals, stacked on narrow ones
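
To give a feel for how audio levels can be drawn with braille characters, here is a minimal sketch of the general technique. It is not TNT's widget code: the function name `braille_levels` is hypothetical, and it draws simple bars (two per character cell, four dots tall) rather than whatever waveform TNT actually renders.

```python
BRAILLE_BASE = 0x2800  # U+2800, the empty braille pattern

# Unicode braille dot bits for each column of a cell, ordered bottom to top.
LEFT = (0x40, 0x04, 0x02, 0x01)   # dots 7, 3, 2, 1
RIGHT = (0x80, 0x20, 0x10, 0x08)  # dots 8, 6, 5, 4


def _column(bits, height):
    """OR together the lowest `height` dots of one column."""
    mask = 0
    for bit in bits[:height]:
        mask |= bit
    return mask


def braille_levels(levels):
    """Render levels in [0.0, 1.0] as braille bars, two levels per cell."""
    heights = [round(max(0.0, min(1.0, v)) * 4) for v in levels]
    if len(heights) % 2:
        heights.append(0)  # pad so every cell has a right-hand column
    cells = []
    for left, right in zip(heights[0::2], heights[1::2]):
        code = BRAILLE_BASE | _column(LEFT, left) | _column(RIGHT, right)
        cells.append(chr(code))
    return "".join(cells)
```

Each terminal cell carries a 2x4 dot grid, so a strip of N characters can show 2N level samples at four steps of vertical resolution.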
## Setup
> [!IMPORTANT]
> Requires an Apple Silicon Mac (M1 or later), Python 3.13+,
> [uv](https://docs.astral.sh/uv/), and the Xcode command line tools
> (`xcode-select --install`). The mic capture helper is compiled from Swift
> on first launch and cached.
```bash
git clone https://github.com/appautomaton/tnt-asr.git
cd tnt-asr
uv sync
./bootstrap-mlx-asr.sh # downloads + links the int8 checkpoint (~2.5 GB, cached by Hugging Face)
uv run tnt
```
Or install from PyPI ([`automaton-tnt`](https://pypi.org/project/automaton-tnt/)):
```bash
uv tool install automaton-tnt
TNT_MLX_MODEL=/path/to/qwen3-asr-1.7b-int8-mlx tnt
```
(Instead of setting `TNT_MLX_MODEL`, you can symlink the checkpoint at
`~/.local/share/tnt/qwen3-asr-mlx`.)
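
The requirements listed at the top of this section can be checked before installing. The following is a minimal sketch, not something TNT ships: the function name `preflight` is hypothetical, and it leaves out the Xcode command line tools check, which would require running `xcode-select -p`.

```python
import platform
import shutil
import sys


def preflight(system=None, machine=None, version=None, which=shutil.which):
    """Return a list of unmet requirements; an empty list means all clear."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = sys.version_info[:2] if version is None else tuple(version)

    problems = []
    # A native Python on Apple Silicon reports Darwin / arm64. Under Rosetta
    # the machine shows up as x86_64, so this check would flag it as well.
    if system != "Darwin" or machine != "arm64":
        problems.append("an Apple Silicon Mac is required")
    if version < (3, 13):
        problems.append("Python 3.13 or newer is required")
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

The parameters exist only so the check can be exercised on any machine; called without arguments it inspects the current interpreter and `PATH`.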
### Model checkpoint
TNT expects a converted Qwen3-ASR-1.7B MLX checkpoint. A ready-to-use int8
build (~2.5 GB) is published at
[appautomaton/qwen3-asr-1.7b-int8-mlx](https://huggingface.co/appautomaton/qwen3-asr-1.7b-int8-mlx).
The bootstrap script takes three forms:
```bash
./bootstrap-mlx-asr.sh                      # download the int8 build from Hugging Face, then link it
./bootstrap-mlx-asr.sh <hf-repo-id>         # download a specific Hugging Face repo
./bootstrap-mlx-asr.sh /path/to/checkpoint  # link a checkpoint you already have (no download)
```
Downloads use `huggingface_hub` (already installed via mlx-speech) and land in
the shared Hugging Face cache (`~/.cache/huggingface`); the script symlinks
`bin/qwen3-asr-mlx` to the cached snapshot. It is idempotent: if the model is
already cached, or you pass a local path, nothing is re-downloaded, so you
never keep two copies of the 2.5 GB weights. BF16 and mxfp8 builds work too,
because mlx-speech reads the quantization from the checkpoint's `config.json`,
so switching is just a relink. Alternatively, convert the upstream
[Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) weights
yourself with [mlx-speech](https://github.com/appautomaton/mlx-speech)'s
`scripts/convert/qwen3_asr.py`.
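
The idempotent link step described above can be sketched in Python. This is an illustration of the idea, not a port of `bootstrap-mlx-asr.sh`: the function name `link_checkpoint` and its return convention are hypothetical.

```python
import os
from pathlib import Path


def link_checkpoint(snapshot_dir, link_path):
    """Point link_path at snapshot_dir. Return False if already linked."""
    snapshot = Path(snapshot_dir).resolve()
    link = Path(link_path)
    if not snapshot.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {snapshot}")

    if link.is_symlink():
        if link.resolve() == snapshot:
            return False  # already points at this checkpoint, nothing to do
        link.unlink()  # relink, e.g. when switching from int8 to BF16
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(snapshot, link, target_is_directory=True)
    return True
```

For the download case, `snapshot_dir` could be the path returned by `huggingface_hub.snapshot_download(repo_id=...)`, which reuses the shared cache when the files are already present. Linking rather than copying is what keeps a single copy of the weights on disk.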
## Configuration
| Environment variable | Default | Description |
|----------------------|---------|-------------|
| `TNT_MLX_MODEL` | `bin/qwen3-asr-mlx`, else `~/.local/share/tnt/qwen3-asr-mlx` | Path to the converted MLX checkpoint |
| `TNT_MLX_LANGUAGE` | `auto` | `Chinese`, `English`, or `auto`. Use `Chinese` to keep mixed Chinese/English speech from being translated to English |
| `TNT_INPUT_DEVICE` | system default | Microphone, by index or name |
| `TNT_CAPTURE_BACKEND` | `auto` | macOS always uses native AVFoundation (needs the Xcode command line tools: `xcode-select --install`); other platforms use PortAudio. `portaudio` is rejected on macOS |
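
The defaults in the table can be read as a small resolution routine. The sketch below mirrors the documented behaviour only; the function names are hypothetical, and details such as exact-match language values and treating an all-digit `TNT_INPUT_DEVICE` as an index are assumptions rather than TNT's confirmed logic.

```python
import os
from pathlib import Path

LANGUAGES = ("auto", "Chinese", "English")


def resolve_model_path(env=None, cwd=None, home=None):
    """TNT_MLX_MODEL wins, then bin/qwen3-asr-mlx, then the per-user path."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)
    explicit = env.get("TNT_MLX_MODEL")
    if explicit:
        return Path(explicit).expanduser()
    local = cwd / "bin" / "qwen3-asr-mlx"
    if local.exists():
        return local
    return home / ".local" / "share" / "tnt" / "qwen3-asr-mlx"


def resolve_language(env=None):
    """Return the forced language, or 'auto' when unset."""
    env = os.environ if env is None else env
    value = env.get("TNT_MLX_LANGUAGE", "auto")
    if value not in LANGUAGES:
        raise ValueError(f"TNT_MLX_LANGUAGE must be one of {LANGUAGES}, got {value!r}")
    return value


def resolve_input_device(env=None):
    """Return an int index, a device name, or None for the system default."""
    env = os.environ if env is None else env
    raw = env.get("TNT_INPUT_DEVICE", "").strip()
    if not raw:
        return None
    return int(raw) if raw.isdigit() else raw
```

For example, `resolve_input_device({"TNT_INPUT_DEVICE": "2"})` yields the index `2`, while a value such as `"MacBook Pro Microphone"` is passed through as a name.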
## Keybindings
| Key | Action |
|-----|--------|
| Space | Start / stop recording, or hold to record until release; cancels during transcription |
| c | Copy the last transcript entry |
| mouse click | Copy the clicked transcript entry |
| x | Clear the transcript |
| q | Quit |
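
The Space key's three roles (start, stop, cancel) amount to a small state machine. The following is a minimal sketch of that idea, assuming a job counter is used to discard results from cancelled transcriptions; the class and method names are hypothetical and the real logic in `app.py` may differ.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"


class TakeMachine:
    def __init__(self):
        self.state = State.IDLE
        self.job_id = 0  # bumped whenever a job starts or is cancelled
        self.transcript = []

    def press_space(self):
        """Advance the state machine; return the current job id."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            self.job_id += 1  # a new transcription job begins
            self.state = State.TRANSCRIBING
        else:
            self.job_id += 1  # cancel: the in-flight job id is now stale
            self.state = State.IDLE
        return self.job_id

    def deliver(self, job_id, text):
        """Accept a finished transcription unless it was cancelled."""
        if self.state is not State.TRANSCRIBING or job_id != self.job_id:
            return False  # stale result, silently dropped
        self.transcript.append(text)
        self.state = State.IDLE
        return True
```

A cancelled job is never interrupted here. Its result simply fails the id check when it eventually arrives, so the background work can finish without touching the transcript.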
## Project structure
```text
src/tnt/
├── app.py              # Textual TUI, state machine, keybindings
├── audio.py            # Recorder protocol, backend selection, PortAudio (non-macOS)
├── avf_audio.py        # Native AVFoundation capture via helper process (macOS)
├── mic_helper.swift    # AVFoundation helper source, compiled on demand
├── async_threads.py    # Daemon-thread helpers for blocking work
├── transcriber.py      # In-process MLX Qwen3-ASR transcription
└── widgets/
    ├── transcript.py   # Scrollable transcript log
    └── status.py       # Braille oscilloscope + state rail
bin/
└── qwen3-asr-mlx       # Symlink to converted MLX checkpoint (gitignored)
```
> [!TIP]
> The inference path expects 16 kHz mono PCM WAV; the recorder produces exactly
> that. Cancelling a transcription abandons its result: the in-process
> generation cannot be killed mid-flight and quietly finishes in the background.
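
If you feed the inference path audio from another source, the expected format is easy to verify with Python's standard `wave` module. This is a minimal sketch: `is_asr_ready` and `make_silence` are hypothetical helpers, and the 16-bit sample width used by the test generator is an assumption, since the tip above only specifies the rate and channel count.

```python
import io
import wave


def is_asr_ready(wav_source):
    """True if the WAV (path or file object) is 16 kHz mono."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1


def make_silence(seconds=0.1, rate=16000, channels=1):
    """Build an in-memory WAV of silence, handy for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples (assumption)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate) * channels)
    buf.seek(0)
    return buf
```

The `wave` module only reads uncompressed PCM, so a file that opens successfully and passes this check matches the stated rate and channel layout.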
## Related projects
- [mlx-speech](https://github.com/appautomaton/mlx-speech): our MLX-native speech runtime that powers TNT ([PyPI](https://pypi.org/project/mlx-speech/))
- [qwen3-asr-1.7b-int8-mlx](https://huggingface.co/appautomaton/qwen3-asr-1.7b-int8-mlx): our int8 MLX checkpoint that TNT runs (converted from Qwen3-ASR-1.7B)
## More from appautomaton
- [appautomaton.github.io](https://appautomaton.github.io): our site
- [huggingface.co/appautomaton](https://huggingface.co/appautomaton): our models and checkpoints on Hugging Face
- [github.com/appautomaton](https://github.com/appautomaton): our open-source projects
## License
MIT. See [`LICENSE`](LICENSE).