
https://github.com/appautomaton/tnt-asr

Terminal voice-to-text TUI: Qwen3-ASR-1.7B on the Apple GPU via MLX (mlx-speech). Fully local, no PyTorch, transcribes in ~1s. macOS Apple Silicon.

# TNT 🧨

[![Website](https://img.shields.io/badge/website-appautomaton.github.io-ff4fd8?logo=github&logoColor=white)](https://appautomaton.github.io/tnt-asr/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![PyPI](https://img.shields.io/badge/PyPI-automaton--tnt-3775A9?logo=pypi&logoColor=white)](https://pypi.org/project/automaton-tnt/)
[![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg)](https://www.python.org/downloads/)
[![Platform](https://img.shields.io/badge/platform-Apple%20Silicon-black?logo=apple)](https://developer.apple.com/documentation/apple-silicon)

**🌐 [appautomaton.github.io/tnt-asr](https://appautomaton.github.io/tnt-asr/)**: the project landing page.

Terminal voice-to-text. Tap Space, speak, tap Space: your words land in the transcript and on the clipboard.

Qwen3-ASR-1.7B runs in-process on the Apple GPU via [mlx-speech](https://github.com/appautomaton/mlx-speech) as an 8-bit (int8) quantized checkpoint of about 2.5 GB. The model loads once, stays resident, and transcribes a short take in a fraction of a second. Fully local: no cloud, no runtime network calls. The microphone is captured natively through AVFoundation by a small Swift helper process, so a misbehaving audio stack can never trap the mic: TNT just kills the helper and macOS releases it.
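
The helper-process isolation described above can be illustrated with a short sketch. This is not TNT's actual code: `stop_helper` and the `grace_seconds` parameter are hypothetical names, and a sleeping Python child stands in for the Swift helper. The point is only that a capture process owned by the parent can always be stopped from outside, first politely and then by force.

```python
import subprocess
import sys


def stop_helper(proc: subprocess.Popen, grace_seconds: float = 0.5) -> int:
    """Stop a capture helper: ask it to exit, then force-kill if it hangs."""
    if proc.poll() is None:              # helper is still running
        proc.terminate()                 # polite request (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()                  # forced stop (SIGKILL on POSIX)
            proc.wait()
    return proc.returncode


# Stand-in for the Swift helper: a child that would otherwise run for a minute.
helper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_helper(helper)
```

Once the child has exited, the operating system reclaims everything it held, which is the property the README relies on for the microphone.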

> [!NOTE]
> Using Termux on Android? Use the preserved
> `legacy/android-termux-qwen0.6b` branch instead of `master`.
> It is a legacy proot setup and may need device-specific fixes; validate it
> locally and adapt it with your own tools or agentic AI workflow.
>
> ```bash
> git fetch origin
> git switch --track origin/legacy/android-termux-qwen0.6b
> ```

## Features

- **In-process GPU inference**: pure MLX, no PyTorch
- **8-bit quantized**: int8 weights (~2.5 GB), about half the memory of BF16 with a faster decode
- **Resident model**: loads once in the background at startup; every take is warm
- **Native mic capture**: AVFoundation via an isolated Swift helper process; the mic can always be reclaimed
- **English, Chinese, and mixed speech**: language auto-detected, or forced via env var
- **Live braille oscilloscope**: real audio levels while you record
- **Clipboard-first**: new transcriptions auto-copy; click any past entry to copy it again
- **Responsive TUI**: side-rail layout on wide terminals, stacked on narrow ones
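
To give a feel for how audio levels can be drawn with braille characters, here is a minimal sketch. It is an assumption about the general technique, not the widget in `status.py`: the function names are hypothetical, and it draws a simple bar meter (two levels per character cell, four dots tall) rather than a full oscilloscope trace.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block

# Dot bits for each column of a braille cell, listed bottom to top.
LEFT_DOTS = (0x40, 0x04, 0x02, 0x01)
RIGHT_DOTS = (0x80, 0x20, 0x10, 0x08)


def _column_bits(level: float, dots: tuple) -> int:
    """Fill a column from the bottom in proportion to a level in [0, 1]."""
    clamped = max(0.0, min(1.0, level))
    height = round(clamped * len(dots))
    bits = 0
    for bit in dots[:height]:
        bits |= bit
    return bits


def braille_bars(levels: list) -> str:
    """Render levels in [0, 1] as braille bars, two levels per character."""
    if len(levels) % 2:
        levels = [*levels, 0.0]          # pad so every cell has two columns
    cells = []
    for left, right in zip(levels[0::2], levels[1::2]):
        code = BRAILLE_BASE | _column_bits(left, LEFT_DOTS) | _column_bits(right, RIGHT_DOTS)
        cells.append(chr(code))
    return "".join(cells)


meter = braille_bars([0.1, 0.4, 0.9, 1.0, 0.6, 0.2])
```

Packing two columns into each cell doubles the horizontal resolution of the meter compared with one block character per sample.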

## Setup

> [!IMPORTANT]
> Requires an Apple Silicon Mac (M1 or later), Python 3.13+,
> [uv](https://docs.astral.sh/uv/), and the Xcode command line tools
> (`xcode-select --install`). The mic capture helper is compiled from Swift
> on first launch and cached.

```bash
git clone https://github.com/appautomaton/tnt-asr.git
cd tnt-asr
uv sync
./bootstrap-mlx-asr.sh # downloads + links the int8 checkpoint (~2.5 GB, cached by Hugging Face)
uv run tnt
```

Or install from PyPI ([`automaton-tnt`](https://pypi.org/project/automaton-tnt/)):

```bash
uv tool install automaton-tnt
TNT_MLX_MODEL=/path/to/qwen3-asr-1.7b-int8-mlx tnt
```

(Instead of exporting `TNT_MLX_MODEL`, you can symlink the checkpoint at
`~/.local/share/tnt/qwen3-asr-mlx`.)

### Model checkpoint

TNT expects a converted Qwen3-ASR-1.7B MLX checkpoint. A ready-to-use int8
build (~2.5 GB) is published at
[appautomaton/qwen3-asr-1.7b-int8-mlx](https://huggingface.co/appautomaton/qwen3-asr-1.7b-int8-mlx).
The bootstrap script takes three forms:

```bash
./bootstrap-mlx-asr.sh # download the int8 build from Hugging Face, then link it
./bootstrap-mlx-asr.sh <org>/<repo> # download a specific Hugging Face repo
./bootstrap-mlx-asr.sh /path/to/checkpoint # link a checkpoint you already have (no download)
```

Downloads use `huggingface_hub` (already installed via mlx-speech) and land in
the shared Hugging Face cache (`~/.cache/huggingface`); the script symlinks
`bin/qwen3-asr-mlx` to the cached snapshot. It is idempotent: if the model is
already cached, or you pass a local path, nothing is re-downloaded, so you
never keep two copies of the 2.5 GB weights. BF16 and mxfp8 builds work too:
mlx-speech reads the quantization from the checkpoint's `config.json`, so
switching is just a relink. Alternatively, convert the upstream
[Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) weights
yourself with [mlx-speech](https://github.com/appautomaton/mlx-speech)'s
`scripts/convert/qwen3_asr.py`.
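
The "link, don't copy" behavior can be sketched in a few lines of Python. This is an illustration of an idempotent relink, not the contents of `bootstrap-mlx-asr.sh`: `link_checkpoint` is a hypothetical helper, and the temporary directories stand in for cached snapshots (in a real run the snapshot directory would come from the Hugging Face cache, for example the path returned by `huggingface_hub.snapshot_download`).

```python
import tempfile
from pathlib import Path


def link_checkpoint(snapshot_dir: Path, link_path: Path) -> bool:
    """Point link_path at snapshot_dir. Return True only if something changed."""
    snapshot_dir = snapshot_dir.resolve()
    if link_path.is_symlink():
        if link_path.resolve() == snapshot_dir:
            return False                 # already linked: nothing to do
        link_path.unlink()               # stale link: replace it
    elif link_path.exists():
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(snapshot_dir, target_is_directory=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    int8_dir = root / "cache" / "int8"   # stand-ins for cached snapshots
    bf16_dir = root / "cache" / "bf16"
    int8_dir.mkdir(parents=True)
    bf16_dir.mkdir(parents=True)
    link = root / "bin" / "qwen3-asr-mlx"

    created = link_checkpoint(int8_dir, link)    # first run creates the link
    repeated = link_checkpoint(int8_dir, link)   # second run is a no-op
    switched = link_checkpoint(bf16_dir, link)   # switching builds is a relink
    points_at_bf16 = link.resolve() == bf16_dir.resolve()
```

Because only the symlink changes, switching between builds never duplicates the weights on disk.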

## Configuration

| Environment variable | Default | Description |
|----------------------|---------|-------------|
| `TNT_MLX_MODEL` | `bin/qwen3-asr-mlx`, else `~/.local/share/tnt/qwen3-asr-mlx` | Path to the converted MLX checkpoint |
| `TNT_MLX_LANGUAGE` | `auto` | `Chinese`, `English`, or `auto`. Use `Chinese` to keep mixed Chinese/English speech from being translated to English |
| `TNT_INPUT_DEVICE` | system default | Microphone, by index or name |
| `TNT_CAPTURE_BACKEND` | `auto` | macOS always uses native AVFoundation (needs the Xcode command line tools: `xcode-select --install`); other platforms use PortAudio. `portaudio` is rejected on macOS |

## Keybindings

| Key | Action |
|-----|--------|
| Space | Start / stop recording, or hold to record until release; cancels during transcription |
| c | Copy the last transcript entry |
| mouse click | Copy the clicked transcript entry |
| x | Clear the transcript |
| q | Quit |
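
Space can cancel a take while it is being transcribed. One common way to cancel blocking work that cannot be interrupted is to let it finish on a daemon thread and discard its result. The sketch below shows that pattern under stated assumptions: `TakeRunner` and its methods are hypothetical and are not taken from `async_threads.py`.

```python
import threading


class TakeRunner:
    """Run blocking work on a daemon thread; cancel by ignoring its result."""

    def __init__(self, on_result):
        self._on_result = on_result
        self._lock = threading.Lock()
        self._current = 0                # id of the take whose result is wanted

    def start(self, work) -> threading.Thread:
        with self._lock:
            self._current += 1
            take_id = self._current

        def run():
            text = work()                # cannot be interrupted once started
            with self._lock:
                wanted = take_id == self._current
            if wanted:
                self._on_result(text)    # stale takes never reach the UI

        thread = threading.Thread(target=run, daemon=True)
        thread.start()
        return thread

    def cancel(self) -> None:
        with self._lock:
            self._current += 1           # any in-flight take is now stale


delivered = []
runner = TakeRunner(delivered.append)
release = threading.Event()


def slow_take() -> str:
    release.wait(timeout=5)              # simulates a long transcription
    return "cancelled take"


stale = runner.start(slow_take)
runner.cancel()                          # user cancels mid-transcription
release.set()                            # the work still runs to completion
stale.join()
fresh = runner.start(lambda: "kept take")
fresh.join()
```

The cancelled take still consumes compute until it finishes, which is the trade-off of abandoning work instead of killing it.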

## Project structure

```text
src/tnt/
├── app.py             # Textual TUI, state machine, keybindings
├── audio.py           # Recorder protocol, backend selection, PortAudio (non-macOS)
├── avf_audio.py       # Native AVFoundation capture via helper process (macOS)
├── mic_helper.swift   # AVFoundation helper source, compiled on demand
├── async_threads.py   # Daemon-thread helpers for blocking work
├── transcriber.py     # In-process MLX Qwen3-ASR transcription
└── widgets/
    ├── transcript.py  # Scrollable transcript log
    └── status.py      # Braille oscilloscope + state rail
bin/
└── qwen3-asr-mlx      # Symlink to converted MLX checkpoint (gitignored)
```

> [!TIP]
> The inference path expects 16 kHz mono PCM WAV; the recorder produces exactly
> that. Cancelling a transcription abandons its result: the in-process
> generation cannot be killed mid-flight and quietly finishes in the background.
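
If you feed audio from another source, a quick format check can catch mismatches early. The sketch below uses only the standard-library `wave` module. `is_16k_mono_pcm` and `make_wav` are hypothetical helpers, and the 16-bit sample width is an assumption, since the README only says "PCM".

```python
import io
import wave


def is_16k_mono_pcm(source) -> bool:
    """True if source (a path or binary file object) is 16 kHz mono 16-bit PCM WAV."""
    try:
        with wave.open(source, "rb") as wav:
            return (
                wav.getframerate() == 16_000
                and wav.getnchannels() == 1
                and wav.getsampwidth() == 2   # assumption: 16-bit samples
            )
    except (wave.Error, EOFError):
        return False                          # not a readable PCM WAV file


def make_wav(rate: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a silent in-memory WAV file for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    buffer.seek(0)
    return buffer


ok = is_16k_mono_pcm(make_wav(16_000, 1))
wrong_rate = is_16k_mono_pcm(make_wav(44_100, 1))
stereo = is_16k_mono_pcm(make_wav(16_000, 2))
```

Audio that fails the check would need resampling or downmixing before it matches what the recorder produces.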

## Related projects

- [mlx-speech](https://github.com/appautomaton/mlx-speech): our MLX-native speech runtime that powers TNT ([PyPI](https://pypi.org/project/mlx-speech/))
- [qwen3-asr-1.7b-int8-mlx](https://huggingface.co/appautomaton/qwen3-asr-1.7b-int8-mlx): our int8 MLX checkpoint that TNT runs (converted from Qwen3-ASR-1.7B)

## More from appautomaton

- 🌐 [appautomaton.github.io](https://appautomaton.github.io): our site
- 🤗 [huggingface.co/appautomaton](https://huggingface.co/appautomaton): our models and checkpoints on Hugging Face
- 🐙 [github.com/appautomaton](https://github.com/appautomaton): our open-source projects

## License

MIT. See [`LICENSE`](LICENSE).