https://github.com/mks-01/readback

Terminal-first offline article reader — paste a URL, hear the whole article in a natural neural voice (CSM-1B on MLX). 100% on-device on Apple Silicon. No cloud, no API keys.

apple-silicon article-reader bun cli csm fastapi ink local-llm macos mlx offline-first ollama terminal text-to-speech tts voice-cloning

# readback


Paste a URL or snap a book — hear it read aloud by a neural voice, entirely on your Mac.

No cloud. No API keys. Nothing leaves your machine.


Runs 100% offline · Apple Silicon · CSM-1B neural TTS · CI · MIT License · Built with Claude Code


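Landing page · [Getting started](#getting-started) · [How it works](#how-it-works) · [Voices](#voices) · [Config](#configuration) · [Pi deploy](#pi-deployment) · [Design system](#design-system) · [Roadmap](docs/ROADMAP.md)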


*Figure: the readback CLI mid-read, a terminal player with a seekable timeline and live word-by-word transcript highlight.*

---

## Getting started

> **Requires macOS on Apple Silicon (M1–M5).** The entire stack — summary LLM, vision OCR, and TTS — runs **in-process on MLX/Metal**. No external daemons, no network calls. Speed scales with GPU cores and unified memory.

**First time? One command sets it all up:**

```bash
git clone https://github.com/MKS-01/readback.git && cd readback
bash scripts/setup.sh
```

`setup.sh` is idempotent, so it is safe to re-run. It checks the platform, creates `.venv`, installs readback, the CLI, and the dashboard, and optionally pre-downloads the MLX summary model (~5.5 GB) and the CSM-1B weights (~6 GB).

Needs [Bun](https://bun.sh/) — the script tells you if it's missing. Then:

```bash
readback-cli # from anywhere; auto-starts the server
```

Paste a URL → audio plays in your shell.

Prefer to set it up by hand?

```bash
# 1. Install the server
git clone https://github.com/MKS-01/readback.git && cd readback
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e . # csm-mlx + mlx-lm are git/PyPI deps, pulled automatically

# 2. Build + install the terminal client → ~/.local/bin/readback-cli
cd src/cli && ./install.sh && cd ../..

# 3. Read something
readback-cli # from anywhere; auto-starts the server
```


*Figure: readback CLI home screen.*

The CLI auto-starts the server and kills it on exit. It's a full terminal player:

- **space** pause, **←/→** seek ±5 s, **t** toggle transcript (word-by-word highlight synced to the voice)
- `/voice`, `/model` (summary LLM, RAM-fit check), `/vision` (image/book OCR model), `/mode`, `/lib` (browse + **space** to preview inline, **enter** for full player), `/help`
- `q` to quit (at the prompt, whenever the input field is empty)
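
The seek and transcript-sync behaviour above can be sketched in a few lines. This is an illustration only, written in Python rather than the CLI's TypeScript; it assumes each word has a start timestamp in seconds, and the names `clamp_seek` and `current_word_index` are hypothetical, not taken from the readback source.

```python
from bisect import bisect_right


def clamp_seek(position: float, delta: float, duration: float) -> float:
    """Move the playhead by `delta` seconds, staying inside [0, duration]."""
    return max(0.0, min(duration, position + delta))


def current_word_index(word_starts: list[float], position: float) -> int:
    """Index of the word being spoken at `position`, or -1 before the first word.

    `word_starts` must be sorted ascending (one start time per word).
    """
    return bisect_right(word_starts, position) - 1


# Hypothetical timings for a five-word transcript.
words = ["Paste", "a", "URL", "and", "listen"]
starts = [0.00, 0.42, 0.55, 1.10, 1.30]

playhead = clamp_seek(0.5, +5.0, duration=2.0)  # right arrow near the end -> 2.0
highlighted = words[current_word_index(starts, 0.6)]  # -> "URL"
```

A binary search keeps the highlight lookup cheap even for long articles, since it runs on every playback tick.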

macOS only (`afplay` playback). Details: [`src/cli/README.md`](src/cli/README.md).

First read downloads CSM-1B weights (~6 GB) and the summary LLM (~5.5 GB) into the HuggingFace cache, then warms up the MLX graph — slow once, fast after. All three models (TTS, summary, OCR) run **in the same process** on Metal — no Ollama, no external daemon, no API keys. The vision OCR model (~5 GB) downloads lazily the first time you read an image or book scan. See [SETUP.md](docs/SETUP.md) for details.

---

## Library dashboard

Every read is saved to a local SQLite library. The dashboard lets you **replay any past read** — no LLM, no GPU, just the saved audio.


*Figure: the readback library dashboard. Search, sort, and replay past reads with a seekable player and word-by-word transcript highlight.*

- **Search** title / summary / URL, **sort** newest↔oldest, **paginate** 20 at a time
- **Full player** per card — click-to-seek, ±5 s skip, `space` + `←/→` keyboard shortcuts
- **Synced transcript** — word-by-word highlight in blue, same as the CLI
- **Delete** removes the DB row *and* its WAV
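
As a rough sketch of how the library operations above could map onto SQLite, the snippet below uses only Python's standard `sqlite3` module. The table name `reads`, its columns, and the helper names are assumptions made for illustration; the real schema lives in the readback source and may differ.

```python
import sqlite3
from pathlib import Path

PAGE_SIZE = 20  # the dashboard paginates 20 at a time


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a library DB with a hypothetical `reads` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reads (
               id INTEGER PRIMARY KEY,
               title TEXT, summary TEXT, url TEXT,
               wav_path TEXT, created_at TEXT
           )"""
    )
    return conn


def search_reads(conn, query: str = "", newest_first: bool = True, page: int = 0):
    """Match title / summary / URL, sort by date, return one page of rows."""
    order = "DESC" if newest_first else "ASC"  # fixed strings, never user input
    like = f"%{query}%"
    return conn.execute(
        f"""SELECT id, title, url FROM reads
            WHERE title LIKE ? OR summary LIKE ? OR url LIKE ?
            ORDER BY created_at {order}, id {order}
            LIMIT ? OFFSET ?""",
        (like, like, like, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()


def delete_read(conn, read_id: int) -> bool:
    """Remove the DB row and its WAV file; return False if the row is unknown."""
    row = conn.execute(
        "SELECT wav_path FROM reads WHERE id = ?", (read_id,)
    ).fetchone()
    if row is None:
        return False
    conn.execute("DELETE FROM reads WHERE id = ?", (read_id,))
    conn.commit()
    if row[0]:
        Path(row[0]).unlink(missing_ok=True)  # tolerate an already-missing WAV
    return True
```

Deleting the row before the file means a failed file removal can leave an orphaned WAV but never a library entry that points at nothing.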

A lightweight **Vue 3** SPA (pure REST client). Built `dist/` is served at `/` by the same `readback` process; `bun run dev` runs Vite on `:5173` for development. Details: [`src/dashboard/README.md`](src/dashboard/README.md).

Generation stays on the CLI (Mac GPU) — the dashboard only replays. This split also enables [Pi deployment](#pi-deployment): the Mac generates, a home Pi serves the library.

---

## How it works

```mermaid
flowchart LR
    U["URL · image · book scan"] --> P

    subgraph P["readback server · 100% on-device"]
        direction LR
        E["extract<br/>trafilatura · vision OCR"] --> L["summarize<br/>local LLM · optional"] --> T["synthesize<br/>CSM-1B neural TTS"]
    end

    T --> DB[("readback-audio-db<br/>WAV files + SQLite")]
    DB --> CLI["CLI<br/>generate + play live"]
    DB --> WEB["Dashboard<br/>search + replay anytime"]
```

1. **Extract** — `trafilatura` pulls article text (browser-UA fallback for 403s). Images/book scans → mlx-vlm vision OCR (in-process). Folders/globs → multi-page: OCR'd in filename order and stitched into one document.
2. **Summarize** *(optional)* — mlx-lm rewrites it as a spoken explanation (in-process). Full mode skips this entirely.
3. **Synthesize** — sentence-aware chunks → CSM-1B (in-process) → silence-trimmed → fade-out → joined with small gaps. Re-reads skip the entire pipeline (cache hit by URL + mode + voice + model).
4. **Serve** — WAV over HTTP; progress streams live over the WebSocket.
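
To make step 3 more concrete, here is a minimal sketch of sentence-aware chunking and of stitching clips together with a fade-out and a short gap. It is not the project's code: the 300-character chunk limit, the 20 ms fade, and all function names are assumptions, audio is modelled as plain lists of float samples, and the CSM-1B call and silence trimming are left out.

```python
import re

SAMPLE_RATE = 24_000  # CSM-1B output rate, per the tech stack table


def sentence_chunks(text: str, max_chars: int = 300) -> list[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole, never cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def fade_out(samples: list[float], fade_sec: float = 0.02) -> list[float]:
    """Linearly ramp the last `fade_sec` seconds down to silence."""
    n = min(len(samples), round(fade_sec * SAMPLE_RATE))
    out = list(samples)
    start = len(out) - n
    for i in range(n):
        out[start + i] *= 1.0 - (i + 1) / n
    return out


def join_with_gaps(clips: list[list[float]], gap_sec: float = 0.18) -> list[float]:
    """Concatenate faded clips, inserting `gap_sec` of silence between them."""
    gap = [0.0] * round(gap_sec * SAMPLE_RATE)
    joined: list[float] = []
    for i, clip in enumerate(clips):
        if i:
            joined.extend(gap)
        joined.extend(fade_out(clip))
    return joined
```

The default gap matches `reader.gap_sec` (`0.18`); fading each clip before joining avoids an audible click where a chunk ends on a non-zero sample.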

**Source-aware tone** — a URL reads as a livelier article explainer; a book scan reads calmer, opening by naming the chapter. Automatic, nothing to set. Long scans **map-reduce** instead of truncating.
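
The map-reduce idea can be sketched as follows. This is an assumption-laden outline, not the project's implementation: `summarize` stands in for whatever call runs the local LLM, batches are split on blank lines, and there is a single reduce pass.

```python
from typing import Callable


def split_batches(text: str, max_chars: int = 16_000) -> list[str]:
    """Pack whole paragraphs into batches of at most `max_chars` characters.

    A single paragraph longer than `max_chars` becomes its own oversized batch.
    """
    batches, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            batches.append(current)
            current = para
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def map_reduce_summary(
    text: str,
    summarize: Callable[[str], str],  # hypothetical wrapper around the local LLM
    max_chars: int = 16_000,
) -> str:
    """Summarize each batch (map), then summarize the joined partials (reduce)."""
    batches = split_batches(text, max_chars)
    if len(batches) <= 1:
        return summarize(text)  # short input: one pass, nothing to reduce
    partials = [summarize(batch) for batch in batches]
    return summarize("\n\n".join(partials))
```

The default of `16_000` mirrors `reader.summary_max_chars`. Every part of a long scan reaches the model once, which is the difference from truncating at the first 16,000 characters.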

See [ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full system view.

---

## Tech stack

| Layer | Technology |
|---|---|
| **Extraction** | [trafilatura](https://trafilatura.readthedocs.io/) — URL → clean text (+ browser-UA fallback); [mlx-vlm](https://github.com/Blaizzy/mlx-vlm) vision OCR for images / book scans (in-process, Metal) |
| **Summary (optional)** | [mlx-lm](https://github.com/ml-explore/mlx-lm) — in-process on Metal; default `Qwen3.5-9B-4bit`, any downloaded MLX chat model works |
| **TTS** | [CSM-1B](https://huggingface.co/senstella/csm-1b-mlx) (Sesame) via [csm-mlx](https://github.com/senstella/csm-mlx) — in-process, Metal, 24 kHz, bf16 |
| **Voices** | 2 built-in reading voices + **clone any voice from a short clip** + optional **LoRA fine-tuning** |
| **Server** | [FastAPI](https://fastapi.tiangolo.com/) + WebSocket — streams progress, serves the WAV, REST library |
| **CLI client** | Bun + TypeScript + [Ink](https://github.com/vadimdemedes/ink) — terminal UI, `afplay` playback |
| **Dashboard** | [Vue 3](https://vuejs.org/) + [Vite](https://vite.dev/) + TS — replay past reads (search/sort/player); stdlib SQLite library |

---

## Voices

CSM conditions on a short reference clip — the clip's timbre and accent are what you hear.

- **Built-in** — `conversational_a` (female ★) / `conversational_b` (male)
- **Clone** — 5–8 s mono clip + exact transcript in `config.yaml`:

```yaml
tts:
  csm:
    speaker: "codeword"
    temperature: 0.7 # delivery: lower = composed, higher = livelier
    voices:
      - name: "codeword"
        label: "Codeword ★"
        wav: "src/voice/voice_codeword.wav"
        speaker: 0
        ref_text: "Exact transcript of the clip." # MUST match the audio
```

- **LoRA fine-tune** — for higher fidelity with more audio: [`src/finetune/`](src/finetune/README.md)
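
Before pointing `wav` at a new clone clip, it can help to check the constraints listed above (mono, 5 to 8 s). The helper below is a hypothetical pre-flight check built on Python's standard `wave` module; readback itself may validate clips differently, or not at all.

```python
import wave


def check_reference_clip(path: str, min_sec: float = 5.0, max_sec: float = 8.0) -> list[str]:
    """Return a list of problems with a reference WAV; an empty list means it looks fine."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not (min_sec <= duration <= max_sec):
        problems.append(
            f"duration {duration:.1f} s is outside {min_sec:.0f} to {max_sec:.0f} s"
        )
    return problems


# Usage (path taken from the config example above):
# for problem in check_reference_clip("src/voice/voice_codeword.wav"):
#     print("clip issue:", problem)
```

This only inspects the audio container. Whether `ref_text` matches what is actually spoken still has to be checked by ear.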

---

## Configuration

Edit `config.yaml` (or pass `--config path`). The defaults work out of the box.

| Key | What | Default |
|---|---|---|
| `llm.model` | MLX model for Summary mode (HuggingFace ID) | `mlx-community/Qwen3.5-9B-4bit` |
| `ocr.model` | MLX vision model for image / book-scan OCR (its own section) | `mlx-community/Qwen2.5-VL-7B-Instruct-4bit` |
| `tts.csm.speaker` | Active voice (`conversational_a`/`_b` or a clone `name`) | `codeword` |
| `tts.csm.precision` | `bf16` (clean+fast) / `fp16` / `fp32` (slowest, cleanest) | `bf16` |
| `tts.csm.temperature` | Delivery: lower = composed, higher = livelier | `0.7` |
| `tts.csm.voices` | Clone voices (`name`, `label`, `wav`, `ref_text`, `speaker`) | sample `codeword` |
| `tts.csm.lora_path` | LoRA adapter dir from a `csm-mlx finetune` run | `null` |
| `reader.default_mode` | `full` (verbatim) or `summary` (LLM) | `full` |
| `reader.output_dir` | Where generated WAVs are written/served (a `readback-audio-db/` folder beside the repo) | `../readback-audio-db/audio` |
| `reader.gap_sec` | Silence inserted between synthesized chunks | `0.18` |
| `reader.summary_max_chars` | Per-pass chunk size for Summary mode — longer inputs (book scans) are map-reduced across batches of this size, not truncated | `16000` |
| `reader.library_db` | SQLite library of past reads (powers the dashboard) | `../readback-audio-db/library.db` |

Audio + library DB default to a **`readback-audio-db/`** folder beside the repo. Point `output_dir` / `library_db` anywhere (absolute or `~` both work).
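
The path handling described above might look like the sketch below. It assumes relative values are resolved against the repo root, which fits the `../readback-audio-db/` defaults but is not confirmed by the source; `resolve_data_path` is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_data_path(value: str, base_dir: str) -> Path:
    """Expand `~`, then anchor relative paths at `base_dir` (assumed: the repo root)."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = Path(base_dir) / path
    # normpath collapses ".." segments without touching the filesystem
    return Path(os.path.normpath(path))


repo = "/Users/me/code/readback"  # hypothetical checkout location
audio_dir = resolve_data_path("../readback-audio-db/audio", repo)
library_db = resolve_data_path("../readback-audio-db/library.db", repo)
# audio_dir  -> /Users/me/code/readback-audio-db/audio
# library_db -> /Users/me/code/readback-audio-db/library.db
```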

**Flags:** `readback --model <model>`, `--host`, `--port`, `--config`. Use `--host 0.0.0.0` for LAN access.

---

## Pi deployment

Generation stays on the Mac (CSM-1B + MLX need Apple Silicon). A Raspberry Pi runs the lightweight read-only server — library REST, Vue dashboard, and audio serving — so your reads are accessible from **any browser on the network**.

The Pi runs readback under [PiZoW](https://github.com/MKS-01/pizow) (PM2, survives reboots, ~68 MB).


*Figure: PiZoW Monitor showing Readback online on a Raspberry Pi at 6 MB, alongside the other Pi services.*

```bash
# one-time setup
cp .env.example .env # fill in PI_USER, PI_HOST, PI_PATH
bash scripts/deploy-pi.sh # build dashboard → rsync → venv + pip → PM2
ssh PI_USER@PI_HOST "pm2 startup && pm2 save" # survive reboots

# after each new read on Mac
bash scripts/sync-pi.sh # incremental — only new WAVs since last sync
bash scripts/sync-pi.sh --full # or full sync (cleans orphans on Pi)
```

The dashboard is then live at `http://<PI_HOST>:8090`.

---

## Design system

The Ghost palette, type scale, and every UI component — documented as live specimens you can browse locally.


*Figure: readback design system. Ghost palette, tints, type scale, spacing, and motion tokens, the foundation every surface is built on.*


*Figure: component specimens (Badge, Button, SeekBar, WaveformPlayer, ReadCard…) and interactive UI Kits.*

**7 type rungs** · **9 components** (Badge, Button, PromptLine, SearchInput, SeekBar, WaveformPlayer, ReadCard, Wordmark, SectionHeader) · **3 UI kits** (Terminal, Dashboard, Landing) — all interactive.

```bash
cd src/design-system && python3 -m http.server 8111
# open http://localhost:8111
```

Canonical tokens live in `src/design-system/tokens/` — the dashboard imports them via `@import`; the landing page inlines the same values (deployed standalone). The CLI mirrors the palette in `src/cli/src/theme.ts`.

---

## Documentation

| Doc | What's inside |
|---|---|
| [`docs/SETUP.md`](docs/SETUP.md) | Setup, flags, troubleshooting |
| [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | Pipeline, concurrency, WS protocol |
| [`docs/ROADMAP.md`](docs/ROADMAP.md) | What's planned and recently shipped |
| [`docs/JOURNEY.md`](docs/JOURNEY.md) | Devlog — built agent-first with Claude Code |
| [`src/cli/README.md`](src/cli/README.md) | Terminal client internals |
| [`src/dashboard/README.md`](src/dashboard/README.md) | Web dashboard (Vue 3) |
| [`src/finetune/README.md`](src/finetune/README.md) | LoRA voice fine-tuning |

---

## License

MIT — see [LICENSE](LICENSE).


Built agent-first with Claude Code · [read the devlog →](docs/JOURNEY.md)