https://github.com/mks-01/readback

Terminal-first offline article reader — paste a URL, hear the whole article in a natural neural voice (CSM-1B on MLX). 100% on-device on Apple Silicon. No cloud, no API keys.

apple-silicon article-reader bun cli csm fastapi ink local-llm macos mlx offline-first ollama terminal text-to-speech tts voice-cloning


Host: GitHub
URL: https://github.com/mks-01/readback
Owner: MKS-01
License: mit
Created: 2026-05-05T20:08:43.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-06-15T21:32:06.000Z (15 days ago)
Last Synced: 2026-06-15T23:16:13.341Z (15 days ago)
Topics: apple-silicon, article-reader, bun, cli, csm, fastapi, ink, local-llm, macos, mlx, offline-first, ollama, terminal, text-to-speech, tts, voice-cloning
Language: Python
Homepage: https://mks-01.github.io/readback/
Size: 29.5 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

  Paste a URL or snap a book — hear it read aloud by a neural voice, entirely on your Mac.


  No cloud. No API keys. Nothing leaves your machine.





  

  

  





  

  

  





Landing page · Getting started · How it works · Voices · Config · Pi deploy · Design system · Roadmap





  


_The terminal client mid-read: seekable player, live word-by-word transcript sync._



---

## Getting started

> **Requires macOS on Apple Silicon (M1–M5).** The entire stack — summary LLM, vision OCR, and TTS — runs **in-process on MLX/Metal**. No external daemons, no network calls. Speed scales with GPU cores and unified memory.

**First time? One command sets it all up:**

```bash
git clone https://github.com/MKS-01/readback.git && cd readback
bash scripts/setup.sh
```

`setup.sh` is idempotent — safe to re-run. It checks platform, creates `.venv`, installs readback + CLI + dashboard, and optionally pre-downloads the MLX summary model and CSM-1B weights (~6 GB).
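The two preconditions mentioned here (Apple Silicon macOS and Bun on the `PATH`) can be checked by hand before running the script. The following is a minimal Python sketch of such a pre-flight check; it is not the logic inside `setup.sh`, and the function names are made up for illustration.

```python
import platform
import shutil


def is_apple_silicon_mac(system: str, machine: str) -> bool:
    """True for macOS ("Darwin") running natively on an arm64 CPU."""
    return system == "Darwin" and machine == "arm64"


def missing_tools(required=("bun",)):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    ok = is_apple_silicon_mac(platform.system(), platform.machine())
    print("platform ok:", ok)
    print("missing tools:", missing_tools())
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so this check only passes for a native arm64 interpreter.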

Needs [Bun](https://bun.sh/) — the script tells you if it's missing. Then:

```bash
readback-cli            # from anywhere; auto-starts the server
```

Paste a URL → audio plays in your shell.

Prefer to set it up by hand?

```bash
# 1. Install the server
git clone https://github.com/MKS-01/readback.git && cd readback
python3.12 -m venv .venv && source .venv/bin/activate
pip install -e .                        # csm-mlx + mlx-lm are git/PyPI deps, pulled automatically

# 2. Build + install the terminal client → ~/.local/bin/readback-cli
cd src/cli && ./install.sh && cd ../..  # back to the repo root

# 3. Read something
readback-cli                            # from anywhere; auto-starts the server
```



  



The CLI auto-starts the server and kills it on exit. It's a full terminal player:

- **space** pause, **←/→** seek ±5 s, **t** toggle transcript (word-by-word highlight synced to the voice)

- `/voice`, `/model` (summary LLM, RAM-fit check), `/vision` (image/book OCR model), `/mode`, `/lib` (browse + **space** to preview inline, **enter** for full player), `/help`

- `q` to quit (or any time the input field is empty)

macOS only (`afplay` playback). Details: [`src/cli/README.md`](src/cli/README.md).
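To make the player behaviour concrete, here is a minimal Python sketch of the two ideas behind it: finding the word to highlight for a given playback position, and clamping a ±5 s seek. The real client is written in TypeScript; the timing data, its shape, and both function names below are assumptions for illustration only.

```python
from bisect import bisect_right

# Hypothetical timing data: (start_seconds, word), sorted by start time.
WORDS = [(0.00, "Paste"), (0.42, "a"), (0.55, "URL"), (1.10, "and"), (1.30, "listen")]
STARTS = [start for start, _ in WORDS]


def active_word_index(position_sec: float) -> int:
    """Index of the word being spoken at position_sec (-1 before the first word)."""
    return bisect_right(STARTS, position_sec) - 1


def seek(position_sec: float, delta_sec: float, duration_sec: float) -> float:
    """Move the playhead by delta_sec, clamped to [0, duration_sec]."""
    return min(max(position_sec + delta_sec, 0.0), duration_sec)
```

A binary search keeps the lookup cheap even for long articles, since it runs on every playback tick.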

First read downloads CSM-1B weights (~6 GB) and the summary LLM (~5.5 GB) into the HuggingFace cache, then warms up the MLX graph — slow once, fast after. All three models (TTS, summary, OCR) run **in the same process** on Metal — no Ollama, no external daemon, no API keys. The vision OCR model (~5 GB) downloads lazily the first time you read an image or book scan. See [SETUP.md](docs/SETUP.md) for details.

---

## Library dashboard

Every read is saved to a local SQLite library. The dashboard lets you **replay any past read** — no LLM, no GPU, just the saved audio.



  


_Search, sort, and replay past reads: seekable player + word-by-word transcript highlight._



- **Search** title / summary / URL, **sort** newest↔oldest, **paginate** 20 at a time

- **Full player** per card — click-to-seek, ±5 s skip, `space` + `←/→` keyboard shortcuts

- **Synced transcript** — word-by-word highlight in blue, same as the CLI

- **Delete** removes the DB row *and* its WAV

A lightweight **Vue 3** SPA (pure REST client). Built `dist/` is served at `/` by the same `readback` process; `bun run dev` runs Vite on `:5173` for development. Details: [`src/dashboard/README.md`](src/dashboard/README.md).

Generation stays on the CLI (Mac GPU) — the dashboard only replays. This split also enables [Pi deployment](#pi-deployment): the Mac generates, a home Pi serves the library.
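The search, sort, paginate, and delete operations listed in this section map naturally onto a few SQLite queries. The sketch below uses Python's standard `sqlite3` module; the `reads` table, its columns, and the function names are assumptions, not the project's actual schema or API.

```python
import sqlite3
from pathlib import Path

PAGE_SIZE = 20  # the dashboard paginates 20 reads at a time


def open_library(db_path=":memory:"):
    """Open (or create) a library DB with a hypothetical `reads` table."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reads (
               id INTEGER PRIMARY KEY,
               title TEXT, summary TEXT, url TEXT,
               wav_path TEXT, created_at TEXT)"""
    )
    return conn


def search_reads(conn, query="", newest_first=True, page=0):
    """Substring search over title / summary / URL, sorted and paginated."""
    order = "DESC" if newest_first else "ASC"  # fixed keywords, never user input
    like = f"%{query}%"
    return conn.execute(
        f"""SELECT * FROM reads
            WHERE title LIKE ? OR summary LIKE ? OR url LIKE ?
            ORDER BY created_at {order}, id {order}
            LIMIT ? OFFSET ?""",
        (like, like, like, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()


def delete_read(conn, read_id):
    """Delete the DB row and, if present, the WAV file it points to."""
    row = conn.execute("SELECT wav_path FROM reads WHERE id = ?", (read_id,)).fetchone()
    if row is None:
        return False
    conn.execute("DELETE FROM reads WHERE id = ?", (read_id,))
    conn.commit()
    if row["wav_path"]:
        Path(row["wav_path"]).unlink(missing_ok=True)
    return True
```

User input only ever reaches the query through `?` placeholders; the sort direction is chosen from two fixed keywords.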

---

## How it works

```mermaid
flowchart LR
    U["URL · image · book scan"] --> P
    subgraph P["readback server · 100% on-device"]
        direction LR
        E["extract<br/>trafilatura · vision OCR"] --> L["summarize<br/>local LLM · optional"] --> T["synthesize<br/>CSM-1B neural TTS"]
    end
    T --> DB[("readback-audio-db<br/>WAV files + SQLite")]
    DB --> CLI["CLI<br/>generate + play live"]
    DB --> WEB["Dashboard<br/>search + replay anytime"]
```

1. **Extract** — `trafilatura` pulls article text (browser-UA fallback for 403s). Images/book scans → mlx-vlm vision OCR (in-process). Folders/globs → multi-page: OCR'd in filename order and stitched into one document.

2. **Summarize** *(optional)* — mlx-lm rewrites it as a spoken explanation (in-process). Full mode skips this entirely.

3. **Synthesize** — sentence-aware chunks → CSM-1B (in-process) → silence-trimmed → fade-out → joined with small gaps. Re-reads skip the entire pipeline (cache hit by URL + mode + voice + model).

4. **Serve** — WAV over HTTP; progress streams live over the WebSocket.
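Step 3 can be pictured with a small Python sketch: pack whole sentences into chunks, then join the synthesized clips with short silences. This is an illustration under stated assumptions, not the project's code: the regex splitter, the `max_chars` limit, and the plain-list audio representation are all hypothetical, and silence trimming and fade-out are left out.

```python
import re

SAMPLE_RATE = 24_000  # CSM-1B output rate, per the tech stack table
GAP_SEC = 0.18        # default value of reader.gap_sec


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own chunk.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def join_with_gaps(clips, gap_sec=GAP_SEC, sample_rate=SAMPLE_RATE):
    """Concatenate per-chunk sample lists with silence between them."""
    gap = [0.0] * round(gap_sec * sample_rate)
    out = []
    for i, clip in enumerate(clips):
        if i:
            out.extend(gap)
        out.extend(clip)
    return out
```

With the defaults, each gap is 0.18 s × 24,000 samples/s = 4,320 samples of silence.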

**Source-aware tone** — a URL reads as a livelier article explainer; a book scan reads calmer, opening by naming the chapter. Automatic, nothing to set. Long scans **map-reduce** instead of truncating.
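The map-reduce idea can be sketched in a few lines of Python: split the text into batches no larger than `reader.summary_max_chars`, summarize each batch, then summarize the combined partial summaries. The `summarize` callable stands in for the local LLM call, and the paragraph-based batching is an assumption; the project's actual splitting strategy may differ.

```python
def batch_text(text, max_chars=16_000):
    """Split text into batches of at most max_chars, preferring paragraph breaks."""
    batches, current = [], ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue
        # A paragraph longer than the limit is hard-split so nothing is dropped.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if current and len(candidate) > max_chars:
                batches.append(current)
                current = piece
            else:
                current = candidate
    if current:
        batches.append(current)
    return batches


def map_reduce_summary(text, summarize, max_chars=16_000):
    """Summarize text of any length with a summarizer limited to max_chars per pass."""
    batches = batch_text(text, max_chars)
    if len(batches) <= 1:
        return summarize(text)
    partials = [summarize(batch) for batch in batches]  # map
    return summarize("\n\n".join(partials))             # reduce (single pass)
```

This version runs a single reduce pass, so it assumes the partial summaries together fit within one pass.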

See [ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full system view.
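The cache hit mentioned in step 3 depends on four inputs: URL, mode, voice, and model. One common way to turn such a tuple into a lookup key is to hash a canonical serialization of it. The sketch below shows that approach; whether readback hashes the inputs or stores them as separate columns is not stated here, so treat the function as hypothetical.

```python
import hashlib
import json


def cache_key(url, mode, voice, model):
    """Stable hex digest over the four inputs that define a read."""
    payload = json.dumps(
        {"url": url, "mode": mode, "voice": voice, "model": model},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Changing any one input, for example switching voice, yields a different key, so the read is regenerated instead of replayed.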

---

## Tech stack

| Layer | Technology |
|---|---|
| **Extraction** | [trafilatura](https://trafilatura.readthedocs.io/): URL → clean text (+ browser-UA fallback); [mlx-vlm](https://github.com/Blaizzy/mlx-vlm) vision OCR for images / book scans (in-process, Metal) |
| **Summary (optional)** | [mlx-lm](https://github.com/ml-explore/mlx-lm): in-process on Metal; default `Qwen3.5-9B-4bit`, any downloaded MLX chat model works |
| **TTS** | [CSM-1B](https://huggingface.co/senstella/csm-1b-mlx) (Sesame) via [csm-mlx](https://github.com/senstella/csm-mlx): in-process, Metal, 24 kHz, bf16 |
| **Voices** | 2 built-in reading voices + **clone any voice from a short clip** + optional **LoRA fine-tuning** |
| **Server** | [FastAPI](https://fastapi.tiangolo.com/) + WebSocket: streams progress, serves the WAV, REST library |
| **CLI client** | Bun + TypeScript + [Ink](https://github.com/vadimdemedes/ink): terminal UI, `afplay` playback |
| **Dashboard** | [Vue 3](https://vuejs.org/) + [Vite](https://vite.dev/) + TS: replay past reads (search/sort/player); stdlib SQLite library |

---

## Voices

CSM conditions on a short reference clip — the clip's timbre and accent are what you hear.

- **Built-in** — `conversational_a` (female ★) / `conversational_b` (male)

- **Clone** — 5–8 s mono clip + exact transcript in `config.yaml`:

  ```yaml
  tts:
    csm:
      speaker: "codeword"
      temperature: 0.7          # delivery: lower = composed, higher = livelier
      voices:
        - name: "codeword"
          label: "Codeword ★"
          wav: "src/voice/voice_codeword.wav"
          speaker: 0
          ref_text: "Exact transcript of the clip."   # MUST match the audio
  ```

- **LoRA fine-tune** — for higher fidelity with more audio: [`src/finetune/`](src/finetune/README.md)
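Before adding a clone voice, it can help to confirm that the reference clip is mono and falls in the 5–8 s range. The following sketch uses Python's standard `wave` module; the function name and its return format are hypothetical, and readback itself may validate clips differently or not at all.

```python
import wave


def check_reference_clip(path, min_sec=5.0, max_sec=8.0):
    """Return a list of problems with a reference WAV (empty list = looks fine)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        duration = wav.getnframes() / wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if not min_sec <= duration <= max_sec:
        problems.append(f"duration {duration:.2f}s is outside {min_sec}-{max_sec}s")
    return problems
```

The `wave` module only reads uncompressed PCM files, so a clip in another encoding raises `wave.Error` instead of returning a problem list. The check also cannot tell whether `ref_text` matches the audio; that still has to be verified by ear.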

---

## Configuration

Edit `config.yaml` (or pass `--config path`). The defaults work out of the box.

| Key | What | Default |
|---|---|---|
| `llm.model` | MLX model for Summary mode (HuggingFace ID) | `mlx-community/Qwen3.5-9B-4bit` |
| `ocr.model` | MLX vision model for image / book-scan OCR (its own section) | `mlx-community/Qwen2.5-VL-7B-Instruct-4bit` |
| `tts.csm.speaker` | Active voice (`conversational_a`/`_b` or a clone `name`) | `codeword` |
| `tts.csm.precision` | `bf16` (clean+fast) / `fp16` / `fp32` (slowest, cleanest) | `bf16` |
| `tts.csm.temperature` | Delivery: lower = composed, higher = livelier | `0.7` |
| `tts.csm.voices` | Clone voices (`name`, `label`, `wav`, `ref_text`, `speaker`) | sample `codeword` |
| `tts.csm.lora_path` | LoRA adapter dir from a `csm-mlx finetune` run | `null` |
| `reader.default_mode` | `full` (verbatim) or `summary` (LLM) | `full` |
| `reader.output_dir` | Where generated WAVs are written/served (a `readback-audio-db/` folder beside the repo) | `../readback-audio-db/audio` |
| `reader.gap_sec` | Silence inserted between synthesized chunks, in seconds | `0.18` |
| `reader.summary_max_chars` | Per-pass chunk size for Summary mode; longer inputs (book scans) are map-reduced across batches of this size, not truncated | `16000` |
| `reader.library_db` | SQLite library of past reads (powers the dashboard) | `../readback-audio-db/library.db` |

Audio + library DB default to a **`readback-audio-db/`** folder beside the repo. Point `output_dir` / `library_db` anywhere (absolute or `~` both work).
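As an illustration of how such path values can be interpreted, the sketch below expands `~` and resolves relative paths against a base directory. It assumes relative paths are taken relative to the repo directory, which is consistent with the defaults landing "beside the repo" but is not confirmed here; the function name is hypothetical.

```python
from pathlib import Path


def resolve_config_path(value, base_dir):
    """Expand ~ and resolve relative paths against base_dir."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = Path(base_dir) / path
    return path.resolve()
```

For example, under this assumption `../readback-audio-db/audio` with a base of `/Users/me/readback` resolves to `/Users/me/readback-audio-db/audio`.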

**Flags:** `readback --model`, `--host`, `--port`, `--config`. Use `--host 0.0.0.0` for LAN access.

---

## Pi deployment

Generation stays on the Mac (CSM-1B + MLX need Apple Silicon). A Raspberry Pi runs the lightweight read-only server — library REST, Vue dashboard, and audio serving — so your reads are accessible from **any browser on the network**.

The Pi runs readback under [PiZoW](https://github.com/MKS-01/pizow) (PM2, survives reboots, ~68 MB).



  


_PiZoW Monitor: Readback online at 6 MB alongside the other Pi services._



```bash
# one-time setup
cp .env.example .env              # fill in PI_USER, PI_HOST, PI_PATH
bash scripts/deploy-pi.sh        # build dashboard → rsync → venv + pip → PM2
ssh PI_USER@PI_HOST "pm2 startup && pm2 save"   # survive reboots

# after each new read on Mac
bash scripts/sync-pi.sh          # incremental: only new WAVs since last sync
bash scripts/sync-pi.sh --full   # or full sync (cleans orphans on Pi)
```

The dashboard is then served over HTTP on port `8090` of the Pi.
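The difference between an incremental and a full sync can be expressed as two set operations over file names. The sketch below is a conceptual model only: it is not how `sync-pi.sh` is implemented, and the function name and file-name comparison are assumptions.

```python
def plan_sync(local_files, remote_files, full=False):
    """Return (to_upload, to_delete) as sorted lists of file names.

    Incremental: upload files the Pi does not have yet.
    Full: additionally delete orphans that exist only on the Pi.
    """
    local, remote = set(local_files), set(remote_files)
    to_upload = sorted(local - remote)
    to_delete = sorted(remote - local) if full else []
    return to_upload, to_delete
```

For instance, `plan_sync(["a.wav", "b.wav"], ["a.wav", "old.wav"], full=True)` uploads `b.wav` and removes `old.wav`, while the incremental variant leaves `old.wav` in place.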

---

## Design system

The Ghost palette, type scale, and every UI component — documented as live specimens you can browse locally.



  


_Ghost palette, tints, type scale, spacing, and motion tokens: the foundation every surface is built on._





  


_Component specimens (Badge, Button, SeekBar, WaveformPlayer, ReadCard…) and interactive UI Kits._



**7 type rungs** · **9 components** (Badge, Button, PromptLine, SearchInput, SeekBar, WaveformPlayer, ReadCard, Wordmark, SectionHeader) · **3 UI kits** (Terminal, Dashboard, Landing) — all interactive.

```bash
cd src/design-system && python3 -m http.server 8111
# open http://localhost:8111
```

Canonical tokens live in `src/design-system/tokens/` — the dashboard imports them via `@import`; the landing page inlines the same values (deployed standalone). The CLI mirrors the palette in `src/cli/src/theme.ts`.

---

## Documentation

| Doc | What's inside |
|---|---|
| [`docs/SETUP.md`](docs/SETUP.md) | Setup, flags, troubleshooting |
| [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | Pipeline, concurrency, WS protocol |
| [`docs/ROADMAP.md`](docs/ROADMAP.md) | What's planned and recently shipped |
| [`docs/JOURNEY.md`](docs/JOURNEY.md) | Devlog: built agent-first with Claude Code |
| [`src/cli/README.md`](src/cli/README.md) | Terminal client internals |
| [`src/dashboard/README.md`](src/dashboard/README.md) | Web dashboard (Vue 3) |
| [`src/finetune/README.md`](src/finetune/README.md) | LoRA voice fine-tuning |

---

## License

MIT — see [LICENSE](LICENSE).



_Built agent-first with Claude Code. Read the devlog in [`docs/JOURNEY.md`](docs/JOURNEY.md)._
