{"id":51197257,"url":"https://github.com/mks-01/readback","last_synced_at":"2026-06-27T22:00:24.181Z","repository":{"id":356823420,"uuid":"1230238178","full_name":"MKS-01/readback","owner":"MKS-01","description":"Terminal-first offline article reader — paste a URL, hear the whole article in a natural neural voice (CSM-1B on MLX). 100% on-device on Apple Silicon. No cloud, no API keys.","archived":false,"fork":false,"pushed_at":"2026-06-15T21:32:06.000Z","size":30890,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-15T23:16:13.341Z","etag":null,"topics":["apple-silicon","article-reader","bun","cli","csm","fastapi","ink","local-llm","macos","mlx","offline-first","ollama","terminal","text-to-speech","tts","voice-cloning"],"latest_commit_sha":null,"homepage":"https://mks-01.github.io/readback/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MKS-01.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-05T20:08:43.000Z","updated_at":"2026-06-15T21:32:10.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MKS-01/readback","commit_stats":null,"previous_names":["mks-01/local-tts","mks-01/readback"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MKS-01/readback","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MKS-01%2Freadback","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MKS-01%2Freadback/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MKS-01%2Freadback/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MKS-01%2Freadback/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MKS-01","download_url":"https://codeload.github.com/MKS-01/readback/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MKS-01%2Freadback/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34869004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-27T02:00:06.362Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","article-reader","bun","cli","csm","fastapi","ink","local-llm","macos","mlx","offline-first","ollama","terminal","text-to-speech","tts","voice-cloning"],"created_at":"2026-06-27T22:00:16.987Z","updated_at":"2026-06-27T22:00:24.152Z","avatar_url":"https://github.com/MKS-01.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/wordmark.png\" alt=\"readback\" width=\"487\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003ePaste a URL or snap a book — hear it read aloud by a neural voice, entirely on your Mac.\u003c/strong\u003e\u003cbr\u003e\n  No cloud. No API keys. Nothing leaves your machine.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Runs-100%25_offline-6366f1?style=for-the-badge\u0026logo=ghostery\u0026logoColor=white\" alt=\"Runs 100% offline\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Apple_Silicon-MLX_·_Metal-black?style=for-the-badge\u0026logo=apple\u0026logoColor=white\" alt=\"Apple Silicon\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Voice-CSM--1B-ec4899?style=for-the-badge\u0026logo=soundcharts\u0026logoColor=white\" alt=\"CSM-1B neural TTS\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/MKS-01/readback/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/MKS-01/readback/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/MIT-22c55e?style=flat\" alt=\"MIT License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Built_with-Claude_Code-D97757?style=flat\u0026logo=claude\u0026logoColor=white\" alt=\"Built with Claude Code\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://mks-01.github.io/readback/\"\u003eLanding page\u003c/a\u003e ·\n  \u003ca href=\"#getting-started\"\u003eGetting started\u003c/a\u003e ·\n  \u003ca href=\"#how-it-works\"\u003eHow it works\u003c/a\u003e ·\n  \u003ca href=\"#voices\"\u003eVoices\u003c/a\u003e ·\n  \u003ca href=\"#configuration\"\u003eConfig\u003c/a\u003e ·\n  \u003ca href=\"#pi-deployment\"\u003ePi deploy\u003c/a\u003e ·\n  \u003ca href=\"#design-system\"\u003eDesign system\u003c/a\u003e ·\n  \u003ca href=\"docs/ROADMAP.md\"\u003eRoadmap\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/cli-player.png\" alt=\"readback CLI — terminal player with the word-synced transcript highlight\" width=\"640\"\u003e\u003cbr\u003e\n  \u003csub\u003eThe terminal client mid-read: seekable player, live word-by-word transcript sync.\u003c/sub\u003e\n\u003c/p\u003e\n\n---\n\n## Getting started\n\n\u003e **Requires macOS on Apple Silicon (M1–M5).** The entire stack — summary LLM, vision OCR, and TTS — runs **in-process on MLX/Metal**. No external daemons, no network calls. Speed scales with GPU cores and unified memory.\n\n**First time? One command sets it all up:**\n\n```bash\ngit clone https://github.com/MKS-01/readback.git \u0026\u0026 cd readback\nbash scripts/setup.sh\n```\n\n`setup.sh` is idempotent — safe to re-run. It checks platform, creates `.venv`, installs readback + CLI + dashboard, and optionally pre-downloads the MLX summary model and CSM-1B weights (~6 GB).\n\nNeeds [Bun](https://bun.sh/) — the script tells you if it's missing. Then:\n\n```bash\nreadback-cli            # from anywhere; auto-starts the server\n```\n\nPaste a URL → audio plays in your shell.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003ePrefer to set it up by hand?\u003c/strong\u003e\u003c/summary\u003e\n\n```bash\n# 1. Install the server\ngit clone https://github.com/MKS-01/readback.git \u0026\u0026 cd readback\npython3.12 -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e .                        # csm-mlx + mlx-lm are git/PyPI deps, pulled automatically\n\n# 2. Build + install the terminal client → ~/.local/bin/readback-cli\ncd src/cli \u0026\u0026 ./install.sh \u0026\u0026 cd ..\n\n# 3. Read something\nreadback-cli                            # from anywhere; auto-starts the server\n```\n\u003c/details\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/cli-home.png\" alt=\"readback CLI — home screen\" width=\"640\"\u003e\n\u003c/p\u003e\n\nThe CLI auto-starts the server and kills it on exit. It's a full terminal player:\n\n- **space** pause, **←/→** seek ±5 s, **t** toggle transcript (word-by-word highlight synced to the voice)\n- `/voice`, `/model` (summary LLM, RAM-fit check), `/vision` (image/book OCR model), `/mode`, `/lib` (browse + **space** to preview inline, **enter** for full player), `/help`\n- `q` to quit (or any time the input field is empty)\n\nmacOS only (`afplay` playback). Details: [`src/cli/README.md`](src/cli/README.md).\n\nFirst read downloads CSM-1B weights (~6 GB) and the summary LLM (~5.5 GB) into the HuggingFace cache, then warms up the MLX graph — slow once, fast after. All three models (TTS, summary, OCR) run **in the same process** on Metal — no Ollama, no external daemon, no API keys. The vision OCR model (~5 GB) downloads lazily the first time you read an image or book scan. See [SETUP.md](docs/SETUP.md) for details.\n\n---\n\n## Library dashboard\n\nEvery read is saved to a local SQLite library. The dashboard lets you **replay any past read** — no LLM, no GPU, just the saved audio.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/dashboard.png\" alt=\"readback library dashboard — searchable list of past reads with an inline player and word-synced transcript\" width=\"720\"\u003e\u003cbr\u003e\n  \u003csub\u003eSearch, sort, and replay past reads — seekable player + word-by-word transcript highlight.\u003c/sub\u003e\n\u003c/p\u003e\n\n- **Search** title / summary / URL, **sort** newest↔oldest, **paginate** 20 at a time\n- **Full player** per card — click-to-seek, ±5 s skip, `space` + `←/→` keyboard shortcuts\n- **Synced transcript** — word-by-word highlight in blue, same as the CLI\n- **Delete** removes the DB row *and* its WAV\n\nA lightweight **Vue 3** SPA (pure REST client). Built `dist/` is served at `/` by the same `readback` process; `bun run dev` runs Vite on `:5173` for development. Details: [`src/dashboard/README.md`](src/dashboard/README.md).\n\nGeneration stays on the CLI (Mac GPU) — the dashboard only replays. This split also enables [Pi deployment](#pi-deployment): the Mac generates, a home Pi serves the library.\n\n---\n\n## How it works\n\n```mermaid\nflowchart LR\n    U[\"URL · image · book scan\"] --\u003e P\n\n    subgraph P[\"readback server · 100% on-device\"]\n        direction LR\n        E[\"extract\u003cbr/\u003etrafilatura · vision OCR\"] --\u003e L[\"summarize\u003cbr/\u003elocal LLM · optional\"] --\u003e T[\"synthesize\u003cbr/\u003eCSM-1B neural TTS\"]\n    end\n\n    T --\u003e DB[(\"readback-audio-db\u003cbr/\u003eWAV files + SQLite\")]\n    DB --\u003e CLI[\"CLI\u003cbr/\u003egenerate + play live\"]\n    DB --\u003e WEB[\"Dashboard\u003cbr/\u003esearch + replay anytime\"]\n```\n\n1. **Extract** — `trafilatura` pulls article text (browser-UA fallback for 403s). Images/book scans → mlx-vlm vision OCR (in-process). Folders/globs → multi-page: OCR'd in filename order and stitched into one document.\n2. **Summarize** *(optional)* — mlx-lm rewrites it as a spoken explanation (in-process). Full mode skips this entirely.\n3. **Synthesize** — sentence-aware chunks → CSM-1B (in-process) → silence-trimmed → fade-out → joined with small gaps. Re-reads skip the entire pipeline (cache hit by URL + mode + voice + model).\n4. **Serve** — WAV over HTTP; progress streams live over the WebSocket.\n\n**Source-aware tone** — a URL reads as a livelier article explainer; a book scan reads calmer, opening by naming the chapter. Automatic, nothing to set. Long scans **map-reduce** instead of truncating.\n\nSee [ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full system view.\n\n---\n\n## Tech stack\n\n| Layer | Technology |\n|---|---|\n| **Extraction** | [trafilatura](https://trafilatura.readthedocs.io/) — URL → clean text (+ browser-UA fallback); [mlx-vlm](https://github.com/Blaizzy/mlx-vlm) vision OCR for images / book scans (in-process, Metal) |\n| **Summary (optional)** | [mlx-lm](https://github.com/ml-explore/mlx-lm) — in-process on Metal; default `Qwen3.5-9B-4bit`, any downloaded MLX chat model works |\n| **TTS** | [CSM-1B](https://huggingface.co/senstella/csm-1b-mlx) (Sesame) via [csm-mlx](https://github.com/senstella/csm-mlx) — in-process, Metal, 24 kHz, bf16 |\n| **Voices** | 2 built-in reading voices + **clone any voice from a short clip** + optional **LoRA fine-tuning** |\n| **Server** | [FastAPI](https://fastapi.tiangolo.com/) + WebSocket — streams progress, serves the WAV, REST library |\n| **CLI client** | Bun + TypeScript + [Ink](https://github.com/vadimdemedes/ink) — terminal UI, `afplay` playback |\n| **Dashboard** | [Vue 3](https://vuejs.org/) + [Vite](https://vite.dev/) + TS — replay past reads (search/sort/player); stdlib SQLite library |\n\n---\n\n## Voices\n\nCSM conditions on a short reference clip — the clip's timbre and accent are what you hear.\n\n- **Built-in** — `conversational_a` (female ★) / `conversational_b` (male)\n- **Clone** — 5–8 s mono clip + exact transcript in `config.yaml`:\n\n  ```yaml\n  tts:\n    csm:\n      speaker: \"codeword\"\n      temperature: 0.7          # delivery: lower = composed, higher = livelier\n      voices:\n        - name: \"codeword\"\n          label: \"Codeword ★\"\n          wav: \"src/voice/voice_codeword.wav\"\n          speaker: 0\n          ref_text: \"Exact transcript of the clip.\"   # MUST match the audio\n  ```\n\n- **LoRA fine-tune** — for higher fidelity with more audio: [`src/finetune/`](src/finetune/README.md)\n\n---\n\n## Configuration\n\nEdit `config.yaml` (or pass `--config path`). The defaults work out of the box.\n\n| Key | What | Default |\n|---|---|---|\n| `llm.model` | MLX model for Summary mode (HuggingFace ID) | `mlx-community/Qwen3.5-9B-4bit` |\n| `ocr.model` | MLX vision model for image / book-scan OCR (its own section) | `mlx-community/Qwen2.5-VL-7B-Instruct-4bit` |\n| `tts.csm.speaker` | Active voice (`conversational_a`/`_b` or a clone `name`) | `codeword` |\n| `tts.csm.precision` | `bf16` (clean+fast) / `fp16` / `fp32` (slowest, cleanest) | `bf16` |\n| `tts.csm.temperature` | Delivery: lower = composed, higher = livelier | `0.7` |\n| `tts.csm.voices` | Clone voices (`name`, `label`, `wav`, `ref_text`, `speaker`) | sample `codeword` |\n| `tts.csm.lora_path` | LoRA adapter dir from a `csm-mlx finetune` run | `null` |\n| `reader.default_mode` | `full` (verbatim) or `summary` (LLM) | `full` |\n| `reader.output_dir` | Where generated WAVs are written/served (a `readback-audio-db/` folder beside the repo) | `../readback-audio-db/audio` |\n| `reader.gap_sec` | Silence inserted between synthesized chunks | `0.18` |\n| `reader.summary_max_chars` | Per-pass chunk size for Summary mode — longer inputs (book scans) are map-reduced across batches of this size, not truncated | `16000` |\n| `reader.library_db` | SQLite library of past reads (powers the dashboard) | `../readback-audio-db/library.db` |\n\nAudio + library DB default to a **`readback-audio-db/`** folder beside the repo. Point `output_dir` / `library_db` anywhere (absolute or `~` both work).\n\n**Flags:** `readback --model \u003cname\u003e`, `--host`, `--port`, `--config`. Use `--host 0.0.0.0` for LAN access.\n\n---\n\n## Pi deployment\n\nGeneration stays on the Mac (CSM-1B + MLX need Apple Silicon). A Raspberry Pi runs the lightweight read-only server — library REST, Vue dashboard, and audio serving — so your reads are accessible from **any browser on the network**.\n\nThe Pi runs readback under [PiZoW](https://github.com/MKS-01/pizow) (PM2, survives reboots, ~68 MB).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/home-server.png\" alt=\"PiZoW Monitor showing Readback running on a Raspberry Pi\" width=\"720\"\u003e\u003cbr\u003e\n  \u003csub\u003ePiZoW Monitor — Readback online at 6 MB alongside the other Pi services.\u003c/sub\u003e\n\u003c/p\u003e\n\n```bash\n# one-time setup\ncp .env.example .env              # fill in PI_USER, PI_HOST, PI_PATH\nbash scripts/deploy-pi.sh        # build dashboard → rsync → venv + pip → PM2\nssh PI_USER@PI_HOST \"pm2 startup \u0026\u0026 pm2 save\"   # survive reboots\n\n# after each new read on Mac\nbash scripts/sync-pi.sh          # incremental — only new WAVs since last sync\nbash scripts/sync-pi.sh --full   # or full sync (cleans orphans on Pi)\n```\n\nDashboard is live at `http://\u003cPI_HOST\u003e:8090`.\n\n---\n\n## Design system\n\nThe Ghost palette, type scale, and every UI component — documented as live specimens you can browse locally.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/design-system.png\" alt=\"readback design system — Ghost palette, tints, type scale\" width=\"720\"\u003e\u003cbr\u003e\n  \u003csub\u003eGhost palette, tints, type scale, spacing, and motion tokens — the foundation every surface is built on.\u003c/sub\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/media/design-system-components.png\" alt=\"readback design system — component specimens and UI kits\" width=\"720\"\u003e\u003cbr\u003e\n  \u003csub\u003eComponent specimens (Badge, Button, SeekBar, WaveformPlayer, ReadCard…) and interactive UI Kits.\u003c/sub\u003e\n\u003c/p\u003e\n\n**7 type rungs** · **9 components** (Badge, Button, PromptLine, SearchInput, SeekBar, WaveformPlayer, ReadCard, Wordmark, SectionHeader) · **3 UI kits** (Terminal, Dashboard, Landing) — all interactive.\n\n```bash\ncd src/design-system \u0026\u0026 python3 -m http.server 8111\n# open http://localhost:8111\n```\n\nCanonical tokens live in `src/design-system/tokens/` — the dashboard imports them via `@import`; the landing page inlines the same values (deployed standalone). The CLI mirrors the palette in `src/cli/src/theme.ts`.\n\n---\n\n## Documentation\n\n| Doc | What's inside |\n|---|---|\n| [`docs/SETUP.md`](docs/SETUP.md) | Setup, flags, troubleshooting |\n| [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | Pipeline, concurrency, WS protocol |\n| [`docs/ROADMAP.md`](docs/ROADMAP.md) | What's planned and recently shipped |\n| [`docs/JOURNEY.md`](docs/JOURNEY.md) | Devlog — built agent-first with Claude Code |\n| [`src/cli/README.md`](src/cli/README.md) | Terminal client internals |\n| [`src/dashboard/README.md`](src/dashboard/README.md) | Web dashboard (Vue 3) |\n| [`src/finetune/README.md`](src/finetune/README.md) | LoRA voice fine-tuning |\n\n---\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n\u003cp align=\"center\"\u003e\n  \u003csub\u003eBuilt agent-first with \u003ca href=\"https://claude.ai/code\"\u003eClaude Code\u003c/a\u003e — \u003ca href=\"docs/JOURNEY.md\"\u003eread the devlog →\u003c/a\u003e\u003c/sub\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmks-01%2Freadback","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmks-01%2Freadback","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmks-01%2Freadback/lists"}