https://github.com/alainbrown/catm
On-device long-form text-to-speech as a Chrome extension. Kokoro 82M TTS via ONNX Runtime Web (WebGPU/WASM). No server, no upload, no account.
chrome-extension kokoro local-first manifest-v3 onnxruntime-web text-to-speech tts webgpu
- Host: GitHub
- URL: https://github.com/alainbrown/catm
- Owner: alainbrown
- License: mit
- Created: 2026-05-25T03:56:11.000Z (9 days ago)
- Default Branch: main
- Last Pushed: 2026-05-29T21:44:37.000Z (4 days ago)
- Last Synced: 2026-05-29T23:14:38.759Z (4 days ago)
- Topics: chrome-extension, kokoro, local-first, manifest-v3, onnxruntime-web, text-to-speech, tts, webgpu
- Language: TypeScript
- Homepage: https://alainbrown.com/catm/
- Size: 35.2 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# catm — come and talk to me
[Tests](https://github.com/alainbrown/catm/actions/workflows/test.yml)
[License](./LICENSE)
[Chrome Web Store](https://chromewebstore.google.com/detail/imkogmgncjobcjeofjeekegihcikgbbm)
**Get it →** [alainbrown.com/catm](https://alainbrown.com/catm/)
A 100% in-browser long-form text-to-speech reader, shipped as a Chrome extension. Select any text on any page, right-click **Read aloud**, and the side panel reads it back to you. Synthesis runs locally on your machine — no server, no upload, no account.
The Kokoro 82M TTS model (~310 MB) is downloaded once into the extension's Cache Storage and run via ONNX Runtime Web (WebGPU when available, WASM fallback). After the first read, catm works fully offline.
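The download-once behaviour can be pictured as a cache-first lookup. The following is a minimal sketch, not catm's actual wrapper: `cacheFirst`, `CacheLike` and `ResponseLike` are hypothetical names, and the cache and fetch function are passed in so the snippet also runs outside a browser.

```typescript
// Structural stand-ins for the browser's Response and Cache objects
// (hypothetical names, used only for this sketch).
interface ResponseLike<R> {
  ok: boolean;
  clone(): R;
}

interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Cache-first: serve from the cache when possible, otherwise fetch once
// and keep a copy so later reads need no network.
async function cacheFirst<R extends ResponseLike<R>>(
  url: string,
  cache: CacheLike<R>,
  fetchFn: (url: string) => Promise<R>,
): Promise<R> {
  const hit = await cache.match(url);
  if (hit !== undefined) return hit;

  const response = await fetchFn(url);
  // Only successful responses are stored. A response body can be consumed
  // once, so the cache gets a clone and the caller gets the original.
  if (response.ok) await cache.put(url, response.clone());
  return response;
}
```

In a browser context the real objects should fit these shapes, for example `cacheFirst(url, await caches.open("catm-model-v1"), (u) => fetch(u))`.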
## Highlights
- **Local TTS.** Kokoro 82M via ONNX Runtime Web. Text never leaves the browser.
- **Progressive playback.** Streamed HLS — start listening within a few seconds of clicking Read; the rest synthesises and appends as you go.
- **Persistent library.** Sessions are saved as fragmented MP4 in OPFS. Open old reads instantly, no re-synth.
- **Right-click to listen.** A context-menu entry on any page drops the selection into the side panel and starts reading.
- **Offline-first.** Once the model is cached, the entire extension — including all reads — works with no network.
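Progressive playback depends on a playlist that grows while synthesis is still running. As a rough sketch of what such a playlist could look like, the helper below renders an HLS media playlist for fragmented MP4. `buildPlaylist` and `Segment` are hypothetical names, and the choice of an `EVENT` playlist with version 7 is an assumption rather than something the README states; only the `init.mp4` and `seg-N.m4s` file names come from the project.

```typescript
// One finished audio segment: its number and its duration in seconds.
interface Segment {
  index: number;
  duration: number;
}

// Render an HLS media playlist. While `finished` is false the playlist is
// left open so a player keeps polling for newly appended segments.
function buildPlaylist(segments: Segment[], finished: boolean): string {
  // TARGETDURATION must be an integer no smaller than any segment duration.
  const longest = Math.max(0, ...segments.map((s) => s.duration));
  const target = Math.max(1, Math.ceil(longest));

  const lines: string[] = [
    "#EXTM3U",
    "#EXT-X-VERSION:7",
    `#EXT-X-TARGETDURATION:${target}`,
    "#EXT-X-MEDIA-SEQUENCE:0",
    "#EXT-X-PLAYLIST-TYPE:EVENT", // assumption: segments are only ever appended
    '#EXT-X-MAP:URI="init.mp4"', // fMP4 initialization segment
  ];
  for (const s of segments) {
    lines.push(`#EXTINF:${s.duration.toFixed(3)},`, `seg-${s.index}.m4s`);
  }
  // ENDLIST tells the player that no more segments will arrive.
  if (finished) lines.push("#EXT-X-ENDLIST");
  return lines.join("\n") + "\n";
}
```

Rewriting the playlist after each new segment, and adding `#EXT-X-ENDLIST` once the last sentence is encoded, is one way to let a player start early and still treat the finished session as a complete recording.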
## Run locally
```bash
npm install
npm run dev # http://localhost:5173
```
Other scripts:
| Command | What it does |
| --- | --- |
| `npm run build` | Typecheck + production build of the extension into `extension/app/` |
| `npm run lint` | Biome check (lint + format diff) |
| `npm run format` | Biome autoformat |
| `npm run check:marketing` | Verify every relative href/src in `marketing/*.html` resolves |
| `npm test` | Full suite — Vitest (unit) + Playwright (e2e) |
The marketing site at `alainbrown.com/catm` has no build step — `deploy.yml` uploads `marketing/` directly to GitHub Pages.
## Browser requirements
- Chrome (or Chromium / Edge / Brave) — the extension uses Chrome's side panel API.
- The page must be **cross-origin isolated** (COOP/COEP) so the worker can use SharedArrayBuffer and threaded WASM. The extension manifest sets the headers directly.
- WebGPU is used when available; otherwise the worker falls back to multi-threaded WASM.
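The WebGPU-or-WASM choice above could be made with a small probe like the one below. This is a sketch under assumptions: `pickBackend` and `GpuLike` are hypothetical names, and catm's real detection logic may differ. It relies only on `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal shape of `navigator.gpu` used here, typed structurally so the
// sketch does not depend on WebGPU type definitions.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Prefer WebGPU only when an adapter can actually be obtained;
// anything else (no API, null adapter, thrown error) falls back to WASM.
async function pickBackend(nav: { gpu?: GpuLike }): Promise<Backend> {
  if (!nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```

In a page or worker this would be called as `pickBackend(navigator)`, and the result could then select the execution provider for the ONNX Runtime Web session.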
## Architecture in one paragraph
`src/App.tsx` owns the state and drives a Web Worker (`src/worker/kokoro.worker.ts`) that runs Kokoro. Text is streamed sentence-by-sentence through kokoro-js (with a phoneme-token-aware splitter that respects Kokoro's 510-token input cap), each sentence is synthesised to PCM, fed through a WebCodecs AAC encoder into fragmented MP4, and written to OPFS as live HLS (`init.mp4` + `seg-N.m4s` + a continually-updated `playlist.m3u8`). `hls.js` plays it back via a custom `opfs://` loader. Session metadata lives in IndexedDB; the audio bytes live in OPFS. A small `fetch` wrapper in the worker provides a cache-first route for the Kokoro model weights from huggingface.co into the `catm-model-v1` Cache Storage bucket.
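To illustrate the token-cap constraint, here is a simplified splitter that keeps every chunk within a token budget. It is a sketch, not the project's splitter: `splitForKokoro` and `TokenCounter` are hypothetical names, the sentence detection is deliberately naive, and the token counter is injected because real counts would come from Kokoro's phonemizer and tokenizer.

```typescript
const MAX_TOKENS = 510; // Kokoro's input cap, as stated above

// Hypothetical: returns how many model tokens a piece of text would use.
type TokenCounter = (text: string) => number;

function splitForKokoro(
  text: string,
  countTokens: TokenCounter,
  maxTokens: number = MAX_TOKENS,
): string[] {
  // Naive sentence split: a run of text followed by ., ! or ? (if any).
  const sentences = (text.match(/[^.!?]+[.!?]*\s*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  for (const sentence of sentences) {
    if (countTokens(sentence) <= maxTokens) {
      chunks.push(sentence);
      continue;
    }
    // Oversized sentence: greedily pack whole words into pieces under the cap.
    // A single word that exceeds the cap on its own is emitted unchanged.
    let piece = "";
    for (const word of sentence.split(/\s+/)) {
      const candidate = piece ? `${piece} ${word}` : word;
      if (piece && countTokens(candidate) > maxTokens) {
        chunks.push(piece);
        piece = word;
      } else {
        piece = candidate;
      }
    }
    if (piece) chunks.push(piece);
  }
  return chunks;
}
```

With a stand-in counter such as `(s) => s.length` and a small cap, short sentences pass through whole while long ones are broken at word boundaries, which is the property the synthesis loop needs.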
See [`CLAUDE.md`](./CLAUDE.md) for the deeper map (storage layers, worker concurrency invariant, extension ingest).
## Browser extension
`extension/` is a Chrome MV3 extension that hosts the full catm app inside Chrome's **side panel**, plus a right-click **"Read aloud"** entry that drops the current selection into the panel and opens it. The same bundle runs in a popped-out tab via the arrow icon in the panel's brand bar — both views share OPFS / IndexedDB / the cached model since they're the same `chrome-extension://` origin.
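The right-click flow might be wired up roughly as follows. This is an illustrative sketch, not the extension's source: the menu id, the storage key, the `ChromeLike` interface and the hand-off through `chrome.storage.session` are all assumptions. The `chrome.*` surface is passed in as a parameter so the snippet type-checks without extension typings.

```typescript
// The slice of the chrome.* API this sketch touches.
interface ClickInfo { menuItemId: string | number; selectionText?: string }
interface TabInfo { id?: number }
interface ChromeLike {
  contextMenus: {
    create(props: { id: string; title: string; contexts: string[] }): unknown;
    onClicked: { addListener(cb: (info: ClickInfo, tab?: TabInfo) => void): void };
  };
  sidePanel: { open(options: { tabId: number }): Promise<void> };
  storage: { session: { set(items: Record<string, unknown>): Promise<void> } };
}

const MENU_ID = "catm-read-aloud"; // hypothetical menu id
const PENDING_KEY = "pendingText"; // hypothetical storage key

function registerReadAloud(api: ChromeLike): void {
  // Show the entry only when the user has selected text. In a real service
  // worker this call usually sits inside a runtime.onInstalled listener.
  api.contextMenus.create({ id: MENU_ID, title: "Read aloud", contexts: ["selection"] });

  api.contextMenus.onClicked.addListener((info, tab) => {
    if (info.menuItemId !== MENU_ID || !info.selectionText) return;
    if (tab?.id === undefined) return;
    // Open the panel first, without awaiting anything beforehand:
    // sidePanel.open() must run in response to a user gesture.
    void api.sidePanel.open({ tabId: tab.id });
    // Hand the selection to the panel through session storage.
    void api.storage.session.set({ [PENDING_KEY]: info.selectionText });
  });
}
```

In the service worker the real `chrome` object would be passed in (possibly with a cast, depending on the type definitions in use), and the manifest would need the `contextMenus`, `sidePanel` and `storage` permissions.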
The extension is self-contained: it bundles the React app under `extension/app/` and uses no remote scripts. WebGPU works inside the panel — the worker disables ORT-Web's blob-based loaders (`numThreads = 1`, `proxy = false`, `wasmPaths = undefined`) so nothing trips MV3's CSP.
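The three settings named above live on ONNX Runtime Web's `env.wasm` object. A minimal sketch of applying them is shown here; `applyExtensionSafeFlags` and `WasmFlags` are hypothetical names, and the env object is taken as a parameter so the snippet runs without the library installed.

```typescript
// The subset of ONNX Runtime Web's `env.wasm` flags touched here.
interface WasmFlags {
  numThreads?: number;
  proxy?: boolean;
  wasmPaths?: string | Record<string, string> | undefined;
}

// Apply the values listed in the paragraph above.
function applyExtensionSafeFlags(env: { wasm: WasmFlags }): void {
  env.wasm.numThreads = 1; // single-threaded: no extra worker threads are spawned
  env.wasm.proxy = false; // do not move execution into a proxy worker
  env.wasm.wasmPaths = undefined; // no path override; use the default file lookup
}
```

In the worker this would be called once with the `env` export of `onnxruntime-web`, before any inference session is created.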
To build and load it:
```bash
npm run build
```
Then `chrome://extensions` → Developer mode → **Load unpacked** → pick the `extension/` directory.
## Privacy
Everything runs in your browser — no server, no upload, no account. Full policy in [PRIVACY.md](./PRIVACY.md).
## License
MIT.