https://github.com/elien666/diarize
On-device speaker diarization and transcription for macOS — CLI, SwiftUI app, and Swift library powered by FluidAudio and GRDB.
audio cli coreml fluidaudio grdb macos speaker-diarization swift swiftui transcription
- Host: GitHub
- URL: https://github.com/elien666/diarize
- Owner: elien666
- License: mit
- Created: 2026-05-18T07:31:29.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-06-15T13:37:46.000Z (18 days ago)
- Last Synced: 2026-06-15T14:16:33.805Z (18 days ago)
- Topics: audio, cli, coreml, fluidaudio, grdb, macos, speaker-diarization, swift, swiftui, transcription
- Language: Swift
- Size: 1.32 MB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# diarize
**On-device speaker diarization and transcription for macOS — CLI, SwiftUI app, and Swift library.**
`diarize` records audio (microphone, system audio, or both), splits it by speaker, transcribes each segment, and matches voices across recordings so the same person keeps the same identity over time. Everything runs locally on Apple Silicon — no cloud, no API keys.
Built on [FluidAudio](https://github.com/FluidInference/FluidAudio) (diarization + ASR via Core ML), [GRDB](https://github.com/groue/GRDB.swift) (SQLite with FTS5 full-text search), and Swift 6.
---
## Features
- **Record & transcribe in one step** — capture mic + system audio simultaneously (great for meetings), auto-transcribe on stop. → [docs](docs/recording.md)
- **Stereo channel separation** — when recording mic + system audio, each goes on its own channel (mic = left, system = right) and is diarized independently, so speaker echo never collapses everyone into one voice. → [docs](docs/recording.md#mic--system-audio-together-stereo-separation)
- **Auto Recording Mode** — detects when a call starts (another app grabs the mic) and records it hands-free, stopping and transcribing on its own. → [docs](docs/auto-recording.md)
- **Cross-recording speaker matching** — voice embeddings are stored once; the same person is recognized in every future recording. → [docs](docs/transcripts-and-speakers.md#how-speakers-are-recognized)
- **Manual speaker correction** — rename speakers globally, reassign or split segments, and merge duplicate identities when the diarizer guesses wrong. → [docs](docs/transcripts-and-speakers.md#correcting-speakers)
- **Synced playback** — play the audio and watch the transcript highlight and auto-scroll; click any timestamp to jump. → [docs](docs/transcripts-and-speakers.md#reading-a-transcript)
- **Live recording feedback** — per-device level meters, mic selection, and automatic recovery if the input device changes mid-recording. → [docs](docs/recording.md#live-level-meters)
- **Full-text search** — SQLite FTS5 across every transcript, with snippets and ranking. → [docs](docs/search.md)
- **Folders & organization** — group recordings into nested folders with drag-and-drop and inline rename. → [docs](docs/organizing.md)
- **Privacy-first** — fully on-device; delete raw audio while keeping transcripts (GDPR-friendly), with optional auto-clean of old audio and a menu-bar stealth mode. → [docs](docs/privacy.md)
- **MCP server for agents** — expose the library to local AI agents over [Model Context Protocol](https://modelcontextprotocol.io): read recordings/speakers, find unprocessed work, mark recordings processed, retry failed analyses, manage titles and folders, and assess + correct diarization quality (reassign mis-attributed segments, name/merge speakers, split turns) — all on-device. → [docs](docs/mcp.md)
- **Markdown + JSON output** — transcripts are written as readable Markdown and queryable JSON.
- **Local archive** — recordings, transcripts, and the speaker database live under `~/Library/Application Support/diarize/` (configurable).
- **Two front-ends + an agent interface** — a scriptable CLI (`diarize`) and a native SwiftUI app (`diarize-app`), plus an MCP server, all backed by the same `DiarizeCore` library.
📖 **New here?** Start with the [User Guide](docs/README.md).
## Requirements
- macOS 14 (Sonoma) or newer
- Apple Silicon (M1+) recommended — Core ML models run on the Neural Engine
- Swift 6 / Xcode 16
- Microphone permission (for `record`); Screen Recording permission (for system-audio capture)
## Install
```sh
git clone https://github.com/elien666/diarize.git
cd diarize
swift build -c release
cp .build/release/diarize /usr/local/bin/ # or anywhere on $PATH
```
To build the SwiftUI app:
```sh
./Scripts/build-app.sh
open build/Diarize.app
```
## CLI quick start
```sh
# Transcribe an existing audio file (mp3, wav, m4a, …)
diarize transcribe meeting.m4a --lang en --title "Q2 planning"

# Record mic + system audio, auto-transcribe on stop (Ctrl-C)
diarize record --title "1:1 with Sam"

# Search across every transcript
diarize search "roadmap"

# Manage the speaker library
diarize speakers list
diarize speakers label spk_a1b2c3 "Sam"
diarize speakers merge spk_a1b2c3 spk_d4e5f6

# Inspect or reprocess the archive
diarize archive list
diarize archive reprocess

# Show / change config
diarize config show
diarize config set default.language en

# Serve the library to local AI agents (Model Context Protocol)
diarize mcp
```
All commands accept `--help` for full options. Full command reference: [docs/cli.md](docs/cli.md).
## Documentation
User-facing guides live in [`docs/`](docs/README.md):
| Guide | What it covers |
| --- | --- |
| [Getting Started](docs/getting-started.md) | Install, permissions, first recording |
| [Recording](docs/recording.md) | Sources, mic selection, level meters, stereo separation |
| [Auto Recording Mode](docs/auto-recording.md) | Hands-free call capture |
| [Transcripts & Speakers](docs/transcripts-and-speakers.md) | Reading transcripts and correcting speakers |
| [Organizing Recordings](docs/organizing.md) | Folders, drag-and-drop, renaming |
| [Search](docs/search.md) | Full-text search across transcripts |
| [Privacy & Data](docs/privacy.md) | On-device processing, audio deletion, stealth mode |
| [Settings](docs/settings.md) | Language, matching threshold, archive, maintenance |
| [CLI Reference](docs/cli.md) | Every `diarize` command and option |
| [MCP Server](docs/mcp.md) | Expose the library to local AI agents (tools, setup, safety) |
## Configuration
Resolution order (highest wins): **CLI flag → env var → `~/.config/diarize/config.json` → default**.
| Key | Env var | Default |
| ---------------------- | -------------------------------- | -------------------------------------------------- |
| `archive.path` | `DIARIZE_ARCHIVE_PATH` | `~/Library/Application Support/diarize/archive` |
| `default.language` | `DIARIZE_LANG_DEFAULT` | `auto` (also: `de`, `en`) |
| `similarity.threshold` | `DIARIZE_SIMILARITY_THRESHOLD` | `0.6` (cosine similarity for speaker matching) |
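The resolution order above can be sketched in a few lines of Swift. This is an illustrative sketch only, not the project's actual code: the function `resolveSetting` and its parameters are hypothetical names, and it assumes the CLI flag and the config-file value have already been parsed into optional strings.

```swift
import Foundation

/// Hypothetical helper (not part of DiarizeCore): returns the first value
/// found in priority order CLI flag, env var, config file, default.
func resolveSetting(
    cliFlag: String?,
    envVar: String,
    fileValue: String?,
    defaultValue: String,
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> String {
    // 1. An explicit CLI flag always wins.
    if let cliFlag { return cliFlag }
    // 2. Then the environment variable, if set and non-empty.
    if let fromEnv = environment[envVar], !fromEnv.isEmpty { return fromEnv }
    // 3. Then the value read from the config file (assumed already parsed).
    if let fileValue { return fileValue }
    // 4. Otherwise fall back to the built-in default.
    return defaultValue
}

// The env var beats the config file because no CLI flag was given.
let language = resolveSetting(
    cliFlag: nil,
    envVar: "DIARIZE_LANG_DEFAULT",
    fileValue: "de",
    defaultValue: "auto",
    environment: ["DIARIZE_LANG_DEFAULT": "en"]
)
print(language) // prints "en"
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.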
## Project layout
```
Sources/
  DiarizeCore/        Library: audio I/O, diarization, ASR, storage, search
    Audio/            Recorder, mixer, loader, WAV writer
    Pipeline/         Diarization, transcription, speaker matching, calibration
    Storage/          GRDB models, migrations, speaker store
    Render/           Markdown + JSON renderers
    MCP/              Model Context Protocol server (tools, resources) for AI agents
  DiarizeCLI/         `diarize` executable (ArgumentParser)
  DiarizeApp/         `diarize-app` SwiftUI app (sidebar/folders, recording detail,
                      search, auto-recording mode, permissions, privacy cleanup, menu bar)
Resources/icon/       App icon (SVG + .icns)
Scripts/              Build helpers (app bundle, icon, code signing)
Tests/                DiarizeCore unit tests
```
## How it works
1. **Capture** — `AudioRecorder` taps the microphone via `AVAudioEngine` and system audio via a `ScreenCaptureKit` / CoreAudio process tap; `AudioMixer` writes a WAV. With both sources active it writes **stereo** (mic = left, system = right) so the two can be diarized in isolation; a single source is written mono.
2. **Diarize** — FluidAudio segments the waveform by speaker and emits an embedding per segment. For stereo recordings each channel is diarized independently and merged with `local` / `remote` prefixes, avoiding echo-induced speaker confusion.
3. **Match** — `SpeakerMatcher` compares each new embedding against the SQLite speaker library (cosine similarity ≥ threshold) and either reuses an existing speaker ID or mints a new one.
4. **Transcribe** — each segment is fed to FluidAudio's ASR model in the chosen language.
5. **Persist** — `SpeakerStore` writes recording, segments, and transcript text into SQLite (with FTS5); Markdown + JSON renderers produce human-readable artifacts under the archive.
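The matching step (step 3) can be illustrated with a minimal Swift sketch. It is not the real `SpeakerMatcher`: the `KnownSpeaker` type, the `matchSpeaker` function, and the `makeID` closure are hypothetical names, and the sketch assumes the speaker library is an in-memory array rather than the SQLite store. Only the idea comes from the description above: reuse the best match whose cosine similarity is at or above the threshold (default `0.6`), otherwise mint a new ID.

```swift
import Foundation

/// Cosine similarity of two embeddings; returns 0 for mismatched lengths,
/// empty input, or zero-length vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// Hypothetical stand-in for a row in the speaker library.
struct KnownSpeaker {
    let id: String
    let embedding: [Float]
}

/// Returns the ID of the most similar known speaker at or above `threshold`,
/// or a freshly minted ID from `makeID` when nobody is close enough.
func matchSpeaker(
    _ embedding: [Float],
    in library: [KnownSpeaker],
    threshold: Float = 0.6,
    makeID: () -> String
) -> String {
    var best: (id: String, score: Float)? = nil
    for speaker in library {
        let score = cosineSimilarity(embedding, speaker.embedding)
        if score >= threshold, score > (best?.score ?? -Float.infinity) {
            best = (id: speaker.id, score: score)
        }
    }
    return best?.id ?? makeID()
}

let library = [KnownSpeaker(id: "spk_a1b2c3", embedding: [1, 0, 0])]

// Nearly parallel to the stored embedding, so the existing ID is reused.
let reused = matchSpeaker([0.9, 0.1, 0], in: library) { "spk_new" }
// Orthogonal to the stored embedding (similarity 0), so a new ID is minted.
let minted = matchSpeaker([0, 1, 0], in: library) { "spk_new" }
print(reused, minted) // prints "spk_a1b2c3 spk_new"
```

A higher threshold makes the matcher more conservative (more new identities to merge by hand), while a lower one risks folding different people into one speaker.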
## License
MIT — see [LICENSE](LICENSE).
## Acknowledgements
- [FluidAudio](https://github.com/FluidInference/FluidAudio) — Core ML diarization and ASR
- [GRDB.swift](https://github.com/groue/GRDB.swift) — SQLite toolkit
- [swift-argument-parser](https://github.com/apple/swift-argument-parser) — CLI