https://github.com/pmarreck/speakrs_ffi
C FFI for speakrs speaker diarization — PCM samples in, JSON speaker turns out. Python/ctypes-ready, Nix-first packaging (zero build-time downloads).
- Host: GitHub
- URL: https://github.com/pmarreck/speakrs_ffi
- Owner: pmarreck
- Created: 2026-06-10T18:38:10.000Z (8 days ago)
- Default Branch: yolo
- Last Pushed: 2026-06-10T20:42:35.000Z (8 days ago)
- Last Synced: 2026-06-10T22:13:16.000Z (8 days ago)
- Language: Rust
- Size: 719 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# speakrs_ffi
[garnix](https://garnix.io/repo/pmarreck/speakrs_ffi)
C FFI for [speakrs](https://github.com/avencera/speakrs) — speaker diarization
(who spoke when) with pyannote-level accuracy at hundreds-of-× realtime, callable
from anything that speaks C: Python (`ctypes`), C, Zig, LuaJIT, Swift, …
```
consumer (Python / C CLI / …) ──► C FFI ──► speakrs (Rust: CoreML / ONNX Runtime)
```
Measured on an Apple Silicon Mac (CoreML mode): a 21.5-minute video diarized in
**5.8 seconds** (~220× realtime), 8 speakers, 207 turns.
## Design
**Pure in-memory transform: PCM samples in, JSON out.** The library does no
file I/O and no audio decoding — callers decode to mono 16 kHz f32 PCM first:
```sh
ffmpeg -i input.mp3 -f f32le -ac 1 -ar 16000 output.pcm
```
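As a rough illustration of the input format, the sketch below turns the raw bytes of such a `.pcm` file into f32 samples and derives the clip duration. The helper names (`load_f32le`, `duration_seconds`) are hypothetical and not part of speakrs_ffi; the only facts taken from this README are the format itself (mono, 16 kHz, 32-bit float, little-endian).

```python
import array
import sys

SAMPLE_RATE = 16_000   # Hz, the rate the library expects
BYTES_PER_SAMPLE = 4   # one 32-bit float per mono sample


def load_f32le(raw: bytes) -> array.array:
    """Interpret raw bytes as little-endian f32 samples (hypothetical helper)."""
    # Drop a trailing partial sample, if any, so frombytes() accepts the buffer.
    usable = len(raw) - (len(raw) % BYTES_PER_SAMPLE)
    samples = array.array("f")
    samples.frombytes(raw[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the file is little-endian regardless of host
    return samples


def duration_seconds(samples) -> float:
    """Clip length implied by the sample count at 16 kHz mono."""
    return len(samples) / SAMPLE_RATE
```

For example, `duration_seconds(load_f32le(open("output.pcm", "rb").read()))` gives the length of the clip that would be handed to the library.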
Every failure — including a Rust panic — returns as `{"ok":false,"error":"…"}`.
Panics never unwind across the FFI boundary.
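Because errors arrive in-band as JSON, a consumer has to inspect the `ok` field itself. A minimal sketch of that check follows; `DiarizeError` and `unwrap_result` are hypothetical names chosen for illustration, and only the `ok` and `error` keys come from this README.

```python
import json


class DiarizeError(RuntimeError):
    """Raised when the library reports {"ok": false, ...} (hypothetical type)."""


def unwrap_result(raw_json):
    """Parse the returned JSON and turn an in-band failure into an exception."""
    result = json.loads(raw_json)
    if not result.get("ok"):
        # "error" is the message field shown above; fall back if it is absent.
        raise DiarizeError(result.get("error", "unknown error"))
    return result
```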
## C API
```c
#include <stddef.h> /* size_t */
const char *speakrs_ffi_version(void); /* static; do not free */
char *speakrs_ffi_diarize(const float *samples, size_t n, /* mono 16 kHz f32 PCM */
const char *opts_json); /* NULL = defaults */
void speakrs_ffi_free(char *s);
```
Options JSON (all fields optional):
```json
{
"mode": "coreml", // cpu | coreml | coreml-fast | cuda | cuda-fast | migraphx
"models_dir": "/path" // omit → auto-download from HF on first use
}
```
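One way a Python caller might assemble this options string is sketched below. `build_opts` is a hypothetical helper, not part of the library; the mode names and the two keys are the ones listed above. Note that the `//` annotations in the listing are explanatory only, so the sketch emits plain JSON without them.

```python
import json

# Mode names as listed in the options block above.
KNOWN_MODES = {"cpu", "coreml", "coreml-fast", "cuda", "cuda-fast", "migraphx"}


def build_opts(mode=None, models_dir=None):
    """Return options as UTF-8 bytes for a c_char_p argument, or None for defaults."""
    opts = {}
    if mode is not None:
        if mode not in KNOWN_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        opts["mode"] = mode
    if models_dir is not None:
        opts["models_dir"] = str(models_dir)
    # None maps to a NULL pointer in ctypes, which the API treats as "defaults".
    return json.dumps(opts).encode("utf-8") if opts else None
```

The return value can be passed directly as the `opts_json` argument when the function's `argtypes` declare it as `ctypes.c_char_p`.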
Default mode is `coreml` on macOS, `cpu` elsewhere. Result:
```json
{"ok": true,
"segments": [{"start": 0.14, "end": 0.99, "speaker": "SPEAKER_05"}, …],
"speakers": ["SPEAKER_00", …]}
```
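To show how the result shape can be consumed, here is a small sketch that totals speaking time per speaker. `talk_time` is a hypothetical helper; it relies only on the `segments`, `start`, `end`, and `speaker` fields shown above, and assumes `start` and `end` are in seconds.

```python
from collections import defaultdict


def talk_time(result):
    """Sum segment durations per speaker label (assumes times are in seconds)."""
    totals = defaultdict(float)
    for seg in result["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)
```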
## CLI
`speakrs-diarize` is a C program that consumes the FFI exactly like any
external consumer (dogfooding the header and linkage):
```sh
ffmpeg -i talk.mp3 -f f32le -ac 1 -ar 16000 - | speakrs-diarize -
speakrs-diarize --mode coreml-fast --models-dir ~/models talk.pcm
```
JSON to stdout, progress to stderr, exit 0/1.
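A script can lean on that contract when driving the CLI. The sketch below is one possible wrapper, with `run_diarize_cli` as a hypothetical name; it assumes only what is stated above (JSON on stdout, progress on stderr, exit code 0 or 1) and makes no assumption about what stdout contains on failure beyond echoing it in the exception.

```python
import json
import subprocess


def run_diarize_cli(argv, pcm=None):
    """Run the CLI, feeding optional PCM bytes on stdin, and parse stdout as JSON."""
    # stderr is left attached to the terminal so progress output stays visible.
    proc = subprocess.run(argv, input=pcm, stdout=subprocess.PIPE, check=False)
    if proc.returncode != 0:
        detail = proc.stdout.decode("utf-8", errors="replace").strip()
        raise RuntimeError(f"diarization failed (exit {proc.returncode}): {detail}")
    return json.loads(proc.stdout)
```

For instance, `run_diarize_cli(["speakrs-diarize", "-"], pcm=raw_bytes)` mirrors the piped invocation shown above.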
## Python (ctypes)
```python
import ctypes, json, subprocess

lib = ctypes.CDLL("libspeakrs_ffi.dylib")  # macOS library name
lib.speakrs_ffi_diarize.restype = ctypes.c_void_p  # keep pointer for free()
lib.speakrs_ffi_diarize.argtypes = [ctypes.POINTER(ctypes.c_float), ctypes.c_size_t, ctypes.c_char_p]
lib.speakrs_ffi_free.restype = None
lib.speakrs_ffi_free.argtypes = [ctypes.c_void_p]

# Decode to mono 16 kHz f32 PCM on stdout.
pcm = subprocess.run(["ffmpeg", "-v", "error", "-i", "in.mp3",
                      "-f", "f32le", "-ac", "1", "-ar", "16000", "-"],
                     capture_output=True, check=True).stdout
buf = (ctypes.c_float * (len(pcm) // 4)).from_buffer_copy(pcm)  # 4 bytes per sample
ptr = lib.speakrs_ffi_diarize(buf, len(buf), None)  # None = default options
try:
    result = json.loads(ctypes.cast(ptr, ctypes.c_char_p).value)
finally:
    lib.speakrs_ffi_free(ptr)  # always release the library-owned string
```
## Building (Nix)
```sh
./build # nix build → result/{lib,include,bin}
./test # cargo tests + CLI tests (offline) + functional test (real models)
```
**Nothing downloads during the build** — that's the point of this packaging:
| upstream default | what it does | what we use instead |
|---|---|---|
| `default-linalg` | fetches Intel MKL / static OpenBLAS at build time | `openblas-system` → nixpkgs openblas via pkg-config |
| ort prebuilt binaries | downloads ONNX Runtime during the build | `load-dynamic` → dlopen at runtime via `ORT_DYLIB_PATH` |
Models (`avencera/speakrs-models`, no HF token needed) download at **runtime**
on first use, or load offline from `models_dir`. In CoreML mode, ONNX Runtime
is never loaded at all; for `cpu`/`cuda` modes set `ORT_DYLIB_PATH` to a
`libonnxruntime` (the Nix-built CLI has a default wired in; the flake exposes
it as `packages.<system>.default.passthru.ortLib`).
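A consumer loading the library directly might want to fail early when ONNX Runtime will be needed but is not configured. The sketch below is one way to do that; `check_ort_env` is a hypothetical helper, and treating `coreml-fast` the same as `coreml` is an assumption, since the text above only speaks of "CoreML mode".

```python
import os


def check_ort_env(mode, environ=os.environ):
    """Return the ORT_DYLIB_PATH to use, or None when the mode should not need it."""
    # Assumption: every mode whose name starts with "coreml" skips ONNX Runtime.
    if mode.startswith("coreml"):
        return None
    path = environ.get("ORT_DYLIB_PATH")
    if not path:
        raise RuntimeError(f"mode {mode!r} needs ORT_DYLIB_PATH to be set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"ORT_DYLIB_PATH does not point to a file: {path}")
    return path
```

This check is unnecessary for the Nix-built CLI, which already has a default wired in.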
CI runs real diarization hermetically: `checks.functional-test` pins the
cpu-mode model files as fixed-output derivations and diarizes a committed
two-speaker fixture (A-B-A pattern; see `tests/fixtures/README.md`) inside
the pure sandbox, asserting exactly two speakers and correct re-identification.
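The kind of assertion described here can be sketched against the result JSON as follows. This is an illustration only, not the repository's actual test; `is_aba` is a hypothetical helper, and "correct re-identification" is interpreted as the first and last turns carrying the same speaker label.

```python
def is_aba(result):
    """True if the result has exactly two speakers whose turns follow A-B-A."""
    if len(set(result["speakers"])) != 2:
        return False
    # Collapse consecutive segments from the same speaker into single turns.
    order = []
    for seg in sorted(result["segments"], key=lambda s: s["start"]):
        if not order or order[-1] != seg["speaker"]:
            order.append(seg["speaker"])
    # Re-identification: the speaker of the third turn matches the first.
    return len(order) == 3 and order[0] == order[2] and order[0] != order[1]
```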
## As a flake input
```nix
inputs.speakrs-ffi.url = "github:pmarreck/speakrs_ffi";
# then: speakrs-ffi.packages.${system}.default → lib/, include/, bin/
```
## License
Apache-2.0, same as speakrs.