# mlx-spectro

High-performance STFT/iSTFT for [Apple MLX](https://github.com/ml-explore/mlx) via fused Metal kernels: **2–3x faster STFT** and **5–8x faster iSTFT** than `torch.stft`/`torch.istft` on MPS.

```python
import mlx.core as mx
from mlx_spectro import SpectralTransform

T = 44100
audio = mx.random.normal((1, T))  # [B, T] test signal

transform = SpectralTransform(n_fft=2048, hop_length=512, window_fn="hann")

spec = transform.stft(audio)                     # [B, T] → complex spectrogram
reconstructed = transform.istft(spec, length=T)  # complex spectrogram → [B, T]
```

```python
import mlx.core as mx
from mlx_spectro import MelSpectrogramTransform

audio = mx.random.normal((1, 24000))  # [B, T]: 1 s of audio at 24 kHz

mel = MelSpectrogramTransform(
    sample_rate=24000,
    n_fft=2048,
    hop_length=240,
    n_mels=128,
    top_db=80.0,
    mode="torchaudio_compat",
)
mel_db = mel(audio)  # [B, n_mels, frames]
```

[mlx-audio-separator](https://github.com/ssmall256/mlx-audio-separator) uses mlx-spectro for MLX-native stem separation (Roformer, MDX, Demucs). End to end, it runs **1.8–3.1x faster** than python-audio-separator on torch+MPS. See the [benchmarks](#real-world-mlx-audio-separator) below.

## Install

```bash
pip install mlx-spectro
```

To include the optional torch fallback, quote the extra so that zsh (the default macOS shell) does not try to expand the brackets:

```bash
pip install "mlx-spectro[torch]"
```

## Features

- Fused overlap-add with autotuned Metal kernels
- PyTorch-compatible STFT/iSTFT semantics
- Cached transforms to minimize overhead on repeated calls
- Differentiable transforms for training with `mx.grad`
- `mx.compile`-friendly for tight inference loops
- Optional torch fallback for strict numerical parity

## Quick Start

```python
import mlx.core as mx
from mlx_spectro import SpectralTransform

transform = SpectralTransform(
    n_fft=2048,
    hop_length=512,
    window_fn="hann",
)

audio = mx.random.normal((1, 44100))
spec = transform.stft(audio, output_layout="bnf")
reconstructed = transform.istft(spec, length=44100, input_layout="bnf")
```

## API

### `SpectralTransform`

Main class for STFT/iSTFT operations.

```python
SpectralTransform(
    n_fft: int,
    hop_length: int,
    win_length: int | None = None,
    window_fn: str = "hann",       # "hann", "hamming", "rect"
    window: mx.array | None = None,  # custom window array
    periodic: bool = True,
    center: bool = True,
    normalized: bool = False,
    istft_backend_policy: str | None = None,  # "auto", "mlx_fft", "metal", "torch_fallback"
)
```

**Methods:**
- `stft(x, output_layout="bfn")`: Forward STFT. The input is `[T]` or `[B, T]`. With `"bfn"` the output is `[B, F, N]` (batch, frequency bins, frames); with `"bnf"` it is `[B, N, F]`.
- `istft(z, length=None, ...)`: Inverse STFT. Returns `[B, T]`.
- `compiled_pair(length, layout="bnf", warmup_batch=None)`: Returns a compiled `(stft_fn, istft_fn)` pair for steady-state loops with fixed shapes (10–20% faster).
- `warmup(batch=1, length=4096)`: Forces kernel compilation up front.

### `MelSpectrogramTransform`

Mel frontend built on `SpectralTransform`.

```python
MelSpectrogramTransform(
    sample_rate: int = 24000,
    n_fft: int = 2048,
    hop_length: int = 240,
    win_length: int | None = None,
    n_mels: int = 128,
    f_min: float = 0.0,
    f_max: float | None = None,
    power: float = 2.0,
    norm: str | None = None,      # None or "slaney"
    mel_scale: str = "htk",       # "htk" or "slaney"
    top_db: float | None = 80.0,
    mode: str = "mlx_native",     # "mlx_native" or "torchaudio_compat"; "default" alias -> "mlx_native"
)
```

**Methods:**
- `spectrogram(x)`: Returns the power spectrogram `[B, F, N]`.
- `mel_spectrogram(x, to_db=True)` / `__call__(x, to_db=True)`: Returns the mel spectrogram `[B, n_mels, N]`, in dB when `to_db=True`.

**Mode semantics:**
- `mode="mlx_native"`: `top_db` clipping is applied per example, so each item's output does not depend on the rest of the batch.
- `mode="torchaudio_compat"`: reproduces torchaudio's packed-batch clipping, where the `top_db` floor is taken over the packed batch instead of per example. Use it in parity-sensitive pipelines.

### `get_transform_mlx(**kwargs)`

Factory that returns cached `SpectralTransform` instances for repeated use.

### `make_window(window, window_fn, win_length, n_fft, periodic)`

Creates or validates a 1D analysis window.

### `resolve_fft_params(n_fft, hop_length, win_length, pad)`

Resolves the effective FFT parameters using PyTorch-compatible defaults.

## Benchmarks

Apple M4 Max, macOS 26.3, MLX 0.30.6, PyTorch 2.10.0, 20 iterations (5 warmup). Each speedup column is the baseline's time divided by mlx-spectro's time.

### STFT Forward

| Config | mlx-spectro | torch MPS | mlx-stft | vs torch | vs mlx-stft |
|---|---|---|---|---|---|
| B=1 T=16k nfft=512 | 0.16 ms | 0.21 ms | 0.31 ms | 1.4x | 1.9x |
| B=4 T=160k nfft=1024 | 0.37 ms | 1.00 ms | 1.09 ms | **2.7x** | **3.0x** |
| B=8 T=160k nfft=1024 | 0.28 ms | 0.71 ms | 1.53 ms | **2.5x** | **5.6x** |
| B=4 T=1.3M nfft=1024 | 0.77 ms | 2.18 ms | 5.03 ms | **2.8x** | **6.5x** |
| B=8 T=480k nfft=1024 | 0.58 ms | 1.30 ms | 3.73 ms | **2.2x** | **6.4x** |

### iSTFT Forward

| Config | mlx-spectro | torch MPS | mlx-stft | vs torch | vs mlx-stft |
|---|---|---|---|---|---|
| B=1 T=16k nfft=512 | 0.17 ms | 0.49 ms | 0.25 ms | 3.0x | 1.5x |
| B=4 T=160k nfft=1024 | 0.21 ms | 1.00 ms | 0.98 ms | **4.7x** | **4.7x** |
| B=8 T=160k nfft=1024 | 0.30 ms | 1.61 ms | 1.62 ms | **5.4x** | **5.4x** |
| B=4 T=1.3M nfft=1024 | 0.81 ms | 5.76 ms | 6.68 ms | **7.1x** | **8.2x** |
| B=8 T=480k nfft=1024 | 0.60 ms | 4.10 ms | 4.55 ms | **6.8x** | **7.6x** |

### Roundtrip (STFT → iSTFT) Forward + Backward

| Config | mlx-spectro | torch MPS | vs torch |
|---|---|---|---|
| B=4 T=160k nfft=1024 | 0.62 ms | 2.25 ms | **3.6x** |
| B=8 T=160k nfft=1024 | 1.04 ms | 4.38 ms | **4.2x** |
| B=4 T=480k nfft=1024 | 1.59 ms | 6.59 ms | **4.1x** |
| B=4 T=1.3M nfft=1024 | 4.33 ms | 17.63 ms | **4.1x** |
| B=1 T=1.3M nfft=1024 | 1.21 ms | 4.20 ms | **3.5x** |

### Roundtrip Accuracy (STFT → iSTFT max abs error)

| Config | mlx-spectro | torch MPS |
|---|---|---|
| B=1 T=16k nfft=512 | 1.67e-06 | 2.38e-06 |
| B=4 T=160k nfft=2048 | 2.86e-06 | 5.25e-06 |
| B=8 T=480k nfft=1024 | 3.81e-06 | 4.77e-06 |

To reproduce:
- Full suite: `python scripts/benchmark.py`
- Dispatch overhead profile: `python scripts/benchmark.py --dispatch-profile`

### Real-world: mlx-audio-separator

[mlx-audio-separator](https://github.com/ssmall256/mlx-audio-separator) is an MLX-native music stem separation library supporting Roformer, MDX, Demucs, and more.
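
Most spectrogram-domain separators follow the same basic pattern. They STFT the mixture, multiply it by an estimated mask, and iSTFT the result back to audio. This is presumably the stage where mlx-spectro's transforms are used. This README does not describe mlx-audio-separator's internals, so the NumPy code below is only a generic sketch, and it rests on a few assumptions. A hand-written reference STFT/iSTFT (periodic Hann window, hop of `n_fft // 4`, centered reflect padding) takes the place of `SpectralTransform`. The hypothetical `lowpass_mask` takes the place of a model's predicted mask.

```python
import numpy as np


def hann_periodic(n_fft):
    # Periodic Hann window: 0.5 - 0.5 * cos(2 * pi * n / N) for n = 0 .. N - 1
    n = np.arange(n_fft)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / n_fft)


def stft(x, n_fft, hop, window):
    # Centered framing: reflect-pad n_fft // 2 samples on each side of x
    pad = n_fft // 2
    xp = np.pad(x, pad, mode="reflect")
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape [n_frames, n_fft // 2 + 1]


def istft(z, n_fft, hop, window, length):
    # Windowed overlap-add, normalized by the summed squared window
    frames = np.fft.irfft(z, n=n_fft, axis=-1) * window
    total = n_fft + hop * (frames.shape[0] - 1)
    y = np.zeros(total)
    wsum = np.zeros(total)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        wsum[i * hop:i * hop + n_fft] += window ** 2
    y /= np.where(wsum > 1e-8, wsum, 1.0)  # guard against near-zero window sums
    pad = n_fft // 2
    return y[pad:pad + length]  # drop the centering pad and trim to length


def lowpass_mask(shape, cutoff_bin=8):
    # Hypothetical stand-in for a model's mask: keep only bins below cutoff_bin
    mask = np.zeros(shape)
    mask[..., :cutoff_bin] = 1.0
    return mask


def separate(mix, mask_fn, n_fft=64, hop=16):
    # mask_fn(shape) returns a real mask in [0, 1] with the spectrogram's shape
    window = hann_periodic(n_fft)
    z = stft(mix, n_fft, hop, window)
    m = mask_fn(z.shape)
    stem = istft(z * m, n_fft, hop, window, len(mix))
    residual = istft(z * (1.0 - m), n_fft, hop, window, len(mix))
    return stem, residual


rng = np.random.default_rng(0)
mix = rng.standard_normal(1024)  # stand-in for one channel of audio
stem, residual = separate(mix, lowpass_mask)
print(np.allclose(stem + residual, mix))  # True
```

The iSTFT is linear, so a mask `m` and its complement `1 - m` produce stems that add back up to the mixture. That gives a cheap sanity check for any STFT/iSTFT backend dropped into a pipeline like this one.
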
End-to-end separation speedup vs python-audio-separator (torch on MPS), measured on 30 s stereo 44.1 kHz tracks (Apple M4 Max, PyTorch 2.10.0, MLX 0.30.6; ABBA ordering, 2 repeats):

| Model | Arch | torch+MPS (s) | MLX (s) | E2E speedup |
|---|---|--:|--:|--:|
| UVR-MDX-NET-Inst_HQ_3 | MDX | 4.25 | 1.36 | **3.1x** |
| htdemucs | Demucs | 3.35 | 1.29 | **2.6x** |
| Mel-Roformer Karaoke | MDXC | 5.60 | 2.66 | **2.1x** |
| BS-Roformer | MDXC | 6.48 | 3.56 | **1.8x** |

Inside these pipelines, the STFT/iSTFT kernels alone run 2–3x (STFT) and 5–8x (iSTFT) faster than torch.

### Compiled Mode

In tight inference loops with fixed input shapes, `compiled_pair` removes the per-call Python dispatch overhead. This makes small workloads 10–20% faster:

```python
import mlx.core as mx
from mlx_spectro import SpectralTransform

t = SpectralTransform(n_fft=1024, hop_length=256, window_fn="hann")
stft, istft = t.compiled_pair(length=44100, warmup_batch=2)

audio_stream = [mx.random.normal((2, 44100)) for _ in range(4)]  # placeholder fixed-shape chunks
def process(z):
    return z  # placeholder for your own spectral processing

for chunk in audio_stream:
    z = stft(chunk)
    z = process(z)
    y = istft(z)
    mx.eval(y)  # MLX is lazy; force evaluation each iteration
```

When input shapes vary, use the eager `t.stft()` / `t.istft()` methods.

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SPEC_MLX_AUTOTUNE` | `1` | Enable Metal kernel autotuning |
| `SPEC_MLX_TGX` | unset | Force threadgroup size (e.g. `256` or `kernel:256`) |
| `SPEC_MLX_AUTOTUNE_PERSIST` | `1` | Persist autotune results to disk |
| `SPEC_MLX_AUTOTUNE_CACHE_PATH` | unset | Override autotune cache file path |
| `MLX_OLA_FUSE_NORM` | `1` | Enable fused overlap-add (OLA) + normalization kernel |
| `SPEC_MLX_CACHE_STATS` | `0` | Enable cache debug counters |

## License

MIT