{"id":29246550,"url":"https://github.com/vrtnis/whisper-mel-mojo","last_synced_at":"2026-05-18T03:35:52.574Z","repository":{"id":301959071,"uuid":"1010345538","full_name":"vrtnis/whisper-mel-mojo","owner":"vrtnis","description":"80‑bin log‑Mel front‑end in Mojo; drop‑in for Whisper or any MAX graph","archived":false,"fork":false,"pushed_at":"2025-06-29T22:02:05.000Z","size":292,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-11T02:50:41.479Z","etag":null,"topics":["asr","mel","mojo","spectrogram","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vrtnis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-28T21:50:04.000Z","updated_at":"2025-06-29T22:02:09.000Z","dependencies_parsed_at":"2025-06-29T20:45:58.817Z","dependency_job_id":null,"html_url":"https://github.com/vrtnis/whisper-mel-mojo","commit_stats":null,"previous_names":["vrtnis/whisper-mel-mojo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vrtnis/whisper-mel-mojo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vrtnis%2Fwhisper-mel-mojo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vrtnis%2Fwhisper-mel-mojo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vrtnis%2Fwhisper-mel-mojo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vrtnis%2Fwhisper-mel-mojo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vrtnis","download_url":"https://codeload.github.com/vrtnis/whisper-mel-mojo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vrtnis%2Fwhisper-mel-mojo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33163781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"online","status_checked_at":"2026-05-18T02:00:06.436Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","mel","mojo","spectrogram","whisper"],"created_at":"2025-07-03T23:02:11.131Z","updated_at":"2026-05-18T03:35:52.543Z","avatar_url":"https://github.com/vrtnis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Whisper‑Mel‑Mojo\n\nA fast, portable **Mojo** kernel that fuses **log‑Mel spectrogram** extraction and a **3 × 3 average convolution**, ready to plug into **MAX Graph** or PyTorch as a custom op.\n\n---\n\n## Mel Spectrogram and Whisper Front‑End\n\nA **Mel spectrogram** is a time–frequency representation of audio where:\n\n- The **frequency axis is warped to the Mel scale**, matching human pitch perception by spacing the Mel filters densely at low frequencies and sparsely at high frequencies.  
\n- **Amplitudes are mapped to decibels (log scale)**, reflecting the logarithmic way humans perceive loudness and compressing the very wide dynamic range of raw power values.  \n- A compact 80‑bin log‑Mel frame is the *de facto* input feature for modern speech models, including Whisper and many Hugging Face audio checkpoints.\n\n---\n\n### Whisper\n\n[OpenAI Whisper](https://openai.com/research/whisper) is an encoder–decoder Transformer trained on 680,000 hours of multilingual speech. Its front‑end expects:\n\n| Requirement | Value |\n|-------------|-------|\n| Audio sample rate | **16 kHz** |\n| Spectrogram channels | **80 log‑Mel bins** |\n| FFT window / hop | **25 ms / 10 ms** (400 / 160 samples at 16 kHz) |\n| Chunk length | **30 s** |\n\nThis project re‑implements the Whisper log‑Mel front‑end, together with a 3 × 3 smoothing convolution, in a single, hardware‑agnostic Mojo kernel, allowing the entire pipeline to stay on‑device with zero host↔device copies.\n\n![Overall Flow](readme_image.png \"Process Flow\")\n\n## 🔥 Features\n\n- **Pure Mojo, one file** – the same source compiles for CPU and NVIDIA CUDA, with Apple Metal and AMD ROCm to follow via MAX’s MLIR back‑end.  \n\n- **Drop‑in MAX Graph \u0026 PyTorch op** – paste the kernel into `ops.custom` or expose it through `torch.ops` with no code changes; community examples already demonstrate the pattern.  \n\n- **Zero‑copy execution** – audio and feature buffers remain in unified GPU memory, avoiding redundant PCIe traffic and reducing peak host RAM.\n\n---\n\n## 🛠 Build \u0026 Run\n\n```bash\n# 1. Build the shared library\nmojo build mel_pipeline_gpu.mojo --emit shared-lib -o libmel.so\n\n# 2. Run the Python driver (benchmarks + sanity check)\npython pipeline.py\n```\n\n## Example Output\n\nRunning `pipeline.py` directly compares our Mojo-based front-end with the standard Librosa + torchaudio workflow:\n\n```text\n=== Whisper Front-End Comparison ===\nMojo path   : host↔device copies = 0, peak GPU memory = 412 MB\nLibrosa/PT  : copies = 2, peak host memory = 546 MB\n====================================\n```\n\n### Impact\n\nThis project implements a Mojo kernel that consolidates PCM→log-Mel spectrogram extraction and a 3 × 3 average convolution into a single `DeviceContext.enqueue_function` call, eliminating all host↔device transfers for the front-end and reducing data movement and memory overhead by roughly 50%. In the sample output above, copies drop from 2 to 0 and peak memory from 546 MB to 412 MB (about 25% lower, although the two figures are measured in GPU and host memory respectively). Keeping the front-end on the device also confines raw audio (potentially sensitive PII) to GPU memory, easing compliance for privacy-critical workloads.\n\n### Mojo Accelerants\n\n- `DeviceContext.enqueue_function`: one API to compile and launch a GPU kernel, with no separate CUDA boilerplate.\n- `@compiler.register` + `ops.custom`: the same Mojo code can become a MAX custom op with about ten lines of Python glue.\n- `InlineArray` and `UnsafePointer` abstractions let you pass host or device buffers with minimal fuss.\n- Cross-arch portability: the same Mojo source runs on NVIDIA, Apple Metal (once supported), or CPU fallbacks, with no `#ifdef`s.\n\n### Some Considerations\n\nRecent changes to Mojo, such as the removal of the `let` keyword, updates to pointer syntax, and the relocation of FFT helpers out of the standard library, initially caused build failures. Additionally, the absence of a built-in forward FFT made it necessary to implement a naïve DFT loop, which is significantly slower. 
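
For a sense of why the naïve loop is slow, here is a minimal NumPy sketch (not the project's Mojo code) of a direct DFT power spectrum for one Hann-windowed 400-sample frame, checked against `numpy.fft.rfft`. The 400-sample frame and periodic Hann window are assumptions taken from Whisper's reference settings (25 ms at 16 kHz), and the helper names `naive_dft_power` and `fft_power` are hypothetical. Each frame costs 400 × 201 = 80,400 complex multiply-adds in the direct loop, versus O(N log N) work for an FFT.

```python
import numpy as np

# Assumed Whisper-style framing at 16 kHz: 25 ms window = 400 samples.
N_FFT = 400


def naive_dft_power(frame):
    # Direct O(N^2) DFT: each of the N // 2 + 1 non-negative-frequency bins
    # is a full dot product of the frame with a complex exponential.
    n = len(frame)
    idx = np.arange(n)
    power = np.empty(n // 2 + 1)
    for k in range(n // 2 + 1):
        basis = np.exp(-2j * np.pi * k * idx / n)
        power[k] = np.abs(np.dot(frame, basis)) ** 2
    return power


def fft_power(frame):
    # The same power spectrum via NumPy's O(N log N) real FFT.
    return np.abs(np.fft.rfft(frame)) ** 2


# Periodic Hann window (the torch.hann_window default).
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)

# One synthetic frame of noise, windowed before the transform.
rng = np.random.default_rng(0)
frame = rng.standard_normal(N_FFT) * window

naive = naive_dft_power(frame)
fast = fft_power(frame)
print(naive.shape, np.allclose(naive, fast))
```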
Identifying the correct buffer-access method (`InlineArray.unsafe_ptr()` rather than `unsafe_pointer()`) required considerable troubleshooting. Obtaining accurate GPU-memory metrics via `pynvml` also required deliberate warm-up runs and explicit synchronization across multiple kernel launches.\n\n### Future Work\n\nNext steps are to swap out the slow, CPU-only O(N²) DFT for a fast cuFFT-based transform (or even roll an RFFT wrapper), shift every calculation from Float64 down to Float32 to cut memory traffic in half and align with standard model precision, and finally package the whole thing as a MAX Graph custom op so folks can drop it straight into their production pipelines with zero fuss.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvrtnis%2Fwhisper-mel-mojo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvrtnis%2Fwhisper-mel-mojo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvrtnis%2Fwhisper-mel-mojo/lists"}