{"id":51124774,"url":"https://github.com/morehardy/echoalign-asr-mlx","last_synced_at":"2026-06-25T06:30:31.168Z","repository":{"id":350682936,"uuid":"1207562914","full_name":"morehardy/echoalign-asr-mlx","owner":"morehardy","description":"Local Apple Silicon CLI for ASR, subtitles, WebVTT/SRT, and timestamp-aligned JSON with MLX + Qwen3","archived":false,"fork":false,"pushed_at":"2026-05-26T03:43:23.000Z","size":968,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-26T05:27:13.635Z","etag":null,"topics":["apple-silicon","asr","automatic-speech-recognition","cli","forced-alignment","local-ai","mlx","python","qwen3","speech-recognition","srt","subtitles","webvtt"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/morehardy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-11T05:05:01.000Z","updated_at":"2026-05-26T03:43:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/morehardy/echoalign-asr-mlx","commit_stats":null,"previous_names":["morehardy/asr","morehardy/echoalign-asr-mlx"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/morehardy/echoalign-asr-mlx","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morehardy%2Fechoalign-asr-mlx",
"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morehardy%2Fechoalign-asr-mlx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morehardy%2Fechoalign-asr-mlx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morehardy%2Fechoalign-asr-mlx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/morehardy","download_url":"https://codeload.github.com/morehardy/echoalign-asr-mlx/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/morehardy%2Fechoalign-asr-mlx/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34763481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-25T02:00:05.521Z","response_time":101,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","asr","automatic-speech-recognition","cli","forced-alignment","local-ai","mlx","python","qwen3","speech-recognition","srt","subtitles","webvtt"],"created_at":"2026-06-25T06:30:29.223Z","updated_at":"2026-06-25T06:30:31.161Z","avatar_url":"https://github.com/morehardy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/morehardy/echoalign-asr-mlx/main/docs/assets/asr-logo.png\" 
alt=\"echoalign-asr-mlx logo\" width=\"720\"\u003e\n\u003c/p\u003e\n\n# echoalign-asr-mlx\n\n[![CI](https://github.com/morehardy/echoalign-asr-mlx/actions/workflows/ci.yml/badge.svg)](https://github.com/morehardy/echoalign-asr-mlx/actions/workflows/ci.yml)\n[![PyPI](https://img.shields.io/pypi/v/echoalign-asr-mlx.svg)](https://pypi.org/project/echoalign-asr-mlx/)\n[![Python](https://img.shields.io/pypi/pyversions/echoalign-asr-mlx.svg)](https://pypi.org/project/echoalign-asr-mlx/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n`easr` is a local Apple Silicon CLI that turns audio and video files into\nsubtitle files (`.srt`, `.vtt`) and timestamp-aligned JSON.\n\nUse it when you want local speech recognition, forced alignment, readable\nsubtitles, and machine-friendly timing data without running a server.\n\nCurrent scope:\n\n- runtime target: macOS on Apple Silicon\n- backend: MLX with Qwen3 ASR and Qwen3 ForcedAligner\n- output: SRT, WebVTT, and JSON\n- license: MIT\n- not included yet: translation, speaker diarization, Linux/Windows support\n\n## What You Get\n\nFor each supported media file, `easr` writes:\n\n- `\u003cname\u003e.srt` for subtitle players and editors\n- `\u003cname\u003e.vtt` for web video workflows\n- `\u003cname\u003e.json` for downstream tools that need segments, tokens, timestamps,\n  language metadata, and provider metadata\n- `\u003cname\u003e.metrics.json` when `--verbose` is enabled\n\n`easr` accepts files, directories, and glob patterns. 
Directory scans are\nnon-recursive by default, and recursive processing is opt-in with `--recursive`.\n\n## Requirements\n\n- macOS on Apple Silicon\n- Python `\u003e=3.14,\u003c3.15`\n- `ffmpeg` and `ffprobe` available on `PATH`\n- `uv` if you run from a source checkout\n- network access on first run so the models can be downloaded from Hugging Face\n\nInstall the media tools with Homebrew:\n\n```bash\nbrew install ffmpeg\n```\n\nDefault provider models:\n\n- [`mlx-community/Qwen3-ASR-1.7B-bf16`](https://huggingface.co/mlx-community/Qwen3-ASR-1.7B-bf16)\n- [`mlx-community/Qwen3-ForcedAligner-0.6B-bf16`](https://huggingface.co/mlx-community/Qwen3-ForcedAligner-0.6B-bf16)\n\n## Installation\n\nInstall from PyPI:\n\n```bash\npython3.14 -m pip install \"echoalign-asr-mlx[mlx]\"\neasr --help\n```\n\nRun from a source checkout:\n\n```bash\nuv sync --extra mlx\nuv run --python 3.14 --extra mlx easr --help\n```\n\nIf you use the source checkout flow, run the `easr` examples in this README\nthrough `uv`:\n\n```bash\nuv run --python 3.14 --extra mlx easr ...\n```\n\n## Quick Start\n\nTranscribe one file:\n\n```bash\neasr ./demo.mp4\n```\n\nWrite outputs to a custom directory:\n\n```bash\neasr ./demo.mp4 --output-dir ./subtitles\n```\n\nProcess a directory:\n\n```bash\neasr ./media\n```\n\nProcess a directory recursively:\n\n```bash\neasr ./media --recursive\n```\n\nProcess a glob pattern:\n\n```bash\neasr \"./media/**/*.mp4\" --recursive\n```\n\nExport token-level subtitle and JSON views:\n\n```bash\neasr ./demo.mp4 --granularity token\n```\n\nShow detailed progress and write metrics:\n\n```bash\neasr ./demo.mp4 --verbose\n```\n\n## Supported Formats\n\nAudio:\n\n- `wav`\n- `mp3`\n- `m4a`\n- `flac`\n- `aac`\n\nVideo:\n\n- `mp4`\n- `mov`\n- `m4v`\n- `mkv`\n- `webm`\n\n## Output Layout\n\nDefault output directory name: `outputs`.\n\nWhen the input is a single file, outputs are written to an `outputs` directory\nnext to that 
file:\n\n```text\n/project/media/demo.mp4\n/project/media/outputs/demo.srt\n/project/media/outputs/demo.vtt\n/project/media/outputs/demo.json\n```\n\nWhen the input is a directory (including the default current directory), outputs\nare written to `outputs/` under that input root and keep their relative\nsubdirectories. Nested files such as `nested/b.wav` are only discovered with\n`--recursive`:\n\n```text\n/project/media/\n  a.mp4\n  nested/b.wav\n\n/project/media/outputs/\n  a.srt\n  a.vtt\n  a.json\n  nested/b.srt\n  nested/b.vtt\n  nested/b.json\n```\n\nUse `--output-dir` to choose another output root.\n\n## JSON Output\n\nThe JSON export keeps the readable transcript and the alignment data used to\ncreate subtitle views.\n\nCommon top-level fields:\n\n- `source_path`\n- `provider_name`\n- `detected_language`\n- `segments`\n- `source_media`\n- `granularity`\n- `items`\n\nEach segment includes text, start/end timestamps, language metadata, optional\nspeaker metadata, and token timing when available. `source_media` includes the\nprepared audio path, VAD metadata, and provider diagnostics such as processing\nstrategy, duration, window counts, quality pass counts, and window diagnostics.\n\n## CLI Options\n\n| Option | Meaning |\n| --- | --- |\n| `inputs` | File, directory, or glob pattern. Defaults to the current directory. |\n| `--recursive` | Recursively scan directory inputs. |\n| `--output-dir PATH` | Override the default output directory root. |\n| `--granularity sentence` | Use segment boundaries for subtitle entries and JSON `items`. This is the default. |\n| `--granularity token` | Use token timing for subtitle entries and JSON `items`. |\n| `--no-vad` | Disable voice activity detection preprocessing. |\n| `--verbose` | Print detailed progress and write `\u003cname\u003e.metrics.json`. |\n| `--version` | Show the installed package version. |\n| `--help` | Show CLI help. |\n\n## VAD Preprocessing\n\nVoice activity detection is enabled by default. `easr` scans the prepared audio,\nfinds likely speech ranges, groups them into padded chunks, and asks the\nprovider to process only those ranges. 
Final subtitle timestamps remain on the\noriginal media timeline.\n\nDisable VAD when you want full-duration provider processing:\n\n```bash\neasr ./demo.mp4 --no-vad\n```\n\nIf VAD fails, `easr` falls back to full-duration processing. If VAD succeeds but\nfinds no speech, `easr` writes empty subtitle outputs and counts the file as\nsuccessfully processed.\n\n## Shell Completion\n\nFish shell users can generate or install completions:\n\n```bash\neasr completion fish\neasr completion install fish\n```\n\nThe install command writes:\n\n```text\n~/.config/fish/completions/easr.fish\n```\n\nExisting completion files at that path are overwritten.\n\n## Runtime Behavior\n\n- exit code `0`: all discovered files processed successfully\n- exit code `1`: no supported input was found, environment preflight failed, or\n  at least one file failed in a batch\n- files in a batch are processed sequentially\n- failures are reported per file to stderr\n- other files continue processing after a per-file failure\n- the first run may be slower because model files are downloaded and cached\n\n## Troubleshooting\n\n### Missing `ffmpeg` or `ffprobe`\n\nInstall the media tools and make sure they are visible from the shell running\n`easr`:\n\n```bash\nbrew install ffmpeg\nwhich ffmpeg\nwhich ffprobe\n```\n\n### MLX or Metal preflight failed\n\nCheck that you are on Apple Silicon, that you are using the expected Python\nenvironment, and that the MLX runtime extra is installed:\n\n```bash\npython3.14 -m pip install \"echoalign-asr-mlx[mlx]\"\n```\n\nFor a source checkout:\n\n```bash\nuv sync --extra mlx\nuv run --python 3.14 --extra mlx easr --help\n```\n\n### First run is slow\n\nThis is expected while the Qwen3 model files are downloaded and the local cache\nis warmed. 
Later runs should be faster.\n\n## Current Limitations\n\n- Translation is not implemented.\n- Speaker diarization is not implemented.\n- Subtitle segmentation quality depends on model and alignment behavior.\n- The public CLI does not expose provider selection.\n\n## Development and Community\n\n- [Contributing guide](CONTRIBUTING.md)\n- [Development guide](docs/development.md)\n- [Roadmap](ROADMAP.md)\n- [Changelog](CHANGELOG.md)\n- [Security policy](SECURITY.md)\n- [Code of conduct](CODE_OF_CONDUCT.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorehardy%2Fechoalign-asr-mlx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmorehardy%2Fechoalign-asr-mlx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmorehardy%2Fechoalign-asr-mlx/lists"}