{"id":50816158,"url":"https://github.com/wuwangzhang1216/ora","last_synced_at":"2026-06-13T09:34:03.747Z","repository":{"id":351405227,"uuid":"1210686658","full_name":"wuwangzhang1216/ora","owner":"wuwangzhang1216","description":"Real-time on-device speech translation for macOS. Silero VAD + Qwen3-ASR-1.7B + Qwen3.5 (MLX) on Apple Silicon. No cloud, no API keys, no telemetry.","archived":false,"fork":false,"pushed_at":"2026-05-07T00:20:58.000Z","size":8639,"stargazers_count":28,"open_issues_count":1,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-07T02:29:01.630Z","etag":null,"topics":["apple-silicon","local-first","macos","menu-bar-app","mlx","ollama","on-device","privacy","qwen","real-time","silero-vad","speech-to-text","speech-translation","swift","translator"],"latest_commit_sha":null,"homepage":"https://main.d1cdtuylr567og.amplifyapp.com/","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wuwangzhang1216.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-14T16:54:37.000Z","updated_at":"2026-05-07T00:21:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/wuwangzhang1216/ora","commit_stats":null,"previous_names":["wuwangzhang1216/ora"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/wuwangzhang1216/ora","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuwangzhang1216%2For
a","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuwangzhang1216%2Fora/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuwangzhang1216%2Fora/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuwangzhang1216%2Fora/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wuwangzhang1216","download_url":"https://codeload.github.com/wuwangzhang1216/ora/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wuwangzhang1216%2Fora/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34279898,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","local-first","macos","menu-bar-app","mlx","ollama","on-device","privacy","qwen","real-time","silero-vad","speech-to-text","speech-translation","swift","translator"],"created_at":"2026-06-13T09:34:03.683Z","updated_at":"2026-06-13T09:34:03.739Z","avatar_url":"https://github.com/wuwangzhang1216.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/app-icon.png\" width=\"160\" alt=\"Ora app icon\"/\u003e\n\u003c/p\u003e\n\n\u003ch1 
align=\"center\"\u003eOra\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eReal-time local speech translation for macOS.\u003c/strong\u003e\u003cbr/\u003e\n  Everything runs on your Mac — no cloud, no API keys, no data ever leaves the device.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/wuwangzhang1216/ora/releases/latest\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/v/release/wuwangzhang1216/ora?label=download\u0026color=0a78be\" alt=\"Latest release\"/\u003e\n  \u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/macOS-15%2B-0a78be\" alt=\"macOS 15+\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Apple%20Silicon-required-0a78be\" alt=\"Apple Silicon\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-0a78be\" alt=\"MIT\"/\u003e\n\u003c/p\u003e\n\n---\n\n## What is Ora?\n\nOra listens to your microphone and streams live translations of what you say into a floating caption window, using on-device MLX models for both speech recognition and translation. 
It's designed as a small, focused menu-bar app — click once, talk, read.\n\n- 🎙 **Native real-time**: on-device voice activity detection, speech recognition, and translation, all on the Metal GPU\n- 🔒 **100% local**: no network calls after the one-time model download, no API keys, no telemetry\n- ⚡️ **Low latency**: sub-second caption updates while you're still speaking\n- 🪟 **Minimal UI**: menu bar icon + a single floating caption card, driven by a configurable global shortcut\n- 🌍 **Multilingual**: translate between Chinese, English, Japanese, Korean, French, German, Spanish, and more\n- 🎚 **Tunable**: preferences for target language, quality tier, global hotkey, VAD sensitivity, and end-of-speech window\n\n## Screenshots\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/caption-window.png\" width=\"640\" alt=\"Live caption window with Chinese source and English translation\"/\u003e\n  \u003cbr/\u003e\n  \u003cem\u003eFloating caption card — source text above, large translation below, live status indicator + target-language chip.\u003c/em\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/preferences.png\" width=\"420\" alt=\"Preferences window\"/\u003e\n  \u003cbr/\u003e\n  \u003cem\u003ePreferences — target language, quality tier, ASR source hint, VAD sensitivity + end-of-speech window, hotkey.\u003c/em\u003e\n\u003c/p\u003e\n\n## Download\n\nGrab the signed and notarized `Ora.dmg` from the [latest release](https://github.com/wuwangzhang1216/ora/releases/latest), double-click it to mount, drag **Ora.app** to **Applications**, and launch it.\n\n- Requires **macOS 15 (Sequoia) or later** and an **Apple Silicon** Mac (M1/M2/M3/M4)\n- ~1.2 GB of model weights (Standard tier) are downloaded on first launch\n- First launch prompts for microphone access, which is required for speech capture\n\n### What's new in 0.6.2\n\n- Fixed a freeze during long listening sessions where captions stopped updating and old translations repeated — the audio/VAD pipeline could fall 
ever further behind real time when ASR + translation were slower than incoming speech\n- Audio and VAD-event buffers are now bounded, so the pipeline sheds the oldest backlog instead of accumulating it without limit (applies to both the macOS app and the Python reference CLI)\n\n### What's new in 0.6.1\n\n- Experimental Rapid-MLX backend for lower-latency local translation in the macOS app\n- Preferences toggle for MLX Swift vs Rapid-MLX, with configurable local server URL and model name\n- Rapid-MLX benchmark tooling and documented setup for the Python reference CLI\n\n### What's new in 0.6.0\n\n- New Transcript History window for browsing past sessions inside Ora\n- Search, copy, refresh, and export transcript sessions without opening JSONL files manually\n- Preferences reorganized into General, Captions, Advanced, and History tabs\n- Faster access to transcript history from the menu bar and caption hover controls\n\n### What's new in 0.5.1\n\n- Configurable macOS Start / Stop Listening hotkey in Preferences\n- New default global shortcut: ⌥Space, which avoids clashing with Chrome / Brave's reopen-closed-tab shortcut (⌘⇧T)\n- Legacy ⌘⇧T remains available as an opt-in shortcut\n\n### What's new in 0.5.0\n\n- Native macOS caption layouts: Bilingual, Translation Only, and Compact\n- Room-aware VAD presets: Quiet Room, Meeting, Noisy Room, and Custom\n- Faster daily workflow controls: copy the current or last translation and export transcript history\n- CLI setup upgrades: preflight checks, microphone selection, demo mode, room presets, and Markdown / TXT / JSONL / SRT session export\n\n## Usage\n\n1. Click the echo-ring icon in the menu bar.\n2. Choose **Start Listening** (or press ⌥Space from anywhere).\n3. Speak. The floating caption window appears automatically.\n4. Hover the caption window to reveal ⏸ / ⧉ / ✕ controls (pause, copy translation, hide).\n5. 
Press ⌘, for Preferences — change target language, quality tier, global hotkey, caption layout, VAD room preset, and caption size.\n\n### Keyboard shortcuts\n\n| Shortcut | Action |\n|----------|--------|\n| ⌥Space | Start / stop listening (global, configurable in Preferences) |\n| ⌘⇧H | Show / hide caption window |\n| ⌘, | Preferences |\n| ⌘Q | Quit Ora |\n\n### macOS UX controls\n\nThe native macOS app includes the same daily-use tuning as the reference CLI:\n\n- **Caption layout**: Bilingual, Translation Only, or Compact for screen sharing\n- **Configurable hotkey**: change Start / Stop Listening from the default ⌥Space, including a legacy ⌘⇧T option\n- **Room presets**: Quiet Room, Meeting, Noisy Room, or Custom VAD settings\n- **Fast copy**: copy the current or last translation from the menu bar, or from the caption card hover controls\n- **Transcript history**: export the current session or all history as TXT, SRT, JSON, or Markdown\n\n### Quality tiers\n\n| Tier | Download | Best for |\n|------|----------|----------|\n| **Standard** (default) | ~1.2 GB | Casual conversation, news, video |\n| **High** | ~3 GB | Nuanced content, technical terms |\n| **Extra High** | ~6 GB | Literary content, specialized terminology |\n\nSwitch at any time from the menu bar → **Quality**. Higher tiers are more accurate but slower and use more memory; the weights download automatically on first use.\n\n### Experimental Rapid-MLX backend\n\nOra's macOS app defaults to in-process MLX Swift translation. 
For latency experiments, Preferences → **General** → **LLM Backend** can switch the app to a local Rapid-MLX server.\n\n```bash\nuv pip install --python .venv/bin/python rapid-mlx\n.venv/bin/rapid-mlx serve qwen3.5-4b \\\n  --served-model-name default \\\n  --host 127.0.0.1 \\\n  --port 8000 \\\n  --no-thinking \\\n  --pin-system-prompt \\\n  --stream-interval 1\n```\n\nThen choose **Rapid-MLX** in Preferences, keep the URL as `http://127.0.0.1:8000/v1`, and click **Reconnect Translator**. This is opt-in; packaged releases still work offline with MLX Swift and do not manage the Rapid-MLX process.\n\n## How it works\n\n```\n┌──────────┐    ┌───────────┐    ┌──────────────┐    ┌────────────────┐\n│  Mic     │───▶│   VAD     │───▶│     ASR      │───▶│  Translator    │\n│          │    │ endpoint  │    │ on-device    │    │   on-device    │\n│          │    │ detection │    │  Metal GPU   │    │   Metal GPU    │\n└──────────┘    └───────────┘    └──────────────┘    └────────────────┘\n     │                                                        │\n     └── AVAudioEngine ──────────────────────▶ SwiftUI Caption Card\n```\n\nThe VAD, ASR, and translation stages run entirely on the Metal GPU via [MLX Swift](https://github.com/ml-explore/mlx-swift), with AVAudioEngine handling microphone capture. In the default configuration there is no Python, no Ollama, and no external server; only the opt-in Rapid-MLX backend relies on a separate local process. 
Partial results stream back to the caption card while you're still speaking; the final translation is committed once a short silence is detected.\n\n\u003e The native Swift source for the Ora macOS app lives in [`macos/Ora`](macos/Ora).\n\u003e The Python reference implementation below mirrors the same architecture with\n\u003e open dependencies for fast experimentation and terminal-first testing.\n\n## Python CLI (open-source reference implementation)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/screenshots/cli.png\" width=\"720\" alt=\"Python CLI running in terminal with live VAD meter\"/\u003e\n  \u003cbr/\u003e\n  \u003cem\u003eLive rich-terminal UI — status bar, per-utterance source + translation, scrolling history, and a real-time VAD probability meter.\u003c/em\u003e\n\u003c/p\u003e\n\nA Python implementation lives in [`main.py`](main.py) — the same architecture as the Ora macOS app, built on top of `mls` (an MLX model serving daemon) for ASR and a local LLM server for translation. Ollama remains the default backend; Rapid-MLX is available as an experimental low-latency backend. 
It's useful for:\n\n- Running on macOS versions that don't meet the Ora app's 15.0 requirement\n- Reading / forking a fully open-source implementation of the same pipeline\n- Iterating on prompts or VAD settings without rebuilding anything\n- Watching a live VAD-level meter in a rich terminal UI\n\nThe CLI mirrors the Ora app's endpointing and partial-commit cadence, and exposes the same Standard / High / Extra High quality tiers via `--quality`.\n\n```bash\n# One-shot install (creates .venv, pulls translator models, clones the ASR server, preloads weights)\n./setup.sh\n\n# Start ASR + translator server + CLI\n./run.sh --target English --asr-lang zh\n\n# Bump translation quality\n./run.sh --quality high\n./run.sh --quality extra-high\n```\n\n### Rapid-MLX backend experiment\n\nThe CLI can also talk to a local [Rapid-MLX](https://github.com/raullenchai/Rapid-MLX) OpenAI-compatible server. This keeps translation local while lowering LLM request latency in short real-time caption workloads.\n\n```bash\n# Install the optional server into the project venv\nuv pip install --python .venv/bin/python rapid-mlx\n\n# Start Rapid-MLX in another terminal\n.venv/bin/rapid-mlx serve qwen3.5-4b \\\n  --served-model-name default \\\n  --host 127.0.0.1 \\\n  --port 8000 \\\n  --no-thinking \\\n  --pin-system-prompt \\\n  --stream-interval 1\n\n# Run the CLI against Rapid-MLX\n.venv/bin/python main.py --llm-backend rapid-mlx\n```\n\nBenchmark command:\n\n```bash\n.venv/bin/python tools/benchmark_llm_backends.py \\\n  --backend rapid-mlx \\\n  --runs 5 \\\n  --warmup 2 \\\n  --jsonl benchmark-results/rapid-mlx-qwen35-4b.jsonl\n```\n\nLocal Qwen3.5 4B test results from 30 short translation requests per backend (TTFT = time to first token; Total = time until the full response is received):\n\n| Backend | Success | TTFT median | TTFT p95 | Total median | Total p95 |\n|---------|---------|-------------|----------|--------------|-----------|\n| Rapid-MLX | 30/30 | 111 ms | 123 ms | 202 ms | 227 ms |\n| Ollama | 30/30 | 224 ms | 257 ms | 428 ms | 487 ms |\n\n### CLI UX 
tools\n\nThe reference CLI includes a few daily-use affordances that make it easier to set up, tune, and review a session:\n\n```bash\n# Run the terminal UI without mic / mls / Ollama, useful for a quick visual check\npython main.py --demo --save-session\n\n# Inspect microphones, then pick one by id or name\npython main.py --list-devices\n./run.sh --device \"MacBook Pro Microphone\"\n\n# Tune endpointing for the room\n./run.sh --preset quiet\n./run.sh --preset meeting\n./run.sh --preset noisy\n\n# Save finalized bilingual captions\n./run.sh --save-session --output-format markdown\n./run.sh --save-session --output-format txt\n./run.sh --save-session --output-format jsonl\n./run.sh --save-session --output-format srt\n```\n\nOn normal runs, the CLI performs a preflight readiness check before opening the mic, verifying microphone availability, `mls`, the selected LLM backend, and the selected translator model. Use `--skip-preflight` only when you intentionally want the old direct-start behavior.\n\nSee [setup.sh](setup.sh) and [run.sh](run.sh) for the full dependency chain.\n\n## Privacy\n\nOra doesn't phone home. The only network traffic is the Hugging Face model download (on first launch, and the first time you select a higher quality tier), after which the app runs fully offline. No telemetry, no crash reporting, no analytics. Microphone audio never leaves your machine.\n\n## License\n\nMIT.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwuwangzhang1216%2Fora","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwuwangzhang1216%2Fora","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwuwangzhang1216%2Fora/lists"}