{"id":47968129,"url":"https://github.com/5queezer/better-echo","last_synced_at":"2026-04-04T10:39:59.635Z","repository":{"id":345030782,"uuid":"1183782876","full_name":"5queezer/better-echo","owner":"5queezer","description":"CLI tool for enhanced audio transcription with language detection and speaker voting","archived":false,"fork":false,"pushed_at":"2026-03-17T12:30:00.000Z","size":404,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-04T10:39:58.995Z","etag":null,"topics":["audio","cli","nlp","python","transcription"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/5queezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-17T00:18:51.000Z","updated_at":"2026-03-18T06:52:42.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/5queezer/better-echo","commit_stats":null,"previous_names":["5queezer/better-echo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/5queezer/better-echo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5queezer%2Fbetter-echo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5queezer%2Fbetter-echo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5queezer%2Fbetter-echo/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/5queezer%2Fbetter-echo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/5queezer","download_url":"https://codeload.github.com/5queezer/better-echo/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/5queezer%2Fbetter-echo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31397055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","cli","nlp","python","transcription"],"created_at":"2026-04-04T10:39:59.561Z","updated_at":"2026-04-04T10:39:59.630Z","avatar_url":"https://github.com/5queezer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# better-echo\n\n**Real-time speech-to-text that actually gets your words right.** Powered by Whisper and a local LLM, better-echo fixes domain-specific terminology, punctuation, and misheard words — all running on your machine, with zero data leaving your network.\n\n## Why better-echo?\n\nStandard speech-to-text tools stumble on technical jargon — \"Kubernetes\" becomes \"Cooper Netties,\" \"Celery\" becomes \"salary.\" better-echo solves this by piping Whisper's output through a local 
[Ollama](https://ollama.com) model that knows your vocabulary. You define your domain terms once, and every transcript comes back clean.\n\n**Key features:**\n- **Local \u0026 private** — all processing on your hardware, no cloud APIs, no telemetry\n- **Domain-aware correction** — teach it your terminology via a simple `terms.txt` file\n- **Speaker diarization** — identify who said what (via [pyannote](https://github.com/pyannote/pyannote-audio) + [diart](https://github.com/juanmc2005/diart))\n- **Auto language detection** — detects and tracks language per speaker with voting stabilization\n- **Transcript export** — save as human-readable text, structured JSONL, or both\n- **Cross-platform** — Linux (NVIDIA CUDA) and macOS (Apple Silicon MPS)\n\n## Quick start\n\n```bash\n# 1. Clone and install (Python 3.13, uv)\ngit clone https://github.com/5queezer/better-echo.git\ncd better-echo\nuv sync\n\n# 2. Start Ollama with a correction model\nollama pull llama3.2 \u0026\u0026 ollama serve\n\n# 3. Run\nuv run python main.py\n```\n\nOpen the printed URL in your browser, allow microphone access, and start speaking. That's it.\n\n## Setup details\n\n### Environment\n\n```bash\ncp .env.example .env   # then fill in your values\n```\n\n### Platform notes\n\n- **Linux (NVIDIA GPU)** — `uv sync` installs CUDA 12.9 binaries automatically. Requires NVIDIA drivers.\n- **macOS (Apple Silicon)** — uses PyTorch MPS backend automatically. For real-time performance, use a smaller model (`--model-size base` or `small`).\n- **macOS (Intel)** — CPU only, no GPU acceleration.\n\n### Speaker diarization (optional)\n\nRequires a HuggingFace token with access to gated pyannote models.\n\n1. Set `HF_TOKEN` in `.env`\n2. 
Accept the model licenses:\n   - https://huggingface.co/pyannote/segmentation-3.0\n   - https://huggingface.co/pyannote/embedding\n\n## Usage\n\n```bash\n# Basic\nuv run python main.py\n\n# With speaker diarization\nuv run python main.py --diarization --diarization-backend diart\n\n# Common options (passed through to whisperlivekit)\nuv run python main.py --model-size large-v3 --language de\n```\n\n### Makefile targets\n\n```bash\nmake serve              # Run without diarization\nmake serve-diart        # Run with diart speaker diarization\nmake process            # Reprocess a raw transcript with a different model\n```\n\n### Reprocessing transcripts\n\nRaw transcripts are always saved as JSONL. You can reprocess them later with a different LLM model:\n\n```bash\nuv run python process.py\n```\n\nThis launches an interactive CLI that lets you pick a raw transcript, choose an Ollama model, and output corrected text or JSON.\n\n## Configuration\n\n| Variable | Default | Description |\n|---|---|---|\n| `HF_TOKEN` | — | HuggingFace token (required for diarization) |\n| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |\n| `OLLAMA_MODEL` | `llama3.2` | Model used for transcription correction |\n| `TRANSCRIPT_FORMAT` | `none` | Save transcripts: `text`, `json`, `both`, or `none` |\n| `LOG_LEVEL` | `INFO` | Logging verbosity (`DEBUG`, `INFO`, `WARNING`, etc.) |\n| `ALLOWED_LANGUAGES` | — | Comma-separated language codes to restrict auto-detection (e.g. `en,de,fr`) |\n\n### Domain vocabulary\n\nEdit `terms.txt` to add domain-specific terms the LLM should know about:\n\n```\nKubernetes\nCelery: Python distributed task queue\nFastAPI\npyannote\n```\n\n### Transcript saving\n\nSet `TRANSCRIPT_FORMAT` to continuously save transcripts. Each session creates timestamped files (e.g. 
`transcript_2026-03-17_01-54-30`).\n\n- **`text`** — human-readable with timestamps and speaker labels:\n  ```\n  [0:01:23.45 - 0:01:25.67] Speaker 1: Hello, how are you?\n  ```\n- **`json`** — JSONL with both raw Whisper output and corrected text:\n  ```json\n  {\"start\": 83.45, \"end\": 85.67, \"speaker\": \"1\", \"raw\": \"hello how are you\", \"corrected\": \"Hello, how are you?\"}\n  ```\n- **`both`** — saves `.txt` and `.jsonl` side by side\n\n## How it works\n\n```\nMicrophone → Whisper (speech-to-text) → Ollama LLM (correction) → Browser UI\n                                    ↗\n                          terms.txt (domain vocabulary)\n```\n\nbetter-echo wraps [whisperlivekit](https://github.com/QuentinFuxa/whisperlivekit) and streams audio from your browser over a WebSocket. Finalized transcript lines are sent to a local Ollama model that corrects terminology and punctuation using your custom term list. Results stream back to the browser in real time.\n\n## Privacy\n\nNo telemetry, analytics, or tracking. All audio and text processing happens locally. 
The only network call is to your own Ollama instance for transcription correction.\n\n## License\n\n[MIT](LICENSE) — use it however you want.\n\n## Changelog\n\n### 2026-03-17\n- Revamp README with hook, feature highlights, quick-start guide, and architecture overview (#11)\n- Improved language auto-detection with voting stabilization and per-speaker tracking (#9)\n- Updated uv.lock to include all declared dependencies\n\n### 2026-03-16\n- Auto-install mlx-whisper on macOS for Apple Silicon performance (#8)\n- Add raw transcript storage and reprocessing CLI (#7)\n- Fix TypeError in audio processor diarization: coerce string speakers to int (#6)\n- Fix DiartDiarization kwarg mismatch in whisperlivekit 0.2.20 (#5)\n\n### 2026-03-15\n- Add privacy section to README (#4)\n- Add MIT license (#3)\n- Add Makefile with serve and serve-diart targets (#2)\n- Add macOS (Apple Silicon) compatibility (#1)\n- Transcript writer (initial)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F5queezer%2Fbetter-echo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F5queezer%2Fbetter-echo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F5queezer%2Fbetter-echo/lists"}