{"id":35003790,"url":"https://github.com/steipete/summarize","last_synced_at":"2026-06-13T10:00:50.062Z","repository":{"id":329494833,"uuid":"1118209243","full_name":"steipete/summarize","owner":"steipete","description":"Point at any URL/YouTube/Podcast or file. Get the gist. CLI and Chrome Extension.","archived":false,"fork":false,"pushed_at":"2026-06-13T05:15:08.000Z","size":53364,"stargazers_count":6176,"open_issues_count":2,"forks_count":407,"subscribers_count":19,"default_branch":"main","last_synced_at":"2026-06-13T05:18:14.361Z","etag":null,"topics":["ai","cli","summarize","typescript"],"latest_commit_sha":null,"homepage":"https://summarize.sh","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/steipete.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-12-17T12:19:51.000Z","updated_at":"2026-06-13T05:11:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"95dc33d8-ed36-422d-8b61-d8831810911a","html_url":"https://github.com/steipete/summarize","commit_stats":null,"previous_names":["steipete/summarize"],"tags_count":30,"template":false,"template_full_name":null,"purl":"pkg:github/steipete/summarize","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/steipete","download_url":"https://codeload.github.com/steipete/summarize/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34279898,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cli","summarize","typescript"],"created_at":"2025-12-27T04:20:20.367Z","updated_at":"2026-06-13T10:00:50.054Z","avatar_url":"https://github.com/steipete.png","language":"TypeScript","funding_links":[],"categories":["Table of Contents","Repos","TypeScript","A01_文本生成_文本对话","🛠️ Developer Tools"],"sub_categories":["Context","大语言对话模型及数据"],"readme":"# Summarize 📝 — Chrome Side Panel + CLI\n\nFast summaries from URLs, files, and media. 
Works in the terminal, in a Chrome Side Panel, and in a Firefox Sidebar.\n\n## Highlights\n\n- Chrome Side Panel **chat** (streaming agent + history).\n- **Video slides**: screenshots + OCR + transcript cards for YouTube, direct video URLs, and local video files.\n- Media-aware summaries: auto‑detect video/audio vs page content.\n- Coding CLI backends: Codex, Claude, Gemini, Cursor Agent, OpenClaw, OpenCode.\n- Streaming Markdown + metrics + cache‑aware status.\n- CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.\n\n## Feature overview\n\n- URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.\n- Slide extraction for video sources (YouTube, direct video URLs, local video files) with OCR + timestamped cards.\n- Transcript-first media flow: published transcripts when available, then Groq/ONNX/whisper.cpp/AssemblyAI/Gemini/OpenAI/FAL transcription fallback when not.\n- Coding CLI providers: Claude, Codex, Gemini, Cursor Agent, OpenClaw, OpenCode, GitHub Copilot, Antigravity, pi.\n- Streaming output with Markdown rendering, metrics, and cache-aware status.\n- Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.\n- Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.\n- Smart default: if the content is shorter than the requested length, we return it as-is (use `--force-summary` to override).\n\n## Get the extension (recommended)\n\n![Summarize extension screenshot](docs/assets/summarize-extension.png)\n\nOne‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.\n\n**Chrome Web Store:** [Summarize Side Panel](https://chromewebstore.google.com/detail/summarize/cejgnmmhbbpdmjnfppjdfkocebngehfg)\n\nYouTube slide screenshots (from the browser):\n\n![Summarize YouTube slide screenshots](docs/assets/youtube-slides.png)\n\n### Beginner quickstart (extension)\n\n1. 
Install the CLI (choose one):\n   - **npm** (cross‑platform): `npm i -g @steipete/summarize`\n   - **Homebrew** (Homebrew/core): `brew install summarize`\n2. Install the extension (Chrome Web Store link above) and open the Side Panel.\n3. The panel shows a token + install command. Run it in Terminal:\n   - `summarize daemon install --token \u003cTOKEN\u003e`\n\nWhy a daemon/service?\n\n- Browser mode works without the daemon for page summaries, ranged video slides through MediaBunny/WebCodecs, and fetchable YouTube/direct/embedded media transcription with browser-cached Whisper.\n- The optional daemon on `127.0.0.1` is faster and adds native ffmpeg, configurable transcription providers, OCR, and broader media support.\n- The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.\n\nIf you only want the **CLI**, you can skip the daemon install entirely.\n\nNotes:\n\n- Summarization only runs when the Side Panel is open.\n- Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.\n- Daemon is localhost-only and requires a shared token; rerunning `summarize daemon install --token \u003cTOKEN\u003e` adds another paired browser token instead of invalidating the old one.\n- Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).\n- Windows containers: `summarize daemon install` starts the daemon for the current container session but does not register a Scheduled Task. Run it each time the container starts or add that command to your container startup, and publish port `8787` so the host browser can reach the daemon.\n- Tip: configure `free` via `summarize refresh-free` (needs `OPENROUTER_API_KEY`). 
Add `--set-default` to set model=`free`.\n\nMore:\n\n- Step-by-step install: [apps/chrome-extension/README.md](apps/chrome-extension/README.md)\n- Architecture + troubleshooting: [docs/chrome-extension.md](docs/chrome-extension.md)\n- Firefox compatibility notes: [apps/chrome-extension/docs/firefox.md](apps/chrome-extension/docs/firefox.md)\n\n### Slides (extension)\n\n- Select **Video + Slides** in the Summarize picker.\n- Slides render at the top; expand to full‑width cards with timestamps.\n- Click a slide to seek the video; toggle **Transcript/OCR** when OCR is significant.\n- Browser mode uses MediaBunny with native WebCodecs and ranged network reads for fetchable videos, then falls back to visible-tab capture when the source or codec is unavailable.\n- Daemon mode adds `yt-dlp`, native ffmpeg, and optional `tesseract` OCR.\n\n### Advanced (unpacked / dev)\n\n1. Build + load the extension (unpacked):\n   - Chrome: `pnpm -C apps/chrome-extension build`\n     - `chrome://extensions` → Developer mode → Load unpacked\n     - Pick: `apps/chrome-extension/.output/chrome-mv3`\n   - Firefox: `pnpm -C apps/chrome-extension build:firefox`\n     - `about:debugging#/runtime/this-firefox` → Load Temporary Add-on\n     - Pick: `apps/chrome-extension/.output/firefox-mv3/manifest.json`\n2. Open Side Panel/Sidebar → copy token.\n3. 
Install daemon in dev mode:\n   - `pnpm summarize daemon install --token \u003cTOKEN\u003e --dev`\n\n## CLI\n\n![Summarize CLI screenshot](docs/assets/summarize-cli.png)\n\n### Install\n\nRequires Node 24+.\n\n- npx (no install):\n\n```bash\nnpx -y @steipete/summarize \"https://example.com\"\n```\n\n- npm (global):\n\n```bash\nnpm i -g @steipete/summarize\n```\n\n- npm (library / minimal deps):\n\n```bash\nnpm i @steipete/summarize-core\n```\n\n```ts\nimport { createLinkPreviewClient } from \"@steipete/summarize-core/content\";\n```\n\n- Homebrew:\n\n```bash\nbrew install summarize\n```\n\nHomebrew ships from `homebrew/core` via `brew install summarize`.\nIf Homebrew is unavailable in your environment, use the npm global install above.\n\n### Optional local dependencies\n\nInstall these if you want media-heavy features:\n\n- `ffmpeg`: optional native accelerator with broader codec support; bundled WebAssembly is the fallback\n- `yt-dlp`: required for YouTube slide extraction and some remote media flows\n- `tesseract`: optional OCR for `--slides-ocr`\n- Optional cloud transcription providers:\n  - `GROQ_API_KEY`\n  - `ASSEMBLYAI_API_KEY`\n  - `ELEVENLABS_API_KEY` (speaker diarization)\n  - `GEMINI_API_KEY` / `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY`\n  - `OPENAI_API_KEY`\n  - `FAL_KEY`\n\nmacOS (Homebrew):\n\n```bash\nbrew install ffmpeg yt-dlp\nbrew install tesseract # optional, for --slides-ocr\n```\n\nIf native `ffmpeg`/`ffprobe` are unavailable, Summarize uses the bundled WebAssembly build. 
Native ffmpeg remains recommended for speed and broader codec/filter support.\n\n### CLI vs extension\n\n- **CLI only:** just install via npm/Homebrew and run `summarize ...` (no daemon needed).\n- **Chrome extension:** Browser mode works without the CLI or daemon; install the daemon for faster and broader media support.\n- **Firefox extension:** install the CLI and daemon for media extraction.\n\n### Quickstart\n\n```bash\nsummarize \"https://example.com\"\n```\n\nInspect the effective model setup. Status only lists configured or usable providers; it never prints\nkeys or missing-provider noise.\n\n```bash\nsummarize status\nsummarize status --verbose\nsummarize status --probe\nsummarize status --json\n```\n\n`--probe` checks supported model-list endpoints without running paid inference. CLI providers are\nreported as available when their enabled executable is present; API providers are reported as\nconfigured when an effective key is present.\n\n### Inputs\n\nURLs or local paths:\n\n```bash\nsummarize \"/path/to/file.pdf\" --model google/gemini-3-flash\nsummarize \"https://example.com/report.pdf\" --model google/gemini-3-flash\nsummarize \"/path/to/audio.mp3\"\nsummarize \"/path/to/video.mp4\"\n```\n\nStdin (pipe content using `-`):\n\n```bash\necho \"content\" | summarize -\npbpaste | summarize -\n# binary stdin also works (PDF/image/audio/video bytes)\ncat /path/to/file.pdf | summarize -\n```\n\n**Notes:**\n\n- Stdin has a 50MB size limit\n- The `-` argument tells summarize to read from standard input\n- Text stdin is treated as UTF-8 text (whitespace-only input is rejected as empty)\n- Binary stdin is preserved as raw bytes and file type is auto-detected when possible\n- Useful for piping clipboard content or command output\n\nYouTube (supports `youtube.com` and `youtu.be`):\n\n```bash\nsummarize \"https://youtu.be/dQw4w9WgXcQ\" --youtube auto\n```\n\nPodcast RSS (transcribes latest enclosure):\n\n```bash\nsummarize 
\"https://feeds.npr.org/500005/podcast.xml\"\n```\n\nApple Podcasts episode page:\n\n```bash\nsummarize \"https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432\"\n```\n\nSpotify episode page (best-effort; may fail for exclusives):\n\n```bash\nsummarize \"https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY\"\n```\n\nHLS playlist:\n\n```bash\nsummarize \"https://example.com/master.m3u8\"\n```\n\n### Output length\n\n`--length` controls how much output we ask for (guideline), not a hard cap.\nThe built-in default is `long`.\n\nSet a default in `~/.summarize/config.json` with `output.length`.\n\n```bash\nsummarize \"https://example.com\" --length long\nsummarize \"https://example.com\" --length 20k\n```\n\n- Presets: `short|medium|long|xl|xxl`\n- Character targets: `1500`, `20k`, `20000`\n- Optional hard cap: `--max-output-tokens \u003ccount\u003e` (e.g. `2000`, `2k`)\n  - Provider/model APIs still enforce their own maximum output limits.\n  - If omitted, no max token parameter is sent (provider default).\n  - Prefer `--length` unless you need a hard cap.\n- Short content: when extracted content is shorter than the requested length, the CLI returns the content as-is.\n  - Override with `--force-summary` to always run the LLM.\n- Minimums: `--length` numeric values must be \u003e= 10 chars; `--max-output-tokens` must be \u003e= 16.\n- Preset targets (source of truth: `packages/core/src/prompts/summary-lengths.ts`):\n  - short: target ~900 chars (range 600-1,200)\n  - medium: target ~1,800 chars (range 1,200-2,500)\n  - long: target ~4,200 chars (range 2,500-6,000)\n  - xl: target ~9,000 chars (range 6,000-14,000)\n  - xxl: target ~17,000 chars (range 14,000-22,000)\n\n### What file types work?\n\nBest effort and provider-dependent. 
These usually work well:\n\n- `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, ...)\n  - Text-like files are inlined into the prompt for better provider compatibility.\n- PDFs: `application/pdf` (provider support varies; Google is the most reliable here)\n- Images: `image/jpeg`, `image/png`, `image/webp`, `image/gif`\n- Audio/Video: `audio/*`, `video/*` (local MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM files are transcribed automatically when supported by the model)\n\nNotes:\n\n- If a provider rejects a media type, the CLI fails fast with a friendly message.\n- xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.\n\n### Model ids\n\nUse gateway-style ids: `\u003cprovider\u003e/\u003cmodel\u003e`.\n\nExamples:\n\n- `openai/gpt-5.4`\n- `openai/gpt-5.4-mini`\n- `openai/gpt-5.4-nano`\n- `openai/gpt-5-mini`\n- `openai/gpt-5-nano`\n- `github-copilot/gpt-5.4`\n- `anthropic/claude-sonnet-4-5`\n- `xai/grok-4-fast-non-reasoning`\n- `google/gemini-3-flash`\n- `zai/glm-4.7`\n- `minimax/MiniMax-M3`\n- `openrouter/openai/gpt-5-mini` (force OpenRouter)\n\nNote: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).\n`gpt-5.4-mini` and `gpt-5.4-nano` are treated as real model ids; the same shorthand also works under `github-copilot/...`.\n\n### OpenAI fast mode and thinking\n\nFast mode is a request option, not a model id:\n\n```bash\nsummarize \"https://example.com\" --model openai/gpt-5.5 --fast --thinking medium\nsummarize \"https://example.com\" --model openai/gpt-5.4 --service-tier fast --thinking low\n```\n\n- `--fast` is shorthand for `--service-tier fast`.\n- `--service-tier default|fast|priority|flex` controls the OpenAI service tier. 
`fast` is the summarize/Codex-facing spelling and is sent to OpenAI as `service_tier=\"priority\"`.\n- `--thinking none|low|medium|high|xhigh` controls OpenAI reasoning effort. Aliases: `off` → `none`, `min` → `low`, `mid` / `med` → `medium`, `x-high` / `extra-high` → `xhigh`.\n- `--service-tier default` clears a configured tier for one run.\n\nConfig equivalent:\n\n```json\n{\n  \"model\": \"openai/gpt-5.5\",\n  \"openai\": {\n    \"serviceTier\": \"fast\",\n    \"thinking\": \"medium\"\n  }\n}\n```\n\nCompatibility aliases still work, but prefer the explicit flags above:\n\n- `--model gpt-fast` / `--model fast` → `openai/gpt-5.5` + fast tier + medium thinking\n- `--model openai/gpt-5.5-fast` → `openai/gpt-5.5` + fast tier\n\n### Limits\n\n- Text inputs over 10 MB are rejected before tokenization.\n- Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.\n\n### Common flags\n\n```bash\nsummarize \u003cinput\u003e [flags]\n```\n\nUse `summarize --help` or `summarize help` for the full help text.\n\n- `--model \u003cprovider/model\u003e`: which model to use (defaults to `auto`)\n- `--model auto`: automatic model selection + fallback (default)\n- `--model \u003cname\u003e`: use a built-in or config-defined preset (see Configuration)\n- `--timeout \u003cduration\u003e`: `30s`, `2m`, `5000ms` (default `2m`)\n- `--retries \u003ccount\u003e`: LLM retry attempts on timeout (default `1`)\n- `--length short|medium|long|xl|xxl|s|m|l|\u003cchars\u003e`\n- `--language, --lang \u003clanguage\u003e`: output language (`auto` = match source)\n- `--max-output-tokens \u003ccount\u003e`: hard cap for LLM output tokens\n- `--cli [provider]`: use a CLI provider (`--model cli/\u003cprovider\u003e`). Supports `claude`, `gemini`, `codex`, `agent`, `openclaw`, `opencode`, `copilot`, `agy`, `pi`. 
If omitted, uses auto selection with CLI enabled.\n- `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)\n- `--plain`: keep raw output (no ANSI/OSC Markdown rendering)\n- `--no-color`: disable ANSI colors\n- `--theme \u003cname\u003e`: CLI theme (`aurora`, `ember`, `moss`, `mono`)\n- `--format md|text`: website/file content format (default `text`)\n- `--markdown-mode off|auto|llm|readability`: HTML -\u003e Markdown mode (default `readability`)\n- `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`)\n  - Install `uvx`: `brew install uv` (or https://astral.sh/uv/)\n  - Image-only PDFs can fall back to OpenAI vision OCR when `OPENAI_API_KEY` is set; override the OCR model with `MARKITDOWN_OCR_MODEL` or page render DPI with `MARKITDOWN_OCR_DPI`.\n- `--extract`: print extracted content and exit (URLs only; stdin `-` is not supported)\n  - Deprecated alias: `--extract-only`\n- `--slides`: extract slides for YouTube, direct video URLs, or local video files and render them inline in the summary narrative (auto-renders inline in supported terminals)\n- `--slides-ocr`: run OCR on extracted slides (requires `tesseract`)\n- `--slides-dir \u003cdir\u003e`: base output dir for slide images (default `./slides`)\n- `--slides-scene-threshold \u003cvalue\u003e`: scene detection threshold (0.1-1.0)\n- `--slides-max \u003ccount\u003e`: maximum slides to extract (default `6`)\n- `--slides-min-duration \u003cseconds\u003e`: minimum seconds between slides\n- `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary\n- `--verbose`: debug/diagnostics on stderr\n- `--metrics off|on|detailed`: metrics output (default `on`)\n\n### Coding CLIs (Codex, Claude, Gemini, Agent, OpenClaw, OpenCode, Copilot, Antigravity, pi)\n\nSummarize can use common coding CLIs as local model backends:\n\n- `codex` -\u003e `--cli codex` / `--model cli/codex/\u003cmodel\u003e`\n- `claude` -\u003e `--cli claude` / 
`--model cli/claude/\u003cmodel\u003e`\n- `gemini` -\u003e `--cli gemini` / `--model cli/gemini/\u003cmodel\u003e`\n- `agent` (Cursor Agent CLI) -\u003e `--cli agent` / `--model cli/agent/\u003cmodel\u003e`\n- `openclaw` -\u003e `--cli openclaw` / `--model cli/openclaw/\u003cmodel\u003e` or `--model openclaw/\u003cmodel\u003e`\n- `opencode` -\u003e `--cli opencode` / `--model cli/opencode/\u003cmodel\u003e` (`--model cli/opencode` uses the OpenCode runtime default)\n- `agy` (Antigravity CLI) -\u003e `--cli agy` / `--model cli/agy` (uses agy's active session model; per-call model selection is not supported by agy print mode)\n- `pi` (Pi Coding Agent) -\u003e `--cli pi` / `--model cli/pi` or `--model cli/pi/\u003cmodel\u003e`\n\nBuilt-in preset:\n\n- `--model codex-fast` runs Codex with GPT-5.5 Fast mode and requires `codex login`.\n\nRequirements:\n\n- Binary installed and on `PATH` (or set `CODEX_PATH`, `CLAUDE_PATH`, `GEMINI_PATH`, `AGENT_PATH`, `OPENCLAW_PATH`, `OPENCODE_PATH`, `AGY_PATH`, `PI_PATH`)\n- Provider authenticated (`codex login`, `claude auth`, `gemini` login flow, `agent login` or `CURSOR_API_KEY`, `opencode auth login`, `agy` login flow or `ANTIGRAVITY_API_KEY`, `pi` uses configured provider API keys)\n\nQuick smoke test:\n\n```bash\nprintf \"Summarize CLI smoke input.\\nOne short paragraph. 
Reply can be brief.\\n\" \u003e/tmp/summarize-cli-smoke.txt\n\nsummarize --cli codex --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli claude --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli gemini --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli agent --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli openclaw --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli opencode --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli agy --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli pi --plain --timeout 2m /tmp/summarize-cli-smoke.txt\n```\n\nSet explicit CLI allowlist/order:\n\n```json\n{\n  \"cli\": { \"enabled\": [\"codex\", \"claude\", \"gemini\", \"agent\", \"openclaw\", \"opencode\", \"agy\", \"pi\"] }\n}\n```\n\nConfigure implicit auto CLI fallback:\n\n```json\n{\n  \"cli\": {\n    \"autoFallback\": {\n      \"enabled\": true,\n      \"onlyWhenNoApiKeys\": true,\n      \"order\": [\"claude\", \"gemini\", \"codex\", \"agent\", \"openclaw\", \"opencode\"]\n    }\n  }\n}\n```\n\nMore details: [`docs/cli.md`](docs/cli.md)\n\n### Auto model ordering\n\n`--model auto` builds candidate attempts from built-in rules (or your `model.rules` overrides).\nCLI attempts are prepended when:\n\n- `cli.enabled` is set (explicit allowlist/order), or\n- implicit auto selection is active and `cli.autoFallback` is enabled.\n\nBy default, implicit fallback runs only when no API keys are configured, tries `claude, gemini, codex, agent, openclaw, opencode, copilot` in that order, and remembers and prioritizes the last successful provider (`~/.summarize/cli-state.json`). 
Antigravity and pi are opt-in unless you add them to `cli.autoFallback.order`.\n\nSet explicit CLI attempts:\n\n```json\n{\n  \"cli\": { \"enabled\": [\"gemini\"] }\n}\n```\n\nDisable implicit auto CLI fallback:\n\n```json\n{\n  \"cli\": { \"autoFallback\": { \"enabled\": false } }\n}\n```\n\nNote: explicit `--model auto` does not trigger implicit auto CLI fallback unless `cli.enabled` is set.\n\n### Website extraction (Firecrawl + Markdown)\n\nNon-YouTube URLs go through a fetch -\u003e extract pipeline. When direct fetching or extraction is blocked or returns too little content,\n`--firecrawl auto` can fall back to Firecrawl (if configured).\n\n- `--firecrawl off|auto|always` (default `auto`)\n- `--extract --format md|text` (default `text`; when `--format` is omitted, `--extract` uses `md` for non-YouTube URLs)\n- `--markdown-mode off|auto|llm|readability` (default `readability`)\n  - `auto`: use an LLM converter when configured; may fall back to `uvx markitdown`\n  - `llm`: force LLM conversion (requires a configured model key)\n  - `off`: disable LLM conversion (may still return Firecrawl Markdown when configured)\n- Plain-text mode: use `--format text`.\n\n### YouTube transcripts\n\n`--youtube auto` tries best-effort web transcript endpoints first. When captions are not available, it falls back to:\n\n1. yt-dlp + Whisper (if `yt-dlp` is available): downloads audio, then transcribes with local `whisper.cpp` when installed\n   (preferred), otherwise falls back to Groq (`GROQ_API_KEY`), AssemblyAI (`ASSEMBLYAI_API_KEY`), Gemini\n   (`GEMINI_API_KEY` / Google aliases), OpenAI (`OPENAI_API_KEY`), then FAL (`FAL_KEY`)\n2. Android VR direct audio + the same configured transcription chain when `yt-dlp` is unavailable or fails\n3. 
Apify (if `APIFY_API_TOKEN` is set): uses a scraping actor (`faVsWy9VTSNVIhWpR`)\n\nEnvironment variables for yt-dlp mode:\n\n- `YT_DLP_PATH` - optional path to yt-dlp binary (otherwise `yt-dlp` is resolved via `PATH`)\n- `SUMMARIZE_WHISPER_CPP_MODEL_PATH` - optional override for the local `whisper.cpp` model file\n- `SUMMARIZE_WHISPER_CPP_BINARY` - optional override for the local binary (default: `whisper-cli`)\n- `SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1` - disable local whisper.cpp (force remote)\n- `GROQ_API_KEY` - Groq Whisper transcription\n- `ASSEMBLYAI_API_KEY` - AssemblyAI transcription\n- `GEMINI_API_KEY` - Gemini transcription (`GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY` also work)\n- `OPENAI_API_KEY` - OpenAI Whisper transcription\n- `OPENAI_WHISPER_BASE_URL` - optional OpenAI-compatible Whisper endpoint override\n- `FAL_KEY` - FAL AI Whisper fallback\n\nApify costs money but tends to be more reliable when captions exist.\n\nSpeaker-labelled transcripts for YouTube, local audio/video, and direct media URLs:\n\n```bash\nsummarize \"https://www.youtube.com/watch?v=...\" --extract --diarize\nsummarize \"./interview.mp3\" --extract --diarize\nsummarize \"https://cdn.example.com/interview.mp4\" --extract --diarize openai\nsummarize \"./interview.mp4\" --extract --diarize openai \\\n  --identify-speakers --speaker-at \"0:00=Host\" --speaker-at \"0:12=Guest\"\nsummarize \"https://www.youtube.com/watch?v=...\" --extract --diarize elevenlabs\nsummarize \"https://www.youtube.com/watch?v=...\" --extract --diarize openai --timestamps\nsummarize \"https://www.youtube.com/watch?v=...\" --extract --diarize elevenlabs \\\n  --identify-speakers --speaker-profile my-podcast \\\n  --speaker-at \"0:12=Host Name\" --remember-speakers\n```\n\nBare `--diarize` prefers ElevenLabs Scribe v2 (`ELEVENLABS_API_KEY`) and falls back to OpenAI\n`gpt-4o-transcribe-diarize` (`OPENAI_API_KEY`). 
Speaker changes are emitted as `Speaker \u003clabel\u003e: ...`;\ncombine with `--timestamps` for `[mm:ss] Speaker \u003clabel\u003e: ...`. Before upload, local video is reduced\nto mono 16 kHz MP3 with native or bundled FFmpeg and the same audio file is reused across provider\nfallbacks. Local audio is passed through unless OpenAI's upload limit requires compression. YouTube\ndiarization downloads audio only. When combined with `--slides`, one yt-dlp invocation downloads\nseparate audio and slide-quality video streams; diarization uploads the audio while slides reuse the\nvideo. Remote direct media uses its normal audio download path.\nYouTube transcript extraction also prints the current public view count and exposes the resolved\nvideo ID and observation timestamp in `extracted.sourceMetrics` in JSON output.\nLong OpenAI recordings are split into bounded chunks; timestamps are reassembled and\nchunk-local provider labels stay distinct so label resets cannot silently merge different voices.\n\n`--identify-speakers` replaces generic labels with names for YouTube and direct media. Repeat `--speaker-at \u003ctimestamp=name\u003e`\nfor authoritative examples; unresolved labels are inferred with OpenAI GPT-5.5 and only accepted above\nthe configured confidence threshold. `--remember-speakers` stores the profile, anchors, and a\ntranscript-hash-guarded mapping in `~/.summarize/config.json` for later runs. 
See\n[YouTube speaker identification](docs/youtube.md#speaker-identification).\n\n### Slide extraction (YouTube + direct video URLs + local video files)\n\nExtract slide screenshots (scene detection via `ffmpeg`) with optional OCR.\n\nRequirements:\n\n- bundled FFmpeg WebAssembly, or native `ffmpeg` for faster extraction and broader codec support\n- `yt-dlp` for YouTube video download/stream resolution\n- `tesseract` only when using `--slides-ocr`\n\n```bash\nsummarize \"https://www.youtube.com/watch?v=...\" --slides\nsummarize \"https://www.youtube.com/watch?v=...\" --slides --slides-ocr\nsummarize \"/path/to/video.webm\" --slides\n```\n\nOutputs are written under `./slides/\u003csourceId\u003e/` (or `--slides-dir`). OCR results are included in JSON output\n(`--json`) and stored in `slides.json` inside the slide directory. When scene detection finds too few slides, the\nextractor also samples at a fixed interval to improve coverage.\nWhen using `--slides`, supported terminals (kitty/iTerm/Konsole) render inline thumbnails automatically inside the\nsummary narrative (the model inserts `[slide:N]` markers). Timestamp links (YouTube/Vimeo/Loom/Dropbox) are clickable when the terminal supports\nOSC-8. If inline images are unsupported, Summarize prints a note with the on-disk\nslide directory. Local video files stay on the slide-aware path, transcribe in place, and avoid fake download labels.\n\nUse `--slides --extract` to print the full timed transcript and insert slide images inline at matching timestamps.\n\nFormat the extracted transcript as Markdown (headings + paragraphs) via an LLM:\n\n```bash\nsummarize \"https://www.youtube.com/watch?v=...\" --extract --format md --markdown-mode llm\n```\n\n### Media transcription (Whisper)\n\nLocal audio/video files are transcribed first, then summarized. `--video-mode transcript` forces\ndirect media URLs (and embedded media) through Whisper first. 
Prefers local `whisper.cpp` when available; otherwise requires\none of `GROQ_API_KEY`, `ASSEMBLYAI_API_KEY`, `GEMINI_API_KEY` (or Google aliases), `OPENAI_API_KEY`, or `FAL_KEY`.\nUse `--diarize [auto|elevenlabs|openai]` for speaker-labelled MP3/MP4 and other supported media;\ndiarization requires `ELEVENLABS_API_KEY` or `OPENAI_API_KEY`.\n\n### Local ONNX transcription (Parakeet/Canary)\n\nSummarize can use NVIDIA Parakeet/Canary ONNX models via a local CLI you provide. Auto selection (default) prefers ONNX when configured.\n\n- Setup helper: `summarize transcriber setup`\n- Install `sherpa-onnx` from upstream binaries/build (Homebrew may not have a formula)\n- Auto selection: set `SUMMARIZE_ONNX_PARAKEET_CMD` or `SUMMARIZE_ONNX_CANARY_CMD` (no flag needed)\n- Force a model: `--transcriber parakeet|canary|whisper|auto`\n- Docs: `docs/nvidia-onnx-transcription.md`\n\n### Verified podcast services (2025-12-25)\n\nRun: `summarize \u003curl\u003e`\n\n- Apple Podcasts\n- Spotify\n- Amazon Music / Audible podcast pages\n- Podbean\n- Podchaser\n- RSS feeds (Podcasting 2.0 transcripts when available)\n- Embedded YouTube podcast pages (e.g. JREPodcast)\n\nTranscription: prefers local `whisper.cpp` when installed; otherwise uses Groq, AssemblyAI, Gemini, OpenAI, or FAL when keys are set.\n\n### Translation paths\n\n`--language/--lang` controls the output language of the summary (and other LLM-generated text). Default is `auto`.\n\nWhen the input is audio/video, the CLI needs a transcript first. The transcript comes from one of these paths:\n\n1. Existing transcript (preferred)\n   - YouTube: uses `youtubei` / `captionTracks` when available.\n   - Podcasts: uses Podcasting 2.0 RSS `\u003cpodcast:transcript\u003e` (JSON/VTT) when the feed publishes it.\n2. 
Whisper transcription (fallback)\n   - YouTube: prefers yt-dlp audio download, then Android VR direct audio, plus Whisper transcription when configured; Apify is a last resort.\n   - Prefers local `whisper.cpp` when installed + model available.\n   - Otherwise uses cloud transcription in this order: Groq (`GROQ_API_KEY`) → AssemblyAI (`ASSEMBLYAI_API_KEY`) → Gemini (`GEMINI_API_KEY` / Google aliases) → OpenAI (`OPENAI_API_KEY`) → FAL (`FAL_KEY`).\n\nFor direct media URLs, use `--video-mode transcript` to force transcribe -\u003e summarize:\n\n```bash\nsummarize https://example.com/file.mp4 --video-mode transcript --lang en\n```\n\n### Configuration\n\nSingle config location:\n\n- `~/.summarize/config.json`\n\nRun `summarize status` to inspect the effective default model, configured presets, and model\nproviders available from config, environment variables, local endpoints, or installed CLIs.\n\nSupported keys today:\n\n```json\n{\n  \"model\": { \"id\": \"openai/gpt-5-mini\" },\n  \"env\": { \"OPENAI_API_KEY\": \"sk-...\" },\n  \"output\": { \"length\": \"long\" },\n  \"ui\": { \"theme\": \"ember\" }\n}\n```\n\nShorthand (equivalent):\n\n```json\n{\n  \"model\": \"openai/gpt-5-mini\"\n}\n```\n\nAlso supported:\n\n- `model: { \"mode\": \"auto\" }` (automatic model selection + fallback; see [docs/model-auto.md](docs/model-auto.md))\n- `model.rules` (customize candidates / ordering)\n- `models` (define presets selectable via `--model \u003cpreset\u003e`; overrides built-ins like `free`)\n- `env` (generic env var defaults; process env still wins)\n- `apiKeys` (legacy shortcut, mapped to env names; prefer `env` for new configs)\n- `output.length` (default: `long`; accepts `short|medium|long|xl|xxl|20k`)\n- `cache.media` (media download cache: TTL 7 days, 2048 MB cap by default; `--no-media-cache` disables)\n- `media.videoMode: \"auto\"|\"transcript\"|\"understand\"`\n- `media.embeddedVideo: \"auto\"|\"off\"|\"prefer\"|\"both\"` (default `auto`: combine substantial 
articles with primary embedded YouTube captions)\n- `slides.enabled` / `slides.max` / `slides.ocr` / `slides.dir` (defaults for `--slides`)\n- `ui.theme: \"aurora\"|\"ember\"|\"moss\"|\"mono\"`\n- `openai.useChatCompletions: true` (force OpenAI-compatible chat completions)\n- `openai.serviceTier: \"fast\"|\"priority\"|\"flex\"` (use `\"fast\"` for the friendly alias)\n- `openai.thinking` / `openai.reasoningEffort: \"none\"|\"low\"|\"medium\"|\"high\"|\"xhigh\"`\n- `openai.textVerbosity: \"low\"|\"medium\"|\"high\"`\n\nNote: the config is parsed leniently (JSON5), but comments are not allowed. Unknown keys are ignored.\n\nMedia cache defaults:\n\n```json\n{\n  \"cache\": {\n    \"media\": { \"enabled\": true, \"ttlDays\": 7, \"maxMb\": 2048, \"verify\": \"size\" }\n  }\n}\n```\n\nNote: `--no-cache` bypasses summary caching only (LLM output). Extract/transcript caches still apply. Use `--no-media-cache` to skip media files.\n\nPrecedence:\n\n1. `--model`\n2. `SUMMARIZE_MODEL`\n3. `~/.summarize/config.json`\n4. default (`auto`)\n\nTheme precedence:\n\n1. `--theme`\n2. `SUMMARIZE_THEME`\n3. `~/.summarize/config.json` (`ui.theme`)\n4. default (`aurora`)\n\nEnvironment variable precedence:\n\n1. process env\n2. `~/.summarize/config.json` (`env`)\n3. 
`~/.summarize/config.json` (`apiKeys`, legacy)\n\n### Environment variables\n\nSet the key matching your chosen `--model`:\n\n- Optional fallback defaults can be stored in config:\n  - `~/.summarize/config.json` -\u003e `\"env\": { \"OPENAI_API_KEY\": \"sk-...\" }`\n  - process env always takes precedence\n  - legacy `\"apiKeys\"` still works (mapped to env names)\n\n- `OPENAI_API_KEY` (for `openai/...`)\n- `NVIDIA_API_KEY` (for `nvidia/...`)\n- `MINIMAX_API_KEY` (for `minimax/...`)\n- `ANTHROPIC_API_KEY` (for `anthropic/...`)\n- `XAI_API_KEY` (for `xai/...`)\n- `Z_AI_API_KEY` (for `zai/...`; supports `ZAI_API_KEY` alias)\n- `GEMINI_API_KEY` (for `google/...`)\n  - also accepts `GOOGLE_GENERATIVE_AI_API_KEY` and `GOOGLE_API_KEY` as aliases\n\nOpenAI-compatible chat completions toggle:\n\n- `OPENAI_USE_CHAT_COMPLETIONS=1` (or set `openai.useChatCompletions` in config)\n\nUI theme:\n\n- `SUMMARIZE_THEME=aurora|ember|moss|mono`\n- `SUMMARIZE_TRUECOLOR=1` (force 24-bit ANSI)\n- `SUMMARIZE_NO_TRUECOLOR=1` (disable 24-bit ANSI)\n\nOpenRouter (OpenAI-compatible):\n\n- Set `OPENROUTER_API_KEY=...`\n- Prefer forcing OpenRouter per model id: `--model openrouter/\u003cauthor\u003e/\u003cslug\u003e`\n- Built-in preset: `--model free` (uses a default set of OpenRouter `:free` models)\n\n### `summarize refresh-free`\n\nQuick start: make free the default (keep `auto` available)\n\n```bash\nsummarize refresh-free --set-default\nsummarize \"https://example.com\"\nsummarize \"https://example.com\" --model auto\n```\n\nRegenerates the `free` preset (`models.free` in `~/.summarize/config.json`) by:\n\n- Fetching OpenRouter `/models`, filtering `:free`\n- Skipping models that look very small (\u003c27B by default) based on the model id/name\n- Testing which ones return non-empty text (concurrency 4, timeout 10s)\n- Picking a mix of smart-ish (bigger `context_length` / output cap) and fast models\n- Refining timings and writing the sorted list back\n\nIf `--model free` stops working, 
run:\n\n```bash\nsummarize refresh-free\n```\n\nFlags:\n\n- `--runs 2` (default): extra timing runs per selected model (total runs = 1 + runs)\n- `--smart 3` (default): how many smart-first picks (rest filled by fastest)\n- `--min-params 27b` (default): ignore models with inferred size smaller than N billion parameters\n- `--max-age-days 180` (default): ignore models older than N days (set 0 to disable)\n- `--set-default`: also sets `\"model\": \"free\"` in `~/.summarize/config.json`\n\nExample:\n\n```bash\nOPENROUTER_API_KEY=sk-or-... summarize \"https://example.com\" --model openrouter/meta-llama/llama-3.1-8b-instruct:free\nOPENROUTER_API_KEY=sk-or-... summarize \"https://example.com\" --model openrouter/minimax/minimax-m2.5\n```\n\nIf your OpenRouter account enforces an allowed-provider list, make sure at least one provider\nis allowed for the selected model. When routing fails, `summarize` prints the exact providers to allow.\n\nLegacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` (and either `OPENAI_API_KEY` or `OPENROUTER_API_KEY`) also works.\n\nNVIDIA API Catalog (OpenAI-compatible; free credits):\n\n- Set `NVIDIA_API_KEY=...`\n- Optional: `NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1`\n- Credits: API Catalog trial starts with 1000 free API credits on signup (up to 5000 total via “Request More” in the API Catalog profile)\n- Pick a model id from `/v1/models` (examples: fast `stepfun-ai/step-3.5-flash`, strong but slower `z-ai/glm5`)\n\n```bash\nexport NVIDIA_API_KEY=\"nvapi-...\"\nsummarize \"https://example.com\" --model nvidia/stepfun-ai/step-3.5-flash\n```\n\nZ.AI (OpenAI-compatible):\n\n- `Z_AI_API_KEY=...` (or `ZAI_API_KEY=...`)\n- Optional base URL override: `Z_AI_BASE_URL=...`\n\nMiniMax (OpenAI-compatible):\n\n- Set `MINIMAX_API_KEY=...`\n- Optional base URL override: `MINIMAX_BASE_URL=...` (default `https://api.minimax.io/v1`; use the\n  China endpoint or a proxy if needed)\n- Pick a MiniMax model id (e.g. 
`MiniMax-M3`, `MiniMax-M2.5`) using MiniMax's exact casing\n- Reasoning is requested through MiniMax's separated response fields and omitted from summary text\n\n```bash\nexport MINIMAX_API_KEY=\"...\"\nsummarize \"https://example.com\" --model minimax/MiniMax-M3\n```\n\nOptional services:\n\n- `FIRECRAWL_API_KEY` (website extraction fallback)\n- `YT_DLP_PATH` (path to yt-dlp binary for audio extraction)\n- `GROQ_API_KEY` (Groq Whisper transcription)\n- `ASSEMBLYAI_API_KEY` (AssemblyAI transcription)\n- `ELEVENLABS_API_KEY` (ElevenLabs Scribe v2 speaker diarization)\n- `GEMINI_API_KEY` / `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY` (Gemini transcription)\n- `OPENAI_API_KEY` / `OPENAI_WHISPER_BASE_URL` (OpenAI Whisper transcription)\n- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)\n- `APIFY_API_TOKEN` (YouTube transcript fallback)\n\n### Model limits\n\nThe CLI uses the LiteLLM model catalog for model limits (like max output tokens):\n\n- Downloaded from: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`\n- Cached at: `~/.summarize/cache/`\n\n### Library usage (optional)\n\nRecommended (minimal deps):\n\n- `@steipete/summarize-core/content`\n- `@steipete/summarize-core/prompts`\n\nCompatibility (pulls in CLI deps):\n\n- `@steipete/summarize/content`\n- `@steipete/summarize/prompts`\n\n### Development\n\n```bash\npnpm install\npnpm check\n```\n\n## More\n\n- Docs index: [docs/README.md](docs/README.md)\n- CLI providers and config: [docs/cli.md](docs/cli.md)\n- Auto model rules: [docs/model-auto.md](docs/model-auto.md)\n- Website extraction: [docs/website.md](docs/website.md)\n- YouTube handling: [docs/youtube.md](docs/youtube.md)\n- Media pipeline: [docs/media.md](docs/media.md)\n- Config schema and precedence: [docs/config.md](docs/config.md)\n\n## Troubleshooting\n\n- \"Receiving end does not exist\": Chrome did not inject the content script yet.\n  - Extension details -\u003e Site access -\u003e On 
all sites (or allow this domain)\n  - Reload the tab once.\n- \"Failed to fetch\" / daemon unreachable:\n  - `summarize daemon status`\n  - Logs: `~/.summarize/logs/daemon.err.log`\n\nLicense: MIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteipete%2Fsummarize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsteipete%2Fsummarize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteipete%2Fsummarize/lists"}