{"id":35003790,"url":"https://github.com/steipete/summarize","last_synced_at":"2026-04-08T01:02:43.232Z","repository":{"id":329494833,"uuid":"1118209243","full_name":"steipete/summarize","owner":"steipete","description":"Point at any URL/YouTube/Podcast or file. Get the gist. CLI and Chrome Extension.","archived":false,"fork":false,"pushed_at":"2026-01-12T04:57:38.000Z","size":9348,"stargazers_count":737,"open_issues_count":14,"forks_count":47,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-01-12T05:56:06.210Z","etag":null,"topics":["ai","cli","summarize","typescript"],"latest_commit_sha":null,"homepage":"https://summarize.sh","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/steipete.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-12-17T12:19:51.000Z","updated_at":"2026-01-12T04:59:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/steipete/summarize","commit_stats":null,"previous_names":["steipete/summarize"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/steipete/summarize","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/steipete","download_url":"https://codeload.github.com/steipete/summarize/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/steipete%2Fsummarize/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28400397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-13T14:36:09.778Z","status":"ssl_error","status_checked_at":"2026-01-13T14:35:19.697Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cli","summarize","typescript"],"created_at":"2025-12-27T04:20:20.367Z","updated_at":"2026-04-08T01:02:43.211Z","avatar_url":"https://github.com/steipete.png","language":"TypeScript","readme":"# Summarize 📝 — Chrome Side Panel + CLI\n\nFast summaries from URLs, files, and media. Works in the terminal, a Chrome Side Panel and Firefox Sidebar.\n\n## Highlights\n\n- Chrome Side Panel **chat** (streaming agent + history) inside the sidebar.\n- **Video slides**: screenshots + OCR + transcript cards for YouTube, direct video URLs, and local video files.\n- Media-aware summaries: auto‑detect video/audio vs page content.\n- Coding CLI backends: Codex, Claude, Gemini, Cursor Agent, OpenClaw, OpenCode.\n- Streaming Markdown + metrics + cache‑aware status.\n- CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.\n\n## Feature overview\n\n- URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.\n- Slide extraction for video sources (YouTube, direct video URLs, local video files) with OCR + timestamped cards.\n- Transcript-first media flow: published transcripts when available, then Groq/ONNX/whisper.cpp/AssemblyAI/Gemini/OpenAI/FAL transcription fallback when not.\n- Coding CLI providers: Claude, Codex, Gemini, Cursor Agent, OpenClaw, OpenCode.\n- Streaming output with Markdown rendering, metrics, and cache-aware status.\n- Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.\n- Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.\n- Smart default: if content is shorter than the requested length, we return it as-is (use `--force-summary` to override).\n\n## Get the extension (recommended)\n\n![Summarize extension screenshot](docs/assets/summarize-extension.png)\n\nOne‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.\n\n**Chrome Web Store:** [Summarize Side Panel](https://chromewebstore.google.com/detail/summarize/cejgnmmhbbpdmjnfppjdfkocebngehfg)\n\nYouTube slide screenshots (from the browser):\n\n![Summarize YouTube slide screenshots](docs/assets/youtube-slides.png)\n\n### Beginner quickstart (extension)\n\n1. Install the CLI (choose one):\n   - **npm** (cross‑platform): `npm i -g @steipete/summarize`\n   - **Homebrew** (Homebrew/core): `brew install summarize`\n2. Install the extension (Chrome Web Store link above) and open the Side Panel.\n3. The panel shows a token + install command. Run it in Terminal:\n   - `summarize daemon install --token \u003cTOKEN\u003e`\n\nWhy a daemon/service?\n\n- The extension can’t run heavy extraction inside the browser. It talks to a local background service on `127.0.0.1` for fast streaming and media tools (yt‑dlp, ffmpeg, OCR, transcription).\n- The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.\n\nIf you only want the **CLI**, you can skip the daemon install entirely.\n\nNotes:\n\n- Summarization only runs when the Side Panel is open.\n- Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.\n- Daemon is localhost-only and requires a shared token; rerunning `summarize daemon install --token \u003cTOKEN\u003e` adds another paired browser token instead of invalidating the old one.\n- Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).\n- Windows containers: `summarize daemon install` starts the daemon for the current container session but does not register a Scheduled Task. Run it each time the container starts or add that command to your container startup, and publish port `8787` so the host browser can reach the daemon.\n- Tip: configure `free` via `summarize refresh-free` (needs `OPENROUTER_API_KEY`). Add `--set-default` to set model=`free`.\n\nMore:\n\n- Step-by-step install: [apps/chrome-extension/README.md](apps/chrome-extension/README.md)\n- Architecture + troubleshooting: [docs/chrome-extension.md](docs/chrome-extension.md)\n- Firefox compatibility notes: [apps/chrome-extension/docs/firefox.md](apps/chrome-extension/docs/firefox.md)\n\n### Slides (extension)\n\n- Select **Video + Slides** in the Summarize picker.\n- Slides render at the top; expand to full‑width cards with timestamps.\n- Click a slide to seek the video; toggle **Transcript/OCR** when OCR is significant.\n- Requirements: `yt-dlp` + `ffmpeg` for extraction; `tesseract` for OCR. Missing tools show an in‑panel notice.\n\n### Advanced (unpacked / dev)\n\n1. Build + load the extension (unpacked):\n   - Chrome: `pnpm -C apps/chrome-extension build`\n     - `chrome://extensions` → Developer mode → Load unpacked\n     - Pick: `apps/chrome-extension/.output/chrome-mv3`\n   - Firefox: `pnpm -C apps/chrome-extension build:firefox`\n     - `about:debugging#/runtime/this-firefox` → Load Temporary Add-on\n     - Pick: `apps/chrome-extension/.output/firefox-mv3/manifest.json`\n2. Open Side Panel/Sidebar → copy token.\n3. Install daemon in dev mode:\n   - `pnpm summarize daemon install --token \u003cTOKEN\u003e --dev`\n\n## CLI\n\n![Summarize CLI screenshot](docs/assets/summarize-cli.png)\n\n### Install\n\nRequires Node 22+.\n\n- npx (no install):\n\n```bash\nnpx -y @steipete/summarize \"https://example.com\"\n```\n\n- npm (global):\n\n```bash\nnpm i -g @steipete/summarize\n```\n\n- npm (library / minimal deps):\n\n```bash\nnpm i @steipete/summarize-core\n```\n\n```ts\nimport { createLinkPreviewClient } from \"@steipete/summarize-core/content\";\n```\n\n- Homebrew:\n\n```bash\nbrew install summarize\n```\n\nHomebrew ships from `homebrew/core` via `brew install summarize`.\nIf Homebrew is unavailable in your environment, use the npm global install above.\n\n### Optional local dependencies\n\nInstall these if you want media-heavy features:\n\n- `ffmpeg`: required for `--slides` and many local media/transcription flows\n- `yt-dlp`: required for YouTube slide extraction and some remote media flows\n- `tesseract`: optional OCR for `--slides-ocr`\n- Optional cloud transcription providers:\n  - `GROQ_API_KEY`\n  - `ASSEMBLYAI_API_KEY`\n  - `GEMINI_API_KEY` / `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY`\n  - `OPENAI_API_KEY`\n  - `FAL_KEY`\n\nmacOS (Homebrew):\n\n```bash\nbrew install ffmpeg yt-dlp\nbrew install tesseract # optional, for --slides-ocr\n```\n\nIf `--slides` is enabled and these tools are missing, Summarize warns and continues without slides.\n\n### CLI vs extension\n\n- **CLI only:** just install via npm/Homebrew and run `summarize ...` (no daemon needed).\n- **Chrome/Firefox extension:** install the CLI **and** run `summarize daemon install --token \u003cTOKEN\u003e` so the Side Panel can stream results and use local tools.\n\n### Quickstart\n\n```bash\nsummarize \"https://example.com\"\n```\n\n### Inputs\n\nURLs or local paths:\n\n```bash\nsummarize \"/path/to/file.pdf\" --model google/gemini-3-flash\nsummarize \"https://example.com/report.pdf\" --model google/gemini-3-flash\nsummarize \"/path/to/audio.mp3\"\nsummarize \"/path/to/video.mp4\"\n```\n\nStdin (pipe content using `-`):\n\n```bash\necho \"content\" | summarize -\npbpaste | summarize -\n# binary stdin also works (PDF/image/audio/video bytes)\ncat /path/to/file.pdf | summarize -\n```\n\n**Notes:**\n\n- Stdin has a 50MB size limit\n- The `-` argument tells summarize to read from standard input\n- Text stdin is treated as UTF-8 text (whitespace-only input is rejected as empty)\n- Binary stdin is preserved as raw bytes and file type is auto-detected when possible\n- Useful for piping clipboard content or command output\n\nYouTube (supports `youtube.com` and `youtu.be`):\n\n```bash\nsummarize \"https://youtu.be/dQw4w9WgXcQ\" --youtube auto\n```\n\nPodcast RSS (transcribes latest enclosure):\n\n```bash\nsummarize \"https://feeds.npr.org/500005/podcast.xml\"\n```\n\nApple Podcasts episode page:\n\n```bash\nsummarize \"https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432\"\n```\n\nSpotify episode page (best-effort; may fail for exclusives):\n\n```bash\nsummarize \"https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY\"\n```\n\nHLS playlist:\n\n```bash\nsummarize \"https://example.com/master.m3u8\"\n```\n\n### Output length\n\n`--length` controls how much output we ask for (guideline), not a hard cap.\n\nSet a default in `~/.summarize/config.json` with `output.length`.\n\n```bash\nsummarize \"https://example.com\" --length long\nsummarize \"https://example.com\" --length 20k\n```\n\n- Presets: `short|medium|long|xl|xxl`\n- Character targets: `1500`, `20k`, `20000`\n- Optional hard cap: `--max-output-tokens \u003ccount\u003e` (e.g. `2000`, `2k`)\n  - Provider/model APIs still enforce their own maximum output limits.\n  - If omitted, no max token parameter is sent (provider default).\n  - Prefer `--length` unless you need a hard cap.\n- Short content: when extracted content is shorter than the requested length, the CLI returns the content as-is.\n  - Override with `--force-summary` to always run the LLM.\n- Minimums: `--length` numeric values must be \u003e= 50 chars; `--max-output-tokens` must be \u003e= 16.\n- Preset targets (source of truth: `packages/core/src/prompts/summary-lengths.ts`):\n  - short: target ~900 chars (range 600-1,200)\n  - medium: target ~1,800 chars (range 1,200-2,500)\n  - long: target ~4,200 chars (range 2,500-6,000)\n  - xl: target ~9,000 chars (range 6,000-14,000)\n  - xxl: target ~17,000 chars (range 14,000-22,000)\n\n### What file types work?\n\nBest effort and provider-dependent. These usually work well:\n\n- `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, ...)\n  - Text-like files are inlined into the prompt for better provider compatibility.\n- PDFs: `application/pdf` (provider support varies; Google is the most reliable here)\n- Images: `image/jpeg`, `image/png`, `image/webp`, `image/gif`\n- Audio/Video: `audio/*`, `video/*` (local audio/video files MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM automatically transcribed, when supported by the model)\n\nNotes:\n\n- If a provider rejects a media type, the CLI fails fast with a friendly message.\n- xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.\n\n### Model ids\n\nUse gateway-style ids: `\u003cprovider\u003e/\u003cmodel\u003e`.\n\nExamples:\n\n- `openai/gpt-5.4`\n- `openai/gpt-5.4-mini`\n- `openai/gpt-5.4-nano`\n- `openai/gpt-5-mini`\n- `openai/gpt-5-nano`\n- `github-copilot/gpt-5.4`\n- `anthropic/claude-sonnet-4-5`\n- `xai/grok-4-fast-non-reasoning`\n- `google/gemini-3-flash`\n- `zai/glm-4.7`\n- `openrouter/openai/gpt-5-mini` (force OpenRouter)\n\nNote: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider).\n`gpt-5.4-mini` and `gpt-5.4-nano` are treated as real model ids; the same shorthand also works under `github-copilot/...`.\n\n### Limits\n\n- Text inputs over 10 MB are rejected before tokenization.\n- Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.\n\n### Common flags\n\n```bash\nsummarize \u003cinput\u003e [flags]\n```\n\nUse `summarize --help` or `summarize help` for the full help text.\n\n- `--model \u003cprovider/model\u003e`: which model to use (defaults to `auto`)\n- `--model auto`: automatic model selection + fallback (default)\n- `--model \u003cname\u003e`: use a config-defined model (see Configuration)\n- `--timeout \u003cduration\u003e`: `30s`, `2m`, `5000ms` (default `2m`)\n- `--retries \u003ccount\u003e`: LLM retry attempts on timeout (default `1`)\n- `--length short|medium|long|xl|xxl|s|m|l|\u003cchars\u003e`\n- `--language, --lang \u003clanguage\u003e`: output language (`auto` = match source)\n- `--max-output-tokens \u003ccount\u003e`: hard cap for LLM output tokens\n- `--cli [provider]`: use a CLI provider (`--model cli/\u003cprovider\u003e`). Supports `claude`, `gemini`, `codex`, `agent`, `openclaw`, `opencode`. If omitted, uses auto selection with CLI enabled.\n- `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)\n- `--plain`: keep raw output (no ANSI/OSC Markdown rendering)\n- `--no-color`: disable ANSI colors\n- `--theme \u003cname\u003e`: CLI theme (`aurora`, `ember`, `moss`, `mono`)\n- `--format md|text`: website/file content format (default `text`)\n- `--markdown-mode off|auto|llm|readability`: HTML -\u003e Markdown mode (default `readability`)\n- `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`)\n  - Install `uvx`: `brew install uv` (or https://astral.sh/uv/)\n- `--extract`: print extracted content and exit (URLs only; stdin `-` is not supported)\n  - Deprecated alias: `--extract-only`\n- `--slides`: extract slides for YouTube, direct video URLs, or local video files and render them inline in the summary narrative (auto-renders inline in supported terminals)\n- `--slides-ocr`: run OCR on extracted slides (requires `tesseract`)\n- `--slides-dir \u003cdir\u003e`: base output dir for slide images (default `./slides`)\n- `--slides-scene-threshold \u003cvalue\u003e`: scene detection threshold (0.1-1.0)\n- `--slides-max \u003ccount\u003e`: maximum slides to extract (default `6`)\n- `--slides-min-duration \u003cseconds\u003e`: minimum seconds between slides\n- `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary\n- `--verbose`: debug/diagnostics on stderr\n- `--metrics off|on|detailed`: metrics output (default `on`)\n\n### Coding CLIs (Codex, Claude, Gemini, Agent, OpenClaw, OpenCode)\n\nSummarize can use common coding CLIs as local model backends:\n\n- `codex` -\u003e `--cli codex` / `--model cli/codex/\u003cmodel\u003e`\n- `claude` -\u003e `--cli claude` / `--model cli/claude/\u003cmodel\u003e`\n- `gemini` -\u003e `--cli gemini` / `--model cli/gemini/\u003cmodel\u003e`\n- `agent` (Cursor Agent CLI) -\u003e `--cli agent` / `--model cli/agent/\u003cmodel\u003e`\n- `openclaw` -\u003e `--cli openclaw` / `--model cli/openclaw/\u003cmodel\u003e` or `--model openclaw/\u003cmodel\u003e`\n- `opencode` -\u003e `--cli opencode` / `--model cli/opencode/\u003cmodel\u003e` (`--model cli/opencode` uses the OpenCode runtime default)\n\nRequirements:\n\n- Binary installed and on `PATH` (or set `CODEX_PATH`, `CLAUDE_PATH`, `GEMINI_PATH`, `AGENT_PATH`, `OPENCLAW_PATH`, `OPENCODE_PATH`)\n- Provider authenticated (`codex login`, `claude auth`, `gemini` login flow, `agent login` or `CURSOR_API_KEY`, `opencode auth login`)\n\nQuick smoke test:\n\n```bash\nprintf \"Summarize CLI smoke input.\\nOne short paragraph. Reply can be brief.\\n\" \u003e/tmp/summarize-cli-smoke.txt\n\nsummarize --cli codex --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli claude --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli gemini --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli agent --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli openclaw --plain --timeout 2m /tmp/summarize-cli-smoke.txt\nsummarize --cli opencode --plain --timeout 2m /tmp/summarize-cli-smoke.txt\n```\n\nSet explicit CLI allowlist/order:\n\n```json\n{\n  \"cli\": { \"enabled\": [\"codex\", \"claude\", \"gemini\", \"agent\", \"openclaw\", \"opencode\"] }\n}\n```\n\nConfigure implicit auto CLI fallback:\n\n```json\n{\n  \"cli\": {\n    \"autoFallback\": {\n      \"enabled\": true,\n      \"onlyWhenNoApiKeys\": true,\n      \"order\": [\"claude\", \"gemini\", \"codex\", \"agent\", \"openclaw\", \"opencode\"]\n    }\n  }\n}\n```\n\nMore details: [`docs/cli.md`](docs/cli.md)\n\n### Auto model ordering\n\n`--model auto` builds candidate attempts from built-in rules (or your `model.rules` overrides).\nCLI attempts are prepended when:\n\n- `cli.enabled` is set (explicit allowlist/order), or\n- implicit auto selection is active and `cli.autoFallback` is enabled.\n\nDefault fallback behavior: only when no API keys are configured, order `claude, gemini, codex, agent, openclaw, opencode`, and remember/prioritize last successful provider (`~/.summarize/cli-state.json`).\n\nSet explicit CLI attempts:\n\n```json\n{\n  \"cli\": { \"enabled\": [\"gemini\"] }\n}\n```\n\nDisable implicit auto CLI fallback:\n\n```json\n{\n  \"cli\": { \"autoFallback\": { \"enabled\": false } }\n}\n```\n\nNote: explicit `--model auto` does not trigger implicit auto CLI fallback unless `cli.enabled` is set.\n\n### Website extraction (Firecrawl + Markdown)\n\nNon-YouTube URLs go through a fetch -\u003e extract pipeline. When direct fetch/extraction is blocked or too thin,\n`--firecrawl auto` can fall back to Firecrawl (if configured).\n\n- `--firecrawl off|auto|always` (default `auto`)\n- `--extract --format md|text` (default `text`; if `--format` is omitted, `--extract` defaults to `md` for non-YouTube URLs)\n- `--markdown-mode off|auto|llm|readability` (default `readability`)\n  - `auto`: use an LLM converter when configured; may fall back to `uvx markitdown`\n  - `llm`: force LLM conversion (requires a configured model key)\n  - `off`: disable LLM conversion (still may return Firecrawl Markdown when configured)\n- Plain-text mode: use `--format text`.\n\n### YouTube transcripts\n\n`--youtube auto` tries best-effort web transcript endpoints first. When captions are not available, it falls back to:\n\n1. Apify (if `APIFY_API_TOKEN` is set): uses a scraping actor (`faVsWy9VTSNVIhWpR`)\n2. yt-dlp + Whisper (if `yt-dlp` is available): downloads audio, then transcribes with local `whisper.cpp` when installed\n   (preferred), otherwise falls back to Groq (`GROQ_API_KEY`), AssemblyAI (`ASSEMBLYAI_API_KEY`), Gemini\n   (`GEMINI_API_KEY` / Google aliases), OpenAI (`OPENAI_API_KEY`), then FAL (`FAL_KEY`)\n\nEnvironment variables for yt-dlp mode:\n\n- `YT_DLP_PATH` - optional path to yt-dlp binary (otherwise `yt-dlp` is resolved via `PATH`)\n- `SUMMARIZE_WHISPER_CPP_MODEL_PATH` - optional override for the local `whisper.cpp` model file\n- `SUMMARIZE_WHISPER_CPP_BINARY` - optional override for the local binary (default: `whisper-cli`)\n- `SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1` - disable local whisper.cpp (force remote)\n- `GROQ_API_KEY` - Groq Whisper transcription\n- `ASSEMBLYAI_API_KEY` - AssemblyAI transcription\n- `GEMINI_API_KEY` - Gemini transcription (`GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY` also work)\n- `OPENAI_API_KEY` - OpenAI Whisper transcription\n- `OPENAI_WHISPER_BASE_URL` - optional OpenAI-compatible Whisper endpoint override\n- `FAL_KEY` - FAL AI Whisper fallback\n\nApify costs money but tends to be more reliable when captions exist.\n\n### Slide extraction (YouTube + direct video URLs + local video files)\n\nExtract slide screenshots (scene detection via `ffmpeg`) and optional OCR:\n\nRequirements:\n\n- `ffmpeg` for scene detection and frame extraction\n- `yt-dlp` for YouTube video download/stream resolution\n- `tesseract` only when using `--slides-ocr`\n\n```bash\nsummarize \"https://www.youtube.com/watch?v=...\" --slides\nsummarize \"https://www.youtube.com/watch?v=...\" --slides --slides-ocr\nsummarize \"/path/to/video.webm\" --slides\n```\n\nOutputs are written under `./slides/\u003csourceId\u003e/` (or `--slides-dir`). OCR results are included in JSON output\n(`--json`) and stored in `slides.json` inside the slide directory. When scene detection is too sparse, the\nextractor also samples at a fixed interval to improve coverage.\nWhen using `--slides`, supported terminals (kitty/iTerm/Konsole) render inline thumbnails automatically inside the\nsummary narrative (the model inserts `[slide:N]` markers). Timestamp links are clickable when the terminal supports\nOSC-8 (YouTube/Vimeo/Loom/Dropbox). If inline images are unsupported, Summarize prints a note with the on-disk\nslide directory. Local video files stay on the slide-aware path, transcribe in place, and avoid fake download labels.\n\nUse `--slides --extract` to print the full timed transcript and insert slide images inline at matching timestamps.\n\nFormat the extracted transcript as Markdown (headings + paragraphs) via an LLM:\n\n```bash\nsummarize \"https://www.youtube.com/watch?v=...\" --extract --format md --markdown-mode llm\n```\n\n### Media transcription (Whisper)\n\nLocal audio/video files are transcribed first, then summarized. `--video-mode transcript` forces\ndirect media URLs (and embedded media) through Whisper first. Prefers local `whisper.cpp` when available; otherwise requires\none of `GROQ_API_KEY`, `ASSEMBLYAI_API_KEY`, `GEMINI_API_KEY` (or Google aliases), `OPENAI_API_KEY`, or `FAL_KEY`.\n\n### Local ONNX transcription (Parakeet/Canary)\n\nSummarize can use NVIDIA Parakeet/Canary ONNX models via a local CLI you provide. Auto selection (default) prefers ONNX when configured.\n\n- Setup helper: `summarize transcriber setup`\n- Install `sherpa-onnx` from upstream binaries/build (Homebrew may not have a formula)\n- Auto selection: set `SUMMARIZE_ONNX_PARAKEET_CMD` or `SUMMARIZE_ONNX_CANARY_CMD` (no flag needed)\n- Force a model: `--transcriber parakeet|canary|whisper|auto`\n- Docs: `docs/nvidia-onnx-transcription.md`\n\n### Verified podcast services (2025-12-25)\n\nRun: `summarize \u003curl\u003e`\n\n- Apple Podcasts\n- Spotify\n- Amazon Music / Audible podcast pages\n- Podbean\n- Podchaser\n- RSS feeds (Podcasting 2.0 transcripts when available)\n- Embedded YouTube podcast pages (e.g. JREPodcast)\n\nTranscription: prefers local `whisper.cpp` when installed; otherwise uses Groq, AssemblyAI, Gemini, OpenAI, or FAL when keys are set.\n\n### Translation paths\n\n`--language/--lang` controls the output language of the summary (and other LLM-generated text). Default is `auto`.\n\nWhen the input is audio/video, the CLI needs a transcript first. The transcript comes from one of these paths:\n\n1. Existing transcript (preferred)\n   - YouTube: uses `youtubei` / `captionTracks` when available.\n   - Podcasts: uses Podcasting 2.0 RSS `\u003cpodcast:transcript\u003e` (JSON/VTT) when the feed publishes it.\n2. Whisper transcription (fallback)\n   - YouTube: falls back to yt-dlp (audio download) + Whisper transcription when configured; Apify is a last resort.\n   - Prefers local `whisper.cpp` when installed + model available.\n   - Otherwise uses cloud transcription in this order: Groq (`GROQ_API_KEY`) → AssemblyAI (`ASSEMBLYAI_API_KEY`) → Gemini (`GEMINI_API_KEY` / Google aliases) → OpenAI (`OPENAI_API_KEY`) → FAL (`FAL_KEY`).\n\nFor direct media URLs, use `--video-mode transcript` to force transcribe -\u003e summarize:\n\n```bash\nsummarize https://example.com/file.mp4 --video-mode transcript --lang en\n```\n\n### Configuration\n\nSingle config location:\n\n- `~/.summarize/config.json`\n\nSupported keys today:\n\n```json\n{\n  \"model\": { \"id\": \"openai/gpt-5-mini\" },\n  \"env\": { \"OPENAI_API_KEY\": \"sk-...\" },\n  \"output\": { \"length\": \"long\" },\n  \"ui\": { \"theme\": \"ember\" }\n}\n```\n\nShorthand (equivalent):\n\n```json\n{\n  \"model\": \"openai/gpt-5-mini\"\n}\n```\n\nAlso supported:\n\n- `model: { \"mode\": \"auto\" }` (automatic model selection + fallback; see [docs/model-auto.md](docs/model-auto.md))\n- `model.rules` (customize candidates / ordering)\n- `models` (define presets selectable via `--model \u003cpreset\u003e`)\n- `env` (generic env var defaults; process env still wins)\n- `apiKeys` (legacy shortcut, mapped to env names; prefer `env` for new configs)\n- `output.length` (default `--length`: `short|medium|long|xl|xxl|20k`)\n- `cache.media` (media download cache: TTL 7 days, 2048 MB cap by default; `--no-media-cache` disables)\n- `media.videoMode: \"auto\"|\"transcript\"|\"understand\"`\n- `slides.enabled` / `slides.max` / `slides.ocr` / `slides.dir` (defaults for `--slides`)\n- `ui.theme: \"aurora\"|\"ember\"|\"moss\"|\"mono\"`\n- `openai.useChatCompletions: true` (force OpenAI-compatible chat completions)\n\nNote: the config is parsed leniently (JSON5), but comments are not allowed. Unknown keys are ignored.\n\nMedia cache defaults:\n\n```json\n{\n  \"cache\": {\n    \"media\": { \"enabled\": true, \"ttlDays\": 7, \"maxMb\": 2048, \"verify\": \"size\" }\n  }\n}\n```\n\nNote: `--no-cache` bypasses summary caching only (LLM output). Extract/transcript caches still apply. Use `--no-media-cache` to skip media files.\n\nPrecedence:\n\n1. `--model`\n2. `SUMMARIZE_MODEL`\n3. `~/.summarize/config.json`\n4. default (`auto`)\n\nTheme precedence:\n\n1. `--theme`\n2. `SUMMARIZE_THEME`\n3. `~/.summarize/config.json` (`ui.theme`)\n4. default (`aurora`)\n\nEnvironment variable precedence:\n\n1. process env\n2. `~/.summarize/config.json` (`env`)\n3. `~/.summarize/config.json` (`apiKeys`, legacy)\n\n### Environment variables\n\nSet the key matching your chosen `--model`:\n\n- Optional fallback defaults can be stored in config:\n  - `~/.summarize/config.json` -\u003e `\"env\": { \"OPENAI_API_KEY\": \"sk-...\" }`\n  - process env always takes precedence\n  - legacy `\"apiKeys\"` still works (mapped to env names)\n\n- `OPENAI_API_KEY` (for `openai/...`)\n- `NVIDIA_API_KEY` (for `nvidia/...`)\n- `ANTHROPIC_API_KEY` (for `anthropic/...`)\n- `XAI_API_KEY` (for `xai/...`)\n- `Z_AI_API_KEY` (for `zai/...`; supports `ZAI_API_KEY` alias)\n- `GEMINI_API_KEY` (for `google/...`)\n  - also accepts `GOOGLE_GENERATIVE_AI_API_KEY` and `GOOGLE_API_KEY` as aliases\n\nOpenAI-compatible chat completions toggle:\n\n- `OPENAI_USE_CHAT_COMPLETIONS=1` (or set `openai.useChatCompletions` in config)\n\nUI theme:\n\n- `SUMMARIZE_THEME=aurora|ember|moss|mono`\n- `SUMMARIZE_TRUECOLOR=1` (force 24-bit ANSI)\n- `SUMMARIZE_NO_TRUECOLOR=1` (disable 24-bit ANSI)\n\nOpenRouter (OpenAI-compatible):\n\n- Set `OPENROUTER_API_KEY=...`\n- Prefer forcing OpenRouter per model id: `--model openrouter/\u003cauthor\u003e/\u003cslug\u003e`\n- Built-in preset: `--model free` (uses a default set of OpenRouter `:free` models)\n\n### `summarize refresh-free`\n\nQuick start: make free the default (keep `auto` available)\n\n```bash\nsummarize refresh-free --set-default\nsummarize \"https://example.com\"\nsummarize \"https://example.com\" --model auto\n```\n\nRegenerates the `free` preset (`models.free` in `~/.summarize/config.json`) by:\n\n- Fetching OpenRouter `/models`, filtering `:free`\n- Skipping models that look very small (\u003c27B by default) based on the model id/name\n- Testing which ones return non-empty text (concurrency 4, timeout 10s)\n- Picking a mix of smart-ish (bigger `context_length` / output cap) and fast models\n- Refining timings and writing the sorted list back\n\nIf `--model free` stops working, run:\n\n```bash\nsummarize refresh-free\n```\n\nFlags:\n\n- `--runs 2` (default): extra timing runs per selected model (total runs = 1 + runs)\n- `--smart 3` (default): how many smart-first picks (rest filled by fastest)\n- `--min-params 27b` (default): ignore models with inferred size smaller than N billion parameters\n- `--max-age-days 180` (default): ignore models older than N days (set 0 to disable)\n- `--set-default`: also sets `\"model\": \"free\"` in `~/.summarize/config.json`\n\nExample:\n\n```bash\nOPENROUTER_API_KEY=sk-or-... summarize \"https://example.com\" --model openrouter/meta-llama/llama-3.1-8b-instruct:free\nOPENROUTER_API_KEY=sk-or-... summarize \"https://example.com\" --model openrouter/minimax/minimax-m2.5\n```\n\nIf your OpenRouter account enforces an allowed-provider list, make sure at least one provider\nis allowed for the selected model. When routing fails, `summarize` prints the exact providers to allow.\n\nLegacy: `OPENAI_BASE_URL=https://openrouter.ai/api/v1` (and either `OPENAI_API_KEY` or `OPENROUTER_API_KEY`) also works.\n\nNVIDIA API Catalog (OpenAI-compatible; free credits):\n\n- Set `NVIDIA_API_KEY=...`\n- Optional: `NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1`\n- Credits: API Catalog trial starts with 1000 free API credits on signup (up to 5000 total via “Request More” in the API Catalog profile)\n- Pick a model id from `/v1/models` (examples: fast `stepfun-ai/step-3.5-flash`, strong but slower `z-ai/glm5`)\n\n```bash\nexport NVIDIA_API_KEY=\"nvapi-...\"\nsummarize \"https://example.com\" --model nvidia/stepfun-ai/step-3.5-flash\n```\n\nZ.AI (OpenAI-compatible):\n\n- `Z_AI_API_KEY=...` (or `ZAI_API_KEY=...`)\n- Optional base URL override: `Z_AI_BASE_URL=...`\n\nOptional services:\n\n- `FIRECRAWL_API_KEY` (website extraction fallback)\n- `YT_DLP_PATH` (path to yt-dlp binary for audio extraction)\n- `GROQ_API_KEY` (Groq Whisper transcription)\n- `ASSEMBLYAI_API_KEY` (AssemblyAI transcription)\n- `GEMINI_API_KEY` / `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY` (Gemini transcription)\n- `OPENAI_API_KEY` / `OPENAI_WHISPER_BASE_URL` (OpenAI Whisper transcription)\n- `FAL_KEY` (FAL AI API key for audio transcription via Whisper)\n- `APIFY_API_TOKEN` (YouTube transcript fallback)\n\n### Model limits\n\nThe CLI uses the LiteLLM model catalog for model limits (like max output tokens):\n\n- Downloaded from: `https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json`\n- Cached at: `~/.summarize/cache/`\n\n### Library usage (optional)\n\nRecommended (minimal deps):\n\n- `@steipete/summarize-core/content`\n- `@steipete/summarize-core/prompts`\n\nCompatibility (pulls in CLI deps):\n\n- `@steipete/summarize/content`\n- `@steipete/summarize/prompts`\n\n### Development\n\n```bash\npnpm install\npnpm check\n```\n\n## More\n\n- Docs index: [docs/README.md](docs/README.md)\n- CLI providers and config: [docs/cli.md](docs/cli.md)\n- Auto model rules: [docs/model-auto.md](docs/model-auto.md)\n- Website extraction: [docs/website.md](docs/website.md)\n- YouTube handling: [docs/youtube.md](docs/youtube.md)\n- Media pipeline: [docs/media.md](docs/media.md)\n- Config schema and precedence: [docs/config.md](docs/config.md)\n\n## Troubleshooting\n\n- \"Receiving end does not exist\": Chrome did not inject the content script yet.\n  - Extension details -\u003e Site access -\u003e On all sites (or allow this domain)\n  - Reload the tab once.\n- \"Failed to fetch\" / daemon unreachable:\n  - `summarize daemon status`\n  - Logs: `~/.summarize/logs/daemon.err.log`\n\nLicense: MIT\n","funding_links":[],"categories":["🛠️ Developer Tools","Repos","TypeScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteipete%2Fsummarize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsteipete%2Fsummarize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsteipete%2Fsummarize/lists"}