{"id":45167994,"url":"https://github.com/firede/agent-fetch","last_synced_at":"2026-02-22T11:57:12.575Z","repository":{"id":339149417,"uuid":"1160663913","full_name":"firede/agent-fetch","owner":"firede","description":"Fetch web pages as clean Markdown for AI-agent workflows.","archived":false,"fork":false,"pushed_at":"2026-02-19T08:43:56.000Z","size":69,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-20T09:33:53.014Z","etag":null,"topics":["agent-tools","cli","go","markdown","markdown-for-agents","web-fetch"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/firede.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-18T08:12:46.000Z","updated_at":"2026-02-19T08:43:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/firede/agent-fetch","commit_stats":null,"previous_names":["firede/agent-fetch"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/firede/agent-fetch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firede%2Fagent-fetch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firede%2Fagent-fetch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firede%2Fagent-fetch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firede%2Fagent-fetch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/firede","download_url":"https://codeload.github.com/firede/agent-fetch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firede%2Fagent-fetch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29711625,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-22T10:34:24.778Z","status":"ssl_error","status_checked_at":"2026-02-22T10:32:23.200Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-tools","cli","go","markdown","markdown-for-agents","web-fetch"],"created_at":"2026-02-20T07:30:57.456Z","updated_at":"2026-02-22T11:57:12.570Z","avatar_url":"https://github.com/firede.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# agent-fetch\n\nLocal Agent Material Pipeline for LLMs.\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)\n[![Go](https://github.com/firede/agent-fetch/actions/workflows/ci.yml/badge.svg)](https://github.com/firede/agent-fetch/actions/workflows/ci.yml)\n[![Release](https://img.shields.io/github/v/release/firede/agent-fetch)](https://github.com/firede/agent-fetch/releases)\u003cbr/\u003e\n![Local-only](https://img.shields.io/badge/Local--only-Yes-emerald)\n![Token-efficient](https://img.shields.io/badge/Token--efficient-Yes-emerald)\n![Agent-ready](https://img.shields.io/badge/Agent--ready-Yes-emerald)\n\n[中文](./README.zh.md) | **English**\n\n\u003c/div\u003e\n\n## Highlights\n\n- **Markdown-first output pipeline** -- readability extraction + HTML-to-Markdown conversion, so agents receive clean text instead of noisy HTML/JS/CSS\n- **Headless browser fallback** -- renders JavaScript-heavy pages (SPAs, dynamic dashboards) when static extraction falls short\n- **Custom request headers** -- send `Authorization`, `Cookie`, or any header to access authenticated endpoints\n- **Multi-URL batch fetching** -- fetch multiple pages concurrently with structured, per-URL output\n\n## Quick Start\n\n```bash\n# Install\ngo install github.com/firede/agent-fetch/cmd/agent-fetch@latest\n\n# Fetch a page\nagent-fetch https://example.com\n```\n\nOr download a prebuilt binary from [Releases](https://github.com/firede/agent-fetch/releases).\n\n## How It Works\n\nIn the default `auto` mode, agent-fetch runs a three-stage fallback pipeline:\n\n```\nRequest with Accept: text/markdown\n        |\n        v\n  Markdown response? --yes--\u003e Return as-is\n        | no\n        v\n  Static HTML extraction\n  + Markdown conversion --quality OK?--\u003e Return\n        | no\n        v\n  Headless browser render\n  + extraction + conversion --\u003e Return\n```\n\nThis means most pages are handled without a browser, keeping things fast, while JS-heavy pages still get rendered correctly.\n\n## Modes\n\n| Mode             | Behavior                                                                     | Browser needed                  |\n| ---------------- | ---------------------------------------------------------------------------- | ------------------------------- |\n| `auto` (default) | Three-stage fallback: native Markdown -\u003e static extraction -\u003e browser render | Only when static quality is low |\n| `static`         | Static HTML extraction only, no browser                                      | No                              |\n| `browser`        | Always use headless Chrome/Chromium                                          | Yes                             |\n| `raw`            | Send `Accept: text/markdown`, return HTTP body verbatim                      | No                              |\n\n## Installation\n\n### From Releases\n\n1. Download the archive for your platform from [GitHub Releases](https://github.com/firede/agent-fetch/releases).\n2. Extract and make the binary executable:\n\n```bash\nchmod +x ./agent-fetch\n```\n\n3. Move the binary to a directory on your `PATH`, or run it directly:\n\n```bash\n./agent-fetch https://example.com\n```\n\n#### macOS note\n\nRelease binaries are not yet notarized by Apple, so Gatekeeper may block execution. Remove the quarantine attribute to proceed:\n\n```bash\nxattr -dr com.apple.quarantine ./agent-fetch\n```\n\n### With Go\n\n```bash\ngo install github.com/firede/agent-fetch/cmd/agent-fetch@latest\n```\n\nInstall a specific version:\n\n```bash\ngo install github.com/firede/agent-fetch/cmd/agent-fetch@v0.3.0\n```\n\nEnsure `$(go env GOPATH)/bin` (usually `~/go/bin`) is in your `PATH`.\n\n## Usage\n\n```bash\nagent-fetch [options] \u003curl\u003e [url ...]\nagent-fetch web [options] \u003curl\u003e [url ...]\nagent-fetch doctor [options]\n```\n\n### Flags\n\n| Flag                | Default           | Description                                                                                                             |\n| ------------------- | ----------------- | ----------------------------------------------------------------------------------------------------------------------- |\n| `--mode`            | `auto`            | Fetch mode: `auto` \\| `static` \\| `browser` \\| `raw`                                                                    |\n| `--format`          | `markdown`        | Output format: `markdown` \\| `jsonl`                                                                                    |\n| `--meta`            | `true`            | Include `title`/`description` metadata (`markdown`: front matter, `jsonl`: `meta` field; use `--meta=false` to disable) |\n| `--timeout`         | `20s`             | HTTP request timeout (applies to static/auto modes)                                                                     |\n| `--browser-timeout` | `30s`             | Page-load timeout (applies to browser/auto modes)                                                                       |\n| `--network-idle`    | `1200ms`          | Wait time after last network activity before capturing content                                                          |\n| `--wait-selector`   |                   | CSS selector to wait for before capturing, e.g. `article`                                                               |\n| `--header`          |                   | Custom request header, repeatable. e.g. `--header 'Authorization: Bearer token'`                                        |\n| `--user-agent`      | `agent-fetch/0.1` | User-Agent header                                                                                                       |\n| `--max-body-bytes`  | `8388608`         | Max response bytes to read                                                                                              |\n| `--concurrency`     | `4`               | Max concurrent fetches for multi-URL requests                                                                           |\n| `--browser-path`    |                   | Browser executable path/name override for `browser` and `auto` modes                                                    |\n\n### Examples\n\n```bash\n# Default auto mode\nagent-fetch https://example.com\n\n# Force browser rendering for a JS-heavy page\nagent-fetch --mode browser --wait-selector 'article' https://example.com\n\n# Force a specific browser binary (useful in containers/custom installs)\nagent-fetch --mode browser --browser-path /usr/bin/chromium https://example.com\n\n# Static extraction without front matter\nagent-fetch --mode static --meta=false https://example.com\n\n# Get raw HTTP response body\nagent-fetch --mode raw https://example.com\n\n# Authenticated request\nagent-fetch --header \"Authorization: Bearer $TOKEN\" https://example.com\n\n# Batch fetch with concurrency control\nagent-fetch --concurrency 4 https://example.com https://example.org\n\n# Structured JSONL output\nagent-fetch --format jsonl https://example.com\n\n# Check environment readiness\nagent-fetch doctor\n\n# Check environment readiness with explicit browser path\nagent-fetch doctor --browser-path /usr/bin/chromium\n```\n\n## Multi-URL Batch (Markdown)\n\nWhen multiple URLs are provided, requests run concurrently (controlled by `--concurrency`) and output is emitted in input order using task markers:\n\n```text\n\u003c!-- count: 3, succeeded: 2, failed: 1 --\u003e\n\u003c!-- task[1]: https://example.com/hello --\u003e\n...markdown...\n\u003c!-- /task[1] --\u003e\n\u003c!-- task[2](failed): https://abc.com --\u003e\n\u003c!-- error[2]: ... --\u003e\n```\n\nExit codes: `0` all succeeded, `1` any task failed, `2` argument/usage error.\n\n## JSONL Output Contract\n\nWhen `--format jsonl` is used, each task emits one JSON line (no summary line):\n\n```json\n{\"seq\":1,\"url\":\"https://example.com\",\"resolved_mode\":\"static\",\"content\":\"...\",\"meta\":{\"title\":\"...\",\"description\":\"...\"}}\n{\"seq\":2,\"url\":\"https://bad.example\",\"error\":\"http request failed: timeout\"}\n```\n\nField notes:\n\n- `url`: input URL\n- `resolved_url`: emitted only when different from `url`\n- `resolved_mode`: one of `markdown`, `static`, `browser`, `raw`\n- `meta`: emitted only when `--meta=true` and metadata exists\n\n## Agent Integration\n\nThis project ships a [SKILL.md](./skills/agent-fetch/SKILL.md) that can be used with coding agents that support skill files. Point your skill directory to `skills/agent-fetch` and the agent will be able to invoke `agent-fetch` when its built-in fetch capability is insufficient.\n\n`agent-fetch` reads from the command line and writes results to stdout (`markdown` or `jsonl`), making it easy to integrate into any agent pipeline or shell-based tool call:\n\n```bash\nresult=$(agent-fetch --mode static https://example.com)\n```\n\n## When Do You Need This?\n\nThe table below compares agent-fetch with the built-in web-fetch capabilities found in some coding agents. Actual built-in capabilities vary by product and version.\n\n| Scenario                                           | Built-in web fetch |             agent-fetch             |\n| -------------------------------------------------- | :----------------: | :---------------------------------: |\n| Basic page fetch with HTML simplification          |        Yes         |                 Yes                 |\n| JavaScript-rendered pages (SPAs)                   |       Varies       |       Yes (headless browser)        |\n| Custom headers (auth, cookies)                     |       Varies       |          Yes (`--header`)           |\n| No AI summarization (outputs extracted body as-is) |       Varies       | Yes (subject to `--max-body-bytes`) |\n| Batch fetch multiple URLs concurrently             |       Varies       |        Yes (`--concurrency`)        |\n| CSS selector-based wait/extraction                 |       Varies       |       Yes (`--wait-selector`)       |\n| Works outside coding agents (CLI, CI/CD)           |        N/A         |        Yes (standalone CLI)         |\n\n**How built-in web fetch typically works:** Tools like Claude Code's WebFetch and Codex's built-in fetch retrieve a page over HTTP, convert the HTML to Markdown, and then pass the content through an AI model that may summarize or truncate it to fit the context window. This pipeline is fast and sufficient for most pages, but it usually does not execute JavaScript (so SPA or JS-rendered pages may return incomplete content), does not support custom request headers, and processes one URL at a time.\n\n- **No built-in web fetch available** (other agent frameworks, CLI pipelines, CI/CD) -- use agent-fetch as your primary fetch tool.\n- **Built-in web fetch available** -- use agent-fetch as a complement for JS-heavy pages, authenticated endpoints, batch fetching, or when you need the extracted content without summarization.\n\n## Runtime Dependencies\n\n`browser` and `auto` modes may require Chrome or Chromium on the host.\n\nUse `--mode static` or `--mode raw` to avoid the browser dependency entirely.\n\n- Run `agent-fetch doctor` to validate runtime/browser readiness and get guided fixes.\n- Use `--browser-path` when the browser is installed in a non-default location (common in container images).\n\n## Build\n\n```bash\ngo build -o agent-fetch ./cmd/agent-fetch\n```\n\n## License\n\nThis project is open-sourced under the [MIT License](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirede%2Fagent-fetch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffirede%2Fagent-fetch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirede%2Fagent-fetch/lists"}