{"id":51026910,"url":"https://github.com/gitstq/webpilot-cli","last_synced_at":"2026-06-21T20:02:31.124Z","repository":{"id":355342803,"uuid":"1227723872","full_name":"gitstq/webpilot-cli","owner":"gitstq","description":"🌐 WebPilot-CLI - Lightweight AI Browser Automation CLI Tool | YAML-driven workflows, smart content extraction, multi-format output, zero dependencies, Python 3.8+","archived":false,"fork":false,"pushed_at":"2026-05-03T04:51:14.000Z","size":44,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-03T06:35:15.180Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/gitstq/webpilot-cli","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gitstq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-03T04:33:47.000Z","updated_at":"2026-05-03T04:51:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gitstq/webpilot-cli","commit_stats":null,"previous_names":["gitstq/webpilot-cli"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/gitstq/webpilot-cli","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fwebpilot-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fwebpilot-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fwe
bpilot-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fwebpilot-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gitstq","download_url":"https://codeload.github.com/gitstq/webpilot-cli/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fwebpilot-cli/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34623906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-21T20:02:31.015Z","updated_at":"2026-06-21T20:02:31.108Z","avatar_url":"https://github.com/gitstq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"#简体中文\"\u003e简体中文\u003c/a\u003e |\n  \u003ca href=\"#繁體中文\"\u003e繁體中文\u003c/a\u003e |\n  \u003ca href=\"#english\"\u003eEnglish\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n\u003ca id=\"简体中文\"\u003e\u003c/a\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch1\u003e🎉 WebPilot-CLI — 轻量级 AI 浏览器自动化命令行工具\u003c/h1\u003e\u003c/summary\u003e\n\n\u003e 🚀 零外部依赖 · YAML 工作流驱动 · 智能内容提取 · 终端原生体验\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/version-v1.0.0-blue\" 
alt=\"Version\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-3.8%2B-green\" alt=\"Python 3.8+\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-orange\" alt=\"License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/dependencies-zero-brightgreen\" alt=\"Zero Dependencies\"\u003e\n\u003c/p\u003e\n\n---\n\n## 🎉 项目介绍\n\n**WebPilot-CLI** 是一款专为 AI Agent 和开发者打造的轻量级浏览器自动化命令行工具。它完全基于 Python 标准库构建，**零外部依赖**，开箱即用。\n\n### 🎯 项目定位\n\n在 AI Agent 日益普及的今天，浏览器自动化是 Agent 与 Web 世界交互的核心能力。然而，现有的自动化工具要么体积庞大（Selenium、Playwright），要么依赖复杂（需要浏览器驱动），要么功能单一（只能抓取、不能编排）。**WebPilot-CLI** 的定位是：\n\n\u003e **一个足够轻、足够智能、足够灵活的终端浏览器自动化瑞士军刀。**\n\n### 💡 解决的痛点\n\n| 痛点 | WebPilot-CLI 的方案 |\n|------|-------------------|\n| 现有工具依赖沉重，安装配置繁琐 | 纯 Python 标准库实现，`pip install` 即可使用 |\n| 网页抓取结果充满噪声（导航栏、广告、页脚） | 智能噪声过滤引擎，自动识别并过滤无关内容 |\n| 自动化操作需要写大量代码 | YAML 工作流引擎，用声明式配置替代命令式编程 |\n| 缺少终端原生的可视化方案 | ASCII 艺术截图 + HTML Canvas 截图，终端也能\"看到\"网页 |\n| 多步骤操作难以编排和复用 | 变量传递、条件分支、循环控制，完整的流程编排能力 |\n\n### 🌟 差异化亮点\n\n- 🪶 **极致轻量**：零外部依赖，整个工具包仅使用 Python 标准库（`urllib`、`html.parser`、`http.cookiejar`）\n- 🧠 **智能提取**：内置噪声识别引擎，自动过滤导航栏、侧边栏、广告、页脚等无关内容\n- 📝 **YAML 驱动**：用声明式 YAML 定义复杂的浏览器操作流程，支持变量传递与条件分支\n- 🖥️ **终端友好**：ASCII 艺术截图让你在终端中也能\"看到\"网页布局\n- 🔄 **会话管理**：内置 Cookie/Session 管理，轻松处理需要登录的场景\n- 📤 **多格式输出**：支持 JSON、Markdown、纯文本三种输出格式，适配不同下游消费场景\n\n---\n\n## ✨ 核心特性\n\n- 🌐 **网页浏览** — 一条命令获取网页内容，自动提取标题、正文、链接、图片\n- 📸 **智能截图** — 支持 ASCII 艺术截图和 HTML Canvas 截图两种模式\n- 🔍 **内容提取** — 智能过滤噪声，提取结构化内容（标题/描述/正文/链接/图片）\n- ⚙️ **YAML 工作流** — 声明式定义多步骤自动化流程，支持 navigate/extract/screenshot/wait/condition/loop\n- 🔄 **变量传递** — 步骤间通过 `${var}` 语法传递数据，支持条件分支和循环\n- 🍪 **会话管理** — 自动管理 Cookie 和 Session，支持跨请求状态保持\n- 🖥️ **交互模式** — 内置 REPL 交互式浏览器会话，实时探索网页\n- 📤 **多格式输出** — JSON / Markdown / 纯文本，适配管道和脚本集成\n- 🎨 **彩色终端** — 丰富的 ANSI 彩色输出，提升终端阅读体验\n- 📊 **进度展示** — 工作流执行时显示实时进度条\n- 🛡️ **健壮性** — 自动重试、超时控制、编码检测、错误恢复\n- 🪶 **零依赖** — 完全基于 Python 标准库，无需安装任何第三方包\n\n---\n\n## 🚀 快速开始\n\n### 环境要求\n\n- Python 3.8 或更高版本\n- 网络连接（用于访问目标网页）\n- 
无需安装浏览器或浏览器驱动\n\n### 安装\n\n```bash\n# 方式一：从 GitHub 直接安装（推荐）\npip install git+https://github.com/gitstq/webpilot-cli.git\n\n# 方式二：从源码安装\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npip install -e .\n```\n\n### 快速体验\n\n```bash\n# 浏览网页，查看结构化内容\nwebpilot browse https://example.com\n\n# 以 JSON 格式输出\nwebpilot browse https://example.com --output json\n\n# 截取网页 ASCII 截图\nwebpilot screenshot https://example.com -o screenshot.txt -f ascii\n\n# 截取网页 HTML 截图（可在浏览器中打开）\nwebpilot screenshot https://example.com -o screenshot.html -f html\n\n# 提取网页结构化内容\nwebpilot extract https://example.com --fields title description text\n\n# 执行 YAML 工作流\nwebpilot run workflow.yaml\n\n# 启动交互式浏览器会话\nwebpilot interactive\n```\n\n---\n\n## 📖 详细使用指南\n\n### 全局选项\n\n所有子命令均支持以下全局选项：\n\n| 选项 | 缩写 | 说明 | 默认值 |\n|------|------|------|--------|\n| `--output` | `-o` | 输出格式：`json` / `markdown` / `text` | `markdown` |\n| `--no-color` | | 禁用彩色终端输出 | 关闭 |\n| `--verbose` | `-v` | 启用详细输出模式 | 关闭 |\n| `--version` | `-V` | 显示版本号 | — |\n\n### `browse` — 浏览网页\n\n获取并展示网页的结构化内容，自动过滤噪声。\n\n```bash\n# 基本用法\nwebpilot browse \u003curl\u003e\n\n# 指定超时时间和输出格式\nwebpilot browse https://example.com --timeout 60 --output json\n\n# 启用详细模式\nwebpilot browse https://example.com -v\n```\n\n**参数说明：**\n\n| 参数 | 说明 | 默认值 |\n|------|------|--------|\n| `url` | 目标网页 URL（必填） | — |\n| `--timeout` / `-t` | 请求超时时间（秒） | 30 |\n\n### `screenshot` — 网页截图\n\n支持两种截图模式：终端友好的 ASCII 艺术截图和可在浏览器中打开的 HTML Canvas 截图。\n\n```bash\n# HTML 截图（默认，可在浏览器中打开查看）\nwebpilot screenshot https://example.com -o page.html\n\n# ASCII 截图（终端友好）\nwebpilot screenshot https://example.com -f ascii -o page.txt\n\n# 自定义终端宽度\nwebpilot screenshot https://example.com -f ascii --width 120\n```\n\n**参数说明：**\n\n| 参数 | 说明 | 默认值 |\n|------|------|--------|\n| `url` | 目标网页 URL（必填） | — |\n| `--output-file` / `-o` | 输出文件路径 | `screenshot.html` |\n| `--format` / `-f` | 截图格式：`ascii` / `html` | `html` |\n| `--width` | ASCII 截图的终端宽度（字符数） | 80 |\n\n### `extract` — 
提取结构化内容\n\n从网页中提取结构化数据，支持按字段筛选。\n\n```bash\n# 提取所有内容\nwebpilot extract https://example.com\n\n# 仅提取标题和描述\nwebpilot extract https://example.com --fields title description\n\n# 仅提取链接和图片\nwebpilot extract https://example.com --fields links images\n\n# JSON 格式输出，方便程序处理\nwebpilot extract https://example.com --output json\n```\n\n**参数说明：**\n\n| 参数 | 说明 | 默认值 |\n|------|------|--------|\n| `url` | 目标网页 URL（必填） | — |\n| `--fields` | 提取字段：`title` / `description` / `text` / `links` / `images`（可多选） | 全部 |\n| `--timeout` / `-t` | 请求超时时间（秒） | 30 |\n\n### `run` — 执行 YAML 工作流\n\n通过 YAML 文件定义和执行多步骤浏览器自动化流程。\n\n```bash\n# 执行工作流\nwebpilot run workflow.yaml\n\n# 导出执行结果\nwebpilot run workflow.yaml --export result.json\n\n# 详细模式\nwebpilot run workflow.yaml -v\n```\n\n**参数说明：**\n\n| 参数 | 说明 | 默认值 |\n|------|------|--------|\n| `workflow` | YAML 工作流文件路径（必填） | — |\n| `--export` / `-e` | 将结果导出为 JSON 文件 | 不导出 |\n\n#### YAML 工作流示例\n\n```yaml\nname: daily_news_collector\ndescription: 每日新闻采集工作流\n\n# 全局变量\nvars:\n  base_url: \"https://news.example.com\"\n  output_dir: \"output\"\n\n# 遇到错误是否停止\nstop_on_error: true\n\nsteps:\n  # 第一步：导航到目标页面\n  - name: \"打开新闻首页\"\n    type: navigate\n    url: \"${base_url}\"\n    save: page_info\n\n  # 第二步：等待页面加载\n  - name: \"等待加载\"\n    type: wait\n    seconds: 2\n\n  # 第三步：提取页面内容\n  - name: \"提取新闻内容\"\n    type: extract\n    fields:\n      - title\n      - description\n      - text\n      - links\n    save: news_data\n\n  # 第四步：截取页面截图\n  - name: \"保存截图\"\n    type: screenshot\n    output: \"${output_dir}/news_screenshot.html\"\n    format: html\n\n  # 第五步：条件判断\n  - name: \"检查标题是否存在\"\n    type: condition\n    variable: page_info\n    operator: exists\n    then:\n      - name: \"标题提取成功\"\n        type: set_variable\n        value: \"页面标题已成功提取\"\n        save: status_message\n    else:\n      - name: \"标题提取失败\"\n        type: set_variable\n        value: \"未找到页面标题\"\n        save: status_message\n\n  # 第六步：循环处理\n  - name: \"批量处理\"\n    type: loop\n    count: 3\n  
  index_var: iteration\n    steps:\n      - name: \"处理第 ${iteration} 批\"\n        type: set_variable\n        value: \"正在处理第 ${iteration} 批数据\"\n        save: batch_status\n```\n\n#### 支持的步骤类型\n\n| 步骤类型 | 说明 | 关键参数 |\n|----------|------|----------|\n| `navigate` | 导航到指定 URL | `url` |\n| `extract` | 提取当前页面内容 | `fields`（可选） |\n| `screenshot` | 截取当前页面截图 | `output`, `format` |\n| `wait` | 等待指定秒数 | `seconds` |\n| `condition` | 条件分支 | `variable`, `operator`, `then`, `else` |\n| `loop` | 循环执行子步骤 | `count`, `index_var`, `steps` |\n| `set_variable` | 设置工作流变量 | `value` |\n\n#### 支持的条件运算符\n\n| 运算符 | 说明 |\n|--------|------|\n| `exists` | 变量是否存在 |\n| `equals` | 等于指定值 |\n| `not_equals` | 不等于指定值 |\n| `contains` | 包含指定值 |\n| `greater_than` | 大于指定值 |\n| `less_than` | 小于指定值 |\n| `is_true` | 布尔值为真 |\n| `is_false` | 布尔值为假 |\n\n### `interactive` — 交互式浏览器会话\n\n启动一个 REPL（读取-求值-输出循环）交互式会话，实时探索网页。\n\n```bash\n# 启动交互模式\nwebpilot interactive\n\n# 带初始 URL 启动\nwebpilot interactive --url https://example.com\n```\n\n**交互模式内置命令：**\n\n| 命令 | 说明 |\n|------|------|\n| `browse \u003curl\u003e` | 导航到指定 URL |\n| `extract` | 提取当前页面结构化内容 |\n| `screenshot [path]` | 保存截图（默认 `screenshot.html`） |\n| `info` | 显示当前页面信息 |\n| `ascii` | 显示 ASCII 截图 |\n| `title` | 显示页面标题 |\n| `links` | 列出当前页面所有链接 |\n| `images` | 列出当前页面所有图片 |\n| `cookies` | 显示当前 Cookie |\n| `help` | 显示帮助信息 |\n| `quit` / `exit` / `q` | 退出交互模式 |\n\n---\n\n## 💡 设计思路与迭代规划\n\n### 设计理念\n\nWebPilot-CLI 的设计遵循以下核心理念：\n\n1. **极简主义（Minimalism）**：不引入任何外部依赖，用最少的代码实现最多的功能。Python 标准库已经足够强大，`urllib` 处理网络请求，`html.parser` 解析 HTML，`http.cookiejar` 管理会话——我们不需要更多。\n\n2. **声明式优先（Declarative First）**：能用 YAML 配置表达的，就不需要写 Python 代码。工作流引擎让非程序员也能定义复杂的浏览器操作流程。\n\n3. **终端原生（Terminal Native）**：作为命令行工具，终端就是我们的主场。ASCII 截图、彩色输出、进度条——让终端体验不逊色于 GUI。\n\n4. 
**AI Agent 友好（Agent Friendly）**：结构化的 JSON 输出、可编程的工作流引擎、清晰的状态管理——每一个设计决策都考虑了 AI Agent 的集成需求。\n\n### 技术选型原因\n\n| 技术选择 | 原因 |\n|----------|------|\n| `urllib` 而非 `requests` | 零依赖，标准库自带，满足基本 HTTP 需求 |\n| `html.parser` 而非 `BeautifulSoup` | 零依赖，标准库自带，性能可控 |\n| `http.cookiejar` 而非 `requests.Session` | 零依赖，原生支持 Cookie 持久化 |\n| YAML 工作流 而非 Python 脚本 | 声明式更易读、更易维护、更易被 AI 生成 |\n| ASCII 截图 而非 PNG 截图 | 无需额外依赖（如 Pillow），终端原生展示 |\n\n### 后续规划\n\n- [ ] 🔌 **插件系统**：支持自定义提取器和输出格式的插件机制\n- [ ] 🗄️ **结果持久化**：支持将提取结果保存到 SQLite / CSV\n- [ ] 🔄 **增量抓取**：基于 ETag / Last-Modified 的增量内容更新\n- [ ] 📡 **API 模式**：内置 HTTP 服务器，提供 RESTful API 接口\n- [ ] 🧪 **断言引擎**：工作流中支持页面内容断言，用于监控和测试\n- [ ] 📊 **报告生成**：自动生成工作流执行报告（HTML / PDF）\n- [ ] 🌐 **代理支持**：内置 HTTP/SOCKS 代理配置\n- [ ] 📦 **批量模式**：支持从文件读取 URL 列表进行批量处理\n- [ ] 🤖 **MCP 集成**：作为 MCP（Model Context Protocol）服务器运行\n\n---\n\n## 📦 安装与部署指南\n\n### 系统要求\n\n- **操作系统**：Windows / macOS / Linux\n- **Python 版本**：3.8、3.9、3.10、3.11、3.12\n- **磁盘空间**：约 1 MB（源码）\n- **网络**：需要能访问目标网站\n\n### 安装方式\n\n```bash\n# 方式一：从 GitHub 直接安装（推荐）\npip install git+https://github.com/gitstq/webpilot-cli.git\n\n# 方式二：克隆仓库后以开发模式安装\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npip install -e .\n\n# 方式三：克隆仓库后直接使用（无需安装）\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npython -m webpilot.cli browse https://example.com\n```\n\n### 验证安装\n\n```bash\n# 查看版本号\nwebpilot --version\n\n# 查看帮助信息\nwebpilot --help\n\n# 快速测试\nwebpilot browse https://example.com\n```\n\n### 卸载\n\n```bash\npip uninstall webpilot-cli\n```\n\n---\n\n## 🤝 贡献指南\n\n我们欢迎并感谢所有形式的贡献！无论是提交 Bug 报告、改进文档，还是提交代码 Pull Request。\n\n### 如何贡献\n\n1. **Fork** 本仓库\n2. 创建你的特性分支：`git checkout -b feature/amazing-feature`\n3. 提交你的改动：`git commit -m 'Add some amazing feature'`\n4. 推送到分支：`git push origin feature/amazing-feature`\n5. 
提交 **Pull Request**\n\n### 开发环境搭建\n\n```bash\n# 克隆仓库\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\n\n# 以开发模式安装\npip install -e .\n\n# 运行测试\npython -m pytest tests/\n\n# 运行特定测试\npython -m pytest tests/test_extractor.py -v\n```\n\n### 代码规范\n\n- 遵循 PEP 8 编码规范\n- 为所有公开函数编写文档字符串\n- 确保所有测试通过后再提交 PR\n- 提交信息使用清晰、描述性的语言\n\n### 提交 Issue\n\n在提交 Issue 之前，请：\n\n1. 搜索已有的 Issues，避免重复提交\n2. 提供复现步骤和期望行为\n3. 附上运行环境信息（Python 版本、操作系统等）\n\n---\n\n## 📄 开源协议\n\n本项目基于 [MIT License](LICENSE) 开源。\n\n```\nMIT License\n\nCopyright (c) 2024 WebPilot Team\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\n```\n\n---\n\n\u003cp align=\"center\"\u003e\n  用 ❤️ 和 Python 标准库构建 · \u003ca href=\"https://github.com/gitstq/webpilot-cli\"\u003eGitHub\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c/details\u003e\n\n---\n\n\u003ca id=\"繁體中文\"\u003e\u003c/a\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch1\u003e🎉 WebPilot-CLI — 輕量級 AI 瀏覽器自動化命令列工具\u003c/h1\u003e\u003c/summary\u003e\n\n\u003e 🚀 零外部依賴 · YAML 工作流驅動 · 智慧內容擷取 · 終端原生體驗\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/version-v1.0.0-blue\" alt=\"Version\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-3.8%2B-green\" alt=\"Python 3.8+\"\u003e\n  \u003cimg 
src=\"https://img.shields.io/badge/license-MIT-orange\" alt=\"License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/dependencies-zero-brightgreen\" alt=\"Zero Dependencies\"\u003e\n\u003c/p\u003e\n\n---\n\n## 🎉 專案介紹\n\n**WebPilot-CLI** 是一款專為 AI Agent 與開發者打造的輕量級瀏覽器自動化命令列工具。它完全基於 Python 標準函式庫建構，**零外部依賴**，安裝即可使用。\n\n### 🎯 專案定位\n\n在 AI Agent 日益普及的今天，瀏覽器自動化是 Agent 與 Web 世界互動的核心能力。然而，現有的自動化工具要麼體積龐大（Selenium、Playwright），要麼依賴複雜（需要瀏覽器驅動程式），要麼功能單一（只能抓取、不能編排）。**WebPilot-CLI** 的定位是：\n\n\u003e **一個足夠輕、足夠智慧、足夠靈活的終端瀏覽器自動化瑞士軍刀。**\n\n### 💡 解決的痛點\n\n| 痛點 | WebPilot-CLI 的方案 |\n|------|-------------------|\n| 現有工具依賴沉重，安裝配置繁瑣 | 純 Python 標準函式庫實作，`pip install` 即可使用 |\n| 網頁抓取結果充滿雜訊（導覽列、廣告、頁尾） | 智慧雜訊過濾引擎，自動識別並過濾無關內容 |\n| 自動化操作需要寫大量程式碼 | YAML 工作流引擎，用宣告式配置取代命令式程式設計 |\n| 缺少終端原生的視覺化方案 | ASCII 藝術截圖 + HTML Canvas 截圖，終端也能「看到」網頁 |\n| 多步驟操作難以編排和重用 | 變數傳遞、條件分支、迴圈控制，完整的流程編排能力 |\n\n### 🌟 差異化亮點\n\n- 🪶 **極致輕量**：零外部依賴，整個工具包僅使用 Python 標準函式庫（`urllib`、`html.parser`、`http.cookiejar`）\n- 🧠 **智慧擷取**：內建雜訊識別引擎，自動過濾導覽列、側邊欄、廣告、頁尾等無關內容\n- 📝 **YAML 驅動**：用宣告式 YAML 定義複雜的瀏覽器操作流程，支援變數傳遞與條件分支\n- 🖥️ **終端友善**：ASCII 藝術截圖讓你在終端中也能「看到」網頁版面配置\n- 🔄 **工作階段管理**：內建 Cookie/Session 管理，輕鬆處理需要登入的場景\n- 📤 **多格式輸出**：支援 JSON、Markdown、純文字三種輸出格式，適配不同下游消費場景\n\n---\n\n## ✨ 核心特性\n\n- 🌐 **網頁瀏覽** — 一條命令取得網頁內容，自動擷取標題、正文、連結、圖片\n- 📸 **智慧截圖** — 支援 ASCII 藝術截圖和 HTML Canvas 截圖兩種模式\n- 🔍 **內容擷取** — 智慧過濾雜訊，擷取結構化內容（標題/描述/正文/連結/圖片）\n- ⚙️ **YAML 工作流** — 宣告式定義多步驟自動化流程，支援 navigate/extract/screenshot/wait/condition/loop\n- 🔄 **變數傳遞** — 步驟間透過 `${var}` 語法傳遞資料，支援條件分支和迴圈\n- 🍪 **工作階段管理** — 自動管理 Cookie 和 Session，支援跨請求狀態保持\n- 🖥️ **互動模式** — 內建 REPL 互動式瀏覽器工作階段，即時探索網頁\n- 📤 **多格式輸出** — JSON / Markdown / 純文字，適配管線和腳本整合\n- 🎨 **彩色終端** — 豐富的 ANSI 彩色輸出，提升終端閱讀體驗\n- 📊 **進度展示** — 工作流執行時顯示即時進度條\n- 🛡️ **穩健性** — 自動重試、逾時控制、編碼偵測、錯誤復原\n- 🪶 **零依賴** — 完全基於 Python 標準函式庫，無需安裝任何第三方套件\n\n---\n\n## 🚀 快速開始\n\n### 環境需求\n\n- Python 3.8 或更高版本\n- 網路連線（用於存取目標網頁）\n- 無需安裝瀏覽器或瀏覽器驅動程式\n\n### 安裝\n\n```bash\n# 方式一：從 GitHub 直接安裝（推薦）\npip install git+https://github.com/gitstq/webpilot-cli.git\n\n# 
方式二：從原始碼安裝\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npip install -e .\n```\n\n### 快速體驗\n\n```bash\n# 瀏覽網頁，查看結構化內容\nwebpilot browse https://example.com\n\n# 以 JSON 格式輸出\nwebpilot browse https://example.com --output json\n\n# 截取網頁 ASCII 截圖\nwebpilot screenshot https://example.com -o screenshot.txt -f ascii\n\n# 截取網頁 HTML 截圖（可在瀏覽器中開啟）\nwebpilot screenshot https://example.com -o screenshot.html -f html\n\n# 擷取網頁結構化內容\nwebpilot extract https://example.com --fields title description text\n\n# 執行 YAML 工作流\nwebpilot run workflow.yaml\n\n# 啟動互動式瀏覽器工作階段\nwebpilot interactive\n```\n\n---\n\n## 📖 詳細使用指南\n\n### 全域選項\n\n所有子命令均支援以下全域選項：\n\n| 選項 | 縮寫 | 說明 | 預設值 |\n|------|------|------|--------|\n| `--output` | `-o` | 輸出格式：`json` / `markdown` / `text` | `markdown` |\n| `--no-color` | | 停用彩色終端輸出 | 關閉 |\n| `--verbose` | `-v` | 啟用詳細輸出模式 | 關閉 |\n| `--version` | `-V` | 顯示版本號 | — |\n\n### `browse` — 瀏覽網頁\n\n取得並展示網頁的結構化內容，自動過濾雜訊。\n\n```bash\n# 基本用法\nwebpilot browse \u003curl\u003e\n\n# 指定逾時時間和輸出格式\nwebpilot browse https://example.com --timeout 60 --output json\n\n# 啟用詳細模式\nwebpilot browse https://example.com -v\n```\n\n**參數說明：**\n\n| 參數 | 說明 | 預設值 |\n|------|------|--------|\n| `url` | 目標網頁 URL（必填） | — |\n| `--timeout` / `-t` | 請求逾時時間（秒） | 30 |\n\n### `screenshot` — 網頁截圖\n\n支援兩種截圖模式：終端友善的 ASCII 藝術截圖和可在瀏覽器中開啟的 HTML Canvas 截圖。\n\n```bash\n# HTML 截圖（預設，可在瀏覽器中開啟查看）\nwebpilot screenshot https://example.com -o page.html\n\n# ASCII 截圖（終端友善）\nwebpilot screenshot https://example.com -f ascii -o page.txt\n\n# 自訂終端寬度\nwebpilot screenshot https://example.com -f ascii --width 120\n```\n\n**參數說明：**\n\n| 參數 | 說明 | 預設值 |\n|------|------|--------|\n| `url` | 目標網頁 URL（必填） | — |\n| `--output-file` / `-o` | 輸出檔案路徑 | `screenshot.html` |\n| `--format` / `-f` | 截圖格式：`ascii` / `html` | `html` |\n| `--width` | ASCII 截圖的終端寬度（字元數） | 80 |\n\n### `extract` — 擷取結構化內容\n\n從網頁中擷取結構化資料，支援依欄位篩選。\n\n```bash\n# 擷取所有內容\nwebpilot extract https://example.com\n\n# 僅擷取標題和描述\nwebpilot extract 
https://example.com --fields title description\n\n# 僅擷取連結和圖片\nwebpilot extract https://example.com --fields links images\n\n# JSON 格式輸出，方便程式處理\nwebpilot extract https://example.com --output json\n```\n\n**參數說明：**\n\n| 參數 | 說明 | 預設值 |\n|------|------|--------|\n| `url` | 目標網頁 URL（必填） | — |\n| `--fields` | 擷取欄位：`title` / `description` / `text` / `links` / `images`（可多選） | 全部 |\n| `--timeout` / `-t` | 請求逾時時間（秒） | 30 |\n\n### `run` — 執行 YAML 工作流\n\n透過 YAML 檔案定義和執行多步驟瀏覽器自動化流程。\n\n```bash\n# 執行工作流\nwebpilot run workflow.yaml\n\n# 匯出執行結果\nwebpilot run workflow.yaml --export result.json\n\n# 詳細模式\nwebpilot run workflow.yaml -v\n```\n\n**參數說明：**\n\n| 參數 | 說明 | 預設值 |\n|------|------|--------|\n| `workflow` | YAML 工作流檔案路徑（必填） | — |\n| `--export` / `-e` | 將結果匯出為 JSON 檔案 | 不匯出 |\n\n#### YAML 工作流範例\n\n```yaml\nname: daily_news_collector\ndescription: 每日新聞採集工作流\n\n# 全域變數\nvars:\n  base_url: \"https://news.example.com\"\n  output_dir: \"output\"\n\n# 遇到錯誤是否停止\nstop_on_error: true\n\nsteps:\n  # 第一步：導航到目標頁面\n  - name: \"開啟新聞首頁\"\n    type: navigate\n    url: \"${base_url}\"\n    save: page_info\n\n  # 第二步：等待頁面載入\n  - name: \"等待載入\"\n    type: wait\n    seconds: 2\n\n  # 第三步：擷取頁面內容\n  - name: \"擷取新聞內容\"\n    type: extract\n    fields:\n      - title\n      - description\n      - text\n      - links\n    save: news_data\n\n  # 第四步：截取頁面截圖\n  - name: \"儲存截圖\"\n    type: screenshot\n    output: \"${output_dir}/news_screenshot.html\"\n    format: html\n\n  # 第五步：條件判斷\n  - name: \"檢查標題是否存在\"\n    type: condition\n    variable: page_info\n    operator: exists\n    then:\n      - name: \"標題擷取成功\"\n        type: set_variable\n        value: \"頁面標題已成功擷取\"\n        save: status_message\n    else:\n      - name: \"標題擷取失敗\"\n        type: set_variable\n        value: \"未找到頁面標題\"\n        save: status_message\n\n  # 第六步：迴圈處理\n  - name: \"批次處理\"\n    type: loop\n    count: 3\n    index_var: iteration\n    steps:\n      - name: \"處理第 ${iteration} 批\"\n        type: set_variable\n        value: 
\"正在處理第 ${iteration} 批資料\"\n        save: batch_status\n```\n\n#### 支援的步驟類型\n\n| 步驟類型 | 說明 | 關鍵參數 |\n|----------|------|----------|\n| `navigate` | 導航到指定 URL | `url` |\n| `extract` | 擷取當前頁面內容 | `fields`（可選） |\n| `screenshot` | 截取當前頁面截圖 | `output`, `format` |\n| `wait` | 等待指定秒數 | `seconds` |\n| `condition` | 條件分支 | `variable`, `operator`, `then`, `else` |\n| `loop` | 迴圈執行子步驟 | `count`, `index_var`, `steps` |\n| `set_variable` | 設定工作流變數 | `value` |\n\n#### 支援的條件運算子\n\n| 運算子 | 說明 |\n|--------|------|\n| `exists` | 變數是否存在 |\n| `equals` | 等於指定值 |\n| `not_equals` | 不等於指定值 |\n| `contains` | 包含指定值 |\n| `greater_than` | 大於指定值 |\n| `less_than` | 小於指定值 |\n| `is_true` | 布林值為真 |\n| `is_false` | 布林值為假 |\n\n### `interactive` — 互動式瀏覽器工作階段\n\n啟動一個 REPL（讀取-求值-輸出迴圈）互動式工作階段，即時探索網頁。\n\n```bash\n# 啟動互動模式\nwebpilot interactive\n\n# 帶初始 URL 啟動\nwebpilot interactive --url https://example.com\n```\n\n**互動模式內建命令：**\n\n| 命令 | 說明 |\n|------|------|\n| `browse \u003curl\u003e` | 導航到指定 URL |\n| `extract` | 擷取當前頁面結構化內容 |\n| `screenshot [path]` | 儲存截圖（預設 `screenshot.html`） |\n| `info` | 顯示當前頁面資訊 |\n| `ascii` | 顯示 ASCII 截圖 |\n| `title` | 顯示頁面標題 |\n| `links` | 列出當前頁面所有連結 |\n| `images` | 列出當前頁面所有圖片 |\n| `cookies` | 顯示當前 Cookie |\n| `help` | 顯示說明資訊 |\n| `quit` / `exit` / `q` | 結束互動模式 |\n\n---\n\n## 💡 設計思路與迭代規劃\n\n### 設計理念\n\nWebPilot-CLI 的設計遵循以下核心理念：\n\n1. **極簡主義（Minimalism）**：不引入任何外部依賴，用最少的程式碼實現最多的功能。Python 標準函式庫已經足夠強大，`urllib` 處理網路請求，`html.parser` 解析 HTML，`http.cookiejar` 管理工作階段——我們不需要更多。\n\n2. **宣告式優先（Declarative First）**：能用 YAML 配置表達的，就不需要寫 Python 程式碼。工作流引擎讓非程式設計師也能定義複雜的瀏覽器操作流程。\n\n3. **終端原生（Terminal Native）**：作為命令列工具，終端就是我們的主場。ASCII 截圖、彩色輸出、進度條——讓終端體驗不遜色於 GUI。\n\n4. 
**AI Agent 友善（Agent Friendly）**：結構化的 JSON 輸出、可程式化的工作流引擎、清晰的狀態管理——每一個設計決策都考慮了 AI Agent 的整合需求。\n\n### 技術選型原因\n\n| 技術選擇 | 原因 |\n|----------|------|\n| `urllib` 而非 `requests` | 零依賴，標準函式庫自帶，滿足基本 HTTP 需求 |\n| `html.parser` 而非 `BeautifulSoup` | 零依賴，標準函式庫自帶，效能可控 |\n| `http.cookiejar` 而非 `requests.Session` | 零依賴，原生支援 Cookie 持久化 |\n| YAML 工作流 而非 Python 腳本 | 宣告式更易讀、更易維護、更易被 AI 生成 |\n| ASCII 截圖 而非 PNG 截圖 | 無需額外依賴（如 Pillow），終端原生展示 |\n\n### 後續規劃\n\n- [ ] 🔌 **外掛系統**：支援自訂擷取器和輸出格式的外掛機制\n- [ ] 🗄️ **結果持久化**：支援將擷取結果儲存到 SQLite / CSV\n- [ ] 🔄 **增量抓取**：基於 ETag / Last-Modified 的增量內容更新\n- [ ] 📡 **API 模式**：內建 HTTP 伺服器，提供 RESTful API 介面\n- [ ] 🧪 **斷言引擎**：工作流中支援頁面內容斷言，用於監控和測試\n- [ ] 📊 **報告生成**：自動生成工作流執行報告（HTML / PDF）\n- [ ] 🌐 **代理支援**：內建 HTTP/SOCKS 代理設定\n- [ ] 📦 **批次模式**：支援從檔案讀取 URL 列表進行批次處理\n- [ ] 🤖 **MCP 整合**：作為 MCP（Model Context Protocol）伺服器執行\n\n---\n\n## 📦 安裝與部署指南\n\n### 系統需求\n\n- **作業系統**：Windows / macOS / Linux\n- **Python 版本**：3.8、3.9、3.10、3.11、3.12\n- **磁碟空間**：約 1 MB（原始碼）\n- **網路**：需要能存取目標網站\n\n### 安裝方式\n\n```bash\n# 方式一：從 GitHub 直接安裝（推薦）\npip install git+https://github.com/gitstq/webpilot-cli.git\n\n# 方式二：複製倉庫後以開發模式安裝\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npip install -e .\n\n# 方式三：複製倉庫後直接使用（無需安裝）\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npython -m webpilot.cli browse https://example.com\n```\n\n### 驗證安裝\n\n```bash\n# 查看版本號\nwebpilot --version\n\n# 查看說明資訊\nwebpilot --help\n\n# 快速測試\nwebpilot browse https://example.com\n```\n\n### 解除安裝\n\n```bash\npip uninstall webpilot-cli\n```\n\n---\n\n## 🤝 貢獻指南\n\n我們歡迎並感謝所有形式的貢獻！無論是提交 Bug 回報、改進文件，還是提交程式碼 Pull Request。\n\n### 如何貢獻\n\n1. **Fork** 本倉庫\n2. 建立你的特性分支：`git checkout -b feature/amazing-feature`\n3. 提交你的變更：`git commit -m 'Add some amazing feature'`\n4. 推送到分支：`git push origin feature/amazing-feature`\n5. 
提交 **Pull Request**\n\n### 開發環境建置\n\n```bash\n# 複製倉庫\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\n\n# 以開發模式安裝\npip install -e .\n\n# 執行測試\npython -m pytest tests/\n\n# 執行特定測試\npython -m pytest tests/test_extractor.py -v\n```\n\n### 程式碼規範\n\n- 遵循 PEP 8 編碼規範\n- 為所有公開函式撰寫文件字串\n- 確保所有測試通過後再提交 PR\n- 提交資訊使用清晰、描述性的語言\n\n### 提交 Issue\n\n在提交 Issue 之前，請：\n\n1. 搜尋已有的 Issues，避免重複提交\n2. 提供重現步驟和期望行為\n3. 附上執行環境資訊（Python 版本、作業系統等）\n\n---\n\n## 📄 開源授權\n\n本專案基於 [MIT License](LICENSE) 開源。\n\n```\nMIT License\n\nCopyright (c) 2024 WebPilot Team\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\n```\n\n---\n\n\u003cp align=\"center\"\u003e\n  用 ❤️ 和 Python 標準函式庫建構 · \u003ca href=\"https://github.com/gitstq/webpilot-cli\"\u003eGitHub\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c/details\u003e\n\n---\n\n\u003ca id=\"english\"\u003e\u003c/a\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003e\u003ch1\u003e🎉 WebPilot-CLI — Lightweight AI Browser Automation CLI Tool\u003c/h1\u003e\u003c/summary\u003e\n\n\u003e 🚀 Zero External Dependencies · YAML Workflow Engine · Smart Content Extraction · Terminal-Native Experience\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/version-v1.0.0-blue\" alt=\"Version\"\u003e\n  
\u003cimg src=\"https://img.shields.io/badge/python-3.8%2B-green\" alt=\"Python 3.8+\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-orange\" alt=\"License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/dependencies-zero-brightgreen\" alt=\"Zero Dependencies\"\u003e\n\u003c/p\u003e\n\n---\n\n## 🎉 Introduction\n\n**WebPilot-CLI** is a lightweight browser automation CLI tool designed for AI Agents and developers. Built entirely on the Python standard library, it has **zero external dependencies** and works right out of the box.\n\n### 🎯 Project Positioning\n\nAs AI Agents become increasingly prevalent, browser automation is a core capability for Agent-Web interaction. However, existing tools are either heavyweight (Selenium, Playwright), complex to set up (requiring browser drivers), or limited in functionality (scraping-only, no orchestration). **WebPilot-CLI** aims to be:\n\n\u003e **A terminal-native browser automation Swiss Army knife that is lightweight, intelligent, and flexible.**\n\n### 💡 Problems We Solve\n\n| Pain Point | Our Solution |\n|------------|-------------|\n| Existing tools are heavy and complex to install | Pure Python standard library — just `pip install` and go |\n| Scraped content is full of noise (navbars, ads, footers) | Smart noise filtering engine that auto-detects and removes irrelevant content |\n| Automation requires writing lots of code | YAML workflow engine — declarative config over imperative programming |\n| No terminal-native visualization | ASCII art screenshots + HTML Canvas screenshots — \"see\" web pages in your terminal |\n| Multi-step operations are hard to orchestrate | Variable passing, conditional branching, loop control — full workflow orchestration |\n\n### 🌟 Key Differentiators\n\n- 🪶 **Ultra Lightweight**: Zero external dependencies. 
The entire toolkit uses only Python standard library modules (`urllib`, `html.parser`, `http.cookiejar`)\n- 🧠 **Smart Extraction**: Built-in noise detection engine that automatically filters out navigation bars, sidebars, ads, footers, and other irrelevant content\n- 📝 **YAML-Driven**: Define complex browser automation flows with declarative YAML, supporting variable passing and conditional branching\n- 🖥️ **Terminal-Friendly**: ASCII art screenshots let you \"see\" web page layouts right in your terminal\n- 🔄 **Session Management**: Built-in Cookie/Session management for handling login-required scenarios with ease\n- 📤 **Multi-Format Output**: JSON, Markdown, and plain text output formats to fit different downstream consumption needs\n\n---\n\n## ✨ Core Features\n\n- 🌐 **Web Browsing** — Fetch and display web page content with a single command; auto-extracts titles, body text, links, and images\n- 📸 **Smart Screenshots** — Two screenshot modes: ASCII art (terminal-friendly) and HTML Canvas (browser-viewable)\n- 🔍 **Content Extraction** — Intelligent noise filtering for structured content extraction (title/description/text/links/images)\n- ⚙️ **YAML Workflows** — Declaratively define multi-step automation flows with navigate/extract/screenshot/wait/condition/loop steps\n- 🔄 **Variable Passing** — Pass data between steps using `${var}` syntax, with support for conditional branching and loops\n- 🍪 **Session Management** — Automatic Cookie/Session management with cross-request state persistence\n- 🖥️ **Interactive Mode** — Built-in REPL interactive browser session for real-time web exploration\n- 📤 **Multi-Format Output** — JSON / Markdown / Plain text, ready for pipeline and script integration\n- 🎨 **Colored Terminal** — Rich ANSI colored output for an enhanced terminal reading experience\n- 📊 **Progress Display** — Real-time progress bars during workflow execution\n- 🛡️ **Robustness** — Auto-retry, timeout control, encoding detection, and error recovery\n- 🪶 **Zero 
Dependencies** — Entirely based on the Python standard library; no third-party packages needed\n\n---\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.8 or later\n- Network connection (to access target websites)\n- No browser or browser driver installation required\n\n### Installation\n\n```bash\n# Option 1: Install directly from GitHub (recommended)\npip install git+https://github.com/gitstq/webpilot-cli.git\n\n# Option 2: Install from source\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npip install -e .\n```\n\n### Try It Out\n\n```bash\n# Browse a web page and view structured content\nwebpilot browse https://example.com\n\n# Output in JSON format\nwebpilot browse https://example.com --output json\n\n# Take an ASCII screenshot\nwebpilot screenshot https://example.com -o screenshot.txt -f ascii\n\n# Take an HTML screenshot (openable in a browser)\nwebpilot screenshot https://example.com -o screenshot.html -f html\n\n# Extract structured content from a web page\nwebpilot extract https://example.com --fields title description text\n\n# Run a YAML workflow\nwebpilot run workflow.yaml\n\n# Start an interactive browser session\nwebpilot interactive\n```\n\n---\n\n## 📖 Detailed Usage Guide\n\n### Global Options\n\nAll subcommands support the following global options:\n\n| Option | Short | Description | Default |\n|--------|-------|-------------|---------|\n| `--output` | `-o` | Output format: `json` / `markdown` / `text` | `markdown` |\n| `--no-color` | | Disable colored terminal output | Off |\n| `--verbose` | `-v` | Enable verbose output mode | Off |\n| `--version` | `-V` | Show version number | — |\n\n### `browse` — Browse a Web Page\n\nFetch and display structured content from a web page, with automatic noise filtering.\n\n```bash\n# Basic usage\nwebpilot browse \u003curl\u003e\n\n# Specify timeout and output format\nwebpilot browse https://example.com --timeout 60 --output json\n\n# Enable verbose mode\nwebpilot browse 
https://example.com -v\n```\n\n**Parameters:**\n\n| Parameter | Description | Default |\n|-----------|-------------|---------|\n| `url` | Target web page URL (required) | — |\n| `--timeout` / `-t` | Request timeout in seconds | 30 |\n\n### `screenshot` — Capture a Screenshot\n\nTwo screenshot modes are available: terminal-friendly ASCII art and browser-viewable HTML Canvas.\n\n```bash\n# HTML screenshot (default, viewable in a browser)\nwebpilot screenshot https://example.com -o page.html\n\n# ASCII screenshot (terminal-friendly)\nwebpilot screenshot https://example.com -f ascii -o page.txt\n\n# Custom terminal width\nwebpilot screenshot https://example.com -f ascii --width 120\n```\n\n**Parameters:**\n\n| Parameter | Description | Default |\n|-----------|-------------|---------|\n| `url` | Target web page URL (required) | — |\n| `--output-file` / `-o` | Output file path | `screenshot.html` |\n| `--format` / `-f` | Screenshot format: `ascii` / `html` | `html` |\n| `--width` | Terminal width for ASCII screenshots (characters) | 80 |\n\n### `extract` — Extract Structured Content\n\nExtract structured data from a web page with field-level filtering.\n\n```bash\n# Extract all content\nwebpilot extract https://example.com\n\n# Extract only title and description\nwebpilot extract https://example.com --fields title description\n\n# Extract only links and images\nwebpilot extract https://example.com --fields links images\n\n# JSON output for programmatic processing\nwebpilot extract https://example.com --output json\n```\n\n**Parameters:**\n\n| Parameter | Description | Default |\n|-----------|-------------|---------|\n| `url` | Target web page URL (required) | — |\n| `--fields` | Fields to extract: `title` / `description` / `text` / `links` / `images` (multiple allowed) | All |\n| `--timeout` / `-t` | Request timeout in seconds | 30 |\n\n### `run` — Execute a YAML Workflow\n\nDefine and execute multi-step browser automation flows via YAML files.\n\n```bash\n# Run a 
workflow\nwebpilot run workflow.yaml\n\n# Export execution results\nwebpilot run workflow.yaml --export result.json\n\n# Verbose mode\nwebpilot run workflow.yaml -v\n```\n\n**Parameters:**\n\n| Parameter | Description | Default |\n|-----------|-------------|---------|\n| `workflow` | Path to YAML workflow file (required) | — |\n| `--export` / `-e` | Export results to a JSON file | No export |\n\n#### YAML Workflow Example\n\n```yaml\nname: daily_news_collector\ndescription: Daily news collection workflow\n\n# Global variables\nvars:\n  base_url: \"https://news.example.com\"\n  output_dir: \"output\"\n\n# Stop on error\nstop_on_error: true\n\nsteps:\n  # Step 1: Navigate to the target page\n  - name: \"Open news homepage\"\n    type: navigate\n    url: \"${base_url}\"\n    save: page_info\n\n  # Step 2: Wait for page to load\n  - name: \"Wait for load\"\n    type: wait\n    seconds: 2\n\n  # Step 3: Extract page content\n  - name: \"Extract news content\"\n    type: extract\n    fields:\n      - title\n      - description\n      - text\n      - links\n    save: news_data\n\n  # Step 4: Take a screenshot\n  - name: \"Save screenshot\"\n    type: screenshot\n    output: \"${output_dir}/news_screenshot.html\"\n    format: html\n\n  # Step 5: Conditional check\n  - name: \"Check if title exists\"\n    type: condition\n    variable: page_info\n    operator: exists\n    then:\n      - name: \"Title found\"\n        type: set_variable\n        value: \"Page title was successfully extracted\"\n        save: status_message\n    else:\n      - name: \"No title found\"\n        type: set_variable\n        value: \"No page title found\"\n        save: status_message\n\n  # Step 6: Loop processing\n  - name: \"Batch processing\"\n    type: loop\n    count: 3\n    index_var: iteration\n    steps:\n      - name: \"Process batch ${iteration}\"\n        type: set_variable\n        value: \"Processing batch ${iteration}\"\n        save: batch_status\n```\n\n#### Supported Step 
Types\n\n| Step Type | Description | Key Parameters |\n|-----------|-------------|----------------|\n| `navigate` | Navigate to a URL | `url` |\n| `extract` | Extract content from the current page | `fields` (optional) |\n| `screenshot` | Take a screenshot of the current page | `output`, `format` |\n| `wait` | Wait for a specified number of seconds | `seconds` |\n| `condition` | Conditional branching | `variable`, `operator`, `then`, `else` |\n| `loop` | Execute sub-steps in a loop | `count`, `index_var`, `steps` |\n| `set_variable` | Set a workflow variable | `value` |\n\n#### Supported Condition Operators\n\n| Operator | Description |\n|----------|-------------|\n| `exists` | Variable exists |\n| `equals` | Equals a specified value |\n| `not_equals` | Does not equal a specified value |\n| `contains` | Contains a specified value |\n| `greater_than` | Greater than a specified value |\n| `less_than` | Less than a specified value |\n| `is_true` | Boolean value is true |\n| `is_false` | Boolean value is false |\n\n### `interactive` — Interactive Browser Session\n\nLaunch a REPL (Read-Eval-Print Loop) interactive session for real-time web exploration.\n\n```bash\n# Start interactive mode\nwebpilot interactive\n\n# Start with an initial URL\nwebpilot interactive --url https://example.com\n```\n\n**Built-in Interactive Commands:**\n\n| Command | Description |\n|---------|-------------|\n| `browse \u003curl\u003e` | Navigate to a URL |\n| `extract` | Extract structured content from the current page |\n| `screenshot [path]` | Save a screenshot (default: `screenshot.html`) |\n| `info` | Show current page information |\n| `ascii` | Display an ASCII screenshot |\n| `title` | Show the page title |\n| `links` | List all links on the current page |\n| `images` | List all images on the current page |\n| `cookies` | Show current cookies |\n| `help` | Show help information |\n| `quit` / `exit` / `q` | Exit interactive mode |\n\n---\n\n## 💡 Design Philosophy \u0026 Roadmap\n\n### 
Design Principles\n\nWebPilot-CLI is guided by the following core principles:\n\n1. **Minimalism**: No external dependencies. Achieve maximum functionality with minimum code. The Python standard library is already powerful enough — `urllib` for HTTP, `html.parser` for HTML parsing, `http.cookiejar` for session management. We don't need anything else.\n\n2. **Declarative First**: If it can be expressed in YAML configuration, it shouldn't require Python code. The workflow engine empowers non-programmers to define complex browser automation flows.\n\n3. **Terminal Native**: As a CLI tool, the terminal is our home turf. ASCII screenshots, colored output, progress bars — the terminal experience should rival any GUI.\n\n4. **Agent Friendly**: Structured JSON output, programmable workflow engine, clear state management — every design decision considers AI Agent integration needs.\n\n### Technology Choices\n\n| Choice | Rationale |\n|--------|-----------|\n| `urllib` over `requests` | Zero dependencies, included in the standard library, sufficient for basic HTTP needs |\n| `html.parser` over `BeautifulSoup` | Zero dependencies, included in the standard library, controllable performance |\n| `http.cookiejar` over `requests.Session` | Zero dependencies, native Cookie persistence support |\n| YAML workflows over Python scripts | Declarative approach is more readable, maintainable, and AI-generable |\n| ASCII screenshots over PNG screenshots | No extra dependencies (like Pillow), native terminal display |\n\n### Roadmap\n\n- [ ] 🔌 **Plugin System**: Support for custom extractors and output format plugins\n- [ ] 🗄️ **Result Persistence**: Save extraction results to SQLite / CSV\n- [ ] 🔄 **Incremental Scraping**: Content updates based on ETag / Last-Modified\n- [ ] 📡 **API Mode**: Built-in HTTP server with RESTful API endpoints\n- [ ] 🧪 **Assertion Engine**: Page content assertions in workflows for monitoring and testing\n- [ ] 📊 **Report Generation**: Automatic workflow 
execution reports (HTML / PDF)\n- [ ] 🌐 **Proxy Support**: Built-in HTTP/SOCKS proxy configuration\n- [ ] 📦 **Batch Mode**: Process URL lists from a file in bulk\n- [ ] 🤖 **MCP Integration**: Run as an MCP (Model Context Protocol) server\n\n---\n\n## 📦 Installation \u0026 Deployment Guide\n\n### System Requirements\n\n- **Operating System**: Windows / macOS / Linux\n- **Python Version**: 3.8, 3.9, 3.10, 3.11, 3.12\n- **Disk Space**: ~1 MB (source code)\n- **Network**: Access to target websites required\n\n### Installation Methods\n\n```bash\n# Option 1: Install directly from GitHub (recommended)\npip install git+https://github.com/gitstq/webpilot-cli.git\n\n# Option 2: Clone and install in development mode\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npip install -e .\n\n# Option 3: Clone and use directly (no installation needed)\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\npython -m webpilot.cli browse https://example.com\n```\n\n### Verify Installation\n\n```bash\n# Check version\nwebpilot --version\n\n# View help\nwebpilot --help\n\n# Quick test\nwebpilot browse https://example.com\n```\n\n### Uninstall\n\n```bash\npip uninstall webpilot-cli\n```\n\n---\n\n## 🤝 Contributing Guide\n\nWe welcome and appreciate contributions of all kinds — whether it's filing bug reports, improving documentation, or submitting code Pull Requests.\n\n### How to Contribute\n\n1. **Fork** this repository\n2. Create your feature branch: `git checkout -b feature/amazing-feature`\n3. Commit your changes: `git commit -m 'Add some amazing feature'`\n4. Push to the branch: `git push origin feature/amazing-feature`\n5. 
Submit a **Pull Request**\n\n### Development Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/gitstq/webpilot-cli.git\ncd webpilot-cli\n\n# Install in development mode\npip install -e .\n\n# Run tests\npython -m pytest tests/\n\n# Run specific tests\npython -m pytest tests/test_extractor.py -v\n```\n\n### Code Standards\n\n- Follow PEP 8 coding conventions\n- Write docstrings for all public functions\n- Ensure all tests pass before submitting a PR\n- Use clear, descriptive commit messages\n\n### Filing Issues\n\nBefore submitting an issue, please:\n\n1. Search existing issues to avoid duplicates\n2. Provide reproduction steps and expected behavior\n3. Include your environment details (Python version, OS, etc.)\n\n---\n\n## 📄 License\n\nThis project is released under the [MIT License](LICENSE).\n\n```\nMIT License\n\nCopyright (c) 2024 WebPilot Team\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\n```\n\n---\n\n\u003cp align=\"center\"\u003e\n  Built with ❤️ and the Python Standard Library · \u003ca 
href=\"https://github.com/gitstq/webpilot-cli\"\u003eGitHub\u003c/a\u003e\n\u003c/p\u003e\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fwebpilot-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgitstq%2Fwebpilot-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fwebpilot-cli/lists"}