https://github.com/Zjianru/web-search-pro

Multi-engine web search skill for OpenClaw with full parameter control. Supports Tavily, Exa, Serper, SerpAPI with domain filtering, date ranges, deep search, news mode, and content extraction. 多引擎精细化搜索 Skill。
https://github.com/Zjianru/web-search-pro

ai-agent exa multi-engine openclaw openclaw-skill serpapi serper tavily web-search

Last synced: 26 days ago
JSON representation

Host: GitHub
URL: https://github.com/Zjianru/web-search-pro
Owner: Zjianru
License: mit
Created: 2026-02-09T08:45:06.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-03-12T03:08:42.000Z (about 2 months ago)
Last Synced: 2026-03-12T08:40:49.745Z (about 2 months ago)
Topics: ai-agent, exa, multi-engine, openclaw, openclaw-skill, serpapi, serper, tavily, web-search
Language: JavaScript
Size: 18.6 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-openclaw - Web Search Pro - Agent-first web search and retrieval stack with a real no-key baseline, explainable routing, visible federated-search gains, and built-in extract, crawl, map, and research flows. [ClawHub](https://clawhub.ai/Zjianru/web-search-pro) (Skills & Plugins / Notable Skills)

README

[English](README.md) | [中文](README_zh.md)

# Web Search Pro

`web-search-pro` is an OpenClaw search skill and local Node retrieval runtime for agents. It can
search the live web, fetch and extract pages, crawl sites, map docs, and assemble research-ready
evidence packs with explainable routing.

- ClawHub: [web-search-pro](https://clawhub.ai/Zjianru/web-search-pro)
- GitHub: [Zjianru/web-search-pro](https://github.com/Zjianru/web-search-pro)
- OpenClaw archive: [openclaw/skills/tree/main/skills/zjianru/web-search-pro](https://github.com/openclaw/skills/tree/main/skills/zjianru/web-search-pro)

## What It Is

This project sits between a lightweight web-search skill and a full hosted scraping product.

The simplest mental model is:

- a search skill for agents
- a local retrieval runtime
- a bridge from `search` to `extract`, `crawl`, `map`, and `research`

Use it when search is only the beginning of the task and the agent may need to keep collecting,
structuring, and handing off evidence.

## Choose This If

Choose `web-search-pro` if you need:

- live web search and current-events lookup
- news search with explainable routing and visible fallback behavior
- official docs, API docs, and code lookup
- company, product, and competitor research
- site crawl, site map, and docs discovery
- a no-key baseline first, then premium providers only when needed
- structured outputs that an upstream model can keep using

In short: this is for developers who want one skill to cover search, retrieval, and evidence prep.

## Do Not Choose This If

Do not choose `web-search-pro` if you primarily want:

- the lightest possible single-purpose web search wrapper
- a hosted remote scraping SaaS
- a browser-first crawler by default
- a narrative report writer that hides retrieval details
- an unlimited no-key search guarantee

If all you need is lightweight one-shot search, a smaller skill will usually be a better fit.

## Why Developers Pick It

Compared with a plain search skill, the differentiators are:

- **Explainable routing**
`routingSummary` exposes why a provider was selected, how confident the planner is, and what the
top signals were.
- **Visible federated gains**
`federated.value` shows what multi-provider fanout actually recovered, corroborated, or deduped.
- **Search-to-research chain**
One surface can move from `search` into `extract`, `crawl`, `map`, and `research`.
- **No-key baseline**
You can evaluate the skill without buying into a provider stack first.
- **Agent-readable diagnostics**
`doctor.mjs`, `bootstrap.mjs`, `capabilities.mjs`, and `review.mjs` expose runtime state and
boundaries instead of leaving the model to guess.

## Quick Start

There are two honest ways to approach this project:

- **Install and use it as a skill**
Start from the ClawHub page or the OpenClaw archive entry if you want to use it inside OpenClaw.
- **Run it from source**
Use the commands below if you want to evaluate the local runtime directly, inspect outputs, or
contribute to the repo.

The shortest successful path is:

- start with the no-key baseline
- add one premium provider only when you need stronger recall or fresher results
- then try docs, news, and research flows

### Option A: No-key baseline

No API key is required for the first successful run.

The baseline is:

- `ddg` for best-effort web search
- `fetch` for extract, crawl, and map fallback

```bash
node scripts/doctor.mjs --json
node scripts/bootstrap.mjs --json
node scripts/search.mjs "OpenAI Responses API docs" --json
```

What these commands tell you:

- `doctor.mjs` tells you whether the runtime is usable right now
- `bootstrap.mjs` gives an agent-readable runtime snapshot
- `search.mjs` proves the baseline path works before you add premium providers

### Option B: Add one premium provider

If you only add one premium provider, start with `TAVILY_API_KEY`.

That is the shortest upgrade path because one credential improves:

- general web search
- news search
- extract quality

```bash
export TAVILY_API_KEY=tvly-xxxxx
node scripts/doctor.mjs --json
node scripts/search.mjs "latest OpenAI news" --type news --json
```

### First successful searches

```bash
node scripts/search.mjs "OpenClaw web search" --json
node scripts/search.mjs "OpenAI Responses API docs" --preset docs --plan --json
node scripts/extract.mjs "https://platform.openai.com/docs" --json
```

### Then try docs, news, and research

```bash
node scripts/search.mjs "OpenAI Responses API docs" --preset docs --json
node scripts/search.mjs "latest OpenAI news" --type news --json
node scripts/research.mjs "OpenClaw search skill landscape" --plan --json
```

## Core Commands

| Command | Purpose |
| --- | --- |
| `search.mjs` | Multi-provider search with explainable routing |
| `extract.mjs` | Single-page readable extraction |
| `render.mjs` | Forced browser-backed extraction through the local render lane |
| `crawl.mjs` | Safe BFS crawl |
| `map.mjs` | Site-structure discovery |
| `research.mjs` | Structured `plan + evidence pack` generation |
| `doctor.mjs` | Runtime diagnostics |
| `bootstrap.mjs` | Agent-readable runtime bootstrap contract |
| `capabilities.mjs` | Provider capability snapshot |
| `review.mjs` | Review and moderation summary |
| `cache.mjs` | Cache inspection |
| `health.mjs` | Provider health inspection |
| `eval.mjs` | Retrieval and benchmark harness |

## Why Federated Search Matters

Federation is not just "more providers". It makes multi-provider value visible with compact,
machine-readable gain metrics.

Key fields:

- `federated.providersUsed`
Providers that actually returned results.
- `federated.value.additionalProvidersUsed`
How many non-primary providers really contributed.
- `federated.value.resultsRecoveredByFanout`
Final results that would disappear in primary-only mode.
- `federated.value.resultsCorroboratedByFanout`
Final results supported by both the primary and at least one fanout provider.
- `federated.value.duplicateSavings`
Exact or near-duplicate results removed by merge.
- `routingSummary.federation.value`
The compact federation gain summary exposed alongside route explanation.

Example:

```json
{
"federated": {
"providersUsed": ["serper", "tavily"],
"value": {
"additionalProvidersUsed": 1,
"resultsWithFanoutSupport": 2,
"resultsRecoveredByFanout": 1,
"resultsCorroboratedByFanout": 1,
"duplicateSavings": 1,
"answerProvider": "tavily",
"primarySucceeded": true
}
}
}
```

Interpretation:

- `resultsRecoveredByFanout=1` means federation recovered one final result that primary-only
search would have missed
- `resultsCorroboratedByFanout=1` means another final result got multi-provider support
- `duplicateSavings=1` means the merge removed one duplicate instead of wasting result slots

## Routing And Output Contract

The router combines five layers of truth:

1. provider capability facts
2. structured query signals
3. runtime policy from `config.json`
4. local health state
5. optional federation

Important fields:

- `selectedProvider`
The primary route. It is not the same thing as "the only provider used".
- `routingSummary`
Compact route explanation with `selectionMode`, `confidence`, `topSignals`, alternatives, and
federation context.
- `routing.diagnostics`
Full route diagnostics exposed by `--explain-routing` or `--plan`.
- `federated.providersUsed`
The provider set that actually returned results when fanout is active.
- `federated.value`
Compact federation gain summary: added providers, recovered results, corroborated results, and
duplicate savings.
- `cached` / `cache`
Cache hit plus age / TTL telemetry for agents.
- `renderLane`
Runtime availability and policy summary for the browser-backed render lane.
- `meta.searchType`
User-facing result surface selector. Current shipped values are `web | news`.
- `meta.intentPreset`
User-facing intent preset. Current shipped values are
`general | code | company | docs | research`.

These are product-layer inputs, not provider ids.

## Providers And Upgrade Paths

No API key is required for the baseline. Optional provider credentials or endpoints unlock stronger
coverage.

```bash
TAVILY_API_KEY=tvly-xxxxx
EXA_API_KEY=exa-xxxxx
QUERIT_API_KEY=xxxxx
SERPER_API_KEY=xxxxx
BRAVE_API_KEY=xxxxx
SERPAPI_API_KEY=xxxxx
YOU_API_KEY=xxxxx
SEARXNG_INSTANCE_URL=https://searx.example.com

# Perplexity / Sonar: choose one transport path
PERPLEXITY_API_KEY=xxxxx
OPENROUTER_API_KEY=xxxxx
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
KILOCODE_API_KEY=xxxxx

# Or use a custom OpenAI-compatible gateway
PERPLEXITY_GATEWAY_API_KEY=xxxxx
PERPLEXITY_BASE_URL=https://gateway.example.com/v1
PERPLEXITY_MODEL=perplexity/sonar-pro
```

Provider roles:

- Tavily: strongest premium default for general search, news, and extract
- Exa: semantic retrieval and extract fallback
- Querit: multilingual AI search with native geo and language filters
- Serper: Google-like search with strong news and locale coverage
- Brave: structured general web search
- SerpAPI: multi-engine routing including Baidu and Yandex
- You.com: LLM-ready web search with freshness and mixed web/news coverage
- SearXNG: self-hosted privacy-first metasearch fallback
- Perplexity / Sonar: answer-first grounded search via native or gateway transport
- DDG: best-effort no-key baseline search
- Fetch: no-key extract / crawl / map baseline
- Render: optional local browser lane

## Distribution Surfaces

The project has two honest distribution surfaces:

- **GitHub / local source tree**
The full surface, including `render.mjs`, `eval.mjs`, tests, and the deeper research toolchain.
- **ClawHub publish package**
A generated core profile built by `scripts/build-clawhub-package.mjs`.

Why this split exists:

- local developers need the full runtime and benchmark surface
- ClawHub moderation benefits from a narrower, more honest publish boundary
- the registry package is still a code-backed Node runtime, not an instruction-only bundle

Detailed notes:

- [docs/clawhub-package.md](/Users/codez/develop/web-search-pro/docs/clawhub-package.md)
- [docs/clawhub-compliance.md](/Users/codez/develop/web-search-pro/docs/clawhub-compliance.md)

## Boundaries

`web-search-pro` is strong at:

- capability-aware retrieval
- explainable routing
- safe extract / crawl / map
- structured research packs
- local diagnostics and review surfaces

It is intentionally not:

- a hosted remote scraping service
- a final report writer
- a browser-first crawler by default
- an unlimited no-key search guarantee

## Positioning vs Other Skills

- **vs lightweight `web-search` skills**
`web-search-pro` is heavier, but it gives you explainable routing, federation visibility, and a
path into `extract`, `crawl`, `map`, and `research`.
- **vs search-router-first skills such as `web-search-plus`**
`web-search-pro` is broader as a retrieval stack. The differentiator is not only where to
search, but what happens after search.
- **vs hosted scrape-first products**
`web-search-pro` stays local-first, more inspectable, and more explicit about safety boundaries
and runtime behavior.

## Safety

Safe fetch:

- allows only `http` / `https`
- blocks credential-bearing URLs
- blocks localhost, private, and metadata targets
- revalidates redirects
- keeps JavaScript disabled

Browser render:

- is off by default
- uses a local headless browser only when enabled
- revalidates navigations
- can enforce same-origin-only navigation

Challenge and anti-bot interstitial pages are reported as failures, not silent successes.

## Discovery Keywords

`web search`, `news search`, `latest updates`, `current events`, `docs search`, `API docs`,
`code search`, `company research`, `competitor analysis`, `site crawl`, `site map`,
`multilingual search`, `Baidu search`, `Google-like search`, `answer-first search`,
`cited answers`, `explainable routing`, `no-key baseline`

## Versioning

- Product / docs version: `2.1`
- JSON schema version: `1.0`

`2.x` is the retrieval-stack product line. Machine-readable payloads remain additive and
compatible, so schema stays `1.0`.

## Docs

- [README_zh.md](/Users/codez/develop/web-search-pro/README_zh.md)
- [CHANGELOG.md](/Users/codez/develop/web-search-pro/CHANGELOG.md)
- [docs/search-routing-model.md](/Users/codez/develop/web-search-pro/docs/search-routing-model.md)
- [docs/search-ux-model.md](/Users/codez/develop/web-search-pro/docs/search-ux-model.md)
- [docs/research-layer.md](/Users/codez/develop/web-search-pro/docs/research-layer.md)
- [docs/head-to-head-eval.md](/Users/codez/develop/web-search-pro/docs/head-to-head-eval.md)
- [docs/clawhub-package.md](/Users/codez/develop/web-search-pro/docs/clawhub-package.md)
- [docs/clawhub-compliance.md](/Users/codez/develop/web-search-pro/docs/clawhub-compliance.md)
- [docs/marketing-launch-kit.md](/Users/codez/develop/web-search-pro/docs/marketing-launch-kit.md)
- [docs/agent-contract-p0.md](/Users/codez/develop/web-search-pro/docs/agent-contract-p0.md)

## License

MIT

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Zjianru/web-search-pro

Awesome Lists containing this project

README