An open API service indexing awesome lists of open source software.

https://github.com/scrapfly/scrapfly-cli

Agentic CLI for the Scrapfly platform: scrape, extract, crawl, and drive a cloud browser over CDP. Built-in LLM agent with Anthropic, OpenAI, Gemini, and Ollama. JSON-first, pipe-friendly, one static binary.
https://github.com/scrapfly/scrapfly-cli

agentic ai-agent anthropic anti-bot anti-detect antidetect-browser bot-detection browser-automation cdp cli data-extraction golang headless-browser llm-agent openai scrapfly screenshot stealth-browser web-crawler web-scraping

Last synced: about 2 months ago
JSON representation

Agentic CLI for the Scrapfly platform: scrape, extract, crawl, and drive a cloud browser over CDP. Built-in LLM agent with Anthropic, OpenAI, Gemini, and Ollama. JSON-first, pipe-friendly, one static binary.

Awesome Lists containing this project

README

          


Scrapfly

scrapfly-cli


The agentic CLI for web data and browser automation.

Scrape, extract, and drive a cloud browser over CDP from any LLM
tool-use loop. JSON-first, pipe-friendly, one static binary.


Release
Docs
License

---

`scrapfly` is the official CLI for the [Scrapfly](https://scrapfly.io)
platform. It is built **agent-first**: every verb returns a stable JSON
envelope, every product is a tool an LLM can call, and an autonomous
agent is baked in.

- **Agentic by design**: `scrapfly agent ""` runs a Playwright-MCP
style loop against a cloud browser. Providers ship out of the box for
Anthropic, OpenAI, Gemini, and any OpenAI-compatible endpoint (Ollama,
vLLM). Point any other tool-use loop at the full CLI instead via
[`agent-onboarding/SKILL.md`](agent-onboarding/SKILL.md).
- **Full product surface**: Web Scraping, Screenshot, Extraction, Crawler,
and the CDP-driven Browser, each with complete SDK-parity flags.
- **Pipe-friendly**: stable envelope (`{success, product, data|error}`),
`--content-only`/`--data-only` for single-line piping, ndjson for
batch runs, `--pretty` for humans.
- **Composable**: multi-call browser sessions via a Unix-socket daemon
(cookies + AXTree refs persist across invocations), WARC/HAR in and
out, `scrapfly browser` outputs a CDP URL you can hand to Playwright,
Puppeteer, or browser-use.

## Install

One-liner (macOS / Linux):

```bash
curl -fsSL https://scrapfly.io/scrapfly-cli/install | sh
```

Pinned version or custom prefix:

```bash
curl -fsSL https://scrapfly.io/scrapfly-cli/install | sh -s -- --version v0.1.0 --prefix ~/.local/bin
```

Or grab a release tarball directly from
[Releases](https://github.com/scrapfly/scrapfly-cli/releases) - artifacts
follow `scrapfly-{os}-{arch}.tar.gz` (`scrapfly-windows-amd64.zip` on
Windows). Supported: `darwin-universal`, `darwin-amd64`, `darwin-arm64`,
`linux-amd64`, `linux-arm64`, `windows-amd64`.

From source (Go 1.24+):

```bash
go install github.com/scrapfly/scrapfly-cli/cmd/scrapfly@latest
```

Or from a package manager:

```bash
# Homebrew (macOS, once the tap is published)
brew install scrapfly/tap/scrapfly

# npm / pnpm / yarn
npm install -D scrapfly-cli
npx scrapfly version
```

### macOS Gatekeeper

The released macOS binary is not yet signed with an Apple Developer ID, so
first-run may hit *"Apple could not verify 'scrapfly' is free of malware"*.

- `install.sh` strips the quarantine bit automatically.
- If you downloaded the tarball manually, run once:
```bash
xattr -d com.apple.quarantine /path/to/scrapfly
```
(or right-click → Open in Finder the first time).

Signing + notarization is on the roadmap.

### Self-update

Once installed, keep the binary current without re-running the installer:

```bash
scrapfly update # upgrade in place to the latest GitHub release
scrapfly update --check # just print whether a newer version exists
scrapfly update --version v0.3.0 # pin to a specific release
```

The binary always comes from the GitHub release + `checksums.txt`; the
release-notes feed is only used for the human-readable changelog.

## Authentication

```bash
export SCRAPFLY_API_KEY=scp-live-...
# or persist it
scrapfly config set-key scp-live-...
```

Resolution order: `--api-key` flag > `SCRAPFLY_API_KEY` env >
`~/.scrapfly/config.json`.

## Quick start

```bash
# Scrape a JS-heavy page with anti-bot + markdown output
scrapfly scrape https://web-scraping.dev/products --render-js --asp --format markdown

# Pipe scrape into extract (two-step: fetch + AI extraction)
scrapfly scrape https://web-scraping.dev/product/1 --render-js --proxified \
| scrapfly extract --content-type text/html --prompt "name, price, sku"

# Screenshot auto-named into a directory
scrapfly -O ./shots screenshot https://example.com --resolution 1920x1080

# Crawl synchronously, then extract the WARC you downloaded
scrapfly crawl run https://example.com --max-pages 20 --content-format markdown
scrapfly crawl artifact --type warc -o crawl.warc
scrapfly crawl parse warc-list crawl.warc --pretty
```

## Browser & Agent

`scrapfly browser` gives you a CDP WebSocket URL you can hand to
Playwright, Puppeteer, [browser-use](https://github.com/browser-use/browser-use),
or drive directly through the built-in session daemon:

```bash
scrapfly browser --session demo start &
scrapfly browser navigate https://web-scraping.dev/login
scrapfly browser fill 'input[name=username]' user123
scrapfly browser fill 'input[name=password]' password
scrapfly browser click 'button[type=submit]'
scrapfly browser close
```

On Scrapfly's custom browser, `fill` / `click` / `wait` / `slide`
automatically go through the **Antibot** CDP domain (human-like timing,
slider-captcha support) and `content` uses **Page.getRenderedContent**
(HTML with iframes inlined).

For fully autonomous runs:

```bash
ANTHROPIC_API_KEY=... scrapfly agent "Find the first product name and price" \
--url https://web-scraping.dev/products \
--schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}},"required":["name","price"]}'
```

Providers: Anthropic (default), OpenAI, Google Gemini, and any
OpenAI-compatible endpoint (Ollama, vLLM, …).

## Product coverage

| Scrapfly API | REST | CLI |
|-------------------------|--------------------------------------------------|----------------------------------------|
| Web Scraping | `GET/POST /scrape` | `scrapfly scrape ` |
| Scrape Batch | `POST /scrape/batch` | `scrapfly batch ` |
| Screenshot | `POST /screenshot` | `scrapfly screenshot ` |
| Extraction | `POST /extraction` | `scrapfly extract` |
| Classify | `POST /classify` | `scrapfly classify` |
| Crawler | `POST /crawl` + `/crawl/{uuid}/...` | `scrapfly crawl {start,run,status,...}`|
| Browser (CDP) | `wss://browser.scrapfly.io` | `scrapfly browser [start/...]` |
| Browser Unblock | `POST /unblock` | `scrapfly browser --unblock` |
| Account | `GET /account` | `scrapfly account` / `scrapfly status` |

Every documented SDK field is exposed as a flag. See
[`agent-onboarding/SKILL.md`](agent-onboarding/SKILL.md) for the complete
map.

## For LLM agents

Point your tool-use loop at [`agent-onboarding/SKILL.md`](agent-onboarding/SKILL.md):
a compact guide covering the six usage paths, auth, envelope contract,
and every CLI verb grouped by intent.

## GitHub Actions

Use this repo as a reusable action to install the CLI on a runner:

```yaml
- uses: scrapfly/scrapfly-cli@v0.2.0
with:
api-key: ${{ secrets.SCRAPFLY_API_KEY }}
- run: scrapfly scrape https://example.com --format markdown
```

Inputs: `version` (default `latest`), `api-key` (exported to `SCRAPFLY_API_KEY`
with log masking), `prefix` (default `$HOME/.local/bin`).

## MCP server

For Cursor, Claude Desktop, Claude Code, and any other MCP-compatible
client:

```jsonc
{
"mcpServers": {
"scrapfly": {
"command": "scrapfly",
"args": ["mcp", "serve"],
"env": { "SCRAPFLY_API_KEY": "scp-live-..." }
}
}
}
```

Exposes `scrape`, `screenshot`, `extract`, `crawl_run`, and `selector` as
tools. Schemas are generated from the flag definitions so the client sees
exactly the surface the CLI does.

## Shell completion

```bash
scrapfly completion zsh > "${fpath[1]}/_scrapfly" # zsh
scrapfly completion bash > ~/.bash_completion.d/scrapfly # bash
scrapfly completion fish > ~/.config/fish/completions/scrapfly.fish
```

## Development

```bash
make install # go mod download
make dev # build dist/scrapfly
make test # go test ./...
make lint # go vet ./...
make fmt # gofmt -w .

# cut a release (tags + pushes; CI publishes binaries)
make release VERSION=0.2.0 NEXT_VERSION=0.2.1
```

## Links

- Scrapfly docs: https://scrapfly.io/docs
- Official SDKs (Python, TypeScript, Go, Rust): https://scrapfly.io/docs/sdk
- Issues: https://github.com/scrapfly/scrapfly-cli/issues

## License

MIT. See [LICENSE](LICENSE).