https://github.com/scrapfly/scrapfly-cli
Agentic CLI for the Scrapfly platform: scrape, extract, crawl, and drive a cloud browser over CDP. Built-in LLM agent with Anthropic, OpenAI, Gemini, and Ollama. JSON-first, pipe-friendly, one static binary.
https://github.com/scrapfly/scrapfly-cli
agentic ai-agent anthropic anti-bot anti-detect antidetect-browser bot-detection browser-automation cdp cli data-extraction golang headless-browser llm-agent openai scrapfly screenshot stealth-browser web-crawler web-scraping
Last synced: about 2 months ago
JSON representation
Agentic CLI for the Scrapfly platform: scrape, extract, crawl, and drive a cloud browser over CDP. Built-in LLM agent with Anthropic, OpenAI, Gemini, and Ollama. JSON-first, pipe-friendly, one static binary.
- Host: GitHub
- URL: https://github.com/scrapfly/scrapfly-cli
- Owner: scrapfly
- License: mit
- Created: 2026-04-15T08:51:01.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-20T09:50:56.000Z (about 2 months ago)
- Last Synced: 2026-04-20T10:39:55.557Z (about 2 months ago)
- Topics: agentic, ai-agent, anthropic, anti-bot, anti-detect, antidetect-browser, bot-detection, browser-automation, cdp, cli, data-extraction, golang, headless-browser, llm-agent, openai, scrapfly, screenshot, stealth-browser, web-crawler, web-scraping
- Language: Go
- Size: 523 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
scrapfly-cli
The agentic CLI for web data and browser automation.
Scrape, extract, and drive a cloud browser over CDP from any LLM
tool-use loop. JSON-first, pipe-friendly, one static binary.
---
`scrapfly` is the official CLI for the [Scrapfly](https://scrapfly.io)
platform. It is built **agent-first**: every verb returns a stable JSON
envelope, every product is a tool an LLM can call, and an autonomous
agent is baked in.
- **Agentic by design**: `scrapfly agent ""` runs a Playwright-MCP
style loop against a cloud browser. Providers ship out of the box for
Anthropic, OpenAI, Gemini, and any OpenAI-compatible endpoint (Ollama,
vLLM). Point any other tool-use loop at the full CLI instead via
[`agent-onboarding/SKILL.md`](agent-onboarding/SKILL.md).
- **Full product surface**: Web Scraping, Screenshot, Extraction, Crawler,
and the CDP-driven Browser, each with complete SDK-parity flags.
- **Pipe-friendly**: stable envelope (`{success, product, data|error}`),
`--content-only`/`--data-only` for single-line piping, ndjson for
batch runs, `--pretty` for humans.
- **Composable**: multi-call browser sessions via a Unix-socket daemon
(cookies + AXTree refs persist across invocations), WARC/HAR in and
out, `scrapfly browser` outputs a CDP URL you can hand to Playwright,
Puppeteer, or browser-use.
## Install
One-liner (macOS / Linux):
```bash
curl -fsSL https://scrapfly.io/scrapfly-cli/install | sh
```
Pinned version or custom prefix:
```bash
curl -fsSL https://scrapfly.io/scrapfly-cli/install | sh -s -- --version v0.1.0 --prefix ~/.local/bin
```
Or grab a release tarball directly from
[Releases](https://github.com/scrapfly/scrapfly-cli/releases) - artifacts
follow `scrapfly-{os}-{arch}.tar.gz` (`scrapfly-windows-amd64.zip` on
Windows). Supported: `darwin-universal`, `darwin-amd64`, `darwin-arm64`,
`linux-amd64`, `linux-arm64`, `windows-amd64`.
From source (Go 1.24+):
```bash
go install github.com/scrapfly/scrapfly-cli/cmd/scrapfly@latest
```
Or from a package manager:
```bash
# Homebrew (macOS, once the tap is published)
brew install scrapfly/tap/scrapfly
# npm / pnpm / yarn
npm install -D scrapfly-cli
npx scrapfly version
```
### macOS Gatekeeper
The released macOS binary is not yet signed with an Apple Developer ID, so
first-run may hit *"Apple could not verify 'scrapfly' is free of malware"*.
- `install.sh` strips the quarantine bit automatically.
- If you downloaded the tarball manually, run once:
```bash
xattr -d com.apple.quarantine /path/to/scrapfly
```
(or right-click → Open in Finder the first time).
Signing + notarization is on the roadmap.
### Self-update
Once installed, keep the binary current without re-running the installer:
```bash
scrapfly update # upgrade in place to the latest GitHub release
scrapfly update --check # just print whether a newer version exists
scrapfly update --version v0.3.0 # pin to a specific release
```
The binary always comes from the GitHub release + `checksums.txt`; the
release-notes feed is only used for the human-readable changelog.
## Authentication
```bash
export SCRAPFLY_API_KEY=scp-live-...
# or persist it
scrapfly config set-key scp-live-...
```
Resolution order: `--api-key` flag > `SCRAPFLY_API_KEY` env >
`~/.scrapfly/config.json`.
## Quick start
```bash
# Scrape a JS-heavy page with anti-bot + markdown output
scrapfly scrape https://web-scraping.dev/products --render-js --asp --format markdown
# Pipe scrape into extract (two-step: fetch + AI extraction)
scrapfly scrape https://web-scraping.dev/product/1 --render-js --proxified \
| scrapfly extract --content-type text/html --prompt "name, price, sku"
# Screenshot auto-named into a directory
scrapfly -O ./shots screenshot https://example.com --resolution 1920x1080
# Crawl synchronously, then extract the WARC you downloaded
scrapfly crawl run https://example.com --max-pages 20 --content-format markdown
scrapfly crawl artifact --type warc -o crawl.warc
scrapfly crawl parse warc-list crawl.warc --pretty
```
## Browser & Agent
`scrapfly browser` gives you a CDP WebSocket URL you can hand to
Playwright, Puppeteer, [browser-use](https://github.com/browser-use/browser-use),
or drive directly through the built-in session daemon:
```bash
scrapfly browser --session demo start &
scrapfly browser navigate https://web-scraping.dev/login
scrapfly browser fill 'input[name=username]' user123
scrapfly browser fill 'input[name=password]' password
scrapfly browser click 'button[type=submit]'
scrapfly browser close
```
On Scrapfly's custom browser, `fill` / `click` / `wait` / `slide`
automatically go through the **Antibot** CDP domain (human-like timing,
slider-captcha support) and `content` uses **Page.getRenderedContent**
(HTML with iframes inlined).
For fully autonomous runs:
```bash
ANTHROPIC_API_KEY=... scrapfly agent "Find the first product name and price" \
--url https://web-scraping.dev/products \
--schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}},"required":["name","price"]}'
```
Providers: Anthropic (default), OpenAI, Google Gemini, and any
OpenAI-compatible endpoint (Ollama, vLLM, …).
## Product coverage
| Scrapfly API | REST | CLI |
|-------------------------|--------------------------------------------------|----------------------------------------|
| Web Scraping | `GET/POST /scrape` | `scrapfly scrape ` |
| Scrape Batch | `POST /scrape/batch` | `scrapfly batch ` |
| Screenshot | `POST /screenshot` | `scrapfly screenshot ` |
| Extraction | `POST /extraction` | `scrapfly extract` |
| Classify | `POST /classify` | `scrapfly classify` |
| Crawler | `POST /crawl` + `/crawl/{uuid}/...` | `scrapfly crawl {start,run,status,...}`|
| Browser (CDP) | `wss://browser.scrapfly.io` | `scrapfly browser [start/...]` |
| Browser Unblock | `POST /unblock` | `scrapfly browser --unblock` |
| Account | `GET /account` | `scrapfly account` / `scrapfly status` |
Every documented SDK field is exposed as a flag. See
[`agent-onboarding/SKILL.md`](agent-onboarding/SKILL.md) for the complete
map.
## For LLM agents
Point your tool-use loop at [`agent-onboarding/SKILL.md`](agent-onboarding/SKILL.md):
a compact guide covering the six usage paths, auth, envelope contract,
and every CLI verb grouped by intent.
## GitHub Actions
Use this repo as a reusable action to install the CLI on a runner:
```yaml
- uses: scrapfly/scrapfly-cli@v0.2.0
with:
api-key: ${{ secrets.SCRAPFLY_API_KEY }}
- run: scrapfly scrape https://example.com --format markdown
```
Inputs: `version` (default `latest`), `api-key` (exported to `SCRAPFLY_API_KEY`
with log masking), `prefix` (default `$HOME/.local/bin`).
## MCP server
For Cursor, Claude Desktop, Claude Code, and any other MCP-compatible
client:
```jsonc
{
"mcpServers": {
"scrapfly": {
"command": "scrapfly",
"args": ["mcp", "serve"],
"env": { "SCRAPFLY_API_KEY": "scp-live-..." }
}
}
}
```
Exposes `scrape`, `screenshot`, `extract`, `crawl_run`, and `selector` as
tools. Schemas are generated from the flag definitions so the client sees
exactly the surface the CLI does.
## Shell completion
```bash
scrapfly completion zsh > "${fpath[1]}/_scrapfly" # zsh
scrapfly completion bash > ~/.bash_completion.d/scrapfly # bash
scrapfly completion fish > ~/.config/fish/completions/scrapfly.fish
```
## Development
```bash
make install # go mod download
make dev # build dist/scrapfly
make test # go test ./...
make lint # go vet ./...
make fmt # gofmt -w .
# cut a release (tags + pushes; CI publishes binaries)
make release VERSION=0.2.0 NEXT_VERSION=0.2.1
```
## Links
- Scrapfly docs: https://scrapfly.io/docs
- Official SDKs (Python, TypeScript, Go, Rust): https://scrapfly.io/docs/sdk
- Issues: https://github.com/scrapfly/scrapfly-cli/issues
## License
MIT. See [LICENSE](LICENSE).