https://github.com/testy-cool/scrape-gateway
Unified scraping gateway — 7 providers, cheapest-first routing, content validation, domain memory. One sgw url call, it figures out the rest.
- Host: GitHub
- URL: https://github.com/testy-cool/scrape-gateway
- Owner: testy-cool
- License: other
- Created: 2026-05-05T18:17:09.000Z (25 days ago)
- Default Branch: main
- Last Pushed: 2026-05-27T08:01:56.000Z (4 days ago)
- Last Synced: 2026-05-27T08:05:41.856Z (4 days ago)
- Topics: anti-bot, cli, proxy, python, scraping, web-scraping
- Language: Python
- Size: 260 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 25
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# scrape-gateway (`sgw`)
[CI](https://github.com/testy-cool/scrape-gateway/actions/workflows/ci.yml)
[Latest release](https://github.com/testy-cool/scrape-gateway/releases/latest)
[License](LICENSE)
One command, seven providers. Free ones tried first, paid ones only when needed. Domain memory skips the trial-and-error on repeat visits.
## Quick start
```bash
git clone https://github.com/testy-cool/scrape-gateway.git
cd scrape-gateway
pip install -e .
cp .env.example .env # add API keys (optional — free providers work without any)
sgw selftest # verify installation
sgw url https://example.com
```
## Commands
| Command | What it does |
|---|---|
| `sgw url ` | Scrape one page through the provider chain |
| `sgw extract ` | Pull structured data (JSON/CSV) from listing pages |
| `sgw detect ` | Recon — find repeated elements before extracting |
| `sgw links ` | Index all links on a page |
| `sgw follow ` | Scrape link #n from a page |
| `sgw recipe ` | Replay a saved YAML workflow |
| `sgw run ` | Batch scrape URLs from a text file |
| `sgw meta ` | Extract OpenGraph metadata as JSON |
| `sgw history ` | Show scrape timeline and page changes |
| `sgw telemetry` | Inspect recent scrape reports |
| `sgw providers` | List all available providers |
| `sgw extensions` | Browse/install community extensions |
| `sgw selftest` | Verify installation with known-safe sites |
Full usage and examples: [docs/commands.md](docs/commands.md)
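To make the `sgw meta` row concrete, here is a minimal standard-library sketch of pulling OpenGraph tags out of HTML and emitting them as JSON. It illustrates the general technique only; the names `OpenGraphParser` and `opengraph_json` are hypothetical and are not taken from the `sgw` codebase, whose actual implementation and output shape may differ.

```python
import json
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> tags by the base class.
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            self.tags[prop] = attr["content"]


def opengraph_json(html: str) -> str:
    """Return the page's OpenGraph tags as a JSON object string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return json.dumps(parser.tags, indent=2, sort_keys=True)
```

Non-OpenGraph `<meta>` tags, such as `name="description"`, are ignored by this sketch.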
## Providers
Seven providers are built in, and three of them are free. The router tries the cheapest provider first.
| Provider | Cost | JS | Geo | Anti-bot |
|---|---|---|---|---|
| `raw_http` | free | no | no | none |
| `wreq` | free | no | no | TLS fingerprinting |
| `curl_cffi` | free | no | no | TLS fingerprinting |
| `scrapedrive` | paid | yes | yes | full (3 tiers) |
| `scrape_do` | paid | yes | yes | residential proxies |
| `scrapingbee` | paid | yes | yes | premium proxies |
| `scraperapi` | paid | yes | yes | premium proxies |
Add API keys in `.env` to enable paid providers. Without them, `sgw` uses free providers only.
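The routing idea described above (cheapest first, validate the content, remember what worked per domain) can be sketched in a few lines. This is an illustrative model under stated assumptions, not the project's router: `Provider`, `Router`, `looks_valid`, the numeric `cost` field, and the "captcha" check are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class Provider:
    name: str
    cost: float                   # hypothetical relative cost; 0.0 means free
    fetch: Callable[[str], str]   # returns page HTML, raises on failure


def looks_valid(html: str) -> bool:
    """Toy content check: non-trivial length and no obvious block-page marker."""
    return len(html) > 50 and "captcha" not in html.lower()


class Router:
    def __init__(self, providers: list[Provider]):
        # Cheapest first; sorted() is stable, so ties keep their given order.
        self.providers = sorted(providers, key=lambda p: p.cost)
        self.memory: dict[str, str] = {}   # domain -> provider that last worked

    def scrape(self, url: str) -> tuple[str, str]:
        domain = urlparse(url).netloc
        remembered = self.memory.get(domain)
        # Remembered provider goes first; the rest stay in cost order.
        order = sorted(self.providers, key=lambda p: p.name != remembered)
        for provider in order:
            try:
                html = provider.fetch(url)
            except Exception:
                continue   # provider failed; fall through to the next one
            if looks_valid(html):
                self.memory[domain] = provider.name
                return provider.name, html
        raise RuntimeError(f"no provider returned valid content for {url}")
```

In this model, a domain that only a paid provider can fetch costs one failed free attempt on the first visit and none on later visits, which is the saving the domain memory is meant to provide.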
## Extend it
Drop a `.py` file in `~/.config/scrape-gateway/providers/` or install from the registry with `sgw extensions`. See [docs/extensions.md](docs/extensions.md).
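For readers curious how a drop-in provider directory can work in general, the sketch below imports every `.py` file from a folder using `importlib`. It is a generic pattern, not the loader `sgw` uses, and `load_provider_modules` is a hypothetical name. The interface a provider file must expose is defined in docs/extensions.md and is not assumed here.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_provider_modules(directory: Path) -> dict[str, ModuleType]:
    """Import every .py file in `directory`; return modules keyed by file stem."""
    modules: dict[str, ModuleType] = {}
    if not directory.is_dir():
        return modules   # a missing directory simply means no extensions
    for path in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)   # runs the file's top-level code
        modules[path.stem] = module
    return modules
```

A call such as `load_provider_modules(Path.home() / ".config/scrape-gateway/providers")` would then return one module per dropped-in file. Because importing executes arbitrary code, only files from trusted sources belong in that directory.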
## Python API
```python
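import asyncio
from scrape_gateway import ScrapeGateway, ScrapeRequest
async def main():  # gw.scrape is awaited, so it must run inside an event loop
    gw = ScrapeGateway.from_config()
    return await gw.scrape(ScrapeRequest("https://example.com"))
result = asyncio.run(main())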
```
More: [docs/python-api.md](docs/python-api.md)
## Docs
- [Commands](docs/commands.md) — full reference with examples
- [Architecture](docs/architecture.md) — how the router, cache, and memory work
- [Configuration](docs/configuration.md) — YAML config and `.env` setup
- [Extensions](docs/extensions.md) — writing custom providers
- [Python API](docs/python-api.md) — using sgw as a library
- [Providers](docs/providers.md) — provider details and API mapping