https://github.com/testy-cool/scrape-gateway

Unified scraping gateway: 7 providers, cheapest-first routing, content validation, domain memory. One `sgw url` call and it figures out the rest.

anti-bot cli proxy python scraping web-scraping

Last synced: about 2 months ago

Host: GitHub
URL: https://github.com/testy-cool/scrape-gateway
Owner: testy-cool
License: other
Created: 2026-05-05T18:17:09.000Z (2 months ago)
Default Branch: main
Last Pushed: 2026-05-27T08:01:56.000Z (about 2 months ago)
Last Synced: 2026-05-27T08:05:41.856Z (about 2 months ago)
Topics: anti-bot, cli, proxy, python, scraping, web-scraping
Language: Python
Size: 260 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 25
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# scrape-gateway (`sgw`)

[![ci](https://github.com/testy-cool/scrape-gateway/actions/workflows/ci.yml/badge.svg)](https://github.com/testy-cool/scrape-gateway/actions/workflows/ci.yml)

[![version](https://img.shields.io/badge/version-0.3.0-blue)](https://github.com/testy-cool/scrape-gateway/releases/latest)

[![license](https://img.shields.io/badge/license-Apache--2.0-green)](LICENSE)

One command, seven providers. Free ones tried first, paid ones only when needed. Domain memory skips the trial-and-error on repeat visits.
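
Domain memory can be pictured as a per-domain record of which provider last succeeded, consulted before the normal cheapest-first order. The sketch below illustrates that idea under this assumption only; `DomainMemory`, `record_success`, and `order_for` are hypothetical names, not sgw's API, and the real store (see [docs/architecture.md](docs/architecture.md)) may persist and expire entries differently.

```python
from urllib.parse import urlsplit


class DomainMemory:
    """Hypothetical in-memory map: domain -> provider that last worked."""

    def __init__(self) -> None:
        self._best: dict[str, str] = {}

    @staticmethod
    def _domain(url: str) -> str:
        # urlsplit().hostname is already lower-cased and has no port.
        return urlsplit(url).hostname or ""

    def record_success(self, url: str, provider: str) -> None:
        self._best[self._domain(url)] = provider

    def order_for(self, url: str, default_order: list[str]) -> list[str]:
        """Move the remembered provider (if any) to the front of the chain."""
        best = self._best.get(self._domain(url))
        if best is None or best not in default_order:
            return list(default_order)
        return [best] + [p for p in default_order if p != best]


chain = ["raw_http", "wreq", "curl_cffi", "scrape_do"]
memory = DomainMemory()
print(memory.order_for("https://example.com/a", chain))
# ['raw_http', 'wreq', 'curl_cffi', 'scrape_do']
memory.record_success("https://example.com/a", "curl_cffi")
print(memory.order_for("https://EXAMPLE.com/b", chain))
# ['curl_cffi', 'raw_http', 'wreq', 'scrape_do']
```

Promoting the remembered provider instead of using it exclusively keeps the rest of the chain available as a fallback if the site changes its defenses.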

## Quick start

```bash
git clone https://github.com/testy-cool/scrape-gateway.git
cd scrape-gateway
pip install -e .
cp .env.example .env   # add API keys (optional; free providers work without any)
sgw selftest           # verify installation
sgw url https://example.com
```

## Commands

| Command | What it does |
|---|---|
| `sgw url ` | Scrape one page through the provider chain |
| `sgw extract ` | Pull structured data (JSON/CSV) from listing pages |
| `sgw detect ` | Recon: find repeated elements before extracting |
| `sgw links ` | Index all links on a page |
| `sgw follow  ` | Scrape link #n from a page |
| `sgw recipe ` | Replay a saved YAML workflow |
| `sgw run ` | Batch scrape URLs from a text file |
| `sgw meta ` | Extract OpenGraph metadata as JSON |
| `sgw history ` | Show scrape timeline and page changes |
| `sgw telemetry` | Inspect recent scrape reports |
| `sgw providers` | List all available providers |
| `sgw extensions` | Browse/install community extensions |
| `sgw selftest` | Verify installation with known-safe sites |

Full usage and examples: [docs/commands.md](docs/commands.md)
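
`sgw links`, `sgw follow`, and `sgw meta` all start from a page's raw HTML. A minimal standard-library sketch of that step, assuming links are numbered from 1 in document order and relative hrefs are resolved against the page URL; `PageIndexer` and `follow` are hypothetical names, and sgw's actual numbering, filtering, and output format may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageIndexer(HTMLParser):
    """Collect <a href> targets and OpenGraph <meta property="og:*"> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.og: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attr["href"]))
        elif tag == "meta":
            prop = attr.get("property") or ""
            if prop.startswith("og:") and attr.get("content") is not None:
                self.og[prop[3:]] = attr["content"]  # "og:title" -> "title"


def follow(indexer: PageIndexer, n: int) -> str:
    """Return link #n (1-based) from the index."""
    if not 1 <= n <= len(indexer.links):
        raise IndexError(f"no link #{n}; page has {len(indexer.links)} links")
    return indexer.links[n - 1]


sample_html = """
<html><head>
  <meta property="og:title" content="Example page">
  <meta property="og:type" content="website">
</head><body>
  <a href="/about">About</a>
  <a href="https://example.org/">Elsewhere</a>
</body></html>
"""

page = PageIndexer("https://example.com/index.html")
page.feed(sample_html)
page.close()
for i, link in enumerate(page.links, start=1):
    print(i, link)       # 1 https://example.com/about, then 2 https://example.org/
print(page.og)           # {'title': 'Example page', 'type': 'website'}
print(follow(page, 2))   # https://example.org/
```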

## Providers

7 built-in providers, 3 of them free. The router tries the cheapest first.

| Provider | Cost | JS | Geo | Anti-bot |
|---|---|---|---|---|
| `raw_http` | free | no | no | none |
| `wreq` | free | no | no | TLS fingerprinting |
| `curl_cffi` | free | no | no | TLS fingerprinting |
| `scrapedrive` | paid | yes | yes | full (3 tiers) |
| `scrape_do` | paid | yes | yes | residential proxies |
| `scrapingbee` | paid | yes | yes | premium proxies |
| `scraperapi` | paid | yes | yes | premium proxies |

Add API keys in `.env` to enable paid providers. Without them, `sgw` uses free providers only.
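
Cheapest-first routing with content validation and fallback can be sketched roughly as follows. Everything here is an illustrative assumption, not sgw's internals: the `Provider` dataclass, the numeric `cost` field, the `looks_valid` heuristic (non-empty, no obvious block-page marker), and the stub fetch functions are all hypothetical. Paid providers are skipped when no key is configured, mirroring the `.env` behavior described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    cost: float                   # hypothetical relative cost; 0.0 means free
    fetch: Callable[[str], str]   # returns page HTML or raises on failure
    needs_key: bool = False


BLOCK_MARKERS = ("captcha", "access denied")  # hypothetical block-page hints


def looks_valid(html: str) -> bool:
    """Crude content validation: non-empty and not an obvious block page."""
    text = html.lower()
    return bool(text.strip()) and not any(m in text for m in BLOCK_MARKERS)


def route(url: str, providers: list[Provider], keys: set[str]) -> tuple[str, str]:
    """Try usable providers cheapest-first; return (provider name, html).

    `keys` holds the names of providers whose API key is configured
    (a hypothetical stand-in for reading .env).
    """
    usable = [p for p in providers if not p.needs_key or p.name in keys]
    errors: list[str] = []
    # sorted() is stable, so equal-cost providers keep their listed order.
    for prov in sorted(usable, key=lambda pr: pr.cost):
        try:
            html = prov.fetch(url)
        except Exception as exc:  # network error, HTTP error, timeout, ...
            errors.append(f"{prov.name}: {exc}")
            continue
        if looks_valid(html):
            return prov.name, html
        errors.append(f"{prov.name}: content failed validation")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub fetchers standing in for real HTTP clients.
def blocked(url: str) -> str:
    return "<html>Please solve this CAPTCHA</html>"


def failing(url: str) -> str:
    raise ConnectionError("connection reset")


def rendered(url: str) -> str:
    return "<html><h1>Real content</h1></html>"


chain = [
    Provider("scrape_do", 1.0, rendered, needs_key=True),
    Provider("raw_http", 0.0, blocked),
    Provider("curl_cffi", 0.0, failing),
]
print(route("https://example.com", chain, keys={"scrape_do"}))
# ('scrape_do', '<html><h1>Real content</h1></html>')
```

Validating content, not just the HTTP status, is what lets a cheap provider that "succeeds" with a block page fall through to the next one.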

## Extend it

Drop a `.py` file in `~/.config/scrape-gateway/providers/` or install from the registry with `sgw extensions`. See [docs/extensions.md](docs/extensions.md).

## Python API

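```python
import asyncio

from scrape_gateway import ScrapeGateway, ScrapeRequest


async def main() -> None:
    gw = ScrapeGateway.from_config()
    # scrape() is a coroutine, so it must be awaited inside an event loop.
    result = await gw.scrape(ScrapeRequest("https://example.com"))
    print(result)


asyncio.run(main())
```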

More: [docs/python-api.md](docs/python-api.md)

## Docs

- [Commands](docs/commands.md): full reference with examples
- [Architecture](docs/architecture.md): how the router, cache, and memory work
- [Configuration](docs/configuration.md): YAML config and `.env` setup
- [Extensions](docs/extensions.md): writing custom providers
- [Python API](docs/python-api.md): using sgw as a library
- [Providers](docs/providers.md): provider details and API mapping
