{"id":49118125,"url":"https://github.com/scrapegraphai/scrapegraph-py","last_synced_at":"2026-04-21T09:02:06.024Z","repository":{"id":265423621,"uuid":"880829353","full_name":"ScrapeGraphAI/scrapegraph-py","owner":"ScrapeGraphAI","description":"Official Python SDK for the ScrapeGraph AI API. Smart scraping, search, crawling, markdownify, agentic browser automation, scheduled jobs, and structured data extraction","archived":false,"fork":false,"pushed_at":"2026-04-14T21:41:16.000Z","size":14519,"stargazers_count":69,"open_issues_count":3,"forks_count":15,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-14T22:10:56.010Z","etag":null,"topics":["api","json-schema","python","scrapegraph","scraping","sdk-js","sdk-nodejs","sdk-python","web-crawler","web-scraping","web-scraping-python"],"latest_commit_sha":null,"homepage":"https://scrapegraphai.com","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ScrapeGraphAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-30T12:46:08.000Z","updated_at":"2026-04-14T20:06:52.000Z","dependencies_parsed_at":"2024-11-29T10:41:27.608Z","dependency_job_id":"c52de486-ed94-4275-b1cd-f859adaf705c","html_url":"https://github.com/ScrapeGraphAI/scrapegraph-py","commit_stats":null,"previous_names":["scrapegraphai/scrapegraph-sdk","scrapegraphai/scrapegraph-py"],"tags_count":75,"template":false,"template_full_name":null,"purl":"pkg:github/ScrapeGraphAI/
scrapegraph-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2Fscrapegraph-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2Fscrapegraph-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2Fscrapegraph-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2Fscrapegraph-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ScrapeGraphAI","download_url":"https://codeload.github.com/ScrapeGraphAI/scrapegraph-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ScrapeGraphAI%2Fscrapegraph-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32084721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-21T06:27:27.065Z","status":"ssl_error","status_checked_at":"2026-04-21T06:27:21.250Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","json-schema","python","scrapegraph","scraping","sdk-js","sdk-nodejs","sdk-python","web-crawler","web-scraping","web-scraping-python"],"created_at":"2026-04-21T09:02:04.979Z","updated_at":"2026-04-21T09:02:06.014Z","avatar_url":"https://github.com/ScrapeGraphAI.png","language":"Jupyter 
Notebook","readme":"# ScrapeGraphAI Python SDK\n\n[![PyPI version](https://badge.fury.io/py/scrapegraph-py.svg)](https://badge.fury.io/py/scrapegraph-py)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://scrapegraphai.com\"\u003e\n    \u003cimg src=\"media/banner.png\" alt=\"ScrapeGraphAI Python SDK\" style=\"width: 100%;\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nOfficial Python SDK for the [ScrapeGraphAI API](https://scrapegraphai.com).\n\n## Install\n\n```bash\npip install scrapegraph-py\n# or\nuv add scrapegraph-py\n```\n\n## Quick Start\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI, ScrapeRequest\n\n# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key=\"...\")\nsgai = ScrapeGraphAI()\n\nresult = sgai.scrape(ScrapeRequest(\n    url=\"https://example.com\",\n))\n\nif result.status == \"success\":\n    print(result.data[\"results\"][\"markdown\"][\"data\"])\nelse:\n    print(result.error)\n```\n\nEvery method returns `ApiResult[T]` — no exceptions to catch:\n\n```python\n@dataclass\nclass ApiResult(Generic[T]):\n    status: Literal[\"success\", \"error\"]\n    data: T | None\n    error: str | None\n    elapsed_ms: int\n```\n\n## API\n\n### scrape\n\nScrape a webpage in multiple formats (markdown, html, screenshot, json, etc).\n\n```python\nfrom scrapegraph_py import (\n    ScrapeGraphAI, ScrapeRequest, FetchConfig,\n    MarkdownFormatConfig, ScreenshotFormatConfig, JsonFormatConfig\n)\n\nsgai = ScrapeGraphAI()\n\nres = sgai.scrape(ScrapeRequest(\n    url=\"https://example.com\",\n    formats=[\n        MarkdownFormatConfig(mode=\"reader\"),\n        ScreenshotFormatConfig(full_page=True, width=1440, height=900),\n        JsonFormatConfig(prompt=\"Extract product info\"),\n    ],\n    content_type=\"text/html\",           # optional, auto-detected\n    fetch_config=FetchConfig(           # optional\n        
mode=\"js\",                      # \"auto\" | \"fast\" | \"js\"\n        stealth=True,\n        timeout=30000,\n        wait=2000,\n        scrolls=3,\n        headers={\"Accept-Language\": \"en\"},\n        cookies={\"session\": \"abc\"},\n        country=\"us\",\n    ),\n))\n```\n\n**Formats:**\n- `markdown` — Clean markdown (modes: `normal`, `reader`, `prune`)\n- `html` — Raw HTML (modes: `normal`, `reader`, `prune`)\n- `links` — All links on the page\n- `images` — All image URLs\n- `summary` — AI-generated summary\n- `json` — Structured extraction with prompt/schema\n- `branding` — Brand colors, typography, logos\n- `screenshot` — Page screenshot (full_page, width, height, quality)\n\n### extract\n\nExtract structured data from a URL, HTML, or markdown using AI.\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI, ExtractRequest\n\nsgai = ScrapeGraphAI()\n\nres = sgai.extract(ExtractRequest(\n    url=\"https://example.com\",\n    prompt=\"Extract product names and prices\",\n    schema={\"type\": \"object\", \"properties\": {...}},  # optional\n    mode=\"reader\",                                    # optional\n    fetch_config=FetchConfig(...),                   # optional\n))\n# Or pass html/markdown directly instead of url\n```\n\n### search\n\nSearch the web and optionally extract structured data.\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI, SearchRequest\n\nsgai = ScrapeGraphAI()\n\nres = sgai.search(SearchRequest(\n    query=\"best programming languages 2024\",\n    num_results=5,                      # 1-20, default 3\n    format=\"markdown\",                  # \"markdown\" | \"html\"\n    prompt=\"Extract key points\",        # optional, for AI extraction\n    schema={...},                       # optional\n    time_range=\"past_week\",             # optional\n    location_geo_code=\"us\",             # optional\n    fetch_config=FetchConfig(...),      # optional\n))\n```\n\n### crawl\n\nCrawl a website and its linked 
pages.\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI, CrawlRequest, MarkdownFormatConfig\n\nsgai = ScrapeGraphAI()\n\n# Start a crawl\nstart = sgai.crawl.start(CrawlRequest(\n    url=\"https://example.com\",\n    formats=[MarkdownFormatConfig()],\n    max_pages=50,\n    max_depth=2,\n    max_links_per_page=10,\n    include_patterns=[\"/blog/*\"],\n    exclude_patterns=[\"/admin/*\"],\n    fetch_config=FetchConfig(...),\n))\n\n# Check status\nstatus = sgai.crawl.get(start.data[\"id\"])\n\n# Control\nsgai.crawl.stop(crawl_id)\nsgai.crawl.resume(crawl_id)\nsgai.crawl.delete(crawl_id)\n```\n\n### monitor\n\nMonitor a webpage for changes on a schedule.\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI, MonitorCreateRequest, MarkdownFormatConfig\n\nsgai = ScrapeGraphAI()\n\n# Create a monitor\nmon = sgai.monitor.create(MonitorCreateRequest(\n    url=\"https://example.com\",\n    name=\"Price Monitor\",\n    interval=\"0 * * * *\",               # cron expression\n    formats=[MarkdownFormatConfig()],\n    webhook_url=\"https://...\",          # optional\n    fetch_config=FetchConfig(...),\n))\n\n# Manage monitors\nsgai.monitor.list()\nsgai.monitor.get(cron_id)\nsgai.monitor.update(cron_id, MonitorUpdateRequest(interval=\"0 */6 * * *\"))\nsgai.monitor.pause(cron_id)\nsgai.monitor.resume(cron_id)\nsgai.monitor.delete(cron_id)\n```\n\n### history\n\nFetch request history.\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI, HistoryFilter\n\nsgai = ScrapeGraphAI()\n\nhistory = sgai.history.list(HistoryFilter(\n    service=\"scrape\",                   # optional filter\n    page=1,\n    limit=20,\n))\n\nentry = sgai.history.get(\"request-id\")\n```\n\n### credits / health\n\n```python\nfrom scrapegraph_py import ScrapeGraphAI\n\nsgai = ScrapeGraphAI()\n\ncredits = sgai.credits()\n# { remaining: 1000, used: 500, plan: \"pro\", jobs: { crawl: {...}, monitor: {...} } }\n\nhealth = sgai.health()\n# { status: \"ok\", uptime: 12345 }\n```\n\n## Async Client\n\nAll 
methods have async equivalents via `AsyncScrapeGraphAI`:\n\n```python\nimport asyncio\nfrom scrapegraph_py import AsyncScrapeGraphAI, ScrapeRequest\n\nasync def main():\n    async with AsyncScrapeGraphAI() as sgai:\n        result = await sgai.scrape(ScrapeRequest(url=\"https://example.com\"))\n        if result.status == \"success\":\n            print(result.data[\"results\"][\"markdown\"][\"data\"])\n        else:\n            print(result.error)\n\nasyncio.run(main())\n```\n\n### Async Extract\n\n```python\nasync with AsyncScrapeGraphAI() as sgai:\n    res = await sgai.extract(ExtractRequest(\n        url=\"https://example.com\",\n        prompt=\"Extract product names and prices\",\n    ))\n```\n\n### Async Search\n\n```python\nasync with AsyncScrapeGraphAI() as sgai:\n    res = await sgai.search(SearchRequest(\n        query=\"best programming languages 2024\",\n        num_results=5,\n    ))\n```\n\n### Async Crawl\n\n```python\nasync with AsyncScrapeGraphAI() as sgai:\n    start = await sgai.crawl.start(CrawlRequest(\n        url=\"https://example.com\",\n        max_pages=50,\n    ))\n    status = await sgai.crawl.get(start.data[\"id\"])\n```\n\n### Async Monitor\n\n```python\nasync with AsyncScrapeGraphAI() as sgai:\n    mon = await sgai.monitor.create(MonitorCreateRequest(\n        url=\"https://example.com\",\n        name=\"Price Monitor\",\n        interval=\"0 * * * *\",\n    ))\n```\n\n## Examples\n\n### Sync Examples\n\n| Service | Example | Description |\n|---------|---------|-------------|\n| scrape | [`scrape_basic.py`](examples/scrape/scrape_basic.py) | Basic markdown scraping |\n| scrape | [`scrape_multi_format.py`](examples/scrape/scrape_multi_format.py) | Multiple formats |\n| scrape | [`scrape_json_extraction.py`](examples/scrape/scrape_json_extraction.py) | Structured JSON extraction |\n| scrape | [`scrape_pdf.py`](examples/scrape/scrape_pdf.py) | PDF document parsing |\n| scrape | 
[`scrape_with_fetchconfig.py`](examples/scrape/scrape_with_fetchconfig.py) | JS rendering, stealth mode |\n| extract | [`extract_basic.py`](examples/extract/extract_basic.py) | AI data extraction |\n| extract | [`extract_with_schema.py`](examples/extract/extract_with_schema.py) | Extraction with JSON schema |\n| search | [`search_basic.py`](examples/search/search_basic.py) | Web search |\n| search | [`search_with_extraction.py`](examples/search/search_with_extraction.py) | Search + AI extraction |\n| crawl | [`crawl_basic.py`](examples/crawl/crawl_basic.py) | Start and monitor a crawl |\n| crawl | [`crawl_with_formats.py`](examples/crawl/crawl_with_formats.py) | Crawl with formats |\n| monitor | [`monitor_basic.py`](examples/monitor/monitor_basic.py) | Create a page monitor |\n| monitor | [`monitor_with_webhook.py`](examples/monitor/monitor_with_webhook.py) | Monitor with webhook |\n| utilities | [`credits.py`](examples/utilities/credits.py) | Check credits and limits |\n| utilities | [`health.py`](examples/utilities/health.py) | API health check |\n| utilities | [`history.py`](examples/utilities/history.py) | Request history |\n\n### Async Examples\n\n| Service | Example | Description |\n|---------|---------|-------------|\n| scrape | [`scrape_basic_async.py`](examples/scrape/scrape_basic_async.py) | Basic markdown scraping |\n| scrape | [`scrape_multi_format_async.py`](examples/scrape/scrape_multi_format_async.py) | Multiple formats |\n| scrape | [`scrape_json_extraction_async.py`](examples/scrape/scrape_json_extraction_async.py) | Structured JSON extraction |\n| scrape | [`scrape_pdf_async.py`](examples/scrape/scrape_pdf_async.py) | PDF document parsing |\n| scrape | [`scrape_with_fetchconfig_async.py`](examples/scrape/scrape_with_fetchconfig_async.py) | JS rendering, stealth mode |\n| extract | [`extract_basic_async.py`](examples/extract/extract_basic_async.py) | AI data extraction |\n| extract | 
[`extract_with_schema_async.py`](examples/extract/extract_with_schema_async.py) | Extraction with JSON schema |\n| search | [`search_basic_async.py`](examples/search/search_basic_async.py) | Web search |\n| search | [`search_with_extraction_async.py`](examples/search/search_with_extraction_async.py) | Search + AI extraction |\n| crawl | [`crawl_basic_async.py`](examples/crawl/crawl_basic_async.py) | Start and monitor a crawl |\n| crawl | [`crawl_with_formats_async.py`](examples/crawl/crawl_with_formats_async.py) | Crawl with formats |\n| monitor | [`monitor_basic_async.py`](examples/monitor/monitor_basic_async.py) | Create a page monitor |\n| monitor | [`monitor_with_webhook_async.py`](examples/monitor/monitor_with_webhook_async.py) | Monitor with webhook |\n| utilities | [`credits_async.py`](examples/utilities/credits_async.py) | Check credits and limits |\n| utilities | [`health_async.py`](examples/utilities/health_async.py) | API health check |\n| utilities | [`history_async.py`](examples/utilities/history_async.py) | Request history |\n\n## Environment Variables\n\n| Variable | Description | Default |\n|----------|-------------|---------|\n| `SGAI_API_KEY` | Your ScrapeGraphAI API key | — |\n| `SGAI_API_URL` | Override API base URL | `https://v2-api.scrapegraphai.com/api` |\n| `SGAI_DEBUG` | Enable debug logging (`\"1\"`) | off |\n| `SGAI_TIMEOUT` | Request timeout in seconds | `120` |\n\n## Development\n\n```bash\nuv sync\nuv run pytest tests/              # unit tests\nuv run pytest tests/test_integration.py  # live API tests (requires SGAI_API_KEY)\nuv run ruff check .               
# lint\n```\n\n## License\n\nMIT - [ScrapeGraphAI](https://scrapegraphai.com)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapegraphai%2Fscrapegraph-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapegraphai%2Fscrapegraph-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapegraphai%2Fscrapegraph-py/lists"}