{"id":14064916,"url":"https://github.com/jamesturk/scrapelib","last_synced_at":"2025-04-04T06:07:11.277Z","repository":{"id":967501,"uuid":"760380","full_name":"jamesturk/scrapelib","owner":"jamesturk","description":"⛏ a library for scraping unreliable pages","archived":false,"fork":false,"pushed_at":"2024-08-20T15:46:48.000Z","size":981,"stargazers_count":210,"open_issues_count":7,"forks_count":40,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-03-28T05:08:36.899Z","etag":null,"topics":["http","python","scraper"],"latest_commit_sha":null,"homepage":"https://jamesturk.github.io/scrapelib/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jamesturk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"jamesturk","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2010-07-06T19:34:01.000Z","updated_at":"2024-12-31T19:41:48.000Z","dependencies_parsed_at":"2024-03-02T06:29:54.246Z","dependency_job_id":"803fa194-6c02-4762-b048-58ea5566ce3a","html_url":"https://github.com/jamesturk/scrapelib","commit_stats":{"total_commits":473,"total_committers":21,"mean_commits":"22.523809523809526","dds":0.4249471458773785,"last_synced_commit":"5ce09163fbd73129033ecf882a15d6a3ba53834d"},"previous_names":[],"tags_count":40,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fscrapelib","tags_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fscrapelib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fscrapelib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fscrapelib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jamesturk","download_url":"https://codeload.github.com/jamesturk/scrapelib/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247128743,"owners_count":20888235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["http","python","scraper"],"created_at":"2024-08-13T07:04:10.548Z","updated_at":"2025-04-04T06:07:11.255Z","avatar_url":"https://github.com/jamesturk.png","language":"Python","funding_links":["https://github.com/sponsors/jamesturk"],"categories":["Python"],"sub_categories":[],"readme":"**scrapelib** is a library for making requests to less-than-reliable websites.\n\nSource: [https://github.com/jamesturk/scrapelib](https://github.com/jamesturk/scrapelib)\n\nDocumentation: [https://jamesturk.github.io/scrapelib/](https://jamesturk.github.io/scrapelib/)\n\nIssues: [https://github.com/jamesturk/scrapelib/issues](https://github.com/jamesturk/scrapelib/issues)\n\n[![PyPI badge](https://badge.fury.io/py/scrapelib.svg)](https://badge.fury.io/py/scrapelib)\n[![Test badge](https://github.com/jamesturk/scrapelib/workflows/Test/badge.svg)](https://github.com/jamesturk/scrapelib/actions?query=workflow%3ATest)\n\n## Features\n\n**scrapelib** originated as 
part of the [Open States](http://openstates.org/)\nproject to scrape the websites of all 50 state legislatures and\nwas therefore designed with features desirable when dealing with sites that\nhave intermittent errors or require rate-limiting.\n\nAdvantages of using scrapelib over using requests as-is:\n\n- HTTP(S) and FTP requests via an identical API\n- support for simple caching with pluggable cache backends\n- highly-configurable request throttling\n- configurable retries for non-permanent site failures\n- All of the power of the superb [requests](http://python-requests.org) library.\n\n\n## Installation\n\n*scrapelib* is on [PyPI](https://pypi.org/project/scrapelib/), and can be installed via any standard package management tool:\n\n    poetry add scrapelib\n\nor:\n\n    pip install scrapelib\n\n\n## Example Usage\n\n``` python\n\n  import scrapelib\n  s = scrapelib.Scraper(requests_per_minute=10)\n\n  # Grab Google front page\n  s.get('http://google.com')\n\n  # Will be throttled to 10 HTTP requests per minute\n  while True:\n      s.get('http://example.com')\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesturk%2Fscrapelib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjamesturk%2Fscrapelib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesturk%2Fscrapelib/lists"}