{"id":20683211,"url":"https://github.com/simonsolnes/webcache","last_synced_at":"2026-05-28T06:10:33.090Z","repository":{"id":113369648,"uuid":"131528562","full_name":"simonsolnes/webcache","owner":"simonsolnes","description":"Cache webpages when you are testing your web scrapers.","archived":false,"fork":false,"pushed_at":"2018-04-29T22:58:20.000Z","size":11,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-03-29T14:35:46.293Z","etag":null,"topics":["scraping","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonsolnes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-29T20:43:05.000Z","updated_at":"2021-08-13T19:24:56.000Z","dependencies_parsed_at":"2023-11-02T23:30:29.099Z","dependency_job_id":null,"html_url":"https://github.com/simonsolnes/webcache","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/simonsolnes/webcache","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonsolnes%2Fwebcache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonsolnes%2Fwebcache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonsolnes%2Fwebcache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonsolnes%2Fwebcache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simonsolnes","
download_url":"https://codeload.github.com/simonsolnes/webcache/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonsolnes%2Fwebcache/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33596370,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scraping","web-scraping"],"created_at":"2024-11-16T22:15:55.444Z","updated_at":"2026-05-28T06:10:33.073Z","avatar_url":"https://github.com/simonsolnes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# webcache\nCache webpages when you are testing your web scrapers.\n\n## Putting it in your project\nAdd this to your project with:\n\n`$ git submodule add https://github.com/simonsolnes/webcache webcache`\n\nand\n\n```python3\nfrom webcache import WebCache\n```\n\n## Quick Intro\n\n\nTo download a webpage:\n```python\nwith WebCache() as c:\n\twebsite = c.get('https://www.python.org')\n```\n\n## Methods\n\n`get(url) -\u003e` webpage-data `(str)`  \nGets the webpage data from the web or local cache.\n\n`insert(*urls (string))`  \nPuts one or several urls in the directory, but the cache doesn't download it. 
Meant for concurrent downloads.\n\n`fetch()`  \nWill download all webpages that are not local.\n\n`update_url(*urls (string))`  \nWill update the URLs that are passed.\n\n`update_all()`  \nWill redownload all webpages that the cache knows about.\n\n`update_old(age (int, seconds))`  \nWill update the URLs that are older than the specified age.\n\n`reset()`  \nWill delete all local data.\n\n\n## Downloading concurrently; `insert` and `fetch`\n\n```python\nurls = [\n\t'https://www.python.org',\n\t'https://duckduckgo.com',\n\t'https://www.wikipedia.org',\n]\nwith WebCache() as c:\n\tc.insert(*urls)\n\tc.fetch()\n\twebsite = c.get('https://www.python.org')\n```\n\n## Updating webpages\n\nUpdate a URL:\n```python\nwith WebCache() as c:\n\t# one\n\tc.update_url('https://www.python.org')\n\t# or several\n\tc.update_url(*urls)\n```\n\nUpdate old URLs:\n```python\nwith WebCache() as c:\n\tc.update_old(60 * 60)\n```\n\nUpdate all URLs:\n```python\nwith WebCache() as c:\n\tc.update_all()\n```\n\n## Avoiding a DoS\nTo avoid overloading a server, you can set the amount of time you are willing to wait while the cache downloads several webpages. 
The longer the wait, the longer the pause between requests.\n\n```python\nwith WebCache(60) as c:\n\t...\n```\n\n## Without context\nIt is possible to do:\n```python\nc = WebCache()\nwebsite = c.get('https://www.python.org')\n```\nBut this is not recommended when several webpages are needed, since the cache has to load its directory each time you create an instance.\n\nThe class is a singleton, so there is no need to worry about whether something else is using the cache at the moment.\n\n## Reset\nReset the whole cache:\n```python\nWebCache().reset()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonsolnes%2Fwebcache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonsolnes%2Fwebcache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonsolnes%2Fwebcache/lists"}