{"id":13792470,"url":"https://github.com/scrapy-plugins/scrapy-pagestorage","last_synced_at":"2025-05-02T22:32:05.819Z","repository":{"id":4968451,"uuid":"49573214","full_name":"scrapy-plugins/scrapy-pagestorage","owner":"scrapy-plugins","description":"A scrapy extension to store requests and responses information in storage service","archived":false,"fork":false,"pushed_at":"2022-03-11T11:46:32.000Z","size":42,"stargazers_count":26,"open_issues_count":3,"forks_count":6,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-07T07:52:32.510Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scrapy-plugins.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-01-13T12:46:04.000Z","updated_at":"2024-02-11T13:09:27.000Z","dependencies_parsed_at":"2022-08-06T18:00:55.358Z","dependency_job_id":null,"html_url":"https://github.com/scrapy-plugins/scrapy-pagestorage","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-pagestorage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-pagestorage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-pagestorage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-pagestorage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scrapy-plugins","download_url":"h
ttps://codeload.github.com/scrapy-plugins/scrapy-pagestorage/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252116441,"owners_count":21697379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T22:01:12.548Z","updated_at":"2025-05-02T22:32:04.298Z","avatar_url":"https://github.com/scrapy-plugins.png","language":"Python","funding_links":[],"categories":["Apps"],"sub_categories":["Other Useful Extensions"],"readme":"==================\nscrapy-pagestorage\n==================\n\n.. image:: https://img.shields.io/pypi/v/scrapy-pagestorage.svg\n   :target: https://pypi.python.org/pypi/scrapy-pagestorage\n   :alt: PyPI Version\n\n.. image:: https://img.shields.io/pypi/pyversions/scrapy-pagestorage.svg\n   :target: https://pypi.python.org/pypi/scrapy-pagestorage\n   :alt: Python Versions\n\n.. image:: https://github.com/scrapy-plugins/scrapy-pagestorage/actions/workflows/tests.yml/badge.svg\n   :target: https://github.com/scrapy-plugins/scrapy-pagestorage/actions/workflows/tests.yml\n   :alt: Build Status\n\n.. 
image:: https://img.shields.io/codecov/c/github/scrapy-plugins/scrapy-pagestorage/master.svg\n   :target: https://codecov.io/github/scrapy-plugins/scrapy-pagestorage\n   :alt: Coverage report\n\nA Scrapy extension that stores request and response information in a storage service.\n\nInstallation\n============\n\nYou can install scrapy-pagestorage using pip::\n\n    pip install scrapy-pagestorage\n\nYou can then enable the middleware in your `settings.py`::\n\n    SPIDER_MIDDLEWARES = {\n        ...\n        'scrapy_pagestorage.PageStorageMiddleware': 900\n    }\n\nHow to use it\n=============\n\nEnable the extension through `settings.py`::\n\n    PAGE_STORAGE_ENABLED = True\n    PAGE_STORAGE_ON_ERROR_ENABLED = True\n\nConfigure the extension through `settings.py`::\n\n    PAGE_STORAGE_MODE = \"VERSIONED_CACHE\"\n    PAGE_STORAGE_LIMIT = 100\n    PAGE_STORAGE_ON_ERROR_LIMIT = 100\n    PAGE_STORAGE_TRIM_HTML = True\n\nThe extension is auto-enabled for Portia spiders (``SHUB_SPIDER_TYPE=portia``).\n\nSettings\n========\n\nPAGE_STORAGE_MODE\n-----------------\nDefault: ``None``\n\nA string that specifies whether the extension stores information in the cache store or in the\nversioned cache store (set `PAGE_STORAGE_MODE=\"VERSIONED_CACHE\"` to use the versioned one).\n\nPAGE_STORAGE_LIMIT\n------------------\nAn integer that limits the number of visited pages to store.\n\nPAGE_STORAGE_ON_ERROR_LIMIT\n---------------------------\nAn integer that limits the number of error pages to store.\n\nPAGE_STORAGE_TRIM_HTML\n----------------------\nDefault: ``False``\n\nRemove whitespace from the start and end of the HTML to reduce file size.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-pagestorage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapy-plugins%2Fscrapy-pagestorage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-pagestorage/lists"}