{"id":13414811,"url":"https://github.com/scrapy-plugins/scrapy-deltafetch","last_synced_at":"2025-04-04T23:07:38.240Z","repository":{"id":44383763,"uuid":"61316147","full_name":"scrapy-plugins/scrapy-deltafetch","owner":"scrapy-plugins","description":"Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls","archived":false,"fork":false,"pushed_at":"2021-10-17T17:36:33.000Z","size":54,"stargazers_count":264,"open_issues_count":18,"forks_count":48,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-09-19T12:08:37.225Z","etag":null,"topics":["hacktoberfest","hacktoberfest2021"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scrapy-plugins.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-16T18:21:12.000Z","updated_at":"2024-06-07T13:02:39.000Z","dependencies_parsed_at":"2022-07-14T14:00:31.740Z","dependency_job_id":null,"html_url":"https://github.com/scrapy-plugins/scrapy-deltafetch","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-deltafetch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-deltafetch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-deltafetch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapy-plugins%2Fscrapy-deltafetch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/scrapy-plugins","download_url":"https://codeload.github.com/scrapy-plugins/scrapy-deltafetch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247261603,"owners_count":20910108,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest","hacktoberfest2021"],"created_at":"2024-07-30T21:00:37.089Z","updated_at":"2025-04-04T23:07:38.221Z","avatar_url":"https://github.com/scrapy-plugins.png","language":"Python","funding_links":[],"categories":["Apps","Scrapy Middleware"],"sub_categories":["Other Useful Extensions"],"readme":"=================\nscrapy-deltafetch\n=================\n  \n.. image:: https://github.com/scrapy-plugins/scrapy-deltafetch/workflows/CI/badge.svg\n   :target: https://github.com/scrapy-plugins/scrapy-deltafetch/actions\n\n.. image:: https://img.shields.io/pypi/pyversions/scrapy-deltafetch.svg\n    :target: https://pypi.python.org/pypi/scrapy-deltafetch\n\n.. image:: https://img.shields.io/pypi/v/scrapy-deltafetch.svg\n    :target: https://pypi.python.org/pypi/scrapy-deltafetch\n\n.. image:: https://img.shields.io/pypi/l/scrapy-deltafetch.svg\n    :target: https://pypi.python.org/pypi/scrapy-deltafetch\n\n.. 
image:: https://img.shields.io/pypi/dm/scrapy-deltafetch.svg\n   :target: https://pypistats.org/packages/scrapy-deltafetch\n   :alt: Downloads count\n\nThis is a Scrapy spider middleware to ignore requests\nto pages seen in previous crawls of the same spider,\nthus producing a \"delta crawl\" containing only new requests.\n\nThis also speeds up the crawl by reducing the number of requests that need\nto be crawled and processed (typically, item requests are the most CPU\nintensive).\n\nDeltaFetch middleware uses Python's dbm_ package to store request fingerprints.\n\n.. _dbm: https://docs.python.org/3/library/dbm.html\n\n\nInstallation\n============\n\nInstall scrapy-deltafetch using ``pip``::\n\n    $ pip install scrapy-deltafetch\n\n\nConfiguration\n=============\n\n1. Add DeltaFetch middleware by including it in ``SPIDER_MIDDLEWARES``\n   in your ``settings.py`` file::\n\n      SPIDER_MIDDLEWARES = {\n          'scrapy_deltafetch.DeltaFetch': 100,\n      }\n\n   Here, priority ``100`` is just an example.\n   Set its value depending on other middlewares you may have enabled already.\n\n2. Enable the middleware using ``DELTAFETCH_ENABLED`` in your ``settings.py``::\n\n      DELTAFETCH_ENABLED = True\n\n\nUsage\n=====\n\nThe following options control the behavior of the DeltaFetch\nmiddleware.\n\nSupported Scrapy settings\n-------------------------\n\n* ``DELTAFETCH_ENABLED`` — to enable (or disable) this extension\n* ``DELTAFETCH_DIR`` — directory in which to store state\n* ``DELTAFETCH_RESET`` — reset the state, clearing out all seen requests\n\nThese usually go in your Scrapy project's ``settings.py``.\n\n\nSupported Scrapy spider arguments\n---------------------------------\n\n* ``deltafetch_reset`` — same effect as the ``DELTAFETCH_RESET`` setting\n\nExample::\n\n    $ scrapy crawl example -a deltafetch_reset=1\n\n\nSupported Scrapy request meta keys\n----------------------------------\n\n* ``deltafetch_key`` — used to define the lookup key for that request. 
By\n  default it's Scrapy's default Request fingerprint function,\n  but it can be changed to contain an item id, for example.\n  This requires support from the spider, but makes the extension\n  more efficient for sites that have many URLs for the same item.\n\n* ``deltafetch_enabled`` - if set to ``False``, it disables DeltaFetch for that\n  specific request\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-deltafetch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapy-plugins%2Fscrapy-deltafetch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapy-plugins%2Fscrapy-deltafetch/lists"}