{"id":13464502,"url":"https://github.com/rmax/scrapy-redis","last_synced_at":"2025-12-18T09:46:55.481Z","repository":{"id":1017730,"uuid":"2286594","full_name":"rmax/scrapy-redis","owner":"rmax","description":"Redis-based components for Scrapy.","archived":false,"fork":false,"pushed_at":"2024-07-06T21:54:35.000Z","size":233,"stargazers_count":5596,"open_issues_count":34,"forks_count":1585,"subscribers_count":271,"default_branch":"master","last_synced_at":"2025-05-11T11:04:20.633Z","etag":null,"topics":["crawler","distributed","redis","scrapy"],"latest_commit_sha":null,"homepage":"http://scrapy-redis.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rmax.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2011-08-29T04:06:23.000Z","updated_at":"2025-05-10T12:01:55.000Z","dependencies_parsed_at":"2024-11-16T04:45:42.344Z","dependency_job_id":null,"html_url":"https://github.com/rmax/scrapy-redis","commit_stats":{"total_commits":220,"total_committers":41,"mean_commits":5.365853658536586,"dds":0.6818181818181819,"last_synced_commit":"c3064c2fa74e623bf14448d82cc07ca2da8e183d"},"previous_names":["rolando/scrapy-redis"],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmax%2Fscrapy-redis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmax%2Fscrapy-redis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmax%2Fscrapy-redis/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmax%2Fscrapy-redis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rmax","download_url":"https://codeload.github.com/rmax/scrapy-redis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253766779,"owners_count":21960990,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","distributed","redis","scrapy"],"created_at":"2024-07-31T14:00:44.973Z","updated_at":"2025-12-18T09:46:55.387Z","avatar_url":"https://github.com/rmax.png","language":"Python","readme":"============\nScrapy-Redis\n============\n\n.. image:: https://readthedocs.org/projects/scrapy-redis/badge/?version=latest\n        :alt: Documentation Status\n        :target: https://readthedocs.org/projects/scrapy-redis/?badge=latest\n\n.. image:: https://img.shields.io/pypi/v/scrapy-redis.svg\n        :target: https://pypi.python.org/pypi/scrapy-redis\n\n.. image:: https://img.shields.io/pypi/pyversions/scrapy-redis.svg\n        :target: https://pypi.python.org/pypi/scrapy-redis\n\n.. image:: https://github.com/rmax/scrapy-redis/actions/workflows/builds.yml/badge.svg\n        :target: https://github.com/rmax/scrapy-redis/actions/workflows/builds.yml\n        \n.. image:: https://github.com/rmax/scrapy-redis/actions/workflows/checks.yml/badge.svg\n        :target: https://github.com/rmax/scrapy-redis/actions/workflows/checks.yml\n        \n.. 
image:: https://github.com/rmax/scrapy-redis/actions/workflows/tests.yml/badge.svg\n        :target: https://github.com/rmax/scrapy-redis/actions/workflows/tests.yml\n        \n.. image:: https://codecov.io/github/rmax/scrapy-redis/coverage.svg?branch=master\n        :alt: Coverage Status\n        :target: https://codecov.io/github/rmax/scrapy-redis\n\n.. image:: https://img.shields.io/badge/security-bandit-green.svg\n        :alt: Security Status\n        :target: https://github.com/rmax/scrapy-redis\n\nRedis-based components for Scrapy.\n\n* Usage: https://github.com/rmax/scrapy-redis/wiki/Usage\n* Documentation: https://github.com/rmax/scrapy-redis/wiki\n* Releases: https://github.com/rmax/scrapy-redis/wiki/History\n* Contributing: https://github.com/rmax/scrapy-redis/wiki/Getting-Started\n* License: MIT\n\nFeatures\n--------\n\n* Distributed crawling/scraping\n\n    You can start multiple spider instances that share a single Redis queue.\n    Best suited for broad multi-domain crawls.\n\n* Distributed post-processing\n\n    Scraped items get pushed into a Redis queue, meaning that you can start as\n    many post-processing processes as needed, all sharing the items queue.\n\n* Scrapy plug-and-play components\n\n    Scheduler + Duplication Filter, Item Pipeline, Base Spiders.\n\n* In this forked version: support for JSON-formatted data in Redis\n\n    The data contains ``url``, ``meta`` and other optional parameters. ``meta`` is a nested JSON object holding sub-data.\n    This feature extracts the data and sends a FormRequest with ``url``, ``meta`` and additional ``formdata``.\n\n    For example:\n\n    .. code-block:: json\n\n        { \"url\": \"https://example.com\", \"meta\": {\"job-id\":\"123xsd\", \"start-date\":\"dd/mm/yy\"}, \"url_cookie_key\":\"fertxsas\" }\n\n    This data can be accessed in a Scrapy spider through the request,\n    e.g. ``request.url``, ``request.meta``, ``request.cookies``.\n\n.. 
note:: These features cover the basic case of distributing the workload across multiple workers. If you need more features, such as URL expiration or advanced URL prioritization, we suggest taking a look at the Frontera_ project.\n\nRequirements\n------------\n\n* Python 3.7+\n* Redis \u003e= 5.0\n* ``Scrapy`` \u003e= 2.0\n* ``redis-py`` \u003e= 4.0\n\nInstallation\n------------\n\nFrom pip\n\n.. code-block:: bash\n\n    pip install scrapy-redis\n\nFrom GitHub\n\n.. code-block:: bash\n\n    git clone https://github.com/rmax/scrapy-redis.git\n    cd scrapy-redis\n    python setup.py install\n\n.. note:: To use the JSON-formatted data feature, make sure you have not installed scrapy-redis through pip. If you have, uninstall it first:\n\n.. code-block:: bash\n\n    pip uninstall scrapy-redis\n\nAlternative Choice\n------------------\n\nFrontera_ is a web crawling framework consisting of a `crawl frontier`_ and distribution/scaling primitives, allowing you to build a large-scale online web crawler.\n\n.. _Frontera: https://github.com/scrapinghub/frontera\n.. _crawl frontier: http://nlp.stanford.edu/IR-book/html/htmledition/the-url-frontier-1.html\n","funding_links":[],"categories":["All","Python","Apps"],"sub_categories":["Distributed Spider"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frmax%2Fscrapy-redis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frmax%2Fscrapy-redis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frmax%2Fscrapy-redis/lists"}