https://github.com/scrapinghub/scrapy-poet
Page Object pattern for Scrapy
- Host: GitHub
- URL: https://github.com/scrapinghub/scrapy-poet
- Owner: scrapinghub
- License: bsd-3-clause
- Created: 2019-08-28T18:10:32.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-02-12T10:59:02.000Z (2 months ago)
- Last Synced: 2025-04-01T13:01:44.663Z (19 days ago)
- Language: Python
- Size: 1.12 MB
- Stars: 120
- Watchers: 10
- Forks: 28
- Open Issues: 16
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- Contributing: docs/contributing.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-scrapy - scrapy-poet
README

===========
scrapy-poet
===========

.. image:: https://img.shields.io/pypi/v/scrapy-poet.svg
    :target: https://pypi.python.org/pypi/scrapy-poet
    :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/scrapy-poet.svg
    :target: https://pypi.python.org/pypi/scrapy-poet
    :alt: Supported Python Versions

.. image:: https://github.com/scrapinghub/scrapy-poet/workflows/tox/badge.svg
    :target: https://github.com/scrapinghub/scrapy-poet/actions
    :alt: Build Status

.. image:: https://codecov.io/github/scrapinghub/scrapy-poet/coverage.svg?branch=master
    :target: https://codecov.io/gh/scrapinghub/scrapy-poet
    :alt: Coverage report

.. image:: https://readthedocs.org/projects/scrapy-poet/badge/?version=stable
    :target: https://scrapy-poet.readthedocs.io/en/stable/?badge=stable
    :alt: Documentation Status

``scrapy-poet`` is the `web-poet`_ Page Object pattern implementation for Scrapy.
``scrapy-poet`` lets you write spiders in which the extraction logic is separated from the crawling logic.
With ``scrapy-poet`` it is possible to build a single spider that supports many sites with
different layouts.

Requires **Python 3.9+** and **Scrapy >= 2.6.0**.
Read the `documentation <https://scrapy-poet.readthedocs.io>`_ for more information.
License is BSD 3-clause.
* Documentation: https://scrapy-poet.readthedocs.io
* Source code: https://github.com/scrapinghub/scrapy-poet
* Issue tracker: https://github.com/scrapinghub/scrapy-poet/issues

.. _`web-poet`: https://github.com/scrapinghub/web-poet
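
The separation of concerns becomes concrete with a page object: extraction code lives in a class built on `web-poet`_, not in spider callbacks. Below is a minimal sketch assuming the ``web_poet.WebPage`` base class; the ``BookPage`` name, the CSS selector, and the item fields are illustrative only and not part of this project.

.. code-block:: python

    from web_poet import WebPage


    class BookPage(WebPage):
        """Extraction logic only; no crawling code lives here."""

        def to_item(self) -> dict:
            # Illustrative fields; real page objects extract whatever
            # the target item needs.
            return {
                "url": str(self.url),
                "name": self.css("title::text").get(),
            }

Because spiders never touch selectors directly, supporting another site's layout means supplying a different page object rather than changing the crawling code.
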
Quick Start
***********

Installation
============

.. code-block::

    pip install scrapy-poet
Usage in a Scrapy Project
=========================

Add the following inside Scrapy's ``settings.py`` file:
- Scrapy ≥ 2.10:

  .. code-block:: python

      ADDONS = {
          "scrapy_poet.Addon": 300,
      }

- Scrapy < 2.10:
  .. code-block:: python

      DOWNLOADER_MIDDLEWARES = {
          "scrapy_poet.InjectionMiddleware": 543,
          "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
          "scrapy_poet.DownloaderStatsMiddleware": 850,
      }
      REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"
      SPIDER_MIDDLEWARES = {
          "scrapy_poet.RetryMiddleware": 275,
      }
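
With the settings in place, a callback can request a page object simply by annotating an argument, and scrapy-poet takes care of building and injecting it. A rough sketch reusing the illustrative ``BookPage`` from the earlier example (the spider name, start URL, and module path are placeholders):

.. code-block:: python

    import scrapy

    # Hypothetical module path for the illustrative page object sketched above.
    from myproject.pages import BookPage


    class BooksSpider(scrapy.Spider):
        name = "books"
        start_urls = ["http://books.toscrape.com/"]  # placeholder URL

        def parse(self, response, page: BookPage):
            # The BookPage annotation tells scrapy-poet to build the page
            # object for this response and pass it in; the callback only
            # turns it into an item.
            yield page.to_item()
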
Developing
==========

Set up your local Python environment via:
1. `pip install -r requirements-dev.txt`
2. `pre-commit install`

Now every time you perform a `git commit`, these tools will run against the
staged files:

* `black`
* `isort`
* `flake8`

You can also directly invoke `pre-commit run --all-files` or `tox -e linters`
to run them without performing a commit.