https://github.com/scrapinghub/scrapy-poet
Page Object pattern for Scrapy
- Host: GitHub
- URL: https://github.com/scrapinghub/scrapy-poet
- Owner: scrapinghub
- License: bsd-3-clause
- Created: 2019-08-28T18:10:32.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2025-02-12T10:59:02.000Z (10 months ago)
- Last Synced: 2025-05-07T04:37:37.574Z (7 months ago)
- Language: Python
- Size: 1.12 MB
- Stars: 121
- Watchers: 10
- Forks: 28
- Open Issues: 16
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- Contributing: docs/contributing.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-scrapy - scrapy-poet
README
===========
scrapy-poet
===========
.. image:: https://img.shields.io/pypi/v/scrapy-poet.svg
:target: https://pypi.python.org/pypi/scrapy-poet
:alt: PyPI Version
.. image:: https://img.shields.io/pypi/pyversions/scrapy-poet.svg
:target: https://pypi.python.org/pypi/scrapy-poet
:alt: Supported Python Versions
.. image:: https://github.com/scrapinghub/scrapy-poet/workflows/tox/badge.svg
:target: https://github.com/scrapinghub/scrapy-poet/actions
:alt: Build Status
.. image:: https://codecov.io/github/scrapinghub/scrapy-poet/coverage.svg?branch=master
:target: https://codecov.io/gh/scrapinghub/scrapy-poet
:alt: Coverage report
.. image:: https://readthedocs.org/projects/scrapy-poet/badge/?version=stable
:target: https://scrapy-poet.readthedocs.io/en/stable/?badge=stable
:alt: Documentation Status
``scrapy-poet`` is the `web-poet`_ Page Object pattern implementation for Scrapy.
``scrapy-poet`` lets you write spiders in which the extraction logic is separated from the crawling logic.
With ``scrapy-poet`` it is possible to write a single spider that supports many sites with
different layouts.
Requires **Python 3.9+** and **Scrapy >= 2.6.0**.
Read the `documentation <https://scrapy-poet.readthedocs.io>`_ for more information.
License is BSD 3-clause.
* Documentation: https://scrapy-poet.readthedocs.io
* Source code: https://github.com/scrapinghub/scrapy-poet
* Issue tracker: https://github.com/scrapinghub/scrapy-poet/issues
.. _`web-poet`: https://github.com/scrapinghub/web-poet
Quick Start
***********
Installation
============
.. code-block:: bash

    pip install scrapy-poet
Usage in a Scrapy Project
=========================
Add the following inside Scrapy's ``settings.py`` file:
- Scrapy ≥ 2.10:

  .. code-block:: python

      ADDONS = {
          "scrapy_poet.Addon": 300,
      }
- Scrapy < 2.10:

  .. code-block:: python

      DOWNLOADER_MIDDLEWARES = {
          "scrapy_poet.InjectionMiddleware": 543,
          "scrapy.downloadermiddlewares.stats.DownloaderStats": None,
          "scrapy_poet.DownloaderStatsMiddleware": 850,
      }

      REQUEST_FINGERPRINTER_CLASS = "scrapy_poet.ScrapyPoetRequestFingerprinter"

      SPIDER_MIDDLEWARES = {
          "scrapy_poet.RetryMiddleware": 275,
      }
Developing
==========
Set up your local Python environment via:

1. ``pip install -r requirements-dev.txt``
2. ``pre-commit install``

Now every time you perform a ``git commit``, these tools will run against the
staged files:

* ``black``
* ``isort``
* ``flake8``

You can also directly invoke ``pre-commit run --all-files`` or ``tox -e linters``
to run them without performing a commit.