https://github.com/rmax/scrapy-inline-requests
A decorator to write coroutine-like spider callbacks.
- Host: GitHub
- URL: https://github.com/rmax/scrapy-inline-requests
- Owner: rmax
- License: mit
- Created: 2012-02-03T20:36:00.000Z (over 14 years ago)
- Default Branch: master
- Last Pushed: 2022-12-26T20:31:12.000Z (over 3 years ago)
- Last Synced: 2025-02-27T10:39:29.919Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 68.4 KB
- Stars: 110
- Watchers: 5
- Forks: 27
- Open Issues: 10
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
README
======================
Scrapy Inline Requests
======================
.. image:: https://img.shields.io/pypi/v/scrapy-inline-requests.svg
   :target: https://pypi.python.org/pypi/scrapy-inline-requests

.. image:: https://img.shields.io/pypi/pyversions/scrapy-inline-requests.svg
   :target: https://pypi.python.org/pypi/scrapy-inline-requests

.. image:: https://readthedocs.org/projects/scrapy-inline-requests/badge/?version=latest
   :target: https://readthedocs.org/projects/scrapy-inline-requests/?badge=latest
   :alt: Documentation Status

.. image:: https://img.shields.io/travis/rolando/scrapy-inline-requests.svg
   :target: https://travis-ci.org/rolando/scrapy-inline-requests

.. image:: https://codecov.io/github/rolando/scrapy-inline-requests/coverage.svg?branch=master
   :alt: Coverage Status
   :target: https://codecov.io/github/rolando/scrapy-inline-requests

.. image:: https://landscape.io/github/rolando/scrapy-inline-requests/master/landscape.svg?style=flat
   :target: https://landscape.io/github/rolando/scrapy-inline-requests/master
   :alt: Code Quality Status

.. image:: https://requires.io/github/rolando/scrapy-inline-requests/requirements.svg?branch=master
   :alt: Requirements Status
   :target: https://requires.io/github/rolando/scrapy-inline-requests/requirements/?branch=master

A decorator for writing coroutine-like spider callbacks.

* Free software: MIT license
* Documentation: https://scrapy-inline-requests.readthedocs.org.
* Python versions: 2.7, 3.4+

Quickstart
----------
The spider below shows a simple use case of scraping a page and following a few links:
.. code:: python

    from inline_requests import inline_requests
    from scrapy import Spider, Request


    class MySpider(Spider):
        name = 'myspider'
        start_urls = ['http://httpbin.org/html']

        @inline_requests
        def parse(self, response):
            urls = [response.url]
            for i in range(10):
                next_url = response.urljoin('?page=%d' % i)
                try:
                    # Execution pauses here and resumes with this request's response.
                    next_resp = yield Request(next_url)
                    urls.append(next_resp.url)
                except Exception:
                    self.logger.info("Failed request %s", i, exc_info=True)

            # Emit a single item once all requests have been attempted.
            yield {'urls': urls}

See the ``examples/`` directory for a more complex spider.
.. warning::

   The generator resumes its execution when a request's response is processed.
   This means the generator won't be resumed after yielding an item or a
   request with its own callback.

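
The ``next_resp = yield Request(...)`` idiom relies on Python's generator protocol: a driver can hand a value back to a paused generator with ``send()`` or raise an error inside it with ``throw()``. The block below is a minimal, standard-library-only sketch of that protocol. It is not the decorator's actual implementation, and ``FakeRequest``, ``FakeResponse``, ``drive`` and ``fetch`` are hypothetical names invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for scrapy.Request and a response object.
FakeRequest = namedtuple("FakeRequest", "url")
FakeResponse = namedtuple("FakeResponse", "url status")


def callback(first_response):
    """Coroutine-like callback: each yielded FakeRequest evaluates to a response."""
    urls = [first_response.url]
    for i in range(3):
        try:
            resp = yield FakeRequest("http://example.com/?page=%d" % i)
            urls.append(resp.url)
        except IOError:
            # A failed download is raised at the yield expression.
            pass
    yield {"urls": urls}


def drive(gen, fetch):
    """Run ``gen`` to completion, answering every FakeRequest via ``fetch``."""
    items = []
    try:
        value = next(gen)
        while True:
            if isinstance(value, FakeRequest):
                try:
                    result = fetch(value)
                except IOError as exc:
                    # Resume the generator by raising the error inside it.
                    value = gen.throw(exc)
                else:
                    # Resume the generator with the response as the yield's value.
                    value = gen.send(result)
            else:
                # Anything else is treated as an item; no value is sent back.
                items.append(value)
                value = next(gen)
    except StopIteration:
        pass
    return items


def fetch(request):
    """Pretend downloader that fails for page 1 only."""
    if request.url.endswith("page=1"):
        raise IOError("simulated download failure")
    return FakeResponse(request.url, 200)


items = drive(callback(FakeResponse("http://example.com/", 200)), fetch)
```

In this sketch only a yielded ``FakeRequest`` receives a value back; the final item is collected and the generator is simply advanced.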
Known Issues
------------
* Middlewares can drop or ignore non-200 status responses, which prevents the
  callback from continuing its execution. This can be overcome by using the
  flag ``handle_httpstatus_all``. See the `httperror middleware`_ documentation.
* High concurrency and large responses can cause higher memory usage.
* This decorator assumes your method has the following signature:
  ``(self, response)``.
* Persistent backends may not be able to serialize wrapped requests.
* Unless you know what you are doing, the decorated method must be a spider
  method and return a **generator** instance.

.. _`httperror middleware`: http://doc.scrapy.org/en/latest/topics/spider-middleware.html#scrapy.spidermiddlewares.httperror.HttpErrorMiddleware
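
As a rough illustration of the first known issue, the ``handle_httpstatus_all`` flag is set through a request's ``meta`` dictionary. The helper below is a hypothetical convenience function (``inline_request_kwargs`` is not part of this library or of Scrapy); it only builds the keyword arguments and assumes they are then passed to ``scrapy.Request``.

```python
def inline_request_kwargs(url, extra_meta=None):
    """Build keyword arguments for scrapy.Request (hypothetical helper)."""
    meta = dict(extra_meta or {})
    # Ask the httperror middleware to let every status code reach the callback.
    meta["handle_httpstatus_all"] = True
    return {"url": url, "meta": meta}


kwargs = inline_request_kwargs("http://httpbin.org/status/404")

# Inside a decorated callback this could be used as (assumed usage):
#     next_resp = yield Request(**kwargs)
#     if next_resp.status != 200:
#         self.logger.info("Got status %s", next_resp.status)
```

With the flag set, the callback itself has to check ``next_resp.status``, since error responses are no longer filtered out before reaching it.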