Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scrapinghub/scrapyrt
HTTP API for Scrapy spiders
- Host: GitHub
- URL: https://github.com/scrapinghub/scrapyrt
- Owner: scrapinghub
- License: bsd-3-clause
- Created: 2015-01-06T15:07:16.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2024-06-28T14:25:11.000Z (6 months ago)
- Last Synced: 2024-12-01T19:00:22.151Z (12 days ago)
- Topics: crawler, crawling, hacktoberfest, hacktoberfest2021, python, scraper, scrapy, twisted, webcrawler, webcrawling
- Language: Python
- Homepage:
- Size: 233 KB
- Stars: 837
- Watchers: 45
- Forks: 162
- Open Issues: 30
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-scrapy - scrapyrt
README
.. image:: https://raw.githubusercontent.com/scrapinghub/scrapyrt/master/artwork/logo.gif
   :width: 400px
   :align: center

==========================
ScrapyRT (Scrapy realtime)
==========================

.. image:: https://github.com/scrapinghub/scrapyrt/workflows/CI/badge.svg
   :target: https://github.com/scrapinghub/scrapyrt/actions

.. image:: https://img.shields.io/pypi/pyversions/scrapyrt.svg
   :target: https://pypi.python.org/pypi/scrapyrt

.. image:: https://img.shields.io/pypi/v/scrapyrt.svg
   :target: https://pypi.python.org/pypi/scrapyrt

.. image:: https://img.shields.io/pypi/l/scrapyrt.svg
   :target: https://pypi.python.org/pypi/scrapyrt

.. image:: https://img.shields.io/pypi/dm/scrapyrt.svg
   :target: https://pypistats.org/packages/scrapyrt
   :alt: Downloads count

.. image:: https://readthedocs.org/projects/scrapyrt/badge/?version=latest
   :target: https://scrapyrt.readthedocs.io/en/latest/api.html

Add an HTTP API for your `Scrapy <https://scrapy.org>`_ project in minutes.

You send a request to ScrapyRT with a spider name and a URL, and in response you get the items collected by the
spider visiting that URL.

* All Scrapy project components (e.g. middleware, pipelines, extensions) are supported.
* You run ScrapyRT inside a Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get the spider output as JSON.

Quickstart
==========

**1. install**

.. code-block:: shell

    > pip install scrapyrt

**2. switch to your Scrapy project (e.g. the quotesbot project)**

.. code-block:: shell

    > cd my/project_path/is/quotesbot

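The quickstart assumes the ``quotesbot`` example project, which ships a spider named ``toscrape-css``. If you want to follow along without it, a minimal stand-in spider could look roughly like this (a sketch only; the file path and selectors assume the quotes.toscrape.com demo site and are not part of ScrapyRT):

.. code-block:: python

    # spiders/toscrape_css.py -- hypothetical stand-in for quotesbot's
    # "toscrape-css" spider, so the curl examples below have a spider to run.
    import scrapy


    class ToScrapeCSSSpider(scrapy.Spider):
        name = "toscrape-css"
        start_urls = ["http://quotes.toscrape.com/"]

        def parse(self, response):
            # Yield one item per quote block on the page.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
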
**3. launch ScrapyRT**

.. code-block:: shell

    > scrapyrt

**4. run your spiders**

.. code-block:: shell

    > curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

**5. run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider**

.. code-block:: shell

    > curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v

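For reference, the same POST request expressed in Python (again using the third-party ``requests`` package; the payload mirrors the curl command above):

.. code-block:: python

    # Rough Python equivalent of the curl --data example above.
    import requests

    payload = {
        "request": {
            "url": "http://quotes.toscrape.com/page/2/",
            "callback": "some_callback",
        },
        "spider_name": "toscrape-css",
        "crawl_args": {"zipcode": "14000"},
    }

    response = requests.post("http://localhost:9080/crawl.json", json=payload)
    response.raise_for_status()
    print(response.json())
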
ScrapyRT will look for a ``scrapy.cfg`` file to determine your project settings,
and will raise an error if it cannot find one. Note that you need to have all
your project requirements installed.

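The ``scrapy.cfg`` generated by ``scrapy startproject`` is enough. A minimal one looks roughly like this (the module name is a placeholder for your own project):

.. code-block:: ini

    # scrapy.cfg at the root of the Scrapy project directory
    [settings]
    default = quotesbot.settings
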
Note
====

* The project is not a replacement for `Scrapyd <https://github.com/scrapinghub/scrapyd>`_, Scrapy Cloud, or other infrastructure for running long crawls.
* It is not suitable for long-running spiders; it is a good fit for spiders that fetch one response from a website and return items quickly.

Documentation
=============

`Documentation is available on readthedocs <https://scrapyrt.readthedocs.io/en/latest/>`_.

Support
=======

Open source support is provided here on GitHub. Please `create a question
issue`_ (i.e. an issue with the "question" label). Commercial support is also available from `Zyte`_.

.. _create a question issue: https://github.com/scrapinghub/scrapyrt/issues/new?labels=question
.. _Zyte: http://zyte.com

License
=======

ScrapyRT is offered under the `BSD 3-Clause license <https://github.com/scrapinghub/scrapyrt/blob/master/LICENSE>`_.

Development
===========

Development takes place on `GitHub <https://github.com/scrapinghub/scrapyrt>`_.