https://github.com/ljanyst/scrapy-rss-exporter
An RSS exporter for Scrapy
- Host: GitHub
- URL: https://github.com/ljanyst/scrapy-rss-exporter
- Owner: ljanyst
- License: bsd-3-clause
- Created: 2017-11-22T10:47:18.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-01-11T19:31:29.000Z (about 5 years ago)
- Last Synced: 2024-07-25T04:03:07.733Z (6 months ago)
- Language: Python
- Size: 5.86 KB
- Stars: 3
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-scrapy - scrapy-rss-exporter
README
===================
scrapy-rss-exporter
===================

.. image:: https://img.shields.io/pypi/v/scrapy-rss-exporter.svg
   :target: https://pypi.python.org/pypi/scrapy-rss-exporter
   :alt: PyPI Version

Generate an RSS feed using the Scrapy framework.

Table of Contents
=================

* `Installation <#installation>`__
* `Usage <#usage>`__

  * `Feed Items <#feed-items>`__
  * `Global Exporter <#global-exporter>`__
  * `Per Spider Exporter <#per-spider-exporter>`__

Installation
============

* Install :code:`scrapy-rss-exporter` using :code:`pip`:

  .. code:: bash

     pip install scrapy-rss-exporter

* or using :code:`setuptools`:

  .. code:: bash

     python setup.py install

Usage
=====

Feed Items
----------

The most convenient way to use the exporter is to return objects of the
:code:`RssItem` class from your spiders. This class derives from
:code:`scrapy.Item`, so it works with other exporters as well.

You will need to set the following keys:

.. code:: python

    from scrapy_rss_exporter.items import RssItem, Enclosure

    img = 'Image url'  # placeholder, like the other values below

    rss_item = RssItem()
    rss_item['title'] = 'Item title'
    rss_item['link'] = 'Item url'
    rss_item['guid'] = 'Item ID'
    rss_item['description'] = 'Item Description'
    rss_item['pub_date'] = None  # None: the current date is inserted
    rss_item['enclosure'] = [Enclosure(url=img, type='image/jpeg')]

The :code:`pub_date` field should contain a date in the
RFC 822
format. If you use :code:`None`, the system will insert the current date
in the appropriate format. The :code:`enclosure` field is optional and should
contain a (possibly empty) list of :code:`Enclosure` objects.

Global Exporter
---------------

To set the exporter up globally, you need to declare it in the
:code:`FEED_EXPORTERS` dictionary in the :code:`settings.py` file:

.. code:: python

    FEED_EXPORTERS = {
        'rss': 'scrapy_rss_exporter.exporters.RssItemExporter'
    }

You can then use it as a :code:`FEED_FORMAT` and specify the output file in the
:code:`FEED_URI`:

.. code:: python

    FEED_FORMAT = 'rss'
    FEED_URI = 's3://my-feeds/my-feed.rss'

**Note:** Bear in mind that, if you use a local file as output, :code:`scrapy`
will append to an existing file, resulting in invalid RSS. You should,
therefore, make sure to delete any existing output file before running the
spider. The :code:`s3` storage does not have this problem because
:code:`scrapy` uploads use the :code:`S3 PutObject` method, which overwrites
the existing object.

:code:`scrapy` does not seem to allow passing configuration options to an
exporter. Therefore, if you want to customize the feed title and other metadata,
you need to create a subclass and update the :code:`FEED_EXPORTERS` dictionary
with the new class name:

.. code:: python

    from scrapy_rss_exporter.exporters import RssItemExporter


    class MyRssExporter(RssItemExporter):
        def __init__(self, *args, **kwargs):
            # Feed-level metadata handed to the base exporter
            kwargs['title'] = 'My RSS'
            kwargs['link'] = 'https://www.mywebsite.com'
            kwargs['description'] = 'My RSS Items'
            super(MyRssExporter, self).__init__(*args, **kwargs)

Per Spider Exporter
-------------------

You can, of course, specify a different exporter with different settings for
each spider. Just use the :code:`custom_settings` field to override the global
configuration fields:

.. code:: python

    import scrapy


    class MySpider(scrapy.Spider):
        name = "my"
        start_urls = ['https://www.mywebsite.com']
        custom_settings = {
            # Dotted path to the exporter class used by this spider only
            'FEED_EXPORTERS': {'rss': 'project.spiders.my_spider.MyExporter'},
            'FEED_FORMAT': 'rss',
            'FEED_URI': 's3://my-feeds/my-feed.rss',
        }

        def parse(self, response):
            pass
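
Two details mentioned earlier can be handled with the standard library alone: the RFC 822 style date expected in :code:`pub_date`, and the need to delete a stale local output file before a run. The following is a minimal sketch and not part of :code:`scrapy-rss-exporter`. The helper names :code:`rfc822_date` and :code:`remove_stale_feed` are hypothetical, and the sketch assumes that :code:`pub_date` accepts a preformatted string.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path
from typing import Optional


def rfc822_date(moment: Optional[datetime] = None) -> str:
    """Format a datetime as an RFC 822 style date string.

    Hypothetical helper: falls back to the current UTC time when no
    datetime is given.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return format_datetime(moment)


def remove_stale_feed(path: str) -> bool:
    """Delete a previous local feed file, if any.

    Hypothetical helper: returns True when a file was removed and
    False when there was nothing to delete.
    """
    feed = Path(path)
    if feed.is_file():
        feed.unlink()
        return True
    return False


if __name__ == "__main__":
    # A fixed, timezone-aware datetime keeps the output reproducible.
    print(rfc822_date(datetime(2017, 11, 22, 10, 47, 18, tzinfo=timezone.utc)))
    # prints: Wed, 22 Nov 2017 10:47:18 +0000
```

Calling :code:`remove_stale_feed('my-feed.rss')` before starting the crawl would avoid appending to an old file; the file name here is only an example.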