{"id":15009026,"url":"https://github.com/woxcab/scrapy_rss","last_synced_at":"2026-02-02T17:24:23.039Z","repository":{"id":50490247,"uuid":"80703777","full_name":"woxcab/scrapy_rss","owner":"woxcab","description":"Tools to easy generate RSS feed that contains each scraped item using Scrapy framework.","archived":false,"fork":false,"pushed_at":"2024-11-24T13:35:51.000Z","size":438,"stargazers_count":33,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-27T00:14:02.470Z","etag":null,"topics":["python","python-2","python-3","python2","python3","rss","rss-feed","scrapy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/woxcab.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-02T07:43:53.000Z","updated_at":"2025-01-20T13:41:26.000Z","dependencies_parsed_at":"2024-02-16T05:23:23.469Z","dependency_job_id":"9c65138d-9c40-4ff5-8200-08ea8555979d","html_url":"https://github.com/woxcab/scrapy_rss","commit_stats":{"total_commits":188,"total_committers":2,"mean_commits":94.0,"dds":0.005319148936170248,"last_synced_commit":"93bf446fcaf4b334364f4e9d2175ce555f76dea7"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/woxcab%2Fscrapy_rss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/woxcab%2Fscrapy_rss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/woxca
b%2Fscrapy_rss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/woxcab%2Fscrapy_rss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/woxcab","download_url":"https://codeload.github.com/woxcab/scrapy_rss/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247773726,"owners_count":20993639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","python-2","python-3","python2","python3","rss","rss-feed","scrapy"],"created_at":"2024-09-24T19:22:30.155Z","updated_at":"2026-02-02T17:24:23.034Z","avatar_url":"https://github.com/woxcab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"==========\nscrapy_rss\n==========\n\n.. image:: https://img.shields.io/pypi/v/scrapy-rss.svg?style=flat-square\n   :target: https://pypi.python.org/pypi/scrapy_rss\n   :alt: PyPI Version\n\n.. image:: https://img.shields.io/pypi/wheel/scrapy-rss.svg?style=flat-square\n   :target: https://pypi.python.org/pypi/scrapy_rss\n   :alt: Wheel Status\n\n.. image:: https://github.com/woxcab/scrapy_rss/actions/workflows/tests.yml/badge.svg?branch=master\n   :target: https://github.com/woxcab/scrapy_rss/actions\n   :alt: Testing status\n\n.. image:: https://img.shields.io/codecov/c/github/woxcab/scrapy_rss/master.svg?style=flat-square\n   :target: http://codecov.io/github/woxcab/scrapy_rss?branch=master\n   :alt: Coverage report\n\n.. 
image:: https://img.shields.io/pypi/pyversions/scrapy-rss.svg?style=flat-square\n   :target: https://pypi.python.org/pypi/scrapy_rss\n   :alt: Supported python versions\n\n\nTools to easily generate an `RSS feed \u003chttp://www.rssboard.org/rss-specification\u003e`_\nthat contains each scraped item using the `Scrapy framework \u003chttps://github.com/scrapy/scrapy\u003e`_.\n\n\nTable of Contents\n=================\n* `Installation \u003c#installation\u003e`__\n* `How To Use \u003c#how-to-use\u003e`__\n\n  * `Configuration \u003c#configuration\u003e`__\n  * `Usage \u003c#usage\u003e`__\n  \n    * `Basic usage \u003c#basic-usage\u003e`__\n    * `RssItem derivation and namespaces \u003c#rssitem-derivation-and-namespaces\u003e`__\n  * `Optional Additional Customization \u003c#feed-channel-elements-customization-optionally\u003e`__\n  * `Backward compatibility notices \u003c#backward-compatibility-notices\u003e`__\n\n* `Scrapy Project Examples \u003c#scrapy-project-examples\u003e`__\n\n\n`Installation \u003chttps://packaging.python.org/installing/\u003e`_\n==========================================================\n* Install :code:`scrapy_rss` using pip:\n\n  .. code:: bash\n\n      pip install scrapy_rss\n\n  or using pip for a specific interpreter, e.g.:\n\n  .. code:: bash\n\n      pip3 install scrapy_rss\n\n* or using setuptools directly:\n\n  .. code:: bash\n\n      cd path/to/root/of/scrapy_rss\n      python setup.py install\n\n  or using setuptools for a specific interpreter, e.g.:\n\n  .. code:: bash\n\n      cd path/to/root/of/scrapy_rss\n      python3 setup.py install\n\n\nHow To Use\n==========\n\nConfiguration\n-------------\n\nAdd parameters to the Scrapy project settings (`settings.py` file)\nor to the :code:`custom_settings` attribute of the spider:\n\n1. Add the item pipeline that exports items to the RSS feed:\n\n   .. 
code:: python\n\n     ITEM_PIPELINES = {\n         # ...\n         'scrapy_rss.pipelines.FeedExportPipeline': 900,  # or another priority\n         # ...\n     }\n\n\n2. Add required feed parameters:\n\n   FEED_FILE\n       the absolute or relative file path where the resulting RSS feed will be saved.\n       For example, :code:`feed.rss` or :code:`output/feed.rss`.\n   FEED_TITLE\n       the name of the channel (feed),\n   FEED_DESCRIPTION\n       the phrase or sentence that describes the channel (feed),\n   FEED_LINK\n       the URL to the HTML website corresponding to the channel (feed)\n\n   .. code:: python\n\n     FEED_FILE = 'path/to/feed.rss'\n     FEED_TITLE = 'Some title of the channel'\n     FEED_LINK = 'http://example.com/rss'\n     FEED_DESCRIPTION = 'About channel'\n\n\nUsage\n-----\nBasic usage\n^^^^^^^^^^^\n\nDeclare your item directly as :code:`RssItem()`:\n\n.. code:: python\n\n  import scrapy_rss\n\n  item1 = scrapy_rss.RssItem()\n\nOr use the predefined item class :code:`RssedItem`, which has an RSS field named :code:`rss`\nthat is an instance of :code:`RssItem`:\n\n.. code:: python\n\n  import scrapy\n  import scrapy_rss\n\n  class MyItem(scrapy_rss.RssedItem):\n      field1 = scrapy.Field()\n      field2 = scrapy.Field()\n      # ...\n\n  item2 = MyItem()\n\n\nSet or get item fields. The case-sensitive attributes of :code:`RssItem()` correspond to RSS elements.\nAttributes of RSS elements are case-sensitive too.\nIf your editor supports autocompletion, it suggests attributes for instances of :code:`RssedItem` and :code:`RssItem`.\nYou may set **any** subset of RSS elements (e.g. title only). For example:\n\n.. 
code:: python\n\n  from datetime import datetime\n\n  item1.title = 'RSS item title'  # set value of \u003ctitle\u003e element\n  title = item1.title.value  # get value of \u003ctitle\u003e element\n  item1.description = 'description'\n\n  item1.guid = 'item identifier'\n  item1.guid.isPermaLink = False  # set value of attribute isPermaLink of \u003cguid\u003e element,\n                                  # isPermaLink is True by default\n  is_permalink = item1.guid.isPermaLink  # get value of attribute isPermaLink of \u003cguid\u003e element\n  guid = item1.guid.value  # get value of element \u003cguid\u003e\n\n  item1.category = 'single category'\n  category = item1.category\n  item1.category = ['first category', 'second category']\n  first_category = item1.category[0].value # get value of the element \u003ccategory\u003e with multiple values\n  all_categories = [cat.value for cat in item1.category]\n\n  # direct attributes setting\n  item1.enclosure.url = 'http://example.com/file'\n  item1.enclosure.length = 0\n  item1.enclosure.type = 'text/plain'\n\n  # or dict based attributes setting\n  item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}\n  item1.guid = {'value': 'item identifier', 'isPermaLink': True}\n\n  item1.pubDate = datetime.now()  # works correctly with Python datetimes\n\n\n  item2.rss.title = 'Item title'\n  item2.rss.guid = 'identifier'\n  item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}\n\n\nAll allowed elements are listed in `scrapy_rss/items.py \u003chttps://github.com/woxcab/scrapy_rss/blob/master/scrapy_rss/items.py\u003e`_.\nAll allowed attributes of each element, with their constraints and default values,\nare listed in `scrapy_rss/elements.py \u003chttps://github.com/woxcab/scrapy_rss/blob/master/scrapy_rss/elements.py\u003e`_.\nSee also the `RSS specification \u003chttp://www.rssboard.org/rss-specification\u003e`_ for more details.\n\n:code:`RssItem` 
derivation and namespaces\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can extend RssItem to add new XML fields that can be namespaced or not.\nYou can specify namespaces in an attribute and/or an element constructors.\nNamespace prefix can be specified in the attribute/element name\nusing double underscores as delimiter (:code:`prefix__name`)\nor in the attribute/element constructor using :code:`ns_prefix` argument. \nNamespace URI can be specified using :code:`ns_uri` argument of the constructor.\n\n.. code:: python\n\n    from scrapy_rss.meta import ElementAttribute, Element\n    from scrapy_rss.items import RssItem\n\n    class Element0(Element):\n        # attributes without special namespace\n        attr0 = ElementAttribute(is_content=True, required=True)\n        attr1 = ElementAttribute()\n\n    class Element1(Element):\n        # attribute \"prefix2:attr2\" with namespace xmlns:prefix2=\"id2\"\n        attr2 = ElementAttribute(ns_prefix=\"prefix2\", ns_uri=\"id2\")\n\n        # attribute \"prefix3:attr3\" with namespace xmlns:prefix3=\"id3\"\n        prefix3__attr3 = ElementAttribute(ns_uri=\"id3\")\n\n        # attribute \"prefix4:attr4\" with namespace xmlns:prefix4=\"id4\"\n        fake_prefix__attr4 = ElementAttribute(ns_prefix=\"prefix4\", ns_uri=\"id4\")\n\n        # attribute \"attr5\" with default namespace xmlns=\"id5\"\n        attr5 = ElementAttribute(ns_uri=\"id5\")\n\n    class MyXMLItem(RssItem):\n        # element \u003celem1\u003e without namespace\n        elem1 = Element0()\n\n        # element \u003celem_prefix2:elem2\u003e with namespace xmlns:elem_prefix2=\"id2e\"\n        elem2 = Element0(ns_prefix=\"elem_prefix2\", ns_uri=\"id2e\")\n\n        # element \u003celem_prefix3:elem3\u003e with namespace xmlns:elem_prefix3=\"id3e\"\n        elem_prefix3__elem3 = Element1(ns_uri=\"id3e\")\n\n        # yet another element \u003celem_prefix4:elem3\u003e with namespace xmlns:elem_prefix4=\"id4e\"\n        # (does not conflict with 
previous one)\n        fake_prefix__elem3 = Element0(ns_prefix=\"elem_prefix4\", ns_uri=\"id4e\")\n\n        # element \u003celem5\u003e with default namespace xmlns=\"id5e\"\n        elem5 = Element0(ns_uri=\"id5e\")\n\nElements and their attributes are accessed the same way as in simple items:\n\n.. code:: python\n\n    item = MyXMLItem()\n    item.title = 'Some title'\n    item.elem1.attr0 = 'Required content value'\n    item.elem1 = 'Another way to set content value'\n    item.elem1.attr1 = 'Some attribute value'\n    item.elem_prefix3__elem3.prefix3__attr3 = 'Yet another attribute value'\n    item.elem_prefix3__elem3.fake_prefix__attr4 = '' # non-None value is interpreted as assigned\n    item.fake_prefix__elem3.attr1 = 42\n\n\nSeveral optional settings are allowed for namespaced items:\n\nFEED_NAMESPACES\n  list of tuples :code:`[(prefix, URI), ...]` or dictionary :code:`{prefix: URI, ...}` of namespaces\n  that must be defined in the root XML element\n\nFEED_ITEM_CLASS or FEED_ITEM_CLS\n  main class of feed items (class object :code:`MyXMLItem` or path to class :code:`\"path.to.MyXMLItem\"`).\n  **Default value**: :code:`RssItem`.\n  It's used in order to extract all possible namespaces\n  that will be declared in the root XML element.\n\n  Feed items do **NOT** have to be instances of this class or its subclass.\n\nIf these settings are not defined, or only some of the namespaces are defined,\nthen the other namespaces in use are declared either in the :code:`\u003citem\u003e` element\nor in its subelements when these namespaces are not unique.\nEach :code:`\u003citem\u003e` element and its subelements always contain\nonly namespace declarations of non-:code:`None` attributes (including ones that are interpreted as element content).\n\n\nFeed (Channel) Elements Customization [optionally]\n--------------------------------------------------\n\nIf you want to change other channel parameters (such as language, copyright, managingEditor, webMaster,\npubDate, lastBuildDate, 
category, generator, docs, cloud, ttl, image, rating, textInput, skipHours, skipDays)\nthen define your own exporter that's inherited from :code:`FeedItemExporter` class and, for example,\nmodify one or more children of :code:`self.channel` `Element \u003chttps://github.com/woxcab/scrapy_rss/blob/master/scrapy_rss/rss/channel.py\u003e`__ (camelCase attributes naming):\n\n.. code:: python\n\n   from datetime import datetime\n   from scrapy_rss.rss import channel_elements\n   from scrapy_rss.exporters import FeedItemExporter\n\n   class MyRssItemExporter(FeedItemExporter):\n      def __init__(self, *args, **kwargs):\n         super(MyRssItemExporter, self).__init__(*args, **kwargs)\n         self.channel.generator = 'Special generator'\n         self.channel.language = 'en-us'\n         self.channel.managingEditor = 'editor@example.com'\n         self.channel.webMaster = 'webmaster@example.com'\n         self.channel.copyright = 'Copyright 2025'\n         self.channel.pubDate = datetime(2025, 9, 10, 13, 0, 0)\n\n         self.channel.category = ['category 1', 'category 2']\n         self.channel.category.append('category 3')\n         self.channel.category.extend(['category 4', 'category 5'])\n\n         # initialize image from dict\n         self.channel.image = {\n             'url': 'https://example.com/img.jpg',\n             'description': 'Image link hover text',\n         }\n         # or initialize image from ImageElement\n         self.channel.image = channel_elements.ImageElement(url='https://example.com/img.jpg')\n         # or initialize image by each attribute\n         self.channel.image.url = 'https://example.com/img.jpg' # required attribute of image\n         self.channel.image.title = 'Image title' # optional\n         self.channel.image.link = 'https://example.com/page' # optional\n         self.channel.image.description = 'Image link hover text' # optional\n         self.channel.image.width = 140 # optional\n         self.channel.image.height = 
350 # optional\n\n         self.channel.docs = 'https://example.com/rss_docs'\n         self.channel.cloud = {\n             'domain': 'rpc.sys.com',\n             'port': '80',\n             'path': '/RPC2',\n             'registerProcedure': 'myCloud.rssPleaseNotify',\n             'protocol': 'xml-rpc'\n         }\n         self.channel.ttl = 60\n         self.channel.rating = 4.0\n         self.channel.textInput = channel_elements.TextInputElement(\n             title='Input title',\n             description='Description of input',\n             name='Input name',\n             link='http://example.com/cgi.py'\n         )\n\n         self.channel.skipHours = (0, 1, 3, 7, 23) # initialize list from iterable\n         self.channel.skipHours = 12 # or initialize list with single value\n\n         self.channel.skipDays = 14 # initialize list with single value\n         self.channel.skipDays = [1, 14] # or initialize list from list\n\nor modify :code:`kwargs` arguments (snake_case argument naming):\n\n.. code:: python\n\n   from scrapy_rss.exporters import FeedItemExporter\n\n   class MyRssItemExporter(FeedItemExporter):\n      def __init__(self, *args, **kwargs):\n         kwargs['generator'] = kwargs.get('generator', 'Special generator')\n         kwargs['language'] = kwargs.get('language', 'en-us')\n         kwargs['managing_editor'] = kwargs.get('managing_editor', 'editor@example.com')\n         kwargs['category'] = kwargs.get('category', ('category 1', 'category 2'))\n         kwargs['image'] = kwargs.get('image', {'url': 'https://example.com/img.jpg'})\n         # etc.\n         super(MyRssItemExporter, self).__init__(*args, **kwargs)\n\nThen add the :code:`FEED_EXPORTER` parameter to the Scrapy project settings\nor to the :code:`custom_settings` attribute of the spider:\n\n.. 
code:: python\n\n   FEED_EXPORTER = 'myproject.exporters.MyRssItemExporter'\n\n\nBackward compatibility notices\n------------------------------\nSince version 1.0.0 some classes have been renamed, but the old-named classes have been kept and marked as deprecated\nfor backward compatibility, so they can still be used.\n\nHowever, `some elements \u003chttps://github.com/woxcab/scrapy_rss/blob/master/scrapy_rss/rss/item_elements.py\u003e`__\nof :code:`RssItem` have had some of their attributes renamed in a backward-incompatible way:\nalmost all **content** attributes (text content of XML tag after exporting)\nare renamed to :code:`value` to enhance code readability.\n\nSo if you do not want to update expressions in your code (such as an old-style :code:`item.title.title`\nto a new-style :code:`item.title.value` or :code:`item.guid.guid` to :code:`item.guid.value`), then\nyou can import the old-style classes\n\n.. code:: python\n\n    # old-style classes\n    from scrapy_rss.rss.old.items import RssItem, RssedItem\n\ninstead of new-style ones\n\n.. code:: python\n\n    # new-style classes\n    from scrapy_rss.items import RssItem, RssedItem\n\nrespectively.\n\n\nScrapy Project Examples\n=======================\n\n`Examples directory \u003chttps://github.com/woxcab/scrapy_rss/blob/master/examples\u003e`_ contains\nseveral Scrapy projects that demonstrate scrapy_rss usage. They crawl\n`this website \u003chttps://woxcab.github.io/scrapy_rss/\u003e`_ whose source code is\n`here \u003chttps://github.com/woxcab/scrapy_rss/blob/master/examples/website\u003e`_.\n\nGo to the Scrapy project directory and run the commands\n\n.. 
code:: bash\n\n   scrapy crawl first_spider\n   scrapy crawl second_spider\n\nThereafter `feed.rss` and `feed2.rss` files will be created in the same directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwoxcab%2Fscrapy_rss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwoxcab%2Fscrapy_rss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwoxcab%2Fscrapy_rss/lists"}