# Rss parser

[![Downloads](https://pepy.tech/badge/rss-parser)](https://pepy.tech/project/rss-parser)
[![Downloads](https://pepy.tech/badge/rss-parser/month)](https://pepy.tech/project/rss-parser)
[![Downloads](https://pepy.tech/badge/rss-parser/week)](https://pepy.tech/project/rss-parser)

[![PyPI version](https://img.shields.io/pypi/v/rss-parser)](https://pypi.org/project/rss-parser)
[![Python versions](https://img.shields.io/pypi/pyversions/rss-parser)](https://pypi.org/project/rss-parser)
[![Wheel status](https://img.shields.io/pypi/wheel/rss-parser)](https://pypi.org/project/rss-parser)
[![License](https://img.shields.io/pypi/l/rss-parser?color=success)](https://github.com/dhvcc/rss-parser/blob/master/LICENSE)

![Docs](https://github.com/dhvcc/rss-parser/actions/workflows/pages/pages-build-deployment/badge.svg)
![CI](https://github.com/dhvcc/rss-parser/actions/workflows/ci.yml/badge.svg?branch=master)
![PyPi publish](https://github.com/dhvcc/rss-parser/actions/workflows/publish_to_pypi.yml/badge.svg)

## About

`rss-parser` is a typed Python RSS/Atom parsing module built using [pydantic](https://github.com/pydantic/pydantic) and [xmltodict](https://github.com/martinblech/xmltodict).

## Installation

```bash
pip install rss-parser
```

or

```bash
git clone https://github.com/dhvcc/rss-parser.git
cd rss-parser
poetry build
pip install dist/*.whl
```

## V1 -> V2 migration
- The `Parser` class was renamed to `RSSParser`.
- Models for RSS-specific schemas were moved from `rss_parser.models` to `rss_parser.models.rss`. Generic types were not changed.
- Date parsing now uses pydantic's `validator` instead of `email.utils`. As a result, dates that V1 returned as a plain `str` are now more often parsed into `datetime` objects.

## Usage

### Quickstart

**NOTE: For parsing Atom, use `AtomParser`**

```python
from rss_parser import RSSParser
from requests import get  # noqa

rss_url = "https://rss.art19.com/apology-line"
response = get(rss_url)

rss = RSSParser.parse(response.text)

# Print out RSS metadata
print("Language", rss.channel.language)
print("RSS", rss.version)

# Iteratively print feed items
for item in rss.channel.items:
    print(item.title)
    print(item.description[:50])

# Language en
# RSS 2.0
# Wondery Presents - Flipping The Bird: Elon vs Twitter
# <p>When Elon Musk posted a video of himself arrivi
# Introducing: The Apology Line
# <p>If you could call a number and say you’re sorry
```

Note that the description still contains `<p>` tags. This happens because the HTML is wrapped in a [CDATA](https://www.w3resource.com/xml/CDATA-sections.php) section. The parser therefore keeps it as text instead of treating it as XML markup:

```
<![CDATA[<p>If you could call ...</p>]]>
```

### Overriding schema

To customize the schema or provide your own, pass it through the parser's `schema` keyword argument:

```python
from rss_parser import RSSParser
from rss_parser.models import XMLBaseModel
from rss_parser.models.rss import RSS
from rss_parser.models.types import Tag


class CustomSchema(RSS, XMLBaseModel):
    channel: None = None  # Removing previous channel field
    custom: Tag[str]


with open("tests/samples/custom.xml") as f:
    data = f.read()

rss = RSSParser.parse(data, schema=CustomSchema)

print("RSS", rss.version)
print("Custom", rss.custom)

# RSS 2.0
# Custom Custom tag data
```

### xmltodict

This library uses [xmltodict](https://github.com/martinblech/xmltodict) to parse XML data. See the [xmltodict documentation](https://github.com/martinblech/xmltodict#xmltodict) for details.

The key thing to know is that your XML is converted into dictionaries.

For example, this data

```xml
<tag>content</tag>
```

results in the following:

```python
{
    "tag": "content"
}
```

*But* when a tag has attributes, its value also becomes a dictionary. Attribute names are prefixed with `@`, and the text content is stored under `#text`:

```xml
<tag attr="1" data-value="data">data</tag>
```

turns into

```python
{
    "tag": {
        "@attr": "1",
        "@data-value": "data",
        "#text": "data"
    }
}
```

Repeated child tags with the same name are collected into a list:

```xml
<div>
    <tag>content</tag>
    <tag>content2</tag>
</div>
```

results in

```python
{
    "div": {
        "tag": ["content", "content2"]
    }
}
```

Sometimes you don't want to handle both shapes and need a field to **always** be parsed as a list. In that case, use `rss_parser.models.types.only_list.OnlyList`, as `Channel` does:
```python
from typing import Optional

from rss_parser.models.rss.item import Item
from rss_parser.models.types.only_list import OnlyList
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()
...


class OptionalChannelElementsMixin(...):
    ...
    items: Optional[OnlyList[Tag[Item]]] = pydantic.Field(alias="item", default=[])
```

### Tag field

`Tag` is a generic field that accepts either a tag's raw value or the dictionary xmltodict returns when the tag has attributes. The value is exposed as `.content` and the attributes as `.attributes`.

Example

```python
from rss_parser.models import XMLBaseModel
from rss_parser.models.types.tag import Tag


class Model(XMLBaseModel):
    width: Tag[int]
    category: Tag[str]


m = Model(
    width=48,
    category={"@someAttribute": "https://example.com", "#text": "valid string"},
)

# Content value is an integer, as per the generic type
assert m.width.content == 48

# The field is a Tag, and its content has the generic type
assert isinstance(m.width, Tag)
assert isinstance(m.width.content, int)

# The attributes are empty by default...
assert m.width.attributes == {}

# ...but are populated when provided.
# Note that the leading @ is stripped and the name is converted to snake_case
assert m.category.attributes == {'some_attribute': 'https://example.com'}
```

## Contributing

Pull requests are welcome. For major changes, please open an issue first
to discuss what you would like to change.

Install dependencies with `poetry install` (`pip install poetry`).

Using `pre-commit` is highly recommended. To install the hooks, run:

```bash
poetry run pre-commit install -t=pre-commit -t=pre-push
```

## License

[GPLv3](https://github.com/dhvcc/rss-parser/blob/master/LICENSE)
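The Quickstart output shows item descriptions that still carry HTML from their CDATA sections. The README does not describe a built-in way to strip that markup. Below is a minimal, standard-library-only sketch that assumes you only want the visible text. `strip_html` is a hypothetical helper and is not part of rss-parser.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores every tag (hypothetical helper class)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &#x27; becomes '
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Return the text of an HTML fragment with all tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


print(strip_html("<p>If you could call a number and say you&#x27;re sorry</p>"))
# prints: If you could call a number and say you're sorry
```

You would call it on each item's description text inside the Quickstart loop. `html.parser` is lenient with malformed markup, which suits feed content. It does not sanitize anything, though; it only drops tags.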
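The xmltodict section explains that a tag appearing once yields a single value, while a repeated tag yields a list. This is why a field type like `OnlyList` is useful. The plain-Python sketch below normalizes both shapes to a list. It is an illustration only, not rss-parser's implementation, and `as_list` is a hypothetical helper.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize an xmltodict value so callers can always iterate over it.

    - missing tag (None)              -> []
    - single occurrence (str or dict) -> [value]
    - repeated occurrences (a list)   -> returned unchanged
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Shapes xmltodict produces for one vs. two <item> children of <channel>
one_item = {"channel": {"item": {"title": "A"}}}
two_items = {"channel": {"item": [{"title": "A"}, {"title": "B"}]}}

for doc in (one_item, two_items):
    titles = [item["title"] for item in as_list(doc["channel"].get("item"))]
    print(titles)
# prints ['A'] then ['A', 'B']
```

Without this normalization, iterating over the single-item case would walk the dict's keys instead of the items. That silent bug is the one a field that is always a list avoids.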
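The `Tag` field example shows `@someAttribute` arriving as `some_attribute` and the text ending up in `.content`. Here is a minimal sketch of that kind of split and renaming. It assumes a simple camelCase-to-snake_case rule. The library's actual conversion may treat hyphenated names or acronyms differently. `normalize_attr_name` and `split_tag` are hypothetical helpers, not rss-parser APIs.

```python
import re
from typing import Any, Dict, Tuple

# Zero-width match between a lowercase letter/digit and an uppercase letter
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def normalize_attr_name(key: str) -> str:
    """Strip a leading '@' and convert camelCase to snake_case (hypothetical helper)."""
    name = key[1:] if key.startswith("@") else key
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def split_tag(value: Any) -> Tuple[Any, Dict[str, Any]]:
    """Split an xmltodict value into (content, attributes), mirroring Tag's two fields.

    Hyphenated names such as 'data-value' are left unchanged by this sketch.
    """
    if isinstance(value, dict):
        attributes = {
            normalize_attr_name(k): v for k, v in value.items() if k.startswith("@")
        }
        return value.get("#text"), attributes
    return value, {}


print(split_tag({"@someAttribute": "https://example.com", "#text": "valid string"}))
# prints ('valid string', {'some_attribute': 'https://example.com'})
```

A plain value such as `48` comes back as `(48, {})`, matching the empty `attributes` shown for `width` in the `Tag` example.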