https://github.com/scrapy/parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
- Host: GitHub
- URL: https://github.com/scrapy/parsel
- Owner: scrapy
- License: bsd-3-clause
- Created: 2015-04-24T15:53:36.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2024-10-16T09:20:56.000Z (3 months ago)
- Last Synced: 2024-10-29T11:06:13.412Z (3 months ago)
- Topics: css, hacktoberfest, lxml, python, scraping, selectors, xml, xpath
- Language: Python
- Homepage:
- Size: 810 KB
- Stars: 1,144
- Watchers: 35
- Forks: 146
- Open Issues: 41
Metadata Files:
- Readme: README.rst
- Changelog: NEWS
- License: LICENSE
Awesome Lists containing this project
- starred-awesome - parsel - Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors (Python)
- best-of-web-python - GitHub - 31% open · ⏱️ 08.04.2024): (Web Scraping & Crawling)
README
======
Parsel
======

.. image:: https://github.com/scrapy/parsel/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/scrapy/parsel/actions/workflows/tests.yml
   :alt: Tests

.. image:: https://img.shields.io/pypi/pyversions/parsel.svg
   :target: https://github.com/scrapy/parsel/actions/workflows/tests.yml
   :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: https://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report

Parsel is a BSD-licensed Python_ library to extract data from HTML_, JSON_, and
XML_ documents.

It supports:

- CSS_ and XPath_ expressions for HTML and XML documents

- JMESPath_ expressions for JSON documents

- `Regular expressions`_

Find the Parsel online documentation at https://parsel.readthedocs.org.

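
As a rough illustration of the expression types listed above, the sketch below
applies XPath, CSS, and a regular expression to a small XML document. The
``catalog`` document and the variable names are made up for this example, and
it assumes a Parsel version whose ``Selector`` accepts ``type="xml"``:

.. code-block:: python

    from parsel import Selector

    # Hypothetical XML document, used only for illustration.
    xml = """
    <catalog>
      <book id="b1"><title>First</title><price>10.50</price></book>
      <book id="b2"><title>Second</title><price>7.25</price></book>
    </catalog>
    """

    # type="xml" asks Parsel to parse the text as XML rather than HTML.
    selector = Selector(text=xml, type="xml")

    # XPath: the text of every <title> element.
    titles = selector.xpath("//book/title/text()").getall()

    # CSS with the ::attr() pseudo-element: the id attribute of each <book>.
    ids = selector.css("book::attr(id)").getall()

    # Regular expression: the integer part of each price.
    whole_prices = selector.xpath("//price/text()").re(r"(\d+)\.")

    print(titles, ids, whole_prices)

When the pattern contains a capturing group, ``.re()`` returns the captured
text rather than the whole match.
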

Example (`open online demo`_):

.. code-block:: python

    >>> from parsel import Selector
    >>> text = """
            <html>
                <body>
                    <h1>Hello, Parsel!</h1>
                    <ul>
                        <li><a href="http://example.com">Link 1</a></li>
                        <li><a href="http://scrapy.org">Link 2</a></li>
                    </ul>
                    <script type="application/json">{"a": ["b", "c"]}</script>
                </body>
            </html>"""
    >>> selector = Selector(text=text)
    >>> selector.css('h1::text').get()
    'Hello, Parsel!'
    >>> selector.xpath('//h1/text()').re(r'\w+')
    ['Hello', 'Parsel']
    >>> for li in selector.css('ul > li'):
    ...     print(li.xpath('.//@href').get())
    http://example.com
    http://scrapy.org
    >>> selector.css('script::text').jmespath("a").get()
    'b'
    >>> selector.css('script::text').jmespath("a").getall()
    ['b', 'c']

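
The example above only selects things that exist. The following sketch, which
is not part of the original README, shows what ``get()`` and ``getall()``
return when nothing matches, along with the ``attrib`` and ``re_first()``
helpers. The HTML snippet and variable names are illustrative assumptions:

.. code-block:: python

    from parsel import Selector

    # Illustrative HTML; it contains links but no <h2> element.
    html = """
    <ul>
        <li><a href="http://example.com">Link 1</a></li>
        <li><a href="http://scrapy.org">Link 2</a></li>
    </ul>
    """
    selector = Selector(text=html)

    # get() returns None when nothing matches, so a missing element does not raise.
    missing = selector.css("h2::text").get()

    # A default value can be supplied instead of None.
    fallback = selector.css("h2::text").get(default="(no heading)")

    # getall() always returns a list, which is empty when nothing matches.
    no_matches = selector.css("h2::text").getall()

    # attrib exposes the attributes of the first matched element.
    first_href = selector.css("a").attrib["href"]

    # re_first() returns only the first regular expression match.
    first_host = selector.css("a::attr(href)").re_first(r"https?://(.+)")

    print(missing, fallback, no_matches, first_href, first_host)

Using ``get(default=...)`` avoids a separate ``None`` check when a field is
optional in the scraped page.
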
.. _CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
.. _HTML: https://en.wikipedia.org/wiki/HTML
.. _JMESPath: https://jmespath.org/
.. _JSON: https://en.wikipedia.org/wiki/JSON
.. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true
.. _Python: https://www.python.org/
.. _regular expressions: https://docs.python.org/library/re.html
.. _XML: https://en.wikipedia.org/wiki/XML
.. _XPath: https://en.wikipedia.org/wiki/XPath