https://github.com/hchasestevens/xpyth
A module for querying the DOM tree and writing XPath expressions using native Python syntax.
comprehension dsl lxml metaprogramming python python-comprehension-syntax xpath-expression
- Host: GitHub
- URL: https://github.com/hchasestevens/xpyth
- Owner: hchasestevens
- License: mit
- Created: 2015-03-28T09:58:44.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2018-06-13T00:20:45.000Z (over 7 years ago)
- Last Synced: 2025-03-03T22:09:39.279Z (7 months ago)
- Topics: comprehension, dsl, lxml, metaprogramming, python, python-comprehension-syntax, xpath-expression
- Language: Python
- Homepage:
- Size: 38.1 KB
- Stars: 127
- Watchers: 8
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: COPYING.txt
# xpyth
[Build status](https://travis-ci.org/hchasestevens/xpyth)
[PyPI version](https://badge.fury.io/py/xpyth)

A module for querying the DOM tree and writing XPath expressions using native Python syntax.
Example usage
-------------
```python
>>> from xpyth import xpath, DOM, X

>>> xpath(X for X in DOM if X.name == 'main')
"//*[@name='main']"

>>> xpath(span for div in DOM for span in div if div.id == 'main')
"//div[@id='main']//span"

>>> xpath(a for a in DOM if '.com' not in a.href)
"//a[not(contains(@href, '.com'))]"

>>> xpath(a.href for a in DOM if any(p for p in a.ancestors if p.id))
"//a[./ancestor::p[@id]]/@href"

>>> xpath(X.data-bind for X in DOM if X.data-bind == '1')
"//*[@data-bind='1']/@data-bind"

>>> xpath(
...     form.action
...     for form in DOM
...     if all(
...         input
...         for input in form.children
...         if input.value == 'a'
...     )
... )
"//form[not(./input[not(@value='a')])]/@action"

>>> allowed_ids = list('abc')
>>> xpath(X for X in DOM if X.id in allowed_ids)
"//*[@id='a' or @id='b' or @id='c']"
```

Motivation
----------

XPath is the de facto standard for querying XML and HTML documents. In Python (and most other languages), XPath expressions are represented as strings; this not only constitutes a potential security threat, but also means that developers are denied standard text-editor and IDE features such as syntax highlighting and autocomplete when writing XPaths. Furthermore, having to become familiar with XPath (or CSS selectors) presents a barrier to entry for developers who want to interact with the web.

[Great inroads](https://msdn.microsoft.com/en-us/library/bb397933.aspx) have been made in various programming languages in allowing the use of native list-comprehension-like syntax to generate SQL queries. __xpyth__ piggybacks off one such effort, [Pony](http://ponyorm.com/), to extend this functionality to XPath. __Now anyone familiar with Python comprehension syntax can query XML/HTML documents quickly and easily__. Moreover, __xpyth__ integrates with the popular [lxml](http://lxml.de/) library to enable developers to go beyond the querying capabilities of XPath (when necessary).
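
To make the correspondence between comprehensions and XPath concrete, here is a minimal sketch that does not use __xpyth__ or lxml at all. It runs one of the XPath strings from the example usage against a small, made-up document with the standard library's `xml.etree.ElementTree`, and compares the result with an equivalent plain Python comprehension. The document and variable names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
doc = """
<html>
  <body>
    <div id="main"><p><span>first</span></p><span>second</span></div>
    <div id="other"><span>ignored</span></div>
  </body>
</html>
"""
root = ET.fromstring(doc)

# The XPath produced in the example usage is "//div[@id='main']//span".
# ElementTree only accepts relative paths on an element, hence the leading ".".
xpath_expr = ".//div[@id='main']//span"
via_xpath = [span.text for span in root.findall(xpath_expr)]

# The same selection written as an ordinary comprehension over the tree.
via_comprehension = [
    span.text
    for div in root.iter("div")
    if div.get("id") == "main"
    for span in div.iter("span")
]

print(via_xpath)
print(via_comprehension)
```

Both lists contain the text of the two `span` elements nested under the `main` div, which is the selection the comprehension-style query describes.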
Installation
------------

```bash
pip install xpyth
```

Use with lxml
-------------

__xpyth__ supports querying lxml ```ElementTree```s using the ```query``` function. For example, given a document
```html
```
accessible as the ```ElementTree``` ```tree```, the following can be executed:
```python
>>> import re
>>> from xpyth import query  # assumption: query is importable alongside xpath
>>> len(query(a for a in tree))
4
>>> query(a for a in tree if 'Not Google' not in a.text)[0].attrib.get('href')
"http://www.google.com"
>>> next(
... node
... for node in
... query(
... p
... for p in
... tree
... if p.id
... )
... if re.match(r'\D+', node.attrib.get('id'))
... ).text
"123"
```

Known Issues
------------

* HTML tag names that contain special characters (dashes) cannot be selected, as they violate Python's generator comprehension syntax. HTML attributes containing dashes, e.g. ``data-bind``, work normally.
* The use of ```all``` is quite buggy, e.g. the following return incorrect expressions:

```python
>>> xpath(X for X in DOM if all(p.id in ('a', 'b') for p in X))
"//*[not(.//p/@id='a' or //p/@id='b')]" # expected "//*[not(.//p[./@id!='a' and ./@id!='b'])]"
>>> xpath(X for X in DOM if all('x' in p.id for p in X))
"//*[not(.contains(@id, //p))]" # expected "//*[not(.//p[not(contains(@id, 'x'))])]"
```
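
The first issue above follows from how Python itself parses dashes. The sketch below uses only the standard `ast` module to show that `X.data-bind` is valid Python (a subtraction of `bind` from `X.data`), whereas a dashed name used as a loop variable is a syntax error. How __xpyth__ turns the subtraction back into an attribute name is not shown here; the `parses` helper is hypothetical and exists only for this illustration.

```python
import ast

# "X.data-bind" parses as the subtraction (X.data) - bind, so it is valid syntax.
node = ast.parse("X.data-bind", mode="eval").body
is_subtraction = isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub)
left_attr = node.left.attr    # attribute name on the left of the minus sign
right_name = node.right.id    # bare name on the right of the minus sign


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


# A dashed tag name cannot be a loop target: "my-tag" is an expression, not a name.
dashed_target_ok = parses("(el for my-tag in DOM)")
plain_target_ok = parses("(el for div in DOM)")

print(is_subtraction, left_attr, right_name)
print(dashed_target_ok, plain_target_ok)
```

Because the dashed form fails before any library code runs, a workaround for dashed tag names cannot be implemented purely inside the comprehension syntax.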
Contacts
--------

* Name: [H. Chase Stevens](http://www.chasestevens.com)
* Twitter: [@hchasestevens](https://twitter.com/hchasestevens)