{"id":13469300,"url":"https://github.com/hchasestevens/xpyth","last_synced_at":"2025-03-16T18:32:31.749Z","repository":{"id":29491210,"uuid":"33028702","full_name":"hchasestevens/xpyth","owner":"hchasestevens","description":"A module for querying the DOM tree and writing XPath expressions using native Python syntax.","archived":false,"fork":false,"pushed_at":"2018-06-13T00:20:45.000Z","size":39,"stargazers_count":127,"open_issues_count":3,"forks_count":5,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-03T22:09:39.279Z","etag":null,"topics":["comprehension","dsl","lxml","metaprogramming","python","python-comprehension-syntax","xpath-expression"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hchasestevens.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-03-28T09:58:44.000Z","updated_at":"2024-11-28T16:30:52.000Z","dependencies_parsed_at":"2022-08-24T10:20:36.133Z","dependency_job_id":null,"html_url":"https://github.com/hchasestevens/xpyth","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchasestevens%2Fxpyth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchasestevens%2Fxpyth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchasestevens%2Fxpyth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hchasestevens%2Fxpyth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hchasestevens","download_url":"https://codeload.github.com/hchasestevens/xpyth/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243826783,"owners_count":20354220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["comprehension","dsl","lxml","metaprogramming","python","python-comprehension-syntax","xpath-expression"],"created_at":"2024-07-31T15:01:32.374Z","updated_at":"2025-03-16T18:32:27.636Z","avatar_url":"https://github.com/hchasestevens.png","language":"Python","readme":"# xpyth\n\n[![Build Status](https://travis-ci.org/hchasestevens/xpyth.svg?branch=master)](https://travis-ci.org/hchasestevens/xpyth)\n[![PyPI version](https://badge.fury.io/py/xpyth.svg)](https://badge.fury.io/py/xpyth)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/xpyth.svg) \n\nA module for querying the DOM tree and writing XPath expressions using native Python syntax.\n\nExample usage\n-------------\n```python\n\u003e\u003e\u003e from xpyth import xpath, DOM, X\n\n\u003e\u003e\u003e xpath(X for X in DOM if X.name == 'main')\n\"//*[@name='main']\"\n\n\u003e\u003e\u003e xpath(span for div in DOM for span in div if div.id == 'main')\n\"//div[@id='main']//span\"\n\n\u003e\u003e\u003e xpath(a for a in DOM if '.com' not in a.href)\n\"//a[not(contains(@href, '.com'))]\"\n\n\u003e\u003e\u003e xpath(a.href for a in DOM if any(p for p in a.ancestors if p.id))\n\"//a[./ancestor::p[@id]]/@href\"\n\n\u003e\u003e\u003e xpath(X.data-bind for X in DOM if X.data-bind == '1')\n\"//*[@data-bind='1']/@data-bind\"\n\n\u003e\u003e\u003e xpath(\n...     form.action \n...     for form in DOM \n...     if all(\n...         input \n...         for input in form.children \n...         if input.value == 'a'\n...     )\n... )\n\"//form[not(./input[not(@value='a')])]/@action\"\n\n\u003e\u003e\u003e allowed_ids = list('abc')\n\u003e\u003e\u003e xpath(X for X in DOM if X.id in allowed_ids)\n\"//*[@id='a' or @id='b' or @id='c']\"\n```\n\nMotivation\n----------\n\nXPath is the de facto standard in querying XML and HTML documents. In Python (and most other languages), XPath expressions are represented as strings; this not only constitutes a potential security threat, but also means that developers are denied standard text-editor and IDE features such as syntax highlighting and autocomplete when writing XPaths. Furthermore, having to become familiar with XPath (or CSS selectors) presents a barrier to entry for developers who want to interact with the web.\n\n[Great inroads](https://msdn.microsoft.com/en-us/library/bb397933.aspx) have been made in various programming languages in allowing the use of native list-comprehension-like syntax to generate SQL queries. __xpyth__ piggybacks off one such effort, [Pony](http://ponyorm.com/), to extend this functionality to XPath. __Now anyone familiar with Python comprehension syntax can query XML/HTML documents quickly and easily__. Moreover, __xpyth__ integrates with the popular [lxml](http://lxml.de/) library to enable developers to go beyond the querying capabilities of XPath (when necessary).\n\nInstallation\n------------\n\n```bash\npip install xpyth\n```\n\n\nUse with lxml\n-------------\n\n__xpyth__ supports querying lxml ```ElementTree```s using the ```query``` function. For example, given a document\n```html\n\u003chtml\u003e\n    \u003cdiv id='main' class='main'\u003e\n        \u003ca href='http://www.google.com'\u003eGoogle\u003c/a\u003e\n        \u003ca href='http://www.chasestevens.com'\u003eNot Google\u003c/a\u003e\n        \u003cp\u003eLorem ipsum\u003c/p\u003e\n        \u003cp id='123'\u003eno numbers here\u003c/p\u003e\n        \u003cp id='numbers_only'\u003e123\u003c/p\u003e\n    \u003c/div\u003e\n    \u003cdiv id='123' class='secondary'\u003e\n        \u003ca href='http://www.google.org'\u003eGoogle Charity\u003c/a\u003e\n        \u003ca href='http://www.chasestevens.org'\u003eBroken link!\u003c/a\u003e\n    \u003c/div\u003e\n\u003c/html\u003e\n```\naccessible as the ```ElementTree``` ```tree```, the following can be executed:\n```python\n\u003e\u003e\u003e len(query(a for a in tree))\n4\n\u003e\u003e\u003e query(a for a in tree if 'Not Google' not in a.text)[0].attrib.get('href')\n\"http://www.google.com\"\n\u003e\u003e\u003e next(\n...     node \n...     for node in \n...     query(\n...         p \n...         for p in \n...         tree \n...         if p.id\n...     ) \n...     if re.match(r'\\D+', node.attrib.get('id'))\n... ).text\n\"123\"\n```\n\nKnown Issues\n------------\n\n*  HTML tag names that contain special characters (dashes) cannot be selected, as they violate Python's generator comprehension syntax. HTML attributes containing dashes, e.g. ``data-bind``, work normally.\n*  The use of ```all``` is quite buggy, e.g. the following return incorrect expressions:\n\n   ```python\n   \u003e\u003e\u003e xpath(X for X in DOM if all(p.id in ('a', 'b') for p in X))\n   \"//*[not(.//p/@id='a' or //p/@id='b')]\"  # expected \"//*[not(.//p[./@id!='a' and ./@id!='b'])]\"\n   \u003e\u003e\u003e xpath(X for X in DOM if all('x' in p.id for p in X))\n   \"//*[not(.contains(@id, //p))]\"  # expected \"//*[not(.//p[not(contains(@id, 'x'))])]\"\n   ```\n    \nContacts\n--------\n\n* Name: [H. Chase Stevens](http://www.chasestevens.com)\n* Twitter: [@hchasestevens](https://twitter.com/hchasestevens)\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhchasestevens%2Fxpyth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhchasestevens%2Fxpyth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhchasestevens%2Fxpyth/lists"}