https://github.com/imgurbot12/pyxml

Pyxml
------
Pure python3 alternative to stdlib xml.etree with HTML support

### Install

```
pip install pyxml3
```

### Advantages

1. The default parser ignores XML entity declarations, which avoids
most, if not all, XML-related vulnerabilities, such as the
[Billion Laughs attack](https://en.wikipedia.org/wiki/Billion_laughs_attack).

2. Our XPath implementation is more complete than those of both xml.etree
and lxml. It adds functions such as `lower-case()` and `notempty()`
(used in the examples below), which make it easier to parse complex
data structures in a single line.
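
To illustrate the first point, here is a minimal sketch of the entity nesting that Billion Laughs relies on. It uses only the stdlib `xml.etree` parser and a harmless two-level payload. The names `PAYLOAD`, `expanded_text`, and `expansion_size` are hypothetical helpers for this illustration and are not part of pyxml.

```python
from xml.etree import ElementTree as ET

# A harmless, two-level version of the nested entities used by Billion Laughs.
# Entity "b" refers to entity "a" three times.
PAYLOAD = (
    '<!DOCTYPE d ['
    '<!ENTITY a "ha">'
    '<!ENTITY b "&a;&a;&a;">'
    ']>'
    '<d>&b;</d>'
)

def expanded_text(xml: str) -> str:
    """Parse with the stdlib parser and return the root element's text."""
    return ET.fromstring(xml).text

def expansion_size(base_len: int, fanout: int, levels: int) -> int:
    """Length of the expanded text when each of `levels` nested entities
    refers `fanout` times to the entity below it."""
    return base_len * fanout ** levels

# The stdlib parser expands the declared entities while parsing.
print(expanded_text(PAYLOAD))

# The classic payload nests 9 levels of 10 references around "lol".
print(expansion_size(3, 10, 9))
```

Because the expanded size grows exponentially with the nesting depth, a parser that ignores entity declarations never performs this expansion.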

### Examples

###### Standard Usage:

```python
import pyxml

# Illustrative markup (assumed): any well-formed document works here.
etree = pyxml.fromstring(b'<div><p>Hello World!</p></div>')
for element in etree.iter():
    print(element)

with open('example.xml', 'rb') as f:
    etree = pyxml.fromstring(f)
    print(etree)
```

###### Monkey Patch:

```python
import pyxml
pyxml.compat.monkey_patch()

from xml.etree import ElementTree as ET

# Illustrative markup (assumed): a minimal document containing a <p> element.
etree = ET.fromstring('<div><p>Hello World!</p></div>')
for element in etree.iter():
    print(element)

print(etree.find('//p[starts-with(lower-case(text()), "hello")]'))
```
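
For comparison, here is a rough sketch of the same lookup with the unpatched stdlib module, whose XPath subset has no `lower-case()` function, so the filtering has to be done in Python. The helper name `find_paragraphs_starting_with` and the sample markup are assumptions made for this illustration.

```python
from xml.etree import ElementTree as ET

def find_paragraphs_starting_with(root, prefix):
    """Return <p> elements whose text starts with `prefix`, ignoring case."""
    prefix = prefix.lower()
    return [
        p for p in root.iter('p')
        # p.text is None for empty elements, so fall back to ''.
        if (p.text or '').lower().startswith(prefix)
    ]

# Sample markup assumed for the illustration.
root = ET.fromstring('<div><p>Hello World!</p><p>Goodbye</p><p/></div>')
for p in find_paragraphs_starting_with(root, 'hello'):
    print(p.text)
```

The single-line XPath expression in the monkey-patched example replaces this explicit loop.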

###### HTML:

```python
import pyxml.html

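# Illustrative markup (assumed): a minimal HTML page containing a <p> element.
etree = pyxml.html.fromstring('<html><body><p>Hello World!</p></body></html>')
for element in etree.iter():
    print(element)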

print(etree.find('//p[notempty(text())]'))
```