https://github.com/awolverp/markupever

The fast, most optimal, and correct HTML & XML parsing library for Python written in Rust.


# MarkupEver



The fast, most optimal, and correct HTML & XML parsing library



Documentation | Releases | Benchmarks

![text](https://img.shields.io/badge/coverage-100-08000)
![image](https://img.shields.io/pypi/v/markupever.svg)
![image](https://img.shields.io/pypi/l/markupever.svg)
![image](https://img.shields.io/pypi/pyversions/markupever.svg)
![python-test](https://github.com/awolverp/markupever/actions/workflows/test.yml/badge.svg)
![download](https://img.shields.io/pypi/dm/markupever?style=flat-square&color=%23314bb5)

------

MarkupEver is a modern, high-performance HTML and XML parsing library for Python, written in Rust.

**KEY FEATURES:**
* 🚀 **Fast**: High performance, thanks to **[html5ever](https://github.com/servo/html5ever)** and **[selectors](https://github.com/servo/stylo/tree/main/selectors)**.
* 🔥 **Easy**: Designed to be easy to use and learn, with code completion everywhere.
* ✨ **Low-Memory**: Written in Rust with a small memory footprint. Memory is managed by the Rust allocator, so you don't need to worry about memory leaks.
* 🧶 **Thread-safe**: Fully thread-safe.
* 🎯 **Querying**: Use your **CSS** knowledge to select elements from an HTML or XML document.
* ⚡ **Streaming**: Supports incremental (streaming) parsing.
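To illustrate what incremental parsing means in general, the sketch below feeds a document to a parser in arbitrary chunks, as it might arrive from a network socket. It uses Python's standard-library `html.parser.HTMLParser`, not MarkupEver's API; the `ParagraphCollector` class and the chunk boundaries are hypothetical and only show that input can be processed before the whole document is available.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical example: collects the text of every <p> as chunks arrive."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside one <p> may be delivered in several pieces,
        # so append to the current paragraph instead of overwriting it.
        if self.in_p:
            self.paragraphs[-1] += data


# The document arrives in pieces; note that the first <p> tag is split
# across two chunks and its text is split across two more.
chunks = ["<div class='section'><", "p>Hello ", "world</p><p>Second</p></div>"]

parser = ParagraphCollector()
for chunk in chunks:
    parser.feed(chunk)  # complete constructs are handled now, partial ones are buffered
parser.close()          # flush anything still buffered

print(parser.paragraphs)
```

The parser never needs the full document in memory at once; incomplete tags at a chunk boundary simply wait for the next `feed()` call.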

## Installation
You can install MarkupEver using **pip** (using a virtual environment is recommended):

```console
$ pip3 install markupever
```

## Example

### Parse
Parsing HTML content and selecting elements:

```python
import markupever

dom = markupever.parse_file("file.html", "html")
# Or parse HTML content directly from a string:
# dom = markupever.parse("... content ...", "html")

# Print the text of every <p> that is the first child of a <div class="section">
for element in dom.select("div.section > p:nth-child(1)"):
    print(element.text())
```
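To make the selector concrete: `div.section > p:nth-child(1)` matches a `<p>` that is the first element child of a `<div>` whose class list contains `section`. The sketch below hand-codes that rule with the standard library's `xml.etree.ElementTree`, purely for illustration. It is not how MarkupEver evaluates selectors, and `select_first_p_in_section` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A small, well-formed XHTML snippet (ElementTree requires well-formed XML).
DOC = """
<html><body>
  <div class="section"><p>first</p><p>second</p></div>
  <div class="section"><h2>title</h2><p>not first child</p></div>
  <div class="other"><p>wrong class</p></div>
</body></html>
"""


def select_first_p_in_section(root):
    """Hypothetical hand-written equivalent of `div.section > p:nth-child(1)`."""
    matches = []
    for div in root.iter("div"):
        # `div.section`: the class attribute must contain the token "section".
        if "section" not in div.get("class", "").split():
            continue
        children = list(div)  # element children only, in document order
        # `> p:nth-child(1)`: the first element child must itself be a <p>.
        if children and children[0].tag == "p":
            matches.append(children[0])
    return matches


root = ET.fromstring(DOC.strip())
for element in select_first_p_in_section(root):
    print(element.text)
```

Only the first `<p>` of the first `div.section` matches: in the second section the first child is an `<h2>`, and the third `<div>` lacks the `section` class.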

### Create DOM
Creating a DOM from scratch:

```python
from markupever import dom

# Use a separate name so `dom` still refers to the module (needed for `dom.Document`)
tree = dom.TreeDom()
root: dom.Document = tree.root()

root.create_doctype("html")

html = root.create_element("html", {"lang": "en"})
body = html.create_element("body")
body.create_text("Hello Everyone ...")

print(root.serialize())
# <!DOCTYPE html>
# <html lang="en">
# <body>Hello Everyone ...</body>
# </html>
```
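The example relies on each `create_*` call appending a child node and returning it, so the result can be extended further. As a rough mental model only, here is a hypothetical toy `Node` class in pure Python that mimics that pattern and serializes the tree with a recursive walk. It is not MarkupEver's implementation or API (which is written in Rust); the method names merely mirror the example above, and doctypes and pretty-printing are left out.

```python
from html import escape


class Node:
    """Hypothetical toy model, NOT markupever's classes: illustrates the idea only."""

    def __init__(self, name=None, attrs=None, text=None):
        self.name = name            # tag name; None for the document and text nodes
        self.attrs = attrs or {}
        self.text = text            # set only for text nodes
        self.children = []

    def create_element(self, name, attrs=None):
        # Append a new element as the last child and return it for further building.
        child = Node(name=name, attrs=attrs)
        self.children.append(child)
        return child

    def create_text(self, content):
        child = Node(text=content)
        self.children.append(child)
        return child

    def serialize(self):
        # Recursive walk: text is escaped, elements wrap their serialized children.
        if self.text is not None:
            return escape(self.text, quote=False)
        inner = "".join(child.serialize() for child in self.children)
        if self.name is None:       # document node: output its children only
            return inner
        attrs = "".join(f' {k}="{escape(v)}"' for k, v in self.attrs.items())
        return f"<{self.name}{attrs}>{inner}</{self.name}>"


root = Node()
html = root.create_element("html", {"lang": "en"})
body = html.create_element("body")
body.create_text("Hello Everyone ...")

# The toy model has no doctype node, so the doctype is prepended by hand.
print("<!DOCTYPE html>" + root.serialize())
```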