https://github.com/razrfalcon/roxmltree
Represent an XML document as a read-only tree.
- Host: GitHub
- URL: https://github.com/razrfalcon/roxmltree
- Owner: RazrFalcon
- License: apache-2.0
- Created: 2018-08-31T16:12:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-05-27T08:28:44.000Z (7 months ago)
- Last Synced: 2024-12-12T12:06:22.752Z (14 days ago)
- Topics: xml
- Language: Rust
- Homepage:
- Size: 511 KB
- Stars: 435
- Watchers: 10
- Forks: 37
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE-APACHE
# roxmltree
![Build Status](https://github.com/RazrFalcon/roxmltree/workflows/Rust/badge.svg)
[![Crates.io](https://img.shields.io/crates/v/roxmltree.svg)](https://crates.io/crates/roxmltree)
[![Documentation](https://docs.rs/roxmltree/badge.svg)](https://docs.rs/roxmltree)
[![Rust 1.60+](https://img.shields.io/badge/rust-1.60+-orange.svg)](https://www.rust-lang.org)

Represents an [XML](https://www.w3.org/TR/xml/) document as a read-only tree.

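```rust
// Find element by id.
fn main() -> Result<(), roxmltree::Error> {
    let doc = roxmltree::Document::parse("<rect id='rect1'/>")?;
    let elem = doc.descendants().find(|n| n.attribute("id") == Some("rect1")).expect("element with id `rect1`");
    assert!(elem.has_tag_name("rect"));
    Ok(())
}
```

## Why read-only?
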
Because in some cases all you need is to retrieve some data from an XML document.
And for such cases, we can make a lot of optimizations.

## Parsing behavior

Sadly, XML can be parsed in many different ways. *roxmltree* tries to mimic the
behavior of Python's [lxml](https://lxml.de/).
For more details see [docs/parsing.md](https://github.com/RazrFalcon/roxmltree/blob/master/docs/parsing.md).

## Alternatives

| Feature/Crate | roxmltree | [libxml2] | [xmltree] | [sxd-document] |
| ------------------------------- | :--------------: | :-----------------: | :--------------: | :--------------: |
| Element namespace resolving | ✓ | ✓ | ✓ | ~<sup>1</sup> |
| Attribute namespace resolving | ✓ | ✓ | | ✓ |
| [Entity references] | ✓ | ✓ | × | × |
| [Character references] | ✓ | ✓ | ✓ | ✓ |
| [Attribute-Value normalization] | ✓ | ✓ | | |
| Comments | ✓ | ✓ | | ✓ |
| Processing instructions | ✓ | ✓ | ✓ | ✓ |
| UTF-8 BOM | ✓ | ✓ | × | × |
| Non UTF-8 input | | ✓ | | |
| Complete DTD support | | ✓ | | |
| Position preserving<sup>2</sup> | ✓ | ✓ | | |
| HTML support | | ✓ | | |
| Tree modification | | ✓ | ✓ | ✓ |
| Writing | | ✓ | ✓ | ✓ |
| No **unsafe** | ✓ | | ✓ | |
| Language | Rust | C | Rust | Rust |
| Dependencies | **0** | - | 2 | 2 |
| Tested version | 0.20.0 | Apple-provided | 0.10.3 | 0.3.2 |
| License | MIT / Apache-2.0 | MIT | MIT | MIT |

Legend:
- ✓ - supported
- × - parsing error
- ~ - partial
- *nothing* - not supported

Notes:

1. No default namespace propagation.
2. *roxmltree* keeps all node and attribute positions in the original document,
   so you can easily retrieve them if you need them.
   See [examples/print_pos.rs](examples/print_pos.rs) for details.

There are also the `elementtree` and `treexml` crates, but they have been abandoned for a long time.

[Entity references]: https://www.w3.org/TR/REC-xml/#dt-entref
[Character references]: https://www.w3.org/TR/REC-xml/#NT-CharRef
[Attribute-Value Normalization]: https://www.w3.org/TR/REC-xml/#AVNormalize

[libxml2]: http://xmlsoft.org/
[xmltree]: https://crates.io/crates/xmltree
[sxd-document]: https://crates.io/crates/sxd-document

## Performance

Here are some benchmarks comparing `roxmltree` to other XML tree libraries.
```text
test huge_roxmltree ... bench: 2,997,887 ns/iter (+/- 48,976)
test huge_libxml2 ... bench: 6,850,666 ns/iter (+/- 306,180)
test huge_sdx_document ... bench: 9,440,412 ns/iter (+/- 117,106)
test huge_xmltree ... bench: 41,662,316 ns/iter (+/- 850,360)

test large_roxmltree ... bench: 1,494,886 ns/iter (+/- 30,384)
test large_libxml2 ... bench: 3,250,606 ns/iter (+/- 140,201)
test large_sdx_document ... bench: 4,242,162 ns/iter (+/- 99,740)
test large_xmltree ... bench: 13,980,228 ns/iter (+/- 229,363)

test medium_roxmltree ... bench: 421,137 ns/iter (+/- 13,855)
test medium_libxml2 ... bench: 950,984 ns/iter (+/- 34,099)
test medium_sdx_document ... bench: 1,618,270 ns/iter (+/- 23,466)
test medium_xmltree ... bench: 4,315,974 ns/iter (+/- 31,849)

test tiny_roxmltree ... bench: 2,522 ns/iter (+/- 31)
test tiny_libxml2 ... bench: 8,931 ns/iter (+/- 235)
test tiny_sdx_document ... bench: 11,658 ns/iter (+/- 82)
test tiny_xmltree ... bench: 20,215 ns/iter (+/- 303)
```

When comparing to streaming XML parsers, `roxmltree` is slightly slower than `quick-xml`,
but still much faster than `xml-rs`.
Note that streaming parsers usually do not provide proper string unescaping,
DTD resolving, or namespace support.

```text
test huge_quick_xml ... bench: 2,997,887 ns/iter (+/- 48,976)
test huge_roxmltree ... bench: 3,147,424 ns/iter (+/- 49,153)
test huge_xmlrs ... bench: 36,258,312 ns/iter (+/- 180,438)

test large_quick_xml ... bench: 1,250,053 ns/iter (+/- 21,943)
test large_roxmltree ... bench: 1,494,886 ns/iter (+/- 30,384)
test large_xmlrs ... bench: 11,239,516 ns/iter (+/- 76,937)

test medium_quick_xml ... bench: 206,232 ns/iter (+/- 2,157)
test medium_roxmltree ... bench: 421,137 ns/iter (+/- 13,855)
test medium_xmlrs ... bench: 3,975,916 ns/iter (+/- 44,967)

test tiny_quick_xml ... bench: 2,233 ns/iter (+/- 70)
test tiny_roxmltree ... bench: 2,522 ns/iter (+/- 31)
test tiny_xmlrs ... bench: 17,155 ns/iter (+/- 429)
```

### Notes

The benchmarks were taken on an Apple M1 Pro.
You can try running the benchmarks yourself by running `cargo bench` in the `benches` dir.

- Since every library supports a different subset of XML, benchmarking is a bit pointless.
- We bench *libxml2* using the *[rust-libxml]* wrapper crate.

[xml-rs]: https://crates.io/crates/xml-rs
[quick-xml]: https://crates.io/crates/quick-xml
[rust-libxml]: https://github.com/KWARC/rust-libxml

## Memory overhead

`roxmltree` tries to use as little memory as possible to allow parsing
very large (multi-GB) XML files.

The peak memory usage doesn't correlate directly with the file size,
but rather with the number of nodes and attributes in the file,
the number of attributes that had to be normalized (i.e. allocated),
and the number of text nodes that had to be preprocessed (i.e. allocated).

`roxmltree` never allocates element and attribute names, processing instructions
and comments.

By disabling the `positions` feature, you can shave 8 bytes from each node and attribute.
On average, the overhead is around 6-8x the file size.
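
As a back-of-the-envelope sketch of these two figures, the snippet below turns the 6-8x rule of thumb and the 8-bytes-per-node-and-attribute saving into plain arithmetic. The function names and the input numbers are hypothetical, and real usage depends on the document's structure, so treat the result as a rough estimate only.

```rust
/// Rough peak-memory range in bytes, using the 6-8x rule of thumb.
fn estimated_peak_bytes(file_size: u64) -> (u64, u64) {
    (file_size * 6, file_size * 8)
}

/// Bytes saved by disabling `positions`: 8 bytes per node and per attribute.
fn positions_savings_bytes(nodes: u64, attributes: u64) -> u64 {
    8 * (nodes + attributes)
}

fn main() {
    // Hypothetical 500 MB input file.
    let (low, high) = estimated_peak_bytes(500_000_000);
    println!("estimated peak: {} to {} bytes", low, high);

    // Hypothetical node and attribute counts.
    let saved = positions_savings_bytes(10_000_000, 20_000_000);
    println!("saved without `positions`: {} bytes", saved);
}
```
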
For example, our 1.1GB sample XML will peak at 7.6GB RAM with default features enabled
and at 6.8GB RAM when `positions` is disabled.

## Safety

- This library must not panic. Any panic should be considered a critical bug and reported.
- This library forbids `unsafe` code.

## API

This library uses Rust's idiomatic API based on iterators.
If you are more familiar with browser/JS DOM APIs, check out
[tests/dom-api.rs](tests/dom-api.rs) to see how they can be mapped onto the Rust one.

Built on top of this API, a mapping to the [Serde data model](https://serde.rs/data-model.html)
is available via the [`serde-roxmltree` crate](https://crates.io/crates/serde-roxmltree).

## License

Licensed under either of
- [Apache License v2.0](LICENSE-APACHE)
- [MIT license](LICENSE-MIT)

at your option.

## Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.