Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ocramz/xeno
Fast Haskell XML parser
https://github.com/ocramz/xeno
memory-benchmark parser sax xml xml-parser xml-parsing
Last synced: 6 days ago
JSON representation
Fast Haskell XML parser
- Host: GitHub
- URL: https://github.com/ocramz/xeno
- Owner: ocramz
- License: other
- Created: 2016-12-22T12:34:49.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-07-16T07:56:38.000Z (over 1 year ago)
- Last Synced: 2024-05-01T23:26:16.831Z (9 months ago)
- Topics: memory-benchmark, parser, sax, xml, xml-parser, xml-parsing
- Language: Haskell
- Homepage:
- Size: 313 KB
- Stars: 118
- Watchers: 9
- Forks: 32
- Open Issues: 10
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.markdown
- License: LICENSE
Awesome Lists containing this project
README
# xeno
[![Github actions build status](https://img.shields.io/github/workflow/status/ocramz/xeno/Stack)](https://github.com/ocramz/xeno/actions) [![Hackage version](https://img.shields.io/hackage/v/xeno.svg?label=Hackage)](https://hackage.haskell.org/package/xeno) [![Stackage version](https://www.stackage.org/package/xeno/badge/lts?label=Stackage)](https://www.stackage.org/package/xeno)
A fast event-based XML parser.
[Blog post](http://chrisdone.com/posts/fast-haskell-c-parsing-xml).
## Features
* SAX-style/fold parser which triggers events for open/close
tags, attributes, text, etc.
* Low memory use (see memory benchmarks below).
* Very fast (see speed benchmarks below).
* It
[cheats like Hexml does](http://neilmitchell.blogspot.co.uk/2016/12/new-xml-parser-hexml.html)
(doesn't expand entities, or most of the XML standard).
* Written in pure Haskell.
* CDATA is supported as of version 0.2.Please see the bottom of this file for guidelines on contributing to this library.
## Performance goals
The [hexml](https://github.com/ndmitchell/hexml) Haskell library uses
an XML parser written in C, so that is the baseline we're trying to
beat or match roughly.![Imgur](http://i.imgur.com/XgdZoQ9.png)
The `Xeno.SAX` module is faster than Hexml for simply walking the
document. Hexml actually does more work, allocating a DOM. `Xeno.DOM`
is slighly slower or faster than Hexml depending on the document,
although it is 2x slower on a 211KB document.Memory benchmarks for Xeno:
Case Bytes GCs Check
4kb/xeno/sax 2,376 0 OK
31kb/xeno/sax 1,824 0 OK
211kb/xeno/sax 56,832 0 OK
4kb/xeno/dom 11,360 0 OK
31kb/xeno/dom 10,352 0 OK
211kb/xeno/dom 1,082,816 0 OKI memory benchmarked Hexml, but most of its allocation happens in C,
which GHC doesn't track. So the data wasn't useful to compare.Speed benchmarks:
benchmarking 4KB/hexml/dom
time 6.317 μs (6.279 μs .. 6.354 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 6.333 μs (6.307 μs .. 6.362 μs)
std dev 97.15 ns (77.15 ns .. 125.3 ns)
variance introduced by outliers: 13% (moderately inflated)benchmarking 4KB/xeno/sax
time 5.152 μs (5.131 μs .. 5.179 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 5.139 μs (5.128 μs .. 5.161 μs)
std dev 58.02 ns (41.25 ns .. 85.41 ns)benchmarking 4KB/xeno/dom
time 10.93 μs (10.83 μs .. 11.14 μs)
0.994 R² (0.983 R² .. 0.999 R²)
mean 11.35 μs (11.12 μs .. 11.91 μs)
std dev 1.188 μs (458.7 ns .. 2.148 μs)
variance introduced by outliers: 87% (severely inflated)benchmarking 31KB/hexml/dom
time 9.405 μs (9.348 μs .. 9.480 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 9.745 μs (9.599 μs .. 10.06 μs)
std dev 745.3 ns (598.6 ns .. 902.4 ns)
variance introduced by outliers: 78% (severely inflated)benchmarking 31KB/xeno/sax
time 2.736 μs (2.723 μs .. 2.753 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.757 μs (2.742 μs .. 2.791 μs)
std dev 76.93 ns (43.62 ns .. 136.1 ns)
variance introduced by outliers: 35% (moderately inflated)benchmarking 31KB/xeno/dom
time 5.767 μs (5.735 μs .. 5.814 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 5.759 μs (5.728 μs .. 5.810 μs)
std dev 127.3 ns (79.02 ns .. 177.2 ns)
variance introduced by outliers: 24% (moderately inflated)benchmarking 211KB/hexml/dom
time 260.3 μs (259.8 μs .. 260.8 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 259.9 μs (259.7 μs .. 260.3 μs)
std dev 959.9 ns (821.8 ns .. 1.178 μs)benchmarking 211KB/xeno/sax
time 249.2 μs (248.5 μs .. 250.1 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 251.5 μs (250.6 μs .. 253.0 μs)
std dev 3.944 μs (3.032 μs .. 5.345 μs)benchmarking 211KB/xeno/dom
time 543.1 μs (539.4 μs .. 547.0 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 550.0 μs (545.3 μs .. 553.6 μs)
std dev 14.39 μs (12.45 μs .. 17.12 μs)
variance introduced by outliers: 17% (moderately inflated)## DOM Example
Easy as running the parse function:
``` haskell
> parse ""
hisuphi
Right
(Node
"p"
[("key", "val"), ("x", "foo"), ("k", "")]
[ Element (Node "a" [] [Element (Node "hr" [] []), Text "hi"])
, Element (Node "b" [] [Text "sup"])
, Text "hi"
])
```## SAX Example
Quickly dumping XML:
``` haskell
> let input = "TextHello, World!Content!Trailing."
> dump input
"Text""Hello, World!"
"Content!"
"Trailing."
```Folding over XML:
``` haskell
> fold const (\m _ _ -> m + 1) const const const const 0 input -- Count attributes.
Right 2
`````` haskell
> fold (\m _ -> m + 1) (\m _ _ -> m) const const const const 0 input -- Count elements.
Right 3
```Most general XML processor:
``` haskell
process
:: Monad m
=> (ByteString -> m ()) -- ^ Open tag.
-> (ByteString -> ByteString -> m ()) -- ^ Tag attribute.
-> (ByteString -> m ()) -- ^ End open tag.
-> (ByteString -> m ()) -- ^ Text.
-> (ByteString -> m ()) -- ^ Close tag.
-> ByteString -- ^ Input string.
-> m ()
```You can use any monad you want. IO, State, etc. For example, `fold` is
implemented like this:``` haskell
fold openF attrF endOpenF textF closeF s str =
execState
(process
(\name -> modify (\s' -> openF s' name))
(\key value -> modify (\s' -> attrF s' key value))
(\name -> modify (\s' -> endOpenF s' name))
(\text -> modify (\s' -> textF s' text))
(\name -> modify (\s' -> closeF s' name))
str)
s
```The `process` is marked as INLINE, which means use-sites of it will
inline, and your particular monad's type will be potentially erased
for great performance.## Contributors
See CONTRIBUTORS.md
## Contribution guidelines
All contributions and bug fixes are welcome and will be credited appropriately, as long as they are aligned with the goals of this library: speed and memory efficiency. In practical terms, patches and additional features should not introduce significant performance regressions.