{"id":17331903,"url":"https://github.com/ocramz/xeno","last_synced_at":"2025-09-21T18:26:43.527Z","repository":{"id":39634708,"uuid":"77143240","full_name":"ocramz/xeno","owner":"ocramz","description":"Fast Haskell XML parser ","archived":false,"fork":false,"pushed_at":"2023-07-16T07:56:38.000Z","size":320,"stargazers_count":118,"open_issues_count":10,"forks_count":32,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-05-01T23:26:16.831Z","etag":null,"topics":["memory-benchmark","parser","sax","xml","xml-parser","xml-parsing"],"latest_commit_sha":null,"homepage":"","language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ocramz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.markdown","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2016-12-22T12:34:49.000Z","updated_at":"2023-05-25T12:17:36.000Z","dependencies_parsed_at":"2024-02-26T01:36:27.916Z","dependency_job_id":"51b9135c-d3ee-4ab3-839b-d75be8a1ccdd","html_url":"https://github.com/ocramz/xeno","commit_stats":{"total_commits":187,"total_committers":22,"mean_commits":8.5,"dds":0.53475935828877,"last_synced_commit":"9099ce01aec9eed2fb08aada0ed7a0bc29c203fe"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocramz%2Fxeno","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocramz%2Fxeno/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocramz%2Fxeno/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ocramz%2Fxeno/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ocramz","download_url":"https://codeload.github.com/ocramz/xeno/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247378149,"owners_count":20929297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["memory-benchmark","parser","sax","xml","xml-parser","xml-parsing"],"created_at":"2024-10-15T14:56:00.027Z","updated_at":"2025-09-21T18:26:38.447Z","avatar_url":"https://github.com/ocramz.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# xeno\n\n[![Github actions build status](https://img.shields.io/github/workflow/status/ocramz/xeno/Stack)](https://github.com/ocramz/xeno/actions) [![Hackage version](https://img.shields.io/hackage/v/xeno.svg?label=Hackage)](https://hackage.haskell.org/package/xeno) [![Stackage version](https://www.stackage.org/package/xeno/badge/lts?label=Stackage)](https://www.stackage.org/package/xeno)\n\n\nA fast event-based XML parser.\n\n[Blog post](http://chrisdone.com/posts/fast-haskell-c-parsing-xml).\n\n## Features\n\n* SAX-style/fold parser which triggers events for open/close\n  tags, attributes, text, etc.\n* Low memory use (see memory benchmarks below).\n* Very fast (see speed benchmarks below).\n* It\n  [cheats like Hexml does](http://neilmitchell.blogspot.co.uk/2016/12/new-xml-parser-hexml.html)\n  (doesn't expand entities, or most of the XML standard).\n* Written in pure Haskell.\n* CDATA is supported as of version 0.2.\n\nPlease see the bottom of this file for guidelines on contributing 
to this library.\n\n\n## Performance goals\n\nThe [hexml](https://github.com/ndmitchell/hexml) Haskell library uses\nan XML parser written in C, so that is the baseline we're trying to\nbeat or match roughly.\n\n![Imgur](http://i.imgur.com/XgdZoQ9.png)\n\nThe `Xeno.SAX` module is faster than Hexml for simply walking the\ndocument. Hexml actually does more work, allocating a DOM. `Xeno.DOM`\nis slightly slower or faster than Hexml depending on the document,\nalthough it is 2x slower on a 211KB document.\n\nMemory benchmarks for Xeno:\n\n    Case                Bytes  GCs  Check\n    4kb/xeno/sax        2,376    0  OK\n    31kb/xeno/sax       1,824    0  OK\n    211kb/xeno/sax     56,832    0  OK\n    4kb/xeno/dom       11,360    0  OK\n    31kb/xeno/dom      10,352    0  OK\n    211kb/xeno/dom  1,082,816    0  OK\n\nI memory benchmarked Hexml, but most of its allocation happens in C,\nwhich GHC doesn't track. So the data wasn't useful to compare.\n\nSpeed benchmarks:\n\n    benchmarking 4KB/hexml/dom\n    time                 6.317 μs   (6.279 μs .. 6.354 μs)\n                         1.000 R²   (1.000 R² .. 1.000 R²)\n    mean                 6.333 μs   (6.307 μs .. 6.362 μs)\n    std dev              97.15 ns   (77.15 ns .. 125.3 ns)\n    variance introduced by outliers: 13% (moderately inflated)\n\n    benchmarking 4KB/xeno/sax\n    time                 5.152 μs   (5.131 μs .. 5.179 μs)\n                         1.000 R²   (1.000 R² .. 1.000 R²)\n    mean                 5.139 μs   (5.128 μs .. 5.161 μs)\n    std dev              58.02 ns   (41.25 ns .. 85.41 ns)\n\n    benchmarking 4KB/xeno/dom\n    time                 10.93 μs   (10.83 μs .. 11.14 μs)\n                         0.994 R²   (0.983 R² .. 0.999 R²)\n    mean                 11.35 μs   (11.12 μs .. 11.91 μs)\n    std dev              1.188 μs   (458.7 ns .. 
2.148 μs)\n    variance introduced by outliers: 87% (severely inflated)\n\n    benchmarking 31KB/hexml/dom\n    time                 9.405 μs   (9.348 μs .. 9.480 μs)\n                         0.999 R²   (0.998 R² .. 0.999 R²)\n    mean                 9.745 μs   (9.599 μs .. 10.06 μs)\n    std dev              745.3 ns   (598.6 ns .. 902.4 ns)\n    variance introduced by outliers: 78% (severely inflated)\n\n    benchmarking 31KB/xeno/sax\n    time                 2.736 μs   (2.723 μs .. 2.753 μs)\n                         1.000 R²   (1.000 R² .. 1.000 R²)\n    mean                 2.757 μs   (2.742 μs .. 2.791 μs)\n    std dev              76.93 ns   (43.62 ns .. 136.1 ns)\n    variance introduced by outliers: 35% (moderately inflated)\n\n    benchmarking 31KB/xeno/dom\n    time                 5.767 μs   (5.735 μs .. 5.814 μs)\n                         0.999 R²   (0.999 R² .. 1.000 R²)\n    mean                 5.759 μs   (5.728 μs .. 5.810 μs)\n    std dev              127.3 ns   (79.02 ns .. 177.2 ns)\n    variance introduced by outliers: 24% (moderately inflated)\n\n    benchmarking 211KB/hexml/dom\n    time                 260.3 μs   (259.8 μs .. 260.8 μs)\n                         1.000 R²   (1.000 R² .. 1.000 R²)\n    mean                 259.9 μs   (259.7 μs .. 260.3 μs)\n    std dev              959.9 ns   (821.8 ns .. 1.178 μs)\n\n    benchmarking 211KB/xeno/sax\n    time                 249.2 μs   (248.5 μs .. 250.1 μs)\n                         1.000 R²   (1.000 R² .. 1.000 R²)\n    mean                 251.5 μs   (250.6 μs .. 253.0 μs)\n    std dev              3.944 μs   (3.032 μs .. 5.345 μs)\n\n    benchmarking 211KB/xeno/dom\n    time                 543.1 μs   (539.4 μs .. 547.0 μs)\n                         0.999 R²   (0.999 R² .. 1.000 R²)\n    mean                 550.0 μs   (545.3 μs .. 553.6 μs)\n    std dev              14.39 μs   (12.45 μs .. 
17.12 μs)\n    variance introduced by outliers: 17% (moderately inflated)\n\n## DOM Example\n\nEasy as running the parse function:\n\n``` haskell\n\u003e parse \"\u003cp key='val' x=\\\"foo\\\" k=\\\"\\\"\u003e\u003ca\u003e\u003chr/\u003ehi\u003c/a\u003e\u003cb\u003esup\u003c/b\u003ehi\u003c/p\u003e\"\nRight\n  (Node\n     \"p\"\n     [(\"key\", \"val\"), (\"x\", \"foo\"), (\"k\", \"\")]\n     [ Element (Node \"a\" [] [Element (Node \"hr\" [] []), Text \"hi\"])\n     , Element (Node \"b\" [] [Text \"sup\"])\n     , Text \"hi\"\n     ])\n```\n\n## SAX Example\n\nQuickly dumping XML:\n\n``` haskell\n\u003e let input = \"Text\u003ctag prop='value'\u003eHello, World!\u003c/tag\u003e\u003cx\u003e\u003cy prop=\\\"x\\\"\u003eContent!\u003c/y\u003e\u003c/x\u003eTrailing.\"\n\u003e dump input\n\"Text\"\n\u003ctag prop=\"value\"\u003e\n  \"Hello, World!\"\n\u003c/tag\u003e\n\u003cx\u003e\n  \u003cy prop=\"x\"\u003e\n    \"Content!\"\n  \u003c/y\u003e\n\u003c/x\u003e\n\"Trailing.\"\n```\n\nFolding over XML:\n\n``` haskell\n\u003e fold const (\\m _ _ -\u003e m + 1) const const const const 0 input -- Count attributes.\nRight 2\n```\n\n``` haskell\n\u003e fold (\\m _ -\u003e m + 1) (\\m _ _ -\u003e m) const const const const 0 input -- Count elements.\nRight 3\n```\n\nMost general XML processor:\n\n``` haskell\nprocess\n  :: Monad m\n  =\u003e (ByteString -\u003e m ())               -- ^ Open tag.\n  -\u003e (ByteString -\u003e ByteString -\u003e m ()) -- ^ Tag attribute.\n  -\u003e (ByteString -\u003e m ())               -- ^ End open tag.\n  -\u003e (ByteString -\u003e m ())               -- ^ Text.\n  -\u003e (ByteString -\u003e m ())               -- ^ Close tag.\n  -\u003e ByteString                         -- ^ Input string.\n  -\u003e m ()\n```\n\nYou can use any monad you want. IO, State, etc. 
For example, `fold` is\nimplemented like this:\n\n``` haskell\nfold openF attrF endOpenF textF closeF s str =\n  execState\n    (process\n       (\\name -\u003e modify (\\s' -\u003e openF s' name))\n       (\\key value -\u003e modify (\\s' -\u003e attrF s' key value))\n       (\\name -\u003e modify (\\s' -\u003e endOpenF s' name))\n       (\\text -\u003e modify (\\s' -\u003e textF s' text))\n       (\\name -\u003e modify (\\s' -\u003e closeF s' name))\n       str)\n    s\n```\n\nThe `process` is marked as INLINE, which means use-sites of it will\ninline, and your particular monad's type will be potentially erased\nfor great performance.\n\n\n## Contributors\n\nSee CONTRIBUTORS.md\n\n\n## Contribution guidelines\n\nAll contributions and bug fixes are welcome and will be credited appropriately, as long as they are aligned with the goals of this library: speed and memory efficiency. In practical terms, patches and additional features should not introduce significant performance regressions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focramz%2Fxeno","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focramz%2Fxeno","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focramz%2Fxeno/lists"}