{"id":13633331,"url":"https://github.com/silviucpp/erlxml","last_synced_at":"2025-07-13T20:32:08.644Z","repository":{"id":86640079,"uuid":"87405507","full_name":"silviucpp/erlxml","owner":"silviucpp","description":"erlxml - Erlang XML parsing library based on pugixml","archived":false,"fork":false,"pushed_at":"2022-02-12T21:58:04.000Z","size":155,"stargazers_count":15,"open_issues_count":1,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-09T02:34:05.761Z","etag":null,"topics":["erlang","pugixml","streaming","xml"],"latest_commit_sha":null,"homepage":null,"language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/silviucpp.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-04-06T08:29:49.000Z","updated_at":"2024-04-15T07:44:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"2828f467-362c-49ba-a9b6-0ea364fd85fc","html_url":"https://github.com/silviucpp/erlxml","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlxml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlxml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlxml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/silviucpp%2Ferlxml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/silviucpp","download_url":"https://codeload.github.com/silviucpp/erlxml/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225916515,"owners_count":17544819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erlang","pugixml","streaming","xml"],"created_at":"2024-08-01T23:00:33.723Z","updated_at":"2025-07-13T20:32:08.634Z","avatar_url":"https://github.com/silviucpp.png","language":"Erlang","funding_links":[],"categories":["Text and Numbers"],"sub_categories":["XML"],"readme":"# erlxml\n\n*erlxml - Erlang XML parsing library based on pugixml*\n\n[![Build Status](https://app.travis-ci.com/silviucpp/erlxml.svg?branch=master)](https://travis-ci.com/github/silviucpp/erlxml)\n[![GitHub](https://img.shields.io/github/license/silviucpp/erlxml)](https://github.com/silviucpp/erlxml/blob/master/LICENSE)\n[![Hex.pm](https://img.shields.io/hexpm/v/erlxml2)](https://hex.pm/packages/erlxml2)\n\n# Implementation notes\n\n[pugixml][1] is the fastest dom parser available in c++ based on the benchmarks available [here][2]. The streaming parser works by dividing the\nstream into independent stanzas, which are then processed using pugixml. While the splitting algorithm is quite fast, it is designed for simplicity,\nwhich currently imposes some limitations on the streaming mode:\n\n- Does not support `CDATA`\n- Does not support comments containing special XML characters\n- Does not support `DOCTYPE` declarations\n\nAll of the above limitations apply only to streaming mode and not to DOM parsing mode. 
\n\n### Getting started\n\n##### DOM parsing\n\n```erlang\nerlxml:parse(\u003c\u003c\"\u003cfoo attr1='bar'\u003eSome Value\u003c/foo\u003e\"\u003e\u003e).\n```\n\nWhich returns:\n\n```erlang\n{ok,{xmlel,\u003c\u003c\"foo\"\u003e\u003e,\n           [{\u003c\u003c\"attr1\"\u003e\u003e,\u003c\u003c\"bar\"\u003e\u003e}],\n           [{xmlcdata,\u003c\u003c\"Some Value\"\u003e\u003e}]}}\n```\n\n##### Generate an XML document from Erlang terms\n\n```erlang\nXml = {xmlel,\u003c\u003c\"foo\"\u003e\u003e,\n    [{\u003c\u003c\"attr1\"\u003e\u003e,\u003c\u003c\"bar\"\u003e\u003e}],  % Attributes\n    [{xmlcdata,\u003c\u003c\"Some Value\"\u003e\u003e}]   % Children\n},\nerlxml:to_binary(Xml).\n```\n\nWhich returns:\n\n```erlang\n\u003c\u003c\"\u003cfoo attr1=\\\"bar\\\"\u003eSome Value\u003c/foo\u003e\"\u003e\u003e\n```\n\n##### Streaming parsing\n\n```erlang\nChunk1 = \u003c\u003c\"\u003cstream\u003e\u003cfoo attr1=\\\"bar\"\u003e\u003e,\nChunk2 = \u003c\u003c\"\\\"\u003eSome Value\u003c/foo\u003e\u003c/stream\u003e\"\u003e\u003e,\n{ok, Parser} = erlxml:new_stream(),\n%% Chunk1 ends inside the attr1 value, so only the stream start is reported so far.\n{ok,[{xmlstreamstart,\u003c\u003c\"stream\"\u003e\u003e,[]}]} = erlxml:parse_stream(Parser, Chunk1),\n%% Chunk2 completes the foo stanza and closes the stream.\nRs = erlxml:parse_stream(Parser, Chunk2),\n{ok,[{xmlel,\u003c\u003c\"foo\"\u003e\u003e,\n        [{\u003c\u003c\"attr1\"\u003e\u003e,\u003c\u003c\"bar\"\u003e\u003e}],\n        [{xmlcdata,\u003c\u003c\"Some Value\"\u003e\u003e}]},\n     {xmlstreamend,\u003c\u003c\"stream\"\u003e\u003e}]} = Rs.\n```\n\n### Options\n\nWhen creating a stream with `new_stream/1`, you can specify the following options:\n\n- `stanza_limit` - The maximum size of a stanza, in bytes. If the parser consumes more than this many bytes without finding a complete stanza, it returns `{error, {max_stanza_limit_hit, binary()}}`. Example: `{stanza_limit, 65000}`. The default is 0, which means unlimited.\n\n- `strip_non_utf8` - Strips all invalid UTF-8 characters from attribute values and element text. 
These values are treated as user input and may contain malformed characters. Default is `false`.\n\n### Benchmarks\n\nThe benchmark code is in the `benchmark` folder. Performance is compared against:\n\n- [exml][3] version used: 3.4.1\n- [fast_xml][4] version used: 1.1.55\n\nAll tests run at three concurrency levels (the number of Erlang processes spawned):\n\n- C1 (concurrency level 1)\n- C5 (concurrency level 5)\n- C10 (concurrency level 10)\n\n##### DOM parsing\n\nThe benchmark parses the stanza defined in `benchmark/benchmark.erl` 600000 times:\n\n```sh\nmake bench_parsing\n```\n\n| Library    | C1 (ms)      |   C5 (ms) | C10 (ms)  |\n|:----------:|:------------:|:---------:|:---------:|\n| erlxml     |  1875.128    |  417.368  |  315.65   |\n| exml       |  2417.334    |  578.226  |  407.516  |\n| fast_xml   | 24159.517    | 5854.817  | 4007.837  |\n\nNotes:\n\n- Starting with version 3.0.0, [exml][3] improved significantly by replacing Expat with RapidXML.\n- `erlxml` is the fastest, followed by `exml`. `fast_xml` is more than 12 times slower than `erlxml` at every concurrency level.\n\n##### Generate an XML document from Erlang terms\n\nThe benchmark encodes the Erlang term defined in `benchmark/benchmark.erl` 600000 times:\n\n```sh\nmake bench_encoding\n```\n\n|   Library   | C1 (ms)  | C5 (ms) | C10 (ms) |\n|:-----------:|:--------:|:-------:|:--------:|\n|  `erlxml`   | 1381.338 | 322.851 | 251.936  |\n|   `exml`    | 1333.54  | 301.625 | 234.295  |\n| `fast_xml`  | 1019.238 | 238.676 | 198.69   |\n\nNotes:\n\n- `fast_xml` is the fastest, followed by `exml`, with `erlxml` close behind.\n- `erlxml` 2.1.0 improved encoding performance by eliminating unnecessary memory copies and string-length computations.\n\n##### Streaming parsing\n\nThe test is located in `benchmark/benchmark_stream.erl`. It loads all stanzas from `test/data/stream.txt` and runs the streaming parser over them 60000 times:\n\n```sh\nmake 
bench_streaming\n```\n\n```sh\n### engine: erlxml concurrency: 1 -\u003e 2337.112 ms 193.81 MB/sec total bytes processed: 452.96 MB\n### engine: erlxml concurrency: 5 -\u003e 598.737 ms 756.52 MB/sec total bytes processed: 452.96 MB\n### engine: erlxml concurrency: 10 -\u003e 407.379 ms 1.09 GB/sec total bytes processed: 452.96 MB\n### engine: exml concurrency: 1 -\u003e 11790.975 ms 38.42 MB/sec total bytes processed: 452.96 MB\n### engine: exml concurrency: 5 -\u003e 2552.339 ms 177.47 MB/sec total bytes processed: 452.96 MB\n### engine: exml concurrency: 10 -\u003e 1840.267 ms 246.14 MB/sec total bytes processed: 452.96 MB\n### engine: fast_xml concurrency: 1 -\u003e 22677.758 ms 19.97 MB/sec total bytes processed: 452.96 MB\n### engine: fast_xml concurrency: 5 -\u003e 5184.096 ms 87.37 MB/sec total bytes processed: 452.96 MB\n### engine: fast_xml concurrency: 10 -\u003e 3854.402 ms 117.52 MB/sec total bytes processed: 452.96 MB\n```\n\n|   Library   | C1 (MB/s) | C5 (MB/s) | C10 (MB/s) |\n|:-----------:|:---------:|:---------:|:----------:|\n|   erlxml    |  193.81   |  756.52   |  1111.89   |\n|    exml     |   38.42   |  177.47   |   246.14   |\n| fast_xml    |   19.97   |   87.37   |   117.52   |\n\nNotes:\n\n- `erlxml` is the clear winner, with 4 to 5 times the throughput of `exml` and roughly 9 times that of `fast_xml`.\n\n[1]:http://pugixml.org\n[2]:http://pugixml.org/benchmark.html\n[3]:https://github.com/esl/exml\n[4]:https://github.com/processone/fast_xml\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsilviucpp%2Ferlxml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsilviucpp%2Ferlxml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsilviucpp%2Ferlxml/lists"}