Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/qcam/saxy
Fast SAX parser and encoder for XML in Elixir
- Host: GitHub
- URL: https://github.com/qcam/saxy
- Owner: qcam
- License: mit
- Created: 2017-12-27T01:44:33.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-10-22T13:45:43.000Z (3 months ago)
- Last Synced: 2025-01-02T13:05:42.317Z (9 days ago)
- Topics: elixir, elixir-lang, xml, xml-builder, xml-builder-library, xml-library, xml-parser
- Language: Elixir
- Homepage: https://hexdocs.pm/saxy
- Size: 1.62 MB
- Stars: 281
- Watchers: 5
- Forks: 39
- Open Issues: 20
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
Awesome Lists containing this project
- freaking_awesome_elixir - Elixir - Saxy is an XML parser and encoder in Elixir that focuses on speed and standard compliance. (XML)
- fucking-awesome-elixir - saxy - Saxy is an XML parser and encoder in Elixir that focuses on speed and standard compliance. (XML)
- awesome-elixir - saxy - Saxy is an XML parser and encoder in Elixir that focuses on speed and standard compliance. (XML)
README
Saxy
====

[![Test suite](https://github.com/qcam/saxy/actions/workflows/test.yml/badge.svg)](https://github.com/qcam/saxy/actions/workflows/test.yml)
[![Module Version](https://img.shields.io/hexpm/v/saxy.svg)](https://hex.pm/packages/saxy)

Saxy (Sá xị) is an XML SAX parser and encoder in Elixir that focuses on speed, usability and standard compliance.

It complies with [Extensible Markup Language (XML) 1.0 (Fifth Edition)](https://www.w3.org/TR/xml/).
## Features highlight
* An incredibly fast XML 1.0 SAX parser.
* An extremely fast XML encoder.
* Native support for streaming parsing large XML files.
* Parse XML documents into simple DOM format.
* Support quick returning in event handlers.

## Installation
Add `:saxy` to your `mix.exs`.
```elixir
def deps() do
  [
    {:saxy, "~> 1.6"}
  ]
end
```

## Overview
Full documentation is available on [HexDocs](https://hexdocs.pm/saxy/).
If you have never worked with a SAX parser before, please check out [this
guide][sax-guide].

### SAX parser
A SAX event handler implementation is required before starting parsing.
```elixir
defmodule MyEventHandler do
  @behaviour Saxy.Handler

  def handle_event(:start_document, prolog, state) do
    IO.inspect("Start parsing document")
    {:ok, [{:start_document, prolog} | state]}
  end

  def handle_event(:end_document, _data, state) do
    IO.inspect("Finish parsing document")
    {:ok, [{:end_document} | state]}
  end

  def handle_event(:start_element, {name, attributes}, state) do
    IO.inspect("Start parsing element #{name} with attributes #{inspect(attributes)}")
    {:ok, [{:start_element, name, attributes} | state]}
  end

  def handle_event(:end_element, name, state) do
    IO.inspect("Finish parsing element #{name}")
    {:ok, [{:end_element, name} | state]}
  end

  def handle_event(:characters, chars, state) do
    IO.inspect("Receive characters #{chars}")
    {:ok, [{:characters, chars} | state]}
  end

  def handle_event(:cdata, cdata, state) do
    IO.inspect("Receive CData #{cdata}")
    {:ok, [{:cdata, cdata} | state]}
  end
end
```

Then start parsing XML documents with:
```elixir
iex> xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
iex> Saxy.parse_string(xml, MyEventHandler, [])
{:ok,
 [{:end_document},
  {:end_element, "foo"},
  {:start_element, "foo", [{"bar", "value"}]},
  {:start_document, [version: "1.0"]}]}
```

### Streaming parsing
Saxy also accepts a file stream as input:

```elixir
stream = File.stream!("/path/to/file")

Saxy.parse_stream(stream, MyEventHandler, initial_state)
```

It also supports any other stream that emits binaries, such as a filtered file stream:

```elixir
stream = File.stream!("/path/to/file") |> Stream.filter(&(&1 != "\n"))

Saxy.parse_stream(stream, MyEventHandler, initial_state)
```

### Partial parsing
Saxy can parse an XML document partially. This feature is useful when the
document cannot be turned into a stream, e.g. when it is received over a socket.

```elixir
alias Saxy.Partial

# Feed the document chunk by chunk; each call returns an updated partial.
{:ok, partial} = Partial.new(MyEventHandler, initial_state)
{:cont, partial} = Partial.parse(partial, "<foo>")
{:cont, partial} = Partial.parse(partial, "<bar></bar>")
{:cont, partial} = Partial.parse(partial, "</foo>")
{:ok, state} = Partial.terminate(partial)
```

### Simple DOM format exporting
Sometimes it will be convenient to just export the XML document into simple DOM
format, which is a 3-element tuple including the tag name, attributes, and a
list of its children.

The `Saxy.SimpleForm` module supports this out of the box:

```elixir
Saxy.SimpleForm.parse_string(data)

{:ok,
 {"menu", [],
  [
    {"movie",
     [{"id", "tt0120338"}, {"url", "https://www.imdb.com/title/tt0120338/"}],
     [{"name", [], ["Titanic"]}, {"characters", [], ["Jack & Rose"]}]},
    {"movie",
     [{"id", "tt0109830"}, {"url", "https://www.imdb.com/title/tt0109830/"}],
     [
       {"name", [], ["Forest Gump"]},
       {"characters", [], ["Forest & Jenny"]}
     ]}
  ]}}
```

### XML builder
Saxy offers two APIs for building simple form and encoding XML documents.

Use `Saxy.XML` to build and compose XML simple form, then `Saxy.encode!/2`
to encode the built element into an XML binary.

```elixir
iex> import Saxy.XML
iex> element = element("person", [gender: "female"], "Alice")
{"person", [{"gender", "female"}], [{:characters, "Alice"}]}
iex> Saxy.encode!(element, [])
"<?xml version=\"1.0\"?><person gender=\"female\">Alice</person>"
```

See `Saxy.XML` for more XML building APIs.
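
As a rough sketch of a few of those helpers, the standalone script below builds a nested element and encodes it without a prolog. It assumes `Saxy.XML.element/3`, `characters/1` and `empty_element/2` behave as described on HexDocs. The movie data and the `poster` element are made up for illustration, and `Mix.install/1` is only there so the snippet can run outside a Mix project.

```elixir
# Assumption: run as a standalone script (elixir example.exs) with network access.
Mix.install([{:saxy, "~> 1.6"}])

alias Saxy.XML

# Children can be nested elements, character data, or empty elements.
movie =
  XML.element("movie", [id: "tt0120338"], [
    XML.element("name", [], "Titanic"),
    XML.element("characters", [], [XML.characters("Jack & Rose")]),
    XML.empty_element("poster", src: "titanic.jpg")
  ])

# With no prolog argument, only the element itself is encoded.
xml = Saxy.encode!(movie)
IO.puts(xml)

# Round trip: parsing the encoded binary should give back the same structure.
{:ok, {"movie", _attributes, children}} = Saxy.SimpleForm.parse_string(xml)
IO.inspect(children, label: "children")
```

Building with the helper functions rather than hand-written tuples lets the encoder handle escaping, so the `&` in the character data does not need to be escaped by the caller.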
Saxy also provides the `Saxy.Builder` protocol to help compose structs into simple form.

```elixir
defmodule Person do
  @derive {Saxy.Builder, name: "person", attributes: [:gender], children: [:name]}

  defstruct [:gender, :name]
end

iex> jack = %Person{gender: :male, name: "Jack"}
iex> john = %Person{gender: :male, name: "John"}
iex> import Saxy.XML
iex> root = element("people", [], [jack, john])
iex> Saxy.encode!(root, [])
"<?xml version=\"1.0\"?><people><person gender=\"male\">Jack</person><person gender=\"male\">John</person></people>"
```

## FAQs with Saxy/XMLs
### Saxy sounds cool! But I just wanted to quickly convert some XMLs into maps/JSON...
Saxy does not offer XML-to-map conversion, because many awesome people
have already made it happen 💪:

* https://github.com/bennyhat/xml_json
* https://github.com/xinz/sax_map

Alternatively, this [pull request](https://github.com/qcam/saxy/pull/78) could
serve as a good reference if you want to implement your own map-based handler.

### Does Saxy work with XPath?
Saxy is a SAX parser at its core, so it does not, and likely will
not, offer any XPath functionality.

[SweetXml][sweet_xml] is a wonderful library for working with XPath. However,
`:xmerl`, the library used by SweetXml, is not always memory efficient or
fast. You can combine the best of both with [Saxmerl][saxmerl], which
is a Saxy extension that converts XML documents into a SweetXml-compatible format.
Please check that library out for more information.

### Saxy! Where did the name come from?
![Sa xi Chuong Duong](./assets/saxi.jpg)
Sa Xi, pronounced like `sa-see`, is an awesome soft drink made by [Chuong Duong](http://www.cdbeco.com.vn/en).
## Benchmarking
Note that benchmarking XML parsers is difficult and depends heavily on the complexity
of the documents being parsed. Even though I try hard to make the benchmarking suite
fair, it is hard to avoid bias when choosing the documents to benchmark
against.

Therefore the conclusions in this section are for reference only. Please
feel free to benchmark against your target documents. The benchmark suite can be found
in [bench/](https://github.com/qcam/saxy/tree/master/bench).

A rule of thumb is that we should compare apples to apples. Some XML parsers
target only specific types of XML. Therefore some indicators are provided in the
test suite to show how fair the benchmark results are.

Some quick and biased conclusions from the benchmark suite:

* For SAX parsing, Saxy is usually 1.4 times faster than [Erlsom](https://github.com/willemdj/erlsom).
  With deeply nested documents, Saxy is noticeably faster (4 times faster).
* For XML building and encoding, Saxy is usually 10 to 30 times faster than [XML Builder](https://github.com/joshnuss/xml_builder).
  With deeply nested documents, it could be 180 times faster.
* Saxy uses significantly less memory than XML Builder (4 to 25 times less).
* Saxy uses significantly less memory than Xmerl, Erlsom and Exomler (1.4 to
  10 times less).

## Limitations
* No XSD supported.
* No DTD supported, when Saxy encounters a `