Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kbrw/sweet_xml
https://github.com/kbrw/sweet_xml
elixir stream xml xpath
Last synced: 9 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/kbrw/sweet_xml
- Owner: kbrw
- License: mit
- Created: 2014-07-24T01:32:45.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-10-09T15:13:06.000Z (about 1 month ago)
- Last Synced: 2024-10-19T19:26:38.661Z (20 days ago)
- Topics: elixir, stream, xml, xpath
- Language: Elixir
- Homepage:
- Size: 340 KB
- Stars: 362
- Watchers: 12
- Forks: 60
- Open Issues: 27
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- freaking_awesome_elixir - Elixir - Query XML simply and effectively. (XML)
README
# SweetXml
[![Build Status](https://api.travis-ci.org/kbrw/sweet_xml.svg)](http://travis-ci.org/kbrw/sweet_xml)
[![Module Version](https://img.shields.io/hexpm/v/sweet_xml.svg)](https://hex.pm/packages/sweet_xml)
[![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/sweet_xml/)
[![Total Download](https://img.shields.io/hexpm/dt/sweet_xml.svg)](https://hex.pm/packages/sweet_xml)
[![License](https://img.shields.io/hexpm/l/sweet_xml.svg)](https://github.com/kbrw/sweet_xml/blob/master/LICENSE)
[![Last Updated](https://img.shields.io/github/last-commit/kbrw/sweet_xml.svg)](https://github.com/kbrw/sweet_xml/commits/master)`SweetXml` is a thin wrapper around `:xmerl`. It allows you to convert a
`char_list` or `xmlElement` record as defined in `:xmerl` to an elixir value such
as `map`, `list`, `string`, `integer`, `float` or any combination of these.## Installation
Add dependency to your project's `mix.exs`:
```elixir
def deps do
[{:sweet_xml, "~> 0.7.4"}]
end
````SweetXml` depends on `:xmerl`. On some Linux systems, you might need
to install the package `erlang-xmerl`.## Examples
Given an XML document such as below:
```xml
Match One
1
Team One
2
Team Two
Match Two
2
Team Two
3
Team Three
Match Three
1
Team One
3
Team Three
```
We can do the following:```elixir
import SweetXml
doc = "..." # as above
```Get the name of the first match:
```elixir
result = doc |> xpath(~x"//matchup/name/text()") # `sigil_x` for (x)path
assert result == 'Match One'
```Get the XML record of the name of the first match:
```elixir
result = doc |> xpath(~x"//matchup/name"e) # `e` is the modifier for (e)ntity
assert result == {:xmlElement, :name, :name, [], {:xmlNamespace, [], []},
[matchup: 2, matchups: 2, game: 1], 2, [],
[{:xmlText, [name: 2, matchup: 2, matchups: 2, game: 1], 1, [],
'Match One', :text}], [],
...}
```Get the full list of matchup name:
```elixir
result = doc |> xpath(~x"//matchup/name/text()"l) # `l` stands for (l)ist
assert result == ['Match One', 'Match Two', 'Match Three']
```Get a list of winner-id by attributes:
```elixir
result = doc |> xpath(~x"//matchup/@winner-id"l)
assert result == ['1', '2', '1']
```Get a list of matchups with different map structure:
```elixir
result = doc |> xpath(
~x"//matchups/matchup"l,
name: ~x"./name/text()",
winner: [
~x".//team/id[.=ancestor::matchup/@winner-id]/..",
name: ~x"./name/text()"
]
)
assert result == [
%{name: 'Match One', winner: %{name: 'Team One'}},
%{name: 'Match Two', winner: %{name: 'Team Two'}},
%{name: 'Match Three', winner: %{name: 'Team One'}}
]
```Or directly return a mapping of your liking:
```elixir
result = doc |> xmap(
matchups: [
~x"//matchups/matchup"l,
name: ~x"./name/text()",
winner: [
~x".//team/id[.=ancestor::matchup/@winner-id]/..",
name: ~x"./name/text()"
]
],
last_matchup: [
~x"//matchups/matchup[last()]",
name: ~x"./name/text()",
winner: [
~x".//team/id[.=ancestor::matchup/@winner-id]/..",
name: ~x"./name/text()"
]
]
)
assert result == %{
matchups: [
%{name: 'Match One', winner: %{name: 'Team One'}},
%{name: 'Match Two', winner: %{name: 'Team Two'}},
%{name: 'Match Three', winner: %{name: 'Team One'}}
],
last_matchup: %{name: 'Match Three', winner: %{name: 'Team One'}}
}
```## The ~x Sigil
Warning ! Because we use `xmerl` internally, only XPath 1.0 paths are handled.
In the above examples, we used the expression `~x"//some/path"` to
define the path. The reason is it allows us to more precisely specify what
is being returned.* `~x"//some/path"`
without any modifiers, `xpath/2` will return the value of the entity if
the entity is of type `xmlText`, `xmlAttribute`, `xmlPI`, `xmlComment`
as defined in `:xmerl`* `~x"//some/path"e`
`e` stands for (e)ntity. This forces `xpath/2` to return the entity with
which you can further chain your `xpath/2` call* `~x"//some/path"l`
'l' stands for (l)ist. This forces `xpath/2` to return a list. Without
`l`, `xpath/2` will only return the first element of the match* `~x"//some/path"k`
'k' stands for (k)eyword. This forces `xpath/2` to return a Keyword instead of a Map.
* `~x"//some/path"el` - mix of the above
* `~x"//some/path"s`
's' stands for (s)tring. This forces `xpath/2` to return the value as
string instead of a char list.* `~x"//some/path"S`
'S' stands for soft (S)tring. This forces `xpath/2` to return the value as
string instead of a char list, but if node content is incompatible with a string,
set `""`.* `~x"//some/path"o`
'o' stands for (o)ptional. This allows the path to not exist, and will return nil.
* `~x"//some/path"sl` - string list.
* `~x"//some/path"i`
'i' stands for (i)nteger. This forces `xpath/2` to return the value as
integer instead of a char list.* `~x//some/path"I`
'I' stands for soft (I)nteger. This forces `xpath/2` to return the value as
integer instead of a char list, but if node content is incompatible with an integer,
set `0`.* `~x"//some/path"f`
'f' stands for (f)loat. This forces `xpath/2` to return the value as
float instead of a char list.* `~x//some/path"F`
'F' stands for soft (F)loat. This forces `xpath/2` to return the value as
float instead of a char list, but if node content is incompatible with a float,
set `0.0`.* `~x"//some/path"il` - integer list.
If you use the *optional* modifier `o` together with a *soft* cast modifier
(uppercase), then the value is set to `nil` when the value is not compatible
for instance `~x//some/path/text()"Fo` return `nil` if the text is not a number.Also in the examples section, we always import SweetXml first. This
makes `x_sigil` available in the current scope. Without it, instead of using
`~x`, you can use the `%SweetXpath` struct```elixir
assert ~x"//some/path"e == %SweetXpath{path: '//some/path', is_value: false, is_list: false, cast_to: false}
```Note the use of char_list in the path definition.
## Namespace support
Given a XML document such as below
```xml
Match One
1
Team One
5
2
Team Two
2
```
We can do the following:
```elixir
import SweetXml
xml_str = "..." # as above
doc = parse(xml_str, namespace_conformant: true)
```Note the fact that we explicitly parse the XML with the `namespace_conformant:
true` option. This is needed to allow nodes to be identified in a prefix
independent way.We can use namespace prefixes of our preference, regardless of what prefix is
used in the document:```elixir
result = doc
|> xpath(~x"//ff:matchup/ff:name/text()"
|> add_namespace("ff", "http://example.com/fantasy-league"))assert result == 'Match One'
```We can specify multiple namespace prefixes:
```elixir
result = doc
|> xpath(~x"//ff:matchup//bb:runs/text()"
|> add_namespace("ff", "http://example.com/fantasy-league")
|> add_namespace("bb", "http://example.com/baseball-stats"))assert result == '5'
```## From Chaining to Nesting
Here's a brief explanation to how nesting came about.
### Chaining
Both `xpath` and `xmap` can take an `:xmerl` XML record as the first argument.
Therefore you can chain calls to these functions like below:```elixir
doc
|> xpath(~x"//li"l)
|> Enum.map fn (li_node) ->
%{
name: li_node |> xpath(~x"./name/text()"),
age: li_node |> xpath(~x"./age/text()")
}
end
```### Mapping to a structure
Since the previous example is such a common use case, SweetXml allows you just
simply do the following```elixir
doc
|> xpath(
~x"//li"l,
name: ~x"./name/text()",
age: ~x"./age/text()"
)
```### Nesting
But what you want is sometimes more complex than just that, SweetXml thus also
allows nesting```elixir
doc
|> xpath(
~x"//li"l,
name: [
~x"./name",
first: ~x"./first/text()",
last: ~x"./last/text()"
],
age: ~x"./age/text()"
)
```### Transform By
Sometimes we need to transform the value to what we need, SweetXml supports that
via `transform_by/2````elixir
doc = "
result = doc |> xpath(
~x"//li"l,
name: [
~x"./name",
first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
],
age: ~x"./age/text()"i
)
^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]
```
The same can be used to break parsing code into reusable functions that can be
used in nesting:
```elixir
doc = "
parse_name = fn xpath_node ->
xpath_node |> xmap(
first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
)
end
result = doc |> xpath(
~x"//li"l,
name: ~x"./name" |> transform_by(parse_name),
age: ~x"./age/text()"i
)
^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]
```
For more examples, please take a look at the tests and help.
## Streaming
`SweetXml` now also supports streaming in various forms. Here's a sample XML doc.
Notice the certain lines have XML tags that span multiple lines:
```xml
XML Parsing
Nested Head
Neato €
-
First - Second
- Third
- Forth
first star
```
### Working with `File.stream!/1`
Working with streams is exactly the same as working with binaries:
```elixir
File.stream!("file_above.xml") |> xpath(...)
```
### `SweetXml` element streaming
Once you have a file stream, you may not want to work with the entire document to
save memory:
```elixir
file_stream = File.stream!("file_above.xml")
result = file_stream
|> stream_tags([:li, :special_match_key])
|> Stream.map(fn
{_, doc} ->
xpath(doc, ~x"./text()")
end)
|> Enum.to_list
assert result == ['\n First', 'Second\n ', 'Third', 'Forth', 'first star']
```
**Warning:** In case of large document, you may want to use the `discard`
option to avoid memory leak.
```elixir
result = file_stream
|> stream_tags([:li, :special_match_key], discard: [:li, :special_match_key])
```
## Security
Whenever you have to deal with some XML that was not generated by your system (untrusted document),
it is highly recommended that you separate the parsing step from the mapping step, in order to be able
to prevent some default behavior through options. You can check the doc for `SweetXml.parse/2` for more details.
The current recommendations are:
```
doc |> parse(dtd: :none) |> xpath(spec, subspec)
enum |> stream_tags(tags, dtd: :none)
```
## Copyright and License
Copyright (c) 2014, Frank Liu
SweetXml source code is licensed under the [MIT License](https://github.com/kbrw/sweet_xml/blob/master/LICENSE).
# CONTRIBUTING
Hi, and thank you for wanting to contribute.
Please refer to the centralized information available at: https://github.com/kbrw#contributing