An open API service indexing awesome lists of open source software.

https://github.com/edwindj/xmlwriter

Elegant and fast xml generation for R
https://github.com/edwindj/xmlwriter

Last synced: 4 months ago
JSON representation

Elegant and fast xml generation for R

Awesome Lists containing this project

README

        

---
output: github_document
editor_options:
markdown:
wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# xmlwriter

> Fast and elegant XML generation for R

[![CRAN
status](https://www.r-pkg.org/badges/version/xmlwriter)](https://CRAN.R-project.org/package=xmlwriter)
[![R-CMD-check](https://github.com/edwindj/xmlwriter/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/edwindj/xmlwriter/actions/workflows/R-CMD-check.yaml)

`xmlwriter` is an R package that provides a fast and simple interface for
creating XML documents and fragments from R. It has a simple elegant
syntax for creating `xml_fragment`s.

`xmlwriter`'s XML generation from R lists is fast, implemented in C++
using [`Rcpp`](https://cran.r-project.org/package=Rcpp). Curious for the
benchmarks? Check the [performance section](#performance).

`xmlwriter` can be used as a companion to R packages
[`XML`](https://cran.r-project.org/package=XML) or
[`xml2`](https://cran.r-project.org/package=xml2) which are both
wonderful packages optimized for parsing, querying and manipulating XML
documents. Both `XML` and `xml2` provide several ways for creating XML
documents, but they are not optimized for generating and writing XML.

Creating XML documents with `XML` and `xml2` can be a bit cumbersome,
mostly because it\
forces the author to manipulate the XML document tree, obscuring the XML
structure of the document, and making it hard to read the XML that is
being generated. `xml2` does provide a way to create XML documents from
R data structures using nested lists which is a powerful feature, but it
is not optimized for speed or readability.

`xmlwriter` provides an intuitive interface for creating XML documents,
that mimics how XML is written in a text editor.

It has two different ways to create XML documents:

- a light weight R syntax using `tag`, `frag`, `+`, `/` and
`data_frag`, creating an `xml_fragment`, that can be easily
translated into a `character` or `xml2::xml_document` object, or be
used as a flexible building block for generating a larger XML
document.
- an `xmlbuilder` object that allows you to create XML documents in a
feed-forward manner, with `start` and `end` methods, giving you more
control on the XML document structure, including XML comment, prolog
etc.

## Installation

You can install the development version of `xmlwriter` from:w

[GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("edwindj/xmlwriter")
```

## Example

### Using `tag`, `frag`, `+`, `/` and `data_frag`:

`xml_fragment`s allow to write XML using readable R syntax.
`tag` and `frag` are function that create XML elements that can be combined
using `+` and `/` operators resulting in an `xml_fragment`.

`xml_fragment`s can be used to create an `character` with valid XML, a
`xml2::xml_document` or as a building block for more complex XML
documents. An `xml_fragment` is a list object that is identical to the
output of `xml2::as_list`, and can be converted to a `character` or an
`xml2::xml_document` object but `xmlwriter` provides a much faster
implementation (see performance).

`tag` is a function that creates a simple `xml_fragment` element with a
given tag name, text, and attributes. It allows for creating elements with
a tag name using a character, which gives flexibility.

```{r}
library(xmlwriter)

# tag creates a simple xml_fragment, with text and named attributes
tag("person", "John Doe", id = 1, state="CA")
```

`frag` is a function that allows for specifying nested elements and
attributes, thus creating a more complex `xml_fragment`.

An argument to `frag` is either:

- a named element in which case the name is used as the tag name for that
element. The element can be a value or a nested `frag`.
- an unnamed element in which case the element is added as a text
node.
- a `.attr` argument that is used to add attributes to the parent
element.

```{r}
# a frag can contain multiple elements
frag(
name = "John Doe",
age = 30
)

# and can nest frags
frag(
person = frag(
# attributes on person are specified using .attr
.attr = c(id = 1, state = "CA"),
name = "John Doe",
age = 30
)
)
```

The `xml_fragment` function is a restricted version of `frag` that does
not allow `.attr` on its top level.

```{r}
# xml_fragment is more strict version of a frag
fragment <- xml_fragment(
person = frag(
.attr = c(id = 1, state="CA"),
name = "John Doe",
age = 30,
address = frag(
street = "123 Main St",
city = "Anytown",
state = "CA",
zip = 12345
)
)
)

```

```{r, eval=FALSE}
cat(as.character(fragment))
```

```{r, results='asis', echo=FALSE}
cat("```XML\n")
cat(as.character(fragment))
cat("\n```\n")
```

`data_frag` is function that converts a data.frame to an `xml_fragment`:

```{r data}
data <- data.frame(
name = c("John Doe", "Jane Doe"),
age = c(30, 25),
stringsAsFactors = FALSE
)

# create an xml_fragment from a data.frame
data_frag(data, row_tag = "person")

```

Or you can use it within an `xml_fragment`:

```{r, results='hide'}
# but you can also use it within an xml_fragment
doc <- xml_fragment(
homeless = data_frag(data, row_tags = "person")
)

doc
```

```{r, eval=TRUE, echo=FALSE, results='asis'}
cat("```XML\n")
doc |> as.character() |> cat()
cat("\n```\n")
```

#### Combine fragments with `+`, `append` or `c()`)

`xml_fragment`s such as `tag`, `frag` and `data_frag` can be combined
with the `+` operator, which is equivalent to the `append()` and `c()`
function: it creates sibling XML nodes.

```{r}
library(xmlwriter)
john <- tag("person", "John", id = 1)
jane <- tag("person", "Jane", id = 2)

john + jane

john + tag("person", "Jane", id = 2)

john + xml_fragment(
person = frag(
.attr = c(id = 2),
"Jane"
)
)
```

#### Add child fragments with '/' or `add_child_fragment()`

- the `/` operator, which is equivalent to the `add_child_fragment`
function which creates a child XML node of the last XML node in an
`xml_fragment`.

```{r}
tag("person", id = 1) / (
tag("name", "John Doe")
)

tag("person", id = 1) / frag(
name = "John Doe",
age = 30
)

tag("person", id = 1) |>
add_child_fragment(
name = "John Doe",
age = 30
)
```

#### Flexible XML creation...

Using the `tag`, `frag`, `+` and `/` functions, one can create complex
XML documents in a very flexible manner.

```{r}
# overly complex, but shows flexible construction of XML
fragment <-
tag("person", id = "1") / # adds child nodes to person
( frag(
name = "John Doe",
age = 30
) + # add an extra child node to person, with subnodes
tag("address") |> add_child_fragment(
street = "123 Main St",
city = "Anytown",
state = "CA",
zip = 12345
)
)

fragment
```
A cleaner version of the above:
```{r}
tag("person", id = 1) / frag(
name = "John Doe",
age = 30,
address = frag(
street = "123 Main St",
city = "Anytown",
state = "CA",
zip = 12345
)
)
```

#### `xml2` compatibility

`xml_writer` does not have a hard dependency on `xml2`, but the output
of xml_writer can be converted to a `xml2::xml_document` object. An
`xml_fragment` is identical to the output of `xml2::as_list`, so it can
be converted to a `xml2::xml_document` object.

`xml_fragment` supports the following `xml2` methods:

- `xml2::as_xml_document`, calls the `xml2::read_xml` method
- `xml2::as_list`, removes the `xml_fragment` class
- `xml2::write_xml`, writes the xml to a file or console

One can also use the `as_xml_nodeset` function to convert the
`xml_fragment` to a `xml2::xml_nodeset` object.

```{r}
fragment |> xml2::as_xml_document()
fragment |> as_xml_nodeset()
```

`xml_fragment` implements the `xml2::write_xml` method

```{r write_xml, eval=FALSE}
xml2::write_xml(fragment, file="") # print to the console
```

results in:

```{r, echo=FALSE, results='asis'}
cat("```XML\n")
xml2::write_xml(fragment, "")
cat("\n```\n")
```

# Performance of xml generation:{#performance}

`xmlwriter` is optimized for generating xml documents and fragments from
a R `list` structure that is identical to the `xml2::as_list` output.

```{r performance, eval=TRUE}
library(microbenchmark)

library(xml2)
library(xmlwriter)

# read in a sample 600k xml file as a R list str
doc <- xml2::read_xml("./example/DataGeneric.xml")
doc_list <- xml2::as_list(doc)

# just copy the list and set an extra attribute class="xml_fragment"
# making it a xml_fragment object
doc_fragment <- structure(doc_list, class = "xml_fragment")

# see how long it takes to create an xml document with xml2 and xmlwriter

(m <- microbenchmark(
xml2 = xml2::as_xml_document(doc_list),
xmlwriter = xml2::as_xml_document(doc_fragment),
times = 10
))
```

```{r, include=FALSE}
a <- aggregate(time ~ expr, m, mean)
faster <- round(a[1,2] / a[2,2],1)
```

`xmlwriter` is about `r faster` times faster than `xml2` for creating an xml document from
an R list. Note that `xmlwriter` includes a round trip, since `xmlwriter` first generates
a `character` vector which is then read using `xml2::read_xml()`.

### Using an `xmlbuilder` object

`xmlbuilder` is an object that allows you to create xml documents in a
feed-forward manner.

With `xml_fragment` one creates the xml document structure as one
(large) R list and then converts it to a xml string or
`xml2::xml_document`, with `xmlbuilder` one incremently builds the xml
document, without having the whole list structure in memory. It also
provides more functions for the output xml document, like adding a
prolog or comment.

It provides the following methods:

- `start` to start a new element
- `end` to end the current element
- `element` to add an element with a value
- `comment` to add a comment
- `prolog` to add a prolog
- `fragment` to add a fragment

```{r xmlbuilder}
library(xmlwriter)

b <- xmlbuilder(allow_fragments = FALSE)
b$comment("This is an xml comment")
b$start("homeless")
b$start("person", id = "1")
b$element("name", "John Doe")
b$element("age", 30)
b$start("address")
b$element("street", "123 Main St")
b$element("city", "Anytown")
b$element("state", "CA")
b$element("zip", 12345)
b$end("address")
b$end("person")
b$start("person", id = "2")
b$element("name", "Jane Doe")
b$element("age", 25)
b$start("address")
b$element("street", "321 Main St")
b$element("city", "Anytown")
b$element("state", "CA")
b$element("zip", 54321)
b$end("address")
b$end("person")
b$fragment(
person = frag(
.attr = c(id = "3"),
name = "Jim Doe",
age = 35
)
)
b$end("homeless")

# includes a xml prolog and comment
b

as.character(b)

# only contains the actual nodes
xml2::as_xml_document(b)
```