https://github.com/r-lib/xmlparsedata

R code parse data as an XML tree

r xml



Host: GitHub
URL: https://github.com/r-lib/xmlparsedata
Owner: r-lib
License: other
Created: 2017-07-25T08:38:45.000Z (almost 9 years ago)
Default Branch: main
Last Pushed: 2025-05-07T16:13:09.000Z (about 1 year ago)
Last Synced: 2025-06-03T05:03:56.454Z (about 1 year ago)
Topics: r, xml
Language: R
Homepage: https://r-lib.github.io/xmlparsedata/
Size: 5.75 MB
Stars: 24
Watchers: 4
Forks: 7
Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS


README

---
output: github_document
---

```{r}
#| label: setup
#| echo: false
#| message: false
knitr::opts_chunk$set(
  comment = "#>",
  tidy = FALSE,
  error = FALSE
)
```

# xmlparsedata

> Parse Data of R Code as an 'XML' Tree

[![R-CMD-check](https://github.com/r-lib/xmlparsedata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/r-lib/xmlparsedata/actions/workflows/R-CMD-check.yaml)

[![](https://www.r-pkg.org/badges/version/xmlparsedata)](https://www.r-pkg.org/pkg/xmlparsedata)

[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/xmlparsedata)](https://www.r-pkg.org/pkg/xmlparsedata)

[![Codecov test coverage](https://codecov.io/gh/r-lib/xmlparsedata/graph/badge.svg)](https://app.codecov.io/gh/r-lib/xmlparsedata)

Convert the output of 'utils::getParseData()' to an 'XML' tree that is searchable and easier to manipulate in general.

---

  - [Installation](#installation)
  - [Usage](#usage)
    - [Introduction](#introduction)
    - [`utils::getParseData()`](#utilsgetparsedata)
    - [`xml_parse_data()`](#xml_parse_data)
    - [Renaming some tokens](#renaming-some-tokens)
    - [Search the parse tree with `xml2`](#search-the-parse-tree-with-xml2)
  - [Code of Conduct](#code-of-conduct)
  - [License](#license)

## Installation

Stable version:

```{r}
#| eval: false
install.packages("xmlparsedata")
```

Development version:

```{r}
#| eval: false
pak::pak("r-lib/xmlparsedata")
```

## Usage

### Introduction

In recent R versions, the parser can attach source code location information to the parsed expressions. This information is often useful for static analysis, e.g. code linting. It can be accessed via the `utils::getParseData()` function.

`xmlparsedata` converts this information to an XML tree.

The R parser's token names are preserved in the XML as much as possible, but some of them are not valid XML tag names, so they are renamed (see below).
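Note that parse data is only recorded when source references are kept, e.g. with `keep.source = TRUE`; otherwise `utils::getParseData()` has nothing to return and gives `NULL`. A minimal sketch with an arbitrary one-line snippet (the variable names are illustrative only):

```{r}
# Parse the same snippet with and without source references.
with_src <- parse(text = "x <- 1", keep.source = TRUE)
without_src <- parse(text = "x <- 1", keep.source = FALSE)

# With source references, getParseData() returns a data frame.
pd_with <- utils::getParseData(with_src)
is.data.frame(pd_with)

# Without them, there is no parse data to retrieve, so the result is NULL.
pd_without <- utils::getParseData(without_src)
is.null(pd_without)
```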

### `utils::getParseData()`

`utils::getParseData()` summarizes the parse information in a data frame. The data frame has one row per node of the parse tree (tokens as well as expressions), and each node points to its parent via the `parent` column. Here is a small example:

```{r}
p <- parse(
  text = "function(a = 1, b = 2) { \n  a + b\n}\n",
  keep.source = TRUE
)
getParseData(p)
```
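Because every row stores the `id` of its parent, the tree can be walked with ordinary data frame subsetting. The sketch below, on an arbitrary `1 + 2` snippet, finds the top-level expression (whose `parent` is `0`) and lists its direct children:

```{r}
pd <- utils::getParseData(parse(text = "1 + 2", keep.source = TRUE))

# Top-level expressions have parent 0.
top_id <- pd$id[pd$parent == 0]

# Direct children of the top-level expression, in source order:
# the left operand, the `+` token and the right operand.
children <- pd[pd$parent == top_id, c("id", "token", "text")]
children
```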

### `xml_parse_data()`

`xmlparsedata::xml_parse_data()` converts the parse information to an XML document. It works similarly to `getParseData()`. Specify the `pretty = TRUE` option to pretty-indent the XML output. Note that this has a small overhead, so if you are parsing large files, I suggest you omit it.

```{r}
library(xmlparsedata)
xml <- xml_parse_data(p, pretty = TRUE)
cat(xml)
```
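Since `pretty = TRUE` only adds layout whitespace, both forms should describe the same tree. One way to check this, assuming the `xml2` package (used later in this README) is installed, is to count the `<expr>` elements in each version of an arbitrary snippet:

```{r}
library(xmlparsedata)
library(xml2)

code <- parse(text = "f <- function(a) a + 1", keep.source = TRUE)
compact_xml <- xml_parse_data(code)
pretty_xml <- xml_parse_data(code, pretty = TRUE)

# The pretty version carries extra indentation, so it is the longer string...
c(compact = nchar(compact_xml), pretty = nchar(pretty_xml))

# ...but both parse to the same number of <expr> elements.
n_compact <- length(xml_find_all(read_xml(compact_xml), "//expr"))
n_pretty <- length(xml_find_all(read_xml(pretty_xml), "//expr"))
c(compact = n_compact, pretty = n_pretty)
```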

The top XML tag is `<exprlist>`, which is a list of expressions; each expression is an `<expr>` tag. Each tag has attributes that define its location: `line1`, `col1`, `line2`, `col2`. These names come from the column names of the `getParseData()` data frame.
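For example, the location attributes can be read back with `xml2::xml_attr()`. XML attribute values are strings, so this sketch converts them to integers. The two-line snippet is arbitrary, and `xml2` is assumed to be installed:

```{r}
library(xmlparsedata)
library(xml2)

doc <- read_xml(xml_parse_data(
  parse(text = c("f(x)", "y <- 2"), keep.source = TRUE)
))

# The top-level <expr> elements are the direct children of <exprlist>.
top_exprs <- xml_find_all(doc, "/exprlist/expr")

# Attribute values are character strings; convert them for arithmetic.
data.frame(
  line1 = as.integer(xml_attr(top_exprs, "line1")),
  col1 = as.integer(xml_attr(top_exprs, "col1")),
  line2 = as.integer(xml_attr(top_exprs, "line2")),
  col2 = as.integer(xml_attr(top_exprs, "col2"))
)
```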

### Renaming some tokens

The R parser's token names are preserved in the XML as much as possible, but some of them are not valid XML tag names, so they are renamed. See the `xml_parse_token_map` vector for the mapping:

```{r}
xml_parse_token_map
```
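The map is a named character vector from parser token names to XML tag names. As a sketch of what the renaming amounts to (not necessarily how the package implements it internally), the tokens of `1:3` can be translated by hand:

```{r}
library(xmlparsedata)

pd_colon <- utils::getParseData(parse(text = "1:3", keep.source = TRUE))

# Tokens with an entry in the map get renamed; all others keep their name.
renamed <- pd_colon$token %in% names(xml_parse_token_map)
xml_tags <- pd_colon$token
xml_tags[renamed] <- xml_parse_token_map[pd_colon$token[renamed]]

data.frame(token = pd_colon$token, xml_tag = xml_tags)
```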

### Search the parse tree with `xml2`

The `xml2` package can search XML documents using [XPath](https://en.wikipedia.org/wiki/XPath) expressions. This is often useful to search for specific code patterns.
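As a small illustration on a made-up snippet, this XPath query lists the variables assigned with `<-`: it selects every `<expr>` that has a `LEFT_ASSIGN` child, then the `SYMBOL` inside that expression's first `<expr>` child, i.e. the left-hand side:

```{r}
library(xmlparsedata)
library(xml2)

doc <- read_xml(xml_parse_data(
  parse(text = c("x <- 1", "y <- f(x)"), keep.source = TRUE)
))

# Left-hand sides of `<-` assignments, in source order.
targets <- xml_text(xml_find_all(doc, "//expr[LEFT_ASSIGN]/expr[1]/SYMBOL"))
targets
```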

As an example, we search a source file from base R for `1:nrow()` expressions, which are usually unsafe: if `nrow()` returns zero, the expression is equivalent to `1:0`, i.e. `c(1, 0)`, which is usually not the intended behavior.
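To make the problem concrete, here is what happens with an (arbitrary) zero-row data frame. `seq_len(nrow(...))`, a commonly used safe alternative, yields an empty sequence instead:

```{r}
empty_df <- data.frame(x = numeric(0))
nrow(empty_df)

# 1:nrow(empty_df) is 1:0, i.e. c(1L, 0L): a loop over it would run twice.
unsafe <- 1:nrow(empty_df)
unsafe

# seq_len(nrow(empty_df)) is integer(0): a loop over it would not run at all.
safe <- seq_len(nrow(empty_df))
safe
```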

We load and parse the file directly from the R source code mirror at https://github.com/wch/r-source:

```{r}
url <- paste0(
  "https://raw.githubusercontent.com/wch/r-source/",
  "4fc93819fc7401b8695ce57a948fe163d4188f47/src/library/tools/R/xgettext.R"
)
src <- readLines(url)
p <- parse(text = src, keep.source = TRUE)
```
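Without network access, the same steps work on a local file, since `parse()` also accepts a file path. This sketch first writes a small, made-up script (the `print_rows()` function in it is hypothetical) to a temporary file:

```{r}
# A small, made-up script containing a 1:nrow() loop.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "print_rows <- function(d) {",
  "  for (i in 1:nrow(d)) print(d[i, ])",
  "}"
), script_path)

# Read the lines (for display) and parse the file, keeping source references.
src_local <- readLines(script_path)
p_local <- parse(script_path, keep.source = TRUE)
length(p_local)
```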

and we convert it to an XML tree:

```{r}
library(xml2)
xml <- read_xml(xml_parse_data(p))
```
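The resulting `xml_document` can be explored with the usual `xml2` tools. For instance, on a short made-up snippet, the root element is `<exprlist>`, and the names of all called functions can be pulled out of the `SYMBOL_FUNCTION_CALL` elements:

```{r}
library(xmlparsedata)
library(xml2)

doc <- read_xml(xml_parse_data(
  parse(text = c("x <- mean(1:10)", "print(x)"), keep.source = TRUE)
))

# Name of the root element.
xml_name(xml_root(doc))

# Names of the called functions, in source order.
called <- xml_text(xml_find_all(doc, "//SYMBOL_FUNCTION_CALL"))
called
```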

The `1:nrow()` expression corresponds to the following tree in R:

```
<expr>
  +-- <expr>
    +-- NUM_CONST: 1
  +-- ':'
  +-- <expr>
    +-- <expr>
      +-- SYMBOL_FUNCTION_CALL: nrow
    +-- '('
    +-- <expr>
    +-- ')'
```

```{r}
bad <- xml_parse_data(
  parse(text = "1:nrow(expr)", keep.source = TRUE),
  pretty = TRUE
)
cat(bad)
```

This translates to the following XPath expression (ignoring the last three tokens, `(`, `expr` and `)`, of the `nrow(expr)` call):

```{r}
xp <- paste0(
  "//expr",
     "[expr[NUM_CONST[text()='1']]]",
     "[OP-COLON]",
     "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)
```
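Before running the query on a whole file, it can be sanity-checked on small snippets. This sketch wraps the steps above in a hypothetical helper, `count_hits()`, and repeats the XPath expression (as `xp_nrow`) so the chunk stands alone:

```{r}
library(xmlparsedata)
library(xml2)

# Same query as `xp`, repeated so that this chunk is self-contained.
xp_nrow <- paste0(
  "//expr",
  "[expr[NUM_CONST[text()='1']]]",
  "[OP-COLON]",
  "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)

# Hypothetical helper: number of matches of `query` in the code string `code`.
count_hits <- function(code, query = xp_nrow) {
  doc <- read_xml(xml_parse_data(parse(text = code, keep.source = TRUE)))
  length(xml_find_all(doc, query))
}

count_hits("1:nrow(df)")          # the pattern we are looking for
count_hits("seq_len(nrow(df))")   # no `:` operator, so no match
count_hits("2:nrow(df)")          # starts at 2, so no match
```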

We can search for this subtree with `xml2::xml_find_all()`:

```{r}
bad_nrow <- xml_find_all(xml, xp)
bad_nrow
```

There is only one hit, in line 334:

```{r}
cbind(332:336, src[332:336])
```
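Rather than reading the line number off the printed node set, it can be extracted from the `line1` attribute of each match. A self-contained sketch on a made-up two-line snippet, repeating the query as `xp_nrow`:

```{r}
library(xmlparsedata)
library(xml2)

xp_nrow <- paste0(
  "//expr",
  "[expr[NUM_CONST[text()='1']]]",
  "[OP-COLON]",
  "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)

snippet <- c("total <- 0", "for (i in 1:nrow(d)) total <- total + i")
doc <- read_xml(xml_parse_data(parse(text = snippet, keep.source = TRUE)))

# Starting line of each match, converted from the string attribute.
hits <- xml_find_all(doc, xp_nrow)
hit_lines <- as.integer(xml_attr(hits, "line1"))
hit_lines

# The offending source lines.
snippet[hit_lines]
```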

## Code of Conduct

Please note that the xmlparsedata project is released with a [Contributor Code of Conduct](https://r-lib.github.io/xmlparsedata/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

## License

MIT © Mango Solutions, RStudio
