https://github.com/r-lib/xmlparsedata
R code parse data as an XML tree
- Host: GitHub
- URL: https://github.com/r-lib/xmlparsedata
- Owner: r-lib
- License: other
- Created: 2017-07-25T08:38:45.000Z (almost 9 years ago)
- Default Branch: main
- Last Pushed: 2025-05-07T16:13:09.000Z (about 1 year ago)
- Last Synced: 2025-06-03T05:03:56.454Z (about 1 year ago)
- Topics: r, xml
- Language: R
- Homepage: https://r-lib.github.io/xmlparsedata/
- Size: 5.75 MB
- Stars: 24
- Watchers: 4
- Forks: 7
- Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
README
---
output: github_document
---
```{r}
#| label: setup
#| echo: false
#| message: false
knitr::opts_chunk$set(
  comment = "#>",
  tidy = FALSE,
  error = FALSE
)
```
# xmlparsedata
> Parse Data of R Code as an 'XML' Tree
Convert the output of 'utils::getParseData()' to an 'XML' tree that is
searchable and easier to manipulate in general.
---
- [Installation](#installation)
- [Usage](#usage)
- [Introduction](#introduction)
- [`utils::getParseData()`](#utilsgetparsedata)
- [`xml_parse_data()`](#xml_parse_data)
- [Renaming some tokens](#renaming-some-tokens)
- [Search the parse tree with `xml2`](#search-the-parse-tree-with-xml2)
- [License](#license)
## Installation
Stable version:
```{r}
#| eval: false
install.packages("xmlparsedata")
```
Development version:
```{r}
#| eval: false
pak::pak("r-lib/xmlparsedata")
```
## Usage
### Introduction
In recent R versions the parser can attach source code location
information to the parsed expressions. This information is often
useful for static analysis, e.g. code linting. It can be accessed
via the `utils::getParseData()` function.
`xmlparsedata` converts this information to an XML tree.
The R parser's token names are preserved in the XML as much as
possible, but some of them are not valid XML tag names, so they are
renamed; see below.
### `utils::getParseData()`
`utils::getParseData()` summarizes the parse information in a data
frame. The data frame has one row per expression tree node, and each
node points to its parent. Here is a small example:
```{r}
p <- parse(
  text = "function(a = 1, b = 2) { \n a + b\n}\n",
  keep.source = TRUE
)
getParseData(p)
```
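To make the parent links concrete, here is a minimal sketch that looks up the direct children of the top-level node using only the `id` and `parent` columns. The snippet `x <- 1 + 2` and the variable names `pd`, `root`, and `children` are illustrative choices, not part of the package.

```r
# Parse a small snippet, keeping source references so parse data is available.
p <- parse(text = "x <- 1 + 2", keep.source = TRUE)
pd <- utils::getParseData(p)

# Top-level nodes have parent 0; this snippet has one top-level expression.
root <- pd[pd$parent == 0, ]

# Direct children of the root: rows whose `parent` equals the root's `id`.
children <- pd[pd$parent == root$id, ]
children[, c("id", "parent", "token", "text")]
```

Walking the tree this way works, but it gets tedious for deeper patterns, which is the motivation for the XML representation.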
### `xml_parse_data()`
`xmlparsedata::xml_parse_data()` converts the parse information to
an XML document. It works similarly to `getParseData()`. Specify the
`pretty = TRUE` option to pretty-indent the XML output. Note that this
has a small overhead, so if you are parsing large files, I suggest you
omit it.
```{r}
library(xmlparsedata)
xml <- xml_parse_data(p, pretty = TRUE)
cat(xml)
```
The top XML tag is `<exprlist>`, which is a list of
expressions; each expression is an `<expr>` tag. Each tag
has attributes that define the location: `line1`, `col1`,
`line2`, `col2`. These names come from the column names of the
`getParseData()` data frame.
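As a rough illustration of these attributes, the sketch below reads them back with `xml2::xml_attr()`. The two-line input and the names `doc`, `top`, and `loc` are made up for this example.

```r
library(xmlparsedata)
library(xml2)

p <- parse(text = "f(x)\ny <- 2", keep.source = TRUE)
doc <- read_xml(xml_parse_data(p))

# Top-level expressions are the <expr> children of <exprlist>.
top <- xml_find_all(doc, "/exprlist/expr")

# Attributes come back as character strings, so convert them to integers.
loc <- data.frame(
  line1 = as.integer(xml_attr(top, "line1")),
  col1  = as.integer(xml_attr(top, "col1")),
  line2 = as.integer(xml_attr(top, "line2")),
  col2  = as.integer(xml_attr(top, "col2"))
)
loc
```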
### Renaming some tokens
The R parser's token names are preserved in the XML as much as
possible, but some of them are not valid XML tag names, so they are
renamed. See the `xml_parse_token_map` vector for the mapping:
```{r}
xml_parse_token_map
```
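For instance, assuming the map is a named character vector whose names are the parser's tokens (quotes included), a single rename can be looked up by name. The `':'` to `OP-COLON` pair is the one the XPath example in the next section relies on; the variable `colon_tag` is just a placeholder.

```r
library(xmlparsedata)

# Names are parser tokens, values are the XML tag names used in the output.
colon_tag <- xml_parse_token_map[["':'"]]
colon_tag
#> [1] "OP-COLON"
```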
### Search the parse tree with `xml2`
The `xml2` package can search XML documents using
[XPath](https://en.wikipedia.org/wiki/XPath) expressions. This is often
useful to search for specific code patterns.
As an example we search a source file from base R for `1:nrow()`
expressions, which are usually unsafe, as `nrow()` might be zero,
and then the expression is equivalent to `1:0`, i.e. `c(1, 0)`, which
is usually not the intended behavior.
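The pitfall is easy to reproduce with an empty data frame. This sketch uses a hypothetical `empty` data frame and contrasts `1:nrow()` with `seq_len()`, a common base R alternative (not something this package provides).

```r
# A data frame with zero rows, standing in for an empty input.
empty <- data.frame(x = numeric(0))

# 1:nrow(empty) is 1:0, which counts down and yields two indices.
bad_idx <- 1:nrow(empty)
bad_idx
#> [1] 1 0

# seq_len() returns an empty integer vector, so a loop over it never runs.
good_idx <- seq_len(nrow(empty))
length(good_idx)
#> [1] 0
```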
We load and parse the file directly from the R source code mirror
at https://github.com/wch/r-source:
```{r}
url <- paste0(
  "https://raw.githubusercontent.com/wch/r-source/",
  "4fc93819fc7401b8695ce57a948fe163d4188f47/src/library/tools/R/xgettext.R"
)
src <- readLines(url)
p <- parse(text = src, keep.source = TRUE)
```
and we convert it to an XML tree:
```{r}
library(xml2)
xml <- read_xml(xml_parse_data(p))
```
The `1:nrow()` expression corresponds to the following
tree in R:
```
<expr>
  +-- <expr>
  |     +-- NUM_CONST: 1
  +-- ':'
  +-- <expr>
        +-- <expr>
        |     +-- SYMBOL_FUNCTION_CALL nrow
        +-- '('
        +-- <expr>
        +-- ')'
```
```{r}
bad <- xml_parse_data(
  parse(text = "1:nrow(expr)", keep.source = TRUE),
  pretty = TRUE
)
cat(bad)
```
This translates to the following XPath expression (ignoring
the last three tokens of the `nrow(expr)` call):
```{r}
xp <- paste0(
  "//expr",
  "[expr[NUM_CONST[text()='1']]]",
  "[OP-COLON]",
  "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)
```
We can search for this subtree with `xml2::xml_find_all()`:
```{r}
bad_nrow <- xml_find_all(xml, xp)
bad_nrow
```
There is only one hit, in line 334:
```{r}
cbind(332:336, src[332:336])
```
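To get the line numbers programmatically rather than by reading the printed nodes, one option is to pull the `line1` attribute from each hit. The sketch below uses a small made-up `src` vector instead of the downloaded file, so it runs without network access; `doc`, `hits`, and `hit_lines` are illustrative names.

```r
library(xmlparsedata)
library(xml2)

# Two lines of stand-in source; only the second contains the pattern.
src <- c("n <- 3", "idx <- 1:nrow(df)")
doc <- read_xml(xml_parse_data(parse(text = src, keep.source = TRUE)))

# Same XPath expression as in the example above.
xp <- paste0(
  "//expr",
  "[expr[NUM_CONST[text()='1']]]",
  "[OP-COLON]",
  "[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]"
)
hits <- xml_find_all(doc, xp)

# line1 holds the starting line of each match as a string.
hit_lines <- as.integer(xml_attr(hits, "line1"))
src[hit_lines]
```

Indexing `src` with the extracted line numbers returns the offending source lines, which is roughly what a linter would report.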
## Code of Conduct
Please note that the xmlparsedata project is released with a
[Contributor Code of Conduct](https://r-lib.github.io/xmlparsedata/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
## License
MIT © Mango Solutions, RStudio