{"id":29020231,"url":"https://github.com/r-lib/xmlparsedata","last_synced_at":"2025-06-26T01:04:52.654Z","repository":{"id":23150683,"uuid":"98284253","full_name":"r-lib/xmlparsedata","owner":"r-lib","description":"R code parse data as an XML tree","archived":false,"fork":false,"pushed_at":"2025-05-07T16:13:09.000Z","size":6025,"stargazers_count":24,"open_issues_count":6,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-03T05:03:56.454Z","etag":null,"topics":["r","xml"],"latest_commit_sha":null,"homepage":"https://r-lib.github.io/xmlparsedata/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"MangoTheCat/xmlparsedata","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/r-lib.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null}},"created_at":"2017-07-25T08:38:45.000Z","updated_at":"2025-05-07T16:10:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"270ddffe-6958-4d0c-ac2d-eb9424d0b679","html_url":"https://github.com/r-lib/xmlparsedata","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/r-lib/xmlparsedata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r-lib%2Fxmlparsedata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r-lib%2Fxmlparsedata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r-lib%2Fxmlparsedata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r-lib%2Fxmlparsedata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/r-lib","download_url":"https://codeload.github.com/r-lib/xm
lparsedata/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/r-lib%2Fxmlparsedata/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261978912,"owners_count":23239417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","xml"],"created_at":"2025-06-26T01:04:49.665Z","updated_at":"2025-06-26T01:04:52.635Z","avatar_url":"https://github.com/r-lib.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r}\n#| label: setup\n#| echo: false\n#| message: false\nknitr::opts_chunk$set(\n  comment = \"#\u003e\",\n  tidy = FALSE,\n  error = FALSE\n)\n```\n\n# xmlparsedata\n\n\u003e Parse Data of R Code as an 'XML' Tree\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/r-lib/xmlparsedata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/r-lib/xmlparsedata/actions/workflows/R-CMD-check.yaml)\n[![](https://www.r-pkg.org/badges/version/xmlparsedata)](https://www.r-pkg.org/pkg/xmlparsedata)\n[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/xmlparsedata)](https://www.r-pkg.org/pkg/xmlparsedata)\n[![Codecov test coverage](https://codecov.io/gh/r-lib/xmlparsedata/graph/badge.svg)](https://app.codecov.io/gh/r-lib/xmlparsedata)\n\u003c!-- badges: end --\u003e\n\nConvert the output of 'utils::getParseData()' to an 'XML' tree, that is\nsearchable and easier to manipulate in general.\n\n---\n\n  - [Installation](#installation)\n  - [Usage](#usage)\n    - [Introduction](#introduction)\n    
- [`utils::getParseData()`](#utilsgetparsedata)\n    - [`xml_parse_data()`](#xml_parse_data)\n    - [Renaming some tokens](#renaming-some-tokens)\n    - [Search the parse tree with `xml2`](#search-the-parse-tree-with-xml2)\n  - [License](#license)\n\n## Installation\n\nStable version:\n\n```{r}\n#| eval: false\ninstall.packages(\"xmlparsedata\")\n```\n\nDevelopment version:\n\n```{r}\n#| eval: false\npak::pak(\"r-lib/xmlparsedata\")\n```\n\n## Usage\n\n### Introduction\n\nIn recent R versions the parser can attach source code location\ninformation to the parsed expressions. This information is often\nuseful for static analysis, e.g. code linting. It can be accessed\nvia the `utils::getParseData()` function.\n\n`xmlparsedata` converts this information to an XML tree.\nThe R parser's token names are preserved in the XML as much as\npossible, but some of them are not valid XML tag names, so they are\nrenamed, see below.\n\n### `utils::getParseData()`\n\n`utils::getParseData()` summarizes the parse information in a data\nframe. The data frame has one row per expression tree node, and each\nnode points to its parent. Here is a small example:\n\n```{r}\np \u003c- parse(\n  text = \"function(a = 1, b = 2) { \\n  a + b\\n}\\n\",\n  keep.source = TRUE\n  )\ngetParseData(p)\n```\n\n### `xml_parse_data()`\n\n`xmlparsedata::xml_parse_data()` converts the parse information to\nan XML document. It works similarly to `getParseData()`. Specify the\n`pretty = TRUE` option to pretty-indent the XML output. Note that this\nhas a small overhead, so if you are parsing large files, I suggest you\nomit it.\n\n```{r}\nlibrary(xmlparsedata)\nxml \u003c- xml_parse_data(p, pretty = TRUE)\ncat(xml)\n```\n\nThe top XML tag is `\u003cexprlist\u003e`, which is a list of\nexpressions; each expression is an `\u003cexpr\u003e` tag. Each tag\nhas attributes that define the location: `line1`, `col1`,\n`line2`, `col2`. 
These are from the `getParseData()`\ndata frame column names.\n\n### Renaming some tokens\n\nThe R parser's token names are preserved in the XML as much as\npossible, but some of them are not valid XML tag names, so they are\nrenamed, see the `xml_parse_token_map` vector for the mapping:\n\n```{r}\nxml_parse_token_map\n```\n\n### Search the parse tree with `xml2`\n\nThe `xml2` package can search XML documents using\n[XPath](https://en.wikipedia.org/wiki/XPath) expressions. This is often\nuseful to search for specific code patterns.\n\nAs an example we search a source file from base R for `1:nrow(\u003cexpr\u003e)`\nexpressions, which are usually unsafe, as `nrow()` might be zero,\nand then the expression is equivalent to `1:0`, i.e. `c(1, 0)`, which\nis usually not the intended behavior.\n\nWe load and parse the file directly from the R source code mirror\nat https://github.com/wch/r-source:\n\n```{r}\nurl \u003c- paste0(\n  \"https://raw.githubusercontent.com/wch/r-source/\",\n  \"4fc93819fc7401b8695ce57a948fe163d4188f47/src/library/tools/R/xgettext.R\"\n)\nsrc \u003c- readLines(url)\np \u003c- parse(text = src, keep.source = TRUE)\n```\n\nand we convert it to an XML tree:\n\n```{r}\nlibrary(xml2)\nxml \u003c- read_xml(xml_parse_data(p))\n```\n\nThe `1:nrow(\u003cexpr\u003e)` expression corresponds to the following\ntree in R:\n\n```\n\u003cexpr\u003e\n  +-- \u003cexpr\u003e\n    +-- NUM_CONST: 1\n  +-- ':'\n  +-- \u003cexpr\u003e\n    +-- \u003cexpr\u003e\n      +-- SYMBOL_FUNCTION_CALL nrow\n    +-- '('\n    +-- \u003cexpr\u003e\n    +-- ')'\n```\n\n```{r}\nbad \u003c- xml_parse_data(\n  parse(text = \"1:nrow(expr)\", keep.source = TRUE),\n  pretty = TRUE\n)\ncat(bad)\n```\n\nThis translates to the following XPath expression (ignoring\nthe last three tokens of the `nrow(expr)` call):\n\n```{r}\nxp \u003c- paste0(\n  \"//expr\",\n     \"[expr[NUM_CONST[text()='1']]]\",\n     \"[OP-COLON]\",\n     
\"[expr[expr[SYMBOL_FUNCTION_CALL[text()='nrow']]]]\"\n)\n```\n\nWe can search for this subtree with `xml2::xml_find_all()`:\n\n```{r}\nbad_nrow \u003c- xml_find_all(xml, xp)\nbad_nrow\n```\n\nThere is only one hit, in line 334:\n\n```{r}\ncbind(332:336, src[332:336])\n```\n\n## Code of Conduct\n\nPlease note that the xmlparsedata project is released with a\n[Contributor Code of Conduct](https://r-lib.github.io/xmlparsedata/CODE_OF_CONDUCT.html).\nBy contributing to this project, you agree to abide by its terms.\n\n## License\n\nMIT © Mango Solutions, RStudio\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fr-lib%2Fxmlparsedata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fr-lib%2Fxmlparsedata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fr-lib%2Fxmlparsedata/lists"}