https://github.com/himloul/r-utils
Reproducible R functions to avoid repetitive work.
excel xlsx xml
- Host: GitHub
- URL: https://github.com/himloul/r-utils
- Owner: himloul
- License: mit
- Created: 2020-11-13T20:35:02.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-05-21T16:32:42.000Z (about 3 years ago)
- Last Synced: 2025-03-03T19:53:33.404Z (over 1 year ago)
- Topics: excel, xlsx, xml
- Language: R
- Homepage:
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Useful functions scripted in R
This repository includes some reproducible functions.
## Data Collection
### Read Excel (XML 2003) Spreadsheets in R
R has packages for reading spreadsheets (`readxl::read_xlsx` and `readxl::read_xls`), but they do not support XML 2003 spreadsheets, which are often mistaken for XLS files because of their file extension. This function reads such files using `read_xml` and `xml_find_all`, both from the `xml2` package; `tidyverse` is loaded only for the `%>%` pipe. The function parses the spreadsheet, extracts the data cells from each row, and returns a data frame whose column names are taken from the first row. All values are returned as character strings.
```r
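library(tidyverse)  # only needed for the %>% pipe
library(xml2)       # read_xml(), xml_ns(), xml_find_all(), xml_text()

readExcelXML <- function(filename) {
  doc <- read_xml(filename)
  ns <- xml_ns(doc)  # namespace map; the XPath below relies on the "ss" prefix
  # One node per spreadsheet row
  rows <- xml_find_all(doc, ".//ss:Worksheet/ss:Table/ss:Row", ns = ns)
  # Text of every data cell, one character vector per row
  values <- lapply(rows, . %>% xml_find_all(".//ss:Cell/ss:Data", ns = ns) %>% xml_text())
  # First row supplies the column names; the remaining rows are the data.
  # This assumes every row holds the same number of cells.
  columnNames <- values[[1]]
  dat <- do.call(rbind.data.frame, c(values[-1], stringsAsFactors = FALSE))
  names(dat) <- columnNames
  dat
}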
```
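Because these files can carry an `.xls` extension, it may help to check what a file really is before choosing a reader. The following is a minimal sketch, not part of the repository: the helper name `isExcelXML` and the two-row sample file are hypothetical, and the check assumes an XML 2003 spreadsheet has a `Workbook` root element in the `urn:schemas-microsoft-com:office:spreadsheet` namespace.

```r
library(xml2)

# Hypothetical helper: TRUE when the file parses as XML and its root element
# is <Workbook> in the SpreadsheetML namespace, whatever the file extension.
isExcelXML <- function(filename) {
  # Assumption: a file that is not XML (e.g. a binary XLS) makes read_xml() fail
  doc <- tryCatch(read_xml(filename), error = function(e) NULL)
  if (is.null(doc)) return(FALSE)
  xml_name(xml_root(doc)) == "Workbook" &&
    "urn:schemas-microsoft-com:office:spreadsheet" %in% as.character(xml_ns(doc))
}

# Hypothetical sample in the XML 2003 layout, saved with an .xls extension
sample <- c(
  '<?xml version="1.0"?>',
  '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
  '          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
  ' <Worksheet ss:Name="Sheet1"><Table>',
  '  <Row><Cell><Data ss:Type="String">name</Data></Cell>',
  '       <Cell><Data ss:Type="String">score</Data></Cell></Row>',
  '  <Row><Cell><Data ss:Type="String">Ada</Data></Cell>',
  '       <Cell><Data ss:Type="Number">90</Data></Cell></Row>',
  ' </Table></Worksheet>',
  '</Workbook>'
)
path <- tempfile(fileext = ".xls")
writeLines(sample, path)

isExcelXML(path)

# The same XPath used by the function above selects one node per row
doc <- read_xml(path)
rows <- xml_find_all(doc, ".//ss:Worksheet/ss:Table/ss:Row", ns = xml_ns(doc))
length(rows)
```

If the check passes, the file can be handed to `readExcelXML`; otherwise a reader such as `readxl::read_xls` is the likelier fit.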