https://github.com/trinker/read_docx
- Host: GitHub
- URL: https://github.com/trinker/read_docx
- Owner: trinker
- Created: 2014-11-16T22:08:59.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2014-11-16T22:19:42.000Z (over 11 years ago)
- Last Synced: 2025-04-04T08:39:08.985Z (about 1 year ago)
- Language: R
- Size: 113 KB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
README
A function developed by [Bryan Goodrich](https://www.linkedin.com/in/bryangoodrich) for reading .docx files into R:
```{r, echo=FALSE}
library(knitr)
## knitr::knit2html("README.Rmd", "README.md")
```
**The Code**
```{r}
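read_docx <- function (file, skip = 0) {
    # A .docx file is a zip archive; unpack it into a temporary directory
    tmp <- tempfile()
    if (!dir.create(tmp)) stop("Temporary directory could not be established.")
    # Remove the temporary directory on exit, even if a later step fails
    on.exit(unlink(tmp, recursive = TRUE), add = TRUE)
    unzip(file, exdir = tmp)

    # The document body is stored in word/document.xml
    xmlfile <- file.path(tmp, "word", "document.xml")
    doc <- XML::xmlTreeParse(xmlfile, useInternalNodes = TRUE)

    # Each <w:p> node is one paragraph; collect its text
    nodeSet <- XML::getNodeSet(doc, "//w:p")
    pvalues <- sapply(nodeSet, XML::xmlValue)

    # Drop empty paragraphs, then the first `skip` remaining paragraphs
    pvalues <- pvalues[pvalues != ""]
    if (skip > 0) pvalues <- pvalues[-seq(skip)]
    pvalues
}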
```
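The paragraph-extraction step can be tried without a .docx file on hand. The sketch below is only an illustration: the XML string is a hand-written, hypothetical stand-in for `word/document.xml`, and it assumes the `XML` package is installed. It applies the same `//w:p` query and `skip` logic as `read_docx`.

```r
# Hypothetical, hand-written WordprocessingML fragment standing in for
# the word/document.xml file found inside a real .docx archive
xml_text <- paste0(
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:body>',
  '<w:p><w:r><w:t>Title line</w:t></w:r></w:p>',
  '<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>',
  '</w:body></w:document>'
)

# Parse the string; the "w" prefix is resolved from the root node's namespaces
doc <- XML::xmlParse(xml_text, asText = TRUE)

# One <w:p> node per paragraph; xmlValue() concatenates the text of its runs
nodes <- XML::getNodeSet(doc, "//w:p")
pvalues <- vapply(nodes, XML::xmlValue, character(1))

# Same post-processing as read_docx(): drop empties, then skip leading paragraphs
pvalues <- pvalues[pvalues != ""]
skip <- 1
if (skip > 0) pvalues <- pvalues[-seq(skip)]
print(pvalues)
```

Here `skip = 1` drops the title paragraph, which mirrors how `skip` might be used to discard header lines at the top of a real document.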
**In Action...**
```{r, message=FALSE}
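library(qdapRegex); library(qdap)
# Read the document and strip non-ASCII characters
input <- rm_non_ascii(read_docx("LRA2014AdvocacyProposal.docx"))
# Collapse the paragraphs into one string and extract the in-text citations
rm_citation(unbag(input), extract = TRUE)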
```