An open API service indexing awesome lists of open source software.

https://github.com/trinker/read_docx


https://github.com/trinker/read_docx

Last synced: 9 months ago
JSON representation

Awesome Lists containing this project

README

          

A function developed by [Bryan Goodrich](https://www.linkedin.com/in/bryangoodrich) for reading .docx files into R:

```{r, echo=FALSE}
library(knitr)
## knitr::knit2html("README.Rmd", "README.md")
```

**The Code**

```{r}
read_docx <- function (file, skip = 0) {
tmp <- tempfile()
if (!dir.create(tmp)) stop("Temporary directory could not be established.")
unzip(file, exdir = tmp)
xmlfile <- file.path(tmp, "word", "document.xml")
doc <- XML::xmlTreeParse(xmlfile, useInternalNodes = TRUE)
unlink(tmp, recursive = TRUE)
nodeSet <- XML::getNodeSet(doc, "//w:p")
pvalues <- sapply(nodeSet, XML::xmlValue)
pvalues <- pvalues[pvalues != ""]
if (skip > 0) pvalues <- pvalues[-seq(skip)]
pvalues
}
```

**In Action...**

```{r, message=FALSE}
library(qdapRegex); library(qdap)
input <- rm_non_ascii(read_docx("LRA2014AdvocacyProposal.docx"))
rm_citation(unbag(input), extract=TRUE)
```