Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mkearney/readthat
Read Text Data
http-get r r-package read read-file read-url readsource readtextfile rstats text
- Host: GitHub
- URL: https://github.com/mkearney/readthat
- Owner: mkearney
- License: other
- Created: 2019-09-20T16:55:10.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-25T16:39:54.000Z (over 5 years ago)
- Last Synced: 2024-08-13T07:15:32.541Z (6 months ago)
- Topics: http-get, r, r-package, read, read-file, read-url, readsource, readtextfile, rstats, text
- Language: R
- Size: 1.5 MB
- Stars: 26
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - mkearney/readthat - Read Text Data (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(readthat)
```

# readthat
[CRAN status](https://CRAN.R-project.org/package=readthat)
[Lifecycle: experimental](https://www.tidyverse.org/lifecycle/#experimental)
[Travis build status](https://travis-ci.org/mkearney/readthat)
[Codecov test coverage](https://codecov.io/gh/mkearney/readthat?branch=master)

Quickly read text/source from local files and web pages.
## Installation
You can install the development version of readthat from GitHub with:
``` r
remotes::install_github("mkearney/readthat")
```

## Examples
Let's say we want to read in the source of the following websites:
```{r}
## a vector of URLs
urls <- c(
"https://mikewk.com",
"https://cnn.com",
"https://www.cnn.com/us"
)
```

Use `readthat::read()` to read the text/source of a single file/URL, or combine it with an apply function to read several:
```{r}
## read single web/file (returns text vector)
x <- read(urls[1])

## preview output
substr(x, 1, 60)

## use apply functions to read multiple pages
xx <- sapply(urls, read)

## preview output
lapply(xx, substr, 1, 60)
```

## Comparisons
Benchmark comparison for reading a text file:
```{r}
## save a text file
writeLines(read(urls[1]), x <- tempfile())

## compare read times
bm_file <- bench::mark(
readr = readr::read_lines(x),
readthat = read(x),
readLines = readLines(x),
check = FALSE
)

## view results
bm_file
```

```{r, include=FALSE}
p1 <- ggplot2::autoplot(bm_file)
## pass the plot to ggsave() explicitly instead of adding ggsave() to it
ggplot2::ggsave("man/figures/README-bm_file.png", plot = p1,
  width = 9, height = 5, units = "in")
```

![](man/figures/README-bm_file.png)

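The benchmark above passes `check = FALSE`, so `bench::mark()` does not verify that the readers return the same result. Before comparing timings, it may be worth confirming that the candidates agree on content. The following is a minimal sketch using base R only: `read_by_line()`, `read_by_chars()`, and `as_one_string()` are hypothetical helpers written for this illustration, not functions from readthat, and the sample file is invented.

```r
## Hypothetical stand-in readers (base R only); swap in readthat::read,
## readr::read_lines, etc. to compare the real candidates.
read_by_line <- function(path) readLines(path, warn = FALSE)

read_by_chars <- function(path) {
  ## read the whole file as one string, then split on line endings
  txt <- readChar(path, nchars = file.size(path))
  strsplit(txt, "\r?\n")[[1]]
}

## write a small sample file
tmp <- tempfile(fileext = ".txt")
writeLines(c("<html>", "<body>hello</body>", "</html>"), tmp)

## collapse each result to a single string so that a vector of lines and a
## single string holding the same text compare as equal
as_one_string <- function(x) paste(x, collapse = "\n")

same_text <- identical(
  as_one_string(read_by_line(tmp)),
  as_one_string(read_by_chars(tmp))
)
same_text
```

Collapsing to one string is a design choice for this sketch: some readers return one element per line and others may return the full text as a single element, and the comparison should ignore that difference in shape.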
Benchmark comparison for reading a web page:
```{r}
x <- "https://www.espn.com/nfl/scoreboard"
bm_html <- bench::mark(
httr = httr::content(httr::GET(x), as = "text", encoding = "UTF-8"),
xml2 = xml2::read_html(x),
readthat = read(x),
readLines = readLines(x, warn = FALSE),
readr = readr::read_lines(x),
check = FALSE,
iterations = 25,
filter_gc = TRUE
)
bm_html
```

```{r, include=FALSE}
p2 <- ggplot2::autoplot(bm_html)
## pass the plot to ggsave() explicitly instead of adding ggsave() to it
ggplot2::ggsave("man/figures/README-bm_html.png", plot = p2,
  width = 9, height = 5, units = "in")
```

![](man/figures/README-bm_html.png)
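
Web reads like the ones benchmarked above can fail (timeouts, missing pages), and a single error inside `sapply()` stops the whole call. The following is a minimal sketch of a tolerant wrapper, assuming it is acceptable to get `NA` back for a source that cannot be read. `read_each()` is a hypothetical helper, not part of readthat, and the example uses local temporary files so it runs offline.

```r
## Hypothetical helper (not part of readthat): apply `reader` to each source
## and return NA for any source that fails instead of stopping.
read_each <- function(sources, reader) {
  vapply(
    sources,
    function(s) {
      tryCatch(
        ## collapse to one string so every source yields exactly one value
        suppressWarnings(paste(reader(s), collapse = "\n")),
        error = function(e) NA_character_
      )
    },
    character(1)
  )
}

## demonstrate with local files
ok <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), ok)
missing <- tempfile(fileext = ".txt")  # never created, so reading it fails

out <- read_each(
  c(ok, missing),
  reader = function(p) readLines(p, warn = FALSE)
)

## which sources failed?
is.na(out)
```

To use the same pattern with this package, pass `reader = read` after `library(readthat)`. `vapply()` is used instead of `sapply()` so the result is always a character vector with one element per source.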