Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mkearney/readthat
Read Text Data
http-get r r-package read read-file read-url readsource readtextfile rstats text
- Host: GitHub
- URL: https://github.com/mkearney/readthat
- Owner: mkearney
- License: other
- Created: 2019-09-20T16:55:10.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-25T16:39:54.000Z (about 5 years ago)
- Last Synced: 2024-06-11T12:43:02.793Z (5 months ago)
- Topics: http-get, r, r-package, read, read-file, read-url, readsource, readtextfile, rstats, text
- Language: R
- Size: 1.5 MB
- Stars: 26
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - mkearney/readthat - Read Text Data (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(readthat)
```

# readthat

[![CRAN status](https://www.r-pkg.org/badges/version/readthat)](https://CRAN.R-project.org/package=readthat)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![Travis build status](https://travis-ci.org/mkearney/readthat.svg?branch=master)](https://travis-ci.org/mkearney/readthat)
[![Codecov test coverage](https://codecov.io/gh/mkearney/readthat/branch/master/graph/badge.svg)](https://codecov.io/gh/mkearney/readthat?branch=master)

Quickly read text/source from local files and web pages.

## Installation
You can install the development version of readthat from GitHub with:
``` r
remotes::install_github("mkearney/readthat")
```

## Examples

Let's say we want to read in the source of the following websites:
```{r}
## a vector of URLs
urls <- c(
  "https://mikewk.com",
  "https://cnn.com",
  "https://www.cnn.com/us"
)
```

Use `readthat::read()` to read the text/source of a single file/URL:
```{r}
## read single web/file (returns text vector)
x <- read(urls[1])

## preview output
substr(x, 1, 60)

## use apply functions to read multiple pages
xx <- sapply(urls, read)

## preview output
lapply(xx, substr, 1, 60)
```

## Comparisons

Benchmark comparison for reading a text file:
```{r}
## save a text file
x <- tempfile()
writeLines(read(urls[1]), x)

## compare read times
bm_file <- bench::mark(
  readr = readr::read_lines(x),
  readthat = read(x),
  readLines = readLines(x),
  check = FALSE
)

## view results
bm_file
```

```{r, include=FALSE}
## save the benchmark plot; pass the plot to ggsave() explicitly
p1 <- ggplot2::autoplot(bm_file)
ggplot2::ggsave("man/figures/README-bm_file.png", plot = p1,
  width = 9, height = 5, units = "in")
```

![](man/figures/README-bm_file.png)
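
For a rough cross-check that does not depend on `bench`, a similar file-read comparison can be sketched with base R's `system.time()`. The helper `time_reader()` and the sample file are hypothetical, made up for illustration; they are not part of readthat. Timings from a loop like this are coarse and are no substitute for the benchmark above.

```r
## write a small sample file to time against (hypothetical data)
path <- tempfile(fileext = ".txt")
writeLines(rep("some sample text", 1000), path)

## hypothetical helper: mean elapsed seconds over `n` calls of `reader(path)`
time_reader <- function(reader, path, n = 50) {
  elapsed <- system.time(for (i in seq_len(n)) reader(path))[["elapsed"]]
  elapsed / n
}

## candidate readers; `readthat = readthat::read` could be added
## to this list if the package is installed
readers <- list(
  readLines = function(p) readLines(p, warn = FALSE),
  readChar  = function(p) readChar(p, file.size(p))
)

## mean seconds per call for each reader
timings <- vapply(readers, time_reader, numeric(1), path = path)
timings
```
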
Benchmark comparison for reading a web page:
```{r}
x <- "https://www.espn.com/nfl/scoreboard"
bm_html <- bench::mark(
  httr = httr::content(httr::GET(x), as = "text", encoding = "UTF-8"),
  xml2 = xml2::read_html(x),
  readthat = read(x),
  readLines = readLines(x, warn = FALSE),
  readr = readr::read_lines(x),
  check = FALSE,
  iterations = 25,
  filter_gc = TRUE
)
bm_html
```

```{r, include=FALSE}
## save the benchmark plot; pass the plot to ggsave() explicitly
p2 <- ggplot2::autoplot(bm_html)
ggplot2::ggsave("man/figures/README-bm_html.png", plot = p2,
  width = 9, height = 5, units = "in")
```

![](man/figures/README-bm_html.png)
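
Both benchmarks pass `check = FALSE`. By default, `bench::mark()` verifies that all expressions return equal results, and the readers compared here return differently shaped objects: `readLines()` gives one element per line, `httr::content(..., as = "text")` gives a single string, and `xml2::read_html()` gives a parsed document. A minimal base R sketch of that difference, using a made-up temporary file rather than any of the URLs above:

```r
## write a three-line sample file (hypothetical data)
path <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two", "line three"), path)

## one element per line
by_line <- readLines(path, warn = FALSE)

## the whole file as a single string
as_string <- readChar(path, file.size(path))

## compare the shapes of the two results
length(by_line)
length(as_string)

## same text, different shape, so a strict equality check fails
identical(by_line, as_string)

## splitting the single string on line breaks recovers the per-line form
identical(strsplit(as_string, "\r?\n")[[1]], by_line)
```

With `check = FALSE` the benchmark measures timing only, so it is worth confirming separately that each reader returns the text you expect.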