https://github.com/mkearney/readthat

Read Text Data

Host: GitHub
URL: https://github.com/mkearney/readthat
Owner: mkearney
License: other
Created: 2019-09-20T16:55:10.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-10-25T16:39:54.000Z (about 6 years ago)
Last Synced: 2025-03-26T05:41:51.446Z (8 months ago)
Topics: http-get, r, r-package, read, read-file, read-url, readsource, readtextfile, rstats, text
Language: R
Size: 1.5 MB
Stars: 26
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Awesome Lists containing this project

jimsghstars - mkearney/readthat - Read Text Data (R)

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(readthat)
```

# readthat 

[![CRAN status](https://www.r-pkg.org/badges/version/readthat)](https://CRAN.R-project.org/package=readthat)

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)

[![Travis build status](https://travis-ci.org/mkearney/readthat.svg?branch=master)](https://travis-ci.org/mkearney/readthat)

[![Codecov test coverage](https://codecov.io/gh/mkearney/readthat/branch/master/graph/badge.svg)](https://codecov.io/gh/mkearney/readthat?branch=master)

Quickly read text/source from local files and web pages.

## Installation

You can install the development version of readthat from GitHub with:

``` r
remotes::install_github("mkearney/readthat")
```
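
The install command above needs the remotes package, and the comparison chunks further down also call bench, readr, httr, xml2, and ggplot2. The following is a minimal sketch for checking which of those are not yet installed; `missing_pkgs()` is a hypothetical helper written for this illustration, not part of readthat.

```r
## hypothetical helper (not part of readthat): return the packages
## from `pkgs` that cannot be loaded in the current library
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

## packages referenced in this README
needed <- c("remotes", "bench", "readr", "httr", "xml2", "ggplot2")

## character vector of packages that still need to be installed
to_install <- missing_pkgs(needed)
to_install
```

Anything listed in `to_install` can then be passed to `install.packages()`.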

## Examples

Let's say we want to read in the source of the following websites:

```{r}
## a vector of URLs
urls <- c(
  "https://mikewk.com",
  "https://cnn.com",
  "https://www.cnn.com/us"
)
```

Use `readthat::read()` to read the text/source of a single file or URL:

```{r}
## read a single web page or file (returns a text vector)
x <- read(urls[1])

## preview output
substr(x, 1, 60)

## use apply functions to read multiple pages
xx <- sapply(urls, read)

## preview output
lapply(xx, substr, 1, 60)
```
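
When reading many sources at once, a single unreachable URL or missing file would stop the whole `sapply()` call. The following is a minimal sketch of one way to guard against that. `read_all()` is a hypothetical helper, not part of readthat, and `base_reader()` is a base-R stand-in so the sketch runs without readthat or a network connection. It assumes each source should be collapsed to a single string.

```r
## hypothetical helper (not part of readthat): read several sources,
## returning NA_character_ for any source that fails
read_all <- function(sources, reader) {
  vapply(sources, function(src) {
    tryCatch(
      ## collapse to one string so vapply() always gets length 1
      suppressWarnings(paste(reader(src), collapse = "\n")),
      error = function(e) NA_character_
    )
  }, character(1))
}

## base-R stand-in reader (assumption: swap in readthat::read if installed)
base_reader <- function(path) readLines(path, warn = FALSE)

## two temporary files plus one path that does not exist
f1 <- tempfile()
writeLines(c("alpha", "beta"), f1)
f2 <- tempfile()
writeLines("gamma", f2)
missing_file <- file.path(tempdir(), "no-such-file.txt")

res <- read_all(c(f1, f2, missing_file), reader = base_reader)
unname(res)
```

Unlike `sapply()`, `vapply()` always returns a character vector here, so failed reads show up as `NA` rather than changing the shape of the result.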

## Comparisons

Benchmark comparison for reading a text file:

```{r}
## save a text file
x <- tempfile()
writeLines(read(urls[1]), x)

## compare read times
bm_file <- bench::mark(
  readr = readr::read_lines(x),
  readthat = read(x),
  readLines = readLines(x),
  check = FALSE
)

## view results
bm_file
```

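```{r, include=FALSE}
## plot the benchmark and save it; pass the plot to ggsave() explicitly
## rather than adding ggsave() to the plot with `+`
p1 <- ggplot2::autoplot(bm_file)
ggplot2::ggsave("man/figures/README-bm_file.png", plot = p1,
    width = 9, height = 5, units = "in")
```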

![](man/figures/README-bm_file.png)
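
The benchmark above sets `check = FALSE` because `bench::mark()` otherwise requires every expression to return an equal result, and different readers can return the same text in different shapes. The following is a minimal base-R sketch of that difference. It uses `readLines()` and `readChar()` only as illustrations and makes no claim about how `readthat::read()` is implemented.

```r
## write a small three-line text file
f <- tempfile()
writeLines(c("line one", "line two", "line three"), f)

## readLines() returns one element per line
by_line <- readLines(f)
length(by_line)

## readChar() returns the whole file as a single string
whole <- readChar(f, nchars = file.size(f))
length(whole)

## the two results are not identical as returned ...
identical(by_line, whole)

## ... but hold the same text once the string is split on line breaks
identical(strsplit(whole, "\r?\n")[[1]], by_line)
```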

Benchmark comparison for reading a web page:

```{r}
x <- "https://www.espn.com/nfl/scoreboard"
bm_html <- bench::mark(
  httr = httr::content(httr::GET(x), as = "text", encoding = "UTF-8"),
  xml2 = xml2::read_html(x),
  readthat = read(x),
  readLines = readLines(x, warn = FALSE),
  readr = readr::read_lines(x),
  check = FALSE,
  iterations = 25,
  filter_gc = TRUE
)
bm_html
```

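```{r, include=FALSE}
## plot the benchmark and save it; pass the plot to ggsave() explicitly
## rather than adding ggsave() to the plot with `+`
p2 <- ggplot2::autoplot(bm_html)
ggplot2::ggsave("man/figures/README-bm_html.png", plot = p2,
    width = 9, height = 5, units = "in")
```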

![](man/figures/README-bm_html.png)
