Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/hrbrmstr/crux

Identify the Crux of an Article
https://github.com/hrbrmstr/crux

crux r rjava rstats web-scraping

Last synced: 7 days ago
JSON representation

Identify the Crux of an Article

Awesome Lists containing this project

README

        

---
output: rmarkdown::github_document
editor_options:
chunk_output_type: inline
---
```{r pkg-knitr-opts, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE, fig.retina=2, message=FALSE, warning=FALSE)
options(width=120)
```

[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/crux.svg?branch=master)](https://travis-ci.org/hrbrmstr/crux)
[![Coverage Status](https://codecov.io/gh/hrbrmstr/crux/branch/master/graph/badge.svg)](https://codecov.io/gh/hrbrmstr/crux)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/crux)](https://cran.r-project.org/package=crux)

# crux

Identify the Crux of an Article

## Description

Methods are provided to retrieve HTML content and return extracted
metadata and summarised plain text. Further methods are provided to classify
URLs with or without making network calls. Based on .

## What's Inside The Tin

The following functions are implemented:

- `classify_url`: Classify a URL with or without making network calls
- `is_ad_image`: Classify a URL with or without making network calls
- `is_likely_archive`: Classify a URL with or without making network calls
- `is_likely_article`: Classify a URL with or without making network calls
- `is_likely_audio`: Classify a URL with or without making network calls
- `is_likely_binary_doc`: Classify a URL with or without making network calls
- `is_likely_executable`: Classify a URL with or without making network calls
- `is_likely_image`: Classify a URL with or without making network calls
- `is_likely_video`: Classify a URL with or without making network calls
- `is_web_scheme`: Classify a URL with or without making network calls
- `summarise_url`: Summarise the contents at a URL to essential bits

## Installation

```{r install-ex, eval=FALSE}
install.packages(c("cruxjars", "crux"), repos = "https://cinc.rud.is/")
```

## Usage

```{r lib-ex}
library(crux)

# current version
packageVersion("crux")

```

```{r}
str(
summarise_url("http://time.com/5541738/joe-biden-backtracks-pence-praise-criticism/"), 1
)
```

```{r}
str(
classify_url("https://www.washingtonpost.com/powerpost/house-democrats-explode-in-recriminations-as-liberals-lash-out-at-moderates/2019/02/28/c3d163fe-3b87-11e9-a06c-3ec8ed509d15_story.html")
)
```

## crux Metrics

```{r cloc, echo=FALSE}
cloc::cloc_pkg_md()
```

## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md).
By participating in this project you agree to abide by its terms.