https://github.com/hrbrmstr/crux
Identify the Crux of an Article
- Host: GitHub
- URL: https://github.com/hrbrmstr/crux
- Owner: hrbrmstr
- License: apache-2.0
- Created: 2019-03-01T09:04:05.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-03-01T12:56:01.000Z (almost 6 years ago)
- Last Synced: 2024-11-15T14:42:29.794Z (2 months ago)
- Topics: crux, r, rjava, rstats, web-scraping
- Language: R
- Homepage:
- Size: 15.6 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: rmarkdown::github_document
editor_options:
  chunk_output_type: inline
---
```{r pkg-knitr-opts, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE, fig.retina=2, message=FALSE, warning=FALSE)
options(width=120)
```

[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/crux.svg?branch=master)](https://travis-ci.org/hrbrmstr/crux)
[![Coverage Status](https://codecov.io/gh/hrbrmstr/crux/branch/master/graph/badge.svg)](https://codecov.io/gh/hrbrmstr/crux)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/crux)](https://cran.r-project.org/package=crux)

# crux
Identify the Crux of an Article
## Description
Methods are provided to retrieve HTML content and return extracted
metadata and summarised plain text. Further methods are provided to classify
URLs with or without making network calls. Based on .
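
To make "classify URLs without making network calls" concrete, here is a minimal base R sketch that guesses a URL's kind from its path extension alone. The helper `guess_url_kind()` and its lookup table are hypothetical and are not part of crux; the package's own classifiers may use different rules entirely.

```r
# Hypothetical helper, NOT part of crux: guess what a URL points at by
# inspecting only its text, so no network request is ever made.
guess_url_kind <- function(url) {
  # Illustrative (and deliberately small) extension lookup table
  kinds <- c(
    png = "image", jpg = "image", jpeg = "image", gif = "image",
    mp3 = "audio", wav = "audio",
    mp4 = "video", webm = "video",
    zip = "archive", gz = "archive",
    pdf = "binary_doc", docx = "binary_doc",
    exe = "executable", dmg = "executable"
  )

  # Drop the query string and fragment
  path <- sub("[?#].*$", "", url)
  # Drop the scheme and host so "example.com" is not mistaken for an extension
  path <- sub("^[A-Za-z][A-Za-z0-9+.-]*://[^/]*", "", path)

  # Lower-case the extension and look it up; unmatched entries become "unknown"
  ext <- tolower(tools::file_ext(path))
  kind <- unname(kinds[ext])
  kind[is.na(kind)] <- "unknown"
  kind
}

guess_url_kind(c(
  "https://example.com/logo.PNG?size=2",
  "https://example.com/story.html"
))
```

An extension check like this is cheap but easy to fool, which is presumably why the package also offers classification that does make network calls.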
## What's Inside The Tin

The following functions are implemented:

- `classify_url`: Classify a URL with or without making network calls
- `is_ad_image`: Classify a URL with or without making network calls
- `is_likely_archive`: Classify a URL with or without making network calls
- `is_likely_article`: Classify a URL with or without making network calls
- `is_likely_audio`: Classify a URL with or without making network calls
- `is_likely_binary_doc`: Classify a URL with or without making network calls
- `is_likely_executable`: Classify a URL with or without making network calls
- `is_likely_image`: Classify a URL with or without making network calls
- `is_likely_video`: Classify a URL with or without making network calls
- `is_web_scheme`: Classify a URL with or without making network calls
- `summarise_url`: Summarise the contents at a URL to essential bits

## Installation

```{r install-ex, eval=FALSE}
install.packages(c("cruxjars", "crux"), repos = "https://cinc.rud.is/")
```

## Usage

```{r lib-ex}
library(crux)

# current version
packageVersion("crux")
```

```{r}
str(
summarise_url("http://time.com/5541738/joe-biden-backtracks-pence-praise-criticism/"), 1
)
```

```{r}
str(
classify_url("https://www.washingtonpost.com/powerpost/house-democrats-explode-in-recriminations-as-liberals-lash-out-at-moderates/2019/02/28/c3d163fe-3b87-11e9-a06c-3ec8ed509d15_story.html")
)
```

## crux Metrics

```{r cloc, echo=FALSE}
cloc::cloc_pkg_md()
```

## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md).
By participating in this project you agree to abide by its terms.