https://github.com/noamross/quickscraper

An R package wrapping quickscrape, a node.js web scraper

Host: GitHub
URL: https://github.com/noamross/quickscraper
Owner: noamross
License: mit
Created: 2014-07-04T05:17:31.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2014-07-22T23:48:16.000Z (over 10 years ago)
Last Synced: 2024-06-11T17:06:29.538Z (5 months ago)
Language: R
Size: 1.63 MB
Stars: 16
Watchers: 4
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# quickscraper

An R wrapper for the [quickscrape](https://github.com/ContentMine/quickscrape)
web scraping tool.

## Installation

You need `node.js` 0.8 or later installed to run quickscraper. You can download
it from [nodejs.org](http://nodejs.org/).
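
If you are unsure whether a suitable `node.js` is already installed, a check along the following lines may help. This is only a sketch: `parse_node_version()` and `node_is_recent_enough()` are hypothetical helpers and are not part of quickscraper. It assumes the executable is called `node` and is on your `PATH`.

```r
# Hypothetical helpers, not part of quickscraper.

# Extract the numeric part of a version string such as "v0.10.29",
# which is the format printed by `node --version`.
parse_node_version <- function(version_string) {
  digits <- regmatches(version_string,
                       regexpr("[0-9]+(\\.[0-9]+)*", version_string))
  numeric_version(digits)
}

# Returns TRUE if a `node` executable is found on the PATH and reports
# a version of at least `minimum`; returns FALSE if node is not found.
node_is_recent_enough <- function(minimum = "0.8") {
  node_path <- unname(Sys.which("node"))
  if (!nzchar(node_path)) {
    return(FALSE)
  }
  reported <- system2(node_path, "--version", stdout = TRUE)
  parse_node_version(reported[1]) >= numeric_version(minimum)
}
```

Calling `node_is_recent_enough()` before installing gives a quick yes/no answer. On systems where the executable has a different name, the lookup in `Sys.which()` would need adjusting.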

In R, run:

```
if (!require(devtools)) install.packages("devtools")
library(devtools)
install_github("noamross/quickscraper")
```

When first loaded, `quickscraper` will ask to install its node module dependencies.
These are not bundled with the package because they contain system-specific
binaries, which would not be allowed on CRAN. Running the node package manager
(`npm`) after the R package is installed lets the correct binaries for your
system be selected and installed.
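
To illustrate the idea, the sketch below shows roughly how an R function could run `npm install` in a directory containing a node module. It is not the code quickscraper uses (that lives in `R/node.R` and may work differently). The function name `install_node_dependencies()` and its arguments are hypothetical.

```r
# Hypothetical sketch, not quickscraper's actual implementation.
# Runs `npm install` in `module_dir`, which must contain a package.json.
install_node_dependencies <- function(module_dir, ask = interactive()) {
  npm_path <- unname(Sys.which("npm"))
  if (!nzchar(npm_path)) {
    stop("npm was not found on the PATH; install node.js first.")
  }
  if (!file.exists(file.path(module_dir, "package.json"))) {
    stop("No package.json found in ", module_dir)
  }
  # Ask for confirmation before downloading anything.
  if (ask) {
    answer <- readline("Install node module dependencies with npm? [y/N] ")
    if (!tolower(trimws(answer)) %in% c("y", "yes")) {
      return(invisible(FALSE))
    }
  }
  # Run npm from inside the module directory so that it fetches the
  # builds that match the current system, then restore the old directory.
  old_dir <- setwd(module_dir)
  on.exit(setwd(old_dir), add = TRUE)
  status <- system2(npm_path, "install")
  invisible(status == 0)
}
```

Because the dependencies are fetched on the user's machine, nothing platform-specific has to ship inside the R package itself.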

The node-wrapping and dependency-checking components of this package (in
[`R/node.R`](https://github.com/noamross/quickscraper/blob/master/R/node.R))
will eventually be extracted into a separate package of utilities for wrapping
arbitrary node modules.

## Usage

Usage is currently very basic, as the `quickscrape` API is still evolving.

The function `scrape()` scrapes information from a URL or a set of URLs,
saving the results to disk or returning them to R in the form of a list.

For instance:

```
scrape('https://peerj.com/articles/409/')
```

See `?scrape` for more details and options.
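
If you want to scrape several URLs one at a time and keep going when one of them fails, a small wrapper like the one below is one option. This is a sketch only: `scrape_each()` is a hypothetical helper, not part of quickscraper, and it relies on nothing beyond the single-URL call shown above. The scraping function is passed in as an argument so the helper can be tried with any function.

```r
# Hypothetical helper, not part of quickscraper.
# Applies `scraper` to each URL in turn and returns a list named by URL.
scrape_each <- function(urls, scraper) {
  results <- lapply(urls, function(url) {
    # Capture errors so that one failing URL does not abort the whole run;
    # the error object is stored in place of the result.
    tryCatch(scraper(url), error = function(e) e)
  })
  names(results) <- urls
  results
}
```

Assuming `scrape()` is exported by the package, a call might look like `scrape_each(c('https://peerj.com/articles/409/'), quickscraper::scrape)`. Failed URLs can then be found with `inherits(result, "error")`.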

## Notes

Note that this repository contains both
[quickscrape](https://github.com/ContentMine/quickscrape) and
[journal-scrapers](https://github.com/ContentMine/journal-scrapers). These
are handled using `git subtree`.