https://github.com/noamross/quickscraper
An R package wrapping quickscrape, a node.js web scraper
- Host: GitHub
- URL: https://github.com/noamross/quickscraper
- Owner: noamross
- License: mit
- Created: 2014-07-04T05:17:31.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-07-22T23:48:16.000Z (over 10 years ago)
- Last Synced: 2024-06-11T17:06:29.538Z (5 months ago)
- Language: R
- Size: 1.63 MB
- Stars: 16
- Watchers: 4
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# quickscraper
An R wrapper for the [quickscrape](https://github.com/ContentMine/quickscrape)
web scraping tool.

## Installation

You need `node.js` 0.8 or later installed to run quickscraper. You can get it
[here](http://nodejs.org/).

In R, run:
```
if (!require(devtools)) install.packages("devtools")
library(devtools)
install_github("noamross/quickscraper")
```

When first loaded, `quickscraper` will ask to install its node module dependencies.
These are not bundled with the package because they contain system-specific
binaries, which would not be allowed on CRAN. Installing them with the node
package manager (`npm`) after the R package is installed lets the correct
binaries for each system be selected.

The node-wrapping and dependency-checking components of this package (in
[`R/node.R`](https://github.com/noamross/quickscraper/blob/master/R/node.R))
will eventually be extracted into a separate package of utilities for wrapping
arbitrary node modules.

## Usage
Usage is currently very basic, as the `quickscrape` API is still evolving.
The function `scrape()` scrapes information from a URL or a set of URLs,
saving the results to disk or returning them to R as a list.
For instance:

```
library(quickscraper)
scrape('https://peerj.com/articles/409/')
```

See `?scrape` for more details and options.
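`scrape()` relies on the node.js `quickscrape` tool, so it can help to see how an R function can drive a node.js program. The sketch below uses base R's `system2()`; `run_node()` and `node_version_ok()` are hypothetical helpers for illustration, not functions exported by quickscraper, and the package's own wrapper in `R/node.R` may work differently.

```r
# Hypothetical helper (not part of quickscraper's API): run the `node`
# executable with the given arguments and return its output as a
# character vector, one element per line.
run_node <- function(args = character(), node = Sys.which("node")) {
  if (!nzchar(node)) {
    stop("node.js executable not found on the PATH")
  }
  # Capture stdout and stderr together. system2() warns on a non-zero exit
  # status, so suppress the warning and inspect the "status" attribute.
  out <- suppressWarnings(system2(node, args, stdout = TRUE, stderr = TRUE))
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("node exited with status ", status, ":\n", paste(out, collapse = "\n"))
  }
  out
}

# Hypothetical check of the "node.js 0.8 or later" requirement.
# package_version() compares version components numerically, so "0.10"
# correctly counts as newer than "0.8" (plain string comparison would not).
node_version_ok <- function(min = "0.8") {
  ver <- run_node("--version")[1]   # e.g. "v0.10.29"
  package_version(sub("^v", "", ver)) >= min
}
```

With node.js installed, `node_version_ok()` returns `TRUE` or `FALSE`, and `run_node(c("-e", shQuote("console.log(1 + 1)")))` should return `"2"`. Arguments are passed through the shell, which is why the JavaScript snippet is wrapped in `shQuote()`.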
## Notes
Note that this repository contains both
[quickscrape](https://github.com/ContentMine/quickscrape) and
[journal-scrapers](https://github.com/ContentMine/journal-scrapers). These
are handled using `git subtree`.
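The first-load step described under Installation, installing the node dependencies with `npm` instead of bundling them, amounts to running `npm install` in the directory of the bundled node module. A minimal sketch follows; `ensure_node_modules()` is hypothetical, it assumes the module directory holds a `package.json` declaring its dependencies, and it is not the code in `R/node.R`.

```r
# Hypothetical helper (not quickscraper's actual code): make sure the node
# dependencies declared in `module_dir`/package.json are installed, running
# `npm install` there only if no node_modules directory exists yet.
ensure_node_modules <- function(module_dir, npm = Sys.which("npm"),
                                ask = interactive()) {
  if (dir.exists(file.path(module_dir, "node_modules"))) {
    return(TRUE)  # already installed, nothing to do
  }
  if (!nzchar(npm)) {
    warning("npm not found on the PATH; cannot install node dependencies")
    return(FALSE)
  }
  # Ask before installing, mirroring the prompt shown on first load
  if (ask) {
    answer <- readline("Install node module dependencies with npm? [y/n] ")
    if (!identical(tolower(substr(answer, 1, 1)), "y")) return(FALSE)
  }
  # npm reads package.json from the working directory, so run it there and
  # restore the caller's working directory afterwards.
  owd <- setwd(module_dir)
  on.exit(setwd(owd), add = TRUE)
  status <- system2(npm, "install")
  isTRUE(status == 0)
}
```

Checking for an existing `node_modules` directory keeps repeated loads cheap: `npm` only runs the first time, which matches the package asking once after installation.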