[noodle](https://noodle.dharmafly.com)
=============================

noodle is a Node.js server and module for querying and scraping data from web documents. A query is expressed as a JSON object, for example:

```JSON
{
  "url": "https://github.com/explore",
  "selector": "ol.ranked-repositories h3 a",
  "extract": "href"
}
```

Features
--------

- Cross-domain document querying (HTML, JSON, XML, Atom and RSS feeds)
- Server supports querying via JSONP and JSON POST
- Multiple queries per request
- Access to the response headers of the queried server
- Allows for POSTing to web documents
- In-memory caching for query results and web documents
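
As a rough illustration of the two request styles and of sending several queries at once, the sketch below only builds the requests and does not send them. The `q` and `callback` parameter names, the use of a JSON array for multiple queries, and the helper names `buildJsonpUrl` and `buildPostRequest` are assumptions made for this example; check the noodle reference before relying on them. The second query is purely illustrative.

```javascript
// Hypothetical helper: build a JSONP-style GET URL for a noodle server.
// ASSUMPTION: the server reads the query from `q` and the JSONP callback
// name from `callback`; this README does not confirm those names.
function buildJsonpUrl(serverUrl, queries, callbackName) {
  var params = new URLSearchParams({
    q: JSON.stringify(queries),
    callback: callbackName
  });
  return serverUrl + '/?' + params.toString();
}

// Hypothetical helper: build fetch-style options for a JSON POST.
// ASSUMPTION: the server accepts the queries as the raw JSON request body.
function buildPostRequest(queries) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(queries)
  };
}

// Two queries in one request (ASSUMPTION: expressed as an array).
var queries = [
  { url: 'https://github.com/explore', selector: 'ol.ranked-repositories h3 a', extract: 'href' },
  { url: 'https://github.com/explore', selector: 'a', extract: 'href' } // illustrative only
];

// 8888 is the default port shown in the server quick start.
console.log(buildJsonpUrl('http://localhost:8888', queries, 'handleResults'));
console.log(buildPostRequest(queries));
```

On a runtime with a global `fetch`, the POST variant could then be sent with `fetch('http://localhost:8888', buildPostRequest(queries))`, again assuming the server accepts that body shape.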

Server quick start
------------------

Setup

    $ npm install noodlejs

or

    $ git clone git@github.com:dharmafly/noodle.git
    $ cd noodle
    $ npm install

Start the server by running the binary:

    $ bin/noodle-server
    Noodle node server started
    ├ process title node-noodle
    ├ process pid 4739
    └ server port 8888

You may specify a port number as an argument:

    $ bin/noodle-server 9090
    Noodle node server started
    ├ process title node-noodle
    ├ process pid 4739
    └ server port 9090
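
If you would rather launch the server from a Node script than from a shell, something along these lines may work. It is a minimal sketch that assumes it runs from the noodle package root; `serverArgs` and `startServer` are hypothetical helper names, not part of noodle.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical helper: build the argument list for bin/noodle-server.
// With no port, no argument is passed and the server uses its default (8888).
function serverArgs(port) {
  if (port === undefined) {
    return [];
  }
  var n = Number(port);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new RangeError('port must be an integer between 1 and 65535');
  }
  return [String(n)];
}

// Hypothetical helper: start the server as a child process and share this
// process's stdio, so the startup banner appears in the same terminal.
// ASSUMPTION: the current working directory is the noodle package root.
function startServer(port) {
  return spawn('bin/noodle-server', serverArgs(port), { stdio: 'inherit' });
}
```

Calling `startServer(9090)` would then be the scripted equivalent of `bin/noodle-server 9090`.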

Noodle as a node module
-----------------------

To use noodle as a Node module, run `npm install noodlejs`, require it, and
see the [noodle API reference](https://noodle.dharmafly.com/reference/#Noodle-as-node-module).

```javascript
var noodle = require('noodlejs');

// Query the page and extract the href of each matching element.
noodle.query({
  url: 'https://github.com/explore',
  selector: 'ol.ranked-repositories h3 a',
  extract: 'href'
})
.then(function (results) {
  console.log(results);
});
```
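
A thin wrapper around `query` can catch malformed input before any request is made. The sketch below is one possible approach rather than part of noodle: `runQuery` is a hypothetical helper, and it assumes only that the client exposes a `query` method returning a thenable, as in the example above. A stand-in client is used so the snippet runs without the package installed.

```javascript
// Hypothetical helper: validate a query, then hand it to a noodle-like client.
// Only `url` is checked; which other fields are required is not assumed here.
function runQuery(client, query) {
  if (!query || typeof query.url !== 'string') {
    return Promise.reject(new TypeError('query.url must be a string'));
  }
  // Promise.resolve normalises whatever thenable the client returns.
  return Promise.resolve(client.query(query));
}

// Stand-in client for demonstration; it simply echoes the query back.
var fakeClient = {
  query: function (query) {
    return Promise.resolve({ received: query });
  }
};

runQuery(fakeClient, {
  url: 'https://github.com/explore',
  selector: 'ol.ranked-repositories h3 a',
  extract: 'href'
})
  .then(function (results) {
    console.log(results);
  })
  .catch(function (err) {
    console.error('query failed:', err.message);
  });
```

With the real module, pass `require('noodlejs')` in place of `fakeClient`.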

Tests
-----

The test suite starts a temporary server on port `8889`, which noodle queries
during the automated tests.

To run the tests, use the provided binary *from the noodle package
root directory*:

    $ cd noodle
    $ bin/tests

Contribute
----------

Contributions and suggestions are welcome.

- [https://noodle.dharmafly.com](https://noodle.dharmafly.com)
- [https://github.com/dharmafly/noodle](https://github.com/dharmafly/noodle)