https://github.com/dial-once/node-dom-extractor
A node package to extract DOM from a remote HTML page
- Host: GitHub
- URL: https://github.com/dial-once/node-dom-extractor
- Owner: dial-once
- License: mit
- Created: 2014-12-03T10:55:47.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2018-11-30T08:42:13.000Z (over 7 years ago)
- Last Synced: 2025-03-21T14:17:57.772Z (over 1 year ago)
- Topics: dom-extractor, extract, extract-dom, javascript, nodejs, selector
- Language: JavaScript
- Size: 50.8 KB
- Stars: 7
- Watchers: 9
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
node-dom-extractor
==================
A Node package that extracts a DOM element from a remote page or an HTML string using selectors. It is based on jsdom for fetching and parsing, and on juice for inlining CSS.
### Install
    npm install dom-extractor
### Extract DOM from a remote URL
```js
var extractor = require('dom-extractor');
extractor.fetch("http://github.com/", "div.header", function(data){
  // data contains the extracted HTML with CSS inlined, here the GitHub header
});
```
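In the examples in this README, the callback passed to `fetch` receives only the extracted data, not an error-first `(err, data)` pair, so `util.promisify` does not fit it directly. Below is a minimal sketch of a manual Promise wrapper. `fetchAsync` and `stubExtractor` are hypothetical names introduced for illustration, and the stub stands in for `require('dom-extractor')` so the snippet runs offline.

```javascript
// Hypothetical helper: wraps a callback-style fetch(source, selectorOrOptions, cb)
// in a Promise. `extractor` is passed in, so the real module or a stub can be used.
function fetchAsync(extractor, source, selectorOrOptions) {
  return new Promise(function (resolve) {
    // The callback shown in this README receives a single argument: the data.
    extractor.fetch(source, selectorOrOptions, resolve);
  });
}

// Stand-in for require('dom-extractor') (an assumption for this demo only).
// It does no parsing or network access and only echoes the selector back.
var stubExtractor = {
  fetch: function (source, selectorOrOptions, callback) {
    setImmediate(function () {
      callback('<stub result for ' + JSON.stringify(selectorOrOptions) + '>');
    });
  }
};

fetchAsync(stubExtractor, 'http://github.com/', 'div.header').then(function (data) {
  console.log(data);
});
```

With the real package, the equivalent call would presumably be `fetchAsync(require('dom-extractor'), url, selector)`. This README does not describe how errors are reported, so the sketch never rejects.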
### Extract DOM from a string
```js
var extractor = require('dom-extractor');
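// Illustrative markup: one div with class "a" next to one without it
extractor.fetch("<div class='a'>Hello</div><div>World!</div>", ".a", function(data){
  // data should contain the div with class a
});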
```
#### Note about # selector
When you use # in a selector, the browser does not send the part of the URL that follows it, because # introduces the in-page anchor (fragment), which is handled on the browser side.
To use such a selector anyway, write |sharp| in place of #.
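To illustrate why `#` is lost, the sketch below parses a URL with Node's built-in WHATWG `URL` class: everything after `#` is parsed as the fragment rather than as part of the query string. The helper `escapeSharp`, the host, and the `selector` query parameter name are assumptions made for this example, not part of the package's documented API.

```javascript
// Hypothetical helper: replace every '#' with the |sharp| placeholder.
function escapeSharp(selector) {
  return selector.split('#').join('|sharp|');
}

// The host and the query parameter name "selector" are illustrative assumptions.
var raw = new URL('http://localhost:3000/proxy?selector=#main');
console.log(raw.searchParams.get('selector')); // '' (the selector is not in the query)
console.log(raw.hash);                         // '#main' (parsed as the fragment)

var safe = new URL('http://localhost:3000/proxy');
safe.searchParams.set('selector', escapeSharp('#main'));
console.log(safe.searchParams.get('selector')); // '|sharp|main'
```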
#### Using options
You can pass an options object as the second parameter instead of a selector string. The current options are:
```js
{
  selector: String,   // selector used for the extraction; default is body
  innerText: Boolean, // return only the text of the extraction, no HTML or CSS; default is false
  inlineCss: Boolean  // put styles in the style attributes of the extracted DOM; default is true
}
```
For example, to use the div.header selector and get only the text of the result:
```js
var extractor = require('dom-extractor');
extractor.fetch("http://github.com/", { selector: "div.header", innerText: true }, function(data){
  // data contains only the text of the extracted element, here the GitHub header
});
```
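Since the second parameter can be either a selector string or an options object, the documented defaults can be made explicit with a small normalizing function. This is a sketch of one way to resolve them, not the package's actual implementation. `resolveOptions` and `DEFAULTS` are hypothetical names.

```javascript
// Defaults as given in the options listing above.
var DEFAULTS = { selector: 'body', innerText: false, inlineCss: true };

// Hypothetical helper: accept a selector string or an options object
// and return a complete options object.
function resolveOptions(selectorOrOptions) {
  var given = typeof selectorOrOptions === 'string'
    ? { selector: selectorOrOptions }
    : (selectorOrOptions || {});
  return Object.assign({}, DEFAULTS, given);
}

console.log(resolveOptions('div.header'));
// { selector: 'div.header', innerText: false, inlineCss: true }
console.log(resolveOptions({ innerText: true }));
// { selector: 'body', innerText: true, inlineCss: true }
```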
### Use it as a middleware (Connect)
```js
var extractor = require('dom-extractor');
app.use('/proxy', extractor.middleware()); // app is an existing Connect app
```
### Running tests
```
npm install
npm test
```