https://github.com/strugee/fulldom-server
Proxy-like server that will show you the DOM of a page after JS runs
daemon dom hacktoberfest nodejs scraping scraping-websites server
- Host: GitHub
- URL: https://github.com/strugee/fulldom-server
- Owner: strugee
- License: agpl-3.0
- Created: 2016-10-01T20:35:42.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-06-21T15:50:39.000Z (over 1 year ago)
- Last Synced: 2024-08-02T15:53:47.382Z (6 months ago)
- Topics: daemon, dom, hacktoberfest, nodejs, scraping, scraping-websites, server
- Language: JavaScript
- Homepage:
- Size: 105 KB
- Stars: 38
- Watchers: 4
- Forks: 6
- Open Issues: 17
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# fulldom-server
[![Build Status](https://travis-ci.org/strugee/fulldom-server.svg?branch=master)](https://travis-ci.org/strugee/fulldom-server)
[![Coverage Status](https://coveralls.io/repos/github/strugee/fulldom-server/badge.svg?branch=master)](https://coveralls.io/github/strugee/fulldom-server?branch=master)
Proxy-like server that will show you the DOM of a page after JS runs
Especially useful when combined with [Huginn][1]'s `WebsiteAgent`. See [cantino/huginn#888][2]
## Installing
$ [sudo] npm install -g fulldom
## Running
Simple usage:
$ fulldom-server
This will give you a fulldom server running on port 8000, bound to `0.0.0.0`. You can override these defaults with CLI options:
$ fulldom-server -p 1337 -a localhost
This tells fulldom to bind to port 1337 on localhost only.
You can also do the same thing with environment variables, if that's your cup of tea:
$ FULLDOM_PORT=1337 FULLDOM_ADDRESS=localhost fulldom-server
And last but not least, you can configure fulldom with a JSON configuration file at `/etc/fulldom.json` (or whatever is specified with `--config` or `-c`):
$ cat /etc/fulldom.json
{
    "port": 1337,
    "address": "localhost"
}
$ fulldom-server
The configuration keys are the same as the long-form CLI options (e.g. `--port` on the CLI corresponds to `port` in JSON).
See `fulldom-server --help` for details.
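To make the key-to-option correspondence concrete, here is a minimal sketch that turns a JSON-style configuration object into the equivalent long-form CLI arguments. `configToArgs` is a hypothetical helper, not part of fulldom. The `--address` flag is inferred from the `address` key by the rule above, so it is worth confirming against `fulldom-server --help`.

```javascript
// Hypothetical helper (not part of fulldom): maps each config key to the
// long-form CLI option of the same name, e.g. { port: 1337 } -> --port 1337.
function configToArgs(config) {
  const args = [];
  for (const [key, value] of Object.entries(config)) {
    args.push(`--${key}`, String(value));
  }
  return args;
}

// Same settings as the /etc/fulldom.json example above.
const exampleConfig = { port: 1337, address: 'localhost' };
console.log(['fulldom-server', ...configToArgs(exampleConfig)].join(' '));
```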
## Warning
fulldom relies on [PhantomJS][3] and thus has to spawn a new process for each and every incoming request.
Please keep this in mind if you plan to publicly deploy fulldom or use it in production.
## Usage
tl;dr: do an HTTP GET on `/<url>?selector=<selector>` to get the serialized DOM of `<url>` as soon as `<selector>` appears in the document.
fulldom exposes a single endpoint at `/<url>` which, when sent an HTTP GET request, will give you back the serialized DOM of the page at `<url>` once the page's JS has "finished running". Because it is essentially impossible to determine when that happens, you need to specify a CSS selector, whose presence is used as a heuristic for when the page is "loaded". For example, if you are trying to scrape an image gallery, but the gallery is filled in via JS, you might use `img` as your selector. In this case fulldom will load the gallery, wait until there is at least one match for the `img` selector, then serialize the gallery's DOM and return it to you in an HTTP response.
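The "wait until the selector matches" heuristic can be pictured as a simple polling loop. The following is a generic sketch of that idea, not fulldom's actual implementation. `waitForMatch`, the polling interval, and the timeout are all assumptions made for illustration, and `fakeQuery` stands in for a real selector lookup against a page.

```javascript
// Generic polling sketch; NOT fulldom's actual code.
// `check` stands in for "does the selector match anything yet?".
// intervalMs and timeoutMs are illustrative assumptions.
function waitForMatch(check, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const poll = () => {
      const result = check();
      if (result) {
        resolve(result); // heuristic satisfied: treat the page as loaded
      } else if (Date.now() >= deadline) {
        reject(new Error('timed out waiting for selector'));
      } else {
        setTimeout(poll, intervalMs); // not there yet, try again shortly
      }
    };
    poll();
  });
}

// Simulated page: the "gallery" only gains an image on the third lookup.
let lookups = 0;
const fakeQuery = () => (++lookups >= 3 ? ['<img src="a.png">'] : null);

waitForMatch(fakeQuery, { intervalMs: 10 }).then((matches) => {
  console.log(`selector matched ${matches.length} element(s)`);
});
```

A timeout is included because a selector that never appears would otherwise keep the loop running indefinitely.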
Note that both `<url>` and `<selector>` should be percent-encoded - you need to be particularly careful to encode `/`s. ProTip™: `:` is `%3A`, and `/` is `%2F`.
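As a rough illustration of the encoding step, the sketch below builds a request URL with JavaScript's built-in `encodeURIComponent`, which encodes both `:` and `/`. It assumes the request shape is `/<url>?selector=<selector>` and that a fulldom server is listening on the default port 8000 on `localhost`. `buildFulldomRequest` is a hypothetical helper name, not something fulldom provides.

```javascript
// Hypothetical helper; assumes the request shape /<url>?selector=<selector>.
function buildFulldomRequest(serverBase, targetUrl, selector) {
  const encodedUrl = encodeURIComponent(targetUrl);     // ':' -> %3A, '/' -> %2F
  const encodedSelector = encodeURIComponent(selector); // ' ' -> %20, '>' -> %3E
  return `${serverBase}/${encodedUrl}?selector=${encodedSelector}`;
}

console.log(
  buildFulldomRequest('http://localhost:8000', 'http://example.com/gallery', 'img')
);
// prints: http://localhost:8000/http%3A%2F%2Fexample.com%2Fgallery?selector=img
```

The resulting string can be passed to any HTTP client, or pasted into Huginn's `WebsiteAgent` as the URL to fetch.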
## Author
AJ Jordan
## License
AGPL 3.0+
[1]: https://github.com/cantino/huginn
[2]: https://github.com/cantino/huginn/issues/888
[3]: http://phantomjs.org/