https://github.com/zackiles/deep-scrape
Scrape pages with Node/io.js and get a whole lot of metadata: headers, Ajax requests/responses, rendered HTML, JavaScript ASTs, dependencies, console events, and a whole lot more.
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/zackiles/deep-scrape
- Owner: zackiles
- Created: 2015-04-11T15:17:26.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2016-05-06T14:37:21.000Z (about 10 years ago)
- Last Synced: 2025-04-03T11:53:57.598Z (about 1 year ago)
- Language: JavaScript
- Size: 42 KB
- Stars: 7
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Deep Scrape
[npm version](http://badge.fury.io/js/deep-scrape)
Scrape and crawl pages with io.js and get a whole lot of metadata: headers, Ajax requests/responses, rendered HTML, JavaScript ASTs, dependencies, console events, and more. Crawl whole sites or scrape a single page. Add cookies or proxy requests. Deep Scrape fingerprints common JavaScript libraries and lets you write your own fingerprints.
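To illustrate the general idea behind library fingerprinting, here is a minimal sketch that checks a snapshot of a rendered page's global (`window`) object for well-known globals. This is not Deep Scrape's API or its actual detection logic: the `fingerprints` table, `detectLibraries`, `MyWidget`, and the plain-object `fakeWindow` standing in for a real page are all hypothetical. The globals it checks (`jQuery.fn.jquery`, `angular.version.full`, `Ember.VERSION`) are the libraries' own version properties.

```javascript
// Hypothetical fingerprint table: each entry names a library, a test that
// inspects a snapshot of the page's global object, and an optional version getter.
const fingerprints = [
  { name: 'jQuery', test: (w) => typeof w.jQuery === 'function', version: (w) => w.jQuery.fn && w.jQuery.fn.jquery },
  { name: 'AngularJS', test: (w) => typeof w.angular === 'object' && w.angular !== null, version: (w) => w.angular.version && w.angular.version.full },
  { name: 'Ember', test: (w) => typeof w.Ember === 'object' && w.Ember !== null, version: (w) => w.Ember.VERSION },
];

// Returns [{ name, version }] for every fingerprint whose test passes.
// version is null when the entry has no getter or the getter finds nothing.
function detectLibraries(win, table = fingerprints) {
  return table
    .filter((fp) => fp.test(win))
    .map((fp) => ({ name: fp.name, version: fp.version ? fp.version(win) || null : null }));
}

// "Writing your own" fingerprint here just means appending an entry (hypothetical library).
const custom = fingerprints.concat([
  { name: 'MyWidget', test: (w) => Boolean(w.MyWidget) },
]);

// Plain object standing in for a window snapshot taken from a rendered page.
const fakeWindow = {
  jQuery: Object.assign(function () {}, { fn: { jquery: '2.1.4' } }),
  angular: { version: { full: '1.4.0' } },
  MyWidget: {},
};

console.log(detectLibraries(fakeWindow));
console.log(detectLibraries(fakeWindow, custom).map((r) => r.name));
```

Keeping fingerprints as data rather than code paths means custom detectors need no changes to the detection function itself.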
## Installation
This was tested on Node 0.12.x. It can be used as a module (via its exports) or run as a command-line script.
```sh
npm install deep-scrape
# or clone the repository and run it as a script:
git clone https://github.com/zackiles/deep-scrape
```
## Use Case
- You are scraping JavaScript-heavy websites (Angular, Ember, Browserify).
- You don't mind trading a bit of performance for more detailed scraping data.
- You would like to find potential DOM sinks and sources on your pages (possibly for vulnerability scanning).
- You need the most detailed metadata, metrics, and analytics for your scraped pages.
- You would like to fingerprint the technologies a site or page likely uses.
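As a rough illustration of what "finding DOM sinks and sources" means, the sketch below scans a script's source text for well-known source and sink names and reports the lines they appear on. It is not how Deep Scrape performs this analysis: the `SOURCES` and `SINKS` lists, `findMatches`, and `findSourcesAndSinks` are hypothetical and far from exhaustive. A plain text scan like this also yields false positives (for example, matches inside comments or strings) that an AST-based analysis could rule out.

```javascript
// Hypothetical, non-exhaustive lists of common DOM XSS sources and sinks.
const SOURCES = ['location.hash', 'location.search', 'location.href', 'document.URL', 'document.referrer', 'window.name'];
const SINKS = ['innerHTML', 'outerHTML', 'document.write', 'eval', 'setTimeout', 'setInterval'];

// Escape regex metacharacters (e.g. the "." in "location.hash").
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns [{ name, lines }] with the 1-based line numbers where each name appears.
function findMatches(code, names) {
  const lines = code.split('\n');
  const hits = [];
  for (const name of names) {
    // Word boundaries avoid matching e.g. "document.writeln" for "document.write".
    const re = new RegExp('\\b' + escapeRegExp(name) + '\\b');
    const found = [];
    lines.forEach((line, i) => {
      if (re.test(line)) found.push(i + 1);
    });
    if (found.length > 0) hits.push({ name, lines: found });
  }
  return hits;
}

function findSourcesAndSinks(code) {
  return { sources: findMatches(code, SOURCES), sinks: findMatches(code, SINKS) };
}

// A tiny script where attacker-controlled data (the URL fragment) flows into innerHTML.
const script = [
  'var q = location.hash.slice(1);',
  "document.getElementById('out').innerHTML = q;",
].join('\n');

console.log(JSON.stringify(findSourcesAndSinks(script)));
```

A script that contains both a source and a sink is only a candidate for review; confirming an actual data flow between them requires deeper analysis.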