Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mc256/node-static-webpage-crawler
download entire website with its directory structure.
- Host: GitHub
- URL: https://github.com/mc256/node-static-webpage-crawler
- Owner: mc256
- License: mit
- Created: 2017-06-17T21:33:17.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-04-24T18:23:18.000Z (over 5 years ago)
- Last Synced: 2024-11-08T23:55:13.514Z (3 months ago)
- Topics: cache-server, crawler, nodejs, static-site
- Language: JavaScript
- Homepage:
- Size: 16.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: readme.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# static-webpage-crawler
This is a very simple web crawler. You can use it to download your ENTIRE website. You may also combine this small program with Nginx to build a reverse proxy for your website.
**This package requires Node.js 8.**
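To illustrate the idea, downloading a page while preserving the directory structure amounts to mapping the URL path onto the filesystem. The following is a minimal sketch, not this package's actual implementation; `savePage` and the example URL are made up for illustration:
```
// Minimal sketch: fetch one page and write it under a cache directory
// that mirrors the URL's path, so the on-disk layout matches the site.
// (Illustrative only; uses Node 10+ APIs such as recursive mkdirSync.)
const https = require('https');
const fs = require('fs');
const path = require('path');
const { URL } = require('url');

function savePage(pageUrl, cacheDir) {
  const { hostname, pathname } = new URL(pageUrl);
  // Directory-style URLs get an index file, like the --index-page option.
  const rel = pathname.endsWith('/') ? pathname + 'index.html' : pathname;
  const target = path.join(cacheDir, hostname, rel);

  fs.mkdirSync(path.dirname(target), { recursive: true });
  https.get(pageUrl, (res) => res.pipe(fs.createWriteStream(target)));
}

savePage('https://example.com/docs/', 'cache/');
// -> writes cache/example.com/docs/index.html
```
A full crawler would additionally parse each downloaded page for links and repeat this for every URL it finds, which is where the --thread concurrency option below comes in.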
## Install
```
npm i static-webpage-crawler
```
## Use it
```
node static-webpage-crawler --url=YOURDOMAIN --cache-dir=YOURDIRECTORY/
```
### Required Arguments
**-u, --url** Target URL.
**-c, --cache-dir** Directory for all the cache files.
### Options
**-t, --thread** Although Node.js is single-threaded, the crawler is not. (default is 5)
**-a, --customized-ua** A customized user agent header to identify this crawler.
**-i, --index-page** Default index page. (index.html)
**-e, --default-page-extension** Default page extension. Do not forget the PERIOD in front of it. (.html)
**--use-http** Use a deprecated, insecure HTTP connection. (not recommended)
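For example, combining the documented flags (the domain, directory, and user-agent string are placeholders):
```
node static-webpage-crawler --url=example.com --cache-dir=cache/ --thread=10 --customized-ua="my-crawler/1.0"
```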
You may also use it with Nginx:
```
server {
    ...
    root YOURDIRECTORY;

    location / {
        try_files /$host$request_uri
                  /$host$request_uri.html
                  /$host$request_uri/index.html
                  /$host${request_uri}index.html
                  @pass_proxy;
    }

    location @pass_proxy {
        proxy_pass https://backend;
        ...
    }
    ...
}
```
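In this setup, Nginx serves the crawled copy directly from YOURDIRECTORY whenever one of the try_files candidates exists on disk, and only falls back to the @pass_proxy upstream for pages the crawler has not cached.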
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.