https://github.com/mc256/node-static-webpage-crawler

download entire website with its directory structure.

cache-server crawler nodejs static-site

Host: GitHub
URL: https://github.com/mc256/node-static-webpage-crawler
Owner: mc256
License: mit
Created: 2017-06-17T21:33:17.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2019-04-24T18:23:18.000Z (about 6 years ago)
Last Synced: 2025-02-25T16:11:30.796Z (3 months ago)
Topics: cache-server, crawler, nodejs, static-site
Language: JavaScript
Homepage:
Size: 16.6 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md

# static-webpage-crawler

This is a very simple web crawler. You can use it to download your ENTIRE website. You can also combine this small program with Nginx to build a reverse proxy for your website.

**This package requires Node.js 8.**

## Install

```
npm i static-webpage-crawler
```

## Use it

```
node static-webpage-crawler --url=YOURDOMAIN --cache-dir=YOURDIRECTORY/
```

### Required Arguments

**-u, --url** Target URL.

**-c, --cache-dir** Directory for all the cache files.

### Options

**-t, --thread** Number of crawler threads. Although Node.js is single-threaded, the web crawler is not.
(default: 5)

**-a, --customized-ua** A custom User-Agent header value that identifies this crawler.

**-i, --index-page** Default index page.
(default: index.html)

**-e, --default-page-extension** Default page extension. Do not forget the leading period.
(default: .html)

**--use-http** Use an insecure HTTP connection. (not recommended)
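
To illustrate what a `--thread` limit can mean in single-threaded Node.js, here is a minimal sketch of a concurrency-limited task runner. It is not taken from this package's source: `runPool`, `worker`, and the claim-the-next-item strategy are assumptions chosen for illustration, and the real crawler may schedule its requests differently.

```javascript
// Hypothetical sketch: run async tasks with at most `limit` in flight at once.
// `runPool` and `worker` are illustrative names, not part of this package's API.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner keeps claiming the next unprocessed item until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  // Start up to `limit` runners; together they drain the shared list.
  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // results keep the same order as `items`
}
```

With a hypothetical `fetchPage(url)` function that returns a promise, `runPool(urls, 5, fetchPage)` would keep at most five requests open at a time, which matches the default of 5 above.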

You may also use it with Nginx:

```
server {
    ...
    root YOURDIRECTORY;

    location / {
        # Serve the cached copy if one exists; otherwise fall back to the backend.
        try_files /$host$request_uri
                  /$host$request_uri.html
                  /$host$request_uri/index.html
                  /$host${request_uri}index.html
                  @pass_proxy;
    }

    location @pass_proxy {
        proxy_pass https://backend;
        ...
    }
    ...
}
```
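
The `try_files` lookups above suggest that cached pages live under `YOURDIRECTORY/<host>/<path>`, with the index page or default extension appended where the URL has none. The sketch below shows one way such a mapping could work. The layout is inferred from the Nginx example only, and `cachePathFor` and its parameters are hypothetical names, so the crawler's actual file naming may differ.

```javascript
const path = require('path');
const { URL } = require('url');

// Hypothetical sketch: map a page URL to a file path inside the cache directory.
// The <cacheDir>/<host>/<path> layout is an assumption inferred from try_files.
function cachePathFor(pageUrl, cacheDir, indexPage = 'index.html', defaultExt = '.html') {
  const { hostname, pathname } = new URL(pageUrl);
  let relative = pathname;

  if (relative.endsWith('/')) {
    // Directory-style URL: store it as the index page.
    relative += indexPage;
  } else if (path.posix.extname(relative) === '') {
    // No file extension: append the default page extension.
    relative += defaultExt;
  }

  // POSIX-style joins keep the result identical on every platform.
  return path.posix.join(cacheDir, hostname, relative);
}

console.log(cachePathFor('https://example.com/docs/intro', 'cache'));
// prints "cache/example.com/docs/intro.html"
```

Under these assumptions, a request for `/docs/intro` on `example.com` is matched by the second `try_files` entry (`/$host$request_uri.html`).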

## License

This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.