Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/k1LoW/utsusemi
A tool to generate a static website by crawling the original site.
- Host: GitHub
- URL: https://github.com/k1LoW/utsusemi
- Owner: k1LoW
- Created: 2017-05-26T01:27:25.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-11-28T02:21:19.000Z (almost 7 years ago)
- Last Synced: 2024-05-28T13:28:08.413Z (5 months ago)
- Topics: api, aws, aws-lambda, crawler, s3-website, serverless, serverless-framework
- Language: JavaScript
- Homepage:
- Size: 190 KB
- Stars: 30
- Watchers: 5
- Forks: 2
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# utsusemi [![Build Status](https://travis-ci.org/k1LoW/utsusemi.svg?branch=master)](https://travis-ci.org/k1LoW/utsusemi)
![logo](logo.png)
utsusemi = "[空蝉](http://ffxiclopedia.wikia.com/wiki/Utsusemi)"
A tool to generate a static website by crawling the original site.
## Framework used
- Serverless Framework :zap:
## How to deploy
### :octocat: STEP 1. Clone
```console
$ git clone https://github.com/k1LoW/utsusemi.git
$ cd utsusemi
$ npm install
```
### :pencil: STEP 2. Set environment variables OR Edit config.yml
Set environment variables, or copy [`config.example.yml`](config.example.yml) to `config.yml` and edit it.
The environment variable / `config.yml` documentation is [here](docs/env.md) :book: .
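For example, the copy-and-edit route looks like this (a minimal sketch; any editor works, and the actual setting names, such as the crawl target host, are documented in [docs/env.md](docs/env.md)):
```console
$ cp config.example.yml config.yml
$ vim config.yml  # set the crawl target host and other settings from docs/env.md
```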
### :rocket: STEP 3. Deploy to AWS
```console
$ AWS_PROFILE=XXxxXXX npm run deploy
```
The deploy output includes the API endpoint URLs and the `UtsusemiWebsiteURL`.
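For reference, the tail of a Serverless Framework deploy looks roughly like this (an illustrative sketch, not verbatim utsusemi output; the stage and endpoint paths are assumptions based on the `/v0/...` URLs used below):
```console
Service Information
service: utsusemi
stage: v0
region: ap-northeast-1
endpoints:
  GET - https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in
  GET - https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/purge
Stack Outputs
UtsusemiWebsiteURL: http://xxxxxxxxxx.s3-website-ap-northeast-1.amazonaws.com
```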
#### :bomb: Destroy utsusemi
Run the following command.
```console
$ AWS_PROFILE=XXxxXXX npm run destroy
```
## Usage
### Start crawling `/in?path={startPath}&depth={crawlDepth}`
Start crawling the configured `targetHost`.
```console
$ curl 'https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=3'
```
Then access the `UtsusemiWebsiteURL`.
#### `force` option
Disable the content cache:
```console
$ curl 'https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=3&force=1'
```
### Purge crawling queue `/purge`
Cancel crawling.
```console
$ curl https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/purge
```
### Delete object of utsusemi content `/delete?prefix={objectPrefix}`
Delete the S3 objects under the given prefix.
```console
$ curl 'https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/delete?prefix=/'
```
### Show crawling queue status `/status`
```console
$ curl https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/status
```
### Set N crawling actions `POST /nin`
Start crawling the `targetHost` with N crawling actions defined in a JSON payload.
```console
$ curl -X POST -H "Content-Type: application/json" -d @nin-sample.json https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/nin
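# nin-sample.json is the JSON payload that defines the N crawling actions to
# enqueue; see the sample file in the cloned repository for the exact format.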
```
## Architecture
![Architecture](architecture.png)
### Crawling rule
- HTML -> `depth = depth - 1` (each followed HTML link consumes one level of depth; see the worked example below)
- CSS -> resource requests made from within the CSS do not consume `depth`
- Other content -> end of the chain ( `depth = 0` )
- 403, 404, 410 -> delete the corresponding S3 object
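For example, starting a crawl with `depth=2` proceeds roughly like this (an illustrative sketch; the paths are hypothetical):
```console
$ curl 'https://xxxxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/v0/in?path=/&depth=2'
# /            -> HTML, fetched with depth=2; its links are followed
# /about.html  -> HTML linked from /, fetched with depth=1; its links are followed
# /blog/1.html -> HTML linked from /about.html, depth=0; crawling ends here
# /css/main.css and the assets it references -> fetched without consuming depth
```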