https://github.com/wabarc/cairn

NPM package and CLI tool for saving web page as single HTML file
archive base64 cli html html-files internet-archive javascript memento mhtml node nodejs npm-package obelisk single-file typescript wayback wayback-archiver webpage


# Cairn

```text

    //   ) )
   //          ___    ( )  __      __
  //        //   ) ) / / //  ) ) //   ) )
 //        //   / / / / //      //   / /
((____/ / ((___( ( / / //      //   / /

```

Cairn is an npm package and CLI tool for saving a web page as a single HTML file.
It is a TypeScript implementation of [Obelisk](https://github.com/go-shiori/obelisk).

## Features

## Usage

### As CLI tool

```sh
npm install -g @wabarc/cairn
```

```sh
$ cairn -h

Usage: cairn [options] url1 [url2]...[urlN]

CLI tool for saving web page as single HTML file

Options:
  -v, --version                         output the current version
  -o, --output                          path to save archival result
  -u, --user-agent                      set custom user agent
  -p, --proxy [protocol://]host[:port]  use this proxy
  -t, --timeout                         maximum time (in second) request timeout
  --no-js                               disable JavaScript
  --no-css                              disable CSS styling
  --no-embeds                           remove embedded elements (e.g iframe)
  --no-medias                           remove media elements (e.g img, audio)
  -h, --help                            display help for command
```
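
Several of these flags have similarly named fields in the options object accepted by the package API (`userAgent`, `proxy`, `timeout`, `disableJS`, `disableCSS`, `disableEmbeds`, `disableMedias`). The following is a minimal sketch of how such a translation could look. The `flagsToOptions` helper is hypothetical and not part of Cairn, and the flag-to-field pairing is an assumption based on the matching names only.

```javascript
// Hypothetical helper, not part of Cairn: translate CLI-style flags
// into an options object. The pairing of flags to fields is assumed
// from their names, not taken from Cairn's source.
function flagsToOptions(argv) {
  const options = {};
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case '-u':
      case '--user-agent':
        options.userAgent = argv[++i];
        break;
      case '-p':
      case '--proxy':
        options.proxy = argv[++i];
        break;
      case '-t':
      case '--timeout':
        options.timeout = Number(argv[++i]);
        break;
      case '--no-js':
        options.disableJS = true;
        break;
      case '--no-css':
        options.disableCSS = true;
        break;
      case '--no-embeds':
        options.disableEmbeds = true;
        break;
      case '--no-medias':
        options.disableMedias = true;
        break;
      default:
        // Everything else (URLs, -o, -v, -h) is ignored in this sketch.
        break;
    }
  }
  return options;
}

console.log(flagsToOptions(['--no-js', '-t', '30', 'https://www.github.com']));
```

Fields that are not set by a flag are left out of the result, so whatever defaults the library applies stay in effect.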

### As npm package

```sh
npm install @wabarc/cairn
```

```javascript
import { Cairn } from '@wabarc/cairn';
// const cairn = require('@wabarc/cairn');

// Page to archive.
const url = 'https://www.github.com';

const cairn = new Cairn();

cairn
  .request({ url: url })
  .options({ userAgent: 'Cairn/2.0.0', proxy: 'socks5://127.0.0.1:1080' })
  .archive()
  .then((archived) => {
    // `webpage` is a cheerio root; `html()` serializes it to a string.
    console.log(archived.url, archived.webpage.html());
  })
  .catch((err) => console.warn(`${url} => ${JSON.stringify(err)}`));
```

#### Instance methods

##### cairn#request({ url: string }): this
##### cairn#options({}): this
- proxy?: string;
- userAgent?: string;
- disableJS?: boolean;
- disableCSS?: boolean;
- disableEmbeds?: boolean;
- disableMedias?: boolean;
- timeout?: number;

##### cairn#archive(): Promise\<Archived\>
##### cairn#Archived
- url: string;
- webpage: cheerio.Root;
- status: 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
- contentType: 'text/html' | 'text/plain' | 'text/*';
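
Because `Archived` carries both `status` and `contentType`, a caller can check them before saving the page. The sketch below shows one possible policy. The `isSavable` helper is hypothetical, and the `sample` object is a stub whose `webpage` only imitates the `html()` method of a cheerio root.

```javascript
// Hypothetical helper, not part of Cairn: accept only successful HTML results.
function isSavable(archived) {
  const ok = archived.status === 200;
  const isHtml = archived.contentType === 'text/html';
  return ok && isHtml;
}

// Stub standing in for a real result; `webpage` only mimics cheerio's html().
const sample = {
  url: 'https://github.com/',
  webpage: { html: () => '<html><body>stub</body></html>' },
  status: 200,
  contentType: 'text/html',
};

if (isSavable(sample)) {
  console.log(`${sample.url}: ${sample.webpage.html().length} characters`);
}
```

A stricter or looser policy (for example, also accepting `text/plain`) would only change the two comparisons.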

#### Request Params

##### request

```javascript
{
  // `url` is archival target.
  url: 'https://www.github.com'
}
```

##### options

```javascript
{
  proxy: 'socks5://127.0.0.1:1080',
  userAgent: 'Cairn/2.0.0',

  disableJS: true,
  disableCSS: false,
  disableEmbeds: false,
  disableMedias: true,

  timeout: 30
}
```

#### Response Schema

For v1.x:

The `archive` method returns the webpage body as a string.

For v2.x:

```javascript
{
  url: 'https://github.com/',
  webpage: cheerio.Root,
  status: 200,
  contentType: 'text/html'
}
```
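
Code that has to work with both response shapes can normalize them to an HTML string. This is a minimal sketch under the assumption that a v1.x result is a plain string and a v2.x result exposes `webpage.html()`. The `toHtml` helper is hypothetical and not part of Cairn.

```javascript
// Hypothetical helper, not part of Cairn: return the archived HTML
// as a string for either a v1.x or a v2.x result.
function toHtml(result) {
  if (typeof result === 'string') {
    return result; // v1.x: the body is already a string
  }
  if (result && result.webpage && typeof result.webpage.html === 'function') {
    return result.webpage.html(); // v2.x: serialize the cheerio root
  }
  throw new TypeError('Unrecognized archive result');
}

// Stubs standing in for real results from each major version.
const v1Result = '<html><body>v1</body></html>';
const v2Result = {
  url: 'https://github.com/',
  webpage: { html: () => '<html><body>v2</body></html>' },
  status: 200,
  contentType: 'text/html',
};

console.log(toHtml(v1Result));
console.log(toHtml(v2Result));
```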

## License

Cairn has been licensed under MIT since version 3.0.0. Versions 1 and 2 are licensed under GPL 3.0.

This software is released under the terms of the MIT License. See the [LICENSE](https://github.com/wabarc/cairn/blob/main/LICENSE) file for details.