Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wabarc/cairn
NPM package and CLI tool for saving web page as single HTML file
- Host: GitHub
- URL: https://github.com/wabarc/cairn
- Owner: wabarc
- License: mit
- Created: 2020-10-08T07:18:16.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-05-28T23:18:42.000Z (8 months ago)
- Last Synced: 2024-05-29T13:55:11.439Z (8 months ago)
- Topics: archive, base64, cli, html, html-files, internet-archive, javascript, memento, mhtml, node, nodejs, npm-package, obelisk, single-file, typescript, wayback, wayback-archiver, webpage
- Language: TypeScript
- Homepage:
- Size: 536 KB
- Stars: 37
- Watchers: 4
- Forks: 2
- Open Issues: 26
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome - wabarc/cairn - NPM package and CLI tool for saving web page as single HTML file (TypeScript)
README
# Cairn
```text
// ) )
// ___ ( ) __ __
// // ) ) / / // ) ) // ) )
// // / / / / // // / /
((____/ / ((___( ( / / // // / /
```
Cairn is an npm package and CLI tool for saving a web page as a single HTML file. It is a TypeScript implementation of [Obelisk](https://github.com/go-shiori/obelisk).
## Features
## Usage
### As CLI tool
```sh
npm install -g @wabarc/cairn
```
```sh
$ cairn -h
Usage: cairn [options] url1 [url2]...[urlN]

CLI tool for saving web page as single HTML file

Options:
  -v, --version     output the current version
  -o, --output      path to save archival result
  -u, --user-agent  set custom user agent
  -p, --proxy       [protocol://]host[:port] use this proxy
  -t, --timeout     maximum time (in second) request timeout
  --no-js           disable JavaScript
  --no-css          disable CSS styling
  --no-embeds       remove embedded elements (e.g iframe)
  --no-medias       remove media elements (e.g img, audio)
  -h, --help        display help for command
```
### As npm package
```sh
npm install @wabarc/cairn
```
```javascript
import { Cairn } from '@wabarc/cairn';
// CommonJS alternative:
// const { Cairn } = require('@wabarc/cairn');

// `url` is the archival target.
const url = 'https://www.github.com';

const cairn = new Cairn();
cairn
  .request({ url: url })
  .options({ userAgent: 'Cairn/2.0.0', proxy: 'socks5://127.0.0.1:1080' })
  .archive()
  .then((archived) => {
    // `archived.webpage` is a cheerio root; html() serializes it to a string.
    console.log(archived.url, archived.webpage.html());
  })
  .catch((err) => console.warn(`${url} => ${JSON.stringify(err)}`));
```
#### Instance methods
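The signatures listed in this section show that `request` and `options` return `this` while `archive` returns a Promise, so the calls can be chained and awaited. A minimal sketch of archiving several URLs in sequence follows. The helper `archiveAll` and the caller-supplied factory `createCairn` are assumptions made for illustration and are not part of the package.

```javascript
// Hypothetical helper (not part of @wabarc/cairn): archives several URLs one
// after another. `createCairn` is any function returning an object with the
// chainable request/options/archive methods documented in this README,
// for example () => new Cairn().
async function archiveAll(createCairn, urls, options = {}) {
  const results = [];
  for (const url of urls) {
    try {
      // request() and options() return `this`, so the calls chain;
      // archive() returns a Promise, so it can be awaited.
      const archived = await createCairn()
        .request({ url })
        .options(options)
        .archive();
      results.push({ url, ok: true, archived });
    } catch (err) {
      // Record the failure and continue with the remaining URLs.
      results.push({ url, ok: false, error: err });
    }
  }
  return results;
}
```

With the package installed, a call might look like `archiveAll(() => new Cairn(), urls, { timeout: 30 })`. Creating a fresh instance per URL is a design choice of this sketch, made so that no state is shared between requests.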
##### cairn#request({ url: string }): this
##### cairn#options({}): this
- proxy?: string;
- userAgent?: string;
- disableJS?: boolean;
- disableCSS?: boolean;
- disableEmbeds?: boolean;
- disableMedias?: boolean;
- timeout?: number;
##### cairn#archive(): Promise
##### cairn#Archived
- url: string;
- webpage: cheerio.Root;
- status: 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
- contentType: 'text/html' | 'text/plain' | 'text/*';
#### Request Params
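The `cairn#options` field list above gives a type for each optional field, which makes it straightforward to check an options object before passing it on. A minimal sketch follows. The `validateOptions` helper and the `OPTION_TYPES` table are hypothetical names introduced here, and the package itself may validate its input differently or not at all.

```javascript
// Hypothetical validator (not part of @wabarc/cairn). The field names and
// types mirror the `cairn#options` list in this README; every field is optional.
const OPTION_TYPES = {
  proxy: 'string',
  userAgent: 'string',
  disableJS: 'boolean',
  disableCSS: 'boolean',
  disableEmbeds: 'boolean',
  disableMedias: 'boolean',
  timeout: 'number',
};

// Returns a list of human-readable problems; an empty list means the object
// only uses documented fields with the documented types.
function validateOptions(options) {
  const problems = [];
  for (const [key, value] of Object.entries(options)) {
    if (!Object.prototype.hasOwnProperty.call(OPTION_TYPES, key)) {
      problems.push(`unknown option: ${key}`);
      continue;
    }
    const expected = OPTION_TYPES[key];
    // An explicitly undefined value is treated the same as an omitted field.
    if (value !== undefined && typeof value !== expected) {
      problems.push(`${key} should be a ${expected}, got ${typeof value}`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every mistake in one pass.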
##### request
```javascript
{
  // `url` is archival target.
  url: 'https://www.github.com'
}
```
##### options
```javascript
{
  proxy: 'socks5://127.0.0.1:1080',
  userAgent: 'Cairn/2.0.0',
  disableJS: true,
  disableCSS: false,
  disableEmbeds: false,
  disableMedias: true,
  timeout: 30
}
```
#### Response Schema
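This section describes two response shapes: a plain string in v1.x and an object with a `webpage` field in v2.x. Code that must work with either version could normalize the result first. A minimal sketch follows. The `extractHtml` helper is hypothetical (not part of the package), and it relies on `webpage.html()` as used in the usage example.

```javascript
// Hypothetical helper (not part of @wabarc/cairn): returns the archived HTML
// as a string for either response shape documented in this README.
function extractHtml(result) {
  // v1.x: archive() yields the webpage body as a string.
  if (typeof result === 'string') {
    return result;
  }
  // v2.x: archive() yields an object whose `webpage` is a cheerio root;
  // its html() method serializes the document to a string.
  if (result && result.webpage && typeof result.webpage.html === 'function') {
    return result.webpage.html();
  }
  // Anything else is not a shape this README documents.
  throw new TypeError('Unrecognized archive result');
}
```

A caller could then write `cairn.archive().then((result) => save(extractHtml(result)))`, where `save` stands for whatever the application does with the HTML.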
For v1.x, the `archive` method returns the webpage body as a string.

For v2.x, it returns an object with the following shape:
```javascript
{
  url: 'https://github.com/',
  webpage: cheerio.Root,
  status: 200,
  contentType: 'text/html'
}
```
## License
Cairn was re-licensed under MIT as of version 3.0.0. If you are using version 1 or 2, note that those versions are licensed under GPL 3.0.
This software is released under the terms of the MIT License. See the [LICENSE](https://github.com/wabarc/cairn/blob/main/LICENSE) file for details.