
# HtmlArk

[![License: MIT][project-license-badge]][project-license]
[![Donate][paypal-donations-badge]][paypal-donations]
[![Docs][docfx-docs-badge]][docfx-docs]
[![NuGet version][nuget-version-badge]][nuget-package]
[![NuGet downloads][nuget-downloads-badge]][nuget-package]

[![standard-readme compliant][github-standard-readme-badge]][github-standard-readme]
[![GitLab pipeline status][gitlab-pipeline-status-badge]][gitlab-pipelines]
[![Quality gate][sonar-quality-gate-badge]][sonar-website]
[![Code coverage][sonar-coverage-badge]][sonar-website]
[![Renovate enabled][renovate-badge]][renovate-website]

Embeds images, fonts, CSS, JavaScript, audio and video into an HTML file.
Resources are embedded using [data URIs][mdn-data-uris].
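For reference, a data URI is just a media type, a `;base64` marker and the Base64-encoded bytes of the resource. The following C# sketch builds one with plain .NET APIs. `ToDataUri` is a hypothetical helper written for illustration; it is not part of HtmlArk's API and does not claim to mirror its implementation.

```cs
using System;
using System.Text;

// Builds a data URI (RFC 2397) of the form "data:<media type>;base64,<payload>".
// Hypothetical helper for illustration; it is not part of HtmlArk's API.
static string ToDataUri(byte[] content, string mediaType) =>
    $"data:{mediaType};base64,{Convert.ToBase64String(content)}";

// A tiny text resource keeps the output readable; a packed page embeds
// images, fonts and the like in the same way.
string dataUri = ToDataUri(Encoding.UTF8.GetBytes("Hello"), "text/plain");
Console.WriteLine(dataUri); // data:text/plain;base64,SGVsbG8=

// Base64 turns every 3 input bytes into 4 characters, so payloads grow by about a third.
Console.WriteLine(Convert.ToBase64String(new byte[300]).Length); // 400
```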

This project is a .NET rewrite of the [Python project of the same name][github-htmlark],
from which the command line interface has been copied to ease interoperability.

Most disclaimers that applied to the original library apply here too:

- :warning: **HtmlArk should be used with trusted HTML pages only or in a sandboxed environment.**
Untrusted HTML pages might contain resource links that HtmlArk accepts as valid
but that pose a serious security risk to your organization.
- **HtmlArk works with static HTML pages only.**
If an image or other resource is loaded with JavaScript, HtmlArk won't even know it exists.
- **Most browsers support data URIs, but as usual IE support might be less than ideal.**
Check data URI compatibility on [Can I use][caniuse-data-uris].
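To see why resources loaded by JavaScript stay invisible, consider what a tool that reads markup without executing scripts can observe. The C# sketch below is a deliberately naive illustration: the hypothetical `FindStaticSources` helper uses a regular expression instead of a real HTML parser, and it is not HtmlArk's code. It finds the image written in the markup but not the one created by the script.

```cs
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Deliberately naive static scan: collects src values written literally in
// <img>, <script>, <audio>, <video> and <source> tags. No script is executed.
// Hypothetical helper for illustration; it is not HtmlArk's parser.
static string[] FindStaticSources(string html) =>
    Regex.Matches(html, @"<(?:img|script|audio|video|source)\b[^>]*\bsrc\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)
        .Select(m => m.Groups[1].Value)
        .ToArray();

// One image is in the markup; the other only exists once a browser runs the script.
const string page = @"<img src=""logo.png"">
<script>
  var banner = new Image();
  banner.src = ""banner.png"";
  document.body.appendChild(banner);
</script>";

string[] found = FindStaticSources(page);
Console.WriteLine(string.Join(", ", found)); // logo.png
```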

HtmlArk can be used to "pack" web pages into single HTML files.
However, HtmlArk is not a crawler, so it must be paired with one in order to pack entire websites.
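On the crawler side, the core step is link discovery. The C# sketch below is a hypothetical illustration of that step and is not part of HtmlArk: `FindSameSiteLinks` extracts `href` values with a naive regular expression, resolves them against the page URI and keeps unique HTTP(S) links on the same host. Each resulting URI could then be packed on its own. A real crawler would also use a proper HTML parser and remember which pages it has already visited.

```cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical crawler step: resolves href values against the page URI and keeps
// unique http(s) links on the same host, with #fragments stripped.
static IReadOnlyList<Uri> FindSameSiteLinks(Uri pageUri, string html) =>
    Regex.Matches(html, @"<a\b[^>]*\bhref\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)
        .Select(m => Uri.TryCreate(pageUri, m.Groups[1].Value, out Uri? link) ? link : null)
        .Where(link => link is not null
            && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps)
            && link.Host == pageUri.Host)
        .Select(link => new Uri(link!.GetLeftPart(UriPartial.Query)))
        .Distinct()
        .ToList();

var page = new Uri("https://www.example.com/docs/index.html");
const string html = @"<a href=""intro.html"">Intro</a>
<a href=""/about#team"">About</a>
<a href=""https://other.example.org/"">External</a>
<a href=""intro.html"">Intro again</a>";

foreach (Uri link in FindSameSiteLinks(page, html))
{
    // Each page found here would be handed to HtmlArk, one at a time.
    Console.WriteLine(link);
}
```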

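:bulb: If you plan to serve packed web pages, please remember to turn on GZIP compression.
Base64-encoded data URIs are about a third larger than the resources they embed,
and GZIP compression usually recovers a good part of that overhead, reducing download size.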
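To get a feel for the effect, the C# sketch below compresses a synthetic packed page with `System.IO.Compression.GZipStream`. The page, its random 30 KB payload and the `GzipSize` helper are made up for illustration; random bytes stand in for already-compressed images, and with real pages the savings depend on how compressible the markup and the embedded resources are.

```cs
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Size in bytes of the GZIP-compressed UTF-8 encoding of a string.
static long GzipSize(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using var output = new MemoryStream();
    // leaveOpen: true keeps the MemoryStream usable after the GZipStream is disposed.
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzip.Write(raw, 0, raw.Length);
    } // Disposing the GZipStream flushes the final compressed block.
    return output.Length;
}

// Hypothetical packed page: 30 KB of random bytes embedded as a data URI.
var resource = new byte[30_000];
new Random(42).NextBytes(resource);
string packedHtml =
    $"<html><body><img src=\"data:application/octet-stream;base64,{Convert.ToBase64String(resource)}\"></body></html>";

Console.WriteLine($"Resource: {resource.Length} bytes");
Console.WriteLine($"Packed HTML: {Encoding.UTF8.GetByteCount(packedHtml)} bytes");
Console.WriteLine($"Packed HTML, gzipped: {GzipSize(packedHtml)} bytes");
```

Comparing the last two lines shows how much of the Base64 overhead compression wins back for this kind of payload.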

## Table of Contents

- [Install](#install)
- [Usage](#usage)
  - [Library](#library)
  - [Tool](#tool)
- [Maintainers](#maintainers)
- [Contributing](#contributing)
  - [Editing](#editing)
  - [Restoring dependencies](#restoring-dependencies)
  - [Running tests](#running-tests)
- [License](#license)

## Install

NuGet package [PommaLabs.HtmlArk][nuget-package] is available for download:

```bash
dotnet add package PommaLabs.HtmlArk
```

The [HtmlArk .NET tool][nuget-package-tool] can be installed with the following command:

```bash
dotnet tool install PommaLabs.HtmlArk.Tool
```

## Usage

### Library

To use HtmlArk as a library, add the following `using` directive to your source file:

```cs
using PommaLabs.HtmlArk;
```

Then, for example, a web page can be archived like this:

```cs
using Microsoft.Extensions.Logging.Abstractions; // Provides NullLogger.

IHtmlArchiver htmlArchiver = new HtmlArchiver(NullLogger.Instance); // Logging disabled.
string archivedHtml = await htmlArchiver.ArchiveAsync(new Uri("https://www.example.com/"));
```

If you use dependency injection, it can be registered this way:

```cs
services.AddHtmlArchiver(); // Maps IHtmlArchiver to HtmlArchiver as a singleton.
```

### Tool

The HtmlArk .NET tool accepts the following command line arguments:

```txt
-M, --http-client-max-resource-size    How many bytes can be downloaded for each resource.

-T, --http-client-timeout              Timeout of the internal HTTP client.

-A, --ignore-audios                    Ignores audios during archival.

-C, --ignore-css                       Ignores style sheets during archival.

-E, --ignore-errors                    Ignores unreadable resources.

-I, --ignore-images                    Ignores images during archival.

-J, --ignore-js                        Ignores external JavaScript during archival.

-V, --ignore-videos                    Ignores videos during archival.

-m, --minify                           Minifies output HTML.

-o, --output                           Output file path. If not specified, output will be written to STDOUT.

-v, --verbose                          Prints detailed information during HTML archival.

--help                                 Display this help screen.

--version                              Display version information.

input (pos. 0)                         Required. Input URI or file path.
```

The interface is modeled after the [original Python project][github-htmlark],
so switching between the two should be straightforward.

## Maintainers

[@pomma89][gitlab-pomma89].

## Contributing

MRs accepted.

Small note: If editing the README, please conform to the [standard-readme][github-standard-readme] specification.

### Editing

[Visual Studio Code][vscode-website], with the [Remote Containers extension][vscode-remote-containers],
is the recommended way to work on this project.

A development container has been configured with all required tools.

[Visual Studio Community][vs-website] is also supported
and an updated solution file, `htmlark.sln`, has been provided.

### Restoring dependencies

When opening the development container, dependencies should be automatically restored.

Dependencies can also be restored manually with the following command:

```bash
dotnet restore
```

### Running tests

Tests can be run with the following command:

```bash
dotnet test
```

Tests can also be run with the following command, which also collects coverage information:

```bash
./build.sh --target run-tests
```

## License

MIT © 2020-2024 [PommaLabs Team and Contributors][pommalabs-website]

[caniuse-data-uris]: https://caniuse.com/datauri
[docfx-docs]: https://htmlark-docs.pommalabs.xyz/
[docfx-docs-badge]: https://img.shields.io/badge/DocFX-OK-green?style=flat-square
[github-htmlark]: https://github.com/BitLooter/htmlark
[github-standard-readme]: https://github.com/RichardLitt/standard-readme
[github-standard-readme-badge]: https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square
[gitlab-pipeline-status-badge]: https://gitlab.com/pommalabs/htmlark/badges/main/pipeline.svg?style=flat-square
[gitlab-pipelines]: https://gitlab.com/pommalabs/htmlark/pipelines
[gitlab-pomma89]: https://gitlab.com/pomma89
[mdn-data-uris]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs
[nuget-downloads-badge]: https://img.shields.io/nuget/dt/PommaLabs.HtmlArk?style=flat-square
[nuget-package]: https://www.nuget.org/packages/PommaLabs.HtmlArk/
[nuget-package-tool]: https://www.nuget.org/packages/PommaLabs.HtmlArk.Tool/
[nuget-version-badge]: https://img.shields.io/nuget/v/PommaLabs.HtmlArk?style=flat-square
[paypal-donations]: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=ELJWKEYS9QGKA
[paypal-donations-badge]: https://img.shields.io/badge/Donate-PayPal-important.svg?style=flat-square
[pommalabs-website]: https://pommalabs.xyz/
[project-license]: https://gitlab.com/pommalabs/htmlark/-/blob/main/LICENSE
[project-license-badge]: https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square
[renovate-badge]: https://img.shields.io/badge/renovate-enabled-brightgreen.svg?style=flat-square
[renovate-website]: https://renovate.whitesourcesoftware.com/
[sonar-coverage-badge]: https://img.shields.io/sonar/coverage/pommalabs_htmlark?server=https%3A%2F%2Fsonarcloud.io&sonarVersion=8&style=flat-square
[sonar-quality-gate-badge]: https://img.shields.io/sonar/quality_gate/pommalabs_htmlark?server=https%3A%2F%2Fsonarcloud.io&sonarVersion=8&style=flat-square
[sonar-website]: https://sonarcloud.io/dashboard?id=pommalabs_htmlark
[vs-website]: https://visualstudio.microsoft.com/
[vscode-remote-containers]: https://code.visualstudio.com/docs/remote/containers
[vscode-website]: https://code.visualstudio.com/