https://gitlab.com/pommalabs/htmlark
HtmlArk packs a webpage into a single HTML file: https://htmlark-docs.pommalabs.xyz/
- Host: gitlab.com
- URL: https://gitlab.com/pommalabs/htmlark
- Owner: pommalabs
- License: mit
- Created: 2020-08-23T13:52:46.255Z
- Default Branch: main
- Last Synced: 2025-08-01T14:52:11.733Z
- Topics: audios, css, data, embed, fonts, html, images, javascript, uri, videos
- Stars: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# HtmlArk
[![License: MIT][project-license-badge]][project-license]
[![Donate][paypal-donations-badge]][paypal-donations]
[![Docs][docfx-docs-badge]][docfx-docs]
[![NuGet version][nuget-version-badge]][nuget-package]
[![NuGet downloads][nuget-downloads-badge]][nuget-package]
[![standard-readme compliant][github-standard-readme-badge]][github-standard-readme]
[![GitLab pipeline status][gitlab-pipeline-status-badge]][gitlab-pipelines]
[![Quality gate][sonar-quality-gate-badge]][sonar-website]
[![Code coverage][sonar-coverage-badge]][sonar-website]
[![Renovate enabled][renovate-badge]][renovate-website]
Embeds images, fonts, audio, video, CSS and JavaScript into an HTML file.
Resources are embedded using [data URIs][mdn-data-uris].
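As a rough illustration of what embedding with a data URI means, the following sketch Base64-encodes the raw bytes of a resource and prefixes them with a media type. It uses only the .NET base class library; `ToDataUri`, `GuessMediaType` and the small extension table are hypothetical helpers written for this example, not part of HtmlArk's API, and the library's real encoding logic may differ.

```cs
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of HtmlArk): builds a data URI of the form
// "data:<media type>;base64,<payload>" from the raw bytes of a resource.
static string ToDataUri(byte[] content, string mediaType) =>
    $"data:{mediaType};base64,{Convert.ToBase64String(content)}";

// Hypothetical, deliberately tiny extension-to-media-type table; unknown
// extensions fall back to the generic binary media type.
static string GuessMediaType(string path)
{
    var mediaTypes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        [".png"] = "image/png",
        [".jpg"] = "image/jpeg",
        [".svg"] = "image/svg+xml",
        [".css"] = "text/css",
        [".js"] = "text/javascript",
        [".woff2"] = "font/woff2",
    };
    return mediaTypes.TryGetValue(Path.GetExtension(path), out var mediaType)
        ? mediaType
        : "application/octet-stream";
}

// Stand-in bytes; a real caller would read them from a file or an HTTP response.
byte[] logoBytes = { 0x01, 0x02, 0x03 };
string dataUri = ToDataUri(logoBytes, GuessMediaType("logo.png"));
Console.WriteLine(dataUri); // prints "data:image/png;base64,AQID"
```

The resulting string can be placed wherever the page expected a URL, for example in an `<img>` element's `src` attribute, so the external file becomes part of the HTML document itself.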
This project is a .NET rewrite of the [homonymous Python project][github-htmlark],
from which it copies the command line interface in order to ease interoperability.
Most disclaimers that applied to the original library apply here too:
- :warning: **HtmlArk should be used with trusted HTML pages only or in a sandboxed environment.**
Untrusted HTML pages might contain resource links that HtmlArk considers valid
but that pose a serious security risk to your organization, for example links to local files or to hosts on an internal network.
- **HtmlArk works with static HTML pages only.**
If an image or other resource is loaded with JavaScript, HtmlArk won't even know it exists.
- **Most browsers support data URIs, but Internet Explorer support might be limited.**
Check data URI compatibility on [Can I use][caniuse-data-uris].
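The static-only limitation can be illustrated with a deliberately naive scanner that collects the `src` attributes of `<img>` tags using a regular expression. This is a hypothetical sketch, not how HtmlArk parses HTML: it only shows that a resource whose URL is assigned by a script never appears as an attribute in the markup, so a static tool has nothing to fetch.

```cs
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, deliberately naive scanner: finds src="..." attributes of <img> tags only.
// A real HTML parser would be far more robust; this only illustrates static analysis.
static string[] FindStaticImageSources(string html) =>
    Regex.Matches(html, @"<img\b[^>]*\bsrc\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)
        .Select(match => match.Groups[1].Value)
        .ToArray();

// The second image is created at runtime by JavaScript, so its URL never appears
// as an attribute in the static markup.
const string page = @"
<img src=""logo.png"">
<script>
  const img = document.createElement(""img"");
  img.src = ""banner.png"";
  document.body.appendChild(img);
</script>";

Console.WriteLine(string.Join(", ", FindStaticImageSources(page))); // prints "logo.png"
```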
HtmlArk can be used to "pack" web pages into single HTML files.
However, it processes one page at a time and is not a crawler, so it must be paired with one in order to pack entire websites.
:bulb: If you plan to serve packed web pages, please remember to turn on GZIP compression.
Binary resources embedded as Base64 data URIs grow by about a third, and compression recovers much of that overhead, reducing download size.
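The effect of compression on a packed page can be checked with a minimal sketch that uses only the .NET base class library: Base64 turns every 3 bytes into 4 characters, and GZIP wins much of that back because Base64 text uses only 64 distinct characters. The seeded random payload is an assumption standing in for an already-compressed image, and `Gzip`/`Gunzip` are hypothetical helpers; in production the web server would normally compress responses on the fly.

```cs
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: GZIP-compresses text, as a web server does when compression is on.
static byte[] Gzip(string text)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        gzip.Write(bytes, 0, bytes.Length);
    } // Disposing the GZipStream flushes the last compressed block into 'output'.
    return output.ToArray();
}

// Hypothetical helper: decompresses GZIP data back into text.
static string Gunzip(byte[] data)
{
    using var input = new GZipStream(new MemoryStream(data), CompressionMode.Decompress);
    using var reader = new StreamReader(input, Encoding.UTF8);
    return reader.ReadToEnd();
}

// Stand-in for an already-compressed 30,000-byte image (random bytes, fixed seed).
var image = new byte[30_000];
new Random(42).NextBytes(image);

string base64 = Convert.ToBase64String(image); // 40,000 characters: 4 for every 3 bytes.
string packedHtml = $"<img src=\"data:image/png;base64,{base64}\">";
byte[] compressed = Gzip(packedHtml);

Console.WriteLine($"Image: {image.Length} bytes, Base64: {base64.Length} chars, "
    + $"packed HTML: {packedHtml.Length} chars, gzipped: {compressed.Length} bytes");
```

Random bytes are used on purpose: a payload of zeros would compress dramatically and overstate the benefit, while real images are usually already compressed.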
## Table of Contents
- [Install](#install)
- [Usage](#usage)
  - [Library](#library)
  - [Tool](#tool)
- [Maintainers](#maintainers)
- [Contributing](#contributing)
  - [Editing](#editing)
  - [Restoring dependencies](#restoring-dependencies)
  - [Running tests](#running-tests)
- [License](#license)
## Install
NuGet package [PommaLabs.HtmlArk][nuget-package] is available for download:
```bash
dotnet add package PommaLabs.HtmlArk
```
[HtmlArk .NET tool][nuget-package-tool] can be installed with the following command:
```bash
dotnet tool install --global PommaLabs.HtmlArk.Tool
```
## Usage
### Library
As a library, HtmlArk can be imported with the following `using` directive:
```cs
using PommaLabs.HtmlArk;
```
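Then it can be used like this, for example:
```cs
using System;
using Microsoft.Extensions.Logging.Abstractions; // Provides NullLogger.
IHtmlArchiver htmlArchiver = new HtmlArchiver(NullLogger.Instance);
string archivedHtml = await htmlArchiver.ArchiveAsync(new Uri("https://www.example.com/")); // The packed page as a single HTML string.
```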
If you use dependency injection, it can be registered this way:
```cs
services.AddHtmlArchiver(); // Maps IHtmlArchiver to HtmlArchiver as a singleton.
```
### Tool
The HtmlArk .NET tool accepts the following command line arguments:
```txt
-M, --http-client-max-resource-size    How many bytes can be downloaded for each resource.
-T, --http-client-timeout              Timeout of the internal HTTP client.
-A, --ignore-audios                    Ignores audios during archival.
-C, --ignore-css                       Ignores style sheets during archival.
-E, --ignore-errors                    Ignores unreadable resources.
-I, --ignore-images                    Ignores images during archival.
-J, --ignore-js                        Ignores external JavaScript during archival.
-V, --ignore-videos                    Ignores videos during archival.
-m, --minify                           Minifies output HTML.
-o, --output                           Output file path. If not specified, output will be written to STDOUT.
-v, --verbose                          Prints detailed information during HTML archival.
--help                                 Display this help screen.
--version                              Display version information.
input (pos. 0)                         Required. Input URI or file path.
```
The interface is modeled after the [original Python project][github-htmlark],
so switching between the two should be easy.
## Maintainers
[@pomma89][gitlab-pomma89].
## Contributing
MRs accepted.
Small note: if editing the README, please conform to the [standard-readme][github-standard-readme] specification.
### Editing
[Visual Studio Code][vscode-website], with the [Remote Containers extension][vscode-remote-containers],
is the recommended way to work on this project:
a development container has been configured with all the required tools.
[Visual Studio Community][vs-website] is also supported,
and an up-to-date solution file, `htmlark.sln`, is provided.
### Restoring dependencies
When the development container is opened, dependencies should be restored automatically.
Otherwise, they can be restored with the following command:
```bash
dotnet restore
```
### Running tests
Tests can be run with the following command:
```bash
dotnet test
```
Alternatively, the following command runs the tests and also collects coverage information:
```bash
./build.sh --target run-tests
```
## License
MIT © 2020-2024 [PommaLabs Team and Contributors][pommalabs-website]
[caniuse-data-uris]: https://caniuse.com/datauri
[docfx-docs]: https://htmlark-docs.pommalabs.xyz/
[docfx-docs-badge]: https://img.shields.io/badge/DocFX-OK-green?style=flat-square
[github-htmlark]: https://github.com/BitLooter/htmlark
[github-standard-readme]: https://github.com/RichardLitt/standard-readme
[github-standard-readme-badge]: https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square
[gitlab-pipeline-status-badge]: https://gitlab.com/pommalabs/htmlark/badges/main/pipeline.svg?style=flat-square
[gitlab-pipelines]: https://gitlab.com/pommalabs/htmlark/pipelines
[gitlab-pomma89]: https://gitlab.com/pomma89
[mdn-data-uris]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs
[nuget-downloads-badge]: https://img.shields.io/nuget/dt/PommaLabs.HtmlArk?style=flat-square
[nuget-package]: https://www.nuget.org/packages/PommaLabs.HtmlArk/
[nuget-package-tool]: https://www.nuget.org/packages/PommaLabs.HtmlArk.Tool/
[nuget-version-badge]: https://img.shields.io/nuget/v/PommaLabs.HtmlArk?style=flat-square
[paypal-donations]: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=ELJWKEYS9QGKA
[paypal-donations-badge]: https://img.shields.io/badge/Donate-PayPal-important.svg?style=flat-square
[pommalabs-website]: https://pommalabs.xyz/
[project-license]: https://gitlab.com/pommalabs/htmlark/-/blob/main/LICENSE
[project-license-badge]: https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square
[renovate-badge]: https://img.shields.io/badge/renovate-enabled-brightgreen.svg?style=flat-square
[renovate-website]: https://renovate.whitesourcesoftware.com/
[sonar-coverage-badge]: https://img.shields.io/sonar/coverage/pommalabs_htmlark?server=https%3A%2F%2Fsonarcloud.io&sonarVersion=8&style=flat-square
[sonar-quality-gate-badge]: https://img.shields.io/sonar/quality_gate/pommalabs_htmlark?server=https%3A%2F%2Fsonarcloud.io&sonarVersion=8&style=flat-square
[sonar-website]: https://sonarcloud.io/dashboard?id=pommalabs_htmlark
[vs-website]: https://visualstudio.microsoft.com/
[vscode-remote-containers]: https://code.visualstudio.com/docs/remote/containers
[vscode-website]: https://code.visualstudio.com/