https://github.com/untitaker/hyperlink
Very fast link checker for CI.
- Host: GitHub
- URL: https://github.com/untitaker/hyperlink
- Owner: untitaker
- License: mit
- Created: 2020-10-04T10:42:59.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-24T15:55:38.000Z (20 days ago)
- Last Synced: 2024-10-25T13:50:22.637Z (19 days ago)
- Topics: 404, broken-anchors, broken-link-finder, ci, fast, link-checker, link-checking, linter, linters, rust, validators
- Language: Rust
- Homepage:
- Size: 331 KB
- Stars: 175
- Watchers: 2
- Forks: 10
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# hyperlink
A command-line tool to find broken links in your static site.
* **Fast.** [docs.sentry.io](https://github.com/getsentry/sentry-docs) produces
  1.1 GB of HTML files. `hyperlink` handles this amount of data in 4 seconds on
  a MacBook Pro 2018. See [Alternatives](#alternatives) for a performance comparison.

* **Pay for what you need.** By default, `hyperlink` checks for hard 404s in
  internal links only. Anything beyond that is opt-in. See [Options](#options)
  for a list of features to enable.

* **Maps errors back to source files.** If your static site was created from
  Markdown files, `hyperlink` can try to find the original broken link by
  fuzzy-matching the content around it. See the [`--sources` option](#options).

* Supports traversing file-system paths only, no arbitrary URLs. Hyperlink does not know how to make network calls.
  However, hyperlink does have tools to [extract external links](#external-links).

* Does not honor `robots.txt`. A broken link is still broken for users even if
  not indexed by Google.

* Does not parse CSS files, as broken links in CSS have not been a practical
  concern for us. We are concerned about broken links in the page content, not
  the chrome around it.

* Only supports UTF-8 encoded HTML files.

## Installation and Usage
[Download the latest binary](https://github.com/untitaker/hyperlink/releases) and:
```bash
# Check a folder of HTML
./hyperlink public/

# Also validate anchors
./hyperlink public/ --check-anchors

# src/ is a folder of Markdown. Show original Markdown file paths in errors
./hyperlink public/ --sources src/
```

### GitHub action

```yaml
- uses: untitaker/[email protected]
  with:
    args: public/ --sources src/
```

### NPM

```bash
npm install -g @untitaker/hyperlink
hyperlink public/ --sources src/
```

### Docker

```bash
docker run -v $PWD:/check ghcr.io/untitaker/hyperlink:0.1.43 /check/public/ --sources /check/src/

# specific commit
docker run -v $PWD:/check ghcr.io/untitaker/hyperlink:sha-82ca78c /check/public/ --sources /check/src
```

[See all available tags](https://github.com/untitaker/hyperlink/pkgs/container/hyperlink)

### From source
```bash
cargo install --locked hyperlink # latest stable release
cargo install --locked --git https://github.com/untitaker/hyperlink # latest git SHA
```

## Options

When invoked without options, `hyperlink` only checks for 404s of internal
links. However, it can do more.

* `-j/--jobs`: How many threads to spawn for parsing HTML. By default
  `hyperlink` will attempt to saturate your CPU.

* `--check-anchors`: Opt-in, check for validity of anchors on pages. Broken
  anchors are considered warnings, meaning that `hyperlink` will `exit 2` if
  there are *only* broken anchors but no hard 404s.

* `--sources`: A folder of Markdown files that were the input for the HTML
  `hyperlink` has to check. This is used to provide better error messages that
  point at the actual file to edit. `hyperlink` does very simple content-based
  matching to figure out which Markdown files may have been involved in the
  creation of an HTML file.

  Why not just crawl and validate links in Markdown at this point? Answer:

  * There are countless proprietary extensions to Markdown out there for
    creating intra-page links that are generally not supported by link checking
    tools.

  * The structure of your Markdown content does not necessarily match the
    structure of your HTML (i.e. what the user actually sees). With this setup,
    `hyperlink` does not have to assume anything about your build pipeline.

* `--github-actions`: Emit [GitHub actions
  errors](https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#setting-an-error-message),
  i.e. add error messages in-line to PR diffs. This is only useful with
  `--sources` set.

  If you are using `hyperlink` through the GitHub action this option is already
  set. It is only useful if you are downloading/building and running hyperlink
  yourself in CI.

## Exit codes

* `exit 1`: There have been errors (hard 404s)
* `exit 2`: There have been only warnings (broken anchors)

## External links

Hyperlink does not know how to check external links, but it gives you some tools to extract them.
```
hyperlink dump-external-links build/
# http://example.com/myurl
# ...
```

This allows you to plug in your own logic that fits the requirements for your
site (special handling for social networks, custom URI schemes, ...):

```
# filter for HTTP URLs and turn off all link-checking for our social media
# handles, as twitter.com is unreliable and we already know those links are correct.

hyperlink dump-external-links build/ | \
  rg '^https?://' | \
  rg -v '^https://twitter.com/untitaker' | \
  xargs -P20 -I{} bash -c 'curl -ILf "{}" &> /dev/null || (echo "{}" && exit 1)'
```

...and allows hyperlink to focus on its main job of traversing and parsing HTML.

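If `rg` is not available in your CI image, the filtering step of the pipeline above could be done with shell built-ins. The following is only a sketch under that assumption: `filter_external_links` is a hypothetical helper name, not something `hyperlink` ships. It reads URLs from stdin, keeps the HTTP(S) ones, and drops any URL that starts with one of the prefixes passed as arguments.

```bash
# Hypothetical helper (not part of hyperlink): read one URL per line from
# stdin, keep only http:// and https:// URLs, and skip every URL that starts
# with one of the literal prefixes given as arguments.
filter_external_links() {
  local url prefix skip
  while IFS= read -r url; do
    # Drop anything that is not an HTTP(S) URL (mailto:, custom schemes, ...)
    case "$url" in
      http://*|https://*) ;;
      *) continue ;;
    esac

    # Drop URLs matching any of the ignored prefixes
    skip=0
    for prefix in "$@"; do
      case "$url" in
        "$prefix"*) skip=1; break ;;
      esac
    done

    if [ "$skip" -eq 0 ]; then
      printf '%s\n' "$url"
    fi
  done
}

# Possible usage, feeding the result to a checker of your choice:
#   hyperlink dump-external-links build/ | \
#     filter_external_links 'https://twitter.com/untitaker'
```

Unlike the `rg` version, the prefixes here are matched literally rather than as regular expressions, which avoids having to escape dots in hostnames.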
## Alternatives
*(Roughly ranked by performance, determined by some unserious benchmark. This
section contains partially dated measurements and is not continuously updated
with regard to either performance or feature set.)*

None of the listed alternatives have an equivalent to `hyperlink`'s `--sources`
and `--github-actions` features.

* [lychee](https://github.com/lycheeverse/lychee), like `hyperlink`, is a great
  choice for obscenely large static sites. Additionally it can check
  external/outbound links. An invocation of `lychee --offline public/` is more or
  less equivalent to `hyperlink public/`.

* [liche](https://github.com/raviqqe/liche) seems to be fairly fast, but is
  unmaintained.

* [htmltest](https://github.com/wjdp/htmltest) seems to be fairly fast as well,
  and is more of a general-purpose HTML linting tool.

* [muffet](https://github.com/raviqqe/muffet) seems to have similar performance
  to `htmltest`. We tested `muffet` with
  [`http-server`](https://www.npmjs.com/package/http-server) and webfsd without
  noticing a change in timings.

* [linkcheck](https://github.com/filiph/linkcheck) is faster than `linkchecker`
  but still quite slow on large sites.

  We tried `linkcheck` together with
  [`http-server`](https://www.npmjs.com/package/http-server) on localhost,
  although that does not seem to be the bottleneck at all.

* [wummel/linkchecker](https://wummel.github.io/linkchecker/) seems to be
  fairly feature-rich, but was a non-starter due to performance. This applies
  to countless other link checkers we tried that are not mentioned here.

## Testimonials

> We use Hyperlink to check for dead links on
> [Graphviz's static-site user documentation](https://graphviz.org/), because:
>
> * Hyperlink is *blazingly* fast, checking 700 HTML pages in 220ms (default) and
> 850ms (with `--check-anchors`).
> * Hyperlink's single-binary release, with no library dependencies,
> was trivial to integrate into our [continuous integration tests](https://gitlab.com/graphviz/graphviz.gitlab.io/-/blob/5dcfa637b7df17e3a1b821f3d7e9de8f5f82544b/.gitlab-ci.yml#L27).
> * High coverage: Hyperlink immediately spotted over a thousand broken page
> links within both `` tags and HTML redirects, and a further 62 broken
> anchor-links with `--check-anchors`.
> * Hyperlink's design decision to crawl only static files (avoiding HTTP),
> avoids test flakiness from network requests, allowing me to confidently
> block merging if Hyperlink reports an error.
>
> In conclusion, Hyperlink fills the "static site continuous testing" niche
> really nicely.

-- Mark Hansen, Graphviz documentation maintainer

## License
Licensed under the MIT license, see [`./LICENSE`](./LICENSE).