{"id":13580154,"url":"https://github.com/untitaker/hyperlink","last_synced_at":"2025-05-15T15:04:10.016Z","repository":{"id":37035579,"uuid":"301103252","full_name":"untitaker/hyperlink","owner":"untitaker","description":"Very fast link checker for CI.","archived":false,"fork":false,"pushed_at":"2025-04-10T14:46:19.000Z","size":345,"stargazers_count":191,"open_issues_count":5,"forks_count":12,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-10T10:52:00.061Z","etag":null,"topics":["404","broken-anchors","broken-link-finder","ci","fast","link-checker","link-checking","linter","linters","rust","validators"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/untitaker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-04T10:42:59.000Z","updated_at":"2025-04-24T13:52:13.000Z","dependencies_parsed_at":"2023-02-12T07:01:26.523Z","dependency_job_id":"c66a74f1-162b-4ff7-88fa-436430218aec","html_url":"https://github.com/untitaker/hyperlink","commit_stats":{"total_commits":351,"total_committers":7,"mean_commits":"50.142857142857146","dds":"0.27635327635327633","last_synced_commit":"06a65ffc7e4ec33040690022d9b94bcb4b64878b"},"previous_names":[],"tags_count":46,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/untitaker%2Fhyperlink","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/untitaker%2Fhyperlink/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/untitaker%2Fhyperlink/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/untitaker%2Fhyperlink/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/untitaker","download_url":"https://codeload.github.com/untitaker/hyperlink/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254292042,"owners_count":22046426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["404","broken-anchors","broken-link-finder","ci","fast","link-checker","link-checking","linter","linters","rust","validators"],"created_at":"2024-08-01T15:01:48.096Z","updated_at":"2025-05-15T15:04:09.966Z","avatar_url":"https://github.com/untitaker.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# hyperlink\n\nA command-line tool to find broken links in your static site.\n\n* **Fast.** [docs.sentry.io](https://github.com/getsentry/sentry-docs) produces\n  1.1 GB of HTML files. `hyperlink` handles this amount of data in 4 seconds on\n  a MacBook Pro 2018. See [Alternatives](#alternatives) for a performance comparison.\n\n* **Pay for what you need.** By default, `hyperlink` checks for hard 404s in\n  internal links only. Anything beyond that is opt-in. See [Options](#options)\n  for a list of features to enable.\n\n* **Maps back errors to source files.** If your static site was created from\n  Markdown files, `hyperlink` can try to find the original broken link by\n  fuzzy-matching the content around it. 
See the [`--sources` option](#options).\n\n* Supports traversing file-system paths only, no arbitrary URLs. Hyperlink does not know how to make network calls.\n\n  However, hyperlink does have tools to [extract external links](#external-links).\n\n* Does not honor `robots.txt`. A broken link is still broken for users even if\n  not indexed by Google.\n\n* Does not parse CSS files, as broken links in CSS have not been a practical\n  concern for us. We are concerned about broken links in the page content, not\n  the chrome around it.\n\n* Only supports UTF-8 encoded HTML files.\n\n## Installation and Usage\n\n[Download the latest binary](https://github.com/untitaker/hyperlink/releases) and:\n\n```bash\n# Check a folder of HTML\n./hyperlink public/\n\n# Also validate anchors\n./hyperlink public/ --check-anchors\n\n# src/ is a folder of Markdown. Show original Markdown file paths in errors\n./hyperlink public/ --sources src/\n```\n\n### GitHub action\n\n```yaml\n- uses: untitaker/hyperlink@0.1.44\n  with:\n    args: public/ --sources src/\n```\n\n### NPM\n\n```bash\nnpm install -g @untitaker/hyperlink\nhyperlink public/ --sources src/\n```\n\n### Docker\n\n```bash\ndocker run -v $PWD:/check ghcr.io/untitaker/hyperlink:0.1.44 /check/public/ --sources /check/src/\n\n# specific commit\ndocker run -v $PWD:/check ghcr.io/untitaker/hyperlink:sha-82ca78c /check/public/ --sources /check/src\n```\n\n[See all available tags](https://github.com/untitaker/hyperlink/pkgs/container/hyperlink)\n\n### From source\n\n```bash\ncargo install --locked hyperlink  # latest stable release\ncargo install --locked --git https://github.com/untitaker/hyperlink  # latest git SHA\n```\n\n## Options\n\nWhen invoked without options, `hyperlink` only checks for 404s of internal\nlinks. However, it can do more.\n\n* `-j/--jobs`: How many threads to spawn for parsing HTML. By default\n  `hyperlink` will attempt to saturate your CPU.\n\n* `--check-anchors`: Opt-in, check for validity of anchors on pages. 
Broken\n  anchors are considered warnings, meaning that `hyperlink` will `exit 2` if\n  there are *only* broken anchors but no hard 404s.\n\n* `--sources`: A folder of markdown files that were the input for the HTML\n  `hyperlink` has to check. This is used to provide better error messages that\n  point at the actual file to edit. `hyperlink` does very simple content-based\n  matching to figure out which markdown files may have been involved in the\n  creation of an HTML file.\n\n  Why not just crawl and validate links in Markdown at this point? Answer:\n\n  * There are countless proprietary extensions to markdown out there for\n    creating intra-page links that are generally not supported by link checking\n    tools.\n\n  * The structure of your markdown content does not necessarily match the\n    structure of your HTML (i.e. what the user actually sees). With this setup,\n    `hyperlink` does not have to assume anything about your build pipeline.\n\n* `--github-actions`: Emit [GitHub actions\n  errors](https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#setting-an-error-message),\n  i.e. add error messages in-line to PR diffs. This is only useful with\n  `--sources` set.\n\n  If you are using `hyperlink` through the GitHub action this option is already\n  set. It is only useful if you are downloading/building and running hyperlink\n  yourself in CI.\n\n## Exit codes\n\n* `exit 1`: There have been errors (hard 404s)\n* `exit 2`: There have been only warnings (broken anchors)\n\n## External links\n\nHyperlink does not know how to check external links, but it gives you some tools to\nextract them. 
Output is just the external URLs, one per line.\n\n```\nhyperlink dump-external-links build/\n```\n\nOutput:\n\n```\nhttp://example.com/myurl\n...\n```\n\nThis allows you to plug in your own logic that fits the requirements for your\nsite (special handling for social networks, custom URI schemes, ...):\n\n```\n# filter for HTTP URLs and turn off all link-checking for our social media\n# handles, as twitter.com is unreliable and we already know those links are correct.\n\nhyperlink dump-external-links build/ | \\\n  rg '^https?://' | \\\n  rg -v '^https://twitter.com/untitaker' | \\\n  xargs -P20 -I{} bash -c 'curl -ILf \"{}\" \u0026\u003e /dev/null || (echo \"{}\" \u0026\u0026 exit 1)'\n```\n\n...and allows hyperlink to focus on its main job of traversing and parsing HTML.\n\n## Alternatives\n\n*(roughly ranked by performance, determined by some unserious benchmark. This\nsection contains partially dated measurements and is not continuously updated\nwith regard to either performance or featureset)*\n\nNone of the listed alternatives have an equivalent to `hyperlink`'s `--sources`\nand `--github-actions` features.\n\n* [lychee](https://github.com/lycheeverse/lychee), like `hyperlink`, is a great\n  choice for obscenely large static sites. Additionally it can check\n  external/outbound links. An invocation of `lychee --offline public/` is more or\n  less equivalent to `hyperlink public/`.\n\n* [liche](https://github.com/raviqqe/liche) seems to be fairly fast, but is\n  unmaintained.\n\n* [htmltest](https://github.com/wjdp/htmltest) seems to be fairly fast as well,\n  and is more of a general-purpose HTML linting tool.\n\n* [muffet](https://github.com/raviqqe/muffet) seems to have performance similar\n  to `htmltest`. 
We tested `muffet` with\n  [`http-server`](https://www.npmjs.com/package/http-server) and webfsd without\n  noticing a change in timings.\n\n* [linkcheck](https://github.com/filiph/linkcheck) is faster than `linkchecker`\n  but still quite slow on large sites.\n\n  We tried `linkcheck` together with\n  [`http-server`](https://www.npmjs.com/package/http-server) on localhost,\n  although that does not seem to be the bottleneck at all.\n\n* [wummel/linkchecker](https://wummel.github.io/linkchecker/) seems to be\n  fairly feature-rich, but was a non-starter due to performance. This applies\n  to the countless other link checkers we tried that are not mentioned here.\n\n## Testimonials\n\n\u003e We use Hyperlink to check for dead links on\n\u003e [Graphviz's static-site user documentation](https://graphviz.org/), because:\n\u003e \n\u003e * Hyperlink is *blazingly* fast, checking 700 HTML pages in 220ms (default) and\n\u003e   850ms (with `--check-anchors`).\n\u003e * Hyperlink's single-binary release, with no library dependencies,\n\u003e   was trivial to integrate into our [continuous integration tests](https://gitlab.com/graphviz/graphviz.gitlab.io/-/blob/5dcfa637b7df17e3a1b821f3d7e9de8f5f82544b/.gitlab-ci.yml#L27).\n\u003e * High coverage: Hyperlink immediately spotted over a thousand broken page\n\u003e   links within both `\u003ca\u003e` tags and HTML redirects, and a further 62 broken\n\u003e   anchor-links with `--check-anchors`.\n\u003e * Hyperlink's design decision to crawl only static files (avoiding HTTP),\n\u003e   avoids test flakiness from network requests, allowing me to confidently\n\u003e   block merging if Hyperlink reports an error.\n\u003e\n\u003e In conclusion, Hyperlink fills the \"static site continuous testing\" niche\n\u003e really nicely.\n\n-- Mark Hansen, Graphviz documentation maintainer\n\n## License\n\nLicensed under the MIT license, see 
[`./LICENSE`](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funtitaker%2Fhyperlink","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funtitaker%2Fhyperlink","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funtitaker%2Fhyperlink/lists"}