{"id":13707165,"url":"https://github.com/aim42/htmlSanityCheck","last_synced_at":"2025-05-06T00:30:35.286Z","repository":{"id":16820701,"uuid":"19579904","full_name":"aim42/htmlSanityCheck","owner":"aim42","description":"Standalone (batch- and command-line) and Gradle-plugin html sanity checker - detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like. ","archived":false,"fork":false,"pushed_at":"2025-04-17T20:18:43.000Z","size":12851,"stargazers_count":73,"open_issues_count":72,"forks_count":50,"subscribers_count":6,"default_branch":"develop","last_synced_at":"2025-04-17T20:28:38.640Z","etag":null,"topics":["arc42","doctoolchain","gradle","gradle-plugin","groovy","hacktoberfest","hacktoberfest2024","html","sonarcloud","sonarqube","test"],"latest_commit_sha":null,"homepage":"","language":"Groovy","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aim42.png","metadata":{"files":{"readme":"README.adoc","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2014-05-08T16:06:30.000Z","updated_at":"2025-04-17T20:13:20.000Z","dependencies_parsed_at":"2023-01-11T20:25:28.430Z","dependency_job_id":"66291718-01f6-431f-9cc0-0d198ee945b1","html_url":"https://github.com/aim42/htmlSanityCheck","commit_stats":null,"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim42%2FhtmlSanityCheck","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim42%2FhtmlSanityCheck/tags","releases_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/aim42%2FhtmlSanityCheck/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aim42%2FhtmlSanityCheck/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aim42","download_url":"https://codeload.github.com/aim42/htmlSanityCheck/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252598161,"owners_count":21774212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arc42","doctoolchain","gradle","gradle-plugin","groovy","hacktoberfest","hacktoberfest2024","html","sonarcloud","sonarqube","test"],"created_at":"2024-08-02T22:01:23.418Z","updated_at":"2025-05-06T00:30:35.277Z","avatar_url":"https://github.com/aim42.png","language":"Groovy","funding_links":[],"categories":["General Architecture Topics","HTML"],"sub_categories":["Modeling and Documentation"],"readme":"= image:htmlsanitycheck-logo.png[HSC] Html Sanity Check (HSC)\n:doctype: book\ninclude::asciidoctor-config.ad[]\n\nifndef::xrefToCli[:xrefToCli: htmlSanityCheck-cli/README.adoc]\nifndef::xrefToGradlePlugin[:xrefToGradlePlugin: htmlSanityCheck-gradle-plugin/README.adoc]\nifndef::xrefToMavenPlugin[:xrefToMavenPlugin: htmlSanityCheck-maven-plugin/README.adoc]\n\nifdef::env-github[]\nTIP: Use the https://hsc.aim42.org/manual/10_manual.html[HSC Site] for a nicely rendered version of this manual.\nendif::env-github[]\n\n[.lead]\n====\nhttps://hsc.aim42.org[HTML Sanity Check] (HSC) provides some basic sanity checking on HTML files.\n\nIt can be helpful in case of 
HTML generated from, e.g., {asciidoc-url}[Asciidoctor],\nMarkdown or other formats -- as converters usually don't check for missing images\nor broken links.\n====\n\nimage:https://img.shields.io/badge/License-Apache%202.0-blue.svg[link=\"https://www.apache.org/licenses/LICENSE-2.0\"]\nimage:https://img.shields.io/badge/License-ccsa4-green.svg[link=\"https://creativecommons.org/licenses/by-sa/4.0/\"]\nimage:https://github.com/aim42/htmlSanityCheck/actions/workflows/gradle-build.yml/badge.svg[link=https://github.com/aim42/htmlSanityCheck/actions]\nimage:https://sonarcloud.io/api/project_badges/measure?project=aim42_htmlSanityCheck\u0026metric=alert_status[alt='Quality Gate Status',link=https://sonarcloud.io/project/overview?id=aim42_htmlSanityCheck]\nimage:https://jitpack.io/v/org.aim42.htmlSanityCheck/htmlSanityCheck.svg[alt='JitPack Build',link=https://jitpack.io/#org.aim42/htmlSanityCheck]\nimage:https://img.shields.io/maven-central/v/org.aim42.htmlSanityCheck/org.aim42.htmlSanityCheck.gradle.plugin[link=https://central.sonatype.com/search?q=org.aim42.htmlSanityCheck]\nimage:https://img.shields.io/github/issues/aim42/htmlSanityCheck.svg[link=\"https://github.com/aim42/htmlSanityCheck/issues\"]\n\n[[sec:usage,usage]]\n== Usage\n\nHSC can be currently used\n\n* As a xref:{xrefToGradlePlugin}#sec:usage[Gradle plugin], or\n* As a xref:{xrefToCli}#sec:usage[Command Line Interface tool] (CLI), or\n* As a xref:{xrefToMavenPlugin}#sec:usage[Maven Plugin], or\n* Programmatically from Java or other JVM languages (TBD).\n\n[[sec:installation]]\n== Installation\n\nDepending on your \u003c\u003csec:usage\u003e\u003e you have to\n\n* Install the xref:{xrefToGradlePlugin}#sec:installation[Gradle Plugin], or\n* Install the xref:{xrefToCli}#sec:installation[Command Line Interface tool] (CLI), or\n* Install the xref:{xrefToMavenPlugin}#sec:installation[Maven Plugin], or\n* Install the core library for programmatic use (TBD).\n\n[[sec:examples]]\n== Examples\n\nDepending on your 
\u003c\u003csec:usage\u003e\u003e find respective\n\n* xref:{xrefToGradlePlugin}#sec:examples[Gradle Plugin] examples, and\n* Core library examples (TBD).\n\n== Typical Output\n\n[cols=\"1,1\",width=\"50%\"]\n|===\n| The overall goal is to create neat and clear reports,\nshowing any errors found within HTML files — as shown in the adjoining figure.\n| image:sample-hsc-report.jpg[width=\"200\",link=\"{imagesdir}/sample-hsc-report.jpg\"\n(click on thumbnail for details)]\n|===\n\n== Types of Sanity Checks\n\n=== Broken Cross References (aka Broken Internal Links)\n\nFind all '\u003ca href=\"XYZ\"\u003e' where XYZ is not defined.\n\n.src/broken.html\n[source,html]\n----\n\u003ca href=\"#missing\"\u003einternal anchor\u003c/a\u003e\n...\n\u003ch2 id=\"missinG\"\u003eBookmark-Header\u003c/h2\u003e\n----\n\nIn this example, the bookmark is _misspelled_.\n\nUse checkerClass _BrokenCrossReferencesChecker_.\n\n=== Missing Image Files\n\nImages, referenced in `\u003cimg src=\"XYZ\"...` tags, refer to external files.\nThe plugin checks the existence of these files.\n\nUse checkerClass _MissingImageFilesChecker_.\n\n=== Multiple Definitions of Bookmarks or IDs\n\nIf a bookmark or ID is defined more than once, any link pointing to it becomes ambiguous.\n\nUse checkerClass _DuplicateIdChecker_.\n\n=== Missing Local Resources\n\nChecks that all local files (e.g., downloads) referenced from HTML exist.\n\nUse checkerClass _MissingLocalResourcesChecker_.\n\n=== Missing Alt-tags in Images\n\nImage-tags should contain an alt-attribute that the browser displays when the original image file cannot be found or cannot be rendered.\nHaving alt-attributes is a good and defensive style.\n\nUse checkerClass _MissingAltInImageTagsChecker_.\n\n=== Broken HTTP Links\n\nThe current version (derived from branch 1.0.0-RC-2) contains a simple implementation that identifies errors (status \u003e400) and warnings (status `1xx` or `3xx`).\n\nStatus code ranges are configurable (as some people might want some content behind paywalls NOT 
to result in errors).\n\nLocalhost or numerical IP addresses are currently NOT marked as suspicious.\n\nPlease comment if you have additional requirements.\n\nUse checkerClass _BrokenHttpLinksChecker_.\n\n=== Other types of external links\n\n`ftp`, `ntp` or other protocols are currently not checked, but should be.\n\n== Technical Documentation\n\nIn addition to checking HTML, this project serves as an example for https://arc42.de[arc42].\n\nPlease see our https://hsc.aim42.org/arc42/About-This-Docu.html[software architecture documentation].\n\n== Fundamentals\n\nThis tiny piece rests on incredible groundwork:\n\n* https://jsoup.org[Jsoup HTML parser] and analysis toolkit — robust and easy-to-use.\n\n* IntelliJ IDEA — my (Gernot) best (programming) friend.\n\n* Of course, Groovy, Gradle, JUnit and the Spock framework.\n\n== Ideas and Origin\n\n* The plugin heavily relies on code provided by {gradle-url}[Gradle].\n\n* Inspiration for code organization, implementation and testing of the plugin came from the {asciidoctor-gradle-plugin-url}[Asciidoctor-Gradle-Plugin] by https://github.com/aalmiray[Andres Almiray].\n\n* Code for string similarity calculation by\nhttps://github.com/rrice/java-string-similarity[Ralph Rice].\n\n* Implementation, maintenance and documentation by\n** Initially: {gernotstarke}[Gernot Starke],\n** Currently: {gerdaschemann}[Gerd Aschemann] and several other contributors.\n\n== Similar Projects\n\n* Once upon a time the https://github.com/rackerlabs/[racketeers] hosted `gradle-linkchecker-plugin`, which was an (open source) Gradle plugin.\nIt validated that all links in a local HTML file tree go out to other existing local files or remote web locations, creating a simple text file report.\n+\nCAUTION: However, as of 2024-08-14 they have deleted the repository (there seems to be a fork at https://github.com/leonard84/gradle-linkchecker-plugin[]).\n* It was perhaps based on a similar approach 
(https://github.com/JamaSoftwareEngineering/linkchecker-maven-plugin[linkchecker-maven-plugin]) for https://maven.apache.org[Maven].\n* https://bmuschko.com/blog/golang-with-gradle/[Benjamin Muschko] has created a (Go-based) command-line tool to check links, called https://github.com/bmuschko/link-verifier[link verifier].\n* https://github.com/gjtorikian/html-proofer[html-proofer] is written in Ruby and supports several usage scenarios (programmatic, CLI, and Docker).\n* https://github.com/wjdp/htmltest[htmltest] is also written in Go and claims to be fast compared to `html-proofer` (stay tuned; we plan to have HSC run fast with Graal).\n\n== Development \u0026 Contributing\n\n* Please report {project-issues}[issues or suggestions].\n\n* In case you want to check out, build, fork and/or contribute, take a look at our https://hsc.aim42.org/development/development-intro.html[Development Information].\n\n== Licence\n\nCurrently, code is published under the Apache-2.0 licence, documentation under Creative-Commons-Sharealike-4.0.\nSome day we'll unify that 😬.\n\n== Kudos\n\nBig thanks to image:structure101-logo.png[alt='Structure-101',link=\"https://structure101.com\"] for helping us analyze and restructure our code.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faim42%2FhtmlSanityCheck","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faim42%2FhtmlSanityCheck","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faim42%2FhtmlSanityCheck/lists"}