Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/aim42/htmlSanityCheck

Standalone (batch- and command-line) and Gradle-plugin html sanity checker - detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.
https://github.com/aim42/htmlSanityCheck

arc42 gradle gradle-plugin groovy hacktoberfest hacktoberfest2021 html test

Last synced: 12 days ago
JSON representation

Standalone (batch- and command-line) and Gradle-plugin html sanity checker - detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.

Lists

README

        

= image:./htmlsanitycheck-logo.png[Html-SC] Html Sanity Check
:version: 1.1.5

:plugin-url: https://github.com/aim42/htmlSanityCheck
:plugin-issues: https://github.com/aim42/htmlSanityCheck/issues

:asciidoctor-gradle-plugin-url: https://github.com/asciidoctor/asciidoctor-gradle-plugin

:asciidoc-url: http://asciidoctor.org
:gradle-url: https://gradle.org/

:gernotstarke: https://github.com/gernotstarke
:project: htmlSanityCheck
:project-url: https://github.com/aim42/htmlSanityCheck
:project-issues: https://github.com/aim42/htmlSanityCheck/issues
:project-bugs: https://github.com/aim42/htmlSanityCheck/issues?q=is%3Aopen+is%3Aissue+label%3Abug

ifdef::env-github[:outfilesuffix: .adoc]

This project provides some basic sanity checking on html files.

It can be helpful in case of html generated from e.g. {asciidoc-url}[Asciidoctor],
Markdown or other formats - as converters usually don't check for missing images
or broken links.

It can be used as Gradle plugin. Standalone Java and graphical UI
are planned for future releases.

image:https://img.shields.io/badge/License-ccsa4-green.svg[link="https://creativecommons.org/licenses/by-sa/4.0/"]
image:https://github.com/aim42/htmlSanityCheck/actions/workflows/gradle-build.yml/badge.svg[]

== Installation

Use the following snippet inside a Gradle build file:

.build.gradle
[source,groovy]
[subs="attributes"]
----
plugins {
id 'org.aim42.{project}' version '{version}'
}
----

OR

.build.gradle
[source,groovy]
[subs="attributes"]
----
buildscript {
repositories {
maven {
url "https://plugins.gradle.org/m2/"
}
}

dependencies {
classpath ('gradle.plugin.org.aim42:{project}:{version}')
}
}

apply plugin: 'org.aim42.{project}'
----

== Usage

The plugin adds a new task named `htmlSanityCheck`.

This task exposes a few properties as part of its configuration:

[horizontal]
sourceDir:: (mandatory) directory where the html files are located. Type: File. Default: `build/docs`.
sourceDocuments:: (optional) an override to process several source files, which may be a subset of all
files available in [x-]`${sourceDir}`. Type: `org.gradle.api.file.FileCollection`.
Defaults to all files in [x-]`${sourceDir}` whose names end with `.html`.

checkingResultsDir:: (optional) directory where the checking results written to.
Defaults to `${buildDir}/reports/htmlSanityCheck/`

junitResultsDir:: (optional) directory where the results written to in JUnit XML format. JUnit XML can be
read by many tools including CI environments.
Defaults to `${buildDir}/test-results/htmlchecks/`

failOnErrors:: (optional) if set to "true", the build will fail if any error was found in the checked pages.
Defaults to `false`

checkerClasses:: (optional) a set of checker classes to be executed. Defaults to all available checker classes.

== Examples

.build.gradle (small example)
[source,groovy]
----
apply plugin: 'org.aim42.htmlSanityCheck'

htmlSanityCheck {
sourceDir = new File( "$buildDir/docs" )

// where to put results of sanityChecks...
checkingResultsDir = new File( "$buildDir/report/htmlchecks" )

// fail build on errors?
failOnErrors = true

}
----

.build.gradle (extensive example)
[source, groovy]
----

import org.aim42.htmlsanitycheck.check.*

buildscript {
repositories {
maven {
url "https://plugins.gradle.org/m2/"
}
jcenter()
}
}

plugins {
id 'org.aim42.htmlsanitycheck' version '1.1.1'
id 'org.asciidoctor.convert' version '1.5.8'
}

// ==== path definitions =====
// ===========================

// location of AsciiDoc files
def asciidocSrcPath = "$projectDir/src/asciidoc"

// location of images used in AsciiDoc documentation
def srcImagesPath = "$asciidocSrcPath/images"

// results of asciidoc compilation (HTML)
// (input for htmlSanityCheck)
// this is the default path for asciidoc-gradle-convert
def htmlOutputPath = "$buildDir/asciidoc/html5"

// images used by generated html
def targetImagesPath = htmlOutputPath + "/images"

// where HTMLSanityCheck checking results ares stored
def checkingResultsPath = "$buildDir/report/htmlchecks"

apply plugin: 'org.asciidoctor.convert'

asciidoctor {
sourceDir = new File( asciidocSrcPath )

options backends: ['html5'],
doctype: 'book',
icons: 'font',
sectlink: true,
sectanchors: true

resources {
from( srcImagesPath )
into targetImagesPath
}

}

apply plugin: 'org.aim42.htmlSanityCheck'

htmlSanityCheck {

// ensure asciidoctor->html runs first
// and images are copied to build directory

dependsOn asciidoctor

sourceDir = new File( htmlOutputPath )

// files to check, specified as a file tree with filtering
sourceDocuments = fileTree(sourceDir) {
include "many-errors.html", "no-errors.html"
}

// where to put results of sanityChecks...
checkingResultsDir = new File( checkingResultsPath )

// fail build on errors?
failOnErrors = false

// http connection timeout in milliseconds
httpConnectionTimeout = 1000

// which statuscodes shall be interpreted as warning, error or success
// defaults to standard
httpWarningCodes = [401]
// httpErrorCodes
// httpSuccessCodes

// only execute a subset of all available checks
// available checker:
// * BrokenCrossReferencesChecker
// * BrokenHttpLinksChecker
// * DuplicateIdChecker
// * ImageMapChecker
// * MissingAltInImageTagsChecker
// * MissingImageFilesChecker
// * MissingLocalResourcesChecker
checkerClasses = [DuplicateIdChecker, MissingImageFilesChecker]

}

----

== Typical Output

[cols="1,1",width="50%"]
|===
| The overall goal is to create neat and clear reports,
showing eventual errors within HTML files - as shown in the adjoining figure.
| image:sample-hsc-report.jpg[width="200", link="./sample-hsc-report.jpg"
(click on thumbnail for details)]
|===

== Types of Sanity Checks

=== Broken Cross References (aka Broken Internal Links)

Finds all '' where XYZ is not defined.

.src/broken.html
[source,html]
----
internal anchor
...

Bookmark-Header


----

In this example, the bookmark is _misspelled_.

Use checkerClass _BrokenCrossReferencesChecker_.

=== Missing Images Files
Images, referenced in '400) and warnings (status 1xx or 3xx).

StatusCodes are configurable ranges (as some people might
want some content behind paywalls NOT to result in errors...)

Localhost or numerical IP addresses are currently NOT marked as suspicious.

Please comment in case you have additional requirements.

Use checkerClass _BrokenHttpLinksChecker_.

=== Other types of external links
*planned*: ftp, ntp or other protocols are currently not checked,
but should...

== Technical Documentation
In addition to checking HTML, this project serves as an example for http://arc42.de[arc42].

Please see our https://aim42.github.io/htmlSanityCheck/arc42/About-This-Docu.html[software architecture documentation].

== Fundamentals
This tiny piece rests on incredible groundwork:

* https://jsoup.org[Jsoup HTML parser] and analysis toolkit - robust and easy-to-use.

* IntelliJ IDEA - my (Gernot) best (programming) friend.

* Of course, Groovy, Gradle, JUnit and Spockframework.

== Ideas and Origin

* The plugin heavily relies on code provided by {gradle-url}[Gradle].

* Inspiration on code organization, implementation and testing of the plugin
came from the {asciidoctor-gradle-plugin-url}[Asciidoctor-Gradle-Plugin] by [@AAlmiray].

* Code for string similarity calculation by
https://github.com/rrice/java-string-similarity[Ralph Rice].

* Initial implementation, maintenance and documentation by {gernotstarke}[Gernot Starke].

== Development

In case you want to checkout, fork and/or contribute:
The documentation is maintained using the awesome
https://github.com/docToolchain/docToolchain[docToolchain],
created by https://rdmueller.github.io/[@rdmueller].

After checkout you should execute:

`git submodule update -i`

to ensure that the docToolchain submodule is downloaded.

=== Helpful Sources for Development

Several sources provided help during development:

* https://www.gradle.org/docs/current/userguide/custom_plugins.html[Gradle guide on writing custom plugins]
* The code4reference tutorial an Gradle custom plugins,
http://code4reference.com/2012/08/gradle-custom-plugin-part-1/[part 1] and
http://code4reference.com/2012/08/gradle-custom-plugin-part-2/[part 2].
* Of course, the https://jsoup.org/apidocs/[JSoup API documentation]

== Similar Projects

* The https://github.com/rackerlabs/gradle-linkchecker-plugin[gradle-linkchecker-plugin] is an (open source) gradle plugin
which validates that all links in a local HTML file tree go out to other existing local files or remote web locations.
It creates a simple text file report and might be a complement to this `HtmlSanityChecker`.

* https://bmuschko.com/blog/golang-with-gradle/[Benjamin Muschko] has created a (go-based) command-line tool
to check links, called https://github.com/bmuschko/link-verifier[link verifier]

== Contributing
Please report {plugin-issues}[issues or suggestions].

Want to improve the plugin: Fork our {plugin-url}[repository] and
send a pull request.

== Licence
Currently code is published under the Apache-2.0 licence,
documentation under Creative-Commons-Sharealike-4.0.

Some day I'll unify that :-)

Big thanx to Structure-101 for helping us analyze and restructure our code...

image:./structure101-logo.png[link="https://structure101.com"]