{"id":13454654,"url":"https://github.com/stevenvachon/broken-link-checker","last_synced_at":"2025-05-14T05:10:34.919Z","repository":{"id":37706147,"uuid":"29047173","full_name":"stevenvachon/broken-link-checker","owner":"stevenvachon","description":"Find broken links, missing images, etc within your HTML.","archived":false,"fork":false,"pushed_at":"2024-01-08T09:26:30.000Z","size":426,"stargazers_count":2013,"open_issues_count":78,"forks_count":307,"subscribers_count":36,"default_branch":"main","last_synced_at":"2025-05-14T02:15:28.905Z","etag":null,"topics":["html5","http","link-checker","links","nodejs","seo","urls","whatwg"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stevenvachon.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-01-10T04:31:07.000Z","updated_at":"2025-05-09T22:04:21.000Z","dependencies_parsed_at":"2022-07-14T00:50:37.109Z","dependency_job_id":"57f1edb6-3924-490e-8d66-189f9ac8d656","html_url":"https://github.com/stevenvachon/broken-link-checker","commit_stats":{"total_commits":95,"total_committers":8,"mean_commits":11.875,"dds":0.4,"last_synced_commit":"ce9e116590b63d23687f9eb403ab773e60f4fcf1"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevenvachon%2Fbroken-link-checker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevenvachon%2Fbroken-link-checker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevenvachon%2Fbroken-link-checker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stevenvachon%2Fbroken-link-checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stevenvachon","download_url":"https://codeload.github.com/stevenvachon/broken-link-checker/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254076850,"owners_count":22010611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html5","http","link-checker","links","nodejs","seo","urls","whatwg"],"created_at":"2024-07-31T08:00:56.417Z","updated_at":"2025-05-14T05:10:34.892Z","avatar_url":"https://github.com/stevenvachon.png","language":"JavaScript","funding_links":[],"categories":["Other","JavaScript","Utilities","JavaScript (485)","📦 Legacy \u0026 Inactive Projects"],"sub_categories":["Open Redirect","Self-hosted","XSS"],"readme":"# broken-link-checker [![NPM Version][npm-image]][npm-url] ![Build Status][ci-image] [![Coverage Status][coveralls-image]][coveralls-url] [![Dependency Monitor][greenkeeper-image]][greenkeeper-url]\n\n\u003e Find broken links, missing images, etc within your HTML.\n\n* ✅ **Complete**: Unicode, redirects, compression, basic authentication, absolute/relative/local URLs.\n* ⚡️ **Fast**: Concurrent, streamed and cached.\n* 🍰 **Easy**: Convenient defaults and very configurable.\n\nOther features:\n* Support for many HTML elements and attributes; not only `\u003ca href\u003e` and `\u003cimg src\u003e`.\n* Support for relative URLs with `\u003cbase href\u003e`.\n* WHATWG specifications-compliant [HTML](https://html.spec.whatwg.org) and [URL](https://url.spec.whatwg.org) parsing.\n* Honor robot exclusions (robots.txt, headers and `rel`), optionally.\n* Detailed information for reporting and maintenance.\n* URL keyword filtering with simple wildcards.\n* Pause/Resume at any time.\n\n\n## Installation\n\n[Node.js](https://nodejs.org) `\u003e= 14` is required. There're two ways to use it:\n\n### Command Line Usage\nTo install, type this at the command line:\n```shell\nnpm install broken-link-checker -g\n```\nAfter that, check out the help for available options:\n```shell\nblc --help\n```\nA typical site-wide check might look like:\n```shell\nblc http://yoursite.com -ro\n# or\nblc path/to/index.html -ro\n```\n\n**Note:** HTTP proxies are not directly supported. If your network is configured incorrectly with no resolution in sight, you could try using a [container with proxy settings](https://docs.docker.com/network/proxy/).\n\n### Programmatic API\nTo install, type this at the command line:\n```shell\nnpm install broken-link-checker\n```\nThe remainder of this document will assist you in using the API.\n\n\n## Classes\nWhile all classes have been exposed for custom use, the one that you need will most likely be [`SiteChecker`](#sitechecker).\n\n### `HtmlChecker`\nScans an HTML document to find broken links. All methods from [`EventEmitter`](https://nodejs.org/api/events.html#events_class_eventemitter) are available.\n\n```js\nconst {HtmlChecker} = require('broken-link-checker');\n\nconst htmlChecker = new HtmlChecker(options)\n  .on('error', (error) =\u003e {})\n  .on('html', (tree, robots) =\u003e {})\n  .on('queue', () =\u003e {})\n  .on('junk', (result) =\u003e {})\n  .on('link', (result) =\u003e {})\n  .on('complete', () =\u003e {});\n\nhtmlChecker.scan(html, baseURL);\n```\n\n#### Methods \u0026 Properties\n* `.clearCache()` will remove any cached URL responses.\n* `.isPaused` returns `true` if the internal link queue is paused and `false` if not.\n* `.numActiveLinks` returns the number of links with active requests.\n* `.numQueuedLinks` returns the number of links that currently have no active requests.\n* `.pause()` will pause the internal link queue, but will not pause any active requests.\n* `.resume()` will resume the internal link queue.\n* `.scan(html, baseURL)` parses \u0026 scans a single HTML document and returns a `Promise`. Calling this function while a previous scan is in progress will result in a thrown error. Arguments:\n  * `html` must be either a [`Stream`](https://nodejs.org/api/stream.html) or a string.\n  * `baseURL` must be a [`URL`](https://mdn.io/URL). Without this value, links to relative URLs will be given a `BLC_INVALID` reason for being broken (unless an absolute `\u003cbase href\u003e` is found).\n\n#### Events\n* `'complete'` is emitted after the last result or zero results.\n* `'error'` is emitted when an error occurs within any of your event handlers and will prevent the current scan from failing. Arguments:\n  * `error` is the `Error`.\n* `'html'` is emitted after the HTML document has been fully parsed. Arguments:\n  * `tree` is supplied by [parse5](https://npmjs.com/parse5).\n  * `robots` is an instance of [robot-directives](https://npmjs.com/robot-directives) containing any `\u003cmeta\u003e` robot exclusions.\n* `'junk'` is emitted on each skipped/unchecked link, as configured in options. Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n* `'link'` is emitted with the result of each checked/unskipped link (broken or not). Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n* `'queue'` is emitted when a link is internally queued, dequeued or made active.\n\n\n### `HtmlUrlChecker`\nScans the HTML content at each queued URL to find broken links. All methods from [`EventEmitter`](https://nodejs.org/api/events.html#events_class_eventemitter) are available.\n\n```js\nconst {HtmlUrlChecker} = require('broken-link-checker');\n\nconst htmlUrlChecker = new HtmlUrlChecker(options)\n  .on('error', (error) =\u003e {})\n  .on('html', (tree, robots, response, pageURL, customData) =\u003e {})\n  .on('queue', () =\u003e {})\n  .on('junk', (result, customData) =\u003e {})\n  .on('link', (result, customData) =\u003e {})\n  .on('page', (error, pageURL, customData) =\u003e {})\n  .on('end', () =\u003e {});\n\nhtmlUrlChecker.enqueue(pageURL, customData);\n```\n\n#### Methods \u0026 Properties\n* `.clearCache()` will remove any cached URL responses.\n* `.dequeue(id)` removes a page from the queue. Returns `true` on success or `false` on failure.\n* `.enqueue(pageURL, customData)` adds a page to the queue. Queue items are auto-dequeued when their requests are complete. Returns a queue ID on success. Arguments:\n  * `pageURL` must be a [`URL`](https://mdn.io/URL).\n  * `customData` is optional data (of any type) that is stored in the queue item for the page.\n* `.has(id)` returns `true` if the queue contains an active or queued page tagged with `id` and `false` if not.\n* `.isPaused` returns `true` if the queue is paused and `false` if not.\n* `.numActiveLinks` returns the number of links with active requests.\n* `.numPages` returns the total number of pages in the queue.\n* `.numQueuedLinks` returns the number of links that currently have no active requests.\n* `.pause()` will pause the queue, but will not pause any active requests.\n* `.resume()` will resume the queue.\n\n#### Events\n* `'end'` is emitted when the end of the queue has been reached.\n* `'error'` is emitted when an error occurs within any of your event handlers and will prevent the current scan from failing. Arguments:\n  * `error` is the `Error`.\n* `'html'` is emitted after a page's HTML document has been fully parsed. Arguments:\n  * `tree` is supplied by [parse5](https://npmjs.com/parse5).\n  * `robots` is an instance of [robot-directives](https://npmjs.com/robot-directives) containing any `\u003cmeta\u003e` and `X-Robots-Tag` robot exclusions.\n  * `response` is the full HTTP response for the page, excluding the body.\n  * `pageURL` is the `URL` to the current page being scanned.\n  * `customData` is whatever was queued.\n* `'junk'` is emitted on each skipped/unchecked link, as configured in options. Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n  * `customData` is whatever was queued.\n* `'link'` is emitted with the result of each checked/unskipped link (broken or not) within the current page. Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n  * `customData` is whatever was queued.\n* `'page'` is emitted after a page's last result, on zero results, or if the HTML could not be retrieved. Arguments:\n  * `error` will be an `Error` if such occurred or `null` if not.\n  * `pageURL` is the `URL` to the current page being scanned.\n  * `customData` is whatever was queued.\n* `'queue'` is emitted when a URL (link or page) is queued, dequeued or made active.\n\n\n### `SiteChecker`\nRecursively scans (crawls) the HTML content at each queued URL to find broken links. All methods from [`EventEmitter`](https://nodejs.org/api/events.html#events_class_eventemitter) are available.\n\n```js\nconst {SiteChecker} = require('broken-link-checker');\n\nconst siteChecker = new SiteChecker(options)\n  .on('error', (error) =\u003e {})\n  .on('robots', (robots, customData) =\u003e {})\n  .on('html', (tree, robots, response, pageURL, customData) =\u003e {})\n  .on('queue', () =\u003e {})\n  .on('junk', (result, customData) =\u003e {})\n  .on('link', (result, customData) =\u003e {})\n  .on('page', (error, pageURL, customData) =\u003e {})\n  .on('site', (error, siteURL, customData) =\u003e {})\n  .on('end', () =\u003e {});\n\nsiteChecker.enqueue(siteURL, customData);\n```\n\n#### Methods \u0026 Properties\n* `.clearCache()` will remove any cached URL responses.\n* `.dequeue(id)` removes a site from the queue. Returns `true` on success or `false` on failure.\n* `.enqueue(siteURL, customData)` adds [the first page of] a site to the queue. Queue items are auto-dequeued when their requests are complete. Returns a queue ID on success. Arguments:\n  * `siteURL` must be a [`URL`](https://mdn.io/URL).\n  * `customData` is optional data (of any type) that is stored in the queue item for the site.\n* `.has(id)` returns `true` if the queue contains an active or queued site tagged with `id` and `false` if not.\n* `.isPaused` returns `true` if the queue is paused and `false` if not.\n* `.numActiveLinks` returns the number of links with active requests.\n* `.numPages` returns the total number of pages in the queue.\n* `.numQueuedLinks` returns the number of links that currently have no active requests.\n* `.numSites` returns the total number of sites in the queue.\n* `.pause()` will pause the queue, but will not pause any active requests.\n* `.resume()` will resume the queue.\n\n#### Events\n* `'end'` is emitted when the end of the queue has been reached.\n* `'error'` is emitted when an error occurs within any of your event handlers and will prevent the current scan from failing. Arguments:\n  * `error` is the `Error`.\n* `'html'` is emitted after a page's HTML document has been fully parsed. Arguments:\n  * `tree` is supplied by [parse5](https://npmjs.com/parse5).\n  * `robots` is an instance of [robot-directives](https://npmjs.com/robot-directives) containing any `\u003cmeta\u003e` and `X-Robots-Tag` robot exclusions.\n  * `response` is the full HTTP response for the page, excluding the body.\n  * `pageURL` is the `URL` to the current page being scanned.\n  * `customData` is whatever was queued.\n* `'junk'` is emitted on each skipped/unchecked link, as configured in options. Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n  * `customData` is whatever was queued.\n* `'link'` is emitted with the result of each checked/unskipped link (broken or not) within the current page. Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n  * `customData` is whatever was queued.\n* `'page'` is emitted after a page's last result, on zero results, or if the HTML could not be retrieved. Arguments:\n  * `error` will be an `Error` if such occurred or `null` if not.\n  * `pageURL` is the `URL` to the current page being scanned.\n  * `customData` is whatever was queued.\n* `'queue'` is emitted when a URL (link, page or site) is queued, dequeued or made active.\n* `'robots'` is emitted after a site's robots.txt has been downloaded. Arguments:\n  * `robots` is an instance of [robots-txt-guard](https://npmjs.com/robots-txt-guard).\n  * `customData` is whatever was queued.\n* `'site'` is emitted after a site's last result, on zero results, or if the *initial* HTML could not be retrieved. Arguments:\n  * `error` will be an `Error` if such occurred or `null` if not.\n  * `siteURL` is the `URL` to the current site being crawled.\n  * `customData` is whatever was queued.\n\n**Note:** the `filterLevel` option is used for determining which links are recursive.\n\n\n### `UrlChecker`\nRequests each queued URL to determine if they are broken. All methods from [`EventEmitter`](https://nodejs.org/api/events.html#events_class_eventemitter) are available.\n\n```js\nconst {UrlChecker} = require('broken-link-checker');\n\nconst urlChecker = new UrlChecker(options)\n  .on('error', (error) =\u003e {})\n  .on('queue', () =\u003e {})\n  .on('link', (result, customData) =\u003e {})\n  .on('end', () =\u003e {});\n\nurlChecker.enqueue(url, customData);\n```\n\n#### Methods \u0026 Properties\n* `.clearCache()` will remove any cached URL responses.\n* `.dequeue(id)` removes a URL from the queue. Returns `true` on success or `false` on failure.\n* `.enqueue(url, customData)` adds a URL to the queue. Queue items are auto-dequeued when their requests are completed. Returns a queue ID on success. Arguments:\n  * `url` must be a [`URL`](https://mdn.io/URL).\n  * `customData` is optional data (of any type) that is stored in the queue item for the URL.\n* `.has(id)` returns `true` if the queue contains an active or queued URL tagged with `id` and `false` if not.\n* `.isPaused` returns `true` if the queue is paused and `false` if not.\n* `.numActiveLinks` returns the number of links with active requests.\n* `.numQueuedLinks` returns the number of links that currently have no active requests.\n* `.pause()` will pause the queue, but will not pause any active requests.\n* `.resume()` will resume the queue.\n\n#### Events\n* `'end'` is emitted when the end of the queue has been reached.\n* `'error'` is emitted when an error occurs within any of your event handlers and will prevent the current scan from failing. Arguments:\n  * `error` is the `Error`.\n* `'junk'` is emitted for each skipped/unchecked result, as configured in options. Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n  * `customData` is whatever was queued.\n* `'link'` is emitted for each checked/unskipped result (broken or not). Arguments:\n  * `result` is a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js).\n  * `customData` is whatever was queued.\n* `'queue'` is emitted when a URL is queued, dequeued or made active.\n\n\n## Options\n\n### `cacheMaxAge`\nType: `Number`  \nDefault Value: `3_600_000` (1 hour)  \nThe number of milliseconds in which a cached response should be considered valid. This is only relevant if the `cacheResponses` option is enabled.\n\n### `cacheResponses`\nType: `Boolean`  \nDefault Value: `true`  \nURL request results will be cached when `true`. This will ensure that each unique URL will only be checked once.\n\n### `excludedKeywords`\nType: `Array\u003cString\u003e`  \nDefault value: `[]`  \nWill not check links that match the keywords and glob patterns within this list. The only wildcards supported are [`*` and `!`](https://npmjs.com/matcher).\n\nThis option does *not* apply to `UrlChecker`.\n\n### `excludeExternalLinks`\nType: `Boolean`  \nDefault value: `false`  \nWill not check external links (different protocol and/or host) when `true`; relative links with a remote `\u003cbase href\u003e` included.\n\nThis option does *not* apply to `UrlChecker`.\n\n### `excludeInternalLinks`\nType: `Boolean`  \nDefault value: `false`  \nWill not check internal links (same protocol and host) when `true`.\n\nThis option does *not* apply to `UrlChecker` nor `SiteChecker`'s *crawler*.\n\n### `excludeLinksToSamePage`\nType: `Boolean`  \nDefault value: `false`  \nWill not check links to the same page; relative and absolute fragments/hashes included. This is only relevant if the `cacheResponses` option is disabled.\n\nThis option does *not* apply to `UrlChecker`.\n\n### `filterLevel`\nType: `Number`  \nDefault value: `1`  \nThe tags and attributes that are considered links for checking, split into the following levels:\n* `0`: clickable links\n* `1`: clickable links, media, frames, meta refreshes\n* `2`: clickable links, media, frames, meta refreshes, stylesheets, scripts, forms\n* `3`: clickable links, media, frames, meta refreshes, stylesheets, scripts, forms, metadata\n\nRecursive links have a slightly different filter subset. To see the exact breakdown of both, check out the [tag map](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/tags.js). `\u003cbase href\u003e` is not listed because it is not a link, though it is always parsed.\n\nThis option does *not* apply to `UrlChecker`.\n\n### `honorRobotExclusions`\nType: `Boolean`  \nDefault value: `true`  \nWill not scan pages that search engine crawlers would not follow. Such will have been specified with any of the following:\n* `\u003ca rel=\"nofollow\" href=\"…\"\u003e`\n* `\u003carea rel=\"nofollow\" href=\"…\"\u003e`\n* `\u003cmeta name=\"robots\" content=\"noindex,nofollow,…\"\u003e`\n* `\u003cmeta name=\"googlebot\" content=\"noindex,nofollow,…\"\u003e`\n* `\u003cmeta name=\"robots\" content=\"unavailable_after: …\"\u003e`\n* `X-Robots-Tag: noindex,nofollow,…`\n* `X-Robots-Tag: googlebot: noindex,nofollow,…`\n* `X-Robots-Tag: otherbot: noindex,nofollow,…`\n* `X-Robots-Tag: unavailable_after: …`\n* robots.txt\n\nThis option does *not* apply to `UrlChecker`.\n\n### `includedKeywords`\nType: `Array\u003cString\u003e`  \nDefault value: `[]`  \nWill only check links that match the keywords and glob patterns within this list, _if any_. The only wildcard supported is `*`.\n\nThis option does *not* apply to `UrlChecker`.\n\n### `includeLink`\nType: `Function`  \nDefault value: `link =\u003e true`  \nA synchronous callback that is called after all other filters have been performed. Return `true` to include `link` (a [`Link`](https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/Link.js)) in the list of links to be checked, or return `false` to have it skipped.\n\nThis option does *not* apply to `UrlChecker`.\n\n### `includePage`\nType: `Function`  \nDefault value: `url =\u003e true`  \nA synchronous callback that is called after all other filters have been performed. Return `true` to include `url` (a [`URL`](https://mdn.io/URL)) in the list of pages to be crawled, or return `false` to have it skipped.\n\nThis option does *not* apply to `UrlChecker` nor `HtmlUrlChecker`.\n\n### `maxSockets`\nType: `Number`  \nDefault value: `Infinity`  \nThe maximum number of links to check at any given time.\n\n### `maxSocketsPerHost`\nType: `Number`  \nDefault value: `2`  \nThe maximum number of links per host/port to check at any given time. This avoids overloading a single target host with too many concurrent requests. This will not limit concurrent requests to other hosts.\n\n### `rateLimit`\nType: `Number`  \nDefault value: `0`  \nThe number of milliseconds to wait before each request.\n\n### `requestMethod`\nType: `String`  \nDefault value: `'head'`  \nThe HTTP request method used in checking links. If you experience problems, try using `'get'`, however the `retryHeadFail` option should have you covered.\n\n### `retryHeadCodes`\nType: `Array\u003cNumber\u003e`  \nDefault value: `[405]`  \nThe list of HTTP status codes for the `retryHeadFail` option to reference.\n\n### `retryHeadFail`\nType: `Boolean`  \nDefault value: `true`  \nSome servers do not respond correctly to a `'head'` request method. When `true`, a link resulting in an HTTP status code listed within the `retryHeadCodes` option will be re-requested using a `'get'` method before deciding that it is broken. This is only relevant if the `requestMethod` option is set to `'head'`.\n\n### `userAgent`\nType: `String`  \nDefault value: `'broken-link-checker/0.8.0 Node.js/14.16.0 (OS X; x64)'` (or similar)  \nThe HTTP user-agent to use when checking links as well as retrieving pages and robot exclusions.\n\n\n## Handling Broken/Excluded Links\nA broken link will have an `isBroken` value of `true` and a reason code defined in `brokenReason`. A link that was not checked (emitted as `'junk'`) will have a `wasExcluded` value of `true`, a reason code defined in `excludedReason` and a `isBroken` value of `null`.\n```js\nif (link.get('isBroken')) {\n  console.log(link.get('brokenReason'));\n  //-\u003e HTTP_406\n} else if (link.get('wasExcluded')) {\n  console.log(link.get('excludedReason'));\n  //-\u003e BLC_ROBOTS\n}\n```\n\nAdditionally, more descriptive messages are available for each reason code:\n```js\nconst {reasons} = require('broken-link-checker');\n\nconsole.log(reasons.BLC_ROBOTS);       //-\u003e Robots exclusion\nconsole.log(reasons.ERRNO_ECONNRESET); //-\u003e connection reset by peer (ECONNRESET)\nconsole.log(reasons.HTTP_404);         //-\u003e Not Found (404)\n\n// List all\nconsole.log(reasons);\n```\n\nPutting it all together:\n```js\nif (link.get('isBroken')) {\n  console.log(reasons[link.get('brokenReason')]);\n} else if (link.get('wasExcluded')) {\n  console.log(reasons[link.get('excludedReason')]);\n}\n```\n\nFinally, **it is important** to analyze links excluded with the `BLC_UNSUPPORTED` reason as it's possible for them to be broken.\n\n\n## Roadmap Features\n* `'info'` event with messaging such as 'Site does not support HTTP HEAD method' (regarding `retryHeadFail` option)\n* add cheerio support by using parse5's htmlparser2 tree adaptor?\n* load sitemap.xml at *start* of each `SiteChecker` site (since cache can expire) to possibly check pages that were not linked to, removing from list as *discovered* links are checked\n* change order of checking to: tcp error, 4xx code (broken), 5xx code (undetermined), 200\n* abort download of body when `options.retryHeadFail===true`\n* option to retry broken links a number of times (default=0)\n* option to scrape `response.body` for erroneous sounding text (using [fathom](https://npmjs.com/fathom-web)?), since an error page could be presented but still have code 200\n* option to detect parked domain (302 with no redirect?)\n* option to check broken link on archive.org for archived version (using [this lib](https://npmjs.com/archive.org))\n* option to run `HtmlUrlChecker` checks on page load (using [jsdom](https://npmjs.com/jsdom)) to include links added with JavaScript?\n* option to check if hashes exist in target URL document?\n* option to parse Markdown in `HtmlChecker` for links\n* option to check plain text URLs\n* add throttle profiles (0–9, -1 for \"custom\") for easy configuring\n* check [ftp:](https://npmjs.com/ftp), [sftp:](https://npmjs.com/ssh2) (for downloadable files)\n* check ~~mailto:~~, news:, nntp:, telnet:?\n* check that data URLs are valid (with [valid-data-url](https://www.npmjs.com/valid-data-url))?\n* supply CORS error for file:// links on sites with a different protocol\n* create an example with http://astexplorer.net\n* use [debug](https://npmjs.com/debug)\n* use [bunyan](https://npmjs.com/bunyan) with JSON output for CLI\n* store request object/headers (or just auth) in `Link`?\n* supply basic auth for \"page\" events?\n* add option for `URLCache` normalization profiles\n\n\n[npm-image]: https://img.shields.io/npm/v/broken-link-checker.svg\n[npm-url]: https://npmjs.org/package/broken-link-checker\n[ci-image]: https://img.shields.io/github/workflow/status/stevenvachon/broken-link-checker/tests\n[coveralls-image]: https://img.shields.io/coveralls/stevenvachon/broken-link-checker.svg\n[coveralls-url]: https://coveralls.io/github/stevenvachon/broken-link-checker\n[greenkeeper-image]: https://badges.greenkeeper.io/stevenvachon/broken-link-checker.svg\n[greenkeeper-url]: https://greenkeeper.io/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevenvachon%2Fbroken-link-checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevenvachon%2Fbroken-link-checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevenvachon%2Fbroken-link-checker/lists"}