https://github.com/kikobeats/html-urls
Get all urls from a HTML markup
- Host: GitHub
- URL: https://github.com/kikobeats/html-urls
- Owner: Kikobeats
- License: mit
- Created: 2018-01-16T15:20:23.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-03-19T17:28:53.000Z (about 1 month ago)
- Last Synced: 2025-04-14T10:04:08.618Z (13 days ago)
- Language: JavaScript
- Homepage:
- Size: 278 KB
- Stars: 12
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# html-urls

> Get all URLs from HTML markup. It's based on [W3C link checker](https://github.com/w3c/node-linkchecker).

## Install
```bash
$ npm install html-urls --save
```

## Usage
```js
// Note: `require('got')` works with the CommonJS releases of got (v11 and earlier).
const got = require('got')
const htmlUrls = require('html-urls')

;(async () => {
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as first argument.')

  // Fetch the page, then extract every link found in its markup.
  const { body: html } = await got(url)
  const links = htmlUrls({ html, url })

  links.forEach(({ url }) => console.log(url))

  // => [
  // 'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
  // 'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
  // 'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
  // 'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
  // 'https://microlink.io/commons-8b286eac293678e1c98c.js',
  // 'https://microlink.io',
  // ...
  // ]
})()
```

It returns the following structure for every value detected in the HTML markup:
##### value

The original value.

##### url

The normalized URL, if the value can be considered a URL.

##### uri

The normalized value as a URI.

See [examples](/examples) for more!
## API
### htmlUrls([options])
#### options
##### html
Type: `string`
Default: `''`

The HTML markup.
##### url
Type: `string`
Default: `''`

The URL associated with the HTML markup.
It is used to resolve relative links that may be present in the HTML markup.
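
To illustrate what resolving relative links against `url` means, here is a minimal sketch that uses Node's built-in `URL` class. It does not show how html-urls resolves links internally; the base URL and the sample values are hypothetical.

```js
// Sketch only: Node's built-in WHATWG URL class, not html-urls internals.
const baseUrl = 'https://example.com/blog/post' // hypothetical page URL

// Hypothetical href/src values as they might appear in the markup.
const values = ['/about', 'img/logo.png', '../feed.xml', 'https://other.example/x']

// new URL(input, base) resolves a relative reference against the base;
// absolute inputs are returned as they are.
const resolved = values.map(value => new URL(value, baseUrl).href)

console.log(resolved)
```

Without a base URL, values such as `/about` cannot be turned into absolute links, which is why the `url` option matters for pages that use relative references.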
##### whitelist
Type: `array`
Default: `[]`

A list of links to be excluded from the final output. It supports regex patterns.
See [matcher](https://github.com/sindresorhus/matcher#matcher) to learn more.
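
The exclusion idea can be sketched without any dependency. matcher's documentation describes `*` wildcard patterns, so the example below imitates that with a hand-rolled helper. The helper names (`toRegExp`, `isWhitelisted`) and the sample links are hypothetical, and this is not matcher's or html-urls' actual implementation.

```js
// Sketch only: a hand-rolled `*` wildcard check, not the matcher package.
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Turn a pattern such as 'https://cdn.example.com/*' into an anchored RegExp.
const toRegExp = pattern =>
  new RegExp(`^${pattern.split('*').map(escapeRegExp).join('.*')}$`)

// Hypothetical helper: true when the link matches any whitelist pattern.
const isWhitelisted = (link, whitelist) =>
  whitelist.some(pattern => toRegExp(pattern).test(link))

const links = [
  'https://example.com/',
  'https://cdn.example.com/app.js',
  'https://example.com/about'
]
const whitelist = ['https://cdn.example.com/*']

// Links matching a whitelist pattern are excluded from the output.
const kept = links.filter(link => !isWhitelisted(link, whitelist))

console.log(kept)
```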
##### removeDuplicates
Type: `boolean`
Default: `true`

Remove duplicated links detected across all the HTML tags.
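
As a rough illustration of de-duplication, the sketch below keeps the first entry seen for each normalized URL. It assumes duplicates are keyed on the `url` field, which the README does not confirm, and the sample entries are hypothetical.

```js
// Sketch only: de-duplication keyed on the normalized URL (an assumption).
const found = [
  { value: '/about', url: 'https://example.com/about' },
  { value: 'https://example.com/about', url: 'https://example.com/about' },
  { value: '/contact', url: 'https://example.com/contact' }
]

// Remember every URL already emitted and drop later repeats.
const seen = new Set()
const unique = found.filter(({ url }) => {
  if (seen.has(url)) return false
  seen.add(url)
  return true
})

console.log(unique.map(({ url }) => url))
```

Two different original values (`/about` and its absolute form) collapse into one entry here because they normalize to the same URL.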
## Related
- [xml-urls](https://github.com/Kikobeats/xml-urls) – Get all urls from a Feed/Atom/RSS/Sitemap xml markup.
- [css-urls](https://github.com/Kikobeats/css-urls) – Get all URLs referenced from stylesheet files.

## License
**html-urls** © [Kiko Beats](https://kikobeats.com), released under the [MIT](https://github.com/Kikobeats/html-urls/blob/master/LICENSE.md) License.
Authored and maintained by Kiko Beats with help from [contributors](https://github.com/Kikobeats/html-urls/contributors).

> [kikobeats.com](https://kikobeats.com) · GitHub [@Kiko Beats](https://github.com/Kikobeats) · X [@Kikobeats](https://x.com/Kikobeats)