https://github.com/peerigon/parse-domain
Splits a hostname into subdomains, domain and (effective) top-level domains.
- Host: GitHub
- URL: https://github.com/peerigon/parse-domain
- Owner: peerigon
- License: mit
- Created: 2014-05-13T12:25:42.000Z (over 10 years ago)
- Default Branch: main
- Last Pushed: 2024-09-24T03:21:56.000Z (about 2 months ago)
- Last Synced: 2024-10-30T03:36:27.189Z (11 days ago)
- Topics: domain, javascript, parse, publicsuffix, tld, typescript, url
- Language: TypeScript
- Homepage:
- Size: 2.33 MB
- Stars: 508
- Watchers: 14
- Forks: 71
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# parse-domain
**Splits a hostname into subdomains, domain and (effective) top-level domains.**
[![Version on NPM](https://img.shields.io/npm/v/parse-domain?style=for-the-badge)](https://www.npmjs.com/package/parse-domain)
[![Semantically released](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg?style=for-the-badge)](https://github.com/semantic-release/semantic-release)
[![Monthly downloads on NPM](https://img.shields.io/npm/dm/parse-domain?style=for-the-badge)](https://www.npmjs.com/package/parse-domain)
[![NPM Bundle size minified](https://img.shields.io/bundlephobia/min/parse-domain?style=for-the-badge)](https://bundlephobia.com/result?p=parse-domain)
[![NPM Bundle size minified and gzipped](https://img.shields.io/bundlephobia/minzip/parse-domain?style=for-the-badge)](https://bundlephobia.com/result?p=parse-domain)
[![License](https://img.shields.io/npm/l/parse-domain?style=for-the-badge)](./LICENSE)

Since domain name registrars organize their namespaces in different ways, it's not straightforward to split a hostname into subdomains, the domain and top-level domains. In order to do that, **parse-domain** uses a [large list of known top-level domains](https://publicsuffix.org/list/public_suffix_list.dat) from [publicsuffix.org](https://publicsuffix.org/):
```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain(
  // This should be a string with basic latin letters only.
  // More information below.
  "www.some.example.co.uk",
);

// Check if the domain is listed in the public suffix list
if (parseResult.type === ParseResultType.Listed) {
  const { subDomains, domain, topLevelDomains } = parseResult;

  console.log(subDomains); // ["www", "some"]
  console.log(domain); // "example"
  console.log(topLevelDomains); // ["co", "uk"]
} else {
  // Read more about other parseResult types below...
}
```

This package has been designed for modern Node and browser environments with ECMAScript modules support. It assumes an ES2015 environment with [`Symbol()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol), [`URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL) and [`TextDecoder()`](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder) globally available. You need to transpile it down to ES5 (e.g. by using [Babel](https://babeljs.io/)) if you need to support older environments.
The list of top-level domains is stored in a [trie](https://en.wikipedia.org/wiki/Trie) data structure and serialization format to ensure the fastest lookup and the smallest possible library size.
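
To illustrate why a trie suits this lookup, here is a minimal sketch. It is not the library's actual data structure or serialization format: `SAMPLE_SUFFIXES`, `buildTrie` and `splitHostname` are hypothetical names, the rule list is a tiny hand-picked sample, and wildcard and exception rules of the public suffix list are ignored.

```javascript
// Hypothetical, hand-picked rules; the real list has thousands of entries.
const SAMPLE_SUFFIXES = ["com", "uk", "co.uk", "io", "github.io"];

// Build a trie keyed by labels from right to left ("co.uk" becomes uk -> co).
function buildTrie(suffixes) {
  const root = { children: new Map(), isSuffix: false };
  for (const suffix of suffixes) {
    let node = root;
    for (const label of suffix.split(".").reverse()) {
      if (!node.children.has(label)) {
        node.children.set(label, { children: new Map(), isSuffix: false });
      }
      node = node.children.get(label);
    }
    node.isSuffix = true;
  }
  return root;
}

// Walk the hostname's labels from the right and remember the longest match.
function splitHostname(trie, hostname) {
  const labels = hostname.split(".");
  let node = trie;
  let suffixLength = 0;
  for (let i = labels.length - 1; i >= 0; i--) {
    node = node.children.get(labels[i]);
    if (node === undefined) break;
    if (node.isSuffix) suffixLength = labels.length - i;
  }
  if (suffixLength === 0) return undefined; // not listed in the sample
  const rest = labels.slice(0, labels.length - suffixLength);
  return {
    subDomains: rest.slice(0, -1),
    domain: rest[rest.length - 1],
    topLevelDomains: labels.slice(labels.length - suffixLength),
  };
}

const sampleTrie = buildTrie(SAMPLE_SUFFIXES);
console.log(splitHostname(sampleTrie, "www.some.example.co.uk"));
```

Each lookup visits at most one trie node per hostname label, so its cost depends on the length of the hostname and not on the size of the list.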
## Installation
```sh
npm install parse-domain
```

## Updates
💡 **Please note:** [publicsuffix.org](https://publicsuffix.org/) is updated several times per month. This package comes with a prebuilt list that has been downloaded at the time of `npm publish`. In order to get an up-to-date list, you should run `npx parse-domain-update` every time you start or build your application. This will download the latest list from `https://publicsuffix.org/list/public_suffix_list.dat`.
## Expected input
**⚠️ [`parseDomain`](#api-js-parseDomain) does not parse whole URLs**. You should only pass the [puny-encoded](https://en.wikipedia.org/wiki/Punycode) hostname section of the URL:
| ❌ Wrong                                       | ✅ Correct           |
| ---------------------------------------------- | -------------------- |
| `https://user@www.example.com:8080/path?query` | `www.example.com`    |
| `münchen.de`                                   | `xn--mnchen-3ya.de`  |
| `食狮.com.cn?query`                            | `xn--85x722f.com.cn` |

There is the utility function [`fromUrl`](#api-js-fromUrl) which tries to extract the hostname from a (partial) URL and puny-encodes it:
```javascript
import { parseDomain, fromUrl } from "parse-domain";

const { subDomains, domain, topLevelDomains } = parseDomain(
  fromUrl("https://www.münchen.de?query"),
);

console.log(subDomains); // ["www"]
console.log(domain); // "xn--mnchen-3ya"
console.log(topLevelDomains); // ["de"]

// You can use the 'punycode' NPM package to decode the domain again
import { toUnicode } from "punycode";

console.log(toUnicode(domain)); // "münchen"
```

[`fromUrl`](#api-js-fromUrl) parses the URL using [`new URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL). Depending on your target environments you need to make sure that there is a [polyfill](https://www.npmjs.com/package/whatwg-url) for it. It's globally available in [all modern browsers](https://caniuse.com/#feat=url) (no IE) and in [Node v10](https://nodejs.org/api/url.html#url_class_url).
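
The following sketch shows what the global `URL` constructor contributes on its own: it isolates the hostname and converts it to its ASCII (punycode) form. `extractHostname` and `NO_HOST` are hypothetical names used only here. This is not the implementation of `fromUrl`, which also accepts partial URLs that `new URL()` alone would reject.

```javascript
// Hypothetical sentinel for this sketch; the library exports its own NO_HOSTNAME symbol.
const NO_HOST = Symbol("NO_HOST");

function extractHostname(input) {
  try {
    // The WHATWG URL parser drops credentials, port, path and query
    // and returns the host in its ASCII (punycode) form.
    return new URL(input).hostname;
  } catch {
    // new URL() throws a TypeError if the input is not an absolute URL.
    return NO_HOST;
  }
}

console.log(extractHostname("https://user@www.example.com:8080/path?query")); // "www.example.com"
console.log(extractHostname("https://www.münchen.de?query")); // "www.xn--mnchen-3ya.de"
console.log(extractHostname("not a url") === NO_HOST); // true
```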
## Expected output
When parsing a hostname there are 5 possible results:
- invalid
- it is an IP address
- it is formally correct and the domain is
  - reserved
  - not listed in the public suffix list
  - listed in the public suffix list

[`parseDomain`](#api-js-parseDomain) returns a [`ParseResult`](#api-ts-ParseResult) with a `type` property that lets you distinguish these cases.
### 👉 Invalid domains
The given input is first validated against [RFC 3696](https://datatracker.ietf.org/doc/html/rfc3696#section-2) (the domain labels are limited to basic latin letters, numbers and hyphens). If the validation fails, `parseResult.type` will be `ParseResultType.Invalid`:
```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("münchen.de");

console.log(parseResult.type === ParseResultType.Invalid); // true
```

Check out the [API](#api-ts-ValidationError) if you need more information about the validation error.
If you don't want the characters to be validated (e.g. because you need to allow underscores in hostnames), there's also a more relaxed validation mode (according to [RFC 2181](https://www.rfc-editor.org/rfc/rfc2181#section-11)).
```javascript
import { parseDomain, ParseResultType, Validation } from "parse-domain";

const parseResult = parseDomain("_jabber._tcp.gmail.com", {
  validation: Validation.Lax,
});

console.log(parseResult.type === ParseResultType.Listed); // true
```

See also [#134](https://github.com/peerigon/parse-domain/issues/134) for the discussion.
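
To make the strict rules concrete, here is a rough sketch of the checks as this README describes them (length limits, letters, digits and hyphens only, no hyphen at the start or end of a label, no all-numeric last label). `validateStrict` is a hypothetical helper that returns plain strings; it is not the library's validator and does not produce its `ValidationError` objects.

```javascript
function validateStrict(hostname) {
  // The empty string stands for the DNS root and is treated as valid.
  if (hostname === "") return [];

  const problems = [];
  if (hostname.length > 253) {
    problems.push("hostname is longer than 253 characters");
  }

  const labels = hostname.split(".");
  for (const label of labels) {
    if (label.length < 1) {
      problems.push("a label is empty");
    } else if (label.length > 63) {
      problems.push(`label "${label}" is longer than 63 characters`);
    } else if (!/^[A-Za-z0-9-]+$/.test(label)) {
      problems.push(`label "${label}" contains a character other than a letter, digit or hyphen`);
    } else if (label.startsWith("-") || label.endsWith("-")) {
      problems.push(`label "${label}" starts or ends with a hyphen`);
    }
  }

  // The last label must not consist of digits only.
  if (/^[0-9]+$/.test(labels[labels.length - 1])) {
    problems.push("the last label is all-numeric");
  }
  return problems;
}

console.log(validateStrict("www.example.com")); // []
console.log(validateStrict("münchen.de").length); // 1
console.log(validateStrict("_jabber._tcp.gmail.com").length); // 2
```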
### 👉 IP addresses
If the given input is an IP address, `parseResult.type` will be `ParseResultType.Ip`:
```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("192.168.2.1");

console.log(parseResult.type === ParseResultType.Ip); // true
console.log(parseResult.ipVersion); // 4
```

It's debatable if a library for parsing domains should also accept IP addresses. In fact, you could argue that [`parseDomain`](#api-js-parseDomain) should reject an IP address as invalid. While this is true from a technical point of view, we decided to report IP addresses in a special way because we assume that a lot of people are using this library to make sense of an arbitrary hostname (see [#102](https://github.com/peerigon/parse-domain/issues/102)).
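
If you need a similar IP check in your own code before or after calling the library, one possible approach is sketched below. `ipVersionOf` is a hypothetical helper and says nothing about how the library detects IP addresses internally. It uses a regular expression for IPv4 and delegates IPv6 validation to the global `URL` parser, which rejects malformed bracketed addresses.

```javascript
const IPV4_OCTET = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";
const IPV4_PATTERN = new RegExp(`^(?:${IPV4_OCTET}\\.){3}${IPV4_OCTET}$`);

function ipVersionOf(hostname) {
  if (IPV4_PATTERN.test(hostname)) return 4;

  // Accept the bracketed URL form as well as the bare address.
  const inner =
    hostname.startsWith("[") && hostname.endsWith("]")
      ? hostname.slice(1, -1)
      : hostname;

  // Only hex digits, colons and dots may appear in an IPv6 address.
  if (!inner.includes(":") || !/^[0-9A-Fa-f:.]+$/.test(inner)) {
    return undefined;
  }

  try {
    // The URL parser throws if the bracketed host is not a valid IPv6 address.
    new URL(`http://[${inner}]/`);
    return 6;
  } catch {
    return undefined;
  }
}

console.log(ipVersionOf("192.168.2.1")); // 4
console.log(ipVersionOf("[::]")); // 6
console.log(ipVersionOf("example.com")); // undefined
```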
### 👉 Reserved domains
There are 5 top-level domains that are not listed in the public suffix list but reserved according to [RFC 6761](https://tools.ietf.org/html/rfc6761) and [RFC 6762](https://tools.ietf.org/html/rfc6762):
- `localhost`
- `local`
- `example`
- `invalid`
- `test`

In these cases, `parseResult.type` will be `ParseResultType.Reserved`:
```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("pecorino.local");

console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(parseResult.labels); // ["pecorino", "local"]
```

### 👉 Domains that are not listed
If the given hostname is valid, but not listed in the downloaded public suffix list, `parseResult.type` will be `ParseResultType.NotListed`:
```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("this.is.not-listed");

console.log(parseResult.type === ParseResultType.NotListed); // true
console.log(parseResult.labels); // ["this", "is", "not-listed"]
```

If a domain is not listed, the cause may be an outdated list. Make sure to [update the list once in a while](#updates).

⚠️ **Do not treat parseDomain as an authoritative answer.** It cannot replace a real DNS lookup to validate if a given domain is known in a certain network.
### 👉 Effective top-level domains
Technically, the term _top-level domain_ describes the very last domain in a hostname (`uk` in `example.co.uk`). Most people, however, use the term _top-level domain_ for the _public suffix_ which is a namespace ["under which Internet users can directly register names"](https://publicsuffix.org/).
Some examples of public suffixes:
- `com` in `example.com`
- `co.uk` in `example.co.uk`
- `co` in `example.co`
- but also `com.co` in `example.com.co`

If the hostname is listed in the public suffix list, the `parseResult.type` will be `ParseResultType.Listed`:
```javascript
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("example.co.uk");

console.log(parseResult.type === ParseResultType.Listed); // true
console.log(parseResult.labels); // ["example", "co", "uk"]
```

Now `parseResult` will also provide a `subDomains`, `domain` and `topLevelDomains` property:
```javascript
const { subDomains, domain, topLevelDomains } = parseResult;

console.log(subDomains); // []
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]
```

### 👉 Switch over `parseResult.type` to distinguish between different parse results
We recommend switching over the `parseResult.type`:
```javascript
switch (parseResult.type) {
  case ParseResultType.Listed: {
    const { hostname, topLevelDomains } = parseResult;

    console.log(`${hostname} belongs to ${topLevelDomains.join(".")}`);
    break;
  }
  case ParseResultType.Reserved:
  case ParseResultType.NotListed: {
    const { hostname } = parseResult;

    console.log(`${hostname} is a reserved or unknown domain`);
    break;
  }
  default:
    // hostname may be the NO_HOSTNAME symbol here, so convert it explicitly
    throw new Error(
      `${String(parseResult.hostname)} is an ip address or invalid domain`,
    );
}
```

### ⚠️ Effective TLDs !== TLDs acknowledged by ICANN
What's surprising to a lot of people is that the definition of public suffix means that regular user domains can become effective top-level domains:
```javascript
const { subDomains, domain, topLevelDomains } = parseDomain(
  "parse-domain.github.io",
);

console.log(subDomains); // []
console.log(domain); // "parse-domain"
console.log(topLevelDomains); // ["github", "io"] 🤯
```

In this case, `github.io` is nothing else than a private domain name registrar. `github.io` is the _effective_ top-level domain and browsers are treating it like that (e.g. for setting [`document.domain`](https://developer.mozilla.org/en-US/docs/Web/API/Document/domain)).
If you want to deviate from the browser's understanding of a top-level domain and you're only interested in top-level domains acknowledged by [ICANN](https://en.wikipedia.org/wiki/ICANN), there's an `icann` property:
```javascript
const parseResult = parseDomain("parse-domain.github.io");
const { subDomains, domain, topLevelDomains } = parseResult.icann;

console.log(subDomains); // ["parse-domain"]
console.log(domain); // "github"
console.log(topLevelDomains); // ["io"]
```

### ⚠️ `domain` can also be `undefined`
```javascript
const { subDomains, domain, topLevelDomains } = parseDomain("co.uk");

console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // ["co", "uk"]
```

### ⚠️ `""` is a valid (but reserved) domain
The empty string `""` represents the [DNS root](https://en.wikipedia.org/wiki/DNS_root_zone) and is considered to be valid. `parseResult.type` will be `ParseResultType.Reserved` in that case:
```javascript
const { type, subDomains, domain, topLevelDomains } = parseDomain("");

console.log(type === ParseResultType.Reserved); // true
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // []
```

## API
🧩 = JavaScript export
🧬 = TypeScript export
### 🧩 `export parseDomain(hostname: string | typeof NO_HOSTNAME, options?: ParseDomainOptions): ParseResult`
Takes a hostname (e.g. `"www.example.com"`) and returns a [`ParseResult`](#api-ts-ParseResult). The hostname must only contain basic latin letters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.
```javascript
import { parseDomain } from "parse-domain";

const parseResult = parseDomain("www.example.com");
```

Use `Validation.Lax` if you want to allow all characters:
```javascript
import { parseDomain, Validation } from "parse-domain";

const parseResult = parseDomain("_jabber._tcp.gmail.com", {
  validation: Validation.Lax,
});
```
### 🧩 `export fromUrl(input: string): string | typeof NO_HOSTNAME`
Takes a URL-like string and tries to extract the hostname. Requires the global [`URL` constructor](https://developer.mozilla.org/en-US/docs/Web/API/URL) to be available on the platform. Returns the [`NO_HOSTNAME`](#api-js-NO_HOSTNAME) symbol if the input was not a string or the hostname could not be extracted. Take a look [at the test suite](/src/from-url.test.ts) for some examples. Does not throw an error, even with invalid input.
### 🧩 `export NO_HOSTNAME: unique symbol`
`NO_HOSTNAME` is a symbol that is returned by [`fromUrl`](#api-js-fromUrl) when it was not able to extract a hostname from the given string. When passed to [`parseDomain`](#api-js-parseDomain), it will always yield a [`ParseResultInvalid`](#api-ts-ParseResultInvalid).
### 🧬 `export type ParseDomainOptions`
```ts
export type ParseDomainOptions = {
  /**
   * If no validation is specified, Validation.Strict will be used.
   **/
  validation?: Validation;
};
```
### 🧩 `export Validation`
An object that holds all possible [Validation](#api-ts-Validation) `validation` values:
```javascript
export const Validation = {
  /**
   * Allows any octets as labels
   * but still restricts the length of labels and the overall domain.
   *
   * @see https://www.rfc-editor.org/rfc/rfc2181#section-11
   **/
  Lax: "LAX",

  /**
   * Only allows ASCII letters, digits and hyphens (aka LDH),
   * forbids hyphens at the beginning or end of a label
   * and requires top-level domain names not to be all-numeric.
   *
   * This is the default if no validation is configured.
   *
   * @see https://datatracker.ietf.org/doc/html/rfc3696#section-2
   */
  Strict: "STRICT",
};
```
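
As a rough illustration of what "any octets, but length-restricted" means for the `Lax` mode, the sketch below measures lengths in UTF-8 octets instead of characters. `validateLax` and `octetLength` are hypothetical helpers, and the limits of 253 and 63 are taken from the limits listed in this README; the library's actual checks may differ in detail.

```javascript
const utf8Encoder = new TextEncoder();

// Length of a string in UTF-8 octets, which can exceed its character count.
const octetLength = (text) => utf8Encoder.encode(text).length;

function validateLax(hostname) {
  // The empty string stands for the DNS root and is treated as valid.
  if (hostname === "") return [];

  const problems = [];
  if (octetLength(hostname) > 253) {
    problems.push("hostname is longer than 253 octets");
  }
  for (const label of hostname.split(".")) {
    const length = octetLength(label);
    if (length < 1) {
      problems.push("a label is empty");
    } else if (length > 63) {
      problems.push(`a label is ${length} octets long, the maximum is 63`);
    }
  }
  return problems;
}

console.log(validateLax("_jabber._tcp.gmail.com")); // []
console.log(validateLax("a..b").length); // 1
```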
### 🧬 `export Validation`
This type represents all possible `validation` values.
### 🧬 `export ParseResult`
A `ParseResult` is either a [`ParseResultInvalid`](#api-ts-ParseResultInvalid), [`ParseResultIp`](#api-ts-ParseResultIp), [`ParseResultReserved`](#api-ts-ParseResultReserved), [`ParseResultNotListed`](#api-ts-ParseResultNotListed) or [`ParseResultListed`](#api-ts-ParseResultListed).
All parse results have a `type` property that is either `"INVALID"`, `"IP"`, `"RESERVED"`, `"NOT_LISTED"` or `"LISTED"`. Use the exported [ParseResultType](#api-js-ParseResultType) to check for the type instead of checking against string literals.
All parse results also have a `hostname` property that provides access to the sanitized hostname that was passed to [`parseDomain`](#api-js-parseDomain).
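
Because every result carries a `type`, it can be handled like a tagged union. The sketch below runs without the package installed: `ResultType` is a local copy of the documented `type` values, `describeResult` is a hypothetical helper, and the sample objects are hand-written to match the shapes documented in this README. In real code you would import `ParseResultType` from the package instead.

```javascript
// Local copy of the documented values, only so the sketch is self-contained.
const ResultType = {
  Invalid: "INVALID",
  Ip: "IP",
  Reserved: "RESERVED",
  NotListed: "NOT_LISTED",
  Listed: "LISTED",
};

function describeResult(result) {
  switch (result.type) {
    case ResultType.Listed: {
      // domain is undefined when the hostname is a bare public suffix
      const name = result.domain === undefined ? "(no domain)" : result.domain;
      return `${name} is registered under ${result.topLevelDomains.join(".")}`;
    }
    case ResultType.Reserved:
    case ResultType.NotListed:
      return `${result.labels.join(".")} is reserved or unknown`;
    case ResultType.Ip:
      return `IPv${result.ipVersion} address`;
    case ResultType.Invalid:
      return `invalid hostname with ${result.errors.length} validation error(s)`;
    default:
      throw new Error(`Unexpected parse result type: ${String(result.type)}`);
  }
}

console.log(
  describeResult({ type: "IP", hostname: "192.168.0.1", ipVersion: 4 }),
); // "IPv4 address"
console.log(
  describeResult({
    type: "RESERVED",
    hostname: "pecorino.local",
    labels: ["pecorino", "local"],
  }),
); // "pecorino.local is reserved or unknown"
```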
### 🧩 `export ParseResultType`
An object that holds all possible [ParseResult](#api-ts-ParseResult) `type` values:
```javascript
const ParseResultType = {
  Invalid: "INVALID",
  Ip: "IP",
  Reserved: "RESERVED",
  NotListed: "NOT_LISTED",
  Listed: "LISTED",
};
```
### 🧬 `export ParseResultType`
This type represents all possible [ParseResult](#api-ts-ParseResult) `type` values.
### 🧬 `export ParseResultInvalid`
Describes the shape of the parse result that is returned when the given hostname does not adhere to [RFC 1034](https://tools.ietf.org/html/rfc1034):
- The hostname is not a string
- The hostname is longer than 253 characters
- A domain label is shorter than 1 character
- A domain label is longer than 63 characters
- A domain label contains a character that is not a basic latin character, digit or hyphen

```ts
type ParseResultInvalid = {
  type: ParseResultType.Invalid;
  hostname: string | typeof NO_HOSTNAME;
  errors: Array<ValidationError>;
};

// Example
{
  type: "INVALID",
  hostname: ".com",
  errors: [...]
}
```
### 🧬 `export ValidationError`
Describes the shape of a validation error as returned by [`parseDomain`](#api-js-parseDomain):
```ts
type ValidationError = {
  type: ValidationErrorType;
  message: string;
  column: number;
};

// Example
{
  type: "LABEL_MIN_LENGTH",
  message: `Label "" is too short. Label is 0 octets long but should be at least 1.`,
  column: 1,
}
```
### 🧩 `export ValidationErrorType`
An object that holds all possible [ValidationError](#api-ts-ValidationError) `type` values:
```javascript
const ValidationErrorType = {
  NoHostname: "NO_HOSTNAME",
  DomainMaxLength: "DOMAIN_MAX_LENGTH",
  LabelMinLength: "LABEL_MIN_LENGTH",
  LabelMaxLength: "LABEL_MAX_LENGTH",
  LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
  LastLabelInvalid: "LAST_LABEL_INVALID",
};
```
### 🧬 `export ValidationErrorType`
This type represents all possible `type` values of a [ValidationError](#api-ts-ValidationError).
### 🧬 `export ParseResultIp`
This type describes the shape of the parse result that is returned when the given hostname was an IPv4 or IPv6 address.
```ts
type ParseResultIp = {
  type: ParseResultType.Ip;
  hostname: string;
  ipVersion: 4 | 6;
};

// Example
{
  type: "IP",
  hostname: "192.168.0.1",
  ipVersion: 4
}
```

According to [RFC 3986](https://tools.ietf.org/html/rfc3986#section-3.2.2), IPv6 addresses need to be surrounded by `[` and `]` in URLs. [`parseDomain`](#api-js-parseDomain) accepts IPv6 addresses both with and without square brackets:
```js
// Recognized as IPv4 address
parseDomain("192.168.0.1");
// Both are recognized as proper IPv6 addresses
parseDomain("::");
parseDomain("[::]");
```
### 🧬 `export ParseResultReserved`
This type describes the shape of the parse result that is returned when the given hostname
- is the root domain (the empty string `""`)
- belongs to the top-level domain `localhost`, `local`, `example`, `invalid` or `test`

```ts
type ParseResultReserved = {
  type: ParseResultType.Reserved;
  hostname: string;
  labels: Array<string>;
};

// Example
{
  type: "RESERVED",
  hostname: "pecorino.local",
  labels: ["pecorino", "local"]
}
```

⚠️ Reserved IPs, such as `127.0.0.1`, will not be reported as reserved, but as `ParseResultIp`. See [#117](https://github.com/peerigon/parse-domain/issues/117).
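
The two conditions for a reserved result can be written down as a small check. This is only an illustration of the rule as documented here: `isReservedHostname` is a hypothetical helper, and lowercasing the last label is an assumption of this sketch, not a documented behavior of the library.

```javascript
const RESERVED_TLDS = new Set(["localhost", "local", "example", "invalid", "test"]);

function isReservedHostname(hostname) {
  // The empty string represents the DNS root.
  if (hostname === "") return true;

  const labels = hostname.split(".");
  // Assumption: compare case-insensitively, since DNS names ignore case.
  return RESERVED_TLDS.has(labels[labels.length - 1].toLowerCase());
}

console.log(isReservedHostname("pecorino.local")); // true
console.log(isReservedHostname("")); // true
console.log(isReservedHostname("example.com")); // false
console.log(isReservedHostname("127.0.0.1")); // false
```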
### 🧬 `export ParseResultNotListed`
Describes the shape of the parse result that is returned when the given hostname is valid and does not belong to a reserved top-level domain, but is not listed in the downloaded public suffix list.
```ts
type ParseResultNotListed = {
  type: ParseResultType.NotListed;
  hostname: string;
  labels: Array<string>;
};

// Example
{
  type: "NOT_LISTED",
  hostname: "this.is.not-listed",
  labels: ["this", "is", "not-listed"]
}
```
### 🧬 `export ParseResultListed`
Describes the shape of the parse result that is returned when the given hostname belongs to a top-level domain that is listed in the public suffix list.
```ts
type ParseResultListed = {
  type: ParseResultType.Listed;
  hostname: string;
  labels: Array<string>;
  subDomains: Array<string>;
  domain: string | undefined;
  topLevelDomains: Array<string>;
  icann: {
    subDomains: Array<string>;
    domain: string | undefined;
    topLevelDomains: Array<string>;
  };
};

// Example
{
  type: "LISTED",
  hostname: "parse-domain.github.io",
  labels: ["parse-domain", "github", "io"],
  subDomains: [],
  domain: "parse-domain",
  topLevelDomains: ["github", "io"],
  icann: {
    subDomains: ["parse-domain"],
    domain: "github",
    topLevelDomains: ["io"]
  }
}
```

## License
MIT
## Sponsors
[Peerigon](https://peerigon.com)