https://github.com/wooorm/parse-entities
Parse HTML character references
- Host: GitHub
- URL: https://github.com/wooorm/parse-entities
- Owner: wooorm
- License: mit
- Created: 2015-12-23T19:32:25.000Z (over 10 years ago)
- Default Branch: main
- Last Pushed: 2024-12-13T11:08:06.000Z (over 1 year ago)
- Last Synced: 2025-05-03T08:09:41.043Z (12 months ago)
- Topics: character, entities, entity, html, parse, reference
- Language: JavaScript
- Homepage:
- Size: 184 KB
- Stars: 49
- Watchers: 4
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- Funding: funding.yml
- License: license
# parse-entities
[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
Parse HTML character references.
## Contents
* [What is this?](#what-is-this)
* [When should I use this?](#when-should-i-use-this)
* [Install](#install)
* [Use](#use)
* [API](#api)
    * [`parseEntities(value[, options])`](#parseentitiesvalue-options)
* [Types](#types)
* [Compatibility](#compatibility)
* [Security](#security)
* [Related](#related)
* [Contribute](#contribute)
* [License](#license)
## What is this?
This is a small and powerful decoder of HTML character references (often called
entities).
## When should I use this?
You can use this for spec-compliant decoding of character references.
It's small and fast enough to do that well.
You can also use this when making a linter, because it emits warnings that each
carry a reason and positional info on where they happened.
## Install
This package is [ESM only][esm].
In Node.js (version 14.14+, 16.0+), install with [npm][]:
```sh
npm install parse-entities
```
In Deno with [`esm.sh`][esmsh]:
```js
import {parseEntities} from 'https://esm.sh/parse-entities@3'
```
In browsers with [`esm.sh`][esmsh]:
```html
<script type="module">
  import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>
```
## Use
```js
import {parseEntities} from 'parse-entities'
console.log(parseEntities('alpha &amp; bravo'))
// => alpha & bravo
console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta
console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
```
## API
This package exports the identifier `parseEntities`.
There is no default export.
### `parseEntities(value[, options])`
Parse HTML character references.
##### `options`
Configuration (optional).
###### `options.additional`
Additional character to accept (`string?`, default: `''`).
This allows other characters, without error, when following an ampersand.
###### `options.attribute`
Whether to parse `value` as an attribute value (`boolean?`, default: `false`).
This results in slightly different behavior.
###### `options.nonTerminated`
Whether to allow nonterminated references (`boolean`, default: `true`).
For example, `&copycat` for `©cat`.
This behavior is compliant with the spec but can lead to unexpected results.
###### `options.position`
Starting `position` of `value` (`Position` or `Point`, optional).
Useful when dealing with values nested in some sort of syntax tree.
The default is:
```js
{line: 1, column: 1, offset: 0}
```
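When `value` is a slice of a larger document, the starting point can be derived from the offset of that slice.
The following is a minimal sketch: `pointAt` is a hypothetical helper, not part of this package, and it assumes lines are separated by `\n`.

```js
// Hypothetical helper (not part of parse-entities): compute a unist-style
// point for a 0-based `offset` in `doc`, assuming `\n` line endings.
function pointAt(doc, offset) {
  let line = 1
  let lineStart = 0
  let index = doc.indexOf('\n')

  // Walk over every line ending that comes before `offset`.
  while (index !== -1 && index < offset) {
    line++
    lineStart = index + 1
    index = doc.indexOf('\n', lineStart)
  }

  // Lines and columns are 1-based, offsets are 0-based.
  return {line, column: offset - lineStart + 1, offset}
}

// Possible use, assuming `parseEntities` is imported as shown in "Use":
// const doc = 'first line\nfoo &amp; bar'
// const start = doc.indexOf('foo')
// parseEntities(doc.slice(start), {position: pointAt(doc, start)})
```

With a starting point like this, the points handed to the handlers should refer to places in the whole document rather than in the slice.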
###### `options.warning`
Error handler ([`Function?`][warning]).
###### `options.text`
Text handler ([`Function?`][text]).
###### `options.reference`
Reference handler ([`Function?`][reference]).
###### `options.warningContext`
Context used when calling `warning` (`'*'`, optional).
###### `options.textContext`
Context used when calling `text` (`'*'`, optional).
###### `options.referenceContext`
Context used when calling `reference` (`'*'`, optional).
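The context options set what `this` refers to inside the matching handler.
As a rough sketch, an array could be passed as `warningContext` so that the handler pushes onto it; the handler below and the shape of the objects it stores are illustrative assumptions.

```js
// Hypothetical handler: it relies on `this` being the array passed as
// `warningContext`, so it must be a regular function, not an arrow function.
function warning(reason, point, code) {
  this.push({code, reason, line: point.line, column: point.column})
}

// Possible use, assuming `parseEntities` is imported as shown in "Use":
// const warnings = []
// parseEntities('foo &amp bar', {warning, warningContext: warnings})
// console.log(warnings)
```

The same pattern should apply to `textContext` and `referenceContext` with their handlers.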
##### Returns
`string`: decoded `value`.
#### `function warning(reason, point, code)`
Error handler.
###### Parameters
* `this` (`*`): refers to `warningContext` when given to `parseEntities`
* `reason` (`string`): human readable reason for emitting a parse error
* `point` ([`Point`][point]): place where the error occurred
* `code` (`number`): machine readable code of the error
The following codes are used:
| Code | Example | Note |
| ---- | ------------------ | --------------------------------------------- |
| `1` | `foo &amp bar` | Missing semicolon (named) |
| `2` | `foo &#123 bar` | Missing semicolon (numeric) |
| `3` | `Foo &bar baz` | Empty (named) |
| `4` | `Foo &#` | Empty (numeric) |
| `5` | `Foo &bar; baz` | Unknown (named) |
| `6` | `Foo &#128; baz` | [Disallowed reference][invalid] |
| `7` | `Foo &#xD800; baz` | Prohibited: outside permissible unicode range |
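For a linter, the numeric codes above can be mapped to rule names of your own.
The sketch below is one possible shape: `ruleNames` and `createWarningCollector` are hypothetical and not provided by this package, which only reports the number.

```js
// Hypothetical rule names for the numeric codes in the table above; the
// package itself only reports the number.
const ruleNames = {
  1: 'missing-semicolon-named',
  2: 'missing-semicolon-numeric',
  3: 'empty-named',
  4: 'empty-numeric',
  5: 'unknown-named',
  6: 'disallowed',
  7: 'prohibited'
}

// Hypothetical helper: returns a `warning` handler plus the list it fills.
function createWarningCollector() {
  const messages = []

  function warning(reason, point, code) {
    messages.push({
      rule: ruleNames[code] || 'unknown',
      reason,
      place: point.line + ':' + point.column
    })
  }

  return {messages, warning}
}

// Possible use, assuming `parseEntities` is imported as shown in "Use":
// const collector = createWarningCollector()
// parseEntities('foo &amp bar', {warning: collector.warning})
// console.log(collector.messages)
```

Because the handler closes over `messages`, no `warningContext` is needed here.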
#### `function text(value, position)`
Text handler.
###### Parameters
* `this` (`*`): refers to `textContext` when given to `parseEntities`
* `value` (`string`): string of content
* `position` ([`Position`][position]): place where `value` starts and ends
#### `function reference(value, position, source)`
Character reference handler.
###### Parameters
* `this` (`*`): refers to `referenceContext` when given to `parseEntities`
* `value` (`string`): decoded character reference
* `position` ([`Position`][position]): place where `source` starts and ends
* `source` (`string`): raw source of character reference
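The `text` and `reference` handlers can be combined to split a value into tokens, for example when building nodes for a syntax tree.
This is a minimal sketch under that assumption; `createTokenCollector` and the token shape are hypothetical, not part of this package.

```js
// Hypothetical helper: returns `text` and `reference` handlers that record
// what they receive, in the order they are called.
function createTokenCollector() {
  const tokens = []

  return {
    tokens,
    // Called with a run of plain text and where it starts and ends.
    text(value, position) {
      tokens.push({type: 'text', value, position})
    },
    // Called with the decoded reference, its place, and its raw source.
    reference(value, position, source) {
      tokens.push({type: 'reference', value, source, position})
    }
  }
}

// Possible use, assuming `parseEntities` is imported as shown in "Use":
// const collector = createTokenCollector()
// parseEntities('a &amp; b', {
//   text: collector.text,
//   reference: collector.reference
// })
// console.log(collector.tokens)
```

The handlers use the closed-over `tokens` array instead of `this`, so they keep working when passed as bare functions.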
## Types
This package is fully typed with [TypeScript][].
It exports the additional types `Options`, `WarningHandler`,
`ReferenceHandler`, and `TextHandler`.
## Compatibility
This package is at least compatible with all maintained versions of Node.js.
As of now, that is Node.js 14.14+ and 16.0+.
It also works in Deno and modern browsers.
## Security
This package is safe: it matches the HTML spec to parse character references.
## Related
* [`wooorm/stringify-entities`](https://github.com/wooorm/stringify-entities):
  encode HTML character references
* [`wooorm/character-entities`](https://github.com/wooorm/character-entities):
  info on character references
* [`wooorm/character-entities-html4`](https://github.com/wooorm/character-entities-html4):
  info on HTML4 character references
* [`wooorm/character-entities-legacy`](https://github.com/wooorm/character-entities-legacy):
  info on legacy character references
* [`wooorm/character-reference-invalid`](https://github.com/wooorm/character-reference-invalid):
  info on invalid numeric character references
## Contribute
Yes please!
See [How to Contribute to Open Source][contribute].
## License
[MIT][license] © [Titus Wormer][author]
[build-badge]: https://github.com/wooorm/parse-entities/workflows/main/badge.svg
[build]: https://github.com/wooorm/parse-entities/actions
[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/parse-entities.svg
[coverage]: https://codecov.io/github/wooorm/parse-entities
[downloads-badge]: https://img.shields.io/npm/dm/parse-entities.svg
[downloads]: https://www.npmjs.com/package/parse-entities
[size-badge]: https://img.shields.io/bundlephobia/minzip/parse-entities.svg
[size]: https://bundlephobia.com/result?p=parse-entities
[npm]: https://docs.npmjs.com/cli/install
[esmsh]: https://esm.sh
[license]: license
[author]: https://wooorm.com
[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
[typescript]: https://www.typescriptlang.org
[warning]: #function-warningreason-point-code
[text]: #function-textvalue-position
[reference]: #function-referencevalue-position-source
[invalid]: https://github.com/wooorm/character-reference-invalid
[point]: https://github.com/syntax-tree/unist#point
[position]: https://github.com/syntax-tree/unist#position
[contribute]: https://opensource.guide/how-to-contribute/