https://github.com/yeonjuan/es-html-parser

HTML parser for static analysis

# ES HTML Parser


ES HTML Parser is an HTML parser that generates an abstract syntax tree similar to the ESTree specification.

This project began as a fork of [hyntax](https://github.com/mykolaharmash/hyntax) and is developed to follow an [ESTree](https://github.com/estree/estree)-like AST specification.

See [online demo](https://yeonjuan.github.io/es-html-parser/).

## Table of Contents

- [Install](#install)
- [Usage](#usage)
- [API Reference](#api-reference)
- [AST Format](#ast-format)
- [License](#license)

## Install

```
npm install es-html-parser
```

## Usage

```js
import { parse } from "es-html-parser";

const input = `
<!DOCTYPE html>
<html>
  <body>
    <button type="button">press here</button>
  </body>
</html>
`;

const { ast, tokens } = parse(input);
```
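
Once you have the `ast`, a typical next step in static analysis is to walk it. The following is a minimal sketch rather than part of the library: `walk` and `SketchNode` are hypothetical names, and the tree is built by hand in the shape described under AST Format so that the snippet runs without the package. In practice you would pass the `ast` returned by `parse` instead.

```ts
// Minimal structural type covering only the fields this sketch reads.
// (Hypothetical helper type; the real node types are listed under "AST Format".)
interface SketchNode {
  type: string;
  name?: string;
  children?: SketchNode[];
}

// Depth-first, pre-order traversal that calls `visit` for every node.
function walk(node: SketchNode, visit: (node: SketchNode) => void): void {
  visit(node);
  for (const child of node.children ?? []) {
    walk(child, visit);
  }
}

// Hand-built stand-in for `parse(input).ast`.
const sampleAst: SketchNode = {
  type: "Document",
  children: [
    {
      type: "Tag",
      name: "button",
      children: [{ type: "Text" }],
    },
  ],
};

// Collect the names of all tag nodes.
const tagNames: string[] = [];
walk(sampleAst, (node) => {
  if (node.type === "Tag" && node.name !== undefined) {
    tagNames.push(node.name);
  }
});
```

This sketch only follows `children`; attributes and the open/close parts of a tag live in separate fields and would need their own handling.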

## API Reference

- [Functions](#functions)
- [Types](#types)
- [Constants](#constants)

### Functions

#### parse

```ts
parse(html: string): ParseResult;
```

**Arguments**

- `html`: HTML string to parse.

**Returns**

- `ParseResult`: Result of parsing

### Types

#### ParseResult

```ts
interface ParseResult {
  ast: DocumentNode;
  tokens: AnyToken[];
}
```

- `ast`: The root node of the AST.
- `tokens`: An array of resulting tokens.

#### AnyNode

The `AnyNode` is a union type of all nodes.

```ts
type AnyNode =
  | DocumentNode
  | TextNode
  | TagNode
  | OpenTagStartNode
  | OpenTagEndNode
  | CloseTagNode
  | AttributeNode
  | AttributeKeyNode
  | AttributeValueNode
  | AttributeValueWrapperStartNode
  | AttributeValueWrapperEndNode
  | ScriptTagNode
  | OpenScriptTagStartNode
  | CloseScriptTagNode
  | OpenScriptTagEndNode
  | ScriptTagContentNode
  | StyleTagNode
  | OpenStyleTagStartNode
  | OpenStyleTagEndNode
  | StyleTagContentNode
  | CloseStyleTagNode
  | CommentNode
  | CommentOpenNode
  | CommentCloseNode
  | CommentContentNode
  | DoctypeNode
  | DoctypeOpenNode
  | DoctypeCloseNode
  | DoctypeAttributeNode
  | DoctypeAttributeValueNode
  | DoctypeAttributeWrapperStartNode
  | DoctypeAttributeWrapperEndNode;
```

#### AnyToken

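The `AnyToken` is a union type of all tokens.

```ts
type AnyToken =
  | Token<TokenTypes.Text>
  | Token<TokenTypes.OpenTagStart>
  | Token<TokenTypes.OpenTagEnd>
  | Token<TokenTypes.CloseTag>
  | Token<TokenTypes.AttributeKey>
  | Token<TokenTypes.AttributeAssignment>
  | Token<TokenTypes.AttributeValueWrapperStart>
  | Token<TokenTypes.AttributeValue>
  | Token<TokenTypes.AttributeValueWrapperEnd>
  | Token<TokenTypes.DoctypeOpen>
  | Token<TokenTypes.DoctypeAttributeValue>
  | Token<TokenTypes.DoctypeAttributeWrapperStart>
  | Token<TokenTypes.DoctypeAttributeWrapperEnd>
  | Token<TokenTypes.DoctypeClose>
  | Token<TokenTypes.CommentOpen>
  | Token<TokenTypes.CommentContent>
  | Token<TokenTypes.CommentClose>
  | Token<TokenTypes.OpenScriptTagStart>
  | Token<TokenTypes.OpenScriptTagEnd>
  | Token<TokenTypes.ScriptTagContent>
  | Token<TokenTypes.CloseScriptTag>
  | Token<TokenTypes.OpenStyleTagStart>
  | Token<TokenTypes.OpenStyleTagEnd>
  | Token<TokenTypes.StyleTagContent>
  | Token<TokenTypes.CloseStyleTag>;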
```

### Constants

#### TokenTypes

```ts
enum TokenTypes {
  Text = "Text",
  OpenTagStart = "OpenTagStart",
  OpenTagEnd = "OpenTagEnd",
  CloseTag = "CloseTag",
  AttributeKey = "AttributeKey",
  AttributeAssignment = "AttributeAssignment",
  AttributeValueWrapperStart = "AttributeValueWrapperStart",
  AttributeValue = "AttributeValue",
  AttributeValueWrapperEnd = "AttributeValueWrapperEnd",
  DoctypeOpen = "DoctypeOpen",
  DoctypeAttributeValue = "DoctypeAttributeValue",
  DoctypeAttributeWrapperStart = "DoctypeAttributeWrapperStart",
  DoctypeAttributeWrapperEnd = "DoctypeAttributeWrapperEnd",
  DoctypeClose = "DoctypeClose",
  CommentOpen = "CommentOpen",
  CommentContent = "CommentContent",
  CommentClose = "CommentClose",
  OpenScriptTagStart = "OpenScriptTagStart",
  OpenScriptTagEnd = "OpenScriptTagEnd",
  ScriptTagContent = "ScriptTagContent",
  CloseScriptTag = "CloseScriptTag",
  OpenStyleTagStart = "OpenStyleTagStart",
  OpenStyleTagEnd = "OpenStyleTagEnd",
  StyleTagContent = "StyleTagContent",
  CloseStyleTag = "CloseStyleTag",
}
```
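
Because every token carries one of these string values in its `type` field, tokens can be bucketed with plain string comparisons. The sketch below is an illustration only: `SketchToken` and `groupByType` are hypothetical names, and the token list is written by hand (the values are made up) so that the snippet runs on its own. In real use the list would be the `tokens` array returned by `parse`.

```ts
// Local mirror of the documented token shape, reduced to what is used here.
interface SketchToken {
  type: string;
  value: string;
}

// Group token values by their `type` field.
function groupByType(tokens: SketchToken[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const token of tokens) {
    const values = groups.get(token.type) ?? [];
    values.push(token.value);
    groups.set(token.type, values);
  }
  return groups;
}

// Hand-written stand-in for `parse(input).tokens`; the values are illustrative.
const sampleTokens: SketchToken[] = [
  { type: "AttributeKey", value: "id" },
  { type: "AttributeValue", value: "foo" },
  { type: "AttributeKey", value: "class" },
];

const grouped = groupByType(sampleTokens);
const attributeKeys = grouped.get("AttributeKey") ?? [];
```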

#### NodeTypes

```ts
enum NodeTypes {
  Document = "Document",
  Tag = "Tag",
  Text = "Text",
  Doctype = "Doctype",
  Comment = "Comment",
  CommentOpen = "CommentOpen",
  CommentClose = "CommentClose",
  CommentContent = "CommentContent",
  Attribute = "Attribute",
  AttributeKey = "AttributeKey",
  AttributeValue = "AttributeValue",
  AttributeValueWrapperStart = "AttributeValueWrapperStart",
  AttributeValueWrapperEnd = "AttributeValueWrapperEnd",
  CloseTag = "CloseTag",
  OpenTagEnd = "OpenTagEnd",
  OpenTagStart = "OpenTagStart",
  DoctypeOpen = "DoctypeOpen",
  DoctypeAttribute = "DoctypeAttribute",
  DoctypeClose = "DoctypeClose",
  ScriptTag = "ScriptTag",
  OpenScriptTagStart = "OpenScriptTagStart",
  OpenScriptTagEnd = "OpenScriptTagEnd",
  ScriptTagContent = "ScriptTagContent",
  StyleTag = "StyleTag",
  OpenStyleTagStart = "OpenStyleTagStart",
  OpenStyleTagEnd = "OpenStyleTagEnd",
  StyleTagContent = "StyleTagContent",
  CloseStyleTag = "CloseStyleTag",
  CloseScriptTag = "CloseScriptTag",
  DoctypeAttributeValue = "DoctypeAttributeValue",
  DoctypeAttributeWrapperStart = "DoctypeAttributeWrapperStart",
  DoctypeAttributeWrapperEnd = "DoctypeAttributeWrapperEnd",
}
```

## AST Format

- [Common](#common)
  - [BaseNode](#basenode)
  - [SourceLocation](#sourcelocation)
  - [Position](#position)
  - [Token](#token)
- [DocumentNode](#documentnode)
- [TextNode](#textnode)
- [TagNode](#tagnode)
  - [OpenTagStartNode](#opentagstartnode)
  - [OpenTagEndNode](#opentagendnode)
  - [CloseTagNode](#closetagnode)
- [AttributeNode](#attributenode)
  - [AttributeKeyNode](#attributekeynode)
  - [AttributeValueWrapperStartNode](#attributevaluewrapperstartnode)
  - [AttributeValueWrapperEndNode](#attributevaluewrapperendnode)
  - [AttributeValueNode](#attributevaluenode)
- [ScriptTagNode](#scripttagnode)
  - [OpenScriptTagStartNode](#openscripttagstartnode)
  - [OpenScriptTagEndNode](#openscripttagendnode)
  - [CloseScriptTagNode](#closescripttagnode)
  - [ScriptTagContentNode](#scripttagcontentnode)
- [StyleTagNode](#styletagnode)
  - [OpenStyleTagStartNode](#openstyletagstartnode)
  - [OpenStyleTagEndNode](#openstyletagendnode)
  - [CloseStyleTagNode](#closestyletagnode)
  - [StyleTagContentNode](#styletagcontentnode)
- [CommentNode](#commentnode)
  - [CommentOpenNode](#commentopennode)
  - [CommentCloseNode](#commentclosenode)
  - [CommentContentNode](#commentcontentnode)
- [DoctypeNode](#doctypenode)
  - [DoctypeOpenNode](#doctypeopennode)
  - [DoctypeCloseNode](#doctypeclosenode)
- [DoctypeAttributeNode](#doctypeattributenode)
  - [DoctypeAttributeValueNode](#doctypeattributevaluenode)
  - [DoctypeAttributeWrapperStartNode](#doctypeattributewrapperstartnode)
  - [DoctypeAttributeWrapperEndNode](#doctypeattributewrapperendnode)

### Common

#### BaseNode

Every AST node and token implements the `BaseNode` interface.

```ts
interface BaseNode {
  type: string;
  loc: SourceLocation;
  range: [number, number];
}
```

The `type` field is a string identifying the kind of node or token. Its value is one of the `NodeTypes` or `TokenTypes`.
The `loc` and `range` fields represent the source location of the node.
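
As a rough illustration of how `range` can be used, the sketch below slices the original source with it. It assumes, following the usual ESTree-style convention, that `range` holds `[start, end)` character offsets with the end exclusive; the README does not state this explicitly. `Ranged` and `textOf` are hypothetical names, and the range is written by hand rather than taken from a parsed node.

```ts
// Only the `range` field from `BaseNode` is needed here.
interface Ranged {
  range: [number, number];
}

// Return the source text covered by a node.
// ASSUMPTION: `range` is [start, end) in character offsets, end exclusive.
function textOf(source: string, node: Ranged): string {
  const [start, end] = node.range;
  return source.slice(start, end);
}

const sourceHtml = '<p id="foo">hi</p>';
// Hand-written range for illustration; real ranges come from parsed nodes.
const attributeLike: Ranged = { range: [3, 11] };
const attributeText = textOf(sourceHtml, attributeLike);
```

With the hand-written range above, `attributeText` covers the `id="foo"` part of the source.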

#### SourceLocation

```ts
interface SourceLocation {
  start: Position;
  end: Position;
}
```

The `start` field represents the start location of the node.

The `end` field represents the end location of the node.

#### Position

```ts
interface Position {
  line: number; // >= 1
  column: number; // >= 0
}
```

The `line` field is a number representing the line on which the node is positioned (1-based index).

The `column` field is a number representing the offset within that line (0-based index).
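
To make the two indexing conventions concrete, here is a small sketch that converts a character offset into a `Position`. It is not the parser's own implementation: `offsetToPosition` is a hypothetical helper, and it assumes lines are separated by `\n` and that the offset lies within the source.

```ts
interface Position {
  line: number; // >= 1
  column: number; // >= 0
}

// Convert a character offset into a 1-based line and 0-based column.
// ASSUMPTION: lines are separated by "\n" and `offset` is within `source`.
function offsetToPosition(source: string, offset: number): Position {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === "\n") {
      line += 1;
      lineStart = i + 1; // the next line begins right after the newline
    }
  }
  return { line, column: offset - lineStart };
}

const twoLines = "<div>\n  <span></span>\n</div>";
const spanPosition = offsetToPosition(twoLines, twoLines.indexOf("<span"));
```

Here the `<span` starts on the second line after two spaces of indentation, so `spanPosition` is `{ line: 2, column: 2 }`.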

#### Token

All tokens implement the `Token` interface.

```ts
interface Token<T extends TokenTypes> extends BaseNode {
  type: T;
  value: string;
}
```

### DocumentNode

`DocumentNode` represents a whole parsed document. It is the root node of the AST.

```ts
interface DocumentNode extends BaseNode {
  type: "Document";
  children: Array<TextNode | TagNode | ScriptTagNode | StyleTagNode | CommentNode | DoctypeNode>;
}
```

### TextNode

`TextNode` represents any plain text in HTML.

```ts
interface TextNode extends BaseNode {
type: "Text";
value: string;
}
```

### TagNode

`TagNode` represents all kinds of tag nodes in HTML except for doctype, script, style, and comment. (e.g. `<div></div>`, `<span></span>` ...)

```ts
interface TagNode extends BaseNode {
  type: "Tag";
  selfClosing: boolean;
  name: string;
  openStart: OpenTagStartNode;
  openEnd: OpenTagEndNode;
  close?: CloseTagNode;
  children: Array<TextNode | TagNode | ScriptTagNode | StyleTagNode | CommentNode>;
  attributes: Array<AttributeNode>;
}
```

#### OpenTagStartNode

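`OpenTagStartNode` represents the opening part of the [Start tags](https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#start-tags). (e.g. `<div`) It has the same shape as `OpenTagEndNode`, with `type: "OpenTagStart"` and a string `value`.

#### OpenTagEndNode

`OpenTagEndNode` represents the closing part of the [Start tags](https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#start-tags). (e.g. `>`, `/>`)

```ts
interface OpenTagEndNode extends BaseNode {
  type: "OpenTagEnd";
  value: string;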
}
```

#### CloseTagNode

`CloseTagNode` represents the [End tags](https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#end-tags). (e.g. `</div>`)

```ts
interface CloseTagNode extends BaseNode {
  type: "CloseTag";
  value: string;
}
```

### AttributeNode

`AttributeNode` represents an attribute. (e.g. `id="foo"`)

```ts
interface AttributeNode extends BaseNode {
  type: "Attribute";
  key: AttributeKeyNode;
  value?: AttributeValueNode;
  startWrapper?: AttributeValueWrapperStartNode;
  endWrapper?: AttributeValueWrapperEndNode;
}
```
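
Since `value` is optional, code that reads attributes has to handle attributes written without a value. The following is a minimal sketch of one way to do that, using reduced local copies of the shapes above. `AttributeLike` and `attributesToMap` are hypothetical names, and the sample data is hand-written rather than produced by the parser.

```ts
// Reduced local copy of the documented shape (only the fields used below).
interface AttributeLike {
  key: { value: string };
  value?: { value: string };
}

// Build a lookup from attribute name to attribute value. Attributes without
// a value map to `null` so they stay distinguishable from an empty string.
function attributesToMap(attributes: AttributeLike[]): Map<string, string | null> {
  const result = new Map<string, string | null>();
  for (const attribute of attributes) {
    result.set(attribute.key.value, attribute.value?.value ?? null);
  }
  return result;
}

// Hand-written stand-in for a tag's `attributes` array.
const sampleAttributes: AttributeLike[] = [
  { key: { value: "id" }, value: { value: "foo" } },
  { key: { value: "disabled" } },
];
const attributeMap = attributesToMap(sampleAttributes);
```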

#### AttributeKeyNode

`AttributeKeyNode` represents a key part of an attribute. (e.g. `id`)

```ts
interface AttributeKeyNode extends BaseNode {
  type: "AttributeKey";
  value: string;
}
```

#### AttributeValueWrapperStartNode

`AttributeValueWrapperStartNode` represents the left side character that wraps the value of the attribute. (e.g. `"`, `'`)

```ts
interface AttributeValueWrapperStartNode extends BaseNode {
  type: "AttributeValueWrapperStart";
  value: string;
}
```

#### AttributeValueWrapperEndNode

`AttributeValueWrapperEndNode` represents the right side character that wraps the value of the attribute. (e.g. `"`, `'`)

```ts
interface AttributeValueWrapperEndNode extends BaseNode {
  type: "AttributeValueWrapperEnd";
  value: string;
}
```

#### AttributeValueNode

`AttributeValueNode` represents the value part of the attribute. It does not include wrapper characters. (e.g. `foo`)

```ts
interface AttributeValueNode extends BaseNode {
  type: "AttributeValue";
  value: string;
}
```

### ScriptTagNode

`ScriptTagNode` represents a script tag in the HTML. (e.g. `<script> console.log('hello'); </script>`)

```ts
interface ScriptTagNode extends BaseNode {
  type: "ScriptTag";
  attributes: Array<AttributeNode>;
  openStart: OpenScriptTagStartNode;
  openEnd: OpenScriptTagEndNode;
  close: CloseScriptTagNode;
  value?: ScriptTagContentNode;
}
```
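
A linter that inspects inline JavaScript would read the script text through the optional `value` field. The sketch below shows the idea on hand-written data. `ScriptTagLike` and `inlineScriptSources` are hypothetical names, and treating a missing `value` as "no inline content" is an assumption based on the field being optional.

```ts
// Reduced local shape: just the discriminating `type` and the optional content.
interface ScriptTagLike {
  type: string;
  value?: { value: string };
}

// Collect the text of every script tag that has inline content.
function inlineScriptSources(nodes: ScriptTagLike[]): string[] {
  const sources: string[] = [];
  for (const node of nodes) {
    if (node.type === "ScriptTag" && node.value !== undefined) {
      sources.push(node.value.value);
    }
  }
  return sources;
}

// Hand-written stand-in for a list of sibling nodes.
const siblingNodes: ScriptTagLike[] = [
  { type: "ScriptTag", value: { value: "console.log('hello');" } },
  { type: "ScriptTag" }, // assumed: a script tag without inline content
  { type: "Comment" },
];
const scriptSources = inlineScriptSources(siblingNodes);
```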

#### OpenScriptTagStartNode

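`OpenScriptTagStartNode` represents the opening part of a start script tag. (e.g. `<script`) It has the same shape as `OpenScriptTagEndNode`, with `type: "OpenScriptTagStart"` and a string `value`.

#### OpenScriptTagEndNode

`OpenScriptTagEndNode` represents the closing part of a start script tag. (e.g. `>`)

```ts
interface OpenScriptTagEndNode extends BaseNode {
  type: "OpenScriptTagEnd";
  value: string;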
}
```

#### CloseScriptTagNode

`CloseScriptTagNode` represents a close script tag. (e.g. `</script>`)

```ts
interface CloseScriptTagNode extends BaseNode {
  type: "CloseScriptTag";
  value: string;
}
```

#### ScriptTagContentNode

`ScriptTagContentNode` represents the content of a script tag. (e.g. `console.log('hello');`)

```ts
interface ScriptTagContentNode extends BaseNode {
  type: "ScriptTagContent";
  value: string;
}
```

### StyleTagNode

`StyleTagNode` represents a style tag. (e.g. `<style> .foo {} </style>`)

```ts
interface StyleTagNode extends BaseNode {
  type: "StyleTag";
  attributes: Array<AttributeNode>;
  openStart: OpenStyleTagStartNode;
  openEnd: OpenStyleTagEndNode;
  close: CloseStyleTagNode;
  value?: StyleTagContentNode;
}
```

#### OpenStyleTagStartNode

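`OpenStyleTagStartNode` represents the opening part of a start style tag. (e.g. `<style`) It has the same shape as `OpenStyleTagEndNode`, with `type: "OpenStyleTagStart"` and a string `value`.

#### OpenStyleTagEndNode

`OpenStyleTagEndNode` represents the closing part of a start style tag. (e.g. `>`)

```ts
interface OpenStyleTagEndNode extends BaseNode {
  type: "OpenStyleTagEnd";
  value: string;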
}
```

#### CloseStyleTagNode

`CloseStyleTagNode` represents a close style tag. (e.g. `</style>`)

```ts
interface CloseStyleTagNode extends BaseNode {
  type: "CloseStyleTag";
  value: string;
}
```

#### StyleTagContentNode

`StyleTagContentNode` represents the content of a style tag. (e.g. `.foo {}`)

```ts
interface StyleTagContentNode extends BaseNode {
  type: "StyleTagContent";
  value: string;
}
```

### CommentNode

`CommentNode` represents a comment in HTML. (e.g. `<!-- comment -->`)

```ts
interface CommentNode extends BaseNode {
  type: "Comment";
  open: CommentOpenNode;
  close: CommentCloseNode;
  value: CommentContentNode;
}
```

#### CommentOpenNode

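`CommentOpenNode` represents the character sequence that starts a comment. (e.g. `<!--`) It has the same shape as `CommentCloseNode`, with `type: "CommentOpen"` and a string `value`.

#### CommentCloseNode

`CommentCloseNode` represents the character sequence that ends a comment. (e.g. `-->`)

```ts
interface CommentCloseNode extends BaseNode {
  type: "CommentClose";
  value: string;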
}
```

#### CommentContentNode

`CommentContentNode` represents the text inside a comment.

```ts
interface CommentContentNode extends BaseNode {
  type: "CommentContent";
  value: string;
}
```

### DoctypeNode

`DoctypeNode` represents the [DOCTYPE](https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#the-doctype) in HTML.

```ts
interface DoctypeNode extends BaseNode {
  type: "Doctype";
  attributes: Array<DoctypeAttributeNode>;
  open: DoctypeOpenNode;
  close: DoctypeCloseNode;
}
```

#### DoctypeOpenNode

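`DoctypeOpenNode` represents the character sequence that starts a doctype. (e.g. `<!DOCTYPE`) It has the same shape as `DoctypeCloseNode`, with `type: "DoctypeOpen"` and a string `value`.

#### DoctypeCloseNode

`DoctypeCloseNode` represents the character sequence that ends a doctype. (e.g. `>`)

```ts
interface DoctypeCloseNode extends BaseNode {
  type: "DoctypeClose";
  value: string;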
}
```

### DoctypeAttributeNode

`DoctypeAttributeNode` represents an attribute of doctype node. (e.g. `html`, `"-//W3C//DTD HTML 4.01 Transitional//EN"`)

```ts
interface DoctypeAttributeNode extends BaseNode {
  type: "DoctypeAttribute";
  key: DoctypeAttributeKey;
}
```

#### DoctypeAttributeValueNode

`DoctypeAttributeValueNode` represents the value of a doctype node's attribute. It does not include wrapper characters (`'`, `"`). (e.g. `html`, `-//W3C//DTD HTML 4.01 Transitional//EN`)

```ts
interface DoctypeAttributeValueNode extends BaseNode {
  type: "DoctypeAttributeValue";
  value: string;
}
```

#### DoctypeAttributeWrapperStartNode

`DoctypeAttributeWrapperStartNode` represents a left side character that wraps the value of the attribute. (e.g. `"`, `'`)

```ts
interface DoctypeAttributeWrapperStartNode extends BaseNode {
  type: "DoctypeAttributeWrapperStart";
  value: string;
}
```

#### DoctypeAttributeWrapperEndNode

`DoctypeAttributeWrapperEndNode` represents a right side character that wraps the value of the attribute. (e.g. `"`, `'`)

```ts
interface DoctypeAttributeWrapperEndNode extends BaseNode {
  type: "DoctypeAttributeWrapperEnd";
  value: string;
}
```

## License

[MIT](./LICENSE.md)