https://github.com/yeonjuan/es-html-parser
HTML parser for static analysis
- Host: GitHub
- URL: https://github.com/yeonjuan/es-html-parser
- Owner: yeonjuan
- License: mit
- Created: 2022-08-18T14:16:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-26T16:34:05.000Z (about 1 year ago)
- Last Synced: 2024-09-16T01:31:51.438Z (2 months ago)
- Topics: html, html-parser
- Language: TypeScript
- Homepage:
- Size: 9.92 MB
- Stars: 11
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
# ES HTML Parser
ES HTML Parser is an HTML parser that generates an abstract syntax tree similar to the ESTree specification.
This project began as a fork of [hyntax](https://github.com/mykolaharmash/hyntax) and is developed to follow an [ESTree](https://github.com/estree/estree)-like AST specification.
See [online demo](https://yeonjuan.github.io/es-html-parser/).
## Table of Contents
- [Install](#install)
- [Usage](#usage)
- [API Reference](#api-reference)
- [AST Format](#ast-format)
- [License](#license)
## Install
```
npm install es-html-parser
```
## Usage
```js
import { parse } from "es-html-parser";
const input = `
press here
`;
const { ast, tokens } = parse(input);
```
## API Reference
- [Functions](#functions)
- [Types](#types)
- [Constants](#constants)
### Functions
#### parse
```ts
parse(html: string): ParseResult;
```
**Arguments**
- `html`: HTML string to parse.
**Returns**
- `ParseResult`: Result of parsing
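
As a rough illustration of consuming the `tokens` array of a parse result, the sketch below counts tokens by their `type`. It is self-contained and does not call the library: `TokenLike`, `countTokenTypes`, and the sample token values are hypothetical names and hand-written data, not part of es-html-parser.

```ts
// Minimal local stand-in for a token (assumption: only `type` and
// `value` are needed for this sketch).
interface TokenLike {
  type: string;
  value: string;
}

// Count how many tokens of each type appear in a token array.
function countTokenTypes(tokens: TokenLike[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    counts.set(token.type, (counts.get(token.type) ?? 0) + 1);
  }
  return counts;
}

// Hand-written sample; in real use this would presumably be `parse(html).tokens`.
const sampleTokens: TokenLike[] = [
  { type: "OpenTagStart", value: "<div" },
  { type: "OpenTagEnd", value: ">" },
  { type: "Text", value: "hi" },
  { type: "CloseTag", value: "</div>" },
];
const tokenCounts = countTokenTypes(sampleTokens);
```

A `Map` keeps the counts keyed by the token type string, so the same helper should work for any subset of `TokenTypes`.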
### Types
#### ParseResult
```ts
interface ParseResult {
ast: DocumentNode;
tokens: AnyToken[];
}
```
- `ast`: The root node of the AST.
- `tokens`: An array of resulting tokens.
#### AnyNode
`AnyNode` is a union type of all nodes.
```ts
type AnyNode =
| DocumentNode
| TextNode
| TagNode
| OpenTagStartNode
| OpenTagEndNode
| CloseTagNode
| AttributeNode
| AttributeKeyNode
| AttributeValueNode
| AttributeValueWrapperStartNode
| AttributeValueWrapperEndNode
| ScriptTagNode
| OpenScriptTagStartNode
| CloseScriptTagNode
| OpenScriptTagEndNode
| ScriptTagContentNode
| StyleTagNode
| OpenStyleTagStartNode
| OpenStyleTagEndNode
| StyleTagContentNode
| CloseStyleTagNode
| CommentNode
| CommentOpenNode
| CommentCloseNode
| CommentContentNode
| DoctypeNode
| DoctypeOpenNode
| DoctypeCloseNode
| DoctypeAttributeNode
| DoctypeAttributeValueNode
| DoctypeAttributeWrapperStartNode
| DoctypeAttributeWrapperEndNode;
```
#### AnyToken
`AnyToken` is a union type of all tokens.
```ts
type AnyToken =
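| Token<TokenTypes.Text>
| Token<TokenTypes.OpenTagStart>
| Token<TokenTypes.OpenTagEnd>
| Token<TokenTypes.CloseTag>
| Token<TokenTypes.AttributeKey>
| Token<TokenTypes.AttributeAssignment>
| Token<TokenTypes.AttributeValueWrapperStart>
| Token<TokenTypes.AttributeValue>
| Token<TokenTypes.AttributeValueWrapperEnd>
| Token<TokenTypes.DoctypeOpen>
| Token<TokenTypes.DoctypeAttributeValue>
| Token<TokenTypes.DoctypeAttributeWrapperStart>
| Token<TokenTypes.DoctypeAttributeWrapperEnd>
| Token<TokenTypes.DoctypeClose>
| Token<TokenTypes.CommentOpen>
| Token<TokenTypes.CommentContent>
| Token<TokenTypes.CommentClose>
| Token<TokenTypes.OpenScriptTagStart>
| Token<TokenTypes.OpenScriptTagEnd>
| Token<TokenTypes.ScriptTagContent>
| Token<TokenTypes.CloseScriptTag>
| Token<TokenTypes.OpenStyleTagStart>
| Token<TokenTypes.OpenStyleTagEnd>
| Token<TokenTypes.StyleTagContent>
| Token<TokenTypes.CloseStyleTag>;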
```
### Constants
#### TokenTypes
```ts
enum TokenTypes {
Text = "Text",
OpenTagStart = "OpenTagStart",
OpenTagEnd = "OpenTagEnd",
CloseTag = "CloseTag",
AttributeKey = "AttributeKey",
AttributeAssignment = "AttributeAssignment",
AttributeValueWrapperStart = "AttributeValueWrapperStart",
AttributeValue = "AttributeValue",
AttributeValueWrapperEnd = "AttributeValueWrapperEnd",
DoctypeOpen = "DoctypeOpen",
DoctypeAttributeValue = "DoctypeAttributeValue",
DoctypeAttributeWrapperStart = "DoctypeAttributeWrapperStart",
DoctypeAttributeWrapperEnd = "DoctypeAttributeWrapperEnd",
DoctypeClose = "DoctypeClose",
CommentOpen = "CommentOpen",
CommentContent = "CommentContent",
CommentClose = "CommentClose",
OpenScriptTagStart = "OpenScriptTagStart",
OpenScriptTagEnd = "OpenScriptTagEnd",
ScriptTagContent = "ScriptTagContent",
CloseScriptTag = "CloseScriptTag",
OpenStyleTagStart = "OpenStyleTagStart",
OpenStyleTagEnd = "OpenStyleTagEnd",
StyleTagContent = "StyleTagContent",
CloseStyleTag = "CloseStyleTag",
}
```
#### NodeTypes
```ts
enum NodeTypes {
Document = "Document",
Tag = "Tag",
Text = "Text",
Doctype = "Doctype",
Comment = "Comment",
CommentOpen = "CommentOpen",
CommentClose = "CommentClose",
CommentContent = "CommentContent",
Attribute = "Attribute",
AttributeKey = "AttributeKey",
AttributeValue = "AttributeValue",
AttributeValueWrapperStart = "AttributeValueWrapperStart",
AttributeValueWrapperEnd = "AttributeValueWrapperEnd",
CloseTag = "CloseTag",
OpenTagEnd = "OpenTagEnd",
OpenTagStart = "OpenTagStart",
DoctypeOpen = "DoctypeOpen",
DoctypeAttribute = "DoctypeAttribute",
DoctypeClose = "DoctypeClose",
ScriptTag = "ScriptTag",
OpenScriptTagStart = "OpenScriptTagStart",
OpenScriptTagEnd = "OpenScriptTagEnd",
ScriptTagContent = "ScriptTagContent",
StyleTag = "StyleTag",
OpenStyleTagStart = "OpenStyleTagStart",
OpenStyleTagEnd = "OpenStyleTagEnd",
StyleTagContent = "StyleTagContent",
CloseStyleTag = "CloseStyleTag",
CloseScriptTag = "CloseScriptTag",
DoctypeAttributeValue = "DoctypeAttributeValue",
DoctypeAttributeWrapperStart = "DoctypeAttributeWrapperStart",
DoctypeAttributeWrapperEnd = "DoctypeAttributeWrapperEnd",
}
```
## AST Format
- [Common](#common)
  - [BaseNode](#basenode)
  - [SourceLocation](#sourcelocation)
  - [Position](#position)
  - [Token](#token)
- [DocumentNode](#documentnode)
- [TextNode](#textnode)
- [TagNode](#tagnode)
  - [OpenTagStartNode](#opentagstartnode)
  - [OpenTagEndNode](#opentagendnode)
  - [CloseTagNode](#closetagnode)
- [AttributeNode](#attributenode)
  - [AttributeKeyNode](#attributekeynode)
  - [AttributeValueWrapperStartNode](#attributevaluewrapperstartnode)
  - [AttributeValueWrapperEndNode](#attributevaluewrapperendnode)
  - [AttributeValueNode](#attributevaluenode)
- [ScriptTagNode](#scripttagnode)
  - [OpenScriptTagStartNode](#openscripttagstartnode)
  - [OpenScriptTagEndNode](#openscripttagendnode)
  - [CloseScriptTagNode](#closescripttagnode)
  - [ScriptTagContentNode](#scripttagcontentnode)
- [StyleTagNode](#styletagnode)
  - [OpenStyleTagStartNode](#openstyletagstartnode)
  - [OpenStyleTagEndNode](#openstyletagendnode)
  - [CloseStyleTagNode](#closestyletagnode)
  - [StyleTagContentNode](#styletagcontentnode)
- [CommentNode](#commentnode)
  - [CommentOpenNode](#commentopennode)
  - [CommentCloseNode](#commentclosenode)
  - [CommentContentNode](#commentcontentnode)
- [DoctypeNode](#doctypenode)
  - [DoctypeOpenNode](#doctypeopennode)
  - [DoctypeCloseNode](#doctypeclosenode)
- [DoctypeAttributeNode](#doctypeattributenode)
  - [DoctypeAttributeValueNode](#doctypeattributevaluenode)
  - [DoctypeAttributeWrapperStartNode](#doctypeattributewrapperstartnode)
  - [DoctypeAttributeWrapperEndNode](#doctypeattributewrapperendnode)
### Common
#### BaseNode
Every AST node and token implements the `BaseNode` interface.
```ts
interface BaseNode {
type: string;
loc: SourceLocation;
range: [number, number];
}
```
The `type` field represents the AST type. Its value is one of the `NodeTypes` or `TokenTypes`.
The `loc` and `range` fields represent the source location of the node.
#### SourceLocation
```ts
interface SourceLocation {
start: Position;
end: Position;
}
```
The `start` field represents the start location of the node.
The `end` field represents the end location of the node.
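
To make the shape of `loc` concrete, here is a small sketch that renders a `SourceLocation` as a human-readable string, for example in a lint message. The interfaces are local copies of the ones documented here; `formatLoc` and the sample location are hypothetical and not part of the library.

```ts
// Local copies of the documented shapes, so the sketch is self-contained.
interface Position {
  line: number; // 1-based
  column: number; // 0-based
}
interface SourceLocation {
  start: Position;
  end: Position;
}

// Hypothetical helper: render a location as
// "startLine:startColumn-endLine:endColumn".
function formatLoc(loc: SourceLocation): string {
  return `${loc.start.line}:${loc.start.column}-${loc.end.line}:${loc.end.column}`;
}

// Hand-written sample location, not produced by the parser.
const exampleLoc: SourceLocation = {
  start: { line: 1, column: 0 },
  end: { line: 2, column: 3 },
};
const formattedLoc = formatLoc(exampleLoc); // "1:0-2:3"
```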
#### Position
```ts
interface Position {
line: number; // >= 1
column: number; // >= 0
}
```
The `line` field is a number representing the line number where the node is positioned. (1-based index).
The `column` field is a number representing the offset in the line. (0-based index).
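
The 1-based `line` and 0-based `column` convention can be illustrated by mapping a character offset to a `Position`. This is only a sketch of the convention, not the parser's own implementation: `offsetToPosition` is a hypothetical helper, and it assumes `"\n"` is the only line terminator and that columns count string indices.

```ts
// Local copy of the documented shape.
interface Position {
  line: number; // >= 1
  column: number; // >= 0
}

// Hypothetical helper: map a character offset in `source` to a
// line/column pair (1-based line, 0-based column).
function offsetToPosition(source: string, offset: number): Position {
  let line = 1;
  let column = 0;
  const end = Math.min(offset, source.length);
  for (let i = 0; i < end; i++) {
    if (source[i] === "\n") {
      // A newline starts the next line and resets the column.
      line += 1;
      column = 0;
    } else {
      column += 1;
    }
  }
  return { line, column };
}

// Offset 4 points at "d", the second character of the second line.
const positionOfD = offsetToPosition("ab\ncd", 4); // { line: 2, column: 1 }
```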
#### Token
All tokens implement the `Token` interface.
```ts
interface Token<T extends TokenTypes> extends BaseNode {
type: T;
value: string;
}
```
### DocumentNode
`DocumentNode` represents a whole parsed document. It is the root node of the AST.
```ts
interface DocumentNode extends BaseNode {
type: "Document";
children: Array;
}
```
### TextNode
`TextNode` represents any plain text in HTML.
```ts
interface TextNode extends BaseNode {
type: "Text";
value: string;
}
```
### TagNode
`TagNode` represents all kinds of tag nodes in HTML except for doctype, script, style, and comment. (e.g. `<div></div>`, `<span></span>` ...)
```ts
interface TagNode extends BaseNode {
type: "Tag";
selfClosing: boolean;
name: string;
openStart: OpenTagStartNode;
openEnd: OpenTagEndNode;
close?: CloseTagNode;
children: Array;
attributes: Array<AttributeNode>;
}
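```
#### OpenTagStartNode
`OpenTagStartNode` represents the opening part of the [Start tags](https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#start-tags). (e.g. `<div`)
#### OpenTagEndNode
`OpenTagEndNode` represents the closing part of the [Start tags](https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#start-tags). (e.g. `>`, `/>`)
```ts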
interface OpenTagEndNode extends BaseNode {
type: "OpenTagEnd";
value: string;
}
```
#### CloseTagNode
`CloseTagNode` represents the [End tags](https://www.w3.org/TR/2011/WD-html5-20110405/syntax.html#end-tags). (e.g. `</div>`)
```ts
interface CloseTagNode extends BaseNode {
type: "CloseTag";
value: string;
}
```
### AttributeNode
`AttributeNode` represents an attribute. (e.g. `id="foo"`)
```ts
interface AttributeNode extends BaseNode {
type: "Attribute";
key: AttributeKeyNode;
value?: AttributeValueNode;
startWrapper?: AttributeValueWrapperStartNode;
endWrapper?: AttributeValueWrapperEndNode;
}
```
#### AttributeKeyNode
`AttributeKeyNode` represents a key part of an attribute. (e.g. `id`)
```ts
interface AttributeKeyNode extends BaseNode {
type: "AttributeKey";
value: string;
}
```
#### AttributeValueWrapperStartNode
`AttributeValueWrapperStartNode` represents the left side character that wraps the value of the attribute. (e.g. `"`, `'`)
```ts
interface AttributeValueWrapperStartNode extends BaseNode {
type: "AttributeValueWrapperStart";
value: string;
}
```
#### AttributeValueWrapperEndNode
`AttributeValueWrapperEndNode` represents the right side character that wraps the value of the attribute. (e.g. `"`, `'`)
```ts
interface AttributeValueWrapperEndNode extends BaseNode {
type: "AttributeValueWrapperEnd";
value: string;
}
```
#### AttributeValueNode
`AttributeValueNode` represents the value part of the attribute. It does not include wrapper characters. (e.g. `foo`)
```ts
interface AttributeValueNode extends BaseNode {
type: "AttributeValue";
value: string;
}
```
### ScriptTagNode
`ScriptTagNode` represents a script tag in HTML. (e.g. `<script> console.log('hello'); </script>`)
```ts
interface ScriptTagNode extends BaseNode {
type: "ScriptTag";
attributes: Array<AttributeNode>;
openStart: OpenScriptTagStartNode;
openEnd: OpenScriptTagEndNode;
close: CloseScriptTagNode;
value?: ScriptTagContentNode;
}
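```
#### OpenScriptTagStartNode
`OpenScriptTagStartNode` represents the opening part of a start script tag. (e.g. `<script`)
#### OpenScriptTagEndNode
`OpenScriptTagEndNode` represents the closing part of a start script tag. (e.g. `>`)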
```ts
interface OpenScriptTagEndNode extends BaseNode {
type: "OpenScriptTagEnd";
value: string;
}
```
#### CloseScriptTagNode
`CloseScriptTagNode` represents a close script tag. (e.g. `</script>`)
```ts
interface CloseScriptTagNode extends BaseNode {
type: "CloseScriptTag";
value: string;
}
```
#### ScriptTagContentNode
`ScriptTagContentNode` represents the script content inside a script tag. (e.g. `console.log('hello');`)
```ts
interface ScriptTagContentNode extends BaseNode {
type: "ScriptTagContent";
value: string;
}
```
### StyleTagNode
`StyleTagNode` represents a style tag. (e.g. `<style> .foo {} </style>`)
```ts
interface StyleTagNode extends BaseNode {
type: "StyleTag";
attributes: Array<AttributeNode>;
openStart: OpenStyleTagStartNode;
openEnd: OpenStyleTagEndNode;
close: CloseStyleTagNode;
value?: StyleTagContentNode;
}
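```
#### OpenStyleTagStartNode
`OpenStyleTagStartNode` represents the opening part of a start style tag. (e.g. `<style`)
#### OpenStyleTagEndNode
`OpenStyleTagEndNode` represents the closing part of a start style tag. (e.g. `>`)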
```ts
interface OpenStyleTagEndNode extends BaseNode {
type: "OpenStyleTagEnd";
value: string;
}
```
#### CloseStyleTagNode
`CloseStyleTagNode` represents a close style tag. (e.g. `</style>`)
```ts
interface CloseStyleTagNode extends BaseNode {
type: "CloseStyleTag";
value: string;
}
```
#### StyleTagContentNode
`StyleTagContentNode` represents the style content inside a style tag.
```ts
interface StyleTagContentNode extends BaseNode {
type: "StyleTagContent";
value: string;
}
```
### CommentNode
`CommentNode` represents a comment in HTML. (e.g. `<!-- comment -->`)
```ts
interface CommentNode extends BaseNode {
type: "Comment";
open: CommentOpenNode;
close: CommentCloseNode;
value: CommentContentNode;
}
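```
#### CommentOpenNode
`CommentOpenNode` represents the comment start character sequence. (e.g. `<!--`)
#### CommentCloseNode
`CommentCloseNode` represents the comment end character sequence. (e.g. `-->`)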
```ts
interface CommentCloseNode extends BaseNode {
type: "CommentClose";
value: string;
}
```
#### CommentContentNode
`CommentContentNode` represents the text in the comment.
```ts
interface CommentContentNode extends BaseNode {
type: "CommentContent";
value: string;
}
```
### DoctypeNode
`DoctypeNode` represents the [DOCTYPE](https://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#the-doctype) in HTML.
```ts
interface DoctypeNode extends BaseNode {
type: "Doctype";
attributes: Array<DoctypeAttributeNode>;
open: DoctypeOpenNode;
close: DoctypeCloseNode;
}
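```
#### DoctypeOpenNode
`DoctypeOpenNode` represents the character sequence of the doctype start. (`<!DOCTYPE`)
#### DoctypeCloseNode
`DoctypeCloseNode` represents the character sequence of the doctype end. (`>`)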
```ts
interface DoctypeCloseNode extends BaseNode {
type: "DoctypeClose";
value: string;
}
```
### DoctypeAttributeNode
`DoctypeAttributeNode` represents an attribute of doctype node. (e.g. `html`, `"-//W3C//DTD HTML 4.01 Transitional//EN"`)
```ts
interface DoctypeAttributeNode extends BaseNode {
type: "DoctypeAttribute";
key: DoctypeAttributeKey;
}
```
#### DoctypeAttributeValueNode
`DoctypeAttributeValueNode` represents a value of doctype node's attribute. (e.g. `html`, `-//W3C//DTD HTML 4.01 Transitional//EN`). It does not include wrapper characters (`'`, `"`).
```ts
interface DoctypeAttributeValueNode extends BaseNode {
type: "DoctypeAttributeValue";
value: string;
}
```
#### DoctypeAttributeWrapperStartNode
`DoctypeAttributeWrapperStartNode` represents a left side character that wraps the value of the attribute. (e.g. `"`, `'`)
```ts
interface DoctypeAttributeWrapperStartNode extends BaseNode {
type: "DoctypeAttributeWrapperStart";
value: string;
}
```
#### DoctypeAttributeWrapperEndNode
`DoctypeAttributeWrapperEndNode` represents a right side character that wraps the value of the attribute. (e.g. `"`, `'`)
```ts
interface DoctypeAttributeWrapperEndNode extends BaseNode {
type: "DoctypeAttributeWrapperEnd";
value: string;
}
```
## License
[MIT](./LICENSE.md)