{"id":13602551,"url":"https://github.com/mykolaharmash/hyntax","last_synced_at":"2025-04-05T12:09:14.138Z","repository":{"id":27356116,"uuid":"99039075","full_name":"mykolaharmash/hyntax","owner":"mykolaharmash","description":"Straightforward HTML parser for JavaScript","archived":false,"fork":false,"pushed_at":"2024-07-12T14:31:21.000Z","size":2411,"stargazers_count":139,"open_issues_count":14,"forks_count":8,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-29T11:09:48.051Z","etag":null,"topics":["dom","html","html-parser","javascript"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mykolaharmash.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-01T20:08:54.000Z","updated_at":"2024-09-30T06:01:36.000Z","dependencies_parsed_at":"2024-01-16T22:21:09.588Z","dependency_job_id":"8163e66e-3884-47a3-b6ac-50622b372cad","html_url":"https://github.com/mykolaharmash/hyntax","commit_stats":{"total_commits":163,"total_committers":6,"mean_commits":"27.166666666666668","dds":0.2883435582822086,"last_synced_commit":"849eaceb9bbcbe94ab1de1fd4c431fe46c0ee416"},"previous_names":["nik-garmash/hyntax"],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mykolaharmash%2Fhyntax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mykolaharmash%2Fhyntax/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mykolaharmash%2Fhyntax/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mykolaharmash%2Fhyntax/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mykolaharmash","download_url":"https://codeload.github.com/mykolaharmash/hyntax/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247332612,"owners_count":20921853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dom","html","html-parser","javascript"],"created_at":"2024-08-01T18:01:27.819Z","updated_at":"2025-04-05T12:09:14.118Z","avatar_url":"https://github.com/mykolaharmash.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\n\u003cimg src=\"./logo.png\" alt=\"Hyntax project logo — lego bricks in the shape of a capital letter H\" width=\"250\"\u003e\n\n\u003c/p\u003e\n\n# Hyntax\n\nStraightforward HTML parser for JavaScript. 
[Live Demo](https://astexplorer.net/#/gist/6bf7f78077333cff124e619aebfb5b42/latest).\n\n-   **Simple.** API is straightforward, output is clear.\n-   **Forgiving.** Like a browser, it parses invalid HTML gracefully.\n-   **Supports streaming.** Can process HTML while it's still being loaded.\n-   **No dependencies.**\n\n## Table Of Contents\n\n-   [Usage](#usage)\n-   [TypeScript Typings](#typescript-typings)\n-   [Streaming](#streaming)\n-   [Tokens](#tokens)\n-   [AST Format](#ast-format)\n-   [API Reference](#api-reference)\n-   [Types Reference](#types-reference)\n\n## Usage\n\n```bash\nnpm install hyntax\n```\n\n```javascript\nconst { tokenize, constructTree } = require('hyntax')\nconst util = require('util')\n\nconst inputHTML = `\n\u003chtml\u003e\n  \u003cbody\u003e\n      \u003cinput type=\"text\" placeholder=\"Don't type\"\u003e\n      \u003cbutton\u003eDon't press\u003c/button\u003e\n  \u003c/body\u003e\n\u003c/html\u003e\n`\n\nconst { tokens } = tokenize(inputHTML)\nconst { ast } = constructTree(tokens)\n\nconsole.log(JSON.stringify(tokens, null, 2))\nconsole.log(util.inspect(ast, { showHidden: false, depth: null }))\n```\n\n## TypeScript Typings\n\nHyntax is written in JavaScript but has [integrated TypeScript typings](./index.d.ts) to help you navigate around its data structures. 
There is also a [Types Reference](#types-reference) that covers the most common types.\n\n\n\n## Streaming\n\nUse the `StreamTokenizer` and `StreamTreeConstructor` classes to parse HTML chunk by chunk while it's still being loaded from the network or read from the disk.\n\n```javascript\nconst { StreamTokenizer, StreamTreeConstructor } = require('hyntax')\nconst http = require('http')\nconst util = require('util')\n\nhttp.get('http://info.cern.ch', (res) =\u003e {\n  const streamTokenizer = new StreamTokenizer()\n  const streamTreeConstructor = new StreamTreeConstructor()\n\n  let resultTokens = []\n  let resultAst\n\n  res.pipe(streamTokenizer).pipe(streamTreeConstructor)\n\n  streamTokenizer\n    .on('data', (tokens) =\u003e {\n      resultTokens = resultTokens.concat(tokens)\n    })\n    .on('end', () =\u003e {\n      console.log(JSON.stringify(resultTokens, null, 2))\n    })\n\n  streamTreeConstructor\n    .on('data', (ast) =\u003e {\n      resultAst = ast\n    })\n    .on('end', () =\u003e {\n      console.log(util.inspect(resultAst, { showHidden: false, depth: null }))\n    })\n}).on('error', (err) =\u003e {\n  throw err;\n})\n```\n\n\n\n## Tokens\n\nHere are all the kinds of tokens that Hyntax extracts from an HTML string.\n\n![Overview of all possible tokens](./tokens-list.png)\n\nEach token conforms to the [Tokenizer.Token](#TokenizerToken) interface.\n\n\n\n## AST Format\n\nThe resulting syntax tree has one top-level [Document Node](#ast-node-types) with optional child nodes nested within.\n\n\u003c!-- You can play around with the [AST Explorer](https://astexplorer.net) to see how AST looks like. 
 --\u003e\n\n```javascript\n{\n  nodeType: TreeConstructor.NodeTypes.Document,\n  content: {\n    children: [\n      {\n        nodeType: TreeConstructor.NodeTypes.AnyNodeType,\n        content: {…}\n      },\n      {\n        nodeType: TreeConstructor.NodeTypes.AnyNodeType,\n        content: {…}\n      }\n    ]\n  }\n}\n```\n\nThe content of each node is specific to the node's type; all of them are described in the [AST Node Types](#ast-node-types) reference.\n\n\n\n## API Reference\n\n### Tokenizer\n\nHyntax has its tokenizer as a separate module. You can use the generated tokens on their own or pass them on to the tree constructor to build an AST.\n\n#### Interface\n\n```typescript\ntokenize(html: string): Tokenizer.Result\n```\n\n#### Arguments\n\n-   `html`  \n  HTML string to process.  \n  Required.  \n  Type: string.\n\n#### Returns [Tokenizer.Result](#TokenizerResult)\n\n### Tree Constructor\n\nOnce you have an array of tokens, you can pass it to the tree constructor to build an AST.\n\n#### Interface\n\n```typescript\nconstructTree(tokens: Tokenizer.AnyToken[]): TreeConstructor.Result\n```\n\n#### Arguments\n\n-   `tokens`  \n  Array of tokens received from the tokenizer.  \n  Required.  \n  Type: [Tokenizer.AnyToken[]](#tokenizeranytoken)\n\n#### Returns [TreeConstructor.Result](#TreeConstructorResult)\n\n\n\n## Types Reference\n\n#### Tokenizer.Result\n\n```typescript\ninterface Result {\n  state: Tokenizer.State\n  tokens: Tokenizer.AnyToken[]\n}\n```\n\n-   `state`  \n  The current state of the tokenizer. It can be persisted and passed to the next tokenizer call if the input is coming in chunks.\n-   `tokens`  \n  Array of resulting tokens.  \n  Type: [Tokenizer.AnyToken[]](#tokenizeranytoken)\n\n#### TreeConstructor.Result\n\n```typescript\ninterface Result {\n  state: State\n  ast: AST\n}\n```\n\n-   `state`  \nThe current state of the tree constructor. 
It can be persisted and passed to the next tree constructor call if the tokens are coming in chunks.\n  \n-   `ast`  \n  Resulting AST.  \n  Type: [TreeConstructor.AST](#treeconstructorast)  \n\n#### Tokenizer.Token\n\nGeneric Token; other interfaces use it to create a specific Token type.\n\n```typescript\ninterface Token\u003cT extends TokenTypes.AnyTokenType\u003e {\n  type: T\n  content: string\n  startPosition: number\n  endPosition: number\n}\n```\n\n-   `type`  \nOne of the [Token types](#TokenizerTokenTypesAnyTokenType).\n  \n-   `content`  \nPiece of the original HTML string which was recognized as a token.\n  \n-   `startPosition`  \nIndex of a character in the input HTML string where the token starts.\n  \n-   `endPosition`  \nIndex of a character in the input HTML string where the token ends.\n\n#### Tokenizer.TokenTypes.AnyTokenType\n\nShortcut type for all possible token types.\n\n```typescript\ntype AnyTokenType =\n  | Text\n  | OpenTagStart\n  | AttributeKey\n  | AttributeAssigment\n  | AttributeValueWrapperStart\n  | AttributeValue\n  | AttributeValueWrapperEnd\n  | OpenTagEnd\n  | CloseTag\n  | OpenTagStartScript\n  | ScriptTagContent\n  | OpenTagEndScript\n  | CloseTagScript\n  | OpenTagStartStyle\n  | StyleTagContent\n  | OpenTagEndStyle\n  | CloseTagStyle\n  | DoctypeStart\n  | DoctypeEnd\n  | DoctypeAttributeWrapperStart\n  | DoctypeAttribute\n  | DoctypeAttributeWrapperEnd\n  | CommentStart\n  | CommentContent\n  | CommentEnd\n```\n\n#### Tokenizer.AnyToken\n\nShortcut to reference any possible token.\n\n```typescript\ntype AnyToken = Token\u003cTokenTypes.AnyTokenType\u003e\n```\n\n#### TreeConstructor.AST\n\nJust an alias for DocumentNode. AST always has one top-level DocumentNode. See [AST Node Types](#ast-node-types).\n\n```typescript\ntype AST = TreeConstructor.DocumentNode\n```\n\n### AST Node Types\n\nThere are 7 possible types of Node. 
Each type has its own specific content.\n\n```typescript\ntype DocumentNode = Node\u003cNodeTypes.Document, NodeContents.Document\u003e\n```\n\n```typescript\ntype DoctypeNode = Node\u003cNodeTypes.Doctype, NodeContents.Doctype\u003e\n```\n\n```typescript\ntype TextNode = Node\u003cNodeTypes.Text, NodeContents.Text\u003e\n```\n\n```typescript\ntype TagNode = Node\u003cNodeTypes.Tag, NodeContents.Tag\u003e\n```\n\n```typescript\ntype CommentNode = Node\u003cNodeTypes.Comment, NodeContents.Comment\u003e\n```\n\n```typescript\ntype ScriptNode = Node\u003cNodeTypes.Script, NodeContents.Script\u003e\n```\n\n```typescript\ntype StyleNode = Node\u003cNodeTypes.Style, NodeContents.Style\u003e\n```\n\nInterfaces for each content type:\n\n- [Document](#TreeConstructorNodeContentsDocument)\n- [Doctype](#TreeConstructorNodeContentsDoctype)\n- [Text](#TreeConstructorNodeContentsText)\n- [Tag](#TreeConstructorNodeContentsTag)\n- [Comment](#TreeConstructorNodeContentsComment)\n- [Script](#TreeConstructorNodeContentsScript)\n- [Style](#TreeConstructorNodeContentsStyle)\n\n#### TreeConstructor.Node\n\nGeneric Node; other interfaces use it to create specific Nodes by providing the type of the Node and the type of the content inside the Node.\n\n```typescript\ninterface Node\u003cT extends NodeTypes.AnyNodeType, C extends NodeContents.AnyNodeContent\u003e {\n  nodeType: T\n  content: C\n}\n```\n\n#### TreeConstructor.NodeTypes.AnyNodeType\n\nShortcut type for all possible Node types.\n\n```typescript\ntype AnyNodeType =\n  | Document\n  | Doctype\n  | Tag\n  | Text\n  | Comment\n  | Script\n  | Style\n```\n\n### Node Content Types\n\n#### TreeConstructor.NodeContents.AnyNodeContent\n\nShortcut type for all possible types of content inside a Node.\n\n```typescript\ntype AnyNodeContent =\n  | Document\n  | Doctype\n  | Text\n  | Tag\n  | Comment\n  | Script\n  | Style\n```\n\n#### TreeConstructor.NodeContents.Document\n\n```typescript\ninterface Document {\n  children: AnyNode[]\n}\n```\n\n#### 
TreeConstructor.NodeContents.Doctype\n\n```typescript\ninterface Doctype {\n  start: Tokenizer.Token\u003cTokenizer.TokenTypes.DoctypeStart\u003e\n  attributes?: DoctypeAttribute[]\n  end: Tokenizer.Token\u003cTokenizer.TokenTypes.DoctypeEnd\u003e\n}\n```\n\n#### TreeConstructor.NodeContents.Text\n\n```typescript\ninterface Text {\n  value: Tokenizer.Token\u003cTokenizer.TokenTypes.Text\u003e\n}\n```\n\n#### TreeConstructor.NodeContents.Tag\n\n```typescript\ninterface Tag {\n  name: string\n  selfClosing: boolean\n  openStart: Tokenizer.Token\u003cTokenizer.TokenTypes.OpenTagStart\u003e\n  attributes?: TagAttribute[]\n  openEnd: Tokenizer.Token\u003cTokenizer.TokenTypes.OpenTagEnd\u003e\n  children?: AnyNode[]\n  close?: Tokenizer.Token\u003cTokenizer.TokenTypes.CloseTag\u003e\n}\n```\n\n#### TreeConstructor.NodeContents.Comment\n\n```typescript\ninterface Comment {\n  start: Tokenizer.Token\u003cTokenizer.TokenTypes.CommentStart\u003e\n  value: Tokenizer.Token\u003cTokenizer.TokenTypes.CommentContent\u003e\n  end: Tokenizer.Token\u003cTokenizer.TokenTypes.CommentEnd\u003e\n}\n```\n\n#### TreeConstructor.NodeContents.Script\n\n```typescript\ninterface Script {\n  openStart: Tokenizer.Token\u003cTokenizer.TokenTypes.OpenTagStartScript\u003e\n  attributes?: TagAttribute[]\n  openEnd: Tokenizer.Token\u003cTokenizer.TokenTypes.OpenTagEndScript\u003e\n  value: Tokenizer.Token\u003cTokenizer.TokenTypes.ScriptTagContent\u003e\n  close: Tokenizer.Token\u003cTokenizer.TokenTypes.CloseTagScript\u003e\n}\n```\n\n#### TreeConstructor.NodeContents.Style\n\n```typescript\ninterface Style {\n  openStart: Tokenizer.Token\u003cTokenizer.TokenTypes.OpenTagStartStyle\u003e,\n  attributes?: TagAttribute[],\n  openEnd: Tokenizer.Token\u003cTokenizer.TokenTypes.OpenTagEndStyle\u003e,\n  value: Tokenizer.Token\u003cTokenizer.TokenTypes.StyleTagContent\u003e,\n  close: Tokenizer.Token\u003cTokenizer.TokenTypes.CloseTagStyle\u003e\n}\n```\n\n#### 
TreeConstructor.DoctypeAttribute\n\n```typescript\ninterface DoctypeAttribute {\n  startWrapper?: Tokenizer.Token\u003cTokenizer.TokenTypes.DoctypeAttributeWrapperStart\u003e,\n  value: Tokenizer.Token\u003cTokenizer.TokenTypes.DoctypeAttribute\u003e,\n  endWrapper?: Tokenizer.Token\u003cTokenizer.TokenTypes.DoctypeAttributeWrapperEnd\u003e\n}\n```\n\n#### TreeConstructor.TagAttribute\n\n```typescript\ninterface TagAttribute {\n  key?: Tokenizer.Token\u003cTokenizer.TokenTypes.AttributeKey\u003e,\n  startWrapper?: Tokenizer.Token\u003cTokenizer.TokenTypes.AttributeValueWrapperStart\u003e,\n  value?: Tokenizer.Token\u003cTokenizer.TokenTypes.AttributeValue\u003e,\n  endWrapper?: Tokenizer.Token\u003cTokenizer.TokenTypes.AttributeValueWrapperEnd\u003e\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmykolaharmash%2Fhyntax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmykolaharmash%2Fhyntax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmykolaharmash%2Fhyntax/lists"}