{"id":13716827,"url":"https://github.com/syntax-tree/hast-util-to-nlcst","last_synced_at":"2025-06-29T22:03:49.804Z","repository":{"id":57261713,"uuid":"69396772","full_name":"syntax-tree/hast-util-to-nlcst","owner":"syntax-tree","description":"utility to transform hast to nlcst","archived":false,"fork":false,"pushed_at":"2023-08-08T17:24:18.000Z","size":158,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-06-14T09:43:09.513Z","etag":null,"topics":["hast","hast-util","html","natural-language","nlcst","nlcst-util","syntax-tree","unist","util"],"latest_commit_sha":null,"homepage":"https://unifiedjs.com","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/syntax-tree.png","metadata":{"funding":{"github":"unifiedjs","open_collective":"unified"},"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-09-27T20:40:34.000Z","updated_at":"2023-05-24T15:49:41.000Z","dependencies_parsed_at":"2024-01-14T22:03:39.565Z","dependency_job_id":"0f3c24dc-3904-447d-a414-f2dce35204c3","html_url":"https://github.com/syntax-tree/hast-util-to-nlcst","commit_stats":{"total_commits":104,"total_committers":3,"mean_commits":"34.666666666666664","dds":"0.038461538461538436","last_synced_commit":"bc67eeaece2548a741e60642bc051947ada32e3a"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/syntax-tree/hast-util-to-nlcst","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fhast-util-to-nlcst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/syntax-tree%2Fhast-util-to-nlcst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fhast-util-to-nlcst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fhast-util-to-nlcst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/syntax-tree","download_url":"https://codeload.github.com/syntax-tree/hast-util-to-nlcst/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fhast-util-to-nlcst/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261852988,"owners_count":23219778,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hast","hast-util","html","natural-language","nlcst","nlcst-util","syntax-tree","unist","util"],"created_at":"2024-08-03T00:01:14.843Z","updated_at":"2025-06-29T22:03:49.755Z","avatar_url":"https://github.com/syntax-tree.png","language":"JavaScript","funding_links":["https://github.com/sponsors/unifiedjs","https://opencollective.com/unified"],"categories":["hast utilities"],"sub_categories":[],"readme":"# hast-util-to-nlcst\n\n[![Build][build-badge]][build]\n[![Coverage][coverage-badge]][coverage]\n[![Downloads][downloads-badge]][downloads]\n[![Size][size-badge]][size]\n[![Sponsors][sponsors-badge]][collective]\n[![Backers][backers-badge]][collective]\n[![Chat][chat-badge]][chat]\n\n[hast][] utility to transform to [nlcst][].\n\n## Contents\n\n*   [What is this?](#what-is-this)\n*   [When should I use 
this?](#when-should-i-use-this)\n*   [Install](#install)\n*   [Use](#use)\n*   [API](#api)\n    *   [`toNlcst(tree, file, Parser)`](#tonlcsttree-file-parser)\n    *   [`ParserConstructor`](#parserconstructor)\n    *   [`ParserInstance`](#parserinstance)\n*   [Types](#types)\n*   [Compatibility](#compatibility)\n*   [Security](#security)\n*   [Related](#related)\n*   [Contribute](#contribute)\n*   [License](#license)\n\n## What is this?\n\nThis package is a utility that takes a [hast][] (HTML) syntax tree as input and\nturns it into [nlcst][] (natural language).\n\n## When should I use this?\n\nThis project is useful when you want to deal with ASTs and inspect the natural\nlanguage inside HTML.\nUnfortunately, there is no way yet to apply changes to the nlcst back into\nhast.\n\nThe mdast utility [`mdast-util-to-nlcst`][mdast-util-to-nlcst] does the same but\nuses a markdown tree as input.\n\nThe rehype plugin [`rehype-retext`][rehype-retext] wraps this utility to do the\nsame at a higher-level (easier) abstraction.\n\n## Install\n\nThis package is [ESM only][esm].\nIn Node.js (version 16+), install with [npm][]:\n\n```sh\nnpm install hast-util-to-nlcst\n```\n\nIn Deno with [`esm.sh`][esmsh]:\n\n```js\nimport {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4'\n```\n\nIn browsers with [`esm.sh`][esmsh]:\n\n```html\n\u003cscript type=\"module\"\u003e\n  import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4?bundle'\n\u003c/script\u003e\n```\n\n## Use\n\nSay our document `example.html` contains:\n\n```html\n\u003carticle\u003e\n  Implicit.\n  \u003ch1\u003eExplicit: \u003cstrong\u003efoo\u003c/strong\u003es-ball\u003c/h1\u003e\n  \u003cpre\u003e\u003ccode class=\"language-foo\"\u003ebar()\u003c/code\u003e\u003c/pre\u003e\n\u003c/article\u003e\n```\n\n…and our module `example.js` looks as follows:\n\n```js\nimport {fromHtml} from 'hast-util-from-html'\nimport {toNlcst} from 'hast-util-to-nlcst'\nimport {ParseEnglish} from 'parse-english'\nimport {read} from 
'to-vfile'\nimport {inspect} from 'unist-util-inspect'\n\nconst file = await read('example.html')\nconst tree = fromHtml(file)\n\nconsole.log(inspect(toNlcst(tree, file, ParseEnglish)))\n```\n\n…now running `node example.js` yields:\n\n```txt\nRootNode[2] (1:1-6:1, 0-134)\n├─0 ParagraphNode[3] (1:10-3:3, 9-24)\n│   ├─0 WhiteSpaceNode \"\\n  \" (1:10-2:3, 9-12)\n│   ├─1 SentenceNode[2] (2:3-2:12, 12-21)\n│   │   ├─0 WordNode[1] (2:3-2:11, 12-20)\n│   │   │   └─0 TextNode \"Implicit\" (2:3-2:11, 12-20)\n│   │   └─1 PunctuationNode \".\" (2:11-2:12, 20-21)\n│   └─2 WhiteSpaceNode \"\\n  \" (2:12-3:3, 21-24)\n└─1 ParagraphNode[1] (3:7-3:43, 28-64)\n    └─0 SentenceNode[4] (3:7-3:43, 28-64)\n        ├─0 WordNode[1] (3:7-3:15, 28-36)\n        │   └─0 TextNode \"Explicit\" (3:7-3:15, 28-36)\n        ├─1 PunctuationNode \":\" (3:15-3:16, 36-37)\n        ├─2 WhiteSpaceNode \" \" (3:16-3:17, 37-38)\n        └─3 WordNode[4] (3:25-3:43, 46-64)\n            ├─0 TextNode \"foo\" (3:25-3:28, 46-49)\n            ├─1 TextNode \"s\" (3:37-3:38, 58-59)\n            ├─2 PunctuationNode \"-\" (3:38-3:39, 59-60)\n            └─3 TextNode \"ball\" (3:39-3:43, 60-64)\n```\n\n## API\n\nThis package exports the identifier [`toNlcst`][api-to-nlcst].\nThere is no default export.\n\n### `toNlcst(tree, file, Parser)`\n\nTurn a hast tree into an nlcst tree.\n\n\u003e 👉 **Note**: `tree` must have positional info and `file` must be a `VFile`\n\u003e corresponding to `tree`.\n\n##### Parameters\n\n*   `tree` ([`HastNode`][hast-node])\n    — hast tree to transform\n*   `file` ([`VFile`][vfile])\n    — virtual file\n*   `Parser` ([`ParserConstructor`][api-parser-constructor] or\n    [`ParserInstance`][api-parser-instance])\n    — parser to use.\n\n##### Returns\n\n[`NlcstNode`][nlcst-node].\n\n##### Notes\n\n###### Implied paragraphs\n\nThe algorithm supports implicit and explicit paragraphs, such as:\n\n```html\n\u003carticle\u003e\n  An implicit paragraph.\n  
\u003ch1\u003eAn explicit paragraph.\u003c/h1\u003e\n\u003c/article\u003e\n```\n\nOverlapping paragraphs are also supported (see the tests or the HTML spec for\nmore info).\n\n###### Ignored nodes\n\nSome elements are ignored and their content will not be present in\n**[nlcst][]**: `\u003cscript\u003e`, `\u003cstyle\u003e`, `\u003csvg\u003e`, `\u003cmath\u003e`, `\u003cdel\u003e`.\n\nTo ignore other elements, add a `data-nlcst` attribute with a value of `ignore`:\n\n```html\n\u003cp\u003eThis is \u003cspan data-nlcst=\"ignore\"\u003ehidden\u003c/span\u003e.\u003c/p\u003e\n\u003cp data-nlcst=\"ignore\"\u003eCompletely hidden.\u003c/p\u003e\n```\n\n###### Source nodes\n\n`\u003ccode\u003e` elements are mapped to [`Source`][nlcst-source] nodes in\n**[nlcst][]**.\n\nTo mark other elements as source, add a `data-nlcst` attribute with a value\nof `source`:\n\n```html\n\u003cp\u003eThis is \u003cspan data-nlcst=\"source\"\u003emarked as source\u003c/span\u003e.\u003c/p\u003e\n\u003cp data-nlcst=\"source\"\u003eCompletely marked.\u003c/p\u003e\n```\n\n### `ParserConstructor`\n\nCreate a new parser (TypeScript type).\n\n###### Type\n\n```ts\ntype ParserConstructor = new () =\u003e ParserInstance\n```\n\n### `ParserInstance`\n\nnlcst parser (TypeScript type).\n\nFor example, [`parse-dutch`][parse-dutch], [`parse-english`][parse-english], or\n[`parse-latin`][parse-latin].\n\n###### Type\n\n```ts\ntype ParserInstance = {\n  parse(value?: string | null | undefined): NlcstRoot\n  tokenize(value?: string | null | undefined): Array\u003cNlcstSentenceContent\u003e\n  tokenizeParagraph(value?: string | null | undefined): NlcstParagraph\n  tokenizeParagraphPlugins: Array\u003c(node: NlcstParagraph) =\u003e undefined | void\u003e\n  tokenizeSentencePlugins: Array\u003c(node: NlcstSentence) =\u003e undefined | void\u003e\n}\n```\n\n## Types\n\nThis package is fully typed with [TypeScript][].\nIt exports the additional types [`ParserConstructor`][api-parser-constructor]\nand 
[`ParserInstance`][api-parser-instance].\n\n## Compatibility\n\nProjects maintained by the unified collective are compatible with maintained\nversions of Node.js.\n\nWhen we cut a new major release, we drop support for unmaintained versions of\nNode.\nThis means we try to keep the current release line, `hast-util-to-nlcst@^4`,\ncompatible with Node.js 16.\n\n## Security\n\n`hast-util-to-nlcst` does not change the original syntax tree so there are no\nopenings for [cross-site scripting (XSS)][xss] attacks.\n\n## Related\n\n*   [`mdast-util-to-nlcst`](https://github.com/syntax-tree/mdast-util-to-nlcst)\n    — transform mdast to nlcst\n*   [`hast-util-to-mdast`](https://github.com/syntax-tree/hast-util-to-mdast)\n    — transform hast to mdast\n*   [`hast-util-to-xast`](https://github.com/syntax-tree/hast-util-to-xast)\n    — transform hast to xast\n\n## Contribute\n\nSee [`contributing.md`][contributing] in [`syntax-tree/.github`][health] for\nways to get started.\nSee [`support.md`][support] for ways to get help.\n\nThis project has a [code of conduct][coc].\nBy interacting with this repository, organization, or community you agree to\nabide by its terms.\n\n## License\n\n[MIT][license] © [Titus Wormer][author]\n\n\u003c!-- Definitions --\u003e\n\n[build-badge]: https://github.com/syntax-tree/hast-util-to-nlcst/workflows/main/badge.svg\n\n[build]: https://github.com/syntax-tree/hast-util-to-nlcst/actions\n\n[coverage-badge]: https://img.shields.io/codecov/c/github/syntax-tree/hast-util-to-nlcst.svg\n\n[coverage]: https://codecov.io/github/syntax-tree/hast-util-to-nlcst\n\n[downloads-badge]: https://img.shields.io/npm/dm/hast-util-to-nlcst.svg\n\n[downloads]: https://www.npmjs.com/package/hast-util-to-nlcst\n\n[size-badge]: https://img.shields.io/badge/dynamic/json?label=minzipped%20size\u0026query=$.size.compressedSize\u0026url=https://deno.bundlejs.com/?q=hast-util-to-nlcst\n\n[size]: https://bundlejs.com/?q=hast-util-to-nlcst\n\n[sponsors-badge]: 
https://opencollective.com/unified/sponsors/badge.svg\n\n[backers-badge]: https://opencollective.com/unified/backers/badge.svg\n\n[collective]: https://opencollective.com/unified\n\n[chat-badge]: https://img.shields.io/badge/chat-discussions-success.svg\n\n[chat]: https://github.com/syntax-tree/unist/discussions\n\n[npm]: https://docs.npmjs.com/cli/install\n\n[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c\n\n[esmsh]: https://esm.sh\n\n[typescript]: https://www.typescriptlang.org\n\n[license]: license\n\n[author]: https://wooorm.com\n\n[health]: https://github.com/syntax-tree/.github\n\n[contributing]: https://github.com/syntax-tree/.github/blob/main/contributing.md\n\n[support]: https://github.com/syntax-tree/.github/blob/main/support.md\n\n[coc]: https://github.com/syntax-tree/.github/blob/main/code-of-conduct.md\n\n[rehype-retext]: https://github.com/rehypejs/rehype-retext\n\n[vfile]: https://github.com/vfile/vfile\n\n[hast]: https://github.com/syntax-tree/hast\n\n[hast-node]: https://github.com/syntax-tree/hast#nodes\n\n[nlcst]: https://github.com/syntax-tree/nlcst\n\n[nlcst-node]: https://github.com/syntax-tree/nlcst#nodes\n\n[nlcst-source]: https://github.com/syntax-tree/nlcst#source\n\n[mdast-util-to-nlcst]: https://github.com/syntax-tree/mdast-util-to-nlcst\n\n[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting\n\n[parse-english]: https://github.com/wooorm/parse-english\n\n[parse-latin]: https://github.com/wooorm/parse-latin\n\n[parse-dutch]: https://github.com/wooorm/parse-dutch\n\n[api-to-nlcst]: #tonlcsttree-file-parser\n\n[api-parser-constructor]: #parserconstructor\n\n[api-parser-instance]: #parserinstance\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyntax-tree%2Fhast-util-to-nlcst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyntax-tree%2Fhast-util-to-nlcst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyntax-tree%2Fhast-util-to-nlcst/lists"}