{"id":13716520,"url":"https://github.com/syntax-tree/nlcst","last_synced_at":"2025-10-14T18:41:30.708Z","repository":{"id":21560776,"uuid":"24880547","full_name":"syntax-tree/nlcst","owner":"syntax-tree","description":"Natural Language Concrete Syntax Tree format","archived":false,"fork":false,"pushed_at":"2024-10-04T12:49:57.000Z","size":72,"stargazers_count":223,"open_issues_count":0,"forks_count":10,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-10-08T03:39:20.684Z","etag":null,"topics":["ast","cst","natural-language","syntax-tree","unist"],"latest_commit_sha":null,"homepage":"https://unifiedjs.com","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/syntax-tree.png","metadata":{"funding":{"github":"unifiedjs","open_collective":"unified","thanks_dev":"u/gh/syntax-tree"},"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2014-10-07T07:18:15.000Z","updated_at":"2025-09-27T13:27:52.000Z","dependencies_parsed_at":"2023-02-18T17:45:17.690Z","dependency_job_id":"dbb48899-672b-479f-b732-99c44fcada78","html_url":"https://github.com/syntax-tree/nlcst","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/syntax-tree/nlcst","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fnlcst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fnlcst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fnlcst/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fnlcst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/syntax-tree","download_url":"https://codeload.github.com/syntax-tree/nlcst/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syntax-tree%2Fnlcst/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279020361,"owners_count":26086866,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ast","cst","natural-language","syntax-tree","unist"],"created_at":"2024-08-03T00:01:11.292Z","updated_at":"2025-10-14T18:41:30.692Z","avatar_url":"https://github.com/syntax-tree.png","language":null,"readme":"# ![nlcst][logo]\n\n**N**atural **L**anguage **C**oncrete **S**yntax **T**ree format.\n\n***\n\n**nlcst** is a specification for representing natural language in a [syntax\ntree][syntax-tree].\nIt implements the **[unist][]** spec.\n\nThis document may not be released.\nSee [releases][] for released documents.\nThe latest released version is [`1.0.2`][latest].\n\n## Contents\n\n* [Introduction](#introduction)\n  * [Where this specification fits](#where-this-specification-fits)\n* [Types](#types)\n* [Nodes (abstract)](#nodes-abstract)\n  * [`Literal`](#literal)\n  * [`Parent`](#parent)\n* 
[Nodes](#nodes)\n  * [`Paragraph`](#paragraph)\n  * [`Punctuation`](#punctuation)\n  * [`Root`](#root)\n  * [`Sentence`](#sentence)\n  * [`Source`](#source)\n  * [`Symbol`](#symbol)\n  * [`Text`](#text)\n  * [`WhiteSpace`](#whitespace)\n  * [`Word`](#word)\n* [Glossary](#glossary)\n* [List of utilities](#list-of-utilities)\n* [Related](#related)\n* [References](#references)\n* [Contribute](#contribute)\n* [Acknowledgments](#acknowledgments)\n* [License](#license)\n\n## Introduction\n\nThis document defines a format for representing natural language as a [concrete\nsyntax tree][syntax-tree].\nDevelopment of nlcst started in May 2014,\nin the now deprecated [textom][] project for [retext][],\nbefore [unist][] existed.\nThis specification is written in a [Web IDL][webidl]-like grammar.\n\n### Where this specification fits\n\nnlcst extends [unist][],\na format for syntax trees,\nto benefit from its [ecosystem of utilities][utilities].\n\nnlcst relates to [JavaScript][] in that it has an [ecosystem of\nutilities][list-of-utilities] for working with compliant syntax trees in\nJavaScript.\nHowever,\nnlcst is not limited to JavaScript and can be used in other programming\nlanguages.\n\nnlcst relates to the [unified][] and [retext][] projects in that nlcst syntax\ntrees are used throughout their ecosystems.\n\n## Types\n\nIf you are using TypeScript,\nyou can use the nlcst types by installing them with npm:\n\n```sh\nnpm install @types/nlcst\n```\n\n## Nodes (abstract)\n\n### `Literal`\n\n```idl\ninterface Literal \u003c: UnistLiteral {\n  value: string\n}\n```\n\n**Literal** ([**UnistLiteral**][dfn-unist-literal]) represents a node in nlcst\ncontaining a value.\n\nIts `value` field is a `string`.\n\n### `Parent`\n\n```idl\ninterface Parent \u003c: UnistParent {\n  children: [Paragraph | Punctuation | Sentence | Source | Symbol | Text | WhiteSpace | Word]\n}\n```\n\n**Parent** ([**UnistParent**][dfn-unist-parent]) represents a node in nlcst\ncontaining other nodes (said to 
be [*children*][term-child]).\n\nIts content is limited to other nlcst content.\n\n## Nodes\n\n### `Paragraph`\n\n```idl\ninterface Paragraph \u003c: Parent {\n  type: 'ParagraphNode'\n  children: [Sentence | Source | WhiteSpace]\n}\n```\n\n**Paragraph** ([**Parent**][dfn-parent]) represents a unit of discourse dealing\nwith a particular point or idea.\n\n**Paragraph** can be used in a [**root**][dfn-root] node.\nIt can contain [**sentence**][dfn-sentence],\n[**whitespace**][dfn-whitespace],\nand [**source**][dfn-source] nodes.\n\n### `Punctuation`\n\n```idl\ninterface Punctuation \u003c: Literal {\n  type: 'PunctuationNode'\n}\n```\n\n**Punctuation** ([**Literal**][dfn-literal]) represents typographical devices\nwhich aid understanding and correct reading of other grammatical units.\n\n**Punctuation** can be used in [**sentence**][dfn-sentence] or\n[**word**][dfn-word] nodes.\n\n### `Root`\n\n```idl\ninterface Root \u003c: Parent {\n  type: 'RootNode'\n}\n```\n\n**Root** ([**Parent**][dfn-parent]) represents a document.\n\n**Root** can be used as the [*root*][term-root] of a [*tree*][term-tree],\nnever as a [*child*][term-child].\nIts content model is not limited:\nit can contain any nlcst content,\nwith the restriction that all content must be of the same category.\n\n### `Sentence`\n\n```idl\ninterface Sentence \u003c: Parent {\n  type: 'SentenceNode'\n  children: [Punctuation | Source | Symbol | WhiteSpace | Word]\n}\n```\n\n**Sentence** ([**Parent**][dfn-parent]) represents a grouping of grammatically\nlinked words\nthat in principle expresses a complete thought,\nalthough it may make little sense taken in isolation out of context.\n\n**Sentence** can be used in a [**paragraph**][dfn-paragraph] node.\nIt can contain [**word**][dfn-word],\n[**symbol**][dfn-symbol],\n[**punctuation**][dfn-punctuation],\n[**whitespace**][dfn-whitespace],\nand [**source**][dfn-source] nodes.\n\n### `Source`\n\n```idl\ninterface Source \u003c: Literal {\n  type: 
'SourceNode'\n}\n```\n\n**Source** ([**Literal**][dfn-literal]) represents an external (ungrammatical)\nvalue embedded into a grammatical unit: a hyperlink,\ncode,\nand such.\n\n**Source** can be used in [**root**][dfn-root],\n[**paragraph**][dfn-paragraph],\n[**sentence**][dfn-sentence],\nor [**word**][dfn-word] nodes.\n\n### `Symbol`\n\n```idl\ninterface Symbol \u003c: Literal {\n  type: 'SymbolNode'\n}\n```\n\n**Symbol** ([**Literal**][dfn-literal]) represents typographical devices\ndifferent from characters which represent sounds (like letters and numerals),\nwhite space,\nor punctuation.\n\n**Symbol** can be used in [**sentence**][dfn-sentence] or [**word**][dfn-word]\nnodes.\n\n### `Text`\n\n```idl\ninterface Text \u003c: Literal {\n  type: 'TextNode'\n}\n```\n\n**Text** ([**Literal**][dfn-literal]) represents actual content in nlcst\ndocuments: one or more characters.\n\n**Text** can be used in [**word**][dfn-word] nodes.\n\n### `WhiteSpace`\n\n```idl\ninterface WhiteSpace \u003c: Literal {\n  type: 'WhiteSpaceNode'\n}\n```\n\n**WhiteSpace** ([**Literal**][dfn-literal]) represents typographical devices\ndevoid of content,\nseparating other units.\n\n**WhiteSpace** can be used in [**root**][dfn-root],\n[**paragraph**][dfn-paragraph],\nor [**sentence**][dfn-sentence] nodes.\n\n### `Word`\n\n```idl\ninterface Word \u003c: Parent {\n  type: 'WordNode'\n  children: [Punctuation | Source | Symbol | Text]\n}\n```\n\n**Word** ([**Parent**][dfn-parent]) represents the smallest element that may be\nuttered in isolation with semantic or pragmatic content.\n\n**Word** can be used in a [**sentence**][dfn-sentence] node.\nIt can contain [**text**][dfn-text],\n[**symbol**][dfn-symbol],\n[**punctuation**][dfn-punctuation],\nand [**source**][dfn-source] nodes.\n\n## Glossary\n\nSee the [unist glossary][glossary].\n\n## List of utilities\n\nSee the [unist list of utilities][utilities] for more utilities.\n\n* 
[`nlcst-affix-emoticon-modifier`](https://github.com/syntax-tree/nlcst-affix-emoticon-modifier)\n  — merge affix emoticons into the previous sentence\n* [`nlcst-emoji-modifier`](https://github.com/syntax-tree/nlcst-emoji-modifier)\n  — support emoji\n* [`nlcst-emoticon-modifier`](https://github.com/syntax-tree/nlcst-emoticon-modifier)\n  — support emoticons\n* [`nlcst-is-literal`](https://github.com/syntax-tree/nlcst-is-literal)\n  — check whether a node is meant literally\n* [`nlcst-normalize`](https://github.com/syntax-tree/nlcst-normalize)\n  — normalize a word for easier comparison\n* [`nlcst-search`](https://github.com/syntax-tree/nlcst-search)\n  — search for patterns\n* [`nlcst-to-string`](https://github.com/syntax-tree/nlcst-to-string)\n  — serialize a node\n* [`nlcst-test`](https://github.com/syntax-tree/nlcst-test)\n  — validate a node\n* [`mdast-util-to-nlcst`](https://github.com/syntax-tree/mdast-util-to-nlcst)\n  — transform mdast to nlcst\n* [`hast-util-to-nlcst`](https://github.com/syntax-tree/hast-util-to-nlcst)\n  — transform hast to nlcst\n\n## Related\n\n* [mdast](https://github.com/syntax-tree/mdast)\n  — Markdown Abstract Syntax Tree format\n* [hast](https://github.com/syntax-tree/hast)\n  — Hypertext Abstract Syntax Tree format\n* [xast](https://github.com/syntax-tree/xast)\n  — Extensible Abstract Syntax Tree format\n\n## References\n\n* **unist**:\n  [Universal Syntax Tree][unist].\n  T. Wormer; et al.\n* **JavaScript**:\n  [ECMAScript Language Specification][javascript].\n  Ecma International.\n* **Web IDL**:\n  [Web IDL][webidl],\n  C. 
McCormack.\n  W3C.\n\n## Contribute\n\nSee [`contributing.md`][contributing] in [`syntax-tree/.github`][health] for\nways to get started.\nSee [`support.md`][support] for ways to get help.\nIdeas for new utilities and tools can be posted in [`syntax-tree/ideas`][ideas].\n\nA curated list of awesome syntax-tree,\nunist,\nmdast,\nhast,\nxast,\nand nlcst resources can be found in [awesome syntax-tree][awesome].\n\nThis project has a [code of conduct][coc].\nBy interacting with this repository,\norganization,\nor community you agree to abide by its terms.\n\n## Acknowledgments\n\nThe initial release of this project was authored by\n[**@wooorm**](https://github.com/wooorm).\n\nThanks to\n[**@nwtn**](https://github.com/nwtn),\n[**@tmcw**](https://github.com/tmcw),\n[**@muraken720**](https://github.com/muraken720),\nand [**@dozoisch**](https://github.com/dozoisch)\nfor contributing to nlcst and related projects!\n\n## License\n\n[CC-BY-4.0][license] © [Titus Wormer][author]\n\n\u003c!--Definitions--\u003e\n\n[license]: https://creativecommons.org/licenses/by/4.0/\n\n[author]: https://wooorm.com\n\n[logo]: https://raw.githubusercontent.com/syntax-tree/nlcst/a89561d/logo.svg?sanitize=true\n\n[health]: https://github.com/syntax-tree/.github\n\n[contributing]: https://github.com/syntax-tree/.github/blob/HEAD/contributing.md\n\n[support]: https://github.com/syntax-tree/.github/blob/HEAD/support.md\n\n[coc]: https://github.com/syntax-tree/.github/blob/HEAD/code-of-conduct.md\n\n[awesome]: https://github.com/syntax-tree/awesome-syntax-tree\n\n[ideas]: https://github.com/syntax-tree/ideas\n\n[releases]: https://github.com/syntax-tree/nlcst/releases\n\n[latest]: https://github.com/syntax-tree/nlcst/releases/tag/1.0.2\n\n[list-of-utilities]: #list-of-utilities\n\n[dfn-unist-parent]: https://github.com/syntax-tree/unist#parent\n\n[dfn-unist-literal]: https://github.com/syntax-tree/unist#literal\n\n[dfn-parent]: #parent\n\n[dfn-literal]: #literal\n\n[dfn-root]: 
#root\n\n[dfn-paragraph]: #paragraph\n\n[dfn-sentence]: #sentence\n\n[dfn-word]: #word\n\n[dfn-symbol]: #symbol\n\n[dfn-punctuation]: #punctuation\n\n[dfn-whitespace]: #whitespace\n\n[dfn-text]: #text\n\n[dfn-source]: #source\n\n[term-tree]: https://github.com/syntax-tree/unist#tree\n\n[term-child]: https://github.com/syntax-tree/unist#child\n\n[term-root]: https://github.com/syntax-tree/unist#root\n\n[unist]: https://github.com/syntax-tree/unist\n\n[syntax-tree]: https://github.com/syntax-tree/unist#syntax-tree\n\n[javascript]: https://www.ecma-international.org/ecma-262/9.0/index.html\n\n[webidl]: https://heycam.github.io/webidl/\n\n[glossary]: https://github.com/syntax-tree/unist#glossary\n\n[utilities]: https://github.com/syntax-tree/unist#list-of-utilities\n\n[unified]: https://github.com/unifiedjs/unified\n\n[retext]: https://github.com/retextjs/retext\n\n[textom]: https://github.com/wooorm/textom\n","funding_links":["https://github.com/sponsors/unifiedjs","https://opencollective.com/unified","https://thanks.dev/u/gh/syntax-tree"],"categories":["Others","nlcst utilities","Official"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyntax-tree%2Fnlcst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyntax-tree%2Fnlcst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyntax-tree%2Fnlcst/lists"}