https://github.com/ndonfris/tree-sitter-types-builder



# tree-sitter-types-builder

This tool is a utility for developers that generates every possible `.type` of
`SyntaxNode` found in a `tree-sitter` grammar, as [string literals](https://www.typescriptlang.org/docs/handbook/2/template-literal-types.html). Even in small
languages, the number of `SyntaxNode` types can be quite large __(well into the hundreds of
definitions)__. While many of the definitions are redundant _(after analysis provided by tree-sitter)_,
it is much easier to remove unneeded types than to work out which types will be needed.

- [Usage/Installation](#usageinstallation)
- [Example (TS) | Introduction](#example-ts--introduction)
- [How do the generated types help? (ADVANCED COMPARISON)](#how-do-the-generated-types-help-advacned-comparison)
- [Auto-completion/Intellisense/GoTo-References](#auto-completionintellisensegoto-references)
- [Extensibility & Ambiguity](#extensiblilty--ambiguity)
- [Easy Testability & Maintainability](#easy-testability--maintainability)
- [Consistency](#consistency)
- [Further Reading](#further-reading)
- [Conclusion](#conclusion)
- [License](#license)

## Usage/Installation

Global Installation

1. Install the package globally __(using your preferred package manager)__

```bash
# npm installation
npm i -g tree-sitter-types-builder

# yarn installation
yarn global add tree-sitter-types-builder

# pnpm installation
pnpm add --global tree-sitter-types-builder
```

2. Use the `tree-sitter-types-builder` command where needed

```bash
# in some project with a wasm file
tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts
```

Local Project Installation

> __Note:__ requires [web-tree-sitter](https://www.npmjs.com/package/web-tree-sitter) and [tree-sitter-cli](https://www.npmjs.com/package/tree-sitter-cli).

1. Install the package inside your project

```bash
pnpm install --save-dev tree-sitter-types-builder
```

2. Build a wasm file

```bash
# for example, to build a wasm file for the bash language
npx tree-sitter build-wasm ./tree-sitter-bash
```

> This will create a `tree-sitter-bash.wasm` file in the `tree-sitter-bash` directory

```bash
# for newer tree-sitter-cli versions
npx tree-sitter build --wasm ./tree-sitter-bash
```

3. Run the command for your language

```bash
npx tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts
```

> Edit the generated types to fit your needs.

## Example (TS) | Introduction

The [recommended](#example-ts--introduction) :heavy_check_mark: example below assumes that you have already compiled a wasm file for your language
and have generated the types. It also assumes that you are using `web-tree-sitter`
to parse your code. If you have completed these steps, you can now use the generated
types to build _any_ features for your language.

```typescript
import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types'; // generated by tree-sitter-types-builder

// 1.) initialize parser for a language
// 2.) parse some code to get the Tree of SyntaxNode's from web-tree-sitter
// 3.) build features, by selecting nodes of interest using the generated LangNodeType

function findChildOfType(rootNode: SyntaxNode, type: LangNodeType): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// now you get auto-completion for LangNodeType.FunctionDeclaration
// and avoid passing incorrect strings to the function
// (rootNode is the root node of the tree parsed in step 2)
findChildOfType(rootNode, LangNodeType.FunctionDeclaration);
```

> This process automates potentially error-prone manual work and makes the code more robust.
> It also makes the code more readable and easier to maintain. A tree-sitter-{lang} maintainer
> can now update their grammar without breaking the code of their users.


The not-recommended :x: way of using tree-sitter, without the generated types, is shown below:

> A brief outline showing how quickly the tree-sitter API requires you to know the exact context and naming of node types.

```typescript
import { SyntaxNode } from 'web-tree-sitter';

function findChildOfType(rootNode: SyntaxNode, type: string): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// now, the user must pass the exact string into the findChildOfType function
// and will not be able to get auto-completion for the type of node they are looking for.
findChildOfType(rootNode, 'function_declaration');

// Furthermore, consider implementing features that require multiple types of
// nodes to be selected. The context of the code will be much harder to understand
// and properly deduce.
function findUnreachableCode(rootNode: SyntaxNode): SyntaxNode | null {
  const functionNode = findChildOfType(rootNode, 'function');
  if (!functionNode) return null; // guard: nothing matched 'function'
  const blockNode = findChildOfType(functionNode, 'block');
  if (!blockNode) return null; // guard: nothing matched 'block'
  const returnNode = findChildOfType(blockNode, 'return_statement');
  // check for returnNode's to have siblings after them, within the current
  // block scope
  return returnNode;
}
```

> Did you catch the potential bug in the above code? Depending on the language,
> a function might not have anything other than the identifier for the function
> name (common in shell languages). The `block` node could also
> just be the keyword of the block scope.
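
The sibling check described in the comments of `findUnreachableCode` could be sketched roughly as follows. This is a minimal illustration, not the tool's output: `NodeLike` is a stand-in for `SyntaxNode`, and `DemoNodeType` with its node names is hypothetical, so a real grammar may name these nodes differently.

```typescript
// Minimal stand-in for web-tree-sitter's SyntaxNode (assumption, for illustration only).
interface NodeLike {
  type: string;
  children: NodeLike[];
}

// Hypothetical node type names; a real grammar may use different ones.
type DemoNodeType =
  | 'function_declaration'
  | 'block'
  | 'return_statement'
  | 'expression_statement';

// Collect every node of the given type, visiting parents before children.
function findAllOfType(root: NodeLike, type: DemoNodeType, found: NodeLike[] = []): NodeLike[] {
  if (root.type === type) found.push(root);
  for (const child of root.children) findAllOfType(child, type, found);
  return found;
}

// For each block, return the siblings that follow its first return statement.
function findUnreachableNodes(root: NodeLike): NodeLike[] {
  const unreachable: NodeLike[] = [];
  for (const block of findAllOfType(root, 'block')) {
    const returnIndex = block.children.map((child) => child.type).indexOf('return_statement');
    if (returnIndex !== -1) unreachable.push(...block.children.slice(returnIndex + 1));
  }
  return unreachable;
}

// Small hand-built tree: a function whose block has one statement after its return.
const leaf = (type: DemoNodeType): NodeLike => ({ type, children: [] });
const demoTree: NodeLike = {
  type: 'function_declaration',
  children: [
    {
      type: 'block',
      children: [leaf('expression_statement'), leaf('return_statement'), leaf('expression_statement')],
    },
  ],
};

const unreachableNodes = findUnreachableNodes(demoTree);
console.log(unreachableNodes.map((node) => node.type));
```

In a real tree, a block's children usually also include punctuation or keyword tokens, so a production version would likely filter to named nodes first.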


### How do the generated types help? (ADVACNED COMPARISON)

#### Auto-completion/Intellisense/GoTo-References

Using this package will give you these language features project wide. This is useful for adding
other features __later__, especially if they require similar implementations/node-types
to your currently completed features. You can use a goto-references request on a `LangNodeType` to see all
the places where that specific node has been used.

- [Wide Type Definition](https://github.com/tree-sitter/tree-sitter/blob/0fc92c9a7d0ddb417bd74bf7f533bb8f3042dbe3/lib/binding_web/tree-sitter-web.d.ts#L61) in tree-sitter API
- Generated type definitions provide a _string literal_ for each type of node

#### Extensiblilty & Ambiguity

In terms of context, you can also extend the types generated by the tool with
additional type-narrowing. For example, restricting a search to a specific set of nodes
is much clearer when that set is defined as a single new type definition.

>
> :heavy_check_mark: Generated Usage :heavy_check_mark:
>
> ```typescript
> export type BlockScopeNode = LangNodeType.Block | LangNodeType.FunctionDeclaration | LangNodeType.IfStatement | LangNodeType.WhileStatement;
> ```
>
>
>
> :x: Non-Generated Usage :x:
>
> ```typescript
> // no auto-completion for the types of nodes that can be used
> // no reference to where the type is used (for block_statement, function_declaration, if_statement, while_statement)
> export type BlockScopeNode = 'block' | 'function_declaration' | 'if_statement' | 'while_statement'
>
> // if another type-narrowing intends to use an overlapping type, the tree-sitter
> // API can easily hide the use of the wrong string for that type
> export type StatementScope = 'block_statement' | 'if_statement' | 'while_statement' | 'for_statement'
> ```
>
>
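
A narrowed subset like this can also be paired with a runtime type guard, so that a plain string such as `SyntaxNode.type` is checked once and then treated as the narrowed type. The sketch below is an assumption-laden illustration: `GrammarNodeType`, `BLOCK_SCOPE_TYPES`, and `isBlockScopeType` are hypothetical names, and a plain string union stands in for the generated definitions.

```typescript
// Hypothetical node type names standing in for the generated definitions.
type GrammarNodeType =
  | 'block'
  | 'function_declaration'
  | 'if_statement'
  | 'while_statement'
  | 'identifier';

// The narrowed subset, listed once as data so it exists at runtime too.
const BLOCK_SCOPE_TYPES = ['block', 'function_declaration', 'if_statement', 'while_statement'] as const;
type BlockScopeType = (typeof BLOCK_SCOPE_TYPES)[number];

// Compile-time check: this assignment fails if the list contains an unknown
// name (for example 'block_statement' in this hypothetical grammar).
const knownTypesCheck: readonly GrammarNodeType[] = BLOCK_SCOPE_TYPES;

// Type guard: narrows a plain string to BlockScopeType.
function isBlockScopeType(type: string): type is BlockScopeType {
  return BLOCK_SCOPE_TYPES.some((candidate) => candidate === type);
}

const scopeCheck = isBlockScopeType('if_statement');
const identifierCheck = isBlockScopeType('identifier');
console.log(scopeCheck, identifierCheck);
```

Keeping the list and the type derived from the same constant means the two cannot drift apart when the subset changes.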

#### Easy Testability & Maintainability

Allows for the intended types of nodes to be selected, and __tested__, before
new maintainers approach the code. Consider the following example,
where you are comparing two nodes that might correspond to similar string values
(this could be different forms of `whitespaces`, `comments`, or even something like `block` vs `block-scope`).

>
> Example Test File
>
> ```typescript
> import Parser, { SyntaxNode } from 'web-tree-sitter';
> import { LangNodeType } from './types'; // generated by tree-sitter-types-builder
>
> function nodeMatchesType(node: SyntaxNode, type: LangNodeType): boolean {
>   return node.type === type;
> }
>
> const nodeA = LangNodeType.block;
> const nodeB = LangNodeType.blockScope;
>
> // collects rootNode followed by all of its descendants
> function getInOrderNodes(rootNode: SyntaxNode, collectedNodes: SyntaxNode[] = []): SyntaxNode[] {
>   collectedNodes.push(rootNode);
>   for (const child of rootNode.children) {
>     if (child) getInOrderNodes(child, collectedNodes);
>   }
>   return collectedNodes;
> }
>
> // rootNode is assumed to be the root node of an already parsed tree
> for (const node of getInOrderNodes(rootNode)) {
>   if (nodeMatchesType(node, nodeA)) {
>     // do something with nodeA
>   } else if (nodeMatchesType(node, nodeB)) {
>     // do something with nodeB
>   }
> }
>
> // can also use the namespace getKeys() function to iterate over all the types
> LangNodeType.getKeys().forEach((key) => {
>   const type = LangNodeType[key]; // a type string, not a SyntaxNode
>   if (type === nodeA) {
>     // do something with nodeA
>   } else if (type === nodeB) {
>     // do something with nodeB
>   }
> });
> ```
>
>

**Maintainability** is the core reason this tool was created.
In a project where I used `tree-sitter` to parse a language and did not
separately define the types of nodes, the tree-sitter-wasm API was not
separated from the rest of the code, and the resulting complexity was a major issue. Refactoring a
large project without the `SyntaxNode` types statically defined becomes increasingly
difficult as the project grows.

#### Consistency

The generated file can be used to check for equivalent type conversions across different
APIs. This is an important feature for projects that might grow very large.
Keeping the relevant types in a location that can be easily navigated to is
a good practice for any project.

## Further Reading

The syntax generated by this tool is based on the type definitions in the
[language server protocol](https://github.com/microsoft/vscode-languageserver-node) and exploits the type system's
ability to extend types with additional properties/functions _(through the use
of a namespace)_. This allows the type definitions to be more expressive by
allowing for them to be iterated over, while keeping their ability to be statically referenced.
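
As a rough illustration of that pattern, the sketch below merges a string-literal type alias with a namespace of the same name. It is hand-written for this explanation, and the tool's actual output may differ: `SketchNodeType`, its three members, and the `Key` type are hypothetical, and only `getKeys()` mirrors a name used elsewhere in this README.

```typescript
// Type side: the string literals a node's `.type` may take (hypothetical names).
type SketchNodeType = 'block' | 'function_declaration' | 'return_statement';

// Value side: a namespace with the same name, holding one constant per literal.
namespace SketchNodeType {
  export const Block = 'block';
  export const FunctionDeclaration = 'function_declaration';
  export const ReturnStatement = 'return_statement';

  export type Key = 'Block' | 'FunctionDeclaration' | 'ReturnStatement';

  // Lists the constant names so the definitions can be iterated at runtime.
  export function getKeys(): Key[] {
    return ['Block', 'FunctionDeclaration', 'ReturnStatement'];
  }
}

// Statically referenced: a misspelled member name fails to compile.
const wanted: SketchNodeType = SketchNodeType.FunctionDeclaration;

// Iterated: map every key to its string literal value.
const allTypes: SketchNodeType[] = SketchNodeType.getKeys().map((key) => SketchNodeType[key]);
console.log(wanted, allTypes);
```

Because the alias lives in the type space and the namespace in the value space, the same name can be used both in annotations and in expressions.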

The specific type definitions use a string literal to represent the type of `SyntaxNode` that
is being referenced. Not only does this help abstract the `tree-sitter` API
from the user, but it also allows for the type definitions to be more expressive
by displaying all type definitions in a single place.

This is especially useful for developers who are just beginning a project that uses
the `tree-sitter` API. They can see all the types that are available to them
and determine which types they need to use. Properly defining the
set of `SyntaxNode` types relevant to the features of the project is a much
clearer method than relying on the very wide type definition that a
tree-sitter parser exposes.

## Conclusion

This project aims to provide a clear and testable method for building a feature-rich
set of language features from a tree-sitter grammar. It can also be helpful
to keep this tool on hand to check for name changes across releases of a language's grammar.
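
One way to check for such name changes is to compare the node type names generated from two releases of a grammar. The sketch below assumes the names have already been collected into plain string arrays; the sample names and the `diffNodeTypes` helper are hypothetical and not part of the tool.

```typescript
// Hypothetical node type names from two releases of the same grammar.
const previousTypes = ['block', 'function_declaration', 'return_statement'];
const currentTypes = ['block', 'function_definition', 'return_statement'];

interface NodeTypeDiff {
  removed: string[]; // present before, missing now
  added: string[]; // missing before, present now
}

// Compare two lists of node type names and report what changed.
function diffNodeTypes(previous: string[], current: string[]): NodeTypeDiff {
  return {
    removed: previous.filter((name) => current.indexOf(name) === -1),
    added: current.filter((name) => previous.indexOf(name) === -1),
  };
}

const typeDiff = diffNodeTypes(previousTypes, currentTypes);
console.log(typeDiff);
```

A rename shows up as one removed name and one added name, which is a useful prompt to review every place the old name was referenced.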

## License

MIT