https://github.com/ChimeHQ/SwiftTreeSitter

Swift API for the tree-sitter incremental parsing system

[![Build Status][build status badge]][build status]
[![Platforms][platforms badge]][platforms]
[![Documentation][documentation badge]][documentation]
[![Discord][discord badge]][discord]

# SwiftTreeSitter

Swift API for the [tree-sitter](https://tree-sitter.github.io/) incremental parsing system.

- Close to full coverage of the C API
- Swift/Foundation types where possible
- Standard query result mapping for highlights and injections
- Query predicate/directive support via `ResolvingQueryMatchSequence`
- Nested language support
- Swift concurrency support where possible

# Structure

This project is actually split into two parts: `SwiftTreeSitter` and `SwiftTreeSitterLayer`.

The SwiftTreeSitter target is a close match to the C runtime API. It adds only a few additional types to help support querying. It is fairly low-level, and using it in a real project takes significant work.

SwiftTreeSitterLayer is an abstraction built on top of SwiftTreeSitter. It supports documents with nested languages and transparent querying across those nestings. It also supports asynchronous language resolution. While still low-level, SwiftTreeSitterLayer is easier to work with while also supporting more features.

And yet there's more! If you are looking for a higher-level system for syntax highlighting and other syntactic operations, you might want to have a look at [Neon](https://github.com/ChimeHQ/Neon). It is much easier to integrate with a text system, and has lots of additional performance-related features.

## Integration

```swift
dependencies: [
    // Add a version requirement (for example `from:` or `branch:`) that suits your project.
    .package(url: "https://github.com/ChimeHQ/SwiftTreeSitter")
],
targets: [
    .target(
        name: "MySwiftTreeSitterTarget",
        dependencies: ["SwiftTreeSitter"]
    ),
    .target(
        name: "MySwiftTreeSitterLayerTarget",
        dependencies: [
            .product(name: "SwiftTreeSitterLayer", package: "SwiftTreeSitter"),
        ]
    ),
]
```

## Range Translation

The tree-sitter runtime operates on raw string data. This means it works with bytes, and is string-encoding-sensitive. Swift's `String` type is an abstraction on top of raw data and cannot be used directly. To work with both, you have to be aware of the types of indexes you are using and how string data is translated back and forth.

To help, SwiftTreeSitter supports the base tree-sitter encoding facilities. You can control this via `Parser.parse(tree:encoding:readBlock:)`. By default, however, this assumes UTF-16-encoded data. This is done to offer direct compatibility with Foundation strings and `NSRange`, which both use UTF-16.

Also, to help with all the back and forth, SwiftTreeSitter includes some accessors that are `NSRange`-based, as well as extensions on `NSRange`. These **must** be used when working with the native tree-sitter types unless you take care to handle encoding yourself.

To keep things clear, consistent naming and types are used. `Node.byteRange` returns a `Range<UInt32>`, which is an encoding-dependent value. `Node.range` is an `NSRange`, which is defined to use UTF-16.

```swift
let node = tree.rootNode!

// this is encoding-dependent and cannot be used with your storage
node.byteRange

// this is a UTF-16-assumed translation of the byte ranges
node.range

// converting UTF-16-based changed ranges on re-parse
let ranges: [NSRange] = newTree.changedRanges(from: oldTree)
    .map { $0.bytes.range }
```
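
To make the encoding dependence concrete, here is a small Foundation-only sketch. It does not use SwiftTreeSitter at all, and the sample string is made up for illustration. It locates the same substring three ways: as an `NSRange` (UTF-16 code units), as UTF-8 byte offsets, and as UTF-16 byte offsets, which are assumed here to be exactly twice the `NSRange` values.

```swift
import Foundation

// Illustrative input containing a two-byte (UTF-8) character and an emoji.
let text = "let s = \"h\u{E9}llo \u{1F44B}\" // wave"

// Locate a substring using Swift's native String indexes.
let target = text.range(of: "// wave")!

// NSRange counts UTF-16 code units, which is what Foundation text APIs expect.
let nsRange = NSRange(target, in: text)

// UTF-8 byte offsets drift away from UTF-16 offsets once non-ASCII text appears.
let utf8Start = text.utf8.distance(from: text.utf8.startIndex, to: target.lowerBound)
let utf8Length = text.utf8.distance(from: target.lowerBound, to: target.upperBound)

// Each UTF-16 code unit occupies two bytes.
let utf16ByteStart = nsRange.location * 2
let utf16ByteLength = nsRange.length * 2

print("NSRange (UTF-16 units): \(nsRange.location), \(nsRange.length)")
print("UTF-8 bytes:            \(utf8Start), \(utf8Length)")
print("UTF-16 bytes:           \(utf16ByteStart), \(utf16ByteLength)")

// Converting back from NSRange to String indexes is fallible, so it returns an optional.
let roundTrip = Range(nsRange, in: text).map { String(text[$0]) }
print(roundTrip ?? "invalid range")
```

All three results describe the same text, which is why a byte range from one encoding cannot be applied to storage indexed in another.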

## Query Conflicts

SwiftTreeSitter does its best to resolve poor/incorrect query constructs, which are surprisingly common.

When using injections, child query ranges are automatically expanded using parent matches. This handles cases where a parent has queries that overlap with children in conflicting ways. Without expansion, it is possible to construct queries that fall within children ranges but produce matches only in the parent.

All matches are sorted by:

- depth
- location in content
- specificity of match label (more components => more specific)
- occurrence in the query source

Even with these rules, it is possible to write queries that result in "incorrect" behavior because they are either ambiguous or undefined in the query definition.
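
The ordering above can be sketched as a multi-key sort. This is only an illustration: `SketchMatch` and `sortedForResolution` are hypothetical names and not part of SwiftTreeSitter, and the direction of each comparison (ascending for every key) is an assumption, since the list only names the keys.

```swift
// Hypothetical stand-in for a resolved query match; not a SwiftTreeSitter type.
struct SketchMatch {
    let depth: Int         // nesting depth of the language layer
    let location: Int      // start offset in the content
    let label: String      // match label, e.g. "keyword.function"
    let patternIndex: Int  // position of the pattern in the query source

    // Specificity approximated as the number of dot-separated components.
    var specificity: Int { label.split(separator: ".").count }
}

// Sorts by depth, then location, then specificity, then occurrence.
// Ascending order for every key is an assumption made for this sketch.
func sortedForResolution(_ matches: [SketchMatch]) -> [SketchMatch] {
    matches.sorted { a, b in
        (a.depth, a.location, a.specificity, a.patternIndex)
            < (b.depth, b.location, b.specificity, b.patternIndex)
    }
}

let sample = [
    SketchMatch(depth: 1, location: 4, label: "keyword", patternIndex: 0),
    SketchMatch(depth: 0, location: 10, label: "string", patternIndex: 3),
    SketchMatch(depth: 0, location: 2, label: "keyword.function", patternIndex: 5),
    SketchMatch(depth: 0, location: 2, label: "keyword", patternIndex: 7),
]

let orderedLabels = sortedForResolution(sample).map(\.label)
print(orderedLabels)
```

Two matches that tie on every key remain indistinguishable to a sort like this, which is one way the ambiguity described above can show up.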

## Highlighting

A very common use of tree-sitter is syntax highlighting. It is possible to use this library directly, especially if your source text does not change. Here's a little example that sets everything up with an SPM-bundled language.

First, check out how it works with SwiftTreeSitterLayer. It's complex, but does a lot for you.

````swift
// LanguageConfiguration takes care of finding and loading queries in SPM-created bundles.
let markdownConfig = try LanguageConfiguration(tree_sitter_markdown(), name: "Markdown")
let markdownInlineConfig = try LanguageConfiguration(
    tree_sitter_markdown_inline(),
    name: "MarkdownInline",
    bundleName: "TreeSitterMarkdown_TreeSitterMarkdownInline"
)
let swiftConfig = try LanguageConfiguration(tree_sitter_swift(), name: "Swift")

// Unfortunately, injections do not use standardized language names, and can even be content-dependent. Your system must do this mapping.
let config = LanguageLayer.Configuration(
    languageProvider: { name in
        switch name {
        case "markdown":
            return markdownConfig
        case "markdown_inline":
            return markdownInlineConfig
        case "swift":
            return swiftConfig
        default:
            return nil
        }
    }
)

let rootLayer = try LanguageLayer(languageConfig: markdownConfig, configuration: config)

let source = """
# this is markdown

```swift
func main(a: Int) {
}
```

## also markdown

```swift
let value = "abc"
```
"""

rootLayer.replaceContent(with: source)

let fullRange = NSRange(source.startIndex..