Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/savetheclocktower/tree-sitter-hyperlink
Tree-sitter grammar for detecting URLs in arbitrary text
- Host: GitHub
- URL: https://github.com/savetheclocktower/tree-sitter-hyperlink
- Owner: savetheclocktower
- License: mit
- Created: 2023-03-20T23:25:29.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-21T00:38:39.000Z (2 months ago)
- Last Synced: 2024-10-21T04:00:29.171Z (2 months ago)
- Language: JavaScript
- Size: 49.8 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# tree-sitter-hyperlink
[Tree-sitter](https://github.com/tree-sitter/tree-sitter) grammar for detecting URLs in prose.
Eventually intended to do everything [TextMate’s hyperlink helper bundle](https://github.com/textmate/hyperlink-helper.tmbundle/blob/master/Syntaxes/Hyperlink.tmLanguage) can do. That bundle was converted to Atom as [language-hyperlink](https://github.com/pulsar-edit/language-hyperlink/), but a tree-sitter version is needed for injection into tree-sitter grammars in [Pulsar](https://pulsar-edit.dev/).
## Syntax
Support is currently limited to URLs that
* begin with `http` or `https`,
* have sections of ordinary text with dots/slashes in between, and
* do not end with any of `,."']`, all of which are far more likely to be prose delimiters than part of the URL.

URLs that end in `)` will have that `)` included in the URL _if_ it was preceded by an earlier `(` in the URL; otherwise it’ll be treated as a delimiter like the characters listed above.
Validity of the URL, or of any TLDs, is _far_ beyond the ambitions of this parser.
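
The rules above can be illustrated outside of tree-sitter with a short JavaScript sketch. This is not the grammar's implementation (the grammar is a tree-sitter grammar, not a regex pass). The candidate pattern, the requirement of at least one dot after the scheme, and the extra `;` delimiter are all assumptions made for this illustration.

```javascript
// A minimal sketch of the URL rules described above, written as a plain
// regex-and-trim pass. This is NOT how the tree-sitter grammar works; it only
// illustrates the rules on ordinary strings.

// Characters the README says a URL should not end with, plus ";" (an
// assumption of this sketch, so that CSS like `url(...);` is trimmed too).
const TRAILING_DELIMITERS = new Set([',', '.', '"', "'", ']', ';']);

// Assumption: a candidate starts at "http://" or "https://" and runs up to the
// next whitespace character.
const CANDIDATE = /https?:\/\/\S+/g;

// Strip trailing delimiters one at a time. A trailing ")" is kept only if an
// earlier "(" appears in the URL; otherwise it is treated as a delimiter too.
function trimTrailing(url) {
  let end = url.length;
  while (end > 0) {
    const ch = url[end - 1];
    const isDelimiter = TRAILING_DELIMITERS.has(ch);
    const isStrayParen = ch === ')' && !url.slice(0, end - 1).includes('(');
    if (!isDelimiter && !isStrayParen) break;
    end--;
  }
  return url.slice(0, end);
}

// Assumption: after the scheme there must be at least one "." between two
// non-empty sections, so fragments like "https://bleh" are ignored.
function hasDottedSection(url) {
  return /[^.\/]\.[^.\/]/.test(url.replace(/^https?:\/\//, ''));
}

// Return every URL found in `text`, after trimming trailing delimiters.
function findUrls(text) {
  const urls = [];
  for (const match of text.matchAll(CANDIDATE)) {
    const url = trimTrailing(match[0]);
    if (hasDottedSection(url)) urls.push(url);
  }
  return urls;
}

module.exports = { findUrls, trimTrailing };
```

For example, `findUrls('See https://example.com.')` returns `['https://example.com']`. Because this sketch reads the `)` rule literally (any earlier `(` is enough to keep a trailing `)`), it would keep both parens in `Alison_(song))`; telling those cases apart needs the kind of counting described in the TODO.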
## Examples
URLs that will be correctly identified and highlighted:
```
http://example.com/foo?bar=baz
You might find my web site at https://example.com. The period at the end of the last sentence is not part of the URL.
One example would be [this one](http://example.com/foo?bar=baz), like in Markdown.
Or this one [http://example.com]
This fragment will be ignored http://
As will this one https://bleh
CSS URL without quotes:
@import url(https://www.example.com/style.css);
CSS URL with quotes:
div {
  background-image: url("https://www.example.com/style.gif");
}
also https://example.net.
Good news, Elvis: https://en.wikipedia.org/wiki/Alison_(song) (because the closing parenthesis is mistakenly assumed not to be part of the URL)
http://example.com (it'll recognize both instances)
[A link to the Elvis Costello song in question](https://en.wikipedia.org/wiki/Alison_(song)) will correctly interpret the first ) as being part of the URL, but _not_ the second one.
```
There are surely URLs in the wild that run afoul of these rules; if you find one, feel free to open an issue.
## Tests
To test this properly, I need to inspect _the exact boundaries_ of a match, and [neither](https://tree-sitter.github.io/tree-sitter/creating-parsers#command-test) [kind](https://tree-sitter.github.io/tree-sitter/syntax-highlighting#unit-testing) of tree-sitter test does exactly that.
Until I get around to something more rigorous: if you’re contributing a change and want to guard against regressions, compare the output of `tree-sitter parse examples/example.txt` with the contents of `example_tree.txt`.
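
The comparison described above can be scripted. This is a hypothetical helper, not part of the repository: it assumes the `tree-sitter` CLI is on your `PATH`, that it is run from a directory where both paths resolve, and that `tree-sitter parse` writes only the tree to standard output.

```javascript
// Hypothetical regression check: run `tree-sitter parse examples/example.txt`
// and compare its output with the contents of example_tree.txt.
const { execFileSync } = require('child_process');
const fs = require('fs');

// Compare two texts line by line (ignoring trailing whitespace at the end and
// CRLF vs LF). Returns null if they match, otherwise the first differing line.
function firstDifference(expected, actual) {
  const exp = expected.trimEnd().split(/\r?\n/);
  const act = actual.trimEnd().split(/\r?\n/);
  const n = Math.max(exp.length, act.length);
  for (let i = 0; i < n; i++) {
    if (exp[i] !== act[i]) {
      return { line: i + 1, expected: exp[i], actual: act[i] };
    }
  }
  return null;
}

// Run the parser and report the first mismatch; sets a non-zero exit code on failure.
function main(examplePath = 'examples/example.txt', treePath = 'example_tree.txt') {
  const actual = execFileSync('tree-sitter', ['parse', examplePath], { encoding: 'utf8' });
  const expected = fs.readFileSync(treePath, 'utf8');
  const diff = firstDifference(expected, actual);
  if (diff) {
    console.error(`Mismatch at line ${diff.line}`);
    console.error(`  expected: ${diff.expected}`);
    console.error(`  actual:   ${diff.actual}`);
    process.exitCode = 1;
  } else {
    console.log(`Parse tree matches ${treePath}`);
  }
}

module.exports = { firstDifference, main };
```

Saved as, say, `check-tree.js` (a hypothetical name), it could be run with `node -e "require('./check-tree.js').main()"`. If `tree-sitter parse` itself exits with an error, `execFileSync` throws instead of reporting a diff.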
## TODO
* Fix the issue with parens in URLs, ideally by allowing one `)` for each `(` encountered in a URL, and likewise for `[` and `]`.
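
One way the TODO could work is to count brackets, as the GitHub Flavored Markdown autolink extension does for parentheses: a trailing `)` or `]` is dropped only while the URL contains more closers than openers of that kind. The sketch below is hypothetical and covers only the trimming step; `]` moves out of the plain delimiter set because the balance check now handles it.

```javascript
// Hypothetical sketch of the TODO: allow one trailing ")" for each "(" in the
// URL, and one trailing "]" for each "[". Count-based, applied to the end only.

// Characters that are always treated as prose delimiters at the end of a URL.
const PROSE_DELIMITERS = new Set([',', '.', '"', "'"]);

// Each closing bracket mapped to the opening bracket it balances.
const CLOSER_TO_OPENER = new Map([[')', '('], [']', '[']]);

function countChar(s, ch) {
  let n = 0;
  for (const c of s) if (c === ch) n++;
  return n;
}

function trimTrailingBalanced(url) {
  let end = url.length;
  while (end > 0) {
    const ch = url[end - 1];
    const body = url.slice(0, end);
    if (PROSE_DELIMITERS.has(ch)) {
      end--;
    } else if (
      CLOSER_TO_OPENER.has(ch) &&
      countChar(body, ch) > countChar(body, CLOSER_TO_OPENER.get(ch))
    ) {
      // More closers than openers: this trailing bracket belongs to the prose.
      end--;
    } else {
      break;
    }
  }
  return url.slice(0, end);
}

module.exports = { trimTrailingBalanced };
```

With this approach, the candidate `https://en.wikipedia.org/wiki/Alison_(song))` from the Markdown example trims to `https://en.wikipedia.org/wiki/Alison_(song)`, while `http://example.com/a[1]` keeps its `]`.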