Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/denosaurs/tokenizer
⚙️ A simple tokenizer for deno
- Host: GitHub
- URL: https://github.com/denosaurs/tokenizer
- Owner: denosaurs
- License: mit
- Created: 2019-09-07T17:13:37.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-05-05T12:37:32.000Z (over 2 years ago)
- Last Synced: 2024-04-13T21:20:16.082Z (9 months ago)
- Topics: deno, lexer, tokenizer
- Language: TypeScript
- Size: 38.1 KB
- Stars: 15
- Watchers: 3
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-deno - deno_tokenizer - A simple tokenizer for Deno. (Uncategorized / Uncategorized)
README
# Tokenizer [![Badge License]][License]
*A simple **Deno** library*
---
[![Badge Status]][Actions]
---
## Examples
```js
import { Tokenizer } from 'https://deno.land/x/tokenizer/mod.ts';

const input = 'abc 123 HELLO [a cool](link)';

// Each rule pairs a token type with a string or RegExp pattern.
const rules = [{
    type: 'HELLO',
    pattern: 'HELLO'
},{
    type: 'WORD',
    pattern: /[a-zA-Z]+/
},{
    type: 'DIGITS',
    pattern: /\d+/,
    // Transform the matched text into a number.
    value: m => Number.parseInt(m.match, 10)
},{
    type: 'LINK',
    pattern: /\[([^\[]+)\]\(([^\)]+)\)/
},{
    type: 'SPACE',
    pattern: / /,
    ignore: true // Or leave type blank and remove "ignore: true"
}];

const tokenizer = new Tokenizer(input, rules);
```
### Option A
```js
console.log(...tokenizer);
```

```
{ type: "WORD", match: "abc", value: "abc", groups: [], position: { start: 0, end: 3 } },
{ type: "DIGITS", match: "123", value: 123, groups: [], position: { start: 4, end: 7 } },
{ type: "HELLO", match: "HELLO", value: "HELLO", groups: [], position: { start: 8, end: 13 } },
{ type: "LINK", match: "[a cool](link)", value: "[a cool](link)", groups: [ "a cool", "link" ], position: { start: 14, end: 28 } }
```
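
The `groups` and `position` fields of the `LINK` token above line up with what a plain regular-expression match reports. The following is a minimal sketch using only the standard `RegExp` API; it is not the library's implementation, and the `linkToken` object is a hypothetical stand-in that mirrors only some fields of the output shown above.

```ts
const input = "abc 123 HELLO [a cool](link)";

// Same pattern as the LINK rule: two capture groups, one for the
// text in brackets and one for the target in parentheses.
const linkPattern = /\[([^\[]+)\]\(([^\)]+)\)/;

const m = linkPattern.exec(input);

// Hypothetical token-like object built by hand from the match.
const linkToken = m === null ? null : {
  match: m[0],
  // Capture groups start at index 1 of the match array.
  groups: m.slice(1),
  // `end` is exclusive: index of the first character after the match.
  position: { start: m.index, end: m.index + m[0].length },
};

console.log(linkToken);
```

Under these assumptions, `groups` holds `"a cool"` and `"link"`, and the position runs from offset 14 to offset 28 of the input string.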
### Option B
```js
while(!tokenizer.done)
console.log(tokenizer.next().value);
```

```
{ type: "WORD", match: "abc", value: "abc", groups: [], position: { start: 0, end: 3 } }
```
```
{ type: "DIGITS", match: "123", value: 123, groups: [], position: { start: 4, end: 7 } }
```
```
{ type: "HELLO", match: "HELLO", value: "HELLO", groups: [], position: { start: 8, end: 13 } }
```
```
{ type: "LINK", match: "[a cool](link)", value: "[a cool](link)", groups: [ "a cool", "link" ], position: { start: 14, end: 28 } }
```
### Option C
```js
// Add a parameter to the tokenize method to override the source string
console.log(tokenizer.tokenize());
```

```
[{ type: "WORD", match: "abc", value: "abc", groups: [], position: { start: 0, end: 3 } },
{ type: "DIGITS", match: "123", value: 123, groups: [], position: { start: 4, end: 7 } },
{ type: "HELLO", match: "HELLO", value: "HELLO", groups: [], position: { start: 8, end: 13 } },
{ type: "LINK", match: "[a cool](link)", value: "[a cool](link)", groups: [ "a cool", "link" ], position: { start: 14, end: 28 } } ]
```

## TODO
- [x] Custom patterns using functions
- [x] Add position information to Token
- [x] Array patterns (Multiple patterns for the same rule)
- [x] Documentation
- [x] Better error handling
- [x] Group matching
- [x] Value transform
- [ ] More and better tests for everything
- [ ] Examples
- [ ] Line and column information? Or just a helper function to get line and column from index
- [ ] BNF / EBNF ?
- [ ] Generate a tokenizer

[Badge License]: https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge
[Badge Status]: https://github.com/eliassjogreen/deno_tokenizer/workflows/Tests/badge.svg

[Actions]: https://github.com/eliassjogreen/deno_tokenizer/actions
[License]: LICENSE
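
The TODO list mentions a helper function to get line and column from an index. One possible shape for such a helper is sketched below. The name `lineColumnAt`, the `LineColumn` type, and the 1-based numbering are all assumptions made for illustration; nothing here is part of the library.

```ts
// Hypothetical helper, not part of the tokenizer library.
interface LineColumn {
  line: number;   // 1-based line number
  column: number; // 1-based column number
}

// Converts a character offset (such as a token's `position.start`)
// into a line and column by counting newlines before the offset.
function lineColumnAt(source: string, index: number): LineColumn {
  if (index < 0 || index > source.length) {
    throw new RangeError(`index ${index} is outside the source string`);
  }
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line
  for (let i = 0; i < index; i++) {
    if (source[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: index - lineStart + 1 };
}

const source = "abc 123\nHELLO";
console.log(lineColumnAt(source, 4)); // offset of "123"
console.log(lineColumnAt(source, 8)); // offset of "HELLO"
```

With this sketch, offset 4 maps to line 1, column 5, and offset 8 maps to line 2, column 1. Scanning from the start on every call is linear in the offset; a precomputed table of line-start offsets would be a reasonable alternative for large inputs.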