{"id":15554969,"url":"https://github.com/drafakiller/tokenparser-dart","last_synced_at":"2025-08-02T23:08:39.716Z","repository":{"id":64112483,"uuid":"572567454","full_name":"DrafaKiller/TokenParser-dart","owner":"DrafaKiller","description":"An intuitive Token Parser that includes grammar definition, tokenization, parsing, syntax error and debugging. Implementation based on Lexical Analysis for Dart.","archived":false,"fork":false,"pushed_at":"2022-12-19T10:38:54.000Z","size":799,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T03:01:29.003Z","etag":null,"topics":["dart","debugging","grammar","lexer","lexical-analysis","package","parser","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"Dart","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DrafaKiller.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-11-30T14:52:52.000Z","updated_at":"2024-08-26T13:35:50.000Z","dependencies_parsed_at":"2023-01-14T22:45:26.024Z","dependency_job_id":null,"html_url":"https://github.com/DrafaKiller/TokenParser-dart","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrafaKiller%2FTokenParser-dart","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrafaKiller%2FTokenParser-dart/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrafaKiller%2FTokenParser-dart/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrafaKiller%2FTokenP
arser-dart/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DrafaKiller","download_url":"https://codeload.github.com/DrafaKiller/TokenParser-dart/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250504085,"owners_count":21441527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dart","debugging","grammar","lexer","lexical-analysis","package","parser","tokenizer"],"created_at":"2024-10-02T15:05:26.854Z","updated_at":"2025-04-23T19:48:36.264Z","avatar_url":"https://github.com/DrafaKiller.png","language":"Dart","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Pub.dev package](https://img.shields.io/badge/pub.dev-token__parser-blue)](https://pub.dev/packages/token_parser)\n[![GitHub repository](https://img.shields.io/badge/GitHub-TokenParser--dart-blue?logo=github)](https://github.com/DrafaKiller/TokenParser-dart)\n\n# Token Parser\n\nAn intuitive Token Parser that includes syntax/grammar definition, tokenization and parsing.\n\nImplementation based on Lexical Analysis.\u003cbr\u003e\nRead more about it on [Wikipedia](https://en.wikipedia.org/wiki/Lexical_analysis), or with a [Basic Diagram](https://raw.githubusercontent.com/DrafaKiller/TokenParser-dart/main/doc/Lexical%20Analysis.png).\n\n## Features\n\n- Syntax/grammar definition\n- Tokenization\n- Parsing\n- Referencing, and self-reference\n- Lexical Syntax Error\n- Debugging\n\n## Getting Started\n\n```\ndart pub add token_parser\n```\n\nAnd import the package:\n\n```dart\nimport 
'package:token_parser/token_parser.dart';\n```\n\n## Usage\n\nThis package is based on a syntax/grammar definition, which is a list of lexemes that define the grammar. Here is a brief example:\n\n```dart\nfinal letter = '[a-zA-Z]'.regex;\nfinal digit = '[0-9]'.regex;\n\nfinal number = digit.multiple \u0026 ('.' \u0026 digit.multiple).optional;\nfinal identifier = letter \u0026 (letter | digit).multiple.optional;\n\nfinal grammar = Grammar(\n  main: identifier + '=' + number,\n  rules: {\n    'letter': letter,\n    'digit': digit,\n\n    'number': number,\n    'identifier': identifier,\n  }\n);\n\nvoid main() {\n  final result = grammar.parse('myNumber = 12.3');\n\n  print('Identifier: ${ result.get(lexeme: identifier).first.value }');\n  print('Number: ${ result.get(lexeme: number).first.value }');\n  // [Output]\n  // Identifier: myNumber\n  // Number: 12.3\n}\n```\n\n## Lexeme\n\nA **lexeme** is a grammar definition that will be used to tokenize an input.\nIt's a pattern that must be matched, essentially a grammar rule.\n\nThe syntax/grammar definition is done by defining what each token must have, using Lexical Analysis.\n\nThis composition of lexemes is what will define the grammar.\nLexemes can contain other lexemes to form a more complex lexical grammar.\n\n```dart\nfinal abc = 'a' | 'b' | 'c';\nfinal def = 'd' | 'e' | 'f';\n\nfinal expression = abc \u0026 def;\n```\n\nUsing the `\u0026` operator to combine tokens with an \"and\" operation, and the `|` operator to combine tokens with an \"or\" operation.\nWe can define an expression that can take any combination of the lexemes `abc` and `def`.\n\nLexemes may be extended to have slightly different properties.\n\n```dart\nfinal abc = ('a' | 'b' | 'c').multiple;\n\nfinal expression = abc \u0026 'd'.optional;\n```\n\nFor convenience, a lexeme can be defined using a regular expression.\n\nLexeme modification methods available:\n  - `.not`\n  - `.multiple` / `.multipleOrNone`\n  - `.full`\n  - `.optional`\n  - 
`.regex`\n  - `.character`\n  - `.spaced`\n  - `.optionalSpaced`\n  - `.repeat(int min, [int max])`\n  - `.until(Pattern pattern)`\n  - `.pad(Pattern pattern)`\n\n```dart\nfinal digit = '[0-9]'.regex;\nfinal number = digit.multiple \u0026 ('.' \u0026 digit.multiple).optional;\n\nfinal letter = '[a-zA-Z]'.regex;\nfinal word = letter.multiple;\nfinal phrase = word \u0026 (' ' \u0026 word).multiple.optional;\n```\n\n### Operators\n\nPatterns, such as String, RegExp and Lexeme can be combined or modified using operators.\n\nSome operators can only be used to combine patterns, and others can only be used to modify patterns.\nModifying operators must be placed before the target pattern.\n\nThe available operators are:\n\nOperator   | Description             | Action  | Tokenization\n---------- | ----------------------- | ------- | ------------\n`\u0026`        | And                     | Combine | `ab`\n`\\|`       | Or                      | Combine | `a` / `b`\n`+`        | And, spaced             | Combine | `a b`\n`*`        | And, optional spaced    | Combine | `ab` / `a b`\n`-`        | Not                     | Modify  | `c`\n`~`        | Optional spaced around  | Modify  | `\"a\"` / `\" a \"`\n\n### Negative Lexemes\n\nThe negation of lexemes might work differently than expected.\nNegation will not consume the input, but rather ensure that the pattern ahead does not match the target lexeme.\n\nThis means negating a lexeme does **not** mean the same as \"any character that is not\".\nTo consume any character that doesn't match the lexeme, use a `.not.character` combination.\n\nAdditionally, notice the difference between the use of the negation operator with other modifiers:\n```dart\nfinal wrongLexeme = -'a'.multiple.optional;\nfinal lexeme = (-'a').multiple.optional;\n```\n\nAlthough `-'a'` would consume any character that is not \"a\", the multiple and optional are added before the negation. The negation of `wrongLexeme` was applied to the optional lexeme. 
\n\nTo ensure that the negation of a character is applied to the multiple and optional, you may use `.not.character.multiple.optional`.\n\n### Reference, and self-reference\n\nReference lexemes are placeholders that,\nwhen requested to tokenize an input, find the lexeme in the grammar bound to them,\nassociated with a name.\n\nLexemes can be referenced using the functions `reference(String name)` and `self()`, or `ref(String name)` for short.\n\n```dart\nfinal abc = 'a' | 'b' | 'c' | reference('def');\nfinal def = ('d' | 'e' | 'f') \u0026 self().optional;\n```\n\nFor a reference to have an effect, it must be bound to the grammar,\nand the referenced lexeme must be present in the same grammar.\nIf the referenced lexeme is not present, an error is thrown when tokenizing.\n\n## Grammar\n\nA grammar is a list of lexemes that will be used to parse an input,\nessentially a list of rules that define the language.\n\nA grammar has an entry point, called the **main** lexeme.\nThis lexeme is used to parse the input and will be the only one returned.\n\nA grammar can be defined in two ways. Using the constructor:\n\n```dart\nfinal grammar = Grammar(\n  main: phrase | number,\n  rules: {\n    'digit': digit,\n    'number': number,\n\n    'letter': letter,\n    'word': word,\n    'phrase': phrase,\n\n    'abc': 'a' | 'b' | 'c',\n    'def': 'd' | 'e' | 'f',\n  },\n);\n```\n\nOr using the `.add(String name, Pattern pattern)` method:\n\n```dart\nfinal grammar = Grammar();\n\ngrammar.addMain(phrase | number);\n\ngrammar.add('digit', digit);\ngrammar.add( ... 
);\n```\n\nLexemes can tokenize an input by themselves,\nbut it's often more consistent to group the lexemes in a grammar.\n\nDoing so enables the use of **references and the main lexeme**.\nAdding any lexeme to a grammar effectively binds them together,\nalong with a name, and resolves any **self-references**.\n\n### Parsing an input\n\nThe grammar is used for parsing any input, which will tokenize it,\ntaking into account all the lexemes previously added.\n\nParse an input using the `.parse(String input, { Lexeme? main })` method.\n\n```dart\nfinal grammar = Grammar(...);\n\ngrammar.parse('123');\ngrammar.parse('123.456');\n\ngrammar.parse('word');\ngrammar.parse('two words');\n```\n\nYou can override the main lexeme used for parsing the input\nby passing it as the `main` parameter.\n\nParsing an input returns a resulting token,\nwhich can be used to get the value and position of the lexemes that matched.\nIt can also be used to get the children tokens.\n\n### Removing unwanted patterns\n\nWhen parsing an input, the grammar may not care about some lexemes, such as comments.\nTo remove these patterns you can use the `remove` parameter in the constructor, or\nthe `.addRemover(Lexeme lexeme)` method.\n\n## Token\n\nA token is the result of matching a lexeme to an input.\nIt contains the value of the lexeme that matched and the position of the token.\n\nThe process of generating this token is called **tokenization**.\n\n```dart\nfinal grammar = Grammar(...);\nfinal token = grammar.parse('123');\n\nprint('''\n  Value: ${ token.value }\n  Lexeme: ${ token.lexeme.name }\n\n  Start: ${ token.start }\n  End: ${ token.end }\n  Length: ${ token.length }\n''');\n```\n\n### Lexical Syntax Error\n\nWhen tokenizing, if the input doesn't match any lexeme,\na `LexicalSyntaxError` is thrown.\n\nThis error displays the position of the error and the lexemes that were expected to match the input.\nAdditionally, it will display the list of the lexemes that were traversed, as 
the path to the error.\n\nThis error will skip any lexeme that is not named.\n\n### Analyzing the Token Tree\n\nYou may use this token to analyze the resulting tree. The `.get({ Lexeme? lexeme, String? name })` method returns all the tokens that match the lexeme or name.\n\nThe reach of the search can be limited by using the `bool shallow` parameter. It defaults to `false` when a lexeme or name is given, and to `true` when no search parameters are given.\n\n```dart\nfinal result = grammar.parse('two words');\n\nfinal tokens = result.get();\nfinal words = result.get(lexeme: word);\nfinal letters = result.get(name: 'letter');\n\nprint('Words: ${ words.map((token) =\u003e token.value) }');\nprint('Letters: ${ letters.map((token) =\u003e token.value) }');\n```\n\nYou may also use `.children` and `.allChildren` for a more direct approach.\nNote that the children are not guaranteed to be tokens;\nthey may also be basic matching values, such as those of type `Match`.\n\n## Debugging\n\nIt's important to know how the grammar is tokenizing the input,\nand what lexemes are being used.\nFor this reason, a debug mode and syntax errors are available.\n\n### Debug Mode\n\nEnable the debug mode by instantiating a `DebugGrammar` instead of a `Grammar`.\n\n```dart\nfinal grammar = DebugGrammar(...);\n```\n\nAdditionally, you can specify debugging parameters:\n- `bool showAll`: Include lexemes with no name, defaults to `false`\n- `bool showPath`: Show the path to the lexeme, defaults to `false`\n- `Duration delay`: Delay between each step, defaults to `Duration.zero`\n\nThe informative output is as follows.\n\n```log\n│\n│  (#3)\n├► Tokenizing named syntaxRule\n│    at index 0, character \"/\"\n│    on path: (main) → syntax → syntaxRule\n│\n```\n\n### Lexical Syntax Errors\n\nSyntax errors are thrown when the input doesn't match a required lexeme.\nThe error will display the character, index, lexeme and path.\n\n```log\nLexicalSyntaxError: Unexpected character \"/\"\n\tat 
index 0\n\twith lexeme \"syntax\"\n\ton path:\n\t\t→ syntax\n\t\t↑ (main)\n```\n\n## Example\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n    Tokenization\n    \u003ca href=\"https://github.com/DrafaKiller/TokenParser-dart/blob/main/example/main.dart\"\u003e\n      \u003ccode\u003e(/example/main.dart)\u003c/code\u003e\n    \u003c/a\u003e\n  \u003c/summary\u003e\n    \n  ```dart\n  import 'package:token_parser/token_parser.dart';\n\n  final whitespace = ' ' | '\\t';\n  final lineBreak = '\\n' | '\\r';\n  final space = (whitespace | lineBreak).multiple;\n\n  final letter = '[a-zA-Z]'.regex;\n  final digit = '[0-9]'.regex;\n\n  final identifier = letter \u0026 (letter | digit).multiple.optional;\n\n  final number = digit.multiple \u0026 ('.' \u0026 digit.multiple).optional;\n  final string = '\"' \u0026 '[^\"]*'.regex \u0026 '\"'\n                | \"'\" \u0026 \"[^']*\".regex \u0026 \"'\";\n\n  final variableDeclaration =\n    'var' \u0026 space \u0026 identifier \u0026 space.optional \u0026 '=' \u0026 space.optional \u0026 (number | string) \u0026 space.optional \u0026 (';' | space);\n\n  final grammar = Grammar(\n    main: (variableDeclaration | space).multiple,\n    rules: {\n      'whitespace': whitespace,\n      'lineBreak': lineBreak,\n      'space': space,\n\n      'letter': letter,\n      'digit': digit,\n\n      'identifier': identifier,\n\n      'number': number,\n      'string': string,\n\n      'variableDeclaration': variableDeclaration,\n    },\n  );\n\n  void main() {\n    final result = grammar.parse('''\n      var hello = \"world\";\n      var foo = 123;\n      var bar = 123.456;\n    ''');\n\n    final numbers = result.get(lexeme: number).map((token) =\u003e token.value);\n    final identifiers = result.get(lexeme: identifier).map((token) =\u003e '\"${ token.value }\"');\n\n    print('Numbers: $numbers');\n    print('Identifiers: $identifiers');\n  }\n  ```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n    Referencing\n    
\u003ca href=\"https://github.com/DrafaKiller/TokenParser-dart/blob/main/example/reference.dart\"\u003e\n      \u003ccode\u003e(/example/reference.dart)\u003c/code\u003e\n    \u003c/a\u003e\n  \u003c/summary\u003e\n    \n  ```dart\n  import 'package:token_parser/token_parser.dart';\n\n  final expression = 'a' \u0026 Lexeme.reference('characterB').optional;\n  final characterB = 'b'.lexeme();\n\n  final recursive = 'a' \u0026 Lexeme.self().optional;\n\n  final grammar = Grammar(\n    main: expression,\n    rules: {\n      'expression': expression,\n      'characterB': characterB,\n      \n      'recursive': recursive,\n    }\n  );\n\n  void main() {\n    print(grammar.parse('ab').get(lexeme: characterB));\n    print(grammar.parse('aaa', main: recursive).get(lexeme: recursive));\n  }\n  ```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n    RGB Color Parser\n    \u003ca href=\"https://github.com/DrafaKiller/TokenParser-dart/blob/main/example/rgb_color.dart\"\u003e\n      \u003ccode\u003e(/example/rgb_color.dart)\u003c/code\u003e\n    \u003c/a\u003e\n  \u003c/summary\u003e\n\n  ```dart\n  import 'package:token_parser/token_parser.dart';\n\n  void main() {\n    final result = grammar.parse('rgb(255, 100, 0)');\n    print('Red: ${ result.get(lexeme: red).first.value }');\n    print('Green: ${ result.get(lexeme: green).first.value }');\n    print('Blue: ${ result.get(lexeme: blue).first.value }');\n\n    // [Output]\n    // Red: 255\n    // Green: 100\n    // Blue: 0\n  }\n\n  final grammar = Grammar(\n    main: rgb,\n    rules: {\n      'rgbNumber': rgbNumber,\n\n      'red': red,\n      'green': green,\n      'blue': blue,\n\n      'rgb': rgb,\n    }\n  );\n\n  final rgbNumber = range(0, 255).lexeme();\n\n  final red = rgbNumber.copy();\n  final green = rgbNumber.copy();\n  final blue = rgbNumber.copy();\n\n  final rgb = [ 'rgb(', red, ',', green, ',', blue, ')' ].optionalSpaced;\n  
```\n\u003c/details\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrafakiller%2Ftokenparser-dart","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrafakiller%2Ftokenparser-dart","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrafakiller%2Ftokenparser-dart/lists"}