{"id":13992235,"url":"https://github.com/h0tk3y/better-parse","last_synced_at":"2025-04-04T12:08:52.138Z","repository":{"id":22623490,"uuid":"96618996","full_name":"h0tk3y/better-parse","owner":"h0tk3y","description":"A nice parser combinator library for Kotlin","archived":false,"fork":false,"pushed_at":"2023-09-20T13:30:47.000Z","size":490,"stargazers_count":426,"open_issues_count":38,"forks_count":42,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-27T04:44:03.157Z","etag":null,"topics":["dsl","grammar","kotlin","language","parser","parser-combinator","syntax-trees"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/h0tk3y.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-07-08T12:54:57.000Z","updated_at":"2025-03-19T19:35:56.000Z","dependencies_parsed_at":"2023-09-25T01:34:05.190Z","dependency_job_id":null,"html_url":"https://github.com/h0tk3y/better-parse","commit_stats":{"total_commits":127,"total_committers":9,"mean_commits":14.11111111111111,"dds":0.1889763779527559,"last_synced_commit":"af4599c04f84463a4b708e7e1385217b41ae7b9e"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h0tk3y%2Fbetter-parse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h0tk3y%2Fbetter-parse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h0tk3y%2Fbetter-parse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h0tk3y%2Fbetter-parse/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/h0tk3y","download_url":"https://codeload.github.com/h0tk3y/better-parse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247174423,"owners_count":20896078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dsl","grammar","kotlin","language","parser","parser-combinator","syntax-trees"],"created_at":"2024-08-09T14:01:53.544Z","updated_at":"2025-04-04T12:08:52.117Z","avatar_url":"https://github.com/h0tk3y.png","language":"Kotlin","readme":"# better-parse\n\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.h0tk3y.betterParse/better-parse/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.github.h0tk3y.betterParse/better-parse)\n[![Gradle build](https://github.com/h0tk3y/better-parse/workflows/Gradle%20build/badge.svg) ](https://github.com/h0tk3y/better-parse/actions?query=workflow%3A%22Gradle+build%22)\n\nA nice parser combinator library for Kotlin JVM, JS, and Multiplatform projects\n\n```kotlin\nval booleanGrammar = object : Grammar\u003cBooleanExpression\u003e() {\n    val id by regexToken(\"\\\\w+\")\n    val not by literalToken(\"!\")\n    val and by literalToken(\"\u0026\")\n    val or by literalToken(\"|\")\n    val ws by regexToken(\"\\\\s+\", ignore = true)\n    val lpar by literalToken(\"(\")\n    val rpar by literalToken(\")\")\n\n    val term by \n        (id use { Variable(text) }) or\n        (-not * parser(this::term) map { Not(it) }) or\n        (-lpar * parser(this::rootParser) * 
-rpar)\n\n    val andChain by leftAssociative(term, and) { l, _, r -\u003e And(l, r) }\n    override val rootParser by leftAssociative(andChain, or) { l, _, r -\u003e Or(l, r) }\n}\n\nval ast = booleanGrammar.parseToEnd(\"a \u0026 !b | b \u0026 (!a | c)\")\n```\n    \n### Using with Gradle\n\n```groovy\ndependencies {\n   implementation(\"com.github.h0tk3y.betterParse:better-parse:0.4.4\")\n}\n```\n\nWith multiplatform projects, it's OK to add the dependency just to the `commonMain` source set, or some other source set if you want it for specific parts of the code.\n\n## Tokens ##\nLike many other language recognition tools, `better-parse` abstracts away from raw character input by \npre-processing it with a `Tokenizer` that can match `Token`s (with regular expressions, literal values or arbitrary functions) \nagainst an input character sequence.\n\nThere are several kinds of supported `Token`s:\n\n* a `regexToken(\"(?:my)?(?:regex)\")` is matched as a regular expression;\n* a `literalToken(\"foo\")` is matched literally, character to character;\n* a `token { (charSequence, from) -\u003e ... }` is matched using the passed function.\n\nA `Tokenizer` tokenizes an input sequence such as an `InputStream` or a `String` into a `Sequence\u003cTokenMatch\u003e`, providing \neach with a position in the input.\n\nOne way to create a `Tokenizer` is to first define the `Token`s to be matched:\n\n```kotlin\nval id = regexToken(\"\\\\w+\")\nval cm = literalToken(\",\")\nval ws = regexToken(\"\\\\s+\", ignore = true)\n```\n\n\u003e A `Token` can be ignored by setting its `ignore = true`. An ignored token can still be matched explicitly, but if \nanother token is expected, the ignored one is just dropped from the sequence.\n\n```kotlin\nval tokenizer = DefaultTokenizer(listOf(id, cm, ws))\n```\n    \n\u003e Note: the order of the tokens matters in some cases, because the tokenizer tries to match them in exactly this order. 
\n\u003e For instance, if `literalToken(\"a\")` \n\u003e is listed before `literalToken(\"aa\")`, the latter will never be matched. Be careful with keyword tokens! \n\u003e If you match them with regexes, a word boundary `\\b` in the end may help against ambiguity.\n\n```kotlin\nval tokenMatches: Sequence\u003cTokenMatch\u003e = tokenizer.tokenize(\"hello, world\")\n```\n    \n\u003e A more convenient way of defining tokens is described in the [**Grammar**](#grammar) section.\n\nIt is possible to provide a custom implementation of a `Tokenizer`.\n\n## Parser ##\n\nA `Parser\u003cT\u003e` is an object that accepts an input sequence (`TokenMatchesSequence`) and\ntries to convert some (from none to all) of its items into a `T`. In `better-parse`, parsers are also \nthe building blocks used to create new parsers by *combining* them.\n\nWhen a parser tries to process the input, there are two possible outcomes:\n\n* If it succeeds, it returns `Parsed\u003cT\u003e` containing the `T` result and the `nextPosition: Int` that points to what \nit left unprocessed. The latter can then be, and often is, passed to another parser.\n\n* If it fails, it reports the failure returning an `ErrorResult`, which provides detailed information about the failure.\n\nA very basic parser to start with is a `Token` itself: given an input `TokenMatchesSequence` and a position in it, \nit succeeds if the sequence starts with the match of this token itself \n_(possibly, skipping some **ignored** tokens)_ and returns that `TokenMatch`, pointing at the next token \nwith the `nextPosition`.\n\n```kotlin\nval a = regexToken(\"a+\")\nval b = regexToken(\"b+\")\nval tokenMatches = DefaultTokenizer(listOf(a, b)).tokenize(\"aabbaaa\")\nval result = a.tryParse(tokenMatches, 0) // contains the match for \"aa\" and the next index is 1 for the match of b\n```\n    \n## Combinators ## \n\nSimpler parsers can be combined to build a more complex parser, from tokens to terms and to the whole language. 
\nThere are several kinds of combinators included in `better-parse`:\n\n* `map`, `use`, `asJust`\n \n    The map combinator takes the successful result of another parser and applies a transforming function to it. \n    The error results are returned unchanged.\n    \n    ```kotlin\n    val id = regexToken(\"\\\\w+\")\n    val aText = id map { it.text } // Parser\u003cString\u003e, returns the matched text from the input sequence\n    ```\n      \n    A parser for objects of a custom type can be created with `map`:\n    \n    ```kotlin\n    val variable = id map { JavaVariable(name = it.text) } // Parser\u003cJavaVariable\u003e.\n    ```\n      \n    * `someParser use { ... }` is a `map` equivalent that takes a function with receiver instead. Example: `id use { text }`.\n    \n    * `foo asJust bar` can be used to map a parser to some constant value.\n    \n* `optional(...)`\n \n     Given a `Parser\u003cT\u003e`, tries to parse the sequence with it, but returns a `null` result if the parser failed, and thus never fails itself:\n     \n     ```kotlin\n     val p: Parser\u003cT\u003e = ...\n     val o = optional(p) // Parser\u003cT?\u003e    \n     ```\n\n* `and`, `and skip(...)`\n\n    The tuple combinator arranges the parsers in a sequence, so that the remainder of the first one goes to the second one and so on. \n    If all the parsers succeed, their results are merged into a `Tuple`. 
If any of the parsers fails, its `ErrorResult` is returned by the combinator.\n    \n    ```kotlin\n    val a: Parser\u003cA\u003e = ...\n    val b: Parser\u003cB\u003e = ...\n    val aAndB = a and b                 // This is a `Parser\u003cTuple2\u003cA, B\u003e\u003e`\n    val bAndBAndA = b and b and a       // This is a `Parser\u003cTuple3\u003cB, B, A\u003e\u003e`\n    ```\n      \n     You can `skip(...)` components in a tuple combinator: the parsers will be called just as well, but their results won't be included in the\n     resulting tuple:\n     \n     ```kotlin\n     val bbWithoutA = skip(a) and b and skip(a) and b and skip(a)  // Parser\u003cTuple2\u003cB, B\u003e\u003e\n     ```\n      \n     \u003e If all the components in an `and` chain are skipped except for one `Parser\u003cT\u003e`, the resulting parser\n      is `Parser\u003cT\u003e`, not `Parser\u003cTuple1\u003cT\u003e\u003e`. \n      \n     To process the resulting `Tuple`, use the aforementioned `map` and `use`. These parsers are equivalent:\n     \n     * ```val fCall = id and skip(lpar) and id and skip(rpar) map { (fName, arg) -\u003e FunctionCall(fName, arg) }```\n      \n     * ```val fCall = id and lpar and id and rpar map { (fName, _, arg, _) -\u003e FunctionCall(fName, arg) }```\n      \n     * ```val fCall = id and lpar and id and rpar use { FunctionCall(t1, t3) }```\n     \n     * ```val fCall = id * -lpar * id * -rpar use { FunctionCall(t1, t2) }``` (see operators below)\n     \n     \u003e There are `Tuple` classes up to `Tuple16` and the corresponding `and` overloads.\n     \n     ##### Operators\n     \n     There are operator overloads for defining `and` chains more compactly:\n     \n     * `a * b` is equivalent to `a and b`.\n     \n     * `-a` is equivalent to `skip(a)`.\n     \n     With these operators, the parser `a and skip(b) and skip(c) and d` can also be defined as \n     `a * -b * -c * d`.\n     \n * `or`\n \n     The alternative combinator tries to parse the sequence 
with the parsers it combines one by one until one succeeds. If all the parsers fail,\n     the returned `ErrorResult` is an `AlternativesFailure` instance that contains all the failures from the parsers.\n     \n     The result type for the combined parsers is the least common supertype (which is possibly `Any`).\n     \n     ```kotlin\n     val expr = const or variable or fCall\n     ```\n     \n  * `zeroOrMore(...)`, `oneOrMore(...)`, `N times`, `N timesOrMore`, `N..M times`\n  \n      These combinators transform a `Parser\u003cT\u003e` into a `Parser\u003cList\u003cT\u003e\u003e`, invoking the parser several times and failing if there were not\n      enough matches.\n      \n      ```kotlin\n      val modifiers = zeroOrMore(functionModifier)\n      val rectangleParser = 4 times number map { (a, b, c, d) -\u003e Rect(a, b, c, d) }\n      ```\n      \n  * `separated(term, separator)`, `separatedTerms(term, separator)`, `leftAssociative(...)`, `rightAssociative(...)`\n  \n      Combines the two parsers, invoking them in turn and thus parsing a sequence of `term` matches separated by `separator` matches.\n      \n      The result is a `Separated\u003cT, S\u003e` which provides the matches of both parsers (note that there is one more term than there are separators) and \n      can also be reduced in either direction.\n      \n      ```kotlin\n      val number: Parser\u003cInt\u003e = ...\n      val sumParser = separated(number, plus) use { reduce { a, _, b -\u003e a + b } }\n      ```\n  \n      The `leftAssociative` and `rightAssociative` combinators do exactly this, but they take the reducing operation as they are built:\n      \n      ```kotlin\n      val term: Parser\u003cTerm\u003e\n      val andChain = leftAssociative(term, andOperator) { l, _, r -\u003e And(l, r) }\n      ```\n        \n## Grammar\n\nAs a convenient way of defining the grammar of a language, there is an abstract class `Grammar` that collects the `by`-delegated \nproperties into a `Tokenizer` automatically, and 
also behaves as a composition of the `Tokenizer` and the `rootParser`.\n\n*Note:* a `Grammar` also collects `by`-delegated `Parser\u003cT\u003e` properties so that they can be accessed as \n`declaredParsers` along with the tokens. As a good style, declare the parsers inside a `Grammar` by delegation as well.\n\n```kotlin\ninterface Item\nclass Number(val value: Int) : Item\nclass Variable(val name: String) : Item\n\nclass ItemsParser : Grammar\u003cList\u003cItem\u003e\u003e() {\n    val num by regexToken(\"\\\\d+\")\n    val word by regexToken(\"[A-Za-z]+\")\n    val comma by regexToken(\",\\\\s+\")\n\n    val numParser by num use { Number(text.toInt()) }\n    val varParser by word use { Variable(text) }\n\n    override val rootParser by separatedTerms(numParser or varParser, comma)\n}\n\nval result: List\u003cItem\u003e = ItemsParser().parseToEnd(\"one, 2, three, 4, five\")\n```\n    \nTo use a parser that has not been constructed yet, reference it with `parser { someParser }` or `parser(this::someParser)`:\n\n```kotlin\nval term by\n    constParser or \n    variableParser or \n    (-lpar and parser(this::term) and -rpar)\n```\n\nA `Grammar` implementation can override the `tokenizer` property to provide a custom implementation of `Tokenizer`.\n\n## Syntax trees\n\nA `Parser\u003cT\u003e` can be converted to another `Parser\u003cSyntaxTree\u003cT\u003e\u003e`, where a `SyntaxTree\u003cT\u003e`, along with the parsed `T` \ncontains the children syntax trees, the reference to the parser and the positions in the input sequence. \nThis can be done with `parser.liftToSyntaxTreeParser()`.\n\nThis can be used for syntax highlighting and inspecting the resulting tree in case the parsed result\ndoes not contain the full syntactic structure.\n\nFor convenience, a `Grammar` can also be lifted to that parsing a `SyntaxTree` with \n`grammar.liftToSyntaxTreeGrammar()`. 
\n\n```kotlin\nval treeGrammar = booleanGrammar.liftToSyntaxTreeGrammar()\nval tree = treeGrammar.parseToEnd(\"a \u0026 !b | c -\u003e d\")\nassertTrue(tree.parser == booleanGrammar.implChain)\nval firstChild = tree.children.first()\nassertTrue(firstChild.parser == booleanGrammar.orChain)\nassertTrue(firstChild.range == 0..9)\n```\n\nThere are optional arguments for customizing the transformation:\n\n* `LiftToSyntaxTreeOptions`\n  * `retainSkipped` — whether the resulting syntax tree should include skipped `and` components;\n  * `retainSeparators` — whether the `Separated` combinator parsed separators should be included;\n* `structureParsers` — defines the parsers that are retained in the syntax tree; the nodes with parsers that are\n  not in this set are flattened so that their children are attached to their parents in their place. \n  \n  For `Parser\u003cT\u003e`, the default is `null`, which means no nodes are flattened.\n  \n  In case of `Grammar\u003cT\u003e`, `structureParsers` defaults to the grammar's `declaredParsers`.\n   \n* `transformer` — a strategy to transform non-built-in parsers. If you define your own combinators and want them\n  to be lifted to syntax tree parsers, pass a `LiftToSyntaxTreeTransformer` that will be called on the parsers. When\n  a custom combinator nests another parser, a transformer implementation should call `default.transform(...)` on that parser.\n\nSee [`SyntaxTreeDemo.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/demo-jvm/src/main/kotlin/com/example/SyntaxTreeDemo.kt) for an example of working with syntax trees.   
\n\n## Examples\n\n* A boolean expressions parser that constructs a simple AST: [`BooleanExpression.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/demo-jvm/src/main/kotlin/com/example/BooleanExpression.kt)\n* An integer arithmetic expressions evaluator: [`ArithmeticsEvaluator.kt`](https://github.com/h0tk3y/better-parse/blob/master/demo/demo-jvm/src/main/kotlin/com/example/ArithmeticsEvaluator.kt)\n* A toy programming language parser: [(link)](https://github.com/h0tk3y/compilers-course/blob/master/src/main/kotlin/com/github/h0tk3y/compilersCourse/parsing/Parser.kt)\n* A sample JSON parser by [silmeth](https://github.com/silmeth): [(link)](https://github.com/silmeth/jsonParser)\n\n## Benchmarks\n\nSee the benchmarks repository [`h0tk3y/better-parse-benchmark`](https://github.com/h0tk3y/better-parse-benchmark) and feel free to contribute.\n","funding_links":[],"categories":["Kotlin"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh0tk3y%2Fbetter-parse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fh0tk3y%2Fbetter-parse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh0tk3y%2Fbetter-parse/lists"}