{"id":13837759,"url":"https://github.com/alllex/parsus","last_synced_at":"2025-03-23T22:35:23.678Z","repository":{"id":67344124,"uuid":"358529919","full_name":"alllex/parsus","owner":"alllex","description":"Parser-combinators with Multiplatform Kotlin Coroutines","archived":false,"fork":false,"pushed_at":"2024-01-15T15:17:01.000Z","size":551,"stargazers_count":133,"open_issues_count":6,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-03-06T09:13:23.424Z","etag":null,"topics":["combinators","coroutines","kotlin","kotlin-multiplatform","parser"],"latest_commit_sha":null,"homepage":"https://javadoc.io/doc/me.alllex.parsus/parsus-jvm/latest/index.html","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alllex.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-16T08:30:15.000Z","updated_at":"2024-05-30T07:12:54.065Z","dependencies_parsed_at":null,"dependency_job_id":"dbd87a8a-cf5f-4240-9438-f6f44ce1bd3d","html_url":"https://github.com/alllex/parsus","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alllex%2Fparsus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alllex%2Fparsus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alllex%2Fparsus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alllex%2Fparsus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/alllex","download_url":"https://codeload.github.com/alllex/parsus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221923142,"owners_count":16902459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["combinators","coroutines","kotlin","kotlin-multiplatform","parser"],"created_at":"2024-08-04T15:01:24.256Z","updated_at":"2024-10-28T19:49:00.818Z","avatar_url":"https://github.com/alllex.png","language":"Kotlin","readme":"# Parsus\n\n[![Maven Central](https://img.shields.io/maven-central/v/me.alllex.parsus/parsus.svg?color=success)](https://search.maven.org/search?q=g:me.alllex.parsus)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Gradle build](https://github.com/alllex/parsus/actions/workflows/check.yml/badge.svg)](https://github.com/alllex/parsus/actions/workflows/check.yml)\n\nA framework for writing composable parsers for JVM, JS and Kotlin/Native based on Kotlin Coroutines.\n\n```kotlin\nval booleanGrammar = object : Grammar\u003cExpr\u003e() {\n    init { regexToken(\"\\\\s+\", ignored = true) }\n    val id by regexToken(\"\\\\w+\")\n    val lpar by literalToken(\"(\")\n    val rpar by literalToken(\")\")\n    val not by literalToken(\"!\")\n    val and by literalToken(\"\u0026\")\n    val or by literalToken(\"|\")\n    val impl by literalToken(\"-\u003e\")\n\n    val variable by id map { Var(it.text) }\n    val negation by -not * ref(::term) map { Not(it) }\n    val braced by -lpar * ref(::root) * -rpar\n\n    val term: 
Parser\u003cExpr\u003e by variable or negation or braced\n\n    val andChain by leftAssociative(term, and, ::And)\n    val orChain by leftAssociative(andChain, or, ::Or)\n    val implChain by rightAssociative(orChain, impl, ::Impl)\n\n    override val root by implChain\n}\n\nval ast = booleanGrammar.parse(\"a \u0026 (b1 -\u003e c1) | a1 \u0026 !b | !(a1 -\u003e a2) -\u003e a\").getOrThrow()\n```\n\n## Usage\n\n\u003cdetails open\u003e\n\u003csummary\u003eUsing with Gradle for JVM projects\u003c/summary\u003e\n\n```kotlin\ndependencies {\n    implementation(\"me.alllex.parsus:parsus-jvm:0.6.1\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003eUsing with Gradle for Multiplatform projects\u003c/summary\u003e\n\n```kotlin\nkotlin {\n    sourceSets {\n        val commonMain by getting {\n            dependencies {\n                implementation(\"me.alllex.parsus:parsus:0.6.1\")\n            }\n        }\n    }\n}\n```\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing with Maven for JVM projects\u003c/summary\u003e\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003eme.alllex.parsus\u003c/groupId\u003e\n  \u003cartifactId\u003eparsus-jvm\u003c/artifactId\u003e\n  \u003cversion\u003e0.6.1\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n\u003c/details\u003e\n\n## Features\n\n* **0-dependencies**. Parsus only depends on Kotlin Standard Library.\n* **Pure Kotlin**. Parsers are specified by users directly in Kotlin without the need for any codegen.\n* **Debuggable**. Since parsers are pure non-generated Kotlin, they can be debugged like any other program.\n* **Stack-Neutral**. Leveraging the power of coroutines, parsers are able to process inputs with arbitrary nesting\n  entirely avoiding stack-overflow problems.\n* **Extensible**. Parser combinators provided out-of-the-box are built on top of only a few core primitives. 
Therefore,\n  users can extend the library with custom powerful combinators suitable for their use-case.\n* **Composable**. Parsers are essentially functions, so they can be composed in an imperative or declarative fashion\n  allowing for unlimited flexibility.\n\nThere are, however, no pros without cons. Parsus relies heavily on coroutines machinery. This comes at a cost of some\nperformance and memory overhead as compared to other techniques such as generating parsers at compile-time from special\ngrammar formats.\n\n## Quick Reference\n\nThis is a reference for some of the basic combinators provided by the library.\n\nEach combinator is available in both procedural-style and combinator-style grammars.\nYou can pick and choose the style for each parser and sub-parser, as there are no restrictions.\n\n| Description | Grammars |\n| ----------- | -------- |\n| Parsing a token and getting its text\u0026#13;\u0026#13;Parses: `ab`, `aB` | Procedural:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval ab by regexToken(\"a[bB]\")\u0026#13;override val root by parser {\u0026#13;    val abMatch = ab()\u0026#13;    abMatch.text\u0026#13;}\u003c/pre\u003eCombinator:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval ab by regexToken(\"a[bB]\")\u0026#13;override val root by ab map { it.text }\u003c/pre\u003e |\n| Parsing two tokens sequentially\u0026#13;\u0026#13;Parses: `ab`, `aB` | Procedural:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by parser {\u0026#13;    val aMatch = a()\u0026#13;    val bMatch = b()\u0026#13;    aMatch.text to bMatch.text\u0026#13;}\u003c/pre\u003eCombinator:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by a and b map\u0026#13;    { (aM, bM) -\u003e aM.text to bM.text }\u003c/pre\u003e |\n| Parsing one of two tokens\u0026#13;\u0026#13;Parses: `a`, `b`, `B` | 
Procedural:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by parser {\u0026#13;    val abMatch = choose(a, b)\u0026#13;    abMatch.text\u0026#13;}\u003c/pre\u003eCombinator:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by a or b map { it.text }\u003c/pre\u003e |\n| Parsing an optional token\u0026#13;\u0026#13;Parses: `ab`, `aB`, `b`, `B` | Procedural:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by parser {\u0026#13;    val aMatch = poll(a)\u0026#13;    val bMatch = b()\u0026#13;    aMatch?.text to bMatch.text\u0026#13;}\u003c/pre\u003eCombinator:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by maybe(a) and b map\u0026#13;    { (aM, bM) -\u003e aM?.text to bM.text }\u003c/pre\u003e |\n| Parsing a token and ignoring its value\u0026#13;\u0026#13;Parses: `ab`, `aB` | Procedural:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by parser {\u0026#13;    skip(a) // or just a() without using the value\u0026#13;    val bMatch = b()\u0026#13;    bMatch.text\u0026#13;}\u003c/pre\u003eCombinator:\u0026#13;\u003cpre lang=\"kotlin\"\u003eval a by literalToken(\"a\")\u0026#13;val b by regexToken(\"[bB]\")\u0026#13;override val root by -a * b map { it.text }\u003c/pre\u003e |\n\n## Introduction\n\nThe goal of a grammar is to define rules by which to turn an input string of characters into a structured value. This\nvalue is usually an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree). 
But it could also be an\nevaluated result, if we have specified evaluation rules directly in the grammar.\n\nIn order to define a grammar we only need two things: a list of tokens and a root parser. Here is how one of the simplest\ngrammars looks with Parsus:\n\n```kotlin\nval g1 = object : Grammar\u003cString\u003e() {\n    val tokenA by literalToken(\"a\")\n    override val root by parser { tokenA().text }\n}\n\nprintln(g1.parseOrThrow(\"a\")) // prints \"a\"\n```\n\nIt is just a few lines of declarative code, but there is a lot going on under the hood. So, let us break it down.\n\n### Grammars\n\nFirst, there is the `Grammar` class that needs to be extended in order to define your custom grammar. In the example\nabove, an anonymous class is declared, but it could just as well be a normal class.\n\n```kotlin\nclass MyClass : Grammar\u003cMyResult\u003e() {\n    // tokens and parsers go here\n\n    override val root: Parser\u003cMyResult\u003e = TODO()\n}\n```\n\nThere are two important things to note. The `Grammar` is a generic class with a type parameter that defines the\nresult type of the `root` parser. Because Kotlin requires us to specify the type parameters of the class, the explicit\ntype of the `root` parser can often be omitted. The `root` parser will be used to produce the parsed result when calling a\nmethod such as `parseToEnd` on a grammar. However, before we can discuss how to define the `root` and other parsers, we\nneed to understand the basic building block of any parser - a token.\n\n### Tokens\n\nEach token we declare within a grammar describes a pattern of how this token can be recognized in the input string.\nWhenever a parser requires the next token to proceed, the parser asks the grammar to find a token match for the current\nposition in the input. When a match is found, it is described by the token, an `offset` in the input string where the\nmatch starts, and the `length` of the match.\n\nThe simplest type of token is a literal token. 
It matches only strings that are exactly like the given literal.\nTherefore, the token `tokenA` from the example will only match if the character in the current position is `\"a\"`.\n\n```kotlin\n    val tokenA by literalToken(\"a\")\n```\n\nAnother thing to note is that the member `tokenA` is declared via the `by` keyword, meaning that it uses Kotlin's\nproperty-delegation mechanism. When declaring tokens this way, they are automatically registered within a grammar, so\nthey can participate in the matching process when parsing.\n\nAlternatively, the token could be registered anonymously. This can be useful when we do not need to reference the\ntoken anywhere else when writing parsers. Most often, the tokens that need to be ignored are defined this way.\n\n```kotlin\nval g2 = object : Grammar\u003cString\u003e() {\n    init {\n        regexToken(\"\\\\s+\", ignored = true)\n    }\n\n    val tokenA by literalToken(\"a\")\n    override val root by parser { tokenA().text }\n}\n\nprintln(g2.parseOrThrow(\" a\\t\")) // prints \"a\"\n```\n\nIn this example, we create a token by calling `regexToken`.\nThis token will use the regular expression to match any whitespace in the input string.\nSince we want to simply ignore the whitespace, we will not reference this token in any of the parsers.\nTherefore, we register the token in the init-block of the class without assigning it to a member.\n\nNow that we know how to declare and register different kinds of tokens, let us explore how to use those tokens to write\nparsers.\n\n### Parsers\n\nA parser definition achieves two goals. Firstly, it defines the sequence of tokens that is expected to appear in the\ninput. Secondly, it transforms the matched tokens into a value.\n\nOne of the simplest parsers that we can construct expects only one token and returns the text of the token match as a\nvalue. 
And that is exactly what we saw previously.\n\n```kotlin\nval g1 = object : Grammar\u003cString\u003e() {\n    val tokenA by literalToken(\"a\")\n    override val root by parser { tokenA().text }\n}\n```\n\nIn order to understand how to use parsers, we need to take a look at the core abstractions.\n\nThe central piece of the puzzle is the `Parser` interface itself.\n\n```kotlin\ninterface Parser\u003cout T\u003e {\n    suspend fun ParsingScope.parse(): T\n}\n```\n\nEssentially, a parser is a function that can be called within a parsing scope and returns a parsed value.\nWhen something is a function, it can almost certainly be represented as a lambda.\nThis is exactly how we have seen parsers being defined: the `parser { ... }` function takes a lambda and returns a parser.\n\nThe parsing result is an explicit representation of either a successfully parsed value, or an error that the parser\nencountered while trying to process the input.\n\n```kotlin\nsealed class ParseResult\u003cout T\u003e\ndata class ParsedValue\u003cT\u003e(val value: T) : ParseResult\u003cT\u003e()\nabstract class ParseError : ParseResult\u003cNothing\u003e()\ndata class MismatchedToken(val expected: Token, val found: TokenMatch) : ParseError()\n// more parser errors\n```\n\nThe most powerful thing about parsers is that they can be composed. A parsing scope is what gives parsers this power.\nThe parsing scope interface provides an extension function to execute any parser and extract its result.\n\n```kotlin\ninterface ParsingScope {\n    suspend operator fun \u003cR\u003e Parser\u003cR\u003e.invoke(): R\n    // ... more ...\n}\n```\n\nWe have already seen an example with a call to this function: tokens are parsers too. The `Token` class\nimplements `Parser\u003cTokenMatch\u003e`, and when invoked within a parsing scope it returns an actual `TokenMatch`. From\nthis match we can take the text fragment of the input string to which this match corresponds. 
The text fragment can then\nbe converted into a number or stored as the name of an identifier, etc.\n\nHere is a grammar that parses an integer:\n\n```kotlin\nval g3 = object : Grammar\u003cInt\u003e() {\n    val tokenNum by regexToken(\"[0-9]+\")\n    override val root by parser { tokenNum().text.toInt() }\n}\n\nprintln(g3.parseOrThrow(\"123\")) // prints 123\n```\n\n### Parser Combinators\n\nIn order to combine parsers, we need to define more than one. The intermediate parsers can be declared as members of the\nsame grammar class to make them easier to reuse.\n\nAs we have learned previously, tokens are parsers. So we can define a couple of them to play with.\n\n```kotlin\nval g4 = object : Grammar\u003cString\u003e() {\n    val tokenNum by regexToken(\"[0-9]+\")\n    val tokenId by regexToken(\"[a-z]+\")\n    val tokenPlus by literalToken(\"+\")\n    override val root by parser {\n        val id = tokenId().text\n        tokenPlus()\n        val num = tokenNum().text\n        \"($id) + ($num)\"\n    }\n}\n\nprintln(g4.parseOrThrow(\"abc+123\")) // prints \"(abc) + (123)\"\n```\n\nThis example shows the main way in which parsers are combined - sequentially. The `root` parser expects first an id to\nappear, then a plus sign, then a number. If at any point there is an unexpected token, then the whole parser fails with\na mismatched-token error.\n\nNotice also that we use another useful property of the sequential execution. With the `tokenPlus()` statement we\nexecute the parser, but we ignore the result. This is most often used with token-parsers when we only need to make sure\nthat a certain piece of syntax is in the expected place in the input.\n\nAnother important way of combining parsers is to say that we expect *one of several* parsers to succeed at a certain\npoint. Even when the first parser fails, the parent parser does not produce an error immediately. Instead,\nthe parent parser tries out the remaining alternatives. 
If there is one alternative that succeeds, the parent parser\ntakes its result and proceeds without any errors.\n\nWe can use the `choose` function from the `ParsingScope` to achieve this behaviour:\n\n```kotlin\nval g5 = object : Grammar\u003cString\u003e() {\n    val tokenNum by regexToken(\"[0-9]+\")\n    val tokenId by regexToken(\"[a-z]+\")\n    val tokenPlus by literalToken(\"+\")\n    override val root by parser {\n        val idOrNum1 = choose(tokenNum, tokenId).text\n        tokenPlus()\n        val idOrNum2 = choose(tokenNum, tokenId).text\n        \"($idOrNum1) + ($idOrNum2)\"\n    }\n}\n\nprintln(g5.parseOrThrow(\"abc+123\")) // prints \"(abc) + (123)\"\nprintln(g5.parseOrThrow(\"909+wow\")) // prints \"(909) + (wow)\"\n```\n\nNow we have a repeating piece of code inside our parser implementation. So we ought to refactor it by introducing\nanother intermediate parser `term` to do the job. Since `term` is a parser, it can be invoked within the parsing scope.\n\n```kotlin\nval g6 = object : Grammar\u003cString\u003e() {\n    val tokenNum by regexToken(\"[0-9]+\")\n    val tokenId by regexToken(\"[a-z]+\")\n    val tokenPlus by literalToken(\"+\")\n    val term by parser { choose(tokenNum, tokenId).text }\n    override val root by parser {\n        val idOrNum1 = term()\n        tokenPlus()\n        val idOrNum2 = term()\n        \"($idOrNum1) + ($idOrNum2)\"\n    }\n}\n```\n\nArmed with this knowledge of the basics, you can now explore more sophisticated parser implementations that use various\nextension functions to make parser definitions look declarative. 
Also, feel free to get familiar with the core\ninterfaces and their extension functions to learn how more elaborate parser combinators can be created from the provided\nprimitives.\n\n## Examples\n\nHere are some examples of grammars written with Parsus:\n\n* Arithmetic expression parser and calculator: [Arithmetic.kt](./demo/src/commonMain/kotlin/Arithmetic.kt)\n* Boolean expression parser: [BooleanExpression.kt](./demo/src/commonMain/kotlin/BooleanExpression.kt)\n* S-expression parser: [SExpression.kt](./demo/src/commonMain/kotlin/SExpression.kt)\n* JSON parser: [(link)](benchmarks/src/main/kotlin/me/alllex/parsus/bench/NaiveJsonGrammar.kt)\n\n## Coroutines\n\nMost often, coroutines in Kotlin are explored and used in the context of concurrency. This is not surprising, because\nthey allow turning callback-ridden asynchronous code into sequential implementations that are less error-prone and\neasier to read.\n\nIn Kotlin, [structured concurrency][structured-concurrency] and other machinery related to multi-threaded environments\nare provided by the `kotlinx.coroutines` library. Note the `x` after `kotlin`. This library, like any other, makes use of\nlower-level capabilities of the language itself. More specifically, the main and only mechanism of Kotlin enabling\ncoroutines is **suspension**.\n\nKotlin's `suspend` keyword allows declaring so-called *suspending functions*. Most of the time, adding this additional\nkeyword is seen as a necessary down payment prior to entering the world of structured concurrency. Not all the\ntime, though. Even in the Kotlin standard library there is at least one example of using suspending functions without\nany multi-threaded context. 
Namely, sequence builders.\n\nYou can build an infinite sequence of Fizz-Buzz numbers like this:\n\n```kotlin\nfun main() {\n    val fb = sequence {\n        var i = 1\n        while (true) {\n            if (i % 3 == 0 || i % 5 == 0) yield(i)\n            i++\n        }\n    }\n\n    for (x in fb.take(10)) {\n        println(x)\n    }\n}\n```\n\nAs you may have guessed, the lambda we pass to the `sequence` builder is a suspending function. From inside this lambda\nwe can use the `yield` function, which is also suspending.\n\nAfter a careful inspection, we can conclude that suspending functions related to sequence builders have nothing to do\nwith dispatchers, flows and channels from `kotlinx.coroutines`. Cases like this simply highlight Kotlin's more\npowerful built-in capabilities. Even more applications of \"bare\" coroutines can be found elsewhere. For example, coroutines can\naid in a rather idiomatic [implementation of monads][arrow-monad-tutorial]\ndirectly in Kotlin.\n\nFinally, this project itself takes on a mission of leveraging coroutines to construct and execute parsers.\nContinuations, as first-class citizens, can be stored in memory, entirely avoiding unexpected stack-overflows for\nheavily nested parsing rules and deeply-structured input. Suspending functions make sequential composition of parsers\ntrivial. Error-handling mechanisms that come with coroutines allow for declarative definition of branching in parsers.\nEverything else is a fully extensible and debuggable collection of combinators on top of just a couple of core primitives.\n\n## Acknowledgements\n\nThe structure of the project as well as the form of the grammar DSL is heavily inspired by the [better-parse] library.\n\n## License\n\nDistributed under the MIT License. 
See `LICENSE` for more information.\n\n\n[structured-concurrency]: https://kotlinlang.org/docs/coroutines-basics.html#structured-concurrency\n\n[arrow-monad-tutorial]: https://arrow-kt.io/docs/patterns/monads/\n\n[better-parse]: https://github.com/h0tk3y/better-parse\n","funding_links":[],"categories":["Kotlin"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falllex%2Fparsus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falllex%2Fparsus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falllex%2Fparsus/lists"}