# Parsek

[![Build and Tests](https://github.com/kkarnauk/parsek/actions/workflows/build_test.yml/badge.svg)](https://github.com/kkarnauk/parsek/actions/workflows/build_test.yml)

Parsek provides parser combinators (kombinators, in fact) for Kotlin. With the library you can easily implement
parsers and lexers in Kotlin.
It's a multiplatform library, so it can be used not only inside JVM projects, but also
inside Kotlin JS and Kotlin Native.

## Table of contents
* [Using](#using)
  * [Gradle](#gradle)
  * [Maven](#maven)
* [Examples](#examples)
  * [Parsing an integer](#parsing-an-integer)
  * [Parsing an arithmetic expression](#parsing-an-arithmetic-expression)
* [Tokens](#tokens)
  * [Types](#types)
  * [Type providers](#type-providers)
  * [Producers and tokenizers](#producers-and-tokenizers)
  * [Tokenizer providers](#tokenizer-providers)
* [Parsers](#parsers)
* [Grammars](#grammars)
* [Inspiration](#inspiration)

## Using

To use the library, it's enough to include the dependency from **Maven Central**.

### Gradle

```kotlin
dependencies {
    implementation("io.github.kkarnauk:parsek:0.1")
}
```

### Maven

```xml
<dependency>
  <groupId>io.github.kkarnauk</groupId>
  <artifactId>parsek</artifactId>
  <version>0.1</version>
</dependency>
```

## Examples

### Parsing an integer

Let's start with an easy task: suppose you want to parse an integer.
There are several ways to do that.
The first one:
```kotlin
object IntegerGrammar : Grammar<Int>() {
    val digits by chars { it.isDigit() } // define a token for a sequence of digits
    val minus by char('-') // define a token for a minus

    val positiveInteger by digits map { it.text.toInt() } // parser for a positive integer
    val integer by (-minus seq positiveInteger map { -it }) alt positiveInteger // parser for an integer

    override val parser by integer
}
```
The second one is much simpler:
```kotlin
object IntegerGrammar : Grammar<Int>() {
    private val integer by regex("-?\\d+") // define a token for an entire integer

    override val parser by integer map { it.text.toInt() } // map the text into an integer
}
```
And now, if you want to parse an integer in your program, you can just do something like this:
```kotlin
val integer = IntegerGrammar.parse("-42")
```

### Parsing an arithmetic expression

That was too simple an example, wasn't it? Let's try something bigger. Suppose you want to parse an arithmetic
expression with addition, multiplication, exponentiation and parentheses. So:
```kotlin
object ArithmeticExpressionGrammar : Grammar<ArithmeticExpression>() {
    val num by regex("-?\\d+")
    val sum by char('+')
    val mul by char('*')
    val pow by text("**")
    val lpar by char('(')
    val rpar by char(')')

    val whitespaces by chars { it.isWhitespace() }.ignored()

    val primary by (num map { Value(it.text.toInt()) }) alt (-lpar seq ref(::parser) seq -rpar)
    val pows by rightAssociative(primary, pow) map { cur, res -> Power(cur, res) }
    val mults by leftAssociative(pows, mul) map { res, cur -> Mul(cur, res) }
    val sums by leftAssociative(mults, sum) map { res, cur -> Sum(cur, res) }

    override val parser by sums
}
```
Some things here may not be obvious, such as the unary minuses.
So let's go through it.
* First, we declared the main tokens. Note that we've used `ignored()` on the `whitespaces` token,
  so it will not be passed to the parser and we don't need to care about the whitespace at all.
* After that, we declared the `primary` rule. It actually consists of two different rules, separated by
  the `alt` combinator, which means "First, I'll try the first parser, and if it fails, I'll try the other one".
* The `map` combinator transforms the result of the underlying parser if it succeeds.
* The `seq` combinator says "To succeed, I need the first parser to succeed and then the second one".
  The unary minus says that we don't care about the result of a parser, so it must be skipped if it succeeds.
  If we didn't use the unary minus, we would get `Pair<T, S>` as the result of successful parsing.
* The `ref` combinator is used to reference parsers that are not initialized yet. We cannot use `parser` directly,
  because it isn't initialized at this point.
* `rightAssociative` and `leftAssociative` perform `foldr` and `foldl` on the results of
  consecutive parsers `item`, `separator`, `item` and so on. For example, here `item=primary` and `separator=pow`.

Once that's all done, we can use the grammar in the following way:
```kotlin
val expression = ArithmeticExpressionGrammar.parse("1 + 2 ** 3 ** 4  * 12 + 3 * (1 + 2) ** 2 ** 3")
```

## Tokens

A [token](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/Token.kt) represents a matched part of an input.
A parser doesn't consume the raw input; it consumes a sequence of tokens.

Each token can tell you:
* Its [type](#types)
* The input used to produce it
* The length of the matched part of the input
* The location where it was matched: the offset, row and column of the match in the input
* The substring of the input matched by the token

The important point is that you don't have to produce tokens on your own.
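To make the list above concrete, a token can be pictured roughly like this (a conceptual sketch with illustrative names, not parsek's actual `Token` interface):

```kotlin
// Conceptual sketch only: the names below are illustrative, not parsek's API.
data class SketchToken(
    val typeName: String,    // the name of the token's type
    val input: CharSequence, // the input used to produce the token
    val offset: Int,         // where the match starts in the input
    val length: Int,         // the length of the matched part
    val row: Int,            // 1-based row of the match
    val column: Int          // 1-based column of the match
) {
    // the substring of the input matched by the token
    val text: String get() = input.substring(offset, offset + length)
}

fun main() {
    val token = SketchToken("digits", "x = 42", offset = 4, length = 2, row = 1, column = 5)
    println(token.text) // prints "42"
}
```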
This task is handled by [producers](#producers-and-tokenizers).

### Types

A [token type](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/TokenType.kt)
introduces a family of tokens.

For example, you may have an input `Hello, my friend!` and a token type `any non-empty sequence of letters`.
Then there are three tokens of that type: `Hello`, `my` and `friend`.

Each token type can tell you:
* Its name (which can be helpful while debugging)
* Whether tokens of that type are ignored by a parser: if `true`, the tokens are not passed to a parser
  (but they still consume the input!)

Each token type has a match method with the signature:
```kotlin
fun match(input: CharSequence, fromIndex: Int): Int
```
It starts matching `input` from `fromIndex` and returns the number of matched chars.
The result is considered successful if and only if it is not 0.

The good news is that lots of token types are already implemented. Check out:
* [Text](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/TextTokenType.kt).
  Has parameters `text: String` and `ignoreCase: Boolean`
  and matches only the strings that are equal to `text` up to `ignoreCase`.
* [Char](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/CharTokenType.kt).
  Has parameters `char: Char` and `ignoreCase: Boolean`
  and matches only the strings that are equal to the single `char` up to `ignoreCase`.
* [Regex](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/RegexTokenType.kt).
  Has parameters `regex: String` and `options: Set<RegexOption>`.
  Compiles `regex` with `options` and adds an anchor to make it match from the start.
* [Char predicate](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/CharPredicateTokenType.kt).
  Accepts a lambda `(Char) -> Boolean` and matches chars while the lambda returns `true`.
* [General token type](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/PredicateTokenType.kt).
  There are two very general token types:
  * The first one accepts a lambda `(String, Int) -> Int` and creates a token type with the match function
  replaced by the given lambda.
  * The second one accepts a lambda `(String) -> Int`; in `match`, it creates a view of the `input` starting
  at `fromIndex` and invokes the given lambda.

There is even more good news: you don't have to name token types on your own.
For convenience, there are token type providers!

### Type providers

The purpose
of [TokenTypeProvider](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/type/provider/TokenTypeProvider.kt)
is simple: we don't want to write extra information when creating new token types. When we create a new token type, we
assign it to a property, so, for example, the name of the token type is already there!
```kotlin
val digits by chars { it.isDigit() }
```
In this example, we create a token type of kind `Char predicate` (described above), and it automatically gets the name `digits`.

We've also talked about tokens being ignored by parsers.
If you want tokens of a specific type to be ignored by a parser,
use one of the following ways:
```kotlin
val digits by regex("\\d+").ignored()
val digits by regex("\\d+", ignored = true)
```

So the providers help you create token types in a more convenient way.

**Note:** you must use **delegation** here in order to pass the created token types into a **tokenizer**.

Now, let's map the token types described above to their providers:
* Char:
```kotlin
val letterA by char('a', ignoreCase = true)
val plus by char('+') // case is not ignored by default
```
* Text:
```kotlin
val myName by text("kirill", ignoreCase = true)
val classKeyword by text("class") // case is not ignored by default
```
* Regex:
```kotlin
val whitespaces by regex("\\s+", setOf(RegexOption.MULTILINE)).ignored()
val name by regex("[A-Za-z_]+") // no regex options by default
```
* Char predicate:
```kotlin
val digits by chars { it.isDigit() }
```
* General token type:
```kotlin
val beforeLastChar by tokenType { input -> input.substringBeforeLast(char).length }
val findStr by tokenType { input, fromIndex -> input.find(str, fromIndex) }

val myType: TokenType = getTokenType()
val myTokenType by tokenType(myType)
```

Each provider takes the name of the created token type from the name of the property.

On each of those providers you can invoke `.ignored()` or pass `ignored = true` in order to make the tokens
ignored by parsers!

### Producers and tokenizers

We've talked a lot about tokens and token types,
but you still don't know how to convert a string into a collection of tokens.

There is an interface [TokenProducer](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/TokenProducer.kt),
which is responsible for providing tokens. The only interesting method it has is `nextToken(): Token?`:
it either returns a new `Token`, or `null` if nothing more can be produced.

But how do you get a producer?
For now, there is only one way to get one: a
[Tokenizer](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/tokenizer/Tokenizer.kt).
The main purpose of tokenizers is to take a string and turn it into a producer of tokens.

Usually, a tokenizer is initialized with a list of token types. These token types are collected automatically as you define them.
This is exactly why you must use delegation when creating a token type.

For now, there are two different implementations of a tokenizer:
* [Longest match tokenizer](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/tokenizer/LongestMatchTokenizer.kt):
  on each step, tries to find the token type that matches the largest number of characters.
  It then turns that token type and the current location in the input into a new token and returns it.
* [First match tokenizer](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/tokenizer/FirstMatchTokenizer.kt):
  almost the same, but takes the first token type with a non-zero match result.

**The default tokenizer** is the longest match one.

Note that parsers accept an
[IndexedTokenProducer](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/TokenProducer.kt),
not a regular [TokenProducer](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/TokenProducer.kt).
The reason is that regular token producers are lazy, so there is no way to reuse the produced tokens.
Indexed producers memoize produced tokens and allow getting them by index.

Anyway, you **don't need to implement an indexed producer** on your own.
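Conceptually, an indexed producer just memoizes tokens as the underlying lazy producer emits them, so earlier tokens stay available when a parser backtracks. A minimal sketch with hypothetical names (not parsek's implementation):

```kotlin
// Hypothetical stand-ins for parsek's interfaces, for illustration only:
// a real producer emits Token?; we use String? for brevity.
fun interface SketchTokenProducer {
    fun nextToken(): String?
}

// Caches every token the lazy producer emits, so any token can be
// fetched again by index.
class SketchIndexedProducer(private val underlying: SketchTokenProducer) {
    private val cache = mutableListOf<String>()

    fun tokenAt(index: Int): String? {
        // Pull from the lazy producer until the cache is long enough.
        while (cache.size <= index) {
            val next = underlying.nextToken() ?: return null
            cache += next
        }
        return cache[index]
    }
}

fun main() {
    val tokens = ArrayDeque(listOf("1", "+", "2"))
    val indexed = SketchIndexedProducer { tokens.removeFirstOrNull() }
    println(indexed.tokenAt(2)) // prints "2"
    println(indexed.tokenAt(0)) // prints "1": still available thanks to the cache
    println(indexed.tokenAt(3)) // prints "null": the input is exhausted
}
```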
Each producer can easily be turned into an indexed one by calling `producer.indexed()`.

### Tokenizer providers

As mentioned above, tokenizers pull information about token types when you delegate token type providers.
If a tokenizer were initialized before all token types are initialized, it would not see all of them.
And it's not convenient to make users write `by lazy` or something like that on each override of `tokenizer`.

So there is another abstraction:
[TokenizerProvider](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/tokenizer/provider/TokenizerProvider.kt).
You give it a list of token types; it gives you a tokenizer.

Now, the method for getting a tokenizer for your [Grammar](#grammars) is **final** and implemented lazily.
If you want to change the tokenizer for your grammar, override the method for getting a tokenizer provider.

There are implementations for providing the default tokenizers:
[longest match](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/tokenizer/provider/LongestMatchTokenizerProvider.kt)
and
[first match](src/commonMain/kotlin/io/github/kkarnauk/parsek/token/tokenizer/provider/FirstMatchTokenizerProvider.kt).

## Parsers

TODO

## Grammars

TODO

## Inspiration

The project is inspired by [better-parse](https://github.com/h0tk3y/better-parse), an interesting parser combinator
library. I really liked it and decided to implement parser combinators on my own.
I'll try to make parsers better :)