{"id":13415914,"url":"https://github.com/jon-hanson/parsecj","last_synced_at":"2026-04-07T19:31:56.227Z","repository":{"id":25268929,"uuid":"28694365","full_name":"jon-hanson/parsecj","owner":"jon-hanson","description":"Java monadic parser combinator framework for constructing LL(1) parsers","archived":false,"fork":false,"pushed_at":"2025-05-29T16:50:39.000Z","size":577,"stargazers_count":122,"open_issues_count":1,"forks_count":6,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-12-20T17:52:51.283Z","etag":null,"topics":["java","parser","parser-combinators"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jon-hanson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-01-01T20:00:40.000Z","updated_at":"2025-09-22T11:41:15.000Z","dependencies_parsed_at":"2024-01-15T23:38:18.797Z","dependency_job_id":null,"html_url":"https://github.com/jon-hanson/parsecj","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/jon-hanson/parsecj","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jon-hanson%2Fparsecj","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jon-hanson%2Fparsecj/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jon-hanson%2Fparsecj/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jon-hanson%2Fparsecj/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/jon-hanson","download_url":"https://codeload.github.com/jon-hanson/parsecj/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jon-hanson%2Fparsecj/sbom","scorecard":{"id":529848,"data":{"date":"2025-08-11","repo":{"name":"github.com/jon-hanson/parsecj","commit":"c6b5ad831bcd61eae29f6f5bdc8d24374ad6c1b1"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.1,"checks":[{"name":"Code-Review","score":0,"reason":"Found 2/26 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":1,"reason":"1 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies 
found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 6 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-20T05:25:58.325Z","repository_id":25268929,"created_at":"2025-08-20T05:25:58.326Z","updated_at":"2025-08-20T05:25:58.326Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31526666,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","parser","parser-combinators"],"created_at":"2024-07-30T21:00:52.993Z","updated_at":"2026-04-07T19:31:56.220Z","avatar_url":"https://github.com/jon-hanson.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"ParsecJ\n============\n\n![ParsecJ](https://github.com/jon-hanson/parsecj/blob/master/ParsecJ.png)\n\n- [Introduction](#introduction)\n  - [Parser Combinators](#parser-combinators)\n- [Getting Started](#getting-started)\n  - [Requirements](#requirements)\n  - [Maven](#maven)\n  - [Example](#example)\n  - [General Approach](#general-approach)\n  - [Types](#types)\n- [Defining Parsers](#defining-parsers)\n  - [Combinators](#combinators)\n  - [Text](#text)\n- [Advanced Examples](#advanced-examples)\n  - [Expression Language 
Parser](#expression-language-parser)\n  - [JSON Parser](#json-parser)\n- [Notes on the Implementation](#notes-on-the-implementation)\n  - [Translating Haskell into Java](#translating-haskell-into-java)\n    - [\"Restricting lookahead\"](#restricting-lookahead)\n    - [\"Basic combinators\"](#basic-combinators)\n  - [Parser Monad](#parser-monad)\n    - [Proving the Laws](#proving-the-laws)\n- [Related Work](#related-work)\n\n# Introduction\n\n:warning: *__Note:__ ParsecJ has been superceded by [funcj.parser](https://github.com/typemeta/funcj/tree/master/parser). The latter uses an applicative framework instead of monads, but is otherwise very similar to ParsecJ.*\n\n**ParsecJ** is a Java monadic parser combinator framework for constructing [LL(1) parsers](http://en.wikipedia.org/wiki/LL_parser).\nIt is a port of the Haskell [Parsec library](https://hackage.haskell.org/package/parsec).\nThe implementation is, where possible, a direct Java port of the Haskell code outlined in the original [Parsec paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf).\n\nSome notable features include:\n* Composable parser combinators, which provide a DSL for implementing parsers from grammars.\n* Informative error messages in the event of parse failures.\n* Thread-safe due to immutable parsers and inputs.\n* A combinator approach that mirrors that of Parsec, its Haskell counterpart, allowing grammars written for Parsec to be translated into equivalent ParsecJ grammars.\n* Lightweight library (the Jar file size is less than 50Kb) with zero dependencies (aside from JUnit and JMH for the tests).\n\n## Parser Combinators\n\nA typical approach to implementing parsers for special-purpose languages\nis to use a parser generation tool, such as Yacc/Bison or ANTLR.\nWith these tools the language is expressed as a series of production rules,\ndescribed using a grammar language specific to the tool.\nThe parsing code for the language is then generated 
from the grammar definition.\n\nAn alternative approach is to implement a\n[recursive descent parser](http://en.wikipedia.org/wiki/Recursive_descent_parser),\nwhereby the production rules comprising the grammar\nare translated by hand into parse functions.\nThe advantage here is that the rules are expressed in the host programming language,\nobviating the need for a separate grammar language and the consequent code-generation phase.\nA limitation of this approach\nis that the extra plumbing required to implement error-handling and backtracking\nobscures the relationship between the parsing functions and the language rules.\n\n[Monadic parser combinators](http://www.cs.nott.ac.uk/~gmh/bib.html#pearl)\nare an extension of recursive descent parsing,\nwhich use a monad to encapsulate the plumbing.\nThe framework provides the basic building blocks -\nparsers for constituent language elements such as characters, words and numbers.\nIt also provides combinators that allow more complex parsers to be constructed by composing existing parsers.\nThe framework effectively provides a Domain Specific Language for expressing language grammars,\nwhereby each grammar instance implements an executable parser.\n\n# Getting Started\n\n## Requirements\n\nParsecJ requires Java 1.8 (or higher).\n\n## Resources\n\n* **Release builds** are available on the [Releases](http://github.com/jon-hanson/parsecj/releases) page.\n* **Maven Artifacts** are available on the [Sonatype Nexus repository](https://oss.sonatype.org/#nexus-search;quick~parsecj).\n* **Javadocs** for the latest build are on the [Javadocs](http://jon-hanson.github.io/parsecj/javadocs/) page.\n\n## Maven\n\nAdd this dependency to your project pom.xml:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.javafp\u003c/groupId\u003e\n    \u003cartifactId\u003eparsecj\u003c/artifactId\u003e\n    \u003cversion\u003e0.6\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n## Example\n\nAs a quick illustration of 
implementing a parser using ParsecJ,\nconsider a simple expression language for expressions of the form *x+y*, where *x* and *y* are integers.\n\nThe grammar for the language consists of a single production rule:\n\n```\nsum ::= integer '+' integer\n```\n\nThis can be translated into the following ParsecJ parser:\n\n```java\nimport org.javafp.parsecj.*;\nimport static org.javafp.parsecj.Combinators.*;\nimport static org.javafp.parsecj.Text.*;\n\nclass Test {\n   public static void main(String[] args) throws Exception {\n        Parser\u003cCharacter, Integer\u003e sum =\n            intr.bind(x -\u003e                  // parse an integer and bind the result to the variable x.\n                chr('+').then(              // parse a '+' sign, and throw away the result.\n                    intr.bind(y -\u003e          // parse an integer and bind the result to the variable y.\n                        retn(x+y))));       // return the sum of x and y.\n    }\n}\n```\n\nThe parser is constructed by taking the `intr` parser for integers,\nthe `chr` parser for single characters,\nand combining them using the `bind`, `then` and `retn` combinators.\n\nThe parser can be used as follows:\n\n```java\nint i = sum.parse(Input.of(\"1+2\")).getResult();\nassert i == 3;\n```\n\nMeanwhile, if we give it invalid input:\n\n```java\nint i2 = sum.parse(Input.of(\"1+z\")).getResult();\n```\n\nthen it throws an exception with an error message that pinpoints the problem:\n\n```java\nException in thread \"main\" java.lang.Exception: Message{position=2, sym=\u003cz\u003e, expected=[integer]}\n```\n\n## General Approach\n\nA typical approach to using the library to implement a parser for a language is as follows:\n\n1. Define a model for the language, i.e. a set of classes that represent the language elements.\n2. Define a grammar for the language - a set of production rules.\n3. Translate the production rules into parsers using the library combinators. 
The parsers will typically construct values from the model.\n4. Book-end the parser for the top-level element with the `eof` combinator.\n5. Invoke the parser by passing an `Input` object, usually constructed from a `String`, to the `parse` method.\n6. The resulting `Reply` holds either the successfully parsed value or an error message.\n\n## Types\n\nThere are three principal types to be aware of.\n\n### `Parser`\n\nAll parsers implement the `Parser` (functional) interface,\nwhich has an `apply` method:\n\n```java\n@FunctionalInterface\npublic interface Parser\u003cI, A\u003e {\n    ConsumedT\u003cI, A\u003e apply(Input\u003cI\u003e input);\n\n    default Reply\u003cI, A\u003e parse(Input\u003cI\u003e input) {\n        return apply(input).getReply();\n    }\n    // ...\n}\n```\n\nI.e. a `Parser\u003cI, A\u003e` is essentially a function from an `Input\u003cI\u003e` to a `ConsumedT\u003cI, A\u003e`,\nwhere `I` is the input stream symbol type (usually `Character`),\nand `A` is the type of the value being parsed.\nFor example, a parser that operates on character input and parses an integer would have type `Parser\u003cCharacter, Integer\u003e`.\n\nThe `apply` method contains the main machinery of the parser,\nand combinators use this method to compose parsers.\nHowever, since the `ConsumedT` type returned by `apply` is an intermediate type,\nthe `parse` method is also provided to apply the parser and extract the `Reply` parse result.\n\n### `Input`\n\nThe `Input` interface is an abstraction representing an immutable input state.\nIt provides several static `of` methods for constructing `Input` instances from sequences of symbols:\n\n```java\npublic interface Input\u003cI\u003e {\n    static \u003cI\u003e Input\u003cI\u003e of(I[] symbols) {\n        return new ArrayInput\u003cI\u003e(symbols);\n    }\n\n    static Input\u003cCharacter\u003e of(Character[] symbols) {\n        return new CharArrayInput(symbols);\n    }\n\n    static Input\u003cCharacter\u003e 
of(String symbols) {\n        return new StringInput(symbols);\n    }\n\n    // ...\n}\n```\n\n### `Reply`\n\nThe `ConsumedT` object returned by `Parser.apply` is an intermediate result wrapper,\ntypically only of interest to combinator implementations.\nThe `ConsumedT.getReply` method returns the parser result wrapper;\nalternatively, the `Parser.parse` method can be used to bypass `ConsumedT` entirely.\n\n```java\nReply\u003cI, A\u003e reply = p.apply(input).getReply();\n// is equivalent to:\nReply\u003cI, A\u003e reply2 = p.parse(input);\n\nassert(reply.equals(reply2));\n```\n\nA `Reply` can be either a successful parse result (represented by the `Ok` subtype)\nor an error (represented by the `Error` subtype).\n\n```java\npublic abstract class Reply\u003cI, A\u003e {\n    public abstract \u003cB\u003e B match(Function\u003cOk\u003cI, A\u003e, B\u003e ok, Function\u003cError\u003cI, A\u003e, B\u003e error);\n\n    public abstract A getResult() throws Exception;\n\n    public abstract boolean isOk();\n    \n    public abstract boolean isError();\n}\n```\n\nThe `isOk` and `isError` methods can be used to test the type.\nAlternatively, use the `match` method to handle both cases, e.g.:\n\n```java\nString msg =\n    parser.parse(input)\n        .match(\n            ok -\u003e \"Result : \" + ok.getResult(),\n            error -\u003e \"Error : \" + error.getMsg()\n        );\n```\n\nA third option is to use the `getResult` method which either returns the successfully parsed result,\nif the reply is an `Ok`,\nor throws an exception if it's an `Error`.\n\n```java\n// May throw.\nParser\u003cCharacter, MyResult\u003e p = ...\nMyResult res = p.parse(input).getResult();\n```\n\n# Defining Parsers\n\nA parser for a language is defined by translating the production rules comprising the language grammar into parsers,\nusing the combinators provided by the library.\n\n## Combinators\n\nCombinators create new parsers by composing existing ones.\nThe `Combinators` package 
provides the following core combinator parsers:\n\n| Name | Parser Description\n|-----|-------------\n`retn(value)` | Always succeeds.\n`bind(p, f)` | First applies the parser `p`. If it succeeds it then applies the function `f` to the result to yield another parser that is then applied.\n`map(p, f)` | Functor map operation - map a function over the result of a successful parse.\n`fail()` | Always fails.\n`satisfy(test)` | Applies a test to the next input symbol.\n`satisfy(value)` | Succeeds if the next input symbol equals `value`.\n`eof()` | Succeeds if the end of the input is reached.\n`then(p, q)` | First applies the parser `p`. If it succeeds it then applies parser `q`.\n`or(p, q)` | First applies the parser `p`. If it succeeds the result is returned otherwise it applies parser `q`.\n\n(see the Combinators javadocs for the full list)\n\nCombinators that take a `Parser` as a first parameter, such as `bind` and `or`,\nalso exist as methods on the `Parser` interface, to allow parsers to be constructed in a fluent style.\nE.g. 
`p.bind(f)` is equivalent to `bind(p, f)`.\n\nWe'll cover a few of these in more detail.\n\n### The `retn` Combinator\n\n```Java\n\u003cI, A\u003e Parser\u003cI, A\u003e retn(A x)\n```\n\nThe `retn` combinator creates a parser from a value.\nThe parser simply returns the original value, without consuming any input.\n\nIt is perhaps unclear why you would need such a simple parser - the motivation should become clear in the following sections.\n\n### The `satisfy` Combinator\n\n```java\n\u003cI\u003e Parser\u003cI, I\u003e satisfy(Predicate\u003cI\u003e test)\n\u003cI\u003e Parser\u003cI, I\u003e satisfy(I value)\n```\n\nThis combinator accepts the next input symbol only if it satisfies a criterion.\nIn the first variation the criterion is expressed by the `test` predicate,\nwhich gets applied to the next symbol, and if it passes then the symbol is returned.\nThe second variation is simply a shorthand for `satisfy(x -\u003e x.equals(value))`,\nand it will successfully return the next input if it equals the supplied `value` argument.\n\nSo, for example `satisfy(c -\u003e Character.isDigit(c))` is a parser\nwhich will return the next character if it's a decimal digit.\n\n### The `bind` Combinator\n\n```Java\n\u003cI, A, B\u003e Parser\u003cI, B\u003e bind(Parser\u003cI, ? extends A\u003e p, Function\u003cA, Parser\u003cI, B\u003e\u003e f)\n```\n\nThe bind combinator is the mechanism by which parsers are sequentially composed.\nIt corresponds to production rules of the form:\n\n```\nr ::= p q\n```\n\nIt first calls the first parser `p` on the input stream,\nand if it succeeds the result is passed to the function `f` to yield a second parser.\nThis parser is then invoked on the input stream and the result is returned.\nAlternatively if `p` fails to parse then its error is returned immediately and `f` is never called.\n\nUsing lambda expressions `f` can be expressed quite succinctly as `x -\u003e { ... }`,\ni.e. 
the bind expression typically looks something like `bind(p, x -\u003e { ... })` (or `p.bind(x -\u003e { ... })` using the fluent form).\n\nNote, the `then` combinator is just a variant of `bind` where the result of the first parser is thrown away.\nI.e. `then(p, q)` is equivalent to `bind(p, x -\u003e q)`.\n\nIf we return to the `sum` example parser defined earlier:\n\n```java\nintr.bind(x -\u003e                  // parse an integer and bind the result to the variable x.\n    chr('+').then(              // parse a '+' sign, and throw away the result.\n        intr.bind(y -\u003e          // parse an integer and bind the result to the variable y.\n            retn(x+y))));       // return the sum of x and y.\n```\n\nthen the meaning should be clear.\nNote that `chr` is just a version of `satisfy` specialised for the Character type.\n\n### The `or` Combinator\n\n```java\n\u003cI, A\u003e Parser\u003cI, A\u003e or(Parser\u003cI, A\u003e p, Parser\u003cI, A\u003e q)\n```\n\nThe `or` combinator provides the means to express a choice between one parser and another. It corresponds to production rules of the form:\n\n```\nr ::= p | q\n```\n\nThe combinator will first invoke parser `p`.\nIf it succeeds then the result is returned, otherwise the result of invoking parser `q` is returned.\n\nAn example usage is `intr.or(retn(0))`, which means attempt to parse an integer, and if it fails then just return `0`.\n\n## Text\n\nIn addition to the parsers in `Combinators`,\nthe `Text` package provides the following parsers specialised for parsing text input:\n\nName | Parser Description | Returns\n-----|-------------|--------\n`alpha` | Succeeds if the next character is alphabetic. | The character\n`digit` | Succeeds if the next character is a digit. | The character\n`intr` | Parses an integer. | The integer\n`dble` | Parses a double. | The double\n`string(s)` | Parses the supplied string. | The string\n`alphaNum` | Parses an alphanumeric string. 
| The string\n`regex(regex)` | Parses a string matching the supplied regex. | The string matching the regex\n\n# Advanced Examples\n\n## Expression Language parser\n\nThe `test/org.javafp.parsecj.expr.Grammar` class provides a more detailed illustration of how this library can be used,\nby implementing a parser for simple mathematical expressions.\n\nThe grammar for this language is as follows:\n\n```\nexpr      ::= number | binOpExpr\nbinOpExpr ::= '(' expr binOp expr ')'\nbinOp     ::= '+' | '-' | '*' | '/'\n```\n\nValid expressions conforming to this language include:\n\n```\n1\n(1.2+3.4)\n((1.2*3.4)+5.6)\n```\n\nTypically parsers will construct values using a set of model classes corresponding to the language elements.\nFor the above example that would mean defining `Expr`, `NumberExpr`, and `BinOpExpr` classes.\nTo keep the example simple the parsers for this language will simply compute the evaluated result of each expression.\nI.e. numbers will be parsed into their values,\noperators will be parsed into binary functions,\nand binary operator expressions will be parsed into the evaluated result of the expression.\n\nThe above grammar then, can be translated into the following Java implementation:\n\n```java\n// Forward declare expr to allow for circular references.\nfinal org.javafp.parsecj.Parser.Ref\u003cCharacter, Double\u003e expr = Parser.ref();\n\n// Hint to the compiler for the type of retn.\nfinal Parser\u003cCharacter, BinaryOperator\u003cDouble\u003e\u003e add = retn((l, r) -\u003e l + r);\nfinal Parser\u003cCharacter, BinaryOperator\u003cDouble\u003e\u003e subt = retn((l, r) -\u003e l - r);\nfinal Parser\u003cCharacter, BinaryOperator\u003cDouble\u003e\u003e times = retn((l, r) -\u003e l * r);\nfinal Parser\u003cCharacter, BinaryOperator\u003cDouble\u003e\u003e divide = retn((l, r) -\u003e l / r);\n\n// bin-op ::= '+' | '-' | '*' | '/'\nfinal Parser\u003cCharacter, BinaryOperator\u003cDouble\u003e\u003e binOp =\n    choice(\n        
chr('+').then(add),\n        chr('-').then(subt),\n        chr('*').then(times),\n        chr('/').then(divide)\n    );\n\n// bin-expr ::= '(' expr bin-op expr ')'\nfinal Parser\u003cCharacter, Double\u003e binOpExpr =\n    chr('(')\n        .then(expr.bind(\n            l -\u003e binOp.bind(\n                op -\u003e expr.bind(\n                    r -\u003e chr(')')\n                        .then(retn(op.apply(l, r)))))));\n\n// expr ::= dble | binOpExpr\nexpr.set(choice(dble, binOpExpr));\n\n// Hint to the compiler for the type of eof.\nfinal Parser\u003cCharacter, Unit\u003e eof = eof();\n\n// parser = expr end\nfinal Parser\u003cCharacter, Double\u003e parser = expr.bind(d -\u003e eof.then(retn(d)));\n\nfinal String s = \"((1.2*3.4)+5.6)\";\nSystem.out.println(s + \" = \" + parser.parse(Input.of(s)).getResult());\n```\n\nThe correspondence between the production rules of the simple expression language and the above set of parsers should be apparent.\n\n**Notes**\n* The expression language is recursive - `expr` refers to `binOpExpr`, which in turn refers to `expr`. Since Java doesn't allow us to define a mutually recursive set of variables, we have to break the circularity by making the `expr` parser a `Parser.Ref`, which gets declared at the beginning and initialised at the end. `Ref` implements the `Parser` interface, hence it can be used as a parser.\n* The return type of each combinator function is `Parser\u003cS, A\u003e` and the compiler attempts to infer the types of `S` and `A` from the arguments. Certain combinators do not have parameters of both types - `retn` and `eof` for instance, which causes the type inference to fail resulting in a compilation error. If this happens the error can be avoided by either assigning the combinator to a variable or by explicitly specifying the generic types, e.g. 
`Combinators.\u003cCharacter, BinaryOperator\u003cDouble\u003e\u003eretn`.\n* We add the `eof` parser, which succeeds if it encounters the end of the input, to bookend the `expr` parser. This ensures the parser does not inadvertently parse malformed inputs that begin with a valid expression, such as `(1+2)Z`.\n\n## JSON Parser\n\nFor a more \"real world\" example the test sub-directory contains a full implementation of a JSON parser - see the `Grammar` for the parser.\nThe entire grammar is encapsulated in a single class, which, including imports and blank lines, is only 124 lines of code.\n\n# Notes on the Implementation\n\n## Translating Haskell into Java\n\nThis section describes how the Haskell code from the [Parsec paper](http://research.microsoft.com/en-us/um/people/daan/download/papers/parsec-paper.pdf)\nhas been translated into Java.\n\nNote, the Java code described below does not exactly match the implementation code of ParsecJ -\nit has been simplified for expository purposes.\n\n### \"Restricting lookahead\"\n\nSection 3 of the paper begins to describe the implementation of Parsec, starting with these three types:\n\n```Haskell\ntype Parser a = String -\u003e Consumed a\ndata Consumed a = Consumed (Reply a)\n                | Empty (Reply a)\ndata Reply a = Ok a String | Error\n```\n\nThe `Reply` type is a discriminated union between an `Ok` and an `Error`.\nWe can model this in Java with a `Reply` base class (or interface),\nwith two sub-classes:\n\n```java\npublic abstract class Reply\u003cA\u003e {\n    public static \u003cA\u003e Ok\u003cA\u003e ok(A result, String rest) {\n        return new Ok\u003cA\u003e(result, rest);\n    }\n\n    public static \u003cA\u003e Error\u003cA\u003e error() {\n        return new Error\u003cA\u003e();\n    }\n\n    public abstract \u003cB\u003e B match(Function\u003cOk\u003cA\u003e, B\u003e ok, Function\u003cError\u003cA\u003e, B\u003e error);\n\n    public static final class Ok\u003cA\u003e extends 
Reply\u003cA\u003e {\n\n        public final A result;\n\n        public final String rest;\n\n        Ok(A result, String rest) {\n            this.result = result;\n            this.rest = rest;\n        }\n\n        @Override\n        public \u003cU\u003e U match(Function\u003cOk\u003cA\u003e, U\u003e ok, Function\u003cError\u003cA\u003e, U\u003e error) {\n            return ok.apply(this);\n        }\n\n        // Usual toString, equals etc.\n    }\n\n    public static final class Error\u003cA\u003e extends Reply\u003cA\u003e {\n\n        Error() {}\n\n        @Override\n        public \u003cB\u003e B match(Function\u003cOk\u003cA\u003e, B\u003e ok, Function\u003cError\u003cA\u003e, B\u003e error) {\n            return error.apply(this);\n        }\n\n        // Usual toString, equals etc.\n    }\n}\n```\n\nThe `match` method provides a poor-man's equivalent to Haskell's pattern-matching.\nIt could be used, for example, to extract the result from a `Reply`:\n\n```java\n\u003cA\u003e A getResult(Reply\u003cA\u003e reply) {\n    return reply.match(\n        ok -\u003e ok.result,\n        error -\u003e {throw new RuntimeException(\"Error\");}\n    );\n}\n```\n\nThe `Consumed` type could in theory be handled in a similar fashion,\nhowever there are two subtleties to take into account:\n\n1. The Haskell code uses the name `Consumed` for both the type and the type constructor -\nin Java we chose to call the former `ConsumedT` to distinguish it from the latter.\n1. We learn further on in the document that Parsec relies on the `Consumed` type constructor being lazy (as is standard in Haskell). 
In order to simulate this in Java we need to make the `Consumed` class lazily constructed, using a `Supplier` instance:\n\n```java\npublic abstract static class ConsumedT\u003cA\u003e {\n    public static \u003cA\u003e ConsumedT\u003cA\u003e consumed(Supplier\u003cReply\u003cA\u003e\u003e supplier) {\n        return new Consumed\u003cA\u003e(supplier);\n    }\n\n    public static \u003cA\u003e ConsumedT\u003cA\u003e empty(Reply\u003cA\u003e reply) {\n        return new Empty\u003cA\u003e(reply);\n    }\n\n    public abstract \u003cB\u003e B match(Function\u003cConsumed\u003cA\u003e, B\u003e consumed, Function\u003cEmpty\u003cA\u003e, B\u003e empty);\n\n    public abstract boolean isConsumed();\n\n    public abstract Reply\u003cA\u003e getReply();\n\n    public static class Consumed\u003cA\u003e extends ConsumedT\u003cA\u003e {\n\n        // Lazy Reply supplier.\n        private Supplier\u003cReply\u003cA\u003e\u003e supplier;\n\n        // Lazy-initialised Reply.\n        private Reply\u003cA\u003e reply;\n\n        Consumed(Supplier\u003cReply\u003cA\u003e\u003e supplier) {\n            this.supplier = supplier;\n        }\n\n        public boolean isConsumed() {\n            return true;\n        }\n\n        @Override\n        public Reply\u003cA\u003e getReply() {\n            if (supplier != null) {\n                reply = supplier.get();\n                supplier = null;\n            }\n\n            return reply;\n        }\n\n        @Override\n        public \u003cB\u003e B match(Function\u003cConsumed\u003cA\u003e, B\u003e consumed, Function\u003cEmpty\u003cA\u003e, B\u003e empty) {\n            return consumed.apply(this);\n        }\n    }\n\n    public static class Empty\u003cA\u003e extends ConsumedT\u003cA\u003e {\n        public final Reply\u003cA\u003e reply;\n\n        public Empty(Reply\u003cA\u003e reply) {\n            this.reply = reply;\n        }\n\n        public boolean isConsumed() {\n            return false;\n        }\n\n        
@Override\n        public Reply\u003cA\u003e getReply() {\n            return reply;\n        }\n\n        @Override\n        public \u003cB\u003e B match(Function\u003cConsumed\u003cA\u003e, B\u003e consumed, Function\u003cEmpty\u003cA\u003e, B\u003e empty) {\n            return empty.apply(this);\n        }\n    }\n}\n```\n\nWe can then construct `ConsumedT` instances using a lambda function with an empty argument list:\n\n```java\nConsumedT\u003cA\u003e cons = consumed(() -\u003e ok(...));\n```\n\nThe last of the three Haskell types is `Parser a`,\nwhich is a type synonym for a function from `String` to `Consumed a`.\nWe can model this as a functional interface in Java (Java 8, that is):\n\n```java\n@FunctionalInterface\npublic interface Parser\u003cA\u003e {\n    ConsumedT\u003cA\u003e parse(String input);\n}\n```\n\nSince `Parser` is a functional interface, we can construct `Parser` instances using the Java 8 lambda syntax:\n\n```java\nParser\u003cInteger\u003e p = s -\u003e { ... 
};\n```\n\n### \"Basic combinators\"\n\nSection 3.1 of the paper outlines the implementation of the core combinators.\n\n#### The `return` Combinator\n\nThe `return` combinator:\n\n```haskell\nreturn x\n= \\input -\u003e Empty (Ok x input)\n```\n\nhas to be renamed in Java as `return` is a reserved word, however the definition otherwise maps fairly easily:\n\n```java\npublic static \u003cA\u003e Parser\u003cA\u003e retn(A x) {\n    return input -\u003e empty(ok(x, input));\n}\n```\n\n#### The `satisfy` Combinator\n\nThe `satisfy` combinator applies a predicate `test` to the next symbol on the input:\n\n```haskell\nsatisfy :: (Char -\u003e Bool) -\u003e Parser Char\nsatisfy test\n  = \\input -\u003e case (input) of\n      [] -\u003e Empty Error\n      (c:cs) | test c -\u003e Consumed (Ok c cs)\n             | otherwise -\u003e Empty Error\n```\n\nHere the combinator is returning a function that is a `Parser`.\nUsing Java 8 lambda functions we can define `satisfy` in a similar fashion:\n\n```java\npublic static Parser\u003cCharacter\u003e satisfy(Predicate\u003cCharacter\u003e test) {\n    return input -\u003e {\n        if (!input.isEmpty()) {\n            final char c = input.charAt(0);\n            if (test.test(c)) {\n                return consumed(() -\u003e ok(c, input.substring(1)));\n            } else {\n                return empty(error());\n            }\n        } else {\n            return empty(error());\n        }\n    };\n}\n```\n\n#### The `bind` Combinator\n\nThe bind combinator in Haskell is implemented as the `\u003e\u003e=` operator:\n\n```haskell\n(\u003e\u003e=) :: Parser a -\u003e (a -\u003e Parser b) -\u003e Parser b\np \u003e\u003e= f\n  = \\input -\u003e case (p input) of\n      Empty reply1\n        -\u003e case (reply1) of\n             Ok x rest -\u003e ((f x) rest)\n             Error -\u003e Empty Error\n      Consumed reply1\n        -\u003e Consumed\n           (case (reply1) of\n              Ok x rest\n                    -\u003e 
case ((f x) rest) of\n                         Consumed reply2 -\u003e reply2\n                         Empty reply2 -\u003e reply2\n              error -\u003e error\n           )\n```\n\nJava doesn't support custom operators, so we will implement this as a `bind` function:\n\n```java\npublic static \u003cA, B\u003e Parser\u003cB\u003e bind(\n        Parser\u003c? extends A\u003e p,\n        Function\u003cA, Parser\u003cB\u003e\u003e f) {\n    return input -\u003e\n        p.parse(input).\u003cConsumedT\u003cB\u003e\u003ematch(\n            cons -\u003e consumed(() -\u003e\n                cons.getReply().\u003cReply\u003cB\u003e\u003ematch(\n                    ok -\u003e f.apply(ok.result).parse(ok.rest).getReply(),\n                    error -\u003e error()\n                )\n            ),\n            empty -\u003e empty.getReply().\u003cConsumedT\u003cB\u003e\u003ematch(\n                ok -\u003e f.apply(ok.result).parse(ok.rest),\n                error -\u003e empty(error())\n            )\n        );\n}\n```\n\n## Parser Monad\n\nThe `retn` and `bind` combinators are slightly special as they are what make `Parser` a monad.\nThe key point is that they observe the three [monad laws](https://www.haskell.org/haskellwiki/Monad_laws):\n\n1. **Left Identity** : `retn(a).bind(f)` = `f.apply(a)`\n1. **Right Identity** : `p.bind(x -\u003e retn(x))` = `p`\n1. **Associativity** : `p.bind(f).bind(g)` = `p.bind(x -\u003e f.apply(x).bind(g))`\n\nwhere `p` is a parser, `a` is a parse result, and `f` and `g` are functions from a parse result to a parser.\n\nOr, using the standalone `bind` function instead of the fluent `Parser.bind` method:\n\n1. **Left Identity** : `bind(retn(a), f)` = `f.apply(a)`\n1. **Right Identity** : `bind(p, x -\u003e retn(x))` = `p`\n1. 
**Associativity** : `bind(bind(p, f), g)` = `bind(p, x -\u003e bind(f.apply(x), g))`\n\nThe first two laws tell us that `retn` acts as an identity of the `bind` operation.\nThe third law tells us that when we have three parser expressions being combined with `bind`,\nthe way in which the expressions are grouped has no effect on the result.\nThis becomes relevant when using fluent chaining,\nas it means we do not need to worry too much about bracketing when chaining parsers.\nThe intent becomes (slightly) clearer if we add some redundant brackets to the equality:\n\n`(p.bind(f)).bind(g)` = `p.bind(x -\u003e (f.apply(x).bind(g)))`\n\nIt's analogous to associativity of addition over numbers,\nwhere *a+b+c* yields the same result regardless of whether we evaluate it as *(a+b)+c* or *a+(b+c)*.\n\nAlso of note is the `fail` parser, which is a monadic zero,\nsince combining it with any other parser always yields a parser that fails.\n\n### Proving the Laws\n\nGiven the above definitions of `retn` and `bind` we can attempt to prove the monad laws.\nNote that since the `retn` and `bind` combinators have been defined as pure functions,\nthey are referentially transparent,\nmeaning we can substitute the function body in place of calls to the function when reasoning about the combinators.\n\n#### Left Identity\n\nThis law requires that `retn(a).bind(f)` = `f.apply(a)`.\nWe prove this by reducing the LHS to the same form as the RHS through a series of steps.\n\nTaking the LHS as the starting point:\n\n```java\nretn(a).bind(f)\n```\n\nwe can reduce this by substituting the definition of the `retn` function in place of the call to the function:\n\n\u0026#8594; (from the definition of `retn`)\n```java\n(input -\u003e empty(ok(a, input))).bind(f)\n```\n\nLikewise, we now substitute the definition of `bind`, and so on:\n\n\u0026#8594; (from the definition of `bind`)\n```java\ninput -\u003e empty(ok(a, input)).match(\n    cons -\u003e consumed(() -\u003e\n        
cons.getReply().match(\n            ok -\u003e f.apply(ok.result).parse(ok.rest).getReply(),\n            error -\u003e error()\n        )\n    ),\n    empty -\u003e empty.getReply().match(\n        ok -\u003e f.apply(ok.result).parse(ok.rest),\n        error -\u003e empty(error())\n    )\n)\n```\n\n\u0026#8594; (from definition of `ConsumedT.match`)\n```java\ninput -\u003e empty(ok(a, input)).getReply().match(\n    ok -\u003e f.apply(ok.result).parse(ok.rest),\n    error -\u003e empty(error())\n)\n```\n\n\u0026#8594; (from definition of `Empty.getReply`)\n```java\ninput -\u003e ok(a, input).match(\n    ok -\u003e f.apply(ok.result).parse(ok.rest),\n    error -\u003e empty(error())\n)\n```\n\n\u0026#8594; (from definition of `Reply.match`)\n```java\ninput -\u003e f.apply(ok(a, input).result).parse(ok(a, input).rest)\n```\n\n\u0026#8594; (from definition of `Ok.result`)\n```java\ninput -\u003e f.apply(a).parse(ok(a, input).rest)\n```\n\n\u0026#8594; (from definition of `Ok.rest`)\n```java\ninput -\u003e f.apply(a).parse(input);\n```\n\n\u0026#8594; (function introduction and application cancel out)\n```java\nf.apply(a);\n```\n\u0026#8718;\n\nI.e. 
we have shown that the LHS of the first law can be reduced to the RHS; in other words, we have proved that the law holds.\n\n#### Right Identity\n\nThis law requires that `p.bind(x -\u003e retn(x))` = `p`.\n\nAgain taking the LHS:\n\n```java\np.bind(x -\u003e retn(x))\n```\n\nwe can reduce this as follows:\n\n\u0026#8594; (from the definition of `retn`)\n```java\np.bind(x -\u003e input -\u003e empty(ok(x, input)))\n```\n\n\u0026#8594; (from the definition of `bind`)\n```java\ninput -\u003e\n    p.parse(input).match(\n        cons -\u003e consumed(() -\u003e\n            cons.getReply().match(\n                ok -\u003e (x -\u003e input2 -\u003e empty(ok(x, input2))).apply(ok.result).parse(ok.rest).getReply(),\n                error -\u003e error()\n            )\n        ),\n        empty -\u003e empty.getReply().match(\n            ok -\u003e (x -\u003e input2 -\u003e empty(ok(x, input2))).apply(ok.result).parse(ok.rest),\n            error -\u003e empty(error())\n        )\n    )\n```\n\n\u0026#8594; (function application)\n```java\ninput -\u003e\n    p.parse(input).match(\n        cons -\u003e consumed(() -\u003e\n            cons.getReply().match(\n                ok -\u003e (input2 -\u003e empty(ok(ok.result, input2))).parse(ok.rest).getReply(),\n                error -\u003e error()\n            )\n        ),\n        empty -\u003e empty.getReply().match(\n            ok -\u003e (input2 -\u003e empty(ok(ok.result, input2))).parse(ok.rest),\n            error -\u003e empty(error())\n        )\n    )\n```\n\n\u0026#8594; (function application)\n```java\ninput -\u003e\n    p.parse(input).match(\n        cons -\u003e consumed(() -\u003e\n            cons.getReply().match(\n                ok -\u003e empty(ok(ok.result, ok.rest)).getReply(),\n                error -\u003e error()\n            )\n        ),\n        empty -\u003e empty.getReply().match(\n            ok -\u003e empty(ok(ok.result, ok.rest)),\n            error -\u003e empty(error())\n        )\n    
)\n```\n\n\u0026#8594; (from definition of Ok)\n```java\ninput -\u003e\n    p.parse(input).match(\n        cons -\u003e consumed(() -\u003e\n            cons.getReply().match(\n                ok -\u003e empty(ok).getReply(),\n                error -\u003e error()\n            )\n        ),\n        empty -\u003e empty.getReply().match(\n            ok -\u003e empty(ok),\n            error -\u003e empty(error())\n        )\n    )\n```\n\n\u0026#8594; (each inner `match` returns a value equal to the one it was applied to)\n```java\ninput -\u003e\n    p.parse(input).match(\n        cons -\u003e consumed(() -\u003e cons.getReply()),\n        empty -\u003e empty\n    )\n```\n\n\u0026#8594; (a `Consumed` that lazily wraps the reply of `cons` is equivalent to `cons`)\n```java\ninput -\u003e\n    p.parse(input).match(\n        cons -\u003e cons,\n        empty -\u003e empty\n    )\n```\n\n\u0026#8594; (both branches of the `match` return their argument unchanged)\n```java\ninput -\u003e p.parse(input)\n```\n\n\u0026#8594; (function introduction and application cancel out)\n```java\np\n```\n\n\u0026#8718;\n\nAgain, we have reduced the LHS of the law to the same form as the RHS, proving the law holds.\n\n#### Associativity\n\nProving the associativity law is a little more involved than the other two laws, and is beyond the scope of this document.\nOne approach would be to first note that the expression `p.parse(s)`,\nthat is, the parser `p` applied to an input `s`,\nmust yield one of the following four outputs:\n\n* `consumed(ok(a, r))`\n* `consumed(error())`\n* `empty(ok(a, r))`\n* `empty(error())`\n\nand then to prove that the law holds for each of these cases.\n\n# Related Work\n\nAs mentioned at the outset, ParsecJ is based on the [Parsec paper](http://research.microsoft.com/en-us/um/people/daan/download/papers/parsec-paper.pdf).\nThe current incarnation of the [Haskell Parsec](https://hackage.haskell.org/package/parsec) library has evolved considerably since the paper,\nbut it still essentially follows the same monadic combinator approach.\n\n[JParsec](https://github.com/jparsec/jparsec) is an existing Java port of Parsec.\nWhile it follows a similar combinator 
approach,\nthe implementation of the parsers themselves uses a much more object-oriented style, as opposed to the more functional style of ParsecJ.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjon-hanson%2Fparsecj","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjon-hanson%2Fparsecj","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjon-hanson%2Fparsecj/lists"}