{"id":22007258,"url":"https://github.com/s3b4s/monpar","last_synced_at":"2025-08-09T14:09:34.262Z","repository":{"id":57302591,"uuid":"403130902","full_name":"S3B4S/monpar","owner":"S3B4S","description":"A monadic parser implemented in TS that an be used to create various kinds of parsers, such as HTML, JSON or CSV parsers.","archived":false,"fork":false,"pushed_at":"2022-09-28T20:51:24.000Z","size":215,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-14T03:18:16.322Z","etag":null,"topics":["functional-programming","parser","parser-combinators","parsing"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/S3B4S.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-04T18:37:58.000Z","updated_at":"2024-09-09T14:24:21.000Z","dependencies_parsed_at":"2022-09-20T19:50:51.614Z","dependency_job_id":null,"html_url":"https://github.com/S3B4S/monpar","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S3B4S%2Fmonpar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S3B4S%2Fmonpar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S3B4S%2Fmonpar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/S3B4S%2Fmonpar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/S3B4S","download_url":"https://codeload.github.com/S3B4S/monpar/tar.gz/refs/heads/main","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":227247926,"owners_count":17753566,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["functional-programming","parser","parser-combinators","parsing"],"created_at":"2024-11-30T01:19:20.344Z","updated_at":"2024-11-30T01:19:21.052Z","avatar_url":"https://github.com/S3B4S.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MonPar\nThis is an unambiguous (meaning it expects one correct output for each input) parser that makes use of combinators to combine small pieces of parsers to create bigger ones. 
Try it out straight away on [runkit](https://npm.runkit.com/monpar).\n\n# Guide\n- [Installing](#installing)\n- [Parser type](#parser-type)\n- [Combining](#combining)\n- [Helpers](#helpers)\n  * [take](#take)\n  * [peek](#peek)\n  * [char](#char)\n  * [sat](#sat)\n  * [alt](#alt)\n  * [alts](#alts)\n  * [guards](#guards)\n  * [token](#token)\n  * [sentence](#sentence)\n  * [tap](#tap)\n  * [logId](#logid)\n  * [unpack](#unpack)\n- [Why alt(s) can take \"thunks\"](#why-alts-can-take-thunks)\n- [Credits](#credits)\n\n\u003csmall\u003e\u003ci\u003e\u003ca href='http://ecotrust-canada.github.io/markdown-toc/'\u003eTable of contents generated with markdown-toc\u003c/a\u003e\u003c/i\u003e\u003c/small\u003e\n\n## Installing\n```\nnpm i \"monpar\"\n```\n\n## Parser type\nLet's think about what we want a parser to achieve for a second, let's take a simple string.\n\n```ts\n\"Hello world\"\n```\n\nOur parser needs to recognize the structure and extract the desired content `Hello world`. However, the parser can also fail, imagine we were given the following instead while we were only expecting alphabetical characters.\n\n```ts\n\"{ value: 5 }\"\n```\n\nThus, we can consider our parser essentially a function that takes in an input, processes it, and then returns some output indicating that the parsing went right and returns the desired content, or that the parsing has failed.\n\nIn other words, take string as input, return result of parsing:\n\n```ts\ntype Parser\u003cT\u003e = (inp: string) =\u003e ParserRes\u003cT\u003e\n```\n\nWhat does the result of a parser look like? 
There are many options we could choose, this library chooses to encapsulate the result in a list, thus, the output is a list of the parsed result, or an empty list if it has failed.\n\n```ts\ntype ParserRes\u003cT\u003e = T[]\n```\n\nHere comes something confusing, we don't just return a list, we return a list of results, and each result itself is a tuple that contains the output of the parser in the first position, and the remainder of the string in the second position.\n\n```ts\ntype ParserRes\u003cT\u003e = [T, string][]\n\nparseHTML(\"\u003cp\u003eThis is inner text\u003c/p\u003e\") // -\u003e [[\"This is inner text\", \"\"]]\n// This went well, and there is nothing left to parse!\n\nparseHTML(\"{ value: 5 }\") // -\u003e []\n// Ouch, an empty list, something went wrong.\n```\n\nSo, this implies that a parser can just partially parse a string, and return its result and then pass the remainder of the string along. Imagine we want to just parse the opening tag and extract the name of the element:\n\n```ts\nparseOpeningTag(\"\u003cp\u003eThis is inner text\u003c/p\u003e\") // -\u003e [[\"p\", \"This is inner text\u003c/p\u003e\"]]\n```\n\nSo, our parsers are essentially functions that take an input and return the result of parsing and might fail or succeed.\n\n## Combining\nAs mentioned in the introduction, the core idea behind this parser is to combine small blocks of parsers to form a bigger parser. What does this look like? Let's start with a small parser:\n```ts\ntake(\"\u003cp\u003eThis is inner text\u003c/p\u003e\") // -\u003e [[\"\u003c\", \"p\u003eThis is inner text\u003c/p\u003e\"]]\n```\nThe `take` parser is a parser exported from the library and it does something very simple, it just takes the first character of the string and passes the remainder of the string along.\n\nBut what if we would like to take not one, but *two* characters of the string? 
We could do this:\n```ts\nconst parserHasFailed = \u003cT\u003e(result: ParserRes\u003cT\u003e): boolean =\u003e result.length === 0\n\nconst takeTwo = inp =\u003e {\n  // Parse the first character of the given input\n  const res = take(inp)\n  // Check if parser didn't return empty list\n  if (parserHasFailed(res)) return [];\n  const [[v, rem]] = res;\n\n  // Parse for second element\n  const res2 = take(rem);\n  if (parserHasFailed(res2)) return [];\n  const [[v2, rem2]] = res2;\n  return [[v + v2, rem2]]\n}\n```\nSo, we're basically saying to run `take` twice: if it fails at any point, return `[]`, and if all goes well, return the 2 concatenated characters and the remainder of the string. This doesn't look nice now, does it? Fortunately, the library provides a function that encapsulates this behavior.\n```ts\nconst takeTwo = bind(take, x =\u003e bind(take, y =\u003e inp =\u003e [[x + y, inp]]))\n```\nLet's break it down. The function `bind` takes a parser and a function that receives the result of that parser and returns the next parser to run.\n\nSo, if we have\n```ts\nconst log = bind(take, x =\u003e {\n  console.log(x);\n  return inp =\u003e [[x, inp]];\n})\nlog(\"\u003cp\u003e\")\n// logs: \u003c\n// returns: [[\"\u003c\", \"p\u003e\"]]\n```\n\nOkay, so the function in the second position has access to the parsed output of `take`. Let's consider the type of `bind`:\n```ts\nconst bind = \u003cA, B\u003e(parser: Parser\u003cA\u003e, fn: (a: A) =\u003e Parser\u003cB\u003e): Parser\u003cB\u003e =\u003e { /* ... */ }\n```\n\nSo, considering that `bind` also returns a parser, we can keep chaining it. 
This allows us to gain access to the outputs of multiple parsers at once:\n```ts\nconst takeTwo = bind(take, x =\u003e // x is the output of the first take\n                bind(take, y =\u003e // y is the output of the second take\n                  inp =\u003e [[x + y, inp]] // return parser that concatenates x \u0026 y and returns the remainder of the input\n                ))\n\ntakeTwo(\"\u003cp\u003e\") // -\u003e [[\"\u003cp\", \"\u003e\"]]\n```\n\nNow you might be thinking: what if the second `take` fails? In the first `takeTwo` we implemented, we had to check the results ourselves.\n```ts\ntakeTwo(\"\u003c\")\n```\nThe wonderful thing about `bind` is that checking whether a parser has returned a valid result is built-in. If at any point a parser fails, the result will be an empty list `[]`.\n```ts\ntakeTwo(\"\u003c\") // []\n```\n\nHowever, we're still not there yet. If we really want to combine our parsers we need more. Imagine we want to take 4 elements:\n```ts\nconst takeFour =  bind(take, w =\u003e\n                  bind(take, x =\u003e\n                  bind(take, y =\u003e\n                  bind(take, z =\u003e\n                    inp =\u003e [[w + x + y + z, inp]] // pass the remainder of the input along\n                  ))))\n```\nThat doesn't look nice either now, does it?\n\nFor this, we have another function: `liftAs`\n```ts\nconst liftAs = \u003cT\u003e(fn: any, ...fns: Parser\u003cany\u003e[]): Parser\u003cT\u003e =\u003e { /* ... */ }\n\nconst takeFour = liftAs(\n  w =\u003e x =\u003e y =\u003e z =\u003e w + x + y + z,\n  take,\n  take,\n  take,\n  take,\n)\n```\n\nConsider `liftAs` syntactic sugar that helps you avoid nesting all those `bind`s. 
The first function supplied is a curried function that takes a number of parameters equal to the parsers that come afterward and the order is maintained, meaning:\n```ts\nconst takeFour = liftAs(\n  (w: string) =\u003e (x: string) =\u003e (y: string) =\u003e (z: string) =\u003e w + x + y + z,\n  take, // this supplies w\n  take, // this supplies x\n  take, // this supplies y\n  take, // this supplies z\n)\n```\n\nAnother thing to note: the first function does not return a parser anymore! It just returns the value in the way you would like to combine it, so we don't need to worry about returning a parser as well.\n\nAnd of course, `liftAs` returns a parser, so we can use the output of that to keep combining parsers.\n\n```ts\nconst takeTwo = liftAs(\n  (x: string) =\u003e (y: string) =\u003e x + y,\n  take,\n  take,\n)\n\nconst takeFour = liftAs(\n  (x: string) =\u003e (y: string) =\u003e x + y,\n  takeTwo,\n  takeTwo,\n)\n\ntakeFour(\"\u003cp\u003eThis is inner text\u003c/p\u003e\")\n// -\u003e [[\"\u003cp\u003eT\", \"his is inner text\u003c/p\u003e\"]]\n```\n\nI hope that at this point the reader at least knows how to use `liftAs`, this will be your biggest friend when using this parser library.\n\n## Helpers\nThis library provides many utility parsers to get started with, this section will detail how to use these utility parsers.\n\n### take\nTakes 1 character out of the input\n```ts\ntype take = Parser\u003cstring\u003e\n\ntake(\"\u003cp\u003e\") // [[\"\u003c\", \"p\u003e\"]]\ntake(\"\") // []\n```\n\n### peek\nShows the first character but does not affect the input\n```ts\ntype peek = Parser\u003cstring\u003e\n\npeek(\"\u003cp\u003e\") // [[\"\u003c\", \"\u003cp\u003e\"]]\npeek(\"\") // [[\"\", \"\"]]\n```\n\n### char\nChecks if given character appears in input, if it matches, extract it, else fail.\n```ts\ntype char = (c: string) =\u003e Parser\u003cstring\u003e\n\nconst star = char(\"*\")\nstar(\"*\") // [[\"*\", \"\"]]\nstar(\"-\") // []\n```\n\n### 
sat\nGiven a predicate that takes the first character as input check whether that holds, if it does, return the character, else fail.\n```ts\ntype sat = (pred: (s: string) =\u003e boolean) =\u003e Parser\u003cstring\u003e\n\nconst numeric = sat(c =\u003e /[0-9]/.test(c))\nnumeric(\"007\") // [[\"0\", \"07\"]]\nnumeric(\"Bond\") // []\n```\n\n### alt\nGiven two parsers, going from left to right, return any successful result encountered, else fail.\nNote that both parsers should return the same result.\nThe `LazyVal\u003cParser\u003cT\u003e\u003e` type might surprise you a bit, despite a `Parser\u003cT\u003e` type being passed in the example, the need for `LazyVal\u003cParser\u003cT\u003e\u003e` is explained in [this section](#why-alts-can-take-thunks).\n```ts\ntype alt = \u003cT\u003e(parserA: LazyVal\u003cParser\u003cT\u003e\u003e, parserB: LazyVal\u003cParser\u003cT\u003e\u003e) =\u003e Parser\u003cT\u003e\n\nconst numeric = sat(c =\u003e /[0-9]/.test(c))\nconst alpha = sat(c =\u003e /[a-zA-Z]/.test(c));\n\nconst alphaNumeric = alt(numeric, alpha)\n\nalphaNumeric(\"0123\") // [[\"0\", \"123\"]]\nalphaNumeric(\"abc\") // [[\"a\", \"bc\"]]\nalphaNumeric(\"****\") // []\n```\n\n### alts\nTakes in a list of functions that return parsers, goes through the entire list until it finds a parser that successfully returns a result, else fails.\nThis is a variation of `alt` where you can pass a list of parsers.\nThe `LazyVal\u003cParser\u003cT\u003e\u003e` type might surprise you a bit, despite a `Parser\u003cT\u003e` type being passed in the example, the need for `LazyVal\u003cParser\u003cT\u003e\u003e` is explained in [this section](#why-alts-can-take-thunks).\n```ts\ntype alts = \u003cT\u003e(...parsers: LazyVal\u003cParser\u003cT\u003e\u003e[]) =\u003e Parser\u003cT\u003e\n\nconst numeric = sat(c =\u003e /[0-9]/.test(c))\nconst alpha = sat(c =\u003e /[a-zA-Z]/.test(c));\nconst star = char(\"*\")\n\nconst alphaNumericOrStar = alts(\n  numeric,\n  alpha,\n  
star,\n)\n\nalphaNumericOrStar(\"0123\") // [[\"0\", \"123\"]]\nalphaNumericOrStar(\"abc\") // [[\"a\", \"bc\"]]\nalphaNumericOrStar(\"*****\") // [[\"*\", \"****\"]]\nalphaNumericOrStar(\"----\") // []\n```\n\n### guards\nThese are exported parsers that will take the first character when it fulfills the predicate.\nThough, these are pretty simple, I encourage the reader to create their own such guards suited for their use case.\nThe name guard here is made up arbitrarily and doesn't carry a heavy meaning.\n```ts\ntype sat = (pred: (s: string) =\u003e boolean) =\u003e Parser\u003cstring\u003e\n// -\u003e\ntype guard = Parser\u003cstring\u003e\n\n// Each of these are referred to as a guard\nconst alpha = sat(c =\u003e /[a-zA-Z]/.test(c));\nconst numeric = sat(c =\u003e /[0-9]/.test(c));\nconst alphaNumeric = alt(alpha, () =\u003e numeric);\nconst space = sat(eq(\" \"));\nconst whitespace = sat(c =\u003e /[\\n\\t ]/.test(c));\n```\n\n### token\nStrips away all whitespace around given parser.\n```ts\ntype token = \u003cT\u003e(parser: Parser\u003cT\u003e) =\u003e Parser\u003cT\u003e\n\nconst pTag = token(sentence(\"\u003cp\u003e\"))\n\npTag(\"    \u003cp\u003e    \") // [[\"\u003cp\u003e\", \"\"]]\npTag(\"    \u003cp\u003e    Inner text     \u003c/p\u003e  \") // [[\"\u003cp\u003e\", \"Inner text     \u003c/p\u003e  \"]]\npTag(\"    \u003ch1\u003e    \") // []\n```\n\n### sentence\nChecks if given string appears in input, if it matches, extract it, else fail.\n```ts\ntype sentence = (str: string) =\u003e Parser\u003cstring\u003e\n\nconst pTag = sentence(\"\u003cp\u003e\")\n\npTag(\"\u003cp\u003eInner text\u003c/p\u003e\") // [[\"\u003cp\u003e\", \"Inner text\u003c/p\u003e\"]]\npTag(\"\u003ch1\u003eHeader\u003c/h1\u003e\") // []\n```\n\n### tap\nThis is a parser that helps with debugging, the supplied function will be applied to the input and then the input gets passed along.\n```ts\ntype tap = (tapFn: (s: string) =\u003e void) =\u003e 
Parser\u003cundefined\u003e\n\ntap(inp =\u003e { /* have access to inp to inspect it, log for example */ })\nconst log = tap(inp =\u003e { console.log(inp) })\nlog(\"\u003cp\u003e\")\n// logs: \u003cp\u003e\n// returns: [[undefined, \"\u003cp\u003e\"]]\n```\n\n### logId\nThis is a parser that helps with debugging. If you wrap a parser in this, the input will be logged and then the wrapped parser runs as usual.\n```ts\ntype logId = \u003cT\u003e(parser: Parser\u003cT\u003e) =\u003e Parser\u003cT\u003e\nconst pTag = sentence(\"\u003cp\u003e\")\nlogId(pTag)(\"\u003cp\u003e\")\n// logs: \u003cp\u003e\n// returns: [[\"\u003cp\u003e\", \"\"]]\n```\n\n### unpack\nUse this when you would like to unpack the parsed result out of `Parser\u003cT\u003e`.\nUnpacking will happen successfully if\n  - The parser returns a successful result\n  - In the result, the remainder of the input is empty (meaning, unpack expects the entire string to have gone through the parser)\n\nElse it will return `undefined`.\n```ts\ntype unpack = \u003cT\u003e(parser: Parser\u003cT\u003e) =\u003e (inp: string) =\u003e T | undefined\n\nconst pTag = sentence(\"\u003cp\u003e\")\nunpack(pTag)(\"\u003cp\u003e\") // \"\u003cp\u003e\"\nunpack(pTag)(\"\u003cp\u003eInner text\u003c/p\u003e\") // undefined\nunpack(pTag)(\"\u003ch1\u003e\") // undefined\n```\n\n## Why alt(s) can take \"thunks\"\nOne thing that we'd like to do with a parser is to be able to call it recursively. 
Imagine we have the following input.\n```ts\nconst input = `\n  \u003chtml\u003e\n    \u003cbody\u003e\n      Main text!\n    \u003c/body\u003e\n  \u003c/html\u003e\n`\n```\n\nAnd we have a parser that can parse away an opening tag, a closing tag and checks whether the part in the middle is text, else assume that it's another node and recursively calls itself.\n\n```ts\nimport { alt, liftAs } from \"monpar\"\n\nconst parseHTMLNode = liftAs(\n  (tag: string) =\u003e (child: Node | string) =\u003e () =\u003e ({ node: tag, child }),\n  parseOpeningTag,\n  alt(parseInnerText, parseHTMLNode),\n  parseClosingTag,\n)\n```\n\nThis will give us an error because the variable can't refer to itself from within, so, we have to convert this to a function and recursively call itself to get the parser:\n\n```ts\nimport { alt, liftAs } from \"monpar\"\n\nconst parseHTMLNode = () =\u003e liftAs(\n  (tag: string) =\u003e (child: Node | string) =\u003e () =\u003e ({ node: tag, child }),\n  parseOpeningTag,\n  alt(parseInnerText, parseHTMLNode()),\n  parseClosingTag,\n)\n```\n\nSo, now we can correctly call `parseHTMLNode` recursively, but, another issue arises now. Because JavaScript will evaluate the argument before passing it down, this will cause an infinite loop. But that shouldn't be necessary right? Because if `parseInnerText` would succeed in `alt`, we don't want to even evaluate the second parser. 
Thus, the solution here is to pass a \"thunk\", meaning, wrap it in a function and only evaluate it when you do need it:\n\n```ts\nimport { alt, liftAs, thunk } from \"monpar\"\n\nconst parseHTMLNode = () =\u003e liftAs(\n  (tag: string) =\u003e (child: Node | string) =\u003e () =\u003e ({ node: tag, child }),\n  parseOpeningTag,\n  alt(parseInnerText, thunk(parseHTMLNode)),\n  parseClosingTag,\n)\n```\nNow `alt` takes a thunk for the second parameter and only evaluates it if the first parser fails, thus we don't have the issue of infinite recursion.\n\nThe type of the thunk simply looks like\n```ts\ntype LazyVal\u003cT\u003e = (() =\u003e T) | T\n```\n\nSo really, all it means is that the given argument might be wrapped in a function so we can delay the evaluation (call it when we need it, in other words, it's lazy).\n\nThe `thunk` function is the following:\n```ts\nexport const thunk = \u003cT\u003e(x: T): LazyVal\u003cT\u003e =\u003e () =\u003e x\n```\nIt just wraps the given argument in a function, thus, the following lines are equivalent:\n```ts\nalt(parseInnerText, thunk(parseHTMLNode)),\nalt(parseInnerText, () =\u003e parseHTMLNode()),\nalt(parseInnerText, parseHTMLNode),\n```\n\nOne more thing: both positions in `alt` can take a `LazyVal\u003cParser\u003cT\u003e\u003e`, including the first one, where we don't need it. The first parser will always be called; it's the parser (or parsers in case of `alts`) that comes afterwards that we _might_ call.\n\nPurely for ergonomic reasons, all of the arguments passed are of type `LazyVal\u003cParser\u003cT\u003e\u003e` so that you can choose to write the following:\n```ts\nalts(\n  thunk(emptyTag),\n  thunk(HTMLelement),\n  thunk(Innertext),\n)\n```\n\ninstead of being forced to do\n```ts\nalts(\n  emptyTag,\n  thunk(HTMLelement),\n  thunk(Innertext),\n)\n```\n\nSo, it's up to the reader what they would prefer. The important thing is knowing that the first parser always gets called, while the ones that come afterwards _might_ get 
called.\n\n## Credits\nShout out to [@emiflake](https://github.com/emiflake) for helping out with the creation of this library.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs3b4s%2Fmonpar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fs3b4s%2Fmonpar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fs3b4s%2Fmonpar/lists"}