{"id":13725837,"url":"https://github.com/masala/masala-parser","last_synced_at":"2026-01-12T06:44:33.519Z","repository":{"id":44165348,"uuid":"59269174","full_name":"masala/masala-parser","owner":"masala","description":"Javascript Generalized Parser Combinators","archived":false,"fork":false,"pushed_at":"2025-01-14T20:56:03.000Z","size":2206,"stargazers_count":148,"open_issues_count":28,"forks_count":11,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-09T21:52:28.566Z","etag":null,"topics":["functional-programming","generalization","monad","parsec","parser-combinator"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-2.1","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/masala.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-05-20T06:03:41.000Z","updated_at":"2025-04-01T08:52:52.000Z","dependencies_parsed_at":"2023-01-25T21:46:41.986Z","dependency_job_id":null,"html_url":"https://github.com/masala/masala-parser","commit_stats":null,"previous_names":["d-plaindoux/masala-parser","d-plaindoux/parsec"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masala%2Fmasala-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masala%2Fmasala-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masala%2Fmasala-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/masala%2Fmasala-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/masala","download_url":"https://codeload
.github.com/masala/masala-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249256917,"owners_count":21239079,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["functional-programming","generalization","monad","parsec","parser-combinator"],"created_at":"2024-08-03T01:02:37.016Z","updated_at":"2025-04-16T15:49:45.301Z","avatar_url":"https://github.com/masala.png","language":"JavaScript","readme":"# Masala Parser: Javascript Parser Combinators\n\n[![npm version](https://badge.fury.io/js/%40masala%2Fparser.svg)](https://badge.fury.io/js/%40masala%2Fparser)\n[![Build Status](https://travis-ci.org/d-plaindoux/masala-parser.svg)](https://travis-ci.org/d-plaindoux/masala-parser)\n[![Coverage Status](https://coveralls.io/repos/d-plaindoux/masala-parser/badge.png?branch=master)](https://coveralls.io/r/d-plaindoux/masala-parser?branch=master)\n[![stable](http://badges.github.io/stability-badges/dist/stable.svg)](http://github.com/badges/stability-badges)\n\nMasala Parser is inspired by the paper titled:\n[Direct Style Monadic Parser Combinators For The Real World](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf).\n\nMasala Parser is a Javascript implementation of the Haskell **Parsec**.\n It is plain Javascript that works in the browser, is tested with more than 450 unit tests, covering 100% of code lines.\n\n### Use cases\n\n* It can create a **full parser from scratch**\n* It can extract data from a big text and **replace complex regexp**\n* It works in any 
**browser**\n* There is a good **typescript** type declaration\n* It can validate complete structure with **variations**\n* It's a great starting point for parser education. It's **way simpler than Lex \u0026 Yacc**.\n* It's designed to be written in other languages (Python, Java, Rust) with the same interface\n\nMasala Parser keywords are **simplicity**, **variations** and **maintainability**. You won't\nneed a theoretical background in formal languages for extraction or validation use cases.\n\nMasala Parser has relatively good performance; however, Javascript is obviously not the fastest machine.\n\n# Usage\n\nWith Node Js or modern build        \n        \n        npm install -S @masala/parser\n\nOr in the browser \n\n* [download Release](https://github.com/d-plaindoux/masala-parser/releases)\n* `\u003cscript src=\"masala-parser.min.js\"\u003e\u003c/script\u003e`\n\nCheck the [Change Log](./changelog.md) if you come from a previous version.\n\n# Reference\n\nYou will find a [Masala Parser online reference](http://www.robusta.io/masala-parser/ts/modules/_masala_parser_d_.html), generated from the typescript interface.\n\n# Quick Examples\n\n## Hello World\n\n```js\nconst helloParser = C.string('hello');\nconst white = C.char(' ');\nconst worldParser = C.string('world');\nconst combinator = helloParser.then(white.rep()).then(worldParser);\n``` \n\n## Floor notation\n\n```js\n// N: Number Bundle, C: Chars Bundle\nconst {Streams, N, C}= require('@masala/parser');\n\nconst stream = Streams.ofString('|4.6|');\nconst floorCombinator = C.char('|').drop()\n    .then(N.number())      // we have ['|', 4.6], we drop '|'\n    .then(C.char('|').drop())   // we have [4.6, '|'], we keep [4.6]\n    .single() // we had [4.6], now just 4.6\n    .map(x =\u003eMath.floor(x));\n\n// The parser parses a stream of characters\nconst parsing = floorCombinator.parse(stream);\nassertEquals( 4, parsing.value, 'Floor parsing');\n```\n\n## Explanations\n\nAccording to Wikipedia *\"in functional programming, a parser combinator 
is a\nhigher-order function that accepts several parsers as input and returns a new\nparser as its output.\"*\n\n## The Parser\n\nLet's say we have a document :\n\n\u003e\u003e\u003e The James Bond series, by writer Ian Fleming, focuses on a fictional British Secret Service agent created in 1953, who featured him in twelve novels and two short-story collections. Since Fleming's death in 1964, eight other authors have written authorised Bond novels or novelizations: Kingsley Amis, Christopher Wood, John Gardner, Raymond Benson, Sebastian Faulks, Jeffery Deaver, William Boyd and Anthony Horowitz.\n\nThe parser could fetch every name, ie two consecutive words starting with uppercase. \nThe parser will read through the document and aggregate a Response,\n which contains a value and the current offset in the text.\n\nThis value will evolve when the parser will meet new characters, \nbut also with some function calls, such as the `map()` function.\n\n![](./documentation/parsec-monoid.png)\n\n\n\n## The Response\n\nBy definition, a Parser takes text as an input, and the Response is a structure that represents your problem. \nAfter parsing, there are two subtypes of `Response`:\n \n* `Accept` when it found something.    \n* `Reject` if it could not.\n\n\n```js\n\n    let response = C.char('a').rep().parse(Streams.ofString('aaaa'));\n    assertEquals(response.value.join(''), 'aaaa' );\n    assertEquals(response.offset, 4 );\n    assertTrue(response.isAccepted());\n    assertTrue(response.isConsumed());\n    \n    // Partially accepted\n    response = C.char('a').rep().parse(Streams.ofString('aabb'));\n    assertEquals(response.value.join(''), 'aa' );\n    assertEquals(response.offset, 2 );\n    assertTrue(response.isAccepted());\n    assertFalse(response.isConsumed());\n\n```\n\n\n## Building the Parser, and execution  \n\nLike a language, the parser is built then executed. 
With Masala, we build using other parsers.\n\n```js\nconst helloParser = C.string('hello');\nconst white = C.char(' ');\nconst worldParser = C.string('world');\nconst combinator = helloParser.then(white.rep()).then(worldParser);\n``` \n\nThere is a build time when you combine your parsers, and an execution time when the parser \nruns its `parse(stream)` function. You will have the `Response` after parsing. \n\n\nSo after building, the parser is executed against a stream of tokens. \nFor simplicity, we will use a stream of characters, which is a text :)\n \n\n\n## Hello Gandhi\n\nThe goal is to check that we have Hello 'someone', then to grab that name.\n\n```js\n// Plain old javascript\nconst {Streams,  C}= require('@masala/parser');\n\nvar helloParser = C.string(\"Hello\")\n                    .then(C.char(' ').rep())\n                    .then(C.letters()) // succession of A-Za-z letters\n                    .last();    // keeping previous letters\n\nvar value = helloParser.val(\"Hello Gandhi\");  // val(x) is a shortcut for parse(Streams.ofString(x)).value;\n\nassertEquals('Gandhi', value);\n```\n\n\n\n\n# Parser Combinations\n\nLet's use a real example. We combine many functions that return a new Parser. 
And each new Parser\nis a combination of Parsers given by the standard bundles or previous functions.\n\n```js\nimport  {Streams, N,C, F} from '@masala/parser';\n\nconst blanks = ()=\u003eC.char(' ').optrep();\n\nfunction operator(symbol) {\n    return blanks().drop()\n        .then(C.char(symbol))   // '+' or '*'\n        .then(blanks().drop())\n        .single();\n}\n\nfunction sum() {\n    return N.integer()\n        .then(operator('+').drop())\n        .then(N.integer())  // then(x) creates a tuple - here, one value was dropped\n        .map(tuple =\u003e tuple.at(0) + tuple.at(1)); \n        \n}\n\nfunction multiplication() {\n    return N.integer()\n        .then(operator('*').drop())\n        .then(N.integer())\n        .array() // we can have access to the value of the tuple\n        .map( ([left,right])=\u003e left * right); // more modern js \n}\n\nfunction scalar() {\n    return N.integer();\n}\n\nfunction combinator() {\n    return F.try(sum())\n        .or(F.try(multiplication()))    // or() will often work with try()\n        .or(scalar());\n}\n\nfunction parseOperation(line) {\n    return combinator().parse(Streams.ofString(line));\n}\n\nassertEquals(4, parseOperation('2   +2').value, 'sum: ');\nassertEquals(6, parseOperation('2 * 3').value, 'multiplication: ');\nassertEquals(8, parseOperation('8').value, 'scalar: ');\n```\n\nA curry paste is a higher-order ingredient made from a good combination of spices.\n\n![](./documentation/images/curry-paste.jpg)\n\n## Precedence\n\nPrecedence is a technical term for priority. Using:\n\n```js\nfunction combinator() {\n    return F.try(sum())\n        .or(F.try(multiplication()))    // or() will often work with try()\n        .or(scalar());\n}\n\nconsole.info('sum: ',parseOperation('2+2').value);\n```\n\nWe will give priority to sum, then multiplication, then scalar. If we had put `scalar()` first, we would have first\naccepted `2`, then what could we do with `+2` alone ? It's not a valid sum ! 
Moreover `+2` and `-2` are acceptable scalars. \n\n## try(x).or(y)\n\n\n`or()` will often be used with `try()`, which enables [backtracking](https://en.wikipedia.org/wiki/Backtracking): \nit saves the current offset, then tries an option. As soon as that option fails, it goes back to the original \noffset and uses the parser inside the `.or(P)` expression.\n\n Like Haskell's Parsec, Masala Parser can parse infinite look-ahead grammars but\n performs best on predictive (LL[1]) grammars.\n\nLet's see how, with `try()`, we can look a bit ahead at the next characters, then go back:\n\n        F.try(sum()).or(F.try(multiplication())).or(scalar())\n        // try(sum()) parser in action\n        2         *2\n        ..ok..ok  ↑oops: go back and try multiplication. Should be OK.\n\n\nSuppose we do not `try()` but use `or()` directly:\n\n        sum().or(multiplication()).or(scalar())\n        // testing sum()\n        2         *2\n        ..ok..ok  ↑oops: cursor is NOT going back. So now we must test '*2' ;\n                                                   Is it (multiplication())? No ;\n                                                   or(scalar()) ? neither\n\n\n\n\n# Recursion\n\nMasala-Parser (like Parsec) is a top-down parser and doesn't like [Left Recursion](https://cs.stackexchange.com/a/9971).\n\nHowever, it is a solved problem for this kind of parser, with a lot of documentation. 
You can read more on [recursion\nwith Masala](./documentation/recursion.md), and check out examples on our Github repository \n( [simple recursion](https://github.com/d-plaindoux/masala-parser/blob/master/integration-npm/examples/recursion/aaab-lazy-recursion.js), \nor [arithmetic expressions](https://github.com/d-plaindoux/masala-parser/blob/master/integration-npm/examples/operations/plus-minus.js) ).\n\n\n\n# Simple documentation of Core bundles\n\n## Core Parser Functions\n\nHere is a link for [Core functions documentation](./documentation/parser-core-functions.md).\n\nIt will explain `then()`, `drop()`, `map()`, `rep()`, `opt()` and other core functions of the Parser\nwith code examples.\n\n## The Chars Bundle\n\nExample: \n\n```js\nC.char('-')\n    .then(C.letters())\n    .then(C.char('-'))\n// accepts  '-hello-' ; value is ['-','hello','-']\n// rejects '-hel lo-' because space is not a letter    \n```\n\n[General use](./documentation/chars-bundle.md)\n\n* `letter()`: accept a European letter (and moves the cursor)\n* `letters()`: accepts many letters and returns a string\n* `letterAs(symbol)`: accepts a European (default), ASCII, or UTF-8 letter. [More here](./documentation/chars-bundle.md)\n* `lettersAs(symbol)`: accepts many letters and returns a string\n* `emoji()`: accept any emoji sequence. 
[Opened Issue](https://github.com/d-plaindoux/masala-parser/issues/86).\n* `notChar(x)`: accept if next input is not `x`\n* `char(x)`: accept if next input is `x`\n* `charIn('xyz')`: accept if next input is `x`, `y` or `z`\n* `charNotIn('xyz')`: accept if next input is not `x`, `y` or `z`\n* `subString(length)`: accept any next *length* characters and returns the equivalent string\n* `string(word)`: accept if next input is the given `word`  \n* `stringIn(words)`: accept if next input is the given `words` [More here](./documentation/chars-bundle.md)\n* `notString(word)`: accept if next input is *not* the given `word`\n* `charLiteral()`: single quoted char element in C/Java : `'a'` is accepted\n* `stringLiteral()`: double quoted string element in java/json: `\"hello world\"` is accepted\n* `lowerCase()`: accept any next lower case inputs\n* `upperCase()`: accept any next uppercase inputs\n\nOther example:\n\n```js\nC.string('Hello')\n    .then(C.char(' '))\n    .then(C.lowerCase().rep().join(''))\n\n// accepts Hello johnny ; value is ['Hello', ' ', 'johnny']\n// rejects Hello Johnny : J is not lowercase ; no value\n```\n\n## The Numbers Bundle\n\n\n* `number()`: accept any float number, such as -2.3E+24, and returns a float    \n* `digit()`: accept any single digit, and returns a **number**\n* `digits()`: accept many digits, and returns a **number**. Warning: it does not accept **+-** signs symbols.\n* `integer()`: accept any positive or negative integer\n\n\n\n\n## The Flow Bundle\n\nThe flow bundle will mix ingredients together.\n\nFor example, if you have a Parser `p`, `F.not(p)` will accept anything\nthat does not satisfy `p`\n\nAll of these functions will return a brand new Parser that you can combine with others.\n\nMost important:\n\n* `F.try(parser).or(otherParser)`: Try a parser and come back to `otherParser` if failed\n* `F.any()`: Accept any character (and so moves the cursor)\n* `F.not(parser)`: Accept anything that is not a parser. 
Often used to accept until a given *stop*  \n* `F.eos()`: Accepted if the Parser has reached the **E**nd **O**f **S**tream\n* `F.moveUntil(string|stopParser)`: Alternative for **regex**. Will traverse the document **until** the *stop parser*\n    - returns `undefined` if *stop* is not found\n    - returns all characters if *stop* is found, and set the cursor at the spot of the stop\n* `F.dropTo(string|stopParser)`: Will traverse the document **including** the *stop parser*\n    \n\nOthers:\n\n* `F.lazy(parser, ?params)`: Makes a lazy evaluation. May be used for Left recursion (difficult)\n* `F.parse(parserFunction)`: Create a new Parser from a function. Usually, you won't start here.\n* `F.subStream(length)`: accept any next characters  \n* `F.returns(value)`: forces a returned value\n* `F.error()`: returns an error. Parser will never be accepted\n* `F.satisfy(predicate)`: check if condition is satisfied\n* `F.startsWith(value)`: create a no-op parser with initial value \n\n\n\n\n\n## License\n\nCopyright (C)2016-2025 Didier Plaindoux \u0026 Nicolas Zozol\n\nThis program is  free software; you can redistribute  it and/or modify\nit  under the  terms  of  the GNU  Lesser  General  Public License  as\npublished by  the Free Software  Foundation; either version 2,  or (at\nyour option) any later version.\n\nThis program  is distributed in the  hope that it will  be useful, but\nWITHOUT   ANY  WARRANTY;   without  even   the  implied   warranty  of\nMERCHANTABILITY  or FITNESS  FOR  A PARTICULAR  PURPOSE.  See the  GNU\nLesser General Public License for more details.\n\nYou  should have  received a  copy of  the GNU  Lesser General  Public\nLicense along with this program; see the file COPYING.  
If not, write\nto the  Free Software Foundation,  675 Mass Ave, Cambridge,  MA 02139,\nUSA.\n","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasala%2Fmasala-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmasala%2Fmasala-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasala%2Fmasala-parser/lists"}