{"id":13690245,"url":"https://github.com/kkoch986/js-parse","last_synced_at":"2026-01-16T17:30:03.206Z","repository":{"id":18992208,"uuid":"22214013","full_name":"kkoch986/js-parse","owner":"kkoch986","description":"A generic node.js based LR(1) shift-reduce parser.","archived":false,"fork":false,"pushed_at":"2017-06-14T13:08:05.000Z","size":1811,"stargazers_count":12,"open_issues_count":6,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-12T15:43:27.305Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kkoch986.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-24T13:33:25.000Z","updated_at":"2023-03-07T00:35:58.000Z","dependencies_parsed_at":"2022-09-25T03:41:31.621Z","dependency_job_id":null,"html_url":"https://github.com/kkoch986/js-parse","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkoch986%2Fjs-parse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkoch986%2Fjs-parse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkoch986%2Fjs-parse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkoch986%2Fjs-parse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kkoch986","download_url":"https://codeload.github.com/kkoch986/js-parse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251998721,"
owners_count":21678007,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T16:00:50.453Z","updated_at":"2026-01-16T17:30:03.199Z","avatar_url":"https://github.com/kkoch986.png","language":"JavaScript","readme":"\n# Generic Tokenizer and Modular, Bottom-up Incremental Parser\n\nThis project provides a node.js implementation of an LR(1) parser.\nThe parser works in a bottom-up fashion, which permits cool things\nlike incremental processing. If you look at the basic examples, you can see\nthat the parser fires callbacks whenever productions in the grammar are completed.\nThis may happen even before the end of the token stream is encountered.\n\nGrammars are currently specified in a JSON file, but I plan to expand this to support JavaScript files\nand potentially a declarative grammar language, depending on how things go.\n\nIn the example (`index.js`) you will see examples of what makes this project cool:\n\n1. __Modules__ -- js-parse provides the ability to build pieces of your grammar in separate files and\nimport them on the fly into the grammar (see `test.json`, `re_wrapper.json` and `re/re.json` for examples).\nThe modules can be used on their own or as part of a larger grammar, which makes grammar files more readable,\nmaintainable, and reusable.\n1. 
__Incremental__ -- In true javascript fashion, everything about the parser can be treated as asynchronous.\nThe parser will notify you whenever important pieces of the grammar are recognized, so you may start processing\nthem before reaching the end of the input stream.\n\nThings are pretty alpha right now, but I'll try to keep this readme up to date and improve it as time goes on.\nFeel free to open issues or contribute!\n\n## How it works\n\n### Creating a parser\nCurrently, everything is based around a JSON object called a parser description.\nThe parser description contains all of the info about the grammar and its symbols.\n\nParser descriptions can also be used to build sub-modules which can be included\nto keep description files concise and reusable.\n\nMore info on the format and details of a parser description is included below.\n\n### Parsing\nOnce you have built your parser description, load it into node.js and use it to create\na parser (`Parser.Create` or, more commonly, `Parser.CreateWithLexer`). The parser\nemits two standard events, `accept` and `error`, as well as an event for each non-terminal symbol\nprovided by the parser description or its included modules.\n\nThe `error` event is pretty self-explanatory: it fires when there is a syntax error\nor a tokenization error (the latter only when a tokenizer is attached using `CreateWithLexer`).\n\nThe `accept` event is fired when the parser has been notified that the stream is over\nand it has built a complete parse tree. The callback receives the entire parse tree\nas an argument, which is useful for inspecting the whole structure.\n\nThe `accept` callback is not the only way to interact with the parser; one of the coolest\nfeatures of js-parse is its bottom-up nature. 
Since js-parse builds parse trees from the smallest\nelements up to the largest, you can actually begin to process the input before parsing the entire\nfile.\n\nFor example, consider the grammar:\n```\nA -\u003e b c D.\nD -\u003e e f g.\n```\n\nIt is possible to bind handlers to the parser such as\n```javascript\nparser.on(\"D\", function(D){ console.log(\"parsed D\"); });\nparser.on(\"A\", function(A){ console.log(\"parsed A\"); });\n```\n\nThese callbacks will be fired as soon as the parser constructs the D or A element --\nno need to wait until the entire stream is processed.\n\n## Basic parsing example\n\n```javascript\nvar Parser = require(\"./lib\").Parser.LRParser;\nvar pd = require(\"./examples/parser_description.json\");\n\n// Create the parser\nvar parser = Parser.CreateWithLexer(pd);\n\nparser.on(\"accept\", function(token_stack){\n\tconsole.log(\"Parser Accept:\", require('util').inspect(token_stack, true, 1000));\n});\n\nparser.on(\"error\", function(error){\n\tconsole.log(\"Parse Error: \", error.message);\n\tthrow error;\n});\n\n// Begin processing the input one character at a time\nvar input = \"[a-zA-Z0-9]+([W]*)[0-9]+\";\nfor(var i = 0; i \u003c input.length; i++) {\n\tparser.append(input[i]);\n}\nparser.end();\n```\n\n## Writing Parser Descriptions\n\nWriting a parser description is as simple as creating a basic JSON object.\nIt all starts with the basic template:\n\n```json\n{\n\t\t\"symbols\": { },\n\t\t\"productions\": { },\n\t\t\"modules\":{ },\n\t\t\"startSymbols\": [ ]\n}\n```\n\nLet's take a closer look at each section.\n\n### Symbols\n\nThe symbols section is where you define all of the terminal symbols in the grammar.\nAll terminal symbols must be provided in this section along with a regular expression\nto be used by the lexer to create tokens from the input stream.\n\nNon-terminal symbols are optional in this section and only need to be included if\nyou are setting custom options for them.\n\n#### Terminals\n\nTerminal symbols must be defined in the parser description along 
with a regular expression\nused by the lexer to extract the symbol as a token from the input stream.\n\nLet's take a look at a sample terminal symbol definition:\n\n```json\n\"WS\": {\n\t\"terminal\":true,\n\t\"match\":\"[ \\t\\n]+\"\n}\n```\n\nThis is a token I commonly use when writing parsers: whitespace. Typically, my parsers\nare written to extract the whitespace between tokens and discard it. This symbol is\ndefined so that the longest stretch of input matching `^[ \\t\\n]+$` will be\nrecognized as a whitespace token and passed along to the parser as such.\n\nThe `terminal` option is required in all symbols. The `match` option is required when\n`terminal` is `true`.\n\nLet's say we didn't care about whitespace and didn't want to bog down our grammar with WS\ntokens everywhere. We can include some options to make sure it isn't included:\n\n```json\n\"WS\": {\n\t\"terminal\":true,\n\t\"match\":\"[ \\t\\n]+\",\n\t\"excludeFromProduction\":true,\n\t\"includeInStream\":false\n}\n```\n\nNotice the two new options, `excludeFromProduction` and `includeInStream`.\n\n`includeInStream` - Setting this option to `false` will cause the lexer to recognize\nand discard this token rather than passing it along to the parser.\n\n`excludeFromProduction` - Setting this option to `true` will cause the parser to discard\nthis symbol when it recognizes it in a parse tree structure. 
In the context above, this\noption would be redundant since the parser would never find out about the WS token anyway.\n\n##### Other options\n\n`matchOnly` - `matchOnly` is useful for tokens like strings in quotes; consider the symbol definition:\n```json\n\"atom_quote\": {\n\t\"terminal\":true,\n\t\"match\":\"\\\"((?:[^\\\"\\\\\\\\]|\\\\\\\\.)*)\\\"\",\n\t\"matchOnly\":1\n},\n```\nIn this case, `matchOnly` will remove the opening and closing quotes from the stream and only include\nthe section of the token given by `(/^\u003cregex\u003e/).exec(string)[matchOnly];`. In the case above,\nit will return the part of the string in the first set of capturing parentheses (the string without the quotes).\n\n`matchCaseInsensitive` - This will add the `/i` flag to the regex, causing it to match regardless\nof case.\n\n`lookAhead` - `lookAhead` is useful when you need to look ahead in the stream to determine which\ntoken to use. Consider the following tokens:\n\n```\n\"KW_endforeach\":{\n\t\"terminal\":true,\n\t\"match\":\"(endforeach)\",\n\t\"matchCaseInsensitive\":true\n},\n\"KW_endfor\":{\n\t\"terminal\":true,\n\t\"match\":\"(endfor)\",\n\t\"matchCaseInsensitive\":true\n},\n```\n\nIt's clear that the lexer will always match `endfor` and never match `endforeach`, so we\nneed a way to differentiate the two. We can add the following `lookAhead` option to make\nsure that we see the next symbol before matching this token.\n\n```\n\"KW_endforeach\":{\n\t\"terminal\":true,\n\t\"match\":\"(endforeach)\",\n\t\"matchCaseInsensitive\":true\n},\n\"KW_endfor\":{\n\t\"terminal\":true,\n\t\"match\":\"(endfor)\",\n\t\"lookAhead\":\"[^e]\",\n\t\"matchCaseInsensitive\":true\n},\n```\n\nThis means the token will not be considered a match unless we see a match for `match` followed\nby a match for `lookAhead`. 
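The effect of `lookAhead` can be sketched in plain JavaScript. This is a simulation of the check, not the library's internal code, and `tryMatch` is a hypothetical helper for illustration:

```javascript
// Hypothetical sketch of how a lexer might combine `match` + `lookAhead`:
// a token matches only if its regex matches at the front of the input AND
// the input right after the match satisfies the lookAhead regex.
function tryMatch(input, match, lookAhead) {
	var m = new RegExp("^(?:" + match + ")", "i").exec(input);
	if (!m) return null;
	var rest = input.slice(m[0].length);
	if (lookAhead && !new RegExp("^(?:" + lookAhead + ")").test(rest)) return null;
	return m[0];
}

// On input "endforeach;", KW_endfor's lookAhead "[^e]" rejects the match,
// because the character after "endfor" is "e"...
console.log(tryMatch("endforeach;", "(endfor)", "[^e]"));     // null
// ...while KW_endforeach (no lookAhead) matches the full keyword.
console.log(tryMatch("endforeach;", "(endforeach)", null));   // "endforeach"
// On plain "endfor;", KW_endfor matches: the next character is ";".
console.log(tryMatch("endfor;", "(endfor)", "[^e]"));         // "endfor"
```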
This will allow the lexer to find the `endforeach` token\nwhen appropriate.\n\n#### Non-terminals\n\nNon-terminal symbols are not required in the symbol definitions since they are easily\nextracted from the `productions` object described below.\n\nNon-terminal symbols can be included if you want to set some custom options on them.\n\n##### excludeFromProduction\n\nAs described in the terminal symbol section above, the `excludeFromProduction` option is also\navailable for non-terminal symbols. It will cause the parser, upon recognizing this symbol,\nto process it and continue without adding it to the current production it is trying to build.\n\n##### mergeIntoParent\n\n`mergeIntoParent` goes hand-in-hand with `mergeRecursive` below.\nSetting this option to `true` for a non-terminal will cause the production's children\nto be set as arguments of the production's parent; the current symbol\nitself is excluded.\n\nConsider the grammar:\n\n```\nSET_PARENT -\u003e SET.\nSET -\u003e POS_SET | NEG_SET.\n```\n\nThis will produce a parse tree structure like:\n\n```\nSET_PARENT ( SET ( POS_SET( ... ) ) )\n```\n\nThere is really no need to include the `SET` element in the parse tree, so we can\nmerge its children into `SET_PARENT`'s arguments.\n\nAfter setting `mergeIntoParent` on `SET`, the parse tree will look like this instead:\n\n```\nSET_PARENT ( POS_SET ( ... ) )\n```\n\n##### mergeRecursive\n\n`mergeRecursive` is a pretty powerful tool when building a useful parse tree.\n\nConsider a construct like an argument list. 
Typically you might define one as such:\n\n```\nArgList -\u003e Arg | Arg ArgList\n```\n\nThis works great; the only problem arises when parsing something like this:\n\n```\na(a,b,c,d).\n```\n\nThe `ArgList` parse tree will look something like this:\n\n```\nArgList(\n\t\tArg(a),\n\t\tArgList(\n\t\t\t\tArg(b),\n\t\t\t\tArgList(\n\t\t\t\t\tArg(c),\n\t\t\t\t\tArgList(\n\t\t\t\t\t\t\tArg(d)\n\t\t\t\t\t)\n\t\t\t\t)\n\t\t)\n)\n```\n\nBy providing `mergeRecursive` on the `ArgList` symbol, the parser will detect this\nkind of recursive structure being built and will merge it into a single array,\nproducing something much more tolerable:\n\n```\nArgList( Arg(a), Arg(b), Arg(c), Arg(d) )\n```\n\n##### abandon\n\nThe `abandon` option will cause the matched production to be dropped from the parser\nstack. This means that, for all intents and purposes, it vanishes from the parse\ntree. I added this option to support truly incremental processing.\n\nFor example, see the `statement_list` defined in [NodePL](https://github.com/kkoch986/nodepl/blob/master/grammars/source.json#L28-L32).\nIn that case I had no use for the statements once they were already parsed, since\nthey were handled by the EventEmitter. When parsing large files\nwith a few thousand statements, the parse itself was fast, but on the final step\nthe parser would have to traverse all the way back to the beginning of the file. 
This was\na waste, so instead I drop them from the parse tree, and a 10s parse is transformed\ninto a 0.5s parse!\n\n### Productions\n\nDefining the productions is the most crucial part of the parser description.\nIt's where all the action happens.\n\nThis section most closely resembles the BNF-style grammars you're probably used to.\nIt is composed of keys representing non-terminal symbols, each mapped to an array\nof string arrays representing the productions which compose that symbol.\n\nFor example, consider the grammar:\n```\nA -\u003e b c D\nA -\u003e f\nD -\u003e e f g\n```\n\nThe `productions` object for this would look like this:\n\n```json\n\"productions\":{\n\t\"A\":[\n\t\t[\"b\", \"c\", \"D\"],\n\t\t[\"f\"]\n\t],\n\t\"D\":[\n\t\t[\"e\", \"f\", \"g\"]\n\t]\n}\n```\n\nThat's about all there is to it in this section.\n\n### StartSymbols\n\nThe `startSymbols` array is also quite simple: it defines the top-most entities\nin your grammar and is used to decide when to accept the input as a complete structure.\n\nThe `startSymbols` array should be a simple array of strings representing the symbols\nwhich serve as starting points for the grammar. In the example above, if we wanted to\nstart on `A` only, we would provide `startSymbols` as:\n\n```json\n\"startSymbols\":[\"A\"]\n```\n\n### Modules\n\nOne of the most exciting features of js-parse is its module system.\nIt's a very simple system to use, and it allows self-contained, reusable\ngrammar modules to be created. 
The `modules` object is completely optional.\n\nThe `modules` object contains key/value pairs where the key\nis a name to apply to the module and the value is the path to the JSON file which defines it.\n\nConsider the following files:\n\n_a.json_:\n```json\n{\n\t\"symbols\":{\n\t\t\"a\":{\"terminal\":true, \"match\":\"a\"},\n\t\t\"b\":{\"terminal\":true, \"match\":\"b\"}\n\t},\n\t\"productions\":{\n\t\t\"S\":[\n\t\t\t[\"a\", \"b\", \"a\"]\n\t\t]\n\t},\n\t\"startSymbols\":[\"S\"]\n}\n```\n\n_b.json_:\n```json\n{\n\t\"symbols\":{\n\t\t\"c\":{\"terminal\":true, \"match\":\"c\"},\n\t\t\"d\":{\"terminal\":true, \"match\":\"d\"}\n\t},\n\t\"productions\":{\n\t\t\"S\":[\n\t\t\t[\"MOD_A.S\", \"c\", \"d\", \"c\"]\n\t\t]\n\t},\n\t\"modules\":{\n\t\t\"MOD_A\":\"./a.json\"\n\t},\n\t\"startSymbols\":[\"S\"]\n}\n```\n\nAs you can see, the _a.json_ parser description can be used in the _b.json_ grammar.\nAll of the symbols in _a.json_ are prefixed with the key given in _b.json_.\n\nThe symbols in `A` can also be modified when imported into `B` simply by using the dot syntax.\nFor instance, in _b.json_ we could add the following to the `symbols` object:\n```json\n\"MOD_A.b\":{\"terminal\":true, \"match\":\"Z\"}\n```\n\nNow the `b` symbol in the `A` grammar will be matched only by `Z` (when imported into `B` this way).\n\n## CLI for creating grammar specifications\n\nFor now I have written a CLI tool to generate parser definitions using a BNF-like syntax.\nIt doesn't leverage some of the newer features of js-parse, but it should serve as both an example\nand a tool for getting started with the JSON grammar specification format.\nThe tool can be invoked using:\n\n```\nnode meta.js\n```\n\nLines can be entered using the following format:\n\n```\nSYMBOL -\u003e SYMBOL [SYMBOL [SYMBOL ...]] [| SYMBOL].\n```\n\nTo finish, enter `\\q` on a line.\n\nHere is an example run-through:\n\n```\nBegin entering the Gramamar (\\q to finish):\n\u003e S -\u003e a S e | B.\n[META] Processing rules for: S\n[META] 
Using S as start symbol.\n\u003e B -\u003e b B e | C.\n[META] Processing rules for: B\n\u003e C -\u003e c C e | d.\n[META] Processing rules for: C\n\u003e \\q\nGrammar definition:\n{\"symbols\":{\"WS\":{\"includeInStream\":false,\"terminal\":true,\"excludeFromProduction\":true,\"match\":\"[ \\t\\n]+\"},\"a\":{\"terminal\":false},\"S\":{\"terminal\":true,\"match\":\"S\"},\"e\":{\"terminal\":true,\"match\":\"e\"},\"B\":{\"terminal\":false,\"match\":\"B\"},\"b\":{\"terminal\":true,\"match\":\"b\"},\"C\":{\"terminal\":false,\"match\":\"C\"},\"c\":{\"terminal\":true,\"match\":\"c\"},\"d\":{\"terminal\":true,\"match\":\"d\"}},\"productions\":{\"a\":[[\"a\",\"S\",\"e\"],[\"B\"]],\"B\":[[\"b\",\"B\",\"e\"],[\"C\"]],\"C\":[[\"c\",\"C\",\"e\"],[\"d\"]]},\"startSymbols\":[\"a\"]}\n```\n\nIt doesn't support any of the more advanced features available in the parser description\nobject, but it's useful for getting started with the basic structure you'll need.\n\n## Last remarks\nThis documentation was put together pretty hastily, so please feel free to open issues, leave comments,\nor submit pull requests. js-parse is in its early stages and there's a lot of room\nto grow!\n\nAlso note that some of the examples have fallen a little out of date. I will try to update them ASAP.\n\n## TODO:\n1. Cleanup/reorganize code to make something more maintainable.\n1. Fix tests broken by parser conflicts.\n1. MORE TESTS!!!\n1. 
Easier to use grammar specification format.\n\n## License\n\tCopyright (c) 2014, Kenneth Koch \u003ckkoch986@gmail.com\u003e\n\n\tPermission is hereby granted, free of charge, to any person obtaining a copy\n\tof this software and associated documentation files (the \"Software\"), to deal\n\tin the Software without restriction, including without limitation the rights\n\tto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n\tcopies of the Software, and to permit persons to whom the Software is\n\tfurnished to do so, subject to the following conditions:\n\n\tThe above copyright notice and this permission notice shall be included in\n\tall copies or substantial portions of the Software.\n\n\tTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n\tIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n\tFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n\tAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n\tLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n\tOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n\tTHE SOFTWARE.\n","funding_links":[],"categories":["Strings"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkkoch986%2Fjs-parse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkkoch986%2Fjs-parse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkkoch986%2Fjs-parse/lists"}