{"id":18032460,"url":"https://github.com/duffsdevice/tiny-parser","last_synced_at":"2025-08-24T09:17:21.585Z","repository":{"id":183966232,"uuid":"671086480","full_name":"DuffsDevice/tiny-parser","owner":"DuffsDevice","description":"Write use-case specific parsers within minutes!","archived":false,"fork":false,"pushed_at":"2023-08-14T16:49:25.000Z","size":103,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-10T06:31:50.827Z","etag":null,"topics":["context-free-grammar","parser","parser-generator","parser-library","tokenizer","tokenizer-parser"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DuffsDevice.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-26T14:03:43.000Z","updated_at":"2023-12-15T16:10:10.000Z","dependencies_parsed_at":"2024-10-30T10:23:30.814Z","dependency_job_id":null,"html_url":"https://github.com/DuffsDevice/tiny-parser","commit_stats":null,"previous_names":["duffsdevice/tiny-parser"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuffsDevice%2Ftiny-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuffsDevice%2Ftiny-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuffsDevice%2Ftiny-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DuffsDevice%2Ftiny-parser/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DuffsDevice","download_url":"https://codeload.github.com/DuffsDevice/tiny-parser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247256115,"owners_count":20909240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["context-free-grammar","parser","parser-generator","parser-library","tokenizer","tokenizer-parser"],"created_at":"2024-10-30T10:13:26.091Z","updated_at":"2025-04-04T22:09:52.416Z","avatar_url":"https://github.com/DuffsDevice.png","language":"Python","readme":"# tiny-parser\n[![Licence](https://img.shields.io/badge/licence-BSD--3-e20000.svg)](https://github.com/DuffsDevice/tiny-parser/blob/main/LICENSE)\n\ntiny-parser enables you to **write arbitrary use-case-specific parsers within minutes**.\n\nIt ships with a collection of predefined language definitions I started to write.\n\n## Example: Parsing JSON\nDefining the grammar of JSON using tiny-parser looks like this:\n\n```python\nfrom tinyparser import Rule, Token, Language\n\njson = Language({\n    \"root.number.\": [Token.NUMBER],\n    \"root.string.\": [Token.STRING],\n    \"root.list.\": [Token.LEFT_SQUARE_BRACKET, \"list.\", Token.RIGHT_SQUARE_BRACKET],\n    \"root.object.\": [Token.LEFT_CURLY_BRACKET, \"object.\", Token.RIGHT_CURLY_BRACKET],\n    \"list.nonempty.multiple.\": [\"root.\", Token.COMMA, \"list.nonempty.\"],\n    \"list.nonempty.single.\": [\"root.\"],\n    \"list.empty.\": [],\n    \"object.nonempty.multiple.\": [\"attribute.\", Token.COMMA, 
\"object.nonempty.\"],\n    \"object.nonempty.single.\": [\"attribute.\"],\n    \"object.empty.\": [],\n    \"attribute.\": [Token.STRING, Token.COLON, \"root.\"],\n})\n```\n**That's it!**\n\n\nIf you'd like to parse some JSON now, you can do so like this:\n```python\nimport tinyparser\n\n# Parse the input into an AST\nast = tinyparser.parse(json, '{\"Hello\": \"World\"}')\n\n# Inspection:\ntinyparser.print_ast(ast)\n\n\"\"\" Output:\n\u003croot\u003e = [root. \u003e root.object.]\n    .children = [\n        .1 = [Token.LEFT_CURLY_BRACKET] = '{'\n        .2 = [object. \u003e object.nonempty.single.]\n            .children = [\n                .1 = [attribute.]\n                    .children = [\n                        .1 = [Token.STRING] = 'Hello'\n                        .2 = [Token.COLON] = ':'\n                        .3 = [root. \u003e root.string.]\n                            .children = [\n                                .1 = [Token.STRING] = 'World'\n                            ]\n                    ]\n            ]\n        .3 = [Token.RIGHT_CURLY_BRACKET] = '}'\n    ]\n\"\"\"\n```\n\nWhile this parse result contains all the necessary information, it also contains a lot of superfluous detail.\nTo improve this, tiny-parser allows you to **post-process intermediate parsing results** into the data structure of your choice.\nWhether that's custom classes, dictionaries, lists... 
you name it.\n\nSince JSON is primarily a data-description language, why shouldn't we simply turn the string input into the corresponding Python data structure?\nIn order to do this, our language grammar needs some meta-information on how\nto process each rule (don't worry, everything you'll see will be explained later):\n\n```python\njson = Language({\n    \"root.number.\": (eval, (Token.NUMBER, (None, \"value\"))),\n    \"root.string.\": (None, (Token.STRING, (None, \"value\"))),\n    \"root.list.\": (\"#\", Token.LEFT_SQUARE_BRACKET, (\"list.\", \"#\"), Token.RIGHT_SQUARE_BRACKET),\n    \"root.object.\": (\"#\", Token.LEFT_CURLY_BRACKET, (\"object.\", \"#\"), Token.RIGHT_CURLY_BRACKET),\n    \"list.nonempty.multiple.\": ([], \"root.\", (Token.COMMA, []), \"list.nonempty.\"),\n    \"list.nonempty.single.\": ([], \"root.\"),\n    \"list.empty.\": ([]),\n    \"object.nonempty.multiple.\": ({}, (\"attribute.\", \"\"), Token.COMMA, (\"object.nonempty.\", \"\")),\n    \"object.nonempty.single.\": ({}, (\"attribute.\", \"\")),\n    \"object.empty.\": ({}),\n    \"attribute.\": ({}, (Token.STRING, (None, \"value\")), Token.COLON, (\"root.\", 0)),\n})\n```\n\nNow we can do:\n```python\n# Prints: {'Hello': 'World'}\nprint( tinyparser.parse(json, '{\"Hello\" : \"World\"}') )\n```\n\n# Documentation\n\n## 1.   Specifying the Grammar\nThe first constructor argument to the class `tinyparser.Language` is the grammar - a Python dictionary containing all grammar rules.\nEach dictionary key maps a rule identification to a rule definition.\n\n```python\ngrammar = {\n    \"root.option-A\": [\"number.\"]\n    , \"root.option-B\": [\"string.\"]\n    , \"number.\": [Token.NUMBER]\n    , \"string.\": [Token.STRING]\n    # And so on...\n}\n```\n\n### 1.1 Rule Identifications\nIn principle, you can name your rules any way you like. 
In most cases, however, you'll want a hierarchical key structure.\nBy doing this, you can reference groups of rules and thus enable disjunctions.\nThis is because tiny-parser rule references match every rule whose key starts with a given prefix.\nFor example, a rule reference of `\"expression.\"` will match all rules with a dictionary key starting with `expression.` , such as `expression.empty.` , `expression.nonempty.single.` or `expression.nonempty.multiple.` .\n\nBy convention, all rule identifications should end in the separation character you use (in our case `.`). This is because references should not have to care whether they reference a group of rules or a single rule (separation of _Interface_ from _Implementation_).\n\n**Note:** For educational purposes, all rule identifications here are words. When you ship your code and/or parsing speed matters, numbers would suit the purpose just as well and are quicker to resolve. That is, the shorter your identifications, the quicker tiny-parser can resolve each reference.\n\n### 1.2 Rule Definitions\nRule definitions are either of the form `[steps...]` or `(target, steps...)`.\nIf a rule is defined to match _nothing_ (the empty string) and therefore has no steps, you may just specify `target` (wrapped in neither a tuple nor a list). For instance, you may simply pass `None`.\n\n### 1.3 Steps\nThis chapter is all about the matching steps, i.e. _what_ you can match.\n\nUsually, language grammars come in different formats: BNF, EBNF, graphical control flow, etc.\nWhat they all have in common is what they are made of:\n1. **Tokens** (i.e. string content matching some part of the input), e.g. `}` or `\u0026\u0026` or `const`, and\n2. 
**References** to other rules.\n\nEssentially, the \"steps\" you pass as arguments to the definition of each rule will mostly consist of these two things:\nreferences to other rules, and tokens that you want to read from the input.\n\n### 1.4 Parsing Tokens\ntiny-parser parses your input in two stages: 1. tokenization, 2. rule matching.\nTokenization is a common preparation step in parsing. Most compilers and source code analysis tools do this.\nBreaking up the input into its atomic components (tokens) happens because it eases the process of rule matching immensely.\n\nTokenization happens linearly from the beginning of the input to the end.\nYou can compare this process to identifying the \"building blocks\" of written English within a given sentence:\n1. **words** (made of characters of the English alphabet, terminated by everything that's not of the English alphabet),\n2. **dots**,\n3. **dashes**,\n4. **parentheses**,\n5. **numbers** (starting with a digit, terminated by everything that's neither a digit nor a decimal dot).\n\nYou can probably already see how this eases further comprehension of an input string.\n\ntiny-parser by default employs a basic tokenization that will suffice for many occasions.\nIt's defined by the enum `tinyparser.Token`, deriving from the class `tinyparser.TokenType`.\n\nThis basic tokenization allows you to match certain tokens simply by passing the enum member of the token type you'd like to match.\nFor example, the rule:\n\n```python\n\"root.parentheses.empty.\": [Token.LEFT_PARENTHESIS, Token.RIGHT_PARENTHESIS]\n```\nhas two steps that together match \"()\", even with whitespace in between: \"(  )\".\n\n### 1.5 Matching exact Tokens\nIn some cases, merely specifying the type of token that you want to match is not precise enough.\nTo match a token with specific content, for example the identifier `func`, use the function `exactly`:\n\n```python\n\"root.function.\": 
[Token.exactly(\"func\"), Token.IDENTIFIER]\n```\n\n### 1.6 Referencing Rules\n\nYou reference rules (or groups of rules) simply by writing their identification or common prefix as a string. For parsing a simple list of numbers, each separated by a comma, you'd write:\n\n```python\ngrammar = {\n    \"list.nonempty.multiple.\": [\"list-element.\", Token.COMMA, \"list.nonempty.\"]\n    , \"list.nonempty.single.\": [\"list-element.\"]\n    , \"list.empty.\": []\n    , \"list-element.\": [Token.NUMBER]\n}\n```\n\n### 1.7 Step Alternatives\n\n### 1.8 Step Destinations\n\n### 1.9 Step Result Transformers\n\n### 1.10 Custom Targets\n\nThe `target` of a rule specifies the rule's return value once it has been matched - so to speak.\n\n# Reference\n\n### Complete list of Standard Tokens\n\n| Token Name  | Regular Expression  |\n| ----------- | ------------------ |\n| NEWLINE | \\\\r\\\\n\\|\\\\r\\|\\\\n |\n| DOUBLE_EQUAL | == |\n| EXCLAMATION_EQUAL | != |\n| LESS_EQUAL | \u003c= |\n| GREATER_EQUAL | \u003e= |\n| AND_EQUAL | \u0026= |\n| OR_EQUAL | \\|= |\n| XOR_EQUAL | \\^= |\n| PLUS_EQUAL | \\+= |\n| MINUS_EQUAL | -= |\n| TIMES_EQUAL | \\*= |\n| DIVIDES_EQUAL | /= |\n| DOUBLE_AND | \u0026\u0026 |\n| DOUBLE_OR | \\\\\\|\\\\\\| |\n| DOUBLE_PLUS | \\\\+\\\\+ |\n| DOUBLE_MINUS | -- |\n| PLUS | \\\\+ |\n| MINUS | - |\n| TIMES | \\* |\n| DIVIDES | / |\n| POWER | \\\\^ |\n| LESS | \u003c |\n| GREATER | \u003e |\n| LEFT_PARENTHESIS | \\\\( |\n| RIGHT_PARENTHESIS | \\\\) |\n| LEFT_SQUARE_BRACKET | \\\\[ |\n| RIGHT_SQUARE_BRACKET | \\\\] |\n| LEFT_CURLY_BRACKET | \\\\{ |\n| RIGHT_CURLY_BRACKET | \\\\} |\n| SEMICOLON | ; |\n| COLON | : |\n| COMMA | , |\n| HAT | \\\\^ |\n| DOT | \\\\. 
|\n| IDENTIFIER | [a-zA-Z_][a-zA-Z0-9_]*\\b |\n| NUMBER | (\\\\+\\|-)?([1-9][0-9]*(\\\\.[0-9]*)?\\\\b\\|\\\\.[0-9]+\\\\b\\|0\\\\b) |\n| STRING | \"(?P\\\u003cvalue\\\u003e([^\"]\\|\\\\\\\\\")+)\" |\n| UNKNOWN | . |\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduffsdevice%2Ftiny-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduffsdevice%2Ftiny-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduffsdevice%2Ftiny-parser/lists"}