{"id":22282754,"url":"https://github.com/warrenspe/tokex","last_synced_at":"2026-02-15T15:46:49.055Z","repository":{"id":57476134,"uuid":"55562792","full_name":"warrenspe/tokex","owner":"warrenspe","description":"Structured string parsing library","archived":false,"fork":false,"pushed_at":"2020-09-19T04:56:42.000Z","size":247,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-29T18:31:29.889Z","etag":null,"topics":["grammar","parsing","string-matching","token","tokenizer"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/warrenspe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-06T00:14:12.000Z","updated_at":"2022-12-11T17:14:35.000Z","dependencies_parsed_at":"2022-09-26T17:41:01.176Z","dependency_job_id":null,"html_url":"https://github.com/warrenspe/tokex","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/warrenspe/tokex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warrenspe%2Ftokex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warrenspe%2Ftokex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warrenspe%2Ftokex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warrenspe%2Ftokex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/warrenspe","download_url":"https://codeload.github.com/warrenspe/tokex/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/warrenspe%2Ftokex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29482797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-15T15:33:17.885Z","status":"ssl_error","status_checked_at":"2026-02-15T15:32:53.698Z","response_time":118,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["grammar","parsing","string-matching","token","tokenizer"],"created_at":"2024-12-03T16:35:50.475Z","updated_at":"2026-02-15T15:46:49.034Z","avatar_url":"https://github.com/warrenspe.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tokex\nA Python 2/3 compatible string parsing library allowing for parsing of complex strings into dictionaries and lists of tokens.\n\n## Why tokex?\nAdmittedly, with a complex enough regex, Python's built-in [re](https://docs.python.org/3.6/library/re.html) library will allow you to accomplish anything that you would be able to accomplish using tokex.  The main difference between the two is that re is focused on matching characters while tokex is focused on matching tokens.  
Compared to re however, tokex allows for a more spaced out, readable definition of a grammar which can result in fewer bugs than if it were written as a re pattern, and allows for grouping and reuse of grammar tokens as named sub grammars in a way reminiscent of BNF, which can significantly cut down on the overall size of the grammar.  Finally, tokex allows for Python style comments to be inserted directly into the grammar.\n\n## Usage\ntokex exposes two API functions: compile and match.\n\ntokex.**compile(**_input\\_grammar,_ _allow\\_sub\\_grammar\\_definitions=True_, _tokenizer=tokex.tokenizers.TokexTokenizer,_ _default\\_flags=tokex.flags.DEFAULTS,_ _debug=True_**)**\n\n\u003e Compile a tokex grammar into a Tokex object, which can be used for matching using its **match()** method.  If you intend to call match several times using the same input grammar, using a precompiled Tokex object can be slightly more performant, as the tokex grammar won't have to be parsed each time\n\u003e\n\u003e If *allow\\_sub\\_grammar\\_definitions* is set to True it will enable [Sub Grammars](#sub-grammars) within the given grammar. Note that tokex is susceptible to the [billion laughs](https://en.wikipedia.org/wiki/Billion_laughs) attack when compiling untrusted 3rd party grammars with this feature enabled.  If compilation of 3rd party grammars is ever required, sub grammar support should be turned off to mitigate this type of attack.\n\u003e\n\u003e A custom tokenizer can be passed through the _tokenizer_ parameter. If given it should be set to an instance/subclass of tokex.tokenizers.TokexTokenizer.\n\u003e\n\u003e  _default\\_flags_ can be passed as a set of strings of flags to apply to valid elements by default. Default flags can be overridden by specifying an opposing flag on elements in the grammar.  
See [Grammar Notes](#grammar-notes) for the set of flags which are applied by default.\n\u003e\n\u003e If _debug_ is passed as True, it will enable the logging logger (named \"tokex\"), which will print out debugging information regarding the grammar as it processes an input string.\n\ntokex.**match(**_input\\_grammar,_ _input_string,_ _match_entirety=True,_ _allow\\_sub\\_grammar\\_definitions=True,_ _tokenizer=tokex.tokenizers.TokexTokenizer,_ _default\\_flags=tokex.flags.DEFAULTS,_ _debug=True_**)**\n\n\u003e Matches a given tokex grammar against an input string and returns either a dictionary of named matches if the grammar matches the input string or None if it doesn't.\n\u003e\n\u003e If *match\\_entirety* is True the grammar will only match the input string if the entire input string is consumed.  If it is False, trailing tokens at the end of the input string may be ignored if they do not match the grammar.\n\u003e\n\u003e If *allow\\_sub\\_grammar\\_definitions* is set to True it will enable [Sub Grammars](#sub-grammars) within the given grammar. Note that tokex is susceptible to the [billion laughs](https://en.wikipedia.org/wiki/Billion_laughs) attack when compiling untrusted 3rd party grammars with this feature enabled.  If compilation of 3rd party grammars is ever required, sub grammar support should be turned off to mitigate this type of attack.\n\u003e\n\u003e  _default\\_flags_ can be passed as a set of strings of flags to apply to valid elements by default. Default flags can be overridden by specifying an opposing flag on elements in the grammar.  See [Grammar Notes](#grammar-notes) for the set of flags which are applied by default\n\u003e\n\u003e A custom tokenizer can be passed through the _tokenizer_ parameter. 
If given it should be set to an instance/subclass of tokex.tokenizers.TokexTokenizer.\n\u003e\n\u003e If _debug_ is passed as True, it will enable the logging logger (named \"tokex\"), which will print out debugging information regarding the grammar as it processes an input string.\n\n### Tokex Object\nA Tokex object (constructed using tokex.compile) has the following methods on it:\n\nTokex.**match(**_input_string,_ _match_entirety=True_, _debug=False_**)**\n\n\u003e Tokex.match runs a precompiled grammar against an input string and returns either a dictionary of named matches if the grammar matches the input string or None if it doesn't.\n\u003e\n\u003e If *match\\_entirety* is True the grammar will only match the input string if the entire input string is consumed.  If it is False, trailing tokens at the end of the input string may be ignored if they do not match the grammar.\n\u003e\n\u003e If _debug_ is passed as True, it will enable the logging logger (named \"tokex\"), which will print out debugging information regarding the grammar as it processes an input string.\n\n## Usage Examples\nThe following examples will show parsing of tokens in simplified SQL queries\n\n**Drop Query**\n```\n\u003e\u003e\u003e import tokex\n\u003e\u003e\u003e drop_tokex = tokex.compile(\"\"\"\n    'DROP'\n    \u003ctarget: ~table|database~\u003e\n    ?(if_exists: 'IF' 'EXISTS')\n    \u003cname: .\u003e\n\"\"\")\n\n\u003e\u003e\u003e drop_tokex.match(\"DROP DATABASE test_database\")\n{'target': 'DATABASE', 'name': 'test_database'}\n\n\u003e\u003e\u003e drop_tokex.match(\"DROP TABLE IF EXISTS test_table\")\n{'target': 'TABLE', 'if_exists': None, 'name': 'test_table'}\n\n\u003e\u003e\u003e drop_tokex.match(\"DROP test_table\") is None # Missing DATABASE or TABLE token\nTrue\n```\n\n**Update Query**\n```\n\u003e\u003e\u003e update_tokex = tokex.compile(r\"\"\"\n    'UPDATE' \u003ctable_name: .\u003e \"SET\"\n    +(columns:\n        \u003cname: .\u003e \"=\" \u003cvalue: .\u003e sep { 
',' }\n    )\n    ?('WHERE' +(where_clauses: \u003ctoken: !~(ORDER)|(LIMIT)~\u003e ) )\n    ?(order: 'ORDER' 'BY' \u003ccolumn: .\u003e \u003cdirection: ~(ASC)|(DESC)~\u003e )\n    ?('LIMIT' \u003climit: ~\\\\d+~\u003e )\n\"\"\")\n\n\u003e\u003e\u003e update_tokex.match(\"UPDATE test SET a=1, b=2, c = 3 WHERE a \u003e 0 AND b = 2 ORDER BY c DESC limit 1\")\n{\n    'table_name': 'test',\n    'columns': [\n        {'name': 'a', 'value': '1'},\n        {'name': 'b', 'value': '2'},\n        {'name': 'c', 'value': '3'}\n    ],\n    'where_clauses': [\n        {'token': 'a'}, {'token': '\u003e'}, {'token': '0'},\n        {'token': 'AND'},\n        {'token': 'b'}, {'token': '='}, {'token': '2'}\n    ],\n    'order': {'column': 'c', 'direction': 'DESC'},\n    'limit': '1'\n}\n\n\u003e\u003e\u003e update_tokex.match(\"UPDATE test SET a=1 LIMIT 1\")\n{\n    'table_name': 'test',\n    'columns': [{'name': 'a', 'value': '1'}],\n    'limit': '1'\n}\n\n\u003e\u003e\u003e update_tokex.match(\"UPDATE test_table SET WHERE a \u003e 1\") is None # Missing a column to set\nTrue\n```\n\n**Select Query**\n```\n\u003e\u003e\u003e select_tokex = tokex.compile(r\"\"\"\n    def join_condition {\n        +(conditions:  \u003ccondition: !~(INNER)|(LEFT)|(WHERE)|(ORDER)|(LIMIT)~\u003e)\n    }\n    def where_condition {\n        +(conditions: \u003ccondition: !~(ORDER)|(LIMIT)~\u003e )\n    }\n\n    'SELECT' ?(distinct: \"DISTINCT\")\n        +(select_attributes: \u003cname: !\"from\"\u003e sep { ',' } )\n    'FROM' \u003ctable: .\u003e\n    *(joins:\n        {\n            (inner: \"INNER\" \"JOIN\" \u003ctable: .\u003e \"ON\" join_condition() )\n            (left: \"LEFT\" \"JOIN\" \u003ctable: .\u003e \"ON\" join_condition() )\n        }\n    )\n    ?(where: \"WHERE\" where_condition() )\n    ?(order: \"ORDER\" \"BY\" \u003corder_by_column: .\u003e \u003corder_by_direction: ~(ASC)|(DESC)~\u003e )\n    ?(\"LIMIT\" \u003climit: ~\\\\d+~\u003e )\n\"\"\")\n\n\u003e\u003e\u003e 
select_tokex.match(\"SELECT * FROM test limit 1\")\n{\n    'select_attributes': [{'name': '*'}],\n    'table': 'test',\n    'limit': '1'\n}\n\n\u003e\u003e\u003e select_tokex.match(\"\"\"\n    SELECT a, b, c\n    FROM test_table\n    INNER JOIN a ON a = t\n    INNER JOIN b ON b = a\n    LEFT JOIN c ON c = a\n    WHERE a \u003e 1 AND b \u003c 2\n    ORDER BY a DESC\n    LIMIT 2\n\"\"\")\n{\n    'select_attributes': [{'name': 'a'}, {'name': 'b'}, {'name': 'c'}],\n    'table': 'test_table',\n    'joins': [\n        {\n            'inner': {\n                'table': 'a', 'conditions': [{'condition': 'a'}, {'condition': '='}, {'condition': 't'}]\n            }\n        },\n        {\n            'inner': {\n                'table': 'b', 'conditions': [{'condition': 'b'}, {'condition': '='}, {'condition': 'a'}]\n            }\n        },\n        {\n            'left': {\n                'table': 'c',\n                'conditions': [{'condition': 'c'}, {'condition': '='}, {'condition': 'a'}]\n            }\n        }\n    ],\n    'where': {\n        'conditions': [\n            {'condition': 'a'}, {'condition': '\u003e'}, {'condition': '1'},\n            {'condition': 'AND'},\n            {'condition': 'b'}, {'condition': '\u003c'}, {'condition': '2'}\n        ]\n    },\n    'order': {\n        'order_by_column': 'a', 'order_by_direction': 'DESC'\n    },\n    'limit': '2'\n}\n\n\u003e\u003e\u003e select_tokex.match(\"SELECT FROM test\") is None # Missing select columns\nTrue\n```\n\n## Input String Tokenization\n- By default, input strings will be tokenized using the default tokenizer, which tokenizes tokens using the following order of precedence:\n\n  All occurrances of `\"[^\"]*\"` or `'[^']*'` are broken up into their own tokens\n  All alphanumeric strings are broken into their own token (strings of consecutive a-z, A-Z, 0-9, \\_)\n  All other non-white space characters are broken up into their own 1-character tokens.\n\n  You can also specify that newlines should 
be tokenized by instantiating a new instance of `tokenizers.tokenizer.TokexTokenizer`, passing `tokenize_newlines=True`.\n  If you do this, you can also pass `ignore_empty_lines=True`, which will prevent the trailing newline on empty lines from being tokenized.\n\n  The tokenizing behavior can be further modified by creating a new subclass of `tokenizers.tokenizer.TokexTokenizer`.\n  For minor customizations to the base tokenization you can override the base class's `tokenizer_regexes` attribute.  This attribute is set to a list of regular expression strings (strings that could be passed to re.compile) of tokens to match.  Strings at the start of the list take precedence over strings at the end (i.e., they will be tried on each position of the input string in order).\n  For full control over tokenization, you can override the base class's `tokenize` method.  It should accept a string to tokenize and return a list of parsed tokens.\n\n\n## Defining A Grammar\nBelow is a description of each type of grammar element that can be used to construct a tokex grammar.\n#### Grammar Notes\n- Certain elements can take names, for example\n  - Sub Grammars: `def grammar_name { ... }`\n  - Named Sections: `(section_name: ... )`\n  These names can consist of any characters from the following sets: a-z, A-Z, 0-9, \\_, and -\n- Use \\ to escape characters within certain elements.  For example:\n  - \"a string with an \\\" embedded quote\"\n  - \\~a regular expression with an \\~ embedded tilde\\~\n  Note that this also means that you have to escape backslashes within regular expressions.  Two backslashes in a grammar = 1 backslash in the regular expression.  So a total of 4 are needed to match a literal backslash character using the regular expression\n  - \\~a regular expression with an \\\\\\\\ embedded backslash\\~\n- Comments can be included in grammars in a similar fashion to Python by using #.  
They can appear anywhere in a line and all characters afterwards are considered a part of the comment\n- Some flags are set by default; these can be overridden by passing a custom set of default flags to match/compile:\n  - Case Insensitive **i**\n\n##### Backtracking\nIt is very important to note that unlike `re`, tokex does not attempt to do any backtracking while it matches a user string.  In other words, grammar elements are always greedy: if they can continue to match tokens in the input string, they will.  This design decision was made primarily for performance reasons.  Tokex grammars are often both longer than a typical `re` regex, and include more nested iterable sections.  See https://en.m.wikipedia.org/wiki/ReDoS#Exponential_backtracking for more information regarding this.\n\n### String Literal\nMatches an input token exactly.\n\n#### Syntax\n`\"String Literal\"`\n\nor\n\n`'String Literal'`\n\n#### Valid Flags\n- Case Sensitive: **s**\n  - `s\"Case Sensitive String\"` - Case of input token must also match case of grammar element to match\n- Case Insensitive: **i**\n  - `i\"Case Insensitive String\"` - Case of input token does not need to match case of grammar element to match\n- Quoted: **q**\n  - `q\"Quoted String\"` - Input token must additionally be wrapped by either ' or \" to match the grammar element.\n- Unquoted: **u**\n  - `u\"Unquoted String\"` - If the input token is wrapped by ' or \" it will not match the grammar element.\n- Not: **!**\n  - `!\"Not String\"` - The input token matches the grammar element if it does not match the given string.\n  - Note: **!** is applied after any other given flags, for example `!q\"asdf\"` matches any string which is not `\"asdf\"` or `'asdf'`\n\n#### Examples\n```\n\u003e\u003e\u003e string_literal_tokex = tokex.compile(\"'abc' 'def' 'g'\")\n\u003e\u003e\u003e string_literal_tokex.match(\"abc def g\") # Matches\n\u003e\u003e\u003e string_literal_tokex.match(\"g def abc\") is None # Does not 
match\n```\n\n### Regular Expressions\nMatches if the `re` regular expression it contains matches the input token.\n\n#### Syntax\n`~Regular Expression~`\n\n#### Valid Flags\n- Case Sensitive: **s**\n  - `s~Case Sensitive Regular Expression~` - Case of input token must also match case of grammar element expression to match\n- Case Insensitive: **i**\n  - `i~Case Insensitive Regular Expression~` - Case of input token does not need to match case of grammar element expression to match\n- Quoted: **q**\n  - `q~Quoted Regular Expression~` - Input token must additionally be wrapped by either ' or \" to match the grammar element expression.\n- Unquoted: **u**\n  - `u~Unquoted Regular Expression~` - If the input token is wrapped by ' or \" it will not match the grammar element expression.\n- Not: **!**\n  - `!~Not Regular Expression~` - The input token matches the grammar element if it does not match the grammar element expression.\n  - Note: **!** is applied after any other given flags, for example `!q\"asdf\"` matches any string which is not \"asdf\" or 'asdf'\n\n\n#### Examples\n```\n\u003e\u003e\u003e regular_expression_tokex = tokex.compile(\"~(yes)|(no)|(maybe)~\")\n\u003e\u003e\u003e regular_expression_tokex.match(\"maybe\") # Matches\n\n\u003e\u003e\u003e numeric_regular_expression_tokex = tokex.compile(r\"~\\\\d+~\")\n\u003e\u003e\u003e numeric_regular_expression_tokex.match(\"4570\") # Matches\n```\n\n### Any String\nMatches any non-whitespace input token.\nNote: This element will not match a newline (if newlines have been [tokenized](#input-string-tokenization)).\n\n#### Syntax\n`.`\n\n#### Valid Flags\n- Quoted: **q**\n  - `q.` - Matches any quoted (wrapped by either ' or \") non-whitespace input token\n- Unquoted: **u**\n  - `u.` - Matches any unquoted (not wrapped by either ' or \") non-whitespace input token\n\n#### Examples\n```\n\u003e\u003e\u003e any_string_tokex = tokex.compile(\".\")\n\u003e\u003e\u003e any_string_tokex.match(\"maybe\") # 
Matches\n\u003e\u003e\u003e any_string_tokex.match(\"'ANYTHING'\") # Matches\n```\n\n### Newline\nMatches a newline in an input string.\nNote that Newline elements will only match newlines in input strings if newlines are tokenized by setting `tokenize_newlines=True` on the tokenizer. See [Input String Tokenization](#input-string-tokenization)\n\n#### Syntax\n`$`\n\n#### Examples\n```\n\u003e\u003e\u003e newline_tokex = tokex.compile(\". $ .\", tokenizer=tokex.tokenizers.TokexTokenizer(tokenize_newlines=True))\n\u003e\u003e\u003e newline_tokex.match(\"something \\n else \") # Matches\n\u003e\u003e\u003e newline_tokex.match(\"something else \") # Does not match\n```\n\n### Named Tokens\nA token matched by an element wrapped in a named token is recorded under that name in the nearest named section.\nNote: Only singular elements (documented above, not below) can be wrapped inside a named token\n\n#### Syntax\n`\u003ctoken-name: ...\u003e`\n\n#### Examples\n```\n\u003e\u003e\u003e named_token_tokex = tokex.compile(\"\u003ctoken: .\u003e\")\n\u003e\u003e\u003e named_token_tokex.match(\"some_token\")\n{'token': 'some_token'}\n```\n\n### Named Section\nA named section does not actually match any tokens in an input string; instead, it acts as a container for the elements within it and has the following two effects:\n1. The elements within it will only match an input string if they *all* match.  If the contained elements do not all match, then none of them will match.\n2. 
The output from any matching named tokens within a named section will be grouped together into a single dictionary\n\n#### Syntax\n`(name: ...)`\n\n#### Examples\n```\n\u003e\u003e\u003e named_grammar_tokex = tokex.compile(\"(test: 'a' \u003cmiddle: .\u003e 'c')\")\n\u003e\u003e\u003e named_grammar_tokex.match(\"a b c\") # Matches\n{'test': {'middle': 'b'}}\n\u003e\u003e\u003e named_grammar_tokex.match(\"a b\") # Does not match\n```\n\n### Zero Or One (optionally Named) Section\nActs the same way as a regular Named Section, but matches an input string zero or one times.  In other words, the elements it contains are optional.\nNote: A Zero Or One section can be given a name or not.  If it is, all the named tokens within it will be grouped up into a dictionary mapped to by the name you give the section.  If it isn't, all named matches will be populated in the nearest parent named grammar\n\n#### Syntax\n`?(name: ... )`\n\nor\n\n`?( ... )`\n\n\n#### Examples\n```\n\u003e\u003e\u003e zero_or_one_tokex = tokex.compile(\"'prefix' ?( \u003cmiddle_element: !'suffix'\u003e) 'suffix'\")\n\u003e\u003e\u003e zero_or_one_tokex.match(\"prefix middle_token suffix\") # Matches\n{'middle_element': 'middle_token'}\n\u003e\u003e\u003e zero_or_one_tokex.match(\"prefix suffix\") # Still matches\n\n\u003e\u003e\u003e zero_or_one_tokex = tokex.compile(\"'SELECT' ?(distinct: 'distinct')\")\n\u003e\u003e\u003e zero_or_one_tokex.match(\"select distinct\") # Matches\n{'distinct': None}\n\u003e\u003e\u003e zero_or_one_tokex.match(\"select\") # Matches\n{}\n```\n\n### Zero Or More Named Section\nActs the same way as a regular Named Section, but matches an input string zero or more times.  In other words, the elements it contains are optional, or can be present one or more times.\n\nNotes:\n - Each time a zero or more named section matches it will create a new dictionary \"context\" which all named sections will populate.  
On the next iteration, it creates a fresh dictionary for the named sections to populate.\n - A zero or more named section can have an optional iteration delimiter section specified on it, using the following syntax: `sep { ... }`.  The effect of doing this is that subsequent matches (after the first match) will only match if the grammar elements defined within the `sep { ... }` are present.  If any named tokens appear within the iteration delimiter section they will populate the previous iteration's dictionary.\n\n#### Syntax\n`*(name: ... )`\n\n`*(name: ... sep { ... } )` (the grammar within the `sep { ... }` must occur between each match of the section)\n\n#### Examples\n```\n\u003e\u003e\u003e zero_or_more_grammar = tokex.compile(\"*(as: \u003ca: 'a'\u003e) *(bs: \u003cb: 'b'\u003e)\")\n\u003e\u003e\u003e zero_or_more_grammar.match(\"a a b b b\")\n{'as': [{'a': 'a'}, {'a': 'a'}], 'bs': [{'b': 'b'}, {'b': 'b'}, {'b': 'b'}]}\n\u003e\u003e\u003e zero_or_more_grammar.match(\"b b\")\n{'bs': [{'b': 'b'}, {'b': 'b'}]}\n\n\u003e\u003e\u003e zero_or_more_grammar = tokex.compile(\"*(letters: \u003cletter: .\u003e sep { ',' })\")\n\u003e\u003e\u003e zero_or_more_grammar.match(\"a, b, c\")\n{'letters': [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]}\n\u003e\u003e\u003e zero_or_more_grammar.match(\"a, b c\") # Does not match, as there's no , between b and c\n```\n\n### One Or More Section\nActs the same way as a regular Named Section, but matches an input string one or more times.  In other words, the elements it contains are required, and can be present one or more times.\n\nNotes:\n - Each time a one or more named section matches it will create a new dictionary \"context\" which all named sections will populate.  On the next iteration, it creates a fresh dictionary for the named sections to populate.\n - A one or more named section can have an optional iteration delimiter section specified on it, using the following syntax: `sep { ... }`.  
The effect of doing this is that subsequent matches (after the first match) will only match if the grammar elements defined within the `sep { ... }` are present.  If any named tokens appear within the iteration delimiter section they will populate the previous iteration's dictionary.\n\n#### Syntax\n`+(name: ... )`\n\n`+(name: ... sep { ... } )` (the grammar within the `sep { ... }` must occur between each match of the section)\n\n#### Examples\n```\n\u003e\u003e\u003e one_or_more_grammar = tokex.compile(\"+(as: \u003ca: 'a'\u003e) +(bs: \u003cb: 'b'\u003e)\")\n\u003e\u003e\u003e one_or_more_grammar.match(\"a a b b b\")\n{'as': [{'a': 'a'}, {'a': 'a'}], 'bs': [{'b': 'b'}, {'b': 'b'}, {'b': 'b'}]}\n\u003e\u003e\u003e one_or_more_grammar.match(\"b b\") # Does not match, as there are no a's\n\n\u003e\u003e\u003e one_or_more_grammar = tokex.compile(\"+(letters: \u003cletter: .\u003e sep { ',' })\")\n\u003e\u003e\u003e one_or_more_grammar.match(\"a, b, c\")\n{'letters': [{'letter': 'a'}, {'letter': 'b'}, {'letter': 'c'}]}\n\u003e\u003e\u003e one_or_more_grammar.match(\"a, b c\") # Does not match, as there's no , between b and c\n```\n\n### One of Set\nSpecifies that one grammar of the set of contained grammars should match the input string at the current position.\nWill attempt to match each grammar in order until one matches.\n\n#### Syntax\n`{ ... 
}`\n\n#### Examples\nMatch one grammar of a set, zero or many times:\n```\n\u003e\u003e\u003e one_of_set_tokex = tokex.compile(\"\"\"\n    'ALTER' 'TABLE' \u003ctable_name: .\u003e\n    *(conditions:\n        {\n            (add_column: 'add' 'column' \u003cname: .\u003e \u003ctype: .\u003e)\n            (remove_column: 'remove' 'column' \u003cname: .\u003e)\n            (modify_column: 'modify' 'column' \u003cname: .\u003e \u003cnew_name: .\u003e \u003cnew_type: .\u003e)\n            (add_index: 'add' 'index' \u003ccolumn: .\u003e)\n            (remove_index: 'remove' 'index' \u003ccolumn: .\u003e)\n        } sep { ',' }\n    )\n\"\"\")\n\u003e\u003e\u003e one_of_set_tokex.match(\"\"\"\n    ALTER TABLE test\n    ADD COLUMN a int,\n    REMOVE COLUMN a_old,\n    REMOVE INDEX a_old\n\"\"\")\n{\n    'table_name': 'test',\n    'conditions': [\n        {'add_column': {'name': 'a', 'type': 'int'}},\n        {'remove_column': {'name': 'a_old'}},\n        {'remove_index': {'column': 'a_old'}}\n    ]\n}\n\n```\n\n### Sub Grammars\nDefines a named sub grammar which can be later referenced by using: `sub_grammar_name()`.\n\n#### Syntax\n`def name { ... }`\n\n#### Notes:\n- Defined sub grammars can be nested arbitrarily, but only exist within the scope of the\n  namespace of the sub grammar they were defined in.  Sub grammars defined outside of any\n  other sub grammars are considered global. Example:\n```\ndef grammar_a {\n    def grammar_b { 'Grammar B Only exists inside grammar_a' }\n    grammar_b() '\u003c- This works'\n}\ngrammar_b() \"\u003c- This raises an exception; as it is undefined outside of grammar_a's scope.\"\n```\n- Defined sub grammars will be expanded when the grammar is compiled.  
This, combined with\n  the ability to arbitrarily recurse defined sub grammars means that grammar compilation is\n  susceptible to the [Billion Laughs](https://en.wikipedia.org/wiki/Billion_laughs) attack.\n  Because of this, you should either not compile untrusted 3rd party grammars, or you should\n  disable sub grammar definitions when compiling 3rd party grammars (see documentation below).\n- Defined sub grammars can occur anywhere within your grammar, however the act of defining a\n  sub grammar does not have any impact on your tokex grammar until it is used.  For example:\n  `'a' def b { 'b' } 'c'` does not match `'a b c'`, but does match `'a c'`\n  `'a' def b { 'b' } b() 'c'` matches `'a b c'`\n- Defined sub grammars cannot be applied until their declaration is finished.  For example,\n  while the following is valid:\n```\ndef a {\n    'a'\n    def b { 'b' }\n    b()\n}\na()\n```\n(Matches \"a b\")\nThe following raises an exception.\n```\ndef a {\n    'a'\n    def b { a() }\n}\n```\n(`a()` cannot appear until the sub grammar 'a' is completed)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarrenspe%2Ftokex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwarrenspe%2Ftokex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarrenspe%2Ftokex/lists"}