{"id":22764606,"url":"https://github.com/kuria/parser","last_synced_at":"2025-08-04T10:37:47.136Z","repository":{"id":57009946,"uuid":"143571421","full_name":"kuria/parser","owner":"kuria","description":"Character-by-character string parsing library","archived":false,"fork":false,"pushed_at":"2023-04-22T14:38:47.000Z","size":64,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-30T10:14:11.748Z","etag":null,"topics":["parser","php"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kuria.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-05T00:08:56.000Z","updated_at":"2023-09-19T17:45:24.000Z","dependencies_parsed_at":"2025-02-05T12:12:05.365Z","dependency_job_id":"cccd0b61-00ec-43c9-b963-fff0a3efd857","html_url":"https://github.com/kuria/parser","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/kuria/parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuria%2Fparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuria%2Fparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuria%2Fparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuria%2Fparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kuria","download_url":"https://codeload.github.com/kuria/parse
r/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuria%2Fparser/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268682874,"owners_count":24289691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","php"],"created_at":"2024-12-11T12:09:28.659Z","updated_at":"2025-08-04T10:37:47.078Z","avatar_url":"https://github.com/kuria.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"Parser\n######\n\nCharacter-by-character string parsing library.\n\n.. image:: https://travis-ci.com/kuria/parser.svg?branch=master\n   :target: https://travis-ci.com/kuria/parser\n\n.. contents::\n   :depth: 2\n\n\nFeatures\n********\n\n- line number tracking (can be disabled for performance)\n- supports CR, LF and CRLF line endings\n- verbose exceptions\n- many methods to navigate and operate the parser\n\n  - forward / backward peeking and seeking\n  - forward / backward character consumption\n  - state stack\n\n- character types\n- expectations\n\n\nRequirements\n************\n\n- PHP 7.1+\n\n\nUsage\n*****\n\nCreating a parser\n=================\n\nCreate a new parser instance with string input.\n\nThe parser begins at the first character.\n\n.. 
code:: php\n\n   \u003c?php\n\n   use Kuria\\Parser\\Parser;\n\n   $input = 'foo bar baz';\n\n   $parser = new Parser($input);\n\n\nParser properties\n=================\n\nThe parser has several public properties that can be used to inspect its\ncurrent state:\n\n- ``$parser-\u003ei`` - current position\n- ``$parser-\u003echar`` - current character (or ``NULL`` at the end of input)\n- ``$parser-\u003elastChar`` - last character (or ``NULL`` at the start of input)\n- ``$parser-\u003eline`` - current line (or ``NULL`` if line tracking is disabled)\n- ``$parser-\u003eend`` - end of input indicator (``TRUE`` at the end, ``FALSE`` otherwise)\n- ``$parser-\u003evars`` - user-defined variables attached to the current state\n\n.. WARNING::\n\n   All of the public properties (with the exception of ``$parser-\u003evars``)\n   are read-only and must not be modified directly by the calling code.\n\n   Use the built-in parser methods to mutate the parser state.\n   See `Parser method overview`_.\n\n\nParser method overview\n======================\n\nRefer to doc comments of the respective methods for more information.\n\nAlso see `Character types`_.\n\n\nStatic methods\n--------------\n\n- ``getCharType($char): int`` - determine character type\n- ``getCharTypeName($charType): string`` - get human-readable character type name\n\n\nInstance methods\n----------------\n\n- ``getInput(): string`` - get the input string\n- ``setInput($input): void`` - replace the input string (this also resets the parser)\n- ``getLength(): int`` - get length of the input string\n- ``isTrackingLineNumbers(): bool`` - see if line number tracking is enabled\n- ``type(): int`` - get type of the current character\n- ``is(...$types): bool`` - check whether the current character is of one of the specified types\n- ``atNewline(): bool`` - see if the parser is at the start of a newline sequence\n- ``eat(): ?string`` - go to the next character and return the current one (returns ``NULL`` at the end)\n- 
``spit(): ?string`` - go to the previous character and return the current one (returns ``NULL`` at the beginning)\n- ``shift(): ?string`` - go to the next character and return it (returns ``NULL`` at the end)\n- ``unshift(): ?string`` - go to the previous character and return it (returns ``NULL`` at the beginning)\n- ``peek($offset, $absolute = false): ?string`` - get character at the given offset or absolute position (does not affect state)\n- ``seek($offset, $absolute = false): void`` - alter current position\n- ``reset(): void`` - reset states, vars and rewind to the beginning\n- ``rewind(): void`` - rewind to the beginning\n- ``eatChar($char): ?string`` - consume specific character and return the next character\n- ``tryEatChar($char): bool`` - attempt to consume specific character and return success state\n- ``eatType($type): string`` - consume all characters of the specified type\n- ``eatTypes($typeMap): string`` - consume all characters of the specified types\n- ``eatWs(): string`` - consume whitespace, if any\n- ``eatUntil($delimiterMap, $skipDelimiter = true, $allowEnd = false): string`` - consume all characters until the specified delimiters\n- ``eatUntilEol($skip = true): string`` - consume all characters until end of line or input\n- ``eatEol(): string`` - consume end of line sequence\n- ``eatRest(): string`` - consume remaining characters\n- ``getChunk($start, $end): string`` - get chunk of the input (does not affect state)\n- ``detectEol(): ?string`` - find and return the next end of line sequence (does not affect state)\n- ``countStates(): int`` - get number of stored states\n- ``pushState(): void`` - store the current state\n- ``revertState(): void`` - revert to the last stored state and pop it\n- ``popState(): void`` - pop the last stored state without reverting to it\n- ``clearStates(): void`` - throw away all stored states\n- ``expectEnd(): void`` - ensure that the parser is at the end\n- ``expectNotEnd(): void`` - ensure that the parser is not at the 
end\n- ``expectChar($expectedChar): void`` - ensure that the current character matches the expectation\n- ``expectCharType($expectedType): void`` - ensure that the current character is of the given type\n\n\nExample INI parser implementation\n=================================\n\n.. code:: php\n\n   \u003c?php\n\n   use Kuria\\Parser\\Parser;\n\n   /**\n    * INI parser (example)\n    */\n   class IniParser\n   {\n       /**\n        * Parse an INI string\n        */\n       public function parse(string $string): array\n       {\n           // create parser\n           $parser = new Parser($string);\n\n           // prepare variables\n           $data = [];\n           $currentSection = null;\n\n           // parse\n           while (!$parser-\u003eend) {\n               // skip whitespace\n               $parser-\u003eeatWs();\n               if ($parser-\u003eend) {\n                   break;\n               }\n\n               // parse the current thing\n               if ($parser-\u003echar === '[') {\n                   // a section\n                   $currentSection = $this-\u003eparseSection($parser);\n               } elseif ($parser-\u003echar === ';') {\n                   // a comment\n                   $this-\u003eskipComment($parser);\n               } else {\n                   // a key=value pair\n                   [$key, $value] = $this-\u003eparseKeyValue($parser);\n\n                   // add to output\n                   if ($currentSection === null) {\n                       $data[$key] = $value;\n                   } else {\n                       $data[$currentSection][$key] = $value;\n                   }\n               }\n           }\n\n           return $data;\n       }\n\n       /**\n        * Parse a section and return its name\n        */\n       private function parseSection(Parser $parser): string\n       {\n           // we should be at the [ character now, eat it\n           $parser-\u003eeatChar('[');\n\n           // eat 
everything until ]\n           $sectionName = $parser-\u003eeatUntil(']');\n\n           return $sectionName;\n       }\n\n       /**\n        * Skip a commented-out line\n        */\n       private function skipComment(Parser $parser): void\n       {\n           // we should be at the ; character now, eat it\n           $parser-\u003eeatChar(';');\n\n           // eat everything until the end of line\n           $parser-\u003eeatUntilEol();\n       }\n\n       /**\n        * Parse a key=value pair\n        */\n       private function parseKeyValue(Parser $parser): array\n       {\n           // we should be at the first character of the key\n           // eat characters until = is found\n           $key = $parser-\u003eeatUntil('=');\n\n           // eat everything until the end of line\n           // that is our value\n           $value = trim($parser-\u003eeatUntilEol());\n\n           return [$key, $value];\n       }\n   }\n\n\nUsing the parser\n----------------\n\n.. code:: php\n\n   \u003c?php\n\n   $iniParser = new IniParser();\n\n   $iniString = \u003c\u003c\u003cINI\n   ; An example comment\n   name=Foo\n   type=Bar\n\n   [options]\n   size=150x100\n   onload=\n   INI;\n\n   $data = $iniParser-\u003eparse($iniString);\n\n   print_r($data);\n\nOutput:\n\n::\n\n  Array\n  (\n      [name] =\u003e Foo\n      [type] =\u003e Bar\n      [options] =\u003e Array\n          (\n              [size] =\u003e 150x100\n              [onload] =\u003e\n          )\n\n  )\n\n\nCharacter types\n***************\n\nThe table below lists the default character types.\n\nThese types are available as constants on the ``Parser`` class:\n\n- ``Parser::C_NONE`` - no character (NULL)\n- ``Parser::C_WS`` - whitespace (tab, linefeed, vertical tab, form feed, carriage return and space)\n- ``Parser::C_NUM`` - numeric character (``0-9``)\n- ``Parser::C_STR`` - string character (``a-z``, ``A-Z``, ``_`` and any 8-bit char)\n- ``Parser::C_CTRL`` - control character (ASCII 127 and ASCII \u003c 
32 except whitespace)\n- ``Parser::C_SPECIAL`` - ``!\"#$%\u0026'()*+,-./:;\u003c=\u003e?@[\\\\]^\\`{|}~``\n\n\n\n==== ========= =========\n#    Character Type\n==== ========= =========\nNULL *none*    C_NONE\n0    ``0x00``  C_CTRL\n1    ``0x01``  C_CTRL\n2    ``0x02``  C_CTRL\n3    ``0x03``  C_CTRL\n4    ``0x04``  C_CTRL\n5    ``0x05``  C_CTRL\n6    ``0x06``  C_CTRL\n7    ``0x07``  C_CTRL\n8    ``0x08``  C_CTRL\n9    ``\\t``    C_WS\n10   ``\\n``    C_WS\n11   ``\\v``    C_WS\n12   ``\\f``    C_WS\n13   ``\\r``    C_WS\n14   ``0x0e``  C_CTRL\n15   ``0x0f``  C_CTRL\n16   ``0x10``  C_CTRL\n17   ``0x11``  C_CTRL\n18   ``0x12``  C_CTRL\n19   ``0x13``  C_CTRL\n20   ``0x14``  C_CTRL\n21   ``0x15``  C_CTRL\n22   ``0x16``  C_CTRL\n23   ``0x17``  C_CTRL\n24   ``0x18``  C_CTRL\n25   ``0x19``  C_CTRL\n26   ``0x1a``  C_CTRL\n27   ``0x1b``  C_CTRL\n28   ``0x1c``  C_CTRL\n29   ``0x1d``  C_CTRL\n30   ``0x1e``  C_CTRL\n31   ``0x1f``  C_CTRL\n32   ``0x20``  C_WS\n33   ``!``     C_SPECIAL\n34   ``\"``     C_SPECIAL\n35   ``#``     C_SPECIAL\n36   ``$``     C_SPECIAL\n37   ``%``     C_SPECIAL\n38   ``\u0026``     C_SPECIAL\n39   ``'``     C_SPECIAL\n40   ``(``     C_SPECIAL\n41   ``)``     C_SPECIAL\n42   ``*``     C_SPECIAL\n43   ``+``     C_SPECIAL\n44   ``,``     C_SPECIAL\n45   ``-``     C_SPECIAL\n46   ``.``     C_SPECIAL\n47   ``/``     C_SPECIAL\n48   ``0``     C_NUM\n49   ``1``     C_NUM\n50   ``2``     C_NUM\n51   ``3``     C_NUM\n52   ``4``     C_NUM\n53   ``5``     C_NUM\n54   ``6``     C_NUM\n55   ``7``     C_NUM\n56   ``8``     C_NUM\n57   ``9``     C_NUM\n58   ``:``     C_SPECIAL\n59   ``;``     C_SPECIAL\n60   ``\u003c``     C_SPECIAL\n61   ``=``     C_SPECIAL\n62   ``\u003e``     C_SPECIAL\n63   ``?``     C_SPECIAL\n64   ``@``     C_SPECIAL\n65   ``A``     C_STR\n66   ``B``     C_STR\n67   ``C``     C_STR\n68   ``D``     C_STR\n69   ``E``     C_STR\n70   ``F``     C_STR\n71   ``G``     C_STR\n72   ``H``     C_STR\n73   ``I``     C_STR\n74   ``J``     C_STR\n75   ``K`` 
    C_STR\n76   ``L``     C_STR\n77   ``M``     C_STR\n78   ``N``     C_STR\n79   ``O``     C_STR\n80   ``P``     C_STR\n81   ``Q``     C_STR\n82   ``R``     C_STR\n83   ``S``     C_STR\n84   ``T``     C_STR\n85   ``U``     C_STR\n86   ``V``     C_STR\n87   ``W``     C_STR\n88   ``X``     C_STR\n89   ``Y``     C_STR\n90   ``Z``     C_STR\n91   ``[``     C_SPECIAL\n92   ``\\``     C_SPECIAL\n93   ``]``     C_SPECIAL\n94   ``^``     C_SPECIAL\n95   ``_``     C_STR\n96   \\`        C_SPECIAL\n97   ``a``     C_STR\n98   ``b``     C_STR\n99   ``c``     C_STR\n100  ``d``     C_STR\n101  ``e``     C_STR\n102  ``f``     C_STR\n103  ``g``     C_STR\n104  ``h``     C_STR\n105  ``i``     C_STR\n106  ``j``     C_STR\n107  ``k``     C_STR\n108  ``l``     C_STR\n109  ``m``     C_STR\n110  ``n``     C_STR\n111  ``o``     C_STR\n112  ``p``     C_STR\n113  ``q``     C_STR\n114  ``r``     C_STR\n115  ``s``     C_STR\n116  ``t``     C_STR\n117  ``u``     C_STR\n118  ``v``     C_STR\n119  ``w``     C_STR\n120  ``x``     C_STR\n121  ``y``     C_STR\n122  ``z``     C_STR\n123  ``{``     C_SPECIAL\n124  ``|``     C_SPECIAL\n125  ``}``     C_SPECIAL\n126  ``~``     C_SPECIAL\n127  ``0x7f``  C_CTRL\n128  ``0x80``  C_STR\n129  ``0x81``  C_STR\n130  ``0x82``  C_STR\n131  ``0x83``  C_STR\n132  ``0x84``  C_STR\n133  ``0x85``  C_STR\n134  ``0x86``  C_STR\n135  ``0x87``  C_STR\n136  ``0x88``  C_STR\n137  ``0x89``  C_STR\n138  ``0x8a``  C_STR\n139  ``0x8b``  C_STR\n140  ``0x8c``  C_STR\n141  ``0x8d``  C_STR\n142  ``0x8e``  C_STR\n143  ``0x8f``  C_STR\n144  ``0x90``  C_STR\n145  ``0x91``  C_STR\n146  ``0x92``  C_STR\n147  ``0x93``  C_STR\n148  ``0x94``  C_STR\n149  ``0x95``  C_STR\n150  ``0x96``  C_STR\n151  ``0x97``  C_STR\n152  ``0x98``  C_STR\n153  ``0x99``  C_STR\n154  ``0x9a``  C_STR\n155  ``0x9b``  C_STR\n156  ``0x9c``  C_STR\n157  ``0x9d``  C_STR\n158  ``0x9e``  C_STR\n159  ``0x9f``  C_STR\n160  ``0xa0``  C_STR\n161  ``0xa1``  C_STR\n162  ``0xa2``  C_STR\n163  ``0xa3``  C_STR\n164  ``0xa4`` 
 C_STR\n165  ``0xa5``  C_STR\n166  ``0xa6``  C_STR\n167  ``0xa7``  C_STR\n168  ``0xa8``  C_STR\n169  ``0xa9``  C_STR\n170  ``0xaa``  C_STR\n171  ``0xab``  C_STR\n172  ``0xac``  C_STR\n173  ``0xad``  C_STR\n174  ``0xae``  C_STR\n175  ``0xaf``  C_STR\n176  ``0xb0``  C_STR\n177  ``0xb1``  C_STR\n178  ``0xb2``  C_STR\n179  ``0xb3``  C_STR\n180  ``0xb4``  C_STR\n181  ``0xb5``  C_STR\n182  ``0xb6``  C_STR\n183  ``0xb7``  C_STR\n184  ``0xb8``  C_STR\n185  ``0xb9``  C_STR\n186  ``0xba``  C_STR\n187  ``0xbb``  C_STR\n188  ``0xbc``  C_STR\n189  ``0xbd``  C_STR\n190  ``0xbe``  C_STR\n191  ``0xbf``  C_STR\n192  ``0xc0``  C_STR\n193  ``0xc1``  C_STR\n194  ``0xc2``  C_STR\n195  ``0xc3``  C_STR\n196  ``0xc4``  C_STR\n197  ``0xc5``  C_STR\n198  ``0xc6``  C_STR\n199  ``0xc7``  C_STR\n200  ``0xc8``  C_STR\n201  ``0xc9``  C_STR\n202  ``0xca``  C_STR\n203  ``0xcb``  C_STR\n204  ``0xcc``  C_STR\n205  ``0xcd``  C_STR\n206  ``0xce``  C_STR\n207  ``0xcf``  C_STR\n208  ``0xd0``  C_STR\n209  ``0xd1``  C_STR\n210  ``0xd2``  C_STR\n211  ``0xd3``  C_STR\n212  ``0xd4``  C_STR\n213  ``0xd5``  C_STR\n214  ``0xd6``  C_STR\n215  ``0xd7``  C_STR\n216  ``0xd8``  C_STR\n217  ``0xd9``  C_STR\n218  ``0xda``  C_STR\n219  ``0xdb``  C_STR\n220  ``0xdc``  C_STR\n221  ``0xdd``  C_STR\n222  ``0xde``  C_STR\n223  ``0xdf``  C_STR\n224  ``0xe0``  C_STR\n225  ``0xe1``  C_STR\n226  ``0xe2``  C_STR\n227  ``0xe3``  C_STR\n228  ``0xe4``  C_STR\n229  ``0xe5``  C_STR\n230  ``0xe6``  C_STR\n231  ``0xe7``  C_STR\n232  ``0xe8``  C_STR\n233  ``0xe9``  C_STR\n234  ``0xea``  C_STR\n235  ``0xeb``  C_STR\n236  ``0xec``  C_STR\n237  ``0xed``  C_STR\n238  ``0xee``  C_STR\n239  ``0xef``  C_STR\n240  ``0xf0``  C_STR\n241  ``0xf1``  C_STR\n242  ``0xf2``  C_STR\n243  ``0xf3``  C_STR\n244  ``0xf4``  C_STR\n245  ``0xf5``  C_STR\n246  ``0xf6``  C_STR\n247  ``0xf7``  C_STR\n248  ``0xf8``  C_STR\n249  ``0xf9``  C_STR\n250  ``0xfa``  C_STR\n251  ``0xfb``  C_STR\n252  ``0xfc``  C_STR\n253  ``0xfd``  C_STR\n254  ``0xfe``  C_STR\n255  
``0xff``  C_STR\n==== ========= =========\n\n\nCustomizing character types\n===========================\n\nCharacter types can be customized by extending the base ``Parser`` class.\n\nThe following example changes \"``-``\" and \"``.``\" from ``C_SPECIAL`` to ``C_STR``\nand inherits everything else.\n\n.. code:: php\n\n   \u003c?php\n\n   use Kuria\\Parser\\Parser;\n\n   class CustomParser extends Parser\n   {\n       const CHAR_TYPE_MAP = [\n           '-' =\u003e self::C_STR,\n           '.' =\u003e self::C_STR,\n       ] + parent::CHAR_TYPE_MAP; // inherit everything else\n   }\n\n   // usage example\n   $parser = new CustomParser('foo-bar.baz');\n\n   var_dump($parser-\u003eeatType(CustomParser::C_STR));\n\nOutput:\n\n::\n\n  string(11) \"foo-bar.baz\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuria%2Fparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkuria%2Fparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuria%2Fparser/lists"}