{"id":18864315,"url":"https://github.com/devtronic/super-tokenizer","last_synced_at":"2025-08-13T19:51:57.310Z","repository":{"id":56967123,"uuid":"82956383","full_name":"devtronic/super-tokenizer","owner":"devtronic","description":"A powerful dynamic tokenizer written in PHP","archived":false,"fork":false,"pushed_at":"2017-02-26T15:31:46.000Z","size":10,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-30T21:29:11.435Z","etag":null,"topics":["lexer","php7","tokenizer"],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devtronic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-23T18:06:28.000Z","updated_at":"2022-02-12T14:04:36.000Z","dependencies_parsed_at":"2022-08-21T09:50:35.361Z","dependency_job_id":null,"html_url":"https://github.com/devtronic/super-tokenizer","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devtronic%2Fsuper-tokenizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devtronic%2Fsuper-tokenizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devtronic%2Fsuper-tokenizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devtronic%2Fsuper-tokenizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devtronic","download_url":"https://codeload.github.com/devtronic/super-tokenizer/tar.gz/refs/heads/master","host":{"name":"G
itHub","url":"https://github.com","kind":"github","repositories_count":239808525,"owners_count":19700451,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lexer","php7","tokenizer"],"created_at":"2024-11-08T04:40:50.956Z","updated_at":"2025-02-20T09:13:54.231Z","avatar_url":"https://github.com/devtronic.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![GitHub tag](https://img.shields.io/packagist/v/devtronic/super-tokenizer.svg)](https://github.com/Devtronic/super-tokenizer)\n[![Packagist](https://img.shields.io/packagist/l/Devtronic/super-tokenizer.svg)](https://github.com/Devtronic/super-tokenizer/blob/master/LICENSE)\n[![Travis](https://img.shields.io/travis/Devtronic/super-tokenizer.svg)](https://travis-ci.org/Devtronic/super-tokenizer/)\n[![Packagist](https://img.shields.io/packagist/dt/Devtronic/super-tokenizer.svg)](https://github.com/Devtronic/super-tokenizer)\n\n# Super Tokenizer\n\nSuper Tokenizer is an ultra-dynamic, easy-to-use tokenizer written in PHP.\n\n### Installation\n```bash\ncomposer require devtronic/super-tokenizer\n```\n\n### Usage\n#### Minimal Tokenizer\n```php\n\u003c?php\n\nuse Devtronic\\SuperTokenizer\\Tokenizer;\n\nrequire_once __DIR__ . 
'/vendor/autoload.php';\n\n$tokenizer = new Tokenizer();\n\n$sample = 'Minimal tokenizer example';\n\n$tokens = $tokenizer-\u003etokenize($sample);\nprint_r($tokens);\n```\n\nPrints\n```\nArray\n(\n    [0] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e Minimal\n            [position] =\u003e 0\n        )\n\n    [1] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e tokenizer\n            [position] =\u003e 8\n        )\n\n    [2] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e example\n            [position] =\u003e 18\n        )\n)\n```\n\n\nYou can also get the name of a token's type with the getTokenName() method:\n```php\n\u003c?php\n// ...\nforeach ($tokens as \u0026$token) {\n    $token['name'] = $tokenizer-\u003egetTokenName($token['type']);\n}\n\nprint_r($tokens);\n```\n\nPrints\n```\nArray\n(\n    [0] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e Minimal\n            [position] =\u003e 0\n            [name] =\u003e TT_TOKEN\n        )\n\n    [1] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e tokenizer\n            [position] =\u003e 8\n            [name] =\u003e TT_TOKEN\n        )\n\n    [2] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e example\n            [position] =\u003e 18\n            [name] =\u003e TT_TOKEN\n        )\n)\n```\n\n#### Simple Tokenizer\n\nThe simple tokenizer additionally supports strings (\"hello\" or 'hello'), brackets ('()', '[]' and '{}'), multiple separators\n(\" \", \"\\t\", \"\\n\", \"\\r\", \"\\0\", \"\\x0B\") and character escaping with a backslash (\\).\n\n```php\n\u003c?php\n\nuse Devtronic\\SuperTokenizer\\SimpleTokenizer;\n\nrequire_once __DIR__ . 
'/vendor/autoload.php';\n\n$tokenizer = new SimpleTokenizer();\n\n$sample = '\"Simple\" \\'Tokenizer\\' with\\ different brackets [a, b] (c,d), {0, 1}';\n\n$tokens = $tokenizer-\u003etokenize($sample);\n\nforeach ($tokens as \u0026$token) {\n    $token['name'] = $tokenizer-\u003egetTokenName($token['type']);\n}\n\nprint_r($tokens);\n```\n\nPrints\n```\nArray\n(\n    [0] =\u003e Array\n        (\n            [type] =\u003e 10\n            [value] =\u003e \"Simple\"\n            [position] =\u003e 0\n            [name] =\u003e TT_STRING\n        )\n\n    [1] =\u003e Array\n        (\n            [type] =\u003e 10\n            [value] =\u003e 'Tokenizer'\n            [position] =\u003e 9\n            [name] =\u003e TT_STRING\n        )\n\n    [2] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e with different\n            [position] =\u003e 21\n            [name] =\u003e TT_TOKEN\n        )\n\n    [3] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e brackets\n            [position] =\u003e 37\n            [name] =\u003e TT_TOKEN\n        )\n\n    [4] =\u003e Array\n        (\n            [type] =\u003e 20\n            [value] =\u003e [\n            [position] =\u003e 46\n            [name] =\u003e TT_BRACKET_OPEN\n        )\n\n    [5] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e a,\n            [position] =\u003e 47\n            [name] =\u003e TT_TOKEN\n        )\n\n    [6] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e b\n            [position] =\u003e 50\n            [name] =\u003e TT_TOKEN\n        )\n\n    [7] =\u003e Array\n        (\n            [type] =\u003e 21\n            [value] =\u003e ]\n            [position] =\u003e 51\n            [name] =\u003e TT_BRACKET_CLOSE\n        )\n\n    [8] =\u003e Array\n        (\n            [type] =\u003e 20\n            [value] =\u003e (\n            [position] =\u003e 53\n     
       [name] =\u003e TT_BRACKET_OPEN\n        )\n\n    [9] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e c,d\n            [position] =\u003e 54\n            [name] =\u003e TT_TOKEN\n        )\n\n    [10] =\u003e Array\n        (\n            [type] =\u003e 21\n            [value] =\u003e )\n            [position] =\u003e 57\n            [name] =\u003e TT_BRACKET_CLOSE\n        )\n\n    [11] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e ,\n            [position] =\u003e 58\n            [name] =\u003e TT_TOKEN\n        )\n\n    [12] =\u003e Array\n        (\n            [type] =\u003e 20\n            [value] =\u003e {\n            [position] =\u003e 60\n            [name] =\u003e TT_BRACKET_OPEN\n        )\n\n    [13] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e 0,\n            [position] =\u003e 61\n            [name] =\u003e TT_TOKEN\n        )\n\n    [14] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e 1\n            [position] =\u003e 64\n            [name] =\u003e TT_TOKEN\n        )\n\n    [15] =\u003e Array\n        (\n            [type] =\u003e 21\n            [value] =\u003e }\n            [position] =\u003e 65\n            [name] =\u003e TT_BRACKET_CLOSE\n        )\n)\n```\n\n#### Custom tokens / Custom tokenizer\nTo add your own tokens, you can simply create a custom tokenizer class like this:\n```php\n\u003c?php\n\nuse Devtronic\\SuperTokenizer\\SimpleTokenizer;\n\nrequire_once __DIR__ . 
'/vendor/autoload.php';\n\nclass CustomTokenizer extends SimpleTokenizer\n{\n    const TT_DOLLAR = 30;\n    const TT_EQUALS = 35;\n\n    public function __construct()\n    {\n        parent::__construct();\n\n        $this-\u003ecustomTokens = [\n            self::TT_DOLLAR =\u003e '$',\n            self::TT_EQUALS =\u003e '='\n        ];\n    }\n}\n\n$tokenizer = new CustomTokenizer();\n\n$sample = '$var = 1234';\n$tokens = $tokenizer-\u003etokenize($sample);\n\nforeach ($tokens as \u0026$token) {\n    $token['name'] = $tokenizer-\u003egetTokenName($token['type']);\n}\n\nprint_r($tokens);\n```\n\nPrints\n```\nArray\n(\n    [0] =\u003e Array\n        (\n            [type] =\u003e 30\n            [value] =\u003e $\n            [position] =\u003e 0\n            [name] =\u003e TT_DOLLAR\n        )\n\n    [1] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e var\n            [position] =\u003e 1\n            [name] =\u003e TT_TOKEN\n        )\n\n    [2] =\u003e Array\n        (\n            [type] =\u003e 35\n            [value] =\u003e =\n            [position] =\u003e 5\n            [name] =\u003e TT_EQUALS\n        )\n\n    [3] =\u003e Array\n        (\n            [type] =\u003e 1\n            [value] =\u003e 1234\n            [position] =\u003e 7\n            [name] =\u003e TT_TOKEN\n        )\n)\n```\n\nThe preTokenize() method lets you modify the input source before tokenizing (for example, to normalize line endings).\nThe postTokenize() method lets you modify the result of the tokenize() method (for example, to detect numbers).\n\n### Testing\n```\nphpunit\n```\n\n### Contributing\n- Fork the repository\n- Create a pull request","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevtronic%2Fsuper-tokenizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevtronic%2Fsuper-tokenizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevtronic%2Fsuper-tokenizer/lists"}