https://github.com/devtronic/super-tokenizer
A powerful dynamic tokenizer written in PHP
- Host: GitHub
- URL: https://github.com/devtronic/super-tokenizer
- Owner: devtronic
- License: mit
- Created: 2017-02-23T18:06:28.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-02-26T15:31:46.000Z (over 9 years ago)
- Last Synced: 2024-12-30T21:29:11.435Z (over 1 year ago)
- Topics: lexer, php7, tokenizer
- Language: PHP
- Size: 9.77 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Super Tokenizer
Super Tokenizer is an ultra-dynamic, easy-to-use tokenizer written in PHP.
### Installation
```bash
composer require devtronic/super-tokenizer
```
### Usage
#### Minimal Tokenizer
```php
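<?php
// Assumed: the class name (taken from the heading), already imported or autoloaded.
$tokenizer = new MinimalTokenizer();
$sample = 'Minimal tokenizer example';
$tokens = $tokenizer->tokenize($sample);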
print_r($tokens);
```
Prints
```
Array
(
    [0] => Array
        (
            [type] => 1
            [value] => Minimal
            [position] => 0
        )

    [1] => Array
        (
            [type] => 1
            [value] => tokenizer
            [position] => 8
        )

    [2] => Array
        (
            [type] => 1
            [value] => example
            [position] => 18
        )

)
```
You can also look up the name of a token type with the `getTokenName()` method:
```php
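<?php
// $tokenizer and $tokens come from the previous example.
foreach ($tokens as &$token) {
    $token['name'] = $tokenizer->getTokenName($token['type']);
}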
print_r($tokens);
```
Prints
```
Array
(
    [0] => Array
        (
            [type] => 1
            [value] => Minimal
            [position] => 0
            [name] => TT_TOKEN
        )

    [1] => Array
        (
            [type] => 1
            [value] => tokenizer
            [position] => 8
            [name] => TT_TOKEN
        )

    [2] => Array
        (
            [type] => 1
            [value] => example
            [position] => 18
            [name] => TT_TOKEN
        )

)
```
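How `getTokenName()` resolves a name is not shown in this README. As a rough sketch only, a lookup of this kind could be built on class constants with `ReflectionClass`. The `TokenNames` class and its `nameOf()` method below are hypothetical and not part of the package; the two constant values are the ones visible in the outputs of this README.

```php
<?php
// Hypothetical helper, not part of super-tokenizer: maps a numeric token
// type back to the name of the TT_* class constant that holds that value.
final class TokenNames
{
    const TT_TOKEN = 1;
    const TT_STRING = 10;

    public static function nameOf(int $type): ?string
    {
        $constants = (new ReflectionClass(static::class))->getConstants();
        foreach ($constants as $name => $value) {
            // Only consider constants that follow the TT_ naming scheme.
            if (strpos($name, 'TT_') === 0 && $value === $type) {
                return $name;
            }
        }
        return null; // No constant carries this value.
    }
}

echo TokenNames::nameOf(1), "\n";
```

With a reflection-based lookup like this, constants added in a subclass would be picked up without any extra registration step.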
#### Simple Tokenizer
The simple tokenizer additionally supports strings ("hello" or 'hello'), brackets (`()`, `[]` and `{}`), multiple separators
(" ", "\t", "\n", "\r", "\0", "\x0B"), and character escaping with a backslash (`\`).
```php
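<?php
// Assumed: the class name (taken from the heading), already imported or autoloaded.
// The sample string is the one implied by the values and positions printed below.
$tokenizer = new SimpleTokenizer();
$sample = '"Simple" \'Tokenizer\' with\ different brackets [a, b] (c,d), {0, 1}';
$tokens = $tokenizer->tokenize($sample);
foreach ($tokens as &$token) {
    $token['name'] = $tokenizer->getTokenName($token['type']);
}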
print_r($tokens);
```
Prints
```
Array
(
    [0] => Array
        (
            [type] => 10
            [value] => "Simple"
            [position] => 0
            [name] => TT_STRING
        )

    [1] => Array
        (
            [type] => 10
            [value] => 'Tokenizer'
            [position] => 9
            [name] => TT_STRING
        )

    [2] => Array
        (
            [type] => 1
            [value] => with different
            [position] => 21
            [name] => TT_TOKEN
        )

    [3] => Array
        (
            [type] => 1
            [value] => brackets
            [position] => 37
            [name] => TT_TOKEN
        )

    [4] => Array
        (
            [type] => 20
            [value] => [
            [position] => 46
            [name] => TT_BRACKET_OPEN
        )

    [5] => Array
        (
            [type] => 1
            [value] => a,
            [position] => 47
            [name] => TT_TOKEN
        )

    [6] => Array
        (
            [type] => 1
            [value] => b
            [position] => 50
            [name] => TT_TOKEN
        )

    [7] => Array
        (
            [type] => 21
            [value] => ]
            [position] => 51
            [name] => TT_BRACKET_CLOSE
        )

    [8] => Array
        (
            [type] => 20
            [value] => (
            [position] => 53
            [name] => TT_BRACKET_OPEN
        )

    [9] => Array
        (
            [type] => 1
            [value] => c,d
            [position] => 54
            [name] => TT_TOKEN
        )

    [10] => Array
        (
            [type] => 21
            [value] => )
            [position] => 57
            [name] => TT_BRACKET_CLOSE
        )

    [11] => Array
        (
            [type] => 1
            [value] => ,
            [position] => 58
            [name] => TT_TOKEN
        )

    [12] => Array
        (
            [type] => 20
            [value] => {
            [position] => 60
            [name] => TT_BRACKET_OPEN
        )

    [13] => Array
        (
            [type] => 1
            [value] => 0,
            [position] => 61
            [name] => TT_TOKEN
        )

    [14] => Array
        (
            [type] => 1
            [value] => 1
            [position] => 64
            [name] => TT_TOKEN
        )

    [15] => Array
        (
            [type] => 21
            [value] => }
            [position] => 65
            [name] => TT_BRACKET_CLOSE
        )

)
```
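The single `with different` token in the output above is consistent with an escaped space in the input. The sketch below is not the library's implementation. It only illustrates, with a hypothetical `splitEscaped()` function, how splitting on separator characters while honoring backslash escapes can work; strings and brackets are deliberately left out.

```php
<?php
// Hypothetical illustration, not the library's code: split $input on
// separator characters, treating a backslash as an escape for the next char.
function splitEscaped(string $input, string $separators = " \t\n\r\0\x0B"): array
{
    $tokens = [];
    $buffer = '';
    $start = 0;
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // Escaped character: keep it literally and drop the backslash.
            if ($buffer === '') {
                $start = $i;
            }
            $buffer .= $input[++$i];
            continue;
        }

        if (strpos($separators, $char) !== false) {
            // Separator: flush the pending token, if any.
            if ($buffer !== '') {
                $tokens[] = ['value' => $buffer, 'position' => $start];
                $buffer = '';
            }
            continue;
        }

        if ($buffer === '') {
            $start = $i; // First character of a new token.
        }
        $buffer .= $char;
    }

    if ($buffer !== '') {
        $tokens[] = ['value' => $buffer, 'position' => $start];
    }

    return $tokens;
}

print_r(splitEscaped('with\ different brackets'));
```

Here the escaped space is appended to the current token instead of ending it, so the call yields two tokens rather than three.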
#### Custom tokens / Custom tokenizer
To add your own tokens, you can simply create a custom tokenizer class like this:
```php
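<?php
// Assumed: the base class, and the constructor as the place to register tokens.
// The constant values are the ones shown in the output below.
class CustomTokenizer extends SimpleTokenizer
{
    const TT_DOLLAR = 30;
    const TT_EQUALS = 35;

    public function __construct()
    {
        $this->customTokens = [
            self::TT_DOLLAR => '$',
            self::TT_EQUALS => '='
        ];
    }
}

$tokenizer = new CustomTokenizer();
$sample = '$var = 1234';
$tokens = $tokenizer->tokenize($sample);
foreach ($tokens as &$token) {
    $token['name'] = $tokenizer->getTokenName($token['type']);
}
print_r($tokens);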
```
Prints
```
Array
(
    [0] => Array
        (
            [type] => 30
            [value] => $
            [position] => 0
            [name] => TT_DOLLAR
        )

    [1] => Array
        (
            [type] => 1
            [value] => var
            [position] => 1
            [name] => TT_TOKEN
        )

    [2] => Array
        (
            [type] => 35
            [value] => =
            [position] => 5
            [name] => TT_EQUALS
        )

    [3] => Array
        (
            [type] => 1
            [value] => 1234
            [position] => 7
            [name] => TT_TOKEN
        )

)
```
The `preTokenize()` method lets you modify the input source before it is tokenized (for example, to normalize line endings).
With `postTokenize()` you can modify the result of the `tokenize()` method (for example, to detect numbers).
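As a rough illustration of what such hooks might do, the two standalone functions below normalize line endings and retag numeric tokens. Their names, the `TT_NUMBER` value, and the way they would be wired into `preTokenize()` and `postTokenize()` are assumptions; the actual method signatures are not shown in this README.

```php
<?php
// Hypothetical token type for numbers; the value 40 is arbitrary.
const TT_NUMBER = 40;

// Candidate body for a preTokenize() hook: convert CRLF and lone CR to LF.
function normalizeLineEndings(string $source): string
{
    return str_replace(["\r\n", "\r"], "\n", $source);
}

// Candidate body for a postTokenize() hook: retag purely numeric tokens.
function detectNumbers(array $tokens): array
{
    foreach ($tokens as &$token) {
        if (ctype_digit($token['value'])) {
            $token['type'] = TT_NUMBER;
        }
    }
    unset($token); // Break the reference left behind by foreach.
    return $tokens;
}

$tokens = detectNumbers([
    ['type' => 1, 'value' => 'var', 'position' => 1],
    ['type' => 1, 'value' => '1234', 'position' => 7],
]);
print_r($tokens);
```

In a custom tokenizer, each function body would move into the corresponding overridden method, with `TT_NUMBER` declared as a class constant next to the other custom token types.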
### Testing
```bash
phpunit
```
### Contributing
- Fork the repository
- Create a pull request