https://github.com/sergey0xff/lexit
An open source lexer generator
- Host: GitHub
- URL: https://github.com/sergey0xff/lexit
- Owner: sergey0xff
- Created: 2018-06-24T21:41:52.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-21T14:59:41.000Z (almost 7 years ago)
- Last Synced: 2025-04-20T19:37:41.574Z (6 months ago)
- Topics: compiler, grammar, grammar-parser, lexer, lexer-generator, lexical-analyzer, python, python3
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 15
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Lexit
Lexit is an open source lexer generator written in Python 3.6 that uses newer language features such as `NamedTuple`, type hints, and the `__init_subclass__` hook.

## Simple example
```python
from typing import Iterable

from lexit import Lexer, Token


class MyLexer(Lexer):
    # Raw strings keep the regex escapes intact
    NUMBER = r'\d+'
    ADD = r'\+'
    SUB = '-'
    MUL = r'\*'
    DIV = '/'

    ignore = r'\s+'


tokens_iter: Iterable[Token] = MyLexer.lex('2 + 2')
print(*tokens_iter, sep='\n')
```
This produces the following output:
```python
Token(type='NUMBER', value='2', line=1, column=1)
Token(type='ADD', value='+', line=1, column=3)
Token(type='NUMBER', value='2', line=1, column=5)
```

## Requirements
* The only requirement is Python 3.6+
* The `pytest` library is used for testing

## Installation
```bash
pip install lexit
```

## Error handling
```
try:
    tokens = list(JsonLexer.lex('${"hello": "world"}'))
except LexerError as e:
    print(e.pretty())
    exit(1)

# The error message is self-describing
# It shows what happened and where
No match for character '$' in line 1 column 1
${"hello": "world"}
^
```

## Design decisions
* Should be easy to use
* Longest match priority (`++` always wins over `+`, regardless of the order in which the tokens are defined in the lexer class)
* Self-describing errors for humans (it should be obvious what happened and where)

## More examples
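
### Longest match priority
The longest match rule listed in the design decisions can be illustrated with a small standalone sketch. It uses only the standard `re` module and is not Lexit's actual implementation; the `RULES` table, the `INC` token name, and the `lex_longest` function are hypothetical names chosen for illustration.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str
    value: str


# Hypothetical rule table; '+' is deliberately listed before '++'.
RULES = {
    'ADD': re.compile(r'\+'),
    'INC': re.compile(r'\+\+'),
    'NUMBER': re.compile(r'\d+'),
}
IGNORE = re.compile(r'\s+')


def lex_longest(text: str) -> Iterator[Token]:
    pos = 0
    while pos < len(text):
        # Skip ignored input (whitespace) before trying the token rules.
        skipped = IGNORE.match(text, pos)
        if skipped:
            pos = skipped.end()
            continue
        # Try every rule at the current position and keep the longest match.
        best_name, best_match = None, None
        for name, pattern in RULES.items():
            m = pattern.match(text, pos)
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            raise ValueError(f'No match for character {text[pos]!r} at offset {pos}')
        yield Token(best_name, best_match.group())
        pos = best_match.end()


tokens = list(lex_longest('1 ++ 2 + 3'))
print(*tokens, sep='\n')
```

Because every rule is tried at each position, `++` is emitted as a single `INC` token even though `ADD` appears first in the table.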
### JSON lexer
```python
from lexit import Lexer


class JsonLexer(Lexer):
    NUMBER = r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?'
    STRING = r'"(\\\"|\\\\|[^"\n])*?"i?'
    L_BRACE = r'{'
    R_BRACE = r'}'
    L_BRACKET = r'\['
    R_BRACKET = r'\]'
    TRUE = r'true'
    FALSE = r'false'
    NULL = r'null'
    COMMA = r','
    COLON = r':'

    ignore = r'\s+'


tokens = list(JsonLexer.lex('{"hello": "world"}'))
```
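
### Caret-style error message
The error format shown in the error handling section (a message, the offending source line, and a caret under the bad character) could be produced along the following lines. This is a minimal sketch under stated assumptions, not Lexit's own code: `pretty_error` is a hypothetical helper that takes the input text and a zero-based character offset.

```python
def pretty_error(text: str, offset: int) -> str:
    """Build a three-line message pointing at text[offset] (hypothetical helper)."""
    # Locate the boundaries of the line that contains the offset.
    line_start = text.rfind('\n', 0, offset) + 1
    line_end = text.find('\n', offset)
    if line_end == -1:
        line_end = len(text)
    # Lines and columns are 1-based, matching the Token output format.
    line_no = text.count('\n', 0, offset) + 1
    column = offset - line_start + 1
    source_line = text[line_start:line_end]
    return (
        f"No match for character {text[offset]!r} in line {line_no} column {column}\n"
        f"{source_line}\n"
        f"{' ' * (column - 1)}^"
    )


message = pretty_error('{"a": 1,\n  $}', 11)
print(message)
```

The caret is padded with `column - 1` spaces so that it lines up with the offending character when printed in a monospaced font.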