# Lexit
Lexit is an open source lexer generator written in Python 3.6 that uses newer language features such as `NamedTuple`, type hints, and the `__init_subclass__` hook.
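As a rough illustration (this is not lexit's actual code, just a sketch of what the `__init_subclass__` hook makes possible), a base class can collect the regex-valued attributes that each subclass declares:

```python
import re


class RuleCollector:
    rules: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect every upper-case string attribute of the subclass as a named pattern
        cls.rules = {
            name: re.compile(value)
            for name, value in vars(cls).items()
            if name.isupper() and isinstance(value, str)
        }


class Demo(RuleCollector):
    NUMBER = r'\d+'
    PLUS = r'\+'


print(list(Demo.rules))  # ['NUMBER', 'PLUS']
```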

## Simple example
```python
from typing import Iterable

from lexit import Lexer, Token

class MyLexer(Lexer):
    NUMBER = r'\d+'
    ADD = r'\+'
    SUB = r'-'
    MUL = r'\*'
    DIV = r'/'

    ignore = r'\s+'

tokens_iter: Iterable[Token] = MyLexer.lex('2 + 2')
print(*tokens_iter, sep='\n')
```
This produces the following output:
```python
Token(type='NUMBER', value='2', line=1, column=1)
Token(type='ADD', value='+', line=1, column=3)
Token(type='NUMBER', value='2', line=1, column=5)
```

## Requirements
* The only requirement is Python 3.6+
* Tests use the `pytest` library

## Installation
```bash
pip install lexit
```

## Error handling
```python
from lexit import LexerError  # import path for LexerError assumed

# JsonLexer is the lexer defined in the "More examples" section below
try:
    tokens = list(JsonLexer.lex('${"hello": "world"}'))
except LexerError as e:
    print(e.pretty())
    exit(1)
```

The error message is self-describing: it shows what happened and where.

```
No match for character '$' in line 1 column 1
${"hello": "world"}
^
```

## Design decisions
* Should be easy to use
* Longest match priority (`++` always wins over `+`, regardless of the order in which the tokens are defined in the lexer class; see the sketch after this list)
* Self-describing errors for humans (it should be obvious what happened and where)
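
For example, in a sketch like the following (the token names are made up for illustration), `++` should lex as a single `INC` token even though `ADD` is declared first, given the longest-match rule above:

```python
from lexit import Lexer


class OpLexer(Lexer):
    # ADD is declared before INC, yet '++' should still match INC
    ADD = r'\+'
    INC = r'\+\+'

    ignore = r'\s+'


print(*OpLexer.lex('++ +'), sep='\n')

# Expected output, assuming the longest-match rule described above:
# Token(type='INC', value='++', line=1, column=1)
# Token(type='ADD', value='+', line=1, column=4)
```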

## More examples
### JSON lexer
```python
from lexit import Lexer

class JsonLexer(Lexer):
    NUMBER = r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?'
    STRING = r'"(\\\"|\\\\|[^"\n])*?"i?'
    L_BRACE = r'{'
    R_BRACE = r'}'
    L_BRACKET = r'\['
    R_BRACKET = r'\]'
    TRUE = r'true'
    FALSE = r'false'
    NULL = r'null'
    COMMA = r','
    COLON = r':'

    ignore = r'\s+'

tokens = list(JsonLexer.lex('{"hello": "world"}'))
```
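
Printing the resulting tokens, as in the first example, should yield something like the following (a sketch; the `Token` fields follow the format shown earlier):

```python
print(*tokens, sep='\n')

# Expected, by analogy with the simple example above:
# Token(type='L_BRACE', value='{', line=1, column=1)
# Token(type='STRING', value='"hello"', line=1, column=2)
# Token(type='COLON', value=':', line=1, column=9)
# Token(type='STRING', value='"world"', line=1, column=11)
# Token(type='R_BRACE', value='}', line=1, column=18)
```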