https://github.com/anthonynguyen/pylexer

A tiny lexer module written in Python
https://github.com/anthonynguyen/pylexer

Last synced: 9 days ago
JSON representation

A tiny lexer module written in Python

Host: GitHub
URL: https://github.com/anthonynguyen/pylexer
Owner: anthonynguyen
License: mit
Created: 2012-03-08T01:02:46.000Z (over 13 years ago)
Default Branch: master
Last Pushed: 2012-03-08T01:15:36.000Z (over 13 years ago)
Last Synced: 2025-05-19T18:18:32.434Z (about 2 months ago)
Language: Python
Homepage:
Size: 93.8 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # pylexer

### A tiny lexer module written in Python

## About

**By Anthony Nguyen**

I was learning about string tokenization for NLP and I wrote this as a sort of application of the concepts I'm learning.

It *will* stop if it does not find a token at the beginning of the data it is looking at.

## Usage

	import pylexer, re

	lexer = pylexer.Lexer([

		("word", "[a-z]+"),

		("shortdate", "\d{1,2}\/\d{1,2}\/\d{4}")

		], re.I)

	for token in lexer.scan("I was born on 01/01/1970", True):

		print token

* `Lexer.__init__` takes two arguments: a list of tokens as tuples `("name", "regex")`, and any regex flags

* `Lexer.addTokens` takes a list of tokens as tuples `("name", "regex")`

* `Lexer.addFlags` takes any number of regex flags

* `Lexer.scan` is an iterator and takes two arguments: the string to scan, and whether or not to ignore whitespace (optional, disabled by default). It returns objects of the `Token` class.

The `Token` class is basically a variable container. It holds the following information:

* `name` - The token's name

* `rule` - The regex used to match the token

* `data` - The token's matched data

* `start` - The start index (in the original string) of the token's data

* `end` - The end index (in the original string) of the token's data

The `Scanner` class is the class that's doing the actual scanning work, but it should only ever need to be used from the `Lexer` class.

## License

MIT licensed. See LICENSE.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/anthonynguyen/pylexer

Awesome Lists containing this project

README