https://github.com/jongiddy/trespass

Regular Expression Engine
https://github.com/jongiddy/trespass

Last synced: 4 months ago
JSON representation

Regular Expression Engine

Host: GitHub
URL: https://github.com/jongiddy/trespass
Owner: jongiddy
License: mit
Created: 2014-09-18T06:51:02.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2014-10-15T06:50:49.000Z (over 10 years ago)
Last Synced: 2025-01-06T09:46:16.710Z (5 months ago)
Language: Python
Size: 184 KB
Stars: 0
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        trespass

========

Regular Expression Engine

Almost POSIX extended regular expressions, plus:

1. Search multiple patterns simultaneously - return longest-leftmost result - 

   if results have same length and startpos, pattern added earlier succeeds

    ```python

    pattern = Pattern()

    pattern.addRegExp(r'(ab+c*)', 1)

    pattern.addRegExp(r'bf(ab+)*', 2)

    pattern.addRegExp(r'^(a(bc)?)*$', 3)

    pattern.addRegExp(r'01x?(ab+)*2', 'red')

    match = pattern.match('abbbabf')

    assert (match.start(), match.end(), match.value()) == (0, 4, 1)

    match = pattern.match('ddg01abb2s')

    assert (match.start(), match.end(), match.value()) == (3, 9, 'red')

    ```

2. Add text in chunks

    ```python

    pattern = Pattern('hello$', 'done')  # 2nd param can be anything

    pattern.addRegExp('hello', addLinks) # e.g. addLinks is a function

    matcher = Matcher(pattern)

    match = matcher.addChunk('hello')

    # matches second pattern, but may also match first pattern

    assert match is None

    match = matcher.addChunk(', world')

    # we now know text does not match first pattern so match to

    # second pattern succeeds

    assert (match.start(), match.end()) == (0, 5)

    assert match.value() == addLinks

    # After any successful match we need to create a new Matcher

    # Indexes start from 0 again

    matcher = Matcher(pattern)

    match = matcher.addChunk(' and again hello')

    # again, no match yet since text may match the first pattern

    assert match is None

    match = matcher.addFinal('')

    # we now know that both patterns match - same start and length

    # so pattern added first wins

    assert (match.start(), match.end()) == (11, 16)

    assert match.value() == 'done'

    ```

3. Flexible capturing using tags (#)

    ```python

    # positions 3,8 are limits of first alphanumeric sequence

    # positions 9,14 are limits of second alphanumeric sequence

    # Note: use \# to match a literal #

    pattern = Pattern('([[:space:]]+#[[:alnum:]]+#)+!')

    match = pattern.match('!\t\thello world!')

    assert match.start() == 1

    assert match.end() == 15

    assert match.tags() == (3, 8, 9, 14)

    ```

4. Greedy and reluctant matching

    ```python

    # a{2,4}? matches as few characters as possible (2)

    # a?? matches as few characters as possible (0)

    # a? matches as many characters as possible (1)

    # a* matches as many characters as possible (4)

    # a+ matches as many characters as possible (1)

    # (a* gets to be greedy before a+, so a* eats more characters,

    # however the requirement for a+ to match at least one, in order

    # that the entire match is successful, causes a* to match 4, not 5)

    pattern = Pattern('a{2,4}?#a??#a?#a*#a+#')

    assert pattern.match('aaaaaaaa').tags() == (2, 2, 3, 7, 8)

    ```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jongiddy/trespass

Awesome Lists containing this project

README