https://github.com/alehed/rex
A DSL for deterministic finite state machines
dfa dsl racket regular-expression
- Host: GitHub
- URL: https://github.com/alehed/rex
- Owner: alehed
- License: mit
- Created: 2016-10-27T21:28:48.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-06-13T17:28:35.000Z (about 8 years ago)
- Last Synced: 2025-10-20T14:54:12.318Z (8 months ago)
- Topics: dfa, dsl, racket, regular-expression
- Language: Racket
- Size: 57.6 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
# rex
A proof of concept implementation of a language to describe DFAs.
## What is wrong with regexes?
Regular expressions are used in a lot of places where they are clearly not the
best tool for the job. Take syntax highlighting, for example, where we should
use parsers instead. If you take a look at some source code written in a
language with built-in support for regular expressions (Perl, Ruby, etc.),
you'll likely find other atrocious uses.
The problem is that regular expressions make it very easy to write expressions
that generate a lot of code and are slow (see Theoretical Background below).
Rex, in contrast, makes it really hard to write inefficient expressions: the
execution time and memory usage of a rex are roughly proportional to the length
of the expression.
Is this approach better? I don't know; decide for yourself whether you like it.
### Theoretical Background
From a mathematical standpoint, the problems with regular expressions are the
following:
Regular expressions are easily translated into non-deterministic finite
automata (NFAs), while a computer can only execute deterministic finite state
machines (DFAs) directly. The two are mathematically equivalent and can be
converted into one another, but at a price: converting an NFA with n states
into a DFA can produce O(2^n) states in the worst case. This makes execution
times unpredictable for the person writing the expression, so conversion from
an NFA to a DFA is hard and expensive.
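To make the blow-up concrete, here is a minimal Python sketch (not part of rex, and not how rex is implemented) that runs the textbook subset construction on a classic worst-case NFA: the one accepting strings over `a`/`b` whose k-th symbol from the end is `a`. The function names `nfa_for_kth_from_end`, `subset_construction`, and `dfa_size` are made up for this illustration.

```python
from itertools import chain


def nfa_for_kth_from_end(k):
    """NFA over {'a', 'b'} accepting strings whose k-th symbol from the end is 'a'.

    States are 0..k (k + 1 states in total); 0 is the start state and k accepts.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}  # state 0 guesses where the suffix starts
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, 0, {k}


def subset_construction(delta, start, alphabet=("a", "b")):
    """Return every DFA state (a frozenset of NFA states) reachable from the start."""
    start_set = frozenset({start})
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # The DFA successor is the union of all NFA successors.
            nxt = frozenset(
                chain.from_iterable(delta.get((q, symbol), ()) for q in current)
            )
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def dfa_size(k):
    """Number of reachable DFA states for the (k + 1)-state NFA above."""
    delta, start, _accepting = nfa_for_kth_from_end(k)
    return len(subset_construction(delta, start))


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(f"NFA states: {k + 1:2d}  DFA states: {dfa_size(k)}")
```

For this family the number of reachable DFA states doubles with every extra NFA state, so an 11-state NFA (`k = 10`) already yields 1024 DFA states.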
The other way around is similar: if we have an arbitrary DFA and we want to get
a regular expression from it, conversion is also hard and the length of the
regular expression is O(2^n) in the worst case (where n is the number of
states in the DFA). This time you probably get a fast regex, but it is really
hard to write.
Some engines also implement extensions to regular expressions that take them
into the area of context-sensitive languages, which makes execution times even
worse.
The solution taken here is that if you try to specify anything that does not
directly generate a DFA, rex will complain and fail.
> Note: Strictly speaking the automata generated are not DFAs because they
> don't include an explicit fail state, but adding one is cheap so that is what
> is done here.
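
As an illustration of the note above, the following Python sketch models a DFA for the literal `abc` as a partial transition table, where any missing transition is treated as a move to an implicit fail state. This is only an assumed model for explanation; the helper names `literal_dfa` and `matches` are hypothetical and say nothing about how rex represents its automata internally.

```python
def literal_dfa(word):
    """Build a partial transition table for a DFA matching exactly `word`.

    State i means "the first i characters of `word` have been matched".
    """
    transitions = {(i, ch): i + 1 for i, ch in enumerate(word)}
    return transitions, 0, {len(word)}


def matches(dfa, text):
    """Run the DFA over `text`; a missing edge acts as the implicit fail state."""
    transitions, state, accepting = dfa
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:  # no edge for this character: reject immediately
            return False
    return state in accepting


abc = literal_dfa("abc")

if __name__ == "__main__":
    print([matches(abc, s) for s in ("abc", "abcd")])
```

Each input character costs one table lookup, and the table has one entry per character of the literal, which is the sense in which time and memory stay proportional to the size of the expression in this simplified model.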
## Installation
### For regular usage
1. Install [Racket](https://racket-lang.org)
1. Install the package using raco: `raco pkg install rex`
1. Enjoy
### For development
1. Install [Racket](https://racket-lang.org)
1. Clone this repository
1. Install it as a local package using raco: `raco pkg install ./rex`
1. Enjoy
## Usage
Create a file that has `#lang rex` as the first line.
The initial line is followed by the actual expression.
A basic rex that matches the string "abc" but nothing else looks like this:
```
#lang rex
abc
```
If this file is saved as `abc.rkt`, run it with `racket abc.rkt "abc" "abcd"`. This should print `(#t #f)`, since the first string is matched successfully while the second one is not.
For other options and flags, consult `racket filename.rkt --help`.
### Syntax
For detailed documentation of the syntax, please consult the [documentation](http://docs.racket-lang.org/rex).
## Contributing
> "On the internet nothing ever happens by asking permission." - Don't remember
Just fork away, PRs welcome.
The test suite is run with `raco test -p rex`. Make sure it passes before every
commit. New features should preferably add a corresponding file in `tests/`.
### Development
Once you have made your changes, you have to recompile the package using
`raco setup --pkgs rex`; otherwise Racket will complain with:
```
link: module mismatch;
possibly, bytecode file needs re-compile because dependencies changed
```
## Future
In the long term, if this turns out to be useful, a fast implementation in C
would probably be desirable.