# regexgen

Generate regular expressions that match a set of strings.

This is a Ruby port of [@devongovett](https://github.com/devongovett/regexgen)'s
JavaScript [regexgen](https://github.com/devongovett/regexgen) package.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'regexgen'
```

And then execute:

    $ bundle install

Or install it yourself as:

    $ gem install regexgen

## Usage

```ruby
require 'regexgen'

Regexgen.generate(['foobar', 'foobaz', 'foozap', 'fooza']) #=> /foo(?:zap?|ba[rz])/
```
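The pattern shown above contains no anchors, so it will also match when one of the words appears inside a longer string. The following sketch does not call the gem; it starts from the regex literal in the example and wraps it in `\A` and `\z` to require a whole-string match. The variable names are only illustrative.

```ruby
# The literal below is copied from the usage example, not produced by calling the gem.
pattern = /foo(?:zap?|ba[rz])/
words = %w[foobar foobaz foozap fooza]

# Every input word matches the pattern.
all_match = words.all? { |word| pattern.match?(word) }

# Without anchors, the pattern also matches inside a longer string.
loose_hit = pattern.match?('xfoobarx')

# Interpolating a Regexp into another regex literal embeds it as a group,
# so \A and \z can be added around it to demand a whole-string match.
exact = /\A#{pattern}\z/
strict_hit = exact.match?('xfoobarx')
exact_all = words.all? { |word| exact.match?(word) }
```

Here `all_match`, `loose_hit`, and `exact_all` are `true`, while `strict_hit` is `false`.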

## CLI

`regexgen` also has a simple CLI to generate regexes using inputs from the
command line.

```sh
$ regexgen
usage: regexgen [-mix] strings...
-m Multiline flag
-i Case-insensitive flag
-x Extended flag
```
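The flag letters match the option letters of Ruby regex literals (`/.../mix`). Assuming the CLI flags correspond to Ruby's standard `Regexp` option constants (the README does not state this), their effect can be sketched in plain Ruby without the gem:

```ruby
# Assumption: -m, -i and -x correspond to these standard Regexp options.
source = 'foo(?:zap?|ba[rz])'

# -i: case-insensitive matching.
insensitive = Regexp.new(source, Regexp::IGNORECASE)
matches_upper = insensitive.match?('FOOBAR')

# -m: in Ruby, multiline mode lets `.` match a newline.
dot_matches_newline = Regexp.new('a.b', Regexp::MULTILINE).match?("a\nb")

# -x: extended mode ignores unescaped whitespace in the pattern.
ignores_space = Regexp.new('a b', Regexp::EXTENDED).match?('ab')

# Options can be combined with bitwise OR.
combined = Regexp.new(source, Regexp::MULTILINE | Regexp::IGNORECASE | Regexp::EXTENDED)
```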

## Unicode handling

Unlike the JavaScript version, this package does not do any special Unicode
handling because Ruby does it all for you. We recommend using a Unicode
encoding for your strings.
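As a small illustration of what Ruby handles on its own, the sketch below uses only core `String` and `Regexp` behavior, not the gem. With a UTF-8 string, regexes operate on characters rather than bytes, and character class ranges can span non-ASCII code points.

```ruby
# "naive" with a diaeresis on the i (U+00EF); the \u escape yields a UTF-8 string.
word = "na\u00EFve"

char_count = word.length    # counts characters
byte_count = word.bytesize  # counts bytes; U+00EF takes two bytes in UTF-8

# `.` consumes one whole character, not one byte.
dot_matches_char = /\Ana.ve\z/.match?(word)

# A character class range over code points U+00E0..U+00FF.
in_range = /[\u00E0-\u00FF]/.match?(word)
```

Here `char_count` is 5 and `byte_count` is 6, and both matches succeed.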

## How does it work?

Just like the JavaScript version:

1. Generate a [Trie](https://en.wikipedia.org/wiki/Trie) containing all of the
input strings. This is a tree structure where each edge represents a single
character. This removes redundancies at the start of the strings, but common
branches further down are not merged.

2. A trie can be seen as a tree-shaped deterministic finite automaton (DFA), so
DFA algorithms can be applied. In this case, we apply [Hopcroft's DFA
minimization
algorithm](https://en.wikipedia.org/wiki/DFA_minimization#Hopcroft.27s_algorithm)
to merge the nondistinguishable states.

3. Convert the resulting minimized DFA to a regular expression. This is done
using [Brzozowski's algebraic
method](http://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions#2392),
which is quite elegant. It expresses the DFA as a system of equations which
can be solved for a resulting regex. Along the way, some additional
optimizations are made, such as hoisting common substrings out of an
alternation, and using character class ranges. This produces an [Abstract
Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree) (AST) for
the regex, which is then converted to a string and compiled to a Ruby
`Regexp` object.
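To make step 1 concrete, here is a minimal sketch that is not the gem's implementation. It builds a trie from nested hashes and converts it straight to regex source with a simple recursive walk, skipping the DFA minimization of step 2 and substituting that walk for Brzozowski's method in step 3. The method names `build_trie` and `trie_to_source` are hypothetical.

```ruby
# Build a trie as nested hashes; the :end key marks that a word stops at a node.
def build_trie(words)
  words.each_with_object({}) do |word, root|
    node = root
    word.each_char { |char| node = (node[char] ||= {}) }
    node[:end] = true
  end
end

# Recursively turn a trie node into regex source.
# This is a naive conversion: it shares prefixes only, never suffixes.
def trie_to_source(node)
  branches = node.reject { |key, _| key == :end }
                 .sort_by { |char, _| char }
                 .map { |char, child| Regexp.escape(char) + trie_to_source(child) }
  return '' if branches.empty?

  single_char = branches.length == 1 && branches.first.length == 1
  body = branches.length == 1 ? branches.first : "(?:#{branches.join('|')})"
  return body unless node.key?(:end)

  # A word ends here but longer words continue, so the rest is optional.
  single_char ? "#{body}?" : "(?:#{branches.join('|')})?"
end

words = %w[foobar foobaz foozap fooza]
source = trie_to_source(build_trie(words))  # "foo(?:ba(?:r|z)|zap?)"
pattern = Regexp.new("\\A#{source}\\z")

# Shared suffixes are not merged by a trie alone:
unmerged = trie_to_source(build_trie(%w[xab yab]))  # "(?:xab|yab)"
```

The second result shows why the minimization step matters: the trie keeps separate `ab` branches under `x` and `y`, whereas merging equivalent states would let the regex share the common suffix.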

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run
`rake test` to run the tests. You can also run `bin/console` for an interactive
prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To
release a new version, update the version number in `version.rb`, and then run
`bundle exec rake release`, which will create a git tag for the version, push
git commits and tags, and push the `.gem` file to
[rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitHub at
https://github.com/amake/regexgen-ruby.

## License

The gem is available as open source under the terms of the [MIT
License](https://opensource.org/licenses/MIT).