https://github.com/fwcd/cfg-to-regex
Tool for converting context-free-grammars into recursive regexes
compiler context-free-grammar pcre regex
- Host: GitHub
- URL: https://github.com/fwcd/cfg-to-regex
- Owner: fwcd
- License: mit
- Created: 2020-05-04T16:22:44.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-05-04T17:35:23.000Z (about 6 years ago)
- Last Synced: 2026-02-15T16:40:09.321Z (3 months ago)
- Topics: compiler, context-free-grammar, pcre, regex
- Language: Elixir
- Homepage:
- Size: 18.6 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# CFG to Regex Compiler
A small tool to convert context-free grammars (written in ANTLR syntax) into a regex.
Modern regex engines support many features that exceed the expressiveness of classic regular expressions, e.g. recursion, backreferences and lookaround, thus making it possible to encode arbitrary CFGs in a single regex.
Hence the generated regex requires the following features:
* **Recursion** to encode arbitrary context-free productions
  * represented as `(?&name)` where `name` is the name of a capturing group
* **Named capturing groups** to encode the rule names
  * represented as `(?<name>...)` where `name` defines the name of the capturing group
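To make the two features concrete, here is a minimal sketch using Elixir's `Regex` module, which is built on PCRE. The module name `RecursionDemo` and the group name `balanced` are hypothetical and chosen only for illustration; the pattern is hand-written, not output of this tool. It recognizes balanced parentheses, a language that no classic regular expression can describe.

```elixir
defmodule RecursionDemo do
  # Hypothetical example, not part of cfg_to_regex.
  # (?<balanced>...) defines a named capturing group and
  # (?&balanced) recurses into that same group.
  # \A and \z anchor the match to the whole input.
  def regex do
    ~r/\A(?<balanced>\((?&balanced)*\))\z/
  end

  def balanced?(input), do: Regex.match?(regex(), input)
end

IO.inspect(RecursionDemo.balanced?("(()())"))
IO.inspect(RecursionDemo.balanced?("(()"))
```

The first call should report a match and the second should not, since the second input has an unclosed parenthesis.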
## Installation
If [available in Hex](https://hex.pm/docs/publish), the package can be installed
by adding `cfg_to_regex` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:cfg_to_regex, "~> 0.1.0"}
]
end
```
Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at [https://hexdocs.pm/cfg_to_regex](https://hexdocs.pm/cfg_to_regex).
## Building
To build the application, run `mix escript.build` to generate an executable.
## Usage
This executable only requires the Erlang runtime to be installed on the system and can be invoked directly or using `escript`:
`escript cfg_to_regex [start rule] [path/to/grammar.g4]`
## Example
Sample grammars can be found in the `examples` directory. For example, the following context-free (and non-regular) grammar:
```antlr
hello : 'Hello';
world : 'World';
hello_world : hello hello_world world | ', ';
```
...compiles to the following regex:
```
(?<hello_world>(?<hello>(?:Hello))(?&hello_world)(?<world>(?:World))|(?:, ))
```
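As a quick check of what such a regex accepts, the sketch below runs a pattern for the `hello_world` grammar through Elixir's PCRE-based `Regex` module. It assumes `hello_world` is the start rule and that each group is named after its rule. The `\A` and `\z` anchors are an addition made here so that the whole input must match, and the module name `HelloWorldDemo` is hypothetical.

```elixir
defmodule HelloWorldDemo do
  # Hypothetical helper, not part of cfg_to_regex.
  # Group names mirror the grammar's rule names; \A and \z are added
  # so that partial matches (e.g. just ", ") do not count.
  def matches?(input) do
    regex =
      ~r/\A(?<hello_world>(?<hello>(?:Hello))(?&hello_world)(?<world>(?:World))|(?:, ))\z/

    Regex.match?(regex, input)
  end
end

# Two "Hello"s balanced by two "World"s: part of the language.
IO.inspect(HelloWorldDemo.matches?("HelloHello, WorldWorld"))
# One "Hello" but two "World"s: not part of the language.
IO.inspect(HelloWorldDemo.matches?("Hello, WorldWorld"))
```

Because every `Hello` must be paired with a `World` around the central `, `, the language has the shape aⁿ·c·bⁿ, which is why recursion is needed.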
As a more extreme example, we can express the entire ANSI C grammar using a single regex:
C grammar regex
```
(?(?(?(?(?(?(?:typedef))|(?(?:extern))|(?(?:static))|(?(?:auto))|(?(?:register)))|(?&storage_class_specifier)(?&declaration_specifiers)|(?(?(?:void))|(?(?:char))|(?(?:short))|(?(?:int))|(?(?:long))|(?(?:float))|(?(?:double))|(?(?:signed))|(?(?:unsigned))|(?(?(?(?:struct))|(?(?:union)))(?(?:[a-zA-Z0-9_]+))(?:\{)(?(?(?(?&type_specifier)(?&specifier_qualifier_list)|(?&type_specifier)|(?(?(?:const))|(?(?:volatile)))(?&specifier_qualifier_list)|(?&type_qualifier))(?(?(?(?(?:\*)|(?:\*)(?(?&type_qualifier)|(?&type_qualifier_list)(?&type_qualifier))|(?:\*)(?&pointer)|(?:\*)(?&type_qualifier_list)(?&pointer))(?(?&IDENTIFIER)|(?:\()(?&declarator)(?:\))|(?&direct_declarator)(?:\[)(?(?(?(?(?(?(?(?(?(?(?(?(?(?(?(?(?&IDENTIFIER)|(?(?:[0-9]+))|(?(?:")(?:[a-zA-Z0-9]+)(?:"))|(?:\()(?(?(?&conditional_expression)|(?&unary_expression)(?(?:=)|(?(?:\*=))|(?(?:\/=))|(?(?:%=))|(?(?:\+=))|(?(?:\-=))|(?(?:<<=))|(?(?:>>=))|(?(?:&=))|(?(?:\^=))|(?(?:\|=)))(?&assignment_expression))|(?&expression)(?:,)(?&assignment_expression))(?:\)))|(?&postfix_expression)(?:\[)(?&expression)(?:\])|(?&postfix_expression)(?:\()(?:\))|(?&postfix_expression)(?:\()(?(?&assignment_expression)|(?&argument_expression_list)(?:,)(?&assignment_expression))(?:\))|(?&postfix_expression)(?:\.)(?&IDENTIFIER)|(?&postfix_expression)(?(?:\->))(?&IDENTIFIER)|(?&postfix_expression)(?(?:\+\+))|(?&postfix_expression)(?(?:\-\-)))|(?&INC_OP)(?&unary_expression)|(?&DEC_OP)(?&unary_expression)|(?(?:&)|(?:\*)|(?:\+)|(?:\-)|(?:~)|(?:!))(?&cast_expression)|(?(?:sizeof))(?&unary_expression)|(?&SIZEOF)(?:\()(?(?&specifier_qualifier_list)|(?&specifier_qualifier_list)(?(?&pointer)|(?(?:\()(?&abstract_declarator)(?:\))|(?:\[)(?:\])|(?:\[)(?&constant_expression)(?:\])|(?&direct_abstract_declarator)(?:\[)(?:\])|(?&direct_abstract_declarator)(?:\[)(?&constant_expression)(?:\])|(?:\()(?:\))|(?:\()(?(?(?(?&declaration_specifiers)(?&declarator)|(?&declaration_specifiers)(?&abstract_declarator)|(?&declaration_specifiers))|(?&parameter_list)(?:,)(?&parameter_declaration))|(?&parameter_list)(?:,)(?(?:\.\.\.)))(?:\))|(?&direct_abstract_declarator)(?:\()(?:\))|(?&direct_abstract_declarator)(?:\()(?&parameter_type_list)(?:\)))|(?&pointer)(?&direct_abstract_declarator)))(?:\)))|(?:\()(?&type_name)(?:\))(?&cast_expression))|(?&multiplicative_expression)(?:\*)(?&cast_expression)|(?&multiplicative_expression)(?:\/)(?&cast_expression)|(?&multiplicative_expression)(?:%)(?&cast_expression))|(?&additive_expression)(?:\+)(?&multiplicative_expression)|(?&additive_expression)(?:\-)(?&multiplicative_expression))|(?&shift_expression)(?(?:<<))(?&additive_expression)|(?&shift_expression)(?(?:>>))(?&additive_expression))|(?&relational_expression)(?:<)(?&shift_expression)|(?&relational_expression)(?:>)(?&shift_expression)|(?&relational_expression)(?(?:<))(?&shift_expression)|(?&relational_expression)(?(?:>))(?&shift_expression))|(?&equality_expression)(?(?:==))(?&relational_expression)|(?&equality_expression)(?(?:!=))(?&relational_expression))|(?&and_expression)(?:&)(?&equality_expression))|(?&exclusive_or_expression)(?:\^)(?&and_expression))|(?&inclusive_or_expression)(?:\|)(?&exclusive_or_expression))|(?&logical_and_expression)(?(?:&&))(?&inclusive_or_expression))|(?&logical_or_expression)(?(?:\|\|))(?&logical_and_expression))|(?&logical_or_expression)(?:\?)(?&expression)(?::)(?&conditional_expression)))(?:\])|(?&direct_declarator)(?:\[)(?:\])|(?&direct_declarator)(?:\()(?&parameter_type_list)(?:\))|(?&direct_declarator)(?:\()(?(?&IDENTIFIER)|(?&identifier_list)(?:,)(?&IDENTIFIER))(?:\))|(?&direct_declarator)(?:\()(?:\)))|(?&direct_declarator))|(?::)(?&constant_expression)|(?&declarator)(?::)(?&constant_expression))|(?&struct_declarator_list)(?:,)(?&struct_declarator))(?:;))|(?&struct_declaration_list)(?&struct_declaration))(?:\})|(?&struct_or_union)(?:\{)(?&struct_declaration_list)(?:\})|(?&struct_or_union)(?&IDENTIFIER))|(?(?(?:enum))(?:\{)(?(?(?&IDENTIFIER)|(?&IDENTIFIER)(?:=)(?&constant_expression))|(?&enumerator_list)(?:,)(?&enumerator))(?:\})|(?&ENUM)(?&IDENTIFIER)(?:\{)(?&enumerator_list)(?:\})|(?&ENUM)(?&IDENTIFIER))|(?(?&IDENTIFIER)))|(?&type_specifier)(?&declaration_specifiers)|(?&type_qualifier)|(?&type_qualifier)(?&declaration_specifiers))(?&declarator)(?(?(?&declaration_specifiers)(?:;)|(?&declaration_specifiers)(?(?(?&declarator)|(?&declarator)(?:=)(?(?&assignment_expression)|(?:\{)(?(?&initializer)|(?&initializer_list)(?:,)(?&initializer))(?:\})|(?:\{)(?&initializer_list)(?:,)(?:\})))|(?&init_declarator_list)(?:,)(?&init_declarator))(?:;))|(?&declaration_list)(?&declaration))(?(?:\{)(?:\})|(?:\{)(?(?(?(?&IDENTIFIER)(?::)(?&statement)|(?(?:case))(?&constant_expression)(?::)(?&statement)|(?(?:default))(?::)(?&statement))|(?&compound_statement)|(?(?:;)|(?&expression)(?:;))|(?(?(?:if))(?:\()(?&expression)(?:\))(?&statement)|(?&IF)(?:\()(?&expression)(?:\))(?&statement)(?(?:else))(?&statement)|(?(?:switch))(?:\()(?&expression)(?:\))(?&statement))|(?(?(?:while))(?:\()(?&expression)(?:\))(?&statement)|(?(?:do))(?&statement)(?&WHILE)(?:\()(?&expression)(?:\))(?:;)|(?(?:for))(?:\()(?&expression_statement)(?&expression_statement)(?:\))(?&statement)|(?&FOR)(?:\()(?&expression_statement)(?&expression_statement)(?&expression)(?:\))(?&statement))|(?(?(?:goto))(?&IDENTIFIER)(?:;)|(?(?:continue))(?:;)|(?(?:break))(?:;)|(?(?:return))(?:;)|(?&RETURN)(?&expression)(?:;)))|(?&statement_list)(?&statement))(?:\})|(?:\{)(?&declaration_list)(?:\})|(?:\{)(?&declaration_list)(?&statement_list)(?:\}))|(?&declaration_specifiers)(?&declarator)(?&compound_statement)|(?&declarator)(?&declaration_list)(?&compound_statement)|(?&declarator)(?&compound_statement))|(?&declaration))|(?&translation_unit)(?&external_declaration))
```