Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/marcelog/ex_abnf

Parser for ABNF Grammars
https://github.com/marcelog/ex_abnf

Last synced: 12 days ago
JSON representation

Parser for ABNF Grammars

Awesome Lists containing this project

README

        

[![Build Status](https://travis-ci.org/marcelog/ex_abnf.svg)](https://travis-ci.org/marcelog/ex_abnf)

## About

A parser and interpreter written in [Elixir](http://elixir-lang.org/) for [ABNF grammars](https://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form).

ABNF is defined in the [RFC2234](https://tools.ietf.org/html/rfc2234), which is
obsoleted by [RFC4234](https://tools.ietf.org/html/rfc4234), which in turn is
obsoleted by the [RFC5234](https://tools.ietf.org/html/rfc5234). There's also an update
in the [RFC7405](https://tools.ietf.org/html/rfc7405).

This library implements the latest definition (RFC5234) (with erratas #3076, and #2968), and RFC7405.

## Use example

iex(1)> grammar = ABNF.load_file "test/resources/ipv4.abnf"
iex(2)> initial_state = %{}
iex(2)> ABNF.apply grammar, "ipv4address", '250.246.192.34', initial_state
%ABNF.CaptureResult{
input: '250.246.192.34',
rest: '',
state: %{ipv4address: '250.246.192.34'},
string_text: '250.246.192.34',
string_tokens: ['250', '.', '246', '.', '192', '.', '34'],
values: ["Your ip address is: 250.246.192.34"]
}

The result can be read as an [%ABNF.CaptureResult{}](https://github.com/marcelog/ex_abnf/blob/master/lib/ex_abnf/capture_result.ex)
where:
* **input**: The original input
* **rest**: The part of the input that **didn't** match.
* **state**: The state after running all the rules applied to the input.
* **string_text**: The rule value as a string (this might or might not be the same as the rule value, since you can return custom values when adding a reduce code to the rule).
* **string_tokens**: Each one of the values that compose the string (in this case, [octet, dot, octet, dot, octet, dot, octet]).
* **values**: The rule value. In this case the value comes from the reduce code in the [grammar itself](https://github.com/marcelog/ex_abnf/blob/master/test/resources/ipv4.abnf#L6).

## More complex examples

* There's a small sample application at [https://github.com/marcelog/ex_abnf_example](https://github.com/marcelog/ex_abnf_example). An article
describing this application is located at [http://marcelog.github.io/articles/abnf_grammars_in_elixir.html](http://marcelog.github.io/articles/abnf_grammars_in_elixir.html).

* The [unit tests](https://github.com/marcelog/ex_abnf/blob/master/test/ex_abnf_test.exs)
use different [sample RFCs](https://github.com/marcelog/ex_abnf/tree/master/test/resources) to
test the [grammar parser](https://github.com/marcelog/ex_abnf/blob/master/lib/ex_abnf/grammar.ex)
and [the interpreter](https://github.com/marcelog/ex_abnf/blob/master/lib/ex_abnf/interpreter.ex)

## How it works
This is not a parser generator, but an interpreter. It will load up an ABNF
grammar, and generate an (kind of) [AST](http://en.wikipedia.org/wiki/Abstract_syntax_tree)
for it. Then you can apply any of the rules to an input and the interpreter
will parse the input according to the rule.

## Using it with Mix

To use it in your Mix projects, first add it as a dependency:

```elixir
def deps do
[{:ex_abnf, "~> 0.2.8"}]
end
```
Then run mix deps.get to install it.

## Adding custom code to reduce rules
After a rule, you can add your own code, for example:
```
userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) !!!
state = Map.put state, :userinfo, rule
{:ok, state}
!!!
```

The code in question will be packed together into a module that is created in
runtime to speed up execution later on.

Your code can return:
* **{:ok, state}**: The match continues, and the new state is used for
future calls.

* **{:ok, state, rule_value}**: Returns a new state but also the **rule_value**
is used as the result of the match. In YACC terms, rule_value would be the
equivalent of $$ = ...

* **{:error, error}**: The whole match is aborted and this error is thrown.

And your code will be called with the following bindings:

* **state**: This is the state that you can pass when calling the initial
**ABNF.apply** function, and is a way to keep state through the whole match,
it can be whatever you like and can mutate through calls as long as your code
can handle it.

* **values**: When a rule is composed of different tokens
(e.g: path = SEGMENT "/" SEGMENT) this contains a list with all the values of
those tokens in order. In YACC terms, this would be the equivalent of using
$1, $2, $3, etc. Note that a value here can be a reduced value returned by
your own code in a previous rule.

* **string_values**: Just like `values` but each value is a nested list of
lists with all the characters that matched (you will usually want to flatten
the list to get each one of the full strings).

## Adding helper code
You can also start your grammar with code to write your own helper functions and
module additions. For example:
```
!!!
require Logger
def return_value(ip) do
Logger.debug "Hello world"
"Your ip address is: #{ip}"
end
!!!

IPv4address =
dec-octet "."
dec-octet "."
dec-octet "."
dec-octet !!!
state = Map.put state, :ipv4address, rule
{:ok, state, return_value(rule)}
!!!

dec-octet = DIGIT ; 0-9
/ %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255

DIGIT = %x30-39
```

Note how the result of the `IPv4address` rule is the result of a call to the
function `return_value`.

## Changes from 0.1.x to 0.2.x
* In the reduce code the rule value is no longer the rule name, but the
variable `rule`.
* The grammar text no longer supports `cr` as the newline, one should always
use `crlf`.
* In the reduce code there are now available the following variables:
* `rule`: The rule value
* `string_values`: Like the old `tokens` variable, but contains a nested list
of lists with the parsed strings.
* `values`: Like the old `tokens` variable, but with the reduced values
(could be a mixed nested list of lists containing char_lists and/or other
kind of values).
* Original rule names are now preserverd and only downcased, no replacements
are done to chars (i.e: `-` to `_`).

## License
The source code is released under Apache 2 License.

Check [LICENSE](https://github.com/marcelog/ex_abnf/blob/master/LICENSE) file
for more information.