https://github.com/fschutt/nom-documentation-readme

Last synced: 4 months ago
JSON representation
Host: GitHub
URL: https://github.com/fschutt/nom-documentation-readme
Owner: fschutt
Created: 2017-06-02T00:21:20.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-06-02T00:21:47.000Z (about 9 years ago)
Last Synced: 2025-10-14T07:18:21.576Z (8 months ago)
Size: 3.91 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # nom

nom, eating data byte by byte

nom is a parser combinator library with a focus on safe parsing,

streaming patterns, and (as much as possible) zero copying.

The code is available on [Github](https://github.com/Geal/nom)

### Getting started

#### Minimal parser

Here is an example of the most basic parser you can write. It takes a bunch

of bytes and looks for the binary representation of the string "nom".

```rust

#[macro_use]

extern crate nom;

// Generates something like:

// fn cool_word(i: &[u8]) -> nom::IResult<&[u8], &[u8]> { ... }

named!(cool_word<&[u8]>, tag!("nom"));

fn main() {

    println!("{:?}", cool_word(b"nom is awesome"));

}

// Outputs: Done([32, 105, 115, 32, 97, 119, 101, 115, 111, 109, 101], [110, 111, 109])

```

The heart of this is the `named!` macro, which generates a function, which 

returns a `nom::IResult`. This is an alternative type for `std::Result`:

```rust,ignore

pub enum IResult {

   Done(I, O),          /* I holds the rest of the input, O holds the parsed output */

   Error(Err),       /* Error holds the error. It's possible to have sub-errors. */

   Incomplete(Needed),  /* Incomplete is for parsing streams (via network) */

}

```


The last three bytes hold the value `nom` as bytes (nom only occured once in

our string). A function in nom can return a specific type (in this case, it 

just returned `IResult`). You can, for example use the following function to 

return an IResult of `str` from the parser, instead of `&[u8]`:

```rust,ignore

// Takes &str as input (first argument), produces an IResult of type &str (second argument)

named!(cool_word<&str, &str>, tag!("nom"));

// outputs: Done(" is awesome", "nom")

```

#### Repeated and alternative parsers

One of the core strengths of nom is the ability to combine sub-parsers into 

larger, mor complicated parsers. For this to work, we have to define two 

functions: One to parse a singular "nom" string, and one to expect that the 

"nom" is at least __once__ contained in our input:

```rust

# #[macro_use]

# extern crate nom;

# 

// same as before

named!(cool_word<&str, &str>, tag!("nom"));

// the many1 macro produes a vector of T

named!(many_cool_words <&str, Vec<&str>>,

   many1!(cool_word));

fn main() {

    println!("{:?}", many_cool_words("nomnomnomnomnomnom oh yes!"));

}

// Output: Done(" oh yes!", ["nom", "nom", "nom", "nom", "nom", "nom"])

```

The macros `many1!`, `many0` or `many`, `many_m_n` and `many_till` can be 

used for repeated content. `many1` for example has the restriction that the

given parser must match at least once, if the string did not contain the word

"nom", it would fail.

Alternative parsers can be written using the `alt!` macro. It tries the first

parser and if it fails, tries the second parser (seperated by a vertical bar).

You can even try more than two parsers, by chaining them via a `|`.

```rust

# #[macro_use]

# extern crate nom;

# 

named!(parse_many_as <&str, Vec>,

   many1!(char!('a')));

named!(parse_many_bs <&str, Vec>,

   many1!(char!('b')));

named!(many_as_or_bs <&str, Vec>,

     alt!(parse_many_as | parse_many_bs));

fn main() {

    println!("{:?}", many_as_or_bs("aaaaaaaples"));

    println!("{:?}", many_as_or_bs("bbbaaaaples"));

}

// Done("ples", ['a', 'a', 'a', 'a', 'a', 'a', 'a'])

// Done("aaaaples", ['b', 'b', 'b'])

```

Here, we also introduced the `char!` macro for single characters.

Note that the order of parsers are important in the `alt!` macro. This will,

for example, not parse the "b"s in our string and stop at the first "failed" 

parsing result:

```rust,ignore

fn main() {

   println!("{:?}", many_as_or_bs("aaabbbaaaabbbples"));

}

// Done("bbbaaaabbbples", ['a', 'a', 'a'])

```

#### Capturing / serializing input to structs

Usually when using nom you have some sort of input that you want to serialize

to a data structure. `nom` is a zero-copy parser, meaning that it will build

references to the data you give it, not clone the data.

Serializing blocks of data that follow a specific order is done using the 

`do_parse!` macro. This macro takes a list of parsers, seperated by a `>>`

and runs them sequentially on your data. __Note:__ In older versions of `nom`,

the `>>` was a tilde character (`~`). This still works (same effect), but the

`>>` should be used.

```rust

# #[macro_use]

# extern crate nom;

# 

named!(many_as <&str, Vec>,

   many1!(char!('a')));

named!(many_bs <&str, Vec>,

   many1!(char!('b')));

named!(as_space_then_bs <&str, ()>, do_parse!(

    many_as >>

    char!(' ') >>

    many_bs >>

    () ));

fn main() {

    println!("{:?}", as_space_then_bs("aaa bbbbbbbbbb"));

}

// Done("", ())

```

`do_parse` uses the last seperated item as a return type that it returns from

the generated function. In the above case, this was simply `()`, a void type.

However, you can bind named fields to the parsers (using the `name: parser` syntax)

and then capture the output to a struct.

```rust

# #[macro_use]

# extern crate nom;

# 

# pub struct ImportantStruct {

#    a_vec: Vec,

#    b_vec: Vec,

# }

# named!(many_as <&str, Vec>,

#   many1!(char!('a')));

#

# named!(many_bs <&str, Vec>,

#    many1!(char!('b')));

# 

//[...]

// see previous example for the missing code

named!(serialize_a_b_struct<&str, ImportantStruct>, do_parse!(

    captured_as: many_as >>

    char!(' ') >>

    captured_bs: many_bs >>

    (ImportantStruct { 

           a_vec: captured_as, 

           b_vec: captured_bs

    }) ));

fn main() {

    let serialized_struct = serialize_a_b_struct("aaa bbbbbbbbbb").unwrap();

    println!("{:?}", serialized_struct.1.a_vec);

    println!("{:?}", serialized_struct.1.b_vec);

}

// ['a', 'a', 'a']

// ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b']

```

Notice: The captured struct has to be wrapped in `()` and the resulting type

has to be unwrapped (for this example), resulting in a `(&str, ImportantStruct)`.

So we have to use `.1` to get our struct.

#### Conditional parsing, serializing enums, convenience parsers

Now that you know how to assign fields, you can write conditional parsers using

`opt!` and `cond!`. `opt!` returns an `Option` with `Some(IResult)` if the given

parser succeded.

```rust

# #[macro_use]

# extern crate nom;

# 

use std::str::FromStr;

use nom::digit;

named!(negative_i32 <&str, i32>, do_parse!(

    neg: opt!(char!('-')) >>

    digs: cond_reduce!(neg.is_some(), 

        map_res!(digit, i32::from_str)) >>

    (-digs) ));

fn main() {

    println!("{:?}", negative_i32("1023"));

    println!("{:?}", negative_i32("-55"));

}

// Error(CondReduce)

// Done("", -55)

```

This is a parser that only parses negative numbers. It looks for the char "-" 

as the first input, then stores the result in `neg`. The `cond_reduce!` macro only 

executes if the first argument is `true` - otherwise it returns an Error. Of

course you could also use an `if` statement inside the return type.

The `nom::digit` parser is a convenience parser for basic types, provided by nom.

Nom has convenience parsers for integers (with various endianness) and strings.

You can also serialize directly to enums by using a closure or extract bytes

using the `take!`:

```rust

# #[macro_use]

# extern crate nom;

use nom::Endianness;

#[derive(Debug)]

struct Header<'a> {

  endianness:   Endianness,

  version:      &'a[u8],

}

named!(parse_header<&[u8], Header>, 

    do_parse!(

        endianness: alt!(

          tag!(b"v") => {|_| Endianness::Little } |

          tag!(b"V") => {|_| Endianness::Big } ) >>

        version: take!(3) >>

        (Header {

          endianness:   endianness,

          version:      version,

        })

    )

);

fn main() {

    println!("{:?}", parse_header(b"v244"));

}

// Done([], Header { endianness: Little, version: [50, 52, 52] })

```

#### Simple error handling

To handle errors with `nom`, you can use the `add_return_error!` macro when 

you want to make your own error code (all errors in `nom` are `i32`) and 

later on use pattern `error_to_list!` to get a `Vec` over the error codes.

Then you can implement error handling how you want. Or you can use the 

error_chain` crate to transform noms `ErrorKind` into a string for an error message. 

See the [Error management](error_management.html) page for more details

#### Simple CSV parser

This is an example of how to combine parsers to make a CSV parser. There are 

more complicated formats, but this should give you an idea of how parsers can

be combined to form bigger ones. Each sub-parser is a seperate function and 

can thus be tested on its own.

```rust

# #[macro_use]

# extern crate nom;

# 

named!(csv_escaped_char<&str, char>, alt!(

    tag!("\"\"") => { |_| '"' } |

        none_of!("\"")));

named!(csv_style_str<&str, String>, do_parse!(

    char!('"') >>

    chars: many0!(csv_escaped_char) >>

    char!('"') >>

    (chars.into_iter().collect()) ));

fn main() {

    println!("{:?}", csv_style_str(r#""some string" is cool"#));

    println!("{:?}", csv_style_str(

        r#""some string with "" internal quotes!" is also cool"#));

}

```

### Further Reading

- [Nom guides](home.html)

- [The design of nom](how_nom_macros_work.html)

- [How to write parsers](making_a_new_parser_from_scratch.html)

- [Error management](error_management.html)

- [Migrating from nom 1.0 to 2.0 / higher](upgrading_to_nom_2.html)

- [FAQ](FAQ.html)

Also, you can join us on `#nom` on our IRC channel at `irc.mozilla.org:6667`.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fschutt/nom-documentation-readme

Awesome Lists containing this project

README