https://github.com/iraikov/chicken-abnf

# chicken-abnf
Parser combinators for Augmented BNF grammars (RFC 4234)

## Documentation

The `abnf` library provides a collection of combinators that help construct parsers
for Augmented Backus-Naur Form (ABNF) grammars, as defined in
[RFC 4234](http://www.ietf.org/rfc/rfc4234.txt).

## Library Procedures

The combinator procedures in this library are based on the interface
provided by the [lexgen](https://github.com/iraikov/chicken-lexgen) library.

### Terminal values and core rules

(char CHAR) => MATCHER

Procedure `char` builds a pattern matcher function that matches a
single character.

(lit STRING) => MATCHER

`lit` matches a literal string (case-insensitive).
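
As a minimal sketch of how these two constructors might be used, the snippet below follows the calling pattern of the date and time example in the Examples section: it assumes `lex` is available after `(import abnf)` and takes a matcher, an error procedure, and an input string. The names `equals-sign` and `get-method` are illustrative only.

```scheme
(import abnf)

;; Error procedure with the same shape as in the date and time example.
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

;; Matches exactly one `=` character.
(define equals-sign (char #\=))

;; Matches the literal "GET". Since `lit` is documented as
;; case-insensitive, "get" is expected to match as well.
(define get-method (lit "GET"))

(print (lex equals-sign err "="))
(print (lex get-method err "get"))
```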

The following primitive parsers match the core rules described in RFC 4234, Appendix B.1.

(alpha STREAM-LIST) => STREAM-LIST

Matches any character of the alphabet.

(binary STREAM-LIST) => STREAM-LIST

Matches [0..1].

(decimal STREAM-LIST) => STREAM-LIST

Matches [0..9].

(hexadecimal STREAM-LIST) => STREAM-LIST

Matches [0..9] and [A..F,a..f].

(ascii-char STREAM-LIST) => STREAM-LIST

Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).

(cr STREAM-LIST) => STREAM-LIST

Matches the carriage return character.

(lf STREAM-LIST) => STREAM-LIST

Matches the line feed character.

(crlf STREAM-LIST) => STREAM-LIST

Matches the Internet newline.

(ctl STREAM-LIST) => STREAM-LIST

Matches any US-ASCII control character. That is, any character with a
decimal value in the range of [0..31,127].

(dquote STREAM-LIST) => STREAM-LIST

Matches the double quote character.

(htab STREAM-LIST) => STREAM-LIST

Matches the tab character.

(lwsp STREAM-LIST) => STREAM-LIST

Matches linear white-space. That is, any number of consecutive
`wsp`, optionally followed by a `crlf` and (at least) one more
`wsp`.

(sp STREAM-LIST) => STREAM-LIST

Matches the space character.

(vspace STREAM-LIST) => STREAM-LIST

Matches any printable ASCII character. That is, any character in the
decimal range of [33..126].

(wsp STREAM-LIST) => STREAM-LIST

Matches space or tab.

(quoted-pair STREAM-LIST) => STREAM-LIST

Matches a quoted pair. Any characters (excluding CR and LF) may be
quoted.

(quoted-string STREAM-LIST) => STREAM-LIST

Matches a quoted string. The backslash and double quote characters must be
escaped inside a quoted string; CR and LF are not allowed at all.

The following additional procedures are provided for convenience:

(set CHAR-SET) => MATCHER

Matches any character from an SRFI-14 character set.

(set-from-string STRING) => MATCHER

Matches any character from a set defined as a string.
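
The core rules and `set-from-string` can be passed to `lex` directly, since each of them is already a matcher. The sketch below assumes the `lex` calling pattern from the date and time example in the Examples section; `vowel` is a hypothetical name chosen for illustration.

```scheme
(import abnf)

;; Error procedure with the same shape as in the date and time example.
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

;; Matches a single character out of the set "aeiou".
(define vowel (set-from-string "aeiou"))

;; Core rules are used as-is, without calling a constructor.
(print (lex alpha err "x"))        ; one letter
(print (lex decimal err "7"))      ; one digit
(print (lex hexadecimal err "f"))  ; one hexadecimal digit
(print (lex vowel err "e"))        ; one character from the custom set
```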

### Operators

(concatenation MATCHER-LIST) => MATCHER

`concatenation` matches an ordered list of rules. (RFC 4234, Section 3.1)

(alternatives MATCHER-LIST) => MATCHER

`alternatives` matches any one of the given list of rules. (RFC 4234, Section 3.2)

(range C1 C2) => MATCHER

`range` matches a range of characters. (RFC 4234, Section 3.4)
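
A small sketch of these three operators working together. It assumes that `concatenation` and `alternatives` accept their matchers as separate arguments, as in the date and time example in the Examples section, and that `C1` and `C2` are characters. The names `lower`, `sign` and `signed-letter` are illustrative.

```scheme
(import abnf)

;; Error procedure with the same shape as in the date and time example.
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

;; Any character from `a` to `z` (assumes `range` takes two characters).
(define lower (range #\a #\z))

;; Either "+" or "-".
(define sign (alternatives (char #\+) (char #\-)))

;; A sign followed by a lowercase letter, for example "-q".
(define signed-letter (concatenation sign lower))

(print (lex signed-letter err "-q"))
```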

(variable-repetition MIN MAX MATCHER) => MATCHER

`variable-repetition` matches between `MIN` and `MAX` consecutive
elements that match the given rule. (RFC 4234, Section 3.6)

(repetition MATCHER) => MATCHER

`repetition` matches zero or more consecutive elements that match the given rule.

(repetition1 MATCHER) => MATCHER

`repetition1` matches one or more consecutive elements that match the given rule.

(repetition-n N MATCHER) => MATCHER

`repetition-n` matches exactly `N` consecutive occurrences of the given rule. (RFC 4234, Section 3.7)

(optional-sequence MATCHER) => MATCHER

`optional-sequence` matches the given optional rule. (RFC 4234, Section 3.8)
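
The repetition operators and `optional-sequence` can be combined as in the following sketch. The grammar names (`octet`, `dotted-quad`, `signed-integer`, `blanks`) are hypothetical, and the `lex` call follows the pattern of the date and time example in the Examples section. Note that `octet` only checks the number of digits, not that the value fits in a byte.

```scheme
(import abnf)

;; Error procedure with the same shape as in the date and time example.
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

;; One to three decimal digits.
(define octet (variable-repetition 1 3 decimal))

;; An octet followed by exactly three ".octet" groups, e.g. "192.168.0.1".
(define dotted-quad
  (concatenation
   octet
   (repetition-n 3 (concatenation (char #\.) octet))))

;; An optional sign followed by one or more digits, e.g. "-42" or "7".
(define signed-integer
  (concatenation
   (optional-sequence (alternatives (char #\+) (char #\-)))
   (repetition1 decimal)))

;; Zero or more spaces or tabs.
(define blanks (repetition wsp))

(print (lex dotted-quad err "192.168.0.1"))
(print (lex signed-integer err "-42"))
```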

(pass) => MATCHER

This matcher returns without consuming any input.

(bind F P) => MATCHER

Given a rule `P` and function `F`, returns a matcher that first
applies `P` to the input stream, then applies `F` to the returned
list of consumed tokens, and returns the result and the remainder of
the input stream.

Note: this combinator will signal failure if the input stream is
empty.

(bind* F P) => MATCHER

The same as `bind`, but will signal success if the input stream is
empty.

(drop-consumed P) => MATCHER

Given a rule `P`, returns a matcher that always returns an empty
list of consumed tokens when `P` succeeds.
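
The sketch below illustrates `bind` and `drop-consumed`. It rests on two assumptions that the text above does not spell out: that `F` receives the consumed characters as a list, and that it should return a list in turn. The helper `upcase-consumed` is hypothetical and deliberately does not depend on the order of the consumed list.

```scheme
(import abnf)

;; Error procedure with the same shape as in the date and time example.
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

;; Hypothetical helper: upper-cases every consumed character.
;; Assumes the consumed tokens are characters held in a list.
(define (upcase-consumed consumed)
  (map char-upcase consumed))

;; `lit` is case-insensitive, so "get" and "GET" both match;
;; `bind` then rewrites whatever was consumed to upper case.
(define http-method
  (bind upcase-consumed (lit "GET")))

;; The surrounding double quotes must be present in the input,
;; but `drop-consumed` keeps them out of the consumed tokens.
(define quoted-word
  (concatenation
   (drop-consumed dquote)
   (repetition1 alpha)
   (drop-consumed dquote)))

(print (lex http-method err "get"))
(print (lex quoted-word err "\"hello\""))
```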

### Abbreviated syntax

`abnf` supports the following abbreviations for commonly used combinators:

* `::` for `concatenation`
* `:?` for `optional-sequence`
* `:!` for `drop-consumed`
* `:s` for `lit`
* `:c` for `char`
* `:*` for `repetition`
* `:+` for `repetition1`
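
To show how the abbreviations read in practice, here is the `time-of-day` rule from the date and time example in the Examples section, written once in long form and once in short form. This is a sketch that assumes each abbreviation accepts the same arguments as the combinator it stands for; `time-of-day/short` is an illustrative name.

```scheme
(import abnf)

(define hour    (repetition-n 2 decimal))
(define minute  (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))

;; Long form: hh:mm with an optional :ss part.
(define time-of-day
  (concatenation
   hour (drop-consumed (char #\:))
   minute (optional-sequence
           (concatenation (drop-consumed (char #\:))
                          isecond))))

;; The same rule written with the abbreviated combinator names.
(define time-of-day/short
  (:: hour (:! (:c #\:))
      minute (:? (:: (:! (:c #\:))
                     isecond))))
```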

## Examples

The following parser libraries have been implemented with `abnf`, in
order of complexity:

* csv
* internet-timestamp
* json-abnf
* mbox
* smtp
* internet-message
* mime

### Parsing date and time

```scheme

(import abnf)

;; Folding whitespace: an optional line break preceded by whitespace,
;; followed by at least one whitespace character.

(define fws
  (concatenation
   (optional-sequence
    (concatenation
     (repetition wsp)
     (drop-consumed
      (alternatives crlf lf cr))))
   (repetition1 wsp)))

;; Wrap a matcher in optional folding whitespace that is not kept

(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;; Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names

(define day-name
  (alternatives
   (lit "Mon")
   (lit "Tue")
   (lit "Wed")
   (lit "Thu")
   (lit "Fri")
   (lit "Sat")
   (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace

(define day-of-week (between-fws day-name))

;; Match a four digit decimal number

(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names

(define month-name
  (alternatives
   (lit "Jan")
   (lit "Feb")
   (lit "Mar")
   (lit "Apr")
   (lit "May")
   (lit "Jun")
   (lit "Jul")
   (lit "Aug")
   (lit "Sep")
   (lit "Oct")
   (lit "Nov")
   (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace

(define month (between-fws month-name))

;; Match a one or two digit number

(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (alternatives
    (variable-repetition 1 2 decimal)
    (drop-consumed fws))))

;; Match a date of the form "dd Mon yyyy", e.g. 19 Dec 2002

(define date (concatenation day month year))

;; Match a two-digit number

(define hour (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.

(define time-of-day
  (concatenation
   hour (drop-consumed (char #\:))
   minute (optional-sequence
           (concatenation (drop-consumed (char #\:))
                          isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm

(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char #\-) (char #\+))
   hour minute))

;; Match a time-of-day specification followed by a zone.

(define itime (concatenation time-of-day zone))

;; Match a complete date and time; the leading weekday is optional.

(define date-time
  (concatenation
   (optional-sequence
    (concatenation
     day-of-week
     (drop-consumed (char #\,))))
   date
   itime
   (drop-consumed (optional-sequence fws))))

;; Error procedure passed to lex: report the stream and return (error)

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))

```

## Version History

* 8.3 Removed unneeded dependency on yasos [thanks to Mario Domenech Goulart]
* 8.0 Ported to CHICKEN 5 and yasos collections interface
* 7.0 Added bind* variant of bind [thanks to Peter Bex]
* 6.0 Using utf8 for char operations
* 5.1 Improvements to the CharLex->CoreABNF constructor
* 5.0 Synchronized with lexgen 5
* 3.2 Removed invalid identifier :|
* 3.0 Implemented typeclass interface
* 2.9 Bug fix in consumed-objects (reported by Peter Bex)
* 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
* 2.6 Bug fixes in consumer procedures
* 2.5 Removed procedure memo
* 2.4 Moved the definition of bind and drop to lexgen
* 2.2 Added pass combinator
* 2.1 Added procedure variable-repetition
* 2.0 Updated to match the interface of lexgen 2.0
* 1.3 Fix in drop
* 1.2 Added procedures bind drop consume collect
* 1.1 Added procedures set and set-from-string
* 1.0 Initial release

## License

> Copyright 2009-2021 Ivan Raikov
>
> This program is free software: you can redistribute it and/or
> modify it under the terms of the GNU General Public License as
> published by the Free Software Foundation, either version 3 of the
> License, or (at your option) any later version.
>
> This program is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> General Public License for more details.
>
> A full copy of the GPL license can be found at
> .
>