An open API service indexing awesome lists of open source software.

https://github.com/iraikov/chicken-lexgen

Lexer and parser combinators in Chicken Scheme
https://github.com/iraikov/chicken-lexgen

chicken-scheme chicken-scheme-eggs lexer lexer-parser parser-combinators pattern-matcher scheme scheme-language scheme-programming-language

Last synced: 7 months ago
JSON representation

Lexer and parser combinators in Chicken Scheme

Awesome Lists containing this project

README

          

# chicken-lexgen
Lexer and parser combinators in Chicken Scheme

## Description

`lexgen` is a lexer generator comprised in its core of only five
small procedures that can be combined to form pattern matchers.

A pattern matcher procedure takes an input stream, and returns a new
stream advanced by the pattern.

A stream is defined as a list that contains a list of characters
consumed by the pattern matcher, and a list of characters not yet
consumed. E.g., the list

((#\a) (#\b #\c #\d #\e))

represents a stream that contains the consumed character a, and the
unconsumed characters b c d e.

A pattern matcher has the form of a procedure that takes a success
continuation, which is invoked when the pattern matches and the stream
is advanced, an error continuation, which is invoked when the pattern
does not match, and an input stream.

## Library Procedures

Every combinator procedure in this library returns a procedure that
takes in a success continuation, error continuation and input stream
as arguments.

### Basic procedures

(seq MATCHER1 MATCHER2) => MATCHER

`seq` builds a matcher that matches a sequence of patterns.

(bar MATCHER1 MATCHER2) => MATCHER

`bar` matches either of two patterns. It's analogous to patterns
separated by `|` in traditional regular expressions.

(star MATCHER) => MATCHER

`star` is an implementation of the Kleene closure. It is analogous
to `*` in traditional regular expressions.

### Token procedure

(tok TOKEN PROC) => MATCHER

Procedure `tok` builds pattern matchers based on character comparison
operations. It is intended for matching input sequences that are
SRFI-127 lazy streams.

For each stream given, `tok` applies the procedure `PROC` to the given
token `TOKEN` and an input character. If the procedure returns a true
value, that value is prepended to the list of consumed elements, and
the input character is removed from the stream of input elements.

(char CHAR) => MATCHER

Matches a single character.

(set CHAR-SET) => MATCHER

Matches any of a SRFI-14 set of characters.

(range CHAR CHAR) => MATCHER

Matches a range of characters. Analogous to character class `[]`.

(lit STRING) => MATCHER

Matches a literal string `s`.

### Convenience procedures

These procedures are built from the basic procedures and are provided
for convenience.

(try PROC) => PROC

Converts a binary predicate procedure to a binary procedure that
returns its right argument when the predicate is true, and false
otherwise.

(lst MATCHER-LIST) => MATCHER

Constructs a matcher for the sequence of matchers in `MATCHER-LIST`.

(pass) => MATCHER

This matcher returns without consuming any input.

(pos MATCHER) => MATCHER

Positive closure. Analogous to `+`.

(opt MATCHER) => MATCHER

Optional pattern. Analogous to `?`.

(bind F P) => MATCHER

Given a rule `P` and function `F`, returns a matcher that first
applies `P` to the input stream, then applies `F` to the returned
list of consumed tokens, and returns the result and the remainder of
the input stream.

Note: this combinator will signal failure if the input stream is
empty.

(bind* F P) => MATCHER

The same as `bind`, but will signal success if the input stream is
empty.

(rebind F G P) => MATCHER

Given a rule `P` and procedures `F` and `G`, returns a matcher
that first applies `F` to the input stream, then applies `P` to
the resulting stream, then applies `G` to the resulting list of
consumed elements and returns the result along with the remainder of
the input stream.

Note: this combinator will signal failure if the input stream is
empty.

(rebind* F G P) => MATCHER

The same as `rebind`, but will signal success if the input stream is
empty.

(drop P) => MATCHER

Given a rule `P`, returns a matcher that always returns an empty
list of consumed tokens when `P` succeeds.

### Lexer procedure

(lex MATCHER ERROR STRING) => CHAR-LIST

`lex` takes a pattern and a string, turns the string into a list of
streams (containing one stream), applies the pattern, and returns the
first possible match. Argument `ERROR` is a single-argument
procedure called when the pattern does not match anything.

## Examples

### A pattern to match floating point numbers

```scheme

;; A pattern to match floating point numbers.
;; "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?

(define numpat
(let* ((digit (range #\0 #\9))
(digits (pos digit))
(fraction (seq (char #\.) digits))
(significand (bar (seq digits (opt fraction)) fraction))
(exp (seq (set "eE") (seq (opt (set "+-")) digits)))
(sign (opt (char #\-))))
(seq sign (seq significand (opt exp)))))

(define (err s)
(print "lexical error on stream: " s)
(list))

(lex numpat err "-123.45e-6")
```

### A pattern to match floating point numbers and construct user-defined lexer state

```scheme

(define (collect cs)
(let loop ((cs cs) (ax (list)))
(cond ((null? cs) `(,(list->string ax)))
((atom? (car cs)) (loop (cdr cs) (cons (car cs) ax)))
(else (cons (list->string ax) cs)))))

(define (make-exp x)
(or (and (pair? x)
(let ((x1 (collect x)))
(list `(exp . ,x1)))) x))

(define (make-significand x)
(or (and (pair? x)
(let ((x1 (collect x)))
(cons `(significand ,(car x1)) (cdr x1)))) x))

(define (make-sign x)
(or (and (pair? x)
(let ((x1 (collect x)))
(cons `(sign ,(car x1)) (cdr x1)))) x))

(define (check s) (lambda (s1) (if (null? s1) (err s) s1)))

(define bnumpat
(let* ((digit (range #\0 #\9))
(digits (star digit))
(fraction (seq (char #\.) digits))
(significand (bar (seq digits (opt fraction)) fraction))
(exp (seq (set "eE") (seq (opt (set "+-")) digits)))
(sign (opt (char #\-)) )
(pat (seq (bind make-sign sign)
(seq (bind make-significand significand)
(bind make-exp (opt exp))))))
pat))

(define (num-parser s) (car (lex bnumpat err s)))

(num-parser "-123.45e-6")

```

## Version History

* 8.2 Removed yasos dependency [thanks to Noel Cragg]
* 8.1 Ported to CHICKEN 5 and yasos collections interface
* 7.1 Bug fix in bind* [thanks to Peter Bex]
* 7.0 Added bind* and rebind* variants of bind and rebind [thanks to Peter Bex]
* 6.1-6.2 Corrected behavior of the tok combinator so that the failure continuation is invoked upon end-of-input [thanks to Chris Salch]
* 6.0 Using utf8 for char operations
* 5.2 Ensure test script returns proper exit status
* 5.0-5.1 Added error continuation to the matcher interface and eliminated multiple stream matching
* 4.0 Implemented typeclass interface for abstracting over input sequences
* 3.8 Added procedure `star*` (greedy Kleene closure matching)
* 3.6 Added procedure redo [thanks to Christian Kellermann]
* 3.5 Bug fixes in bind [reported by Peter Bex]
* 3.3 Bug fixes in stream comparison
* 3.2 Improved input stream comparison procedures
* 3.1 Added rebind combinator and stream-unfold procedure
* 3.0 Added an extension mechanism for input streams of different
types (to be elaborated and documented in subsequent versions).
* 2.6 Added bind and drop combinators
* 2.5 The seq combinator checks whether the first parser in the sequence has failed
* 2.4 Added (require-library srfi-1); using lset<= instead of equal? in star
* 2.3 Bug fix in procedure range; added procedure cps-table
* 2.2 Bug fix in procedure star
* 2.1 Added procedure lst
* 2.0 Core procedures rewritten in continuation-passing style
* 1.5 Using (require-extension srfi-1)
* 1.4 Ported to Chicken 4
* 1.2 Added procedures try and tok (supersedes pred)
* 1.0 Initial release

## License

Based on the [SML lexer generator](http://www.standarddeviance.com/projects/combinators/combinators.html) by Thant Tessman.
>
> Copyright 2009-2021 Ivan Raikov.
>
>
> This program is free software: you can redistribute it and/or modify
> it under the terms of the GNU General Public License as published by
> the Free Software Foundation, either version 3 of the License, or
> (at your option) any later version.
>
> This program is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> General Public License for more details.
>
> A full copy of the GPL license can be found at
> .
>