https://github.com/iraikov/chicken-lexgen
  
  
    Lexer and parser combinators in Chicken Scheme 
    https://github.com/iraikov/chicken-lexgen
  
chicken-scheme chicken-scheme-eggs lexer lexer-parser parser-combinators pattern-matcher scheme scheme-language scheme-programming-language
        Last synced: 7 months ago 
        JSON representation
    
Lexer and parser combinators in Chicken Scheme
- Host: GitHub
- URL: https://github.com/iraikov/chicken-lexgen
- Owner: iraikov
- License: gpl-3.0
- Created: 2018-09-11T04:20:25.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2021-10-04T22:06:57.000Z (about 4 years ago)
- Last Synced: 2025-01-30T16:28:27.686Z (9 months ago)
- Topics: chicken-scheme, chicken-scheme-eggs, lexer, lexer-parser, parser-combinators, pattern-matcher, scheme, scheme-language, scheme-programming-language
- Language: Scheme
- Size: 24.4 KB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 1
- 
            Metadata Files:
            - Readme: README.md
- License: LICENSE
 
Awesome Lists containing this project
README
          # chicken-lexgen
Lexer and parser combinators in Chicken Scheme
## Description
`lexgen` is a lexer generator comprised in its core of only five
small procedures that can be combined to form pattern matchers.
A pattern matcher procedure takes an input stream, and returns a new
stream advanced by the pattern.
A stream is defined as a list that contains a list of characters
consumed by the pattern matcher, and a list of characters not yet
consumed. E.g., the list
  ((#\a) (#\b #\c #\d #\e))
represents a stream that contains the consumed character a, and the
unconsumed characters b c d e.
A pattern matcher has the form of a procedure that takes a success
continuation, which is invoked when the pattern matches and the stream
is advanced, an error continuation, which is invoked when the pattern
does not match, and an input stream.
## Library Procedures
Every combinator procedure in this library returns a procedure that
takes in a success continuation, error continuation and input stream
as arguments.
### Basic procedures
    (seq MATCHER1 MATCHER2) => MATCHER
`seq` builds a matcher that matches a sequence of patterns. 
(bar MATCHER1 MATCHER2) => MATCHER
`bar` matches either of two patterns. It's analogous to patterns
separated by `|` in traditional regular expressions.
(star MATCHER) => MATCHER
`star` is an implementation of the Kleene closure. It is analogous
to `*` in traditional regular expressions.
### Token procedure
(tok TOKEN PROC) => MATCHER
Procedure `tok` builds pattern matchers based on character comparison
operations. It is intended for matching input sequences that are
SRFI-127 lazy streams.
For each stream given, `tok` applies the procedure `PROC` to the given
token `TOKEN` and an input character. If the procedure returns a true
value, that value is prepended to the list of consumed elements, and
the input character is removed from the stream of input elements.
(char CHAR) => MATCHER
Matches a single character.
(set CHAR-SET) => MATCHER
Matches any of a SRFI-14 set of characters. 
(range CHAR CHAR) => MATCHER
Matches a range of characters. Analogous to character class `[]`.
(lit STRING) => MATCHER
Matches a literal string `s`.
### Convenience procedures
These procedures are built from the basic procedures and are provided
for convenience.
(try PROC) => PROC
Converts a binary predicate procedure to a binary procedure that
returns its right argument when the predicate is true, and false
otherwise.
(lst MATCHER-LIST) => MATCHER
Constructs a matcher for the sequence of matchers in `MATCHER-LIST`.
(pass) => MATCHER
This matcher returns without consuming any input.
(pos MATCHER) => MATCHER
Positive closure. Analogous to `+`.
(opt MATCHER) => MATCHER
Optional pattern. Analogous to `?`.
(bind F P) => MATCHER
Given a rule `P` and function `F`, returns a matcher that first
applies `P` to the input stream, then applies `F` to the returned
list of consumed tokens, and returns the result and the remainder of
the input stream.
Note: this combinator will signal failure if the input stream is
empty.
(bind* F P) => MATCHER
The same as `bind`, but will signal success if the input stream is
empty.
(rebind F G P) => MATCHER
Given a rule `P` and procedures `F` and `G`, returns a matcher
that first applies `F` to the input stream, then applies `P` to
the resulting stream, then applies `G` to the resulting list of
consumed elements and returns the result along with the remainder of
the input stream.
Note: this combinator will signal failure if the input stream is
empty.
(rebind* F G P) => MATCHER
The same as `rebind`, but will signal success if the input stream is
empty.
(drop P) => MATCHER
Given a rule `P`, returns a matcher that always returns an empty
list of consumed tokens when `P` succeeds.
### Lexer procedure
(lex MATCHER ERROR STRING) => CHAR-LIST
`lex` takes a pattern and a string, turns the string into a list of
streams (containing one stream), applies the pattern, and returns the
first possible match. Argument `ERROR` is a single-argument
procedure called when the pattern does not match anything.
## Examples
### A pattern to match floating point numbers
```scheme
;;  A pattern to match floating point numbers. 
;;  "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)? 
(define numpat
  (let* ((digit        (range #\0 #\9))
	 (digits       (pos digit))
	 (fraction     (seq (char #\.) digits))
	 (significand  (bar (seq digits (opt fraction)) fraction))
	 (exp          (seq (set "eE") (seq (opt (set "+-")) digits)))
	 (sign         (opt (char #\-))))
    (seq sign (seq significand (opt exp)))))
 
 (define (err s)
  (print "lexical error on stream: " s)
  (list))
 (lex numpat err "-123.45e-6")
```
### A pattern to match floating point numbers and construct user-defined lexer state
```scheme
(define (collect cs) 
  (let loop ((cs cs) (ax (list)))
    (cond ((null? cs)         `(,(list->string ax)))
	  ((atom? (car cs))   (loop (cdr cs) (cons (car cs) ax)))
	  (else               (cons (list->string ax) cs)))))
(define (make-exp x)
  (or (and (pair? x) 
	   (let ((x1 (collect x)))
	     (list `(exp . ,x1)))) x))
(define (make-significand x)
  (or (and (pair? x) 
	   (let ((x1 (collect x)))
	     (cons `(significand ,(car x1)) (cdr x1)))) x))
(define (make-sign x)
  (or (and (pair? x) 
	   (let ((x1 (collect x)))
	     (cons `(sign ,(car x1)) (cdr x1)))) x))
(define (check s) (lambda (s1) (if (null? s1) (err s) s1)))
(define bnumpat 
  (let* ((digit        (range #\0 #\9))
	 (digits       (star digit))
	 (fraction     (seq (char #\.) digits))
	 (significand  (bar (seq digits (opt fraction)) fraction))
	 (exp          (seq (set "eE") (seq (opt (set "+-")) digits)))
	 (sign         (opt (char #\-)) )
	 (pat          (seq (bind make-sign sign) 
			    (seq (bind make-significand significand)
				 (bind make-exp (opt exp))))))
    pat))
(define (num-parser s) (car (lex bnumpat err s)))
(num-parser "-123.45e-6")
```
## Version History
* 8.2 Removed yasos dependency [thanks to Noel Cragg]
* 8.1 Ported to CHICKEN 5 and yasos collections interface
* 7.1 Bug fix in bind*  [thanks to Peter Bex]
* 7.0 Added bind* and rebind* variants of bind and rebind [thanks to Peter Bex]
* 6.1-6.2 Corrected behavior of the tok combinator so that the failure continuation is invoked upon end-of-input [thanks to Chris Salch]
* 6.0 Using utf8 for char operations
* 5.2 Ensure test script returns proper exit status
* 5.0-5.1 Added error continuation to the matcher interface and eliminated multiple stream matching
* 4.0 Implemented typeclass interface for abstracting over input sequences
* 3.8 Added procedure `star*` (greedy Kleene closure matching)
* 3.6 Added procedure redo [thanks to Christian Kellermann]
* 3.5 Bug fixes in bind [reported by Peter Bex]
* 3.3 Bug fixes in stream comparison
* 3.2 Improved input stream comparison procedures
* 3.1 Added rebind combinator and stream-unfold procedure 
* 3.0 Added an extension mechanism for input streams of different
  types (to be elaborated and documented in subsequent versions).
* 2.6 Added bind and drop combinators
* 2.5 The seq combinator checks whether the first parser in the sequence has failed
* 2.4 Added (require-library srfi-1); using lset<= instead of equal? in star
* 2.3 Bug fix in procedure range; added procedure cps-table
* 2.2 Bug fix in procedure star
* 2.1 Added procedure lst
* 2.0 Core procedures rewritten in continuation-passing style
* 1.5 Using (require-extension srfi-1)
* 1.4 Ported to Chicken 4
* 1.2 Added procedures try and tok (supersedes pred)
* 1.0 Initial release
## License
Based on the [SML lexer generator](http://www.standarddeviance.com/projects/combinators/combinators.html) by Thant Tessman.
>
>  Copyright 2009-2021 Ivan Raikov.
> 
> 
>  This program is free software: you can redistribute it and/or modify
>  it under the terms of the GNU General Public License as published by
>  the Free Software Foundation, either version 3 of the License, or
>  (at your option) any later version.
> 
>  This program is distributed in the hope that it will be useful, but
>  WITHOUT ANY WARRANTY; without even the implied warranty of
>  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>  General Public License for more details.
> 
>  A full copy of the GPL license can be found at
>  .
>