Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scymtym/s-expression-syntax
Parse CL syntactic constructs in s-expression form (possibly represented as e.g. CSTs) and construct syntax trees
https://github.com/scymtym/s-expression-syntax
ast common-lisp cst parsing s-expressions
Last synced: about 2 months ago
JSON representation
Parse CL syntactic constructs in s-expression form (possibly represented as e.g. CSTs) and construct syntax trees
- Host: GitHub
- URL: https://github.com/scymtym/s-expression-syntax
- Owner: scymtym
- License: other
- Created: 2021-04-19T08:08:17.000Z (almost 4 years ago)
- Default Branch: builder-protocol
- Last Pushed: 2024-11-03T01:56:45.000Z (3 months ago)
- Last Synced: 2024-11-03T02:25:17.005Z (3 months ago)
- Topics: ast, common-lisp, cst, parsing, s-expressions
- Language: Common Lisp
- Homepage:
- Size: 862 KB
- Stars: 4
- Watchers: 4
- Forks: 1
- Open Issues: 17
-
Metadata Files:
- Readme: README.org
- License: COPYING
Awesome Lists containing this project
README
#+TITLE: s-expression-syntax README
#+AUTHOR: Jan Moringen
#+EMAIL: [email protected]
#+LANGUAGE: en#+OPTIONS: toc:t num:nil
#+SEQ_TODO: TODO STARTED | DONE#+BEGIN_SRC lisp :exports results :results silent
(ql:quickload '("s-expression-syntax"
"architecture.builder-protocol.print-tree"))
#+END_SRC* STARTED Introduction
The ~s-expression-syntax~ library provides declarative rules for
parsing the various kinds of s-expression syntax used in Common
Lisp:+ Special operators
+ Lambda lists
+ Declarations
+ Type specifiers
+ Standard macros
+ +~cl:loop~+ (work in progress)
+ Feature expressions
+ +Format control strings+ (work in progress)
Note that this library is not a code walker but it could be used to
implement a code walker.* STARTED Concepts
Explaining things in more detail requires a little bit of
terminology:+ Expression :: A data structure which /conceptually/
consists of atoms and conses (possibly meant for evaluation in
which case the expression is a /form/). In contrast to the
usual definition, expressions processed by this library may be
represented as arbitrary objects which are interpreted as atoms
and conses according to rules defined by the client using the
library. That said, processing expressions using the usual
interpretation of atoms and conses is of course
supported. Cons expressions can contain other expressions which
are /sub-expressions/ of the respective containing expression.+ Syntax Description :: An object which describes the syntax of one
kind of /expression/. For example, there is a syntax
description for the special operator ~cl:let~ which describes
(the infinite set) of well-formed ~let~ /expressions/. Besides
recognizing well-formed /expressions/ of a certain kind, a
/syntax description/ also contains descriptions of the /parts/
that can appear within well-formed /expressions/ of that
kind. For example, for ~cl:let~, the /parts/ are+ zero or more bindings, in turn consisting of
+ a variable name
+ an optional initial value form
+ zero or more declarations
+ and zero or more body forms.
More precisely, a /syntax description/ consists of a name, a
list of /parts/ and a parser which consumes /expressions/
conforming to the described syntax and produces a parse /result
tree/.+ Part :: An object which describes a sub-structure of
the syntax described by a particular /syntax description/. For
example, the /syntax description/ for ~cl:let~ has a /part/ for
declaration /expressions/ contained in ~let~ /expressions/.More precisely, a /part/ consists of a name, a cardinality,
which is either one, zero-or-one, one-or-more or zero-or-more,
and an evaluation which is roughly "evaluated", "not evaluated"
or "composed of evaluated and non-evaluated /parts/".For example, ~cl:defun~ has the following /parts/
| Name | Cardinality | Evaluation |
|----------------------+--------------+-------------------------------------------------|
| name | one | not evaluated |
| lambda list | one | composed of evaluated and non-evaluated /parts/ |
| documentation string | zero-or-one | not evaluated |
| declaration | zero-or-more | not evaluated |
| body forms | zero-or-more | evaluated |The "composed of evaluated and non-evaluated /parts/"
evaluation of the lambda list is due to the fact that lambda
lists contain unevaluated variable names and lambda list
keywords as well as, possibly, evaluated initialization forms
for optional and keyword parameters and ~&aux~ variables.+ Result Tree :: A tree that is result of parsing an
input /expression/. Each node in the tree corresponds to a
/sub-expression/ of the input /expression/ (multiple nodes may
correspond to the same /sub-expression/). A node is
characterized by its /kind/, its /initargs/ and its /relations/
to other nodes.The result tree could possibly be called an Abstract Syntax
Tree (AST), but we avoid that term because the result directly
reflects the structure of the input /expression/.We may call a /result tree/ a /partial result tree/ if it
contains unparsed /sub-expressions/ of the input /expression/
in some of its leaf nodes.* STARTED Tutorial
The most common use of this library probably is turning a given
/expression/ into an AST. This process happens in multiple steps1. Determine an appropriate /syntax description/ for parsing the
/expression/. For example, the /expression/ ~(locally (declare …) 1
(+ a b) 3)~ must be parsed using the /syntax description/ for the
special operator ~cl:locally~.2. Apply the obtained /syntax description/ in conjunction with a parse
result builder to obtain a partial (see 3.) AST for the
/expression/.3. Optionally parse evaluated sub-expressions recursively. In the
above example ~(declare …)~ is a sub-expression that is not
evaluated while ~1~, ~(+ a b)~ and ~3~ are sub-expressions that
are evaluated. The latter are not automatically parsed and thus
must be recursively processed in the way described here in order
to obtain a fully parsed AST. A complete AST can generally only
be produced by consulting an environment as well as interleaving
parsing with macroexpansion and is therefore out of scope for
this library.The following code performs steps 1. and 2. and prints the resulting
(partially parsed) AST in a human-readable form. Note how the ~list~
builder of the [[https://github.com/scymtym/architecture.builder-protocol][architecture.builder-protocol system]] is passed in the
~parse~ call and later used to destructure the AST node ~node~ by
calling the functions ~node-relations~ and ~node-relation~.#+NAME: simple-parse
#+BEGIN_SRC lisp :exports both :results output
(let* ((expression '(defun foo (a &optional (b 2))
(declare (type integer a))
(declare (type integer b))
(format t "~S" a)
(list a b)))
(syntax (s-expression-syntax:find-syntax 'defun))
;; Alternatively, determine the appropriate syntax description
;; for EXPRESSION automatically:
;; (syntax (s-expression-syntax:classify t expression))
(builder 'list)
(node (s-expression-syntax:parse builder syntax expression)))
(flet ((describe-sub-expression (sub-expression relation-args)
(format t "~2@T-> ~S~%~
~2@T evaluation: ~S~%"
sub-expression (getf relation-args :evaluation))))
(loop :for relation :in (architecture.builder-protocol:node-relations builder node)
:for part-name = (find-symbol (symbol-name (first relation))
(find-package "S-EXPRESSION-SYNTAX"))
:for part = (s-expression-syntax:find-part part-name syntax)
:for cardinality = (s-expression-syntax:cardinality part)
:for (sub-expression evaluation)
= (multiple-value-list (architecture.builder-protocol:node-relation
builder relation node))
:do (format t "~A (~A)~%" part-name cardinality)
(ecase (s-expression-syntax:cardinality part)
((1) (describe-sub-expression sub-expression evaluation))
((*) (mapc #'describe-sub-expression sub-expression evaluation))))))
#+END_SRCEvaluating the code results in the following output which
illustrates the four /parts/ of the ~defun~ /expression/: name,
lambda-list, declaration and form. The latter two have a cardinality
of ~*~, so multiple child nodes may be related to the parent node
through the relation in question. In this example, both relations
contain two child nodes: two declarations and two body forms.#+RESULTS: simple-parse
#+begin_example
NAME (1)
-> (:FUNCTION-NAME NIL :NAME FOO :SOURCE FOO)
evaluation: NIL
LAMBDA-LIST (1)
-> (:ORDINARY-LAMBDA-LIST
((:REQUIRED . *)
(((:REQUIRED-PARAMETER
((:NAME . 1)
(((:VARIABLE-NAME NIL :NAME A :SOURCE A) :EVALUATION NIL)))
:SOURCE A)))
(:OPTIONAL . *)
(((:OPTIONAL-PARAMETER
((:NAME . 1) (((:VARIABLE-NAME NIL :NAME B :SOURCE B)))
(:DEFAULT . 1)
(((:UNPARSED NIL :EXPRESSION 2 :CONTEXT :FORM :SOURCE 2) :EVALUATION
T)))
:SOURCE (B 2))
:EVALUATION :COMPOUND)))
:SOURCE (A &OPTIONAL (B 2)))
evaluation: :COMPOUND
DECLARATION (*)
-> (:DECLARATION
((:ARGUMENT . *)
(((:ATOMIC-TYPE-SPECIFIER
((:NAME . 1) (((:TYPE-NAME NIL :NAME INTEGER :SOURCE INTEGER))))
:SOURCE INTEGER))
((:VARIABLE-NAME NIL :NAME A :SOURCE A))))
:KIND TYPE :SOURCE (TYPE INTEGER A))
evaluation: NIL
-> (:DECLARATION
((:ARGUMENT . *)
(((:ATOMIC-TYPE-SPECIFIER
((:NAME . 1) (((:TYPE-NAME NIL :NAME INTEGER :SOURCE INTEGER))))
:SOURCE INTEGER))
((:VARIABLE-NAME NIL :NAME B :SOURCE B))))
:KIND TYPE :SOURCE (TYPE INTEGER B))
evaluation: NIL
FORM (*)
-> (:UNPARSED NIL :EXPRESSION (FORMAT T "~S" A) :CONTEXT :FORM :SOURCE
(FORMAT T "~S" A))
evaluation: T
-> (:UNPARSED NIL :EXPRESSION (LIST A B) :CONTEXT :FORM :SOURCE (LIST A B))
evaluation: T
#+end_exampleWe can also focus on the overall tree structure and print the
(partially parsed) AST as a tree. The following code again uses the
=architecture.builder-protocol= system to destructure the AST, this
time as part of a generic tree printer.#+NAME: tree-parse
#+BEGIN_SRC lisp :exports both :results output
(let* ((expression '(defun foo (a &optional (b 2))
(declare (type integer a))
(declare (type integer b))
(format t "~S" a)
(list a b)))
(builder 'list)
(node (s-expression-syntax:parse builder t expression)))
(let ((*print-case* :downcase))
(architecture.builder-protocol.print-tree:print-tree
builder node *standard-output*)))
#+END_SRCNote the unparsed leafs indicated by the ~unparsed~ node kind.
#+RESULTS: tree-parse
#+begin_example
defun
│ source: (defun foo (a &optional (b 2))
│ (declare (type integer a))
│ (declare (type integer b))
│ (format t "~S" a)
│ (list a b))
├─name: function-name
│ name: foo
│ source: foo
├─lambda-list: ordinary-lambda-list
│ │ source: (a &optional (b 2))
│ ├─required: required-parameter
│ │ │ source: a
│ │ └─name: variable-name
│ │ name: a
│ │ source: a
│ └─optional: optional-parameter
│ │ source: (b 2)
│ ├─name: variable-name
│ │ name: b
│ │ source: b
│ └─default: unparsed
│ expression: 2
│ context: :form
│ source: 2
├─declaration: declaration
│ │ kind: type
│ │ source: (type integer a)
│ ├─argument: atomic-type-specifier
│ │ │ source: integer
│ │ └─name: type-name
│ │ name: integer
│ │ source: integer
│ └─argument: variable-name
│ name: a
│ source: a
├─declaration: declaration
│ │ kind: type
│ │ source: (type integer b)
│ ├─argument: atomic-type-specifier
│ │ │ source: integer
│ │ └─name: type-name
│ │ name: integer
│ │ source: integer
│ └─argument: variable-name
│ name: b
│ source: b
├─form: unparsed
│ expression: (format t "~S" a)
│ context: :form
│ source: (format t "~S" a)
└─form: unparsed
expression: (list a b)
context: :form
source: (list a b)
#+end_example* STARTED Dictionary
#+BEGIN_SRC lisp :results none :exports none :session "doc"
#.(progn
#1=(ql:quickload '(:s-expression-syntax :alexandria :split-sequence))
'#1#)
(defun doc (symbol kind)
(let* ((lambda-list (sb-introspect:function-lambda-list symbol))
(string (documentation symbol kind))
(lines (split-sequence:split-sequence #\Newline string))
(trimmed (mapcar (alexandria:curry #'string-left-trim '(#\Space)) lines)))
(format nil "~(~A~) ~<~{~A~^ ~}~:@>~2%~{~A~^~%~}"
symbol (list lambda-list) trimmed)))
#+END_SRC** STARTED Syntax Description Protocol
#+BEGIN_SRC lisp :results value :exports results :session "doc"
(doc 's-expression-syntax:find-syntax 'function)
#+END_SRC#+RESULTS:
#+begin_example
find-syntax NAME &KEY IF-DOES-NOT-EXISTReturn the syntax description named NAME, if any.
IF-DOES-NOT-EXIST controls the behavior in case a syntax description
named NAME does not exist. The following values are allowed:#'ERROR
Signal an error if a syntax description named NAME does not exist.
OBJECT
Return OBJECT if a syntax description named NAME does not exist.
#+end_example#+BEGIN_SRC lisp :results value :exports results :session "doc"
(doc '(setf s-expression-syntax:find-syntax) 'function)
#+END_SRC#+RESULTS:
#+begin_example
(setf find-syntax) NEW-VALUE NAME &KEY IF-DOES-NOT-EXISTSet the syntax description associated with NAME to NEW-VALUE.
An existing association for NAME, if any, is replaced.
IF-DOES-NOT-EXISTS is accepted for parity with FIND-SYNTAX but
ignored.
#+end_example#+BEGIN_SRC lisp :results value :exports results :session "doc"
(doc 's-expression-syntax:ensure-syntax 'function)
#+END_SRC#+RESULTS:
#+begin_example
ensure-syntax NAME CLASS &REST INITARGSAssociate NAME with a syntax description based on CLASS and INITARGS.
Return the new or updated syntax description object associated with
NAME.If the database of syntax descriptions already contains a syntax
description for NAME, the existing syntax description object is
reinitialized with INITARGS.If the database of syntax descriptions does not contain a syntax
description for NAME, a new association is created by making an
instance of CLASS, initializing it with INITARGS and registering the
new object for NAME.
#+end_example** STARTED Parser Protocol
#+BEGIN_SRC lisp :results value :exports results :session "doc"
(doc 's-expression-syntax:classify 'function)
#+END_SRC#+RESULTS:
#+begin_example
classify CLIENT EXPRESSIONClassify EXPRESSION, possibly according to specialized behavior of CLIENT.
Return a syntax description object that roughly reflects the kind of
EXPRESSION. Note that a precise classification would have to take into
account aspects beyond the syntax, such as the environment, to, for
example, distinguish function and macro application or variable
references and symbol macro applications. It is always possible to
find an appropriate syntax description:+ If EXPRESSION is a special form, this function returns the syntax
description for the corresponding special operator.+ If EXPRESSION is an application of a standard macro, this function
returns the syntax description for that macro.+ If EXPRESSION a list not covered by the above cases, this function
returns the syntax description for a generic (that is, function or
macro) application. Note that this case also covers invalid
applications such as (1 2 3).+ If EXPRESSION is a symbol but not a keyword, this function returns a
syntax description for a variable reference.+ If EXPRESSION is any object that is not covered by the above cases,
this function returns a syntax description for a self-evaluating
object.
#+end_example#+BEGIN_SRC lisp :results value :exports results :session "doc"
(doc 's-expression-syntax:parse 'function)
#+END_SRC#+RESULTS:
#+begin_example
parse CLIENT SYNTAX EXPRESSIONParse EXPRESSION according to SYNTAX, possibly specialized to CLIENT.
SYNTAX is a designator for a syntax description:
+ If SYNTAX is `t', `classify' is applied to CLIENT and EXPRESSION to
obtain an appropriate syntax description object.+ If SYNTAX is any other symbol, `find-syntax' is called to obtain the
syntax description named by SYNTAX. An error is signaled if SYNTAX
does not name a syntax description.+ Otherwise SYNTAX must be a syntax description object.
EXPRESSION is either one of the kinds of expressions that make up
Common Lisp programs (such as forms, type specifiers and declarations)
or a particular non-standard representation of such expressions which
is specific to CLIENT. For example, a client may choose to represent
every sub-expression contained in an expression as a standard object
in order to store additional information. If CLIENT employs such a
non-standard representation, the protocol named by symbols exported
from the `s-expression-syntax.expression-grammar' package has to be
implemented by defining appropriate methods.If EXPRESSION does not conform to the syntax described by SYNTAX, an
error of type `invalid-syntax-error' is signaled.If EXPRESSION does conform to the syntax described by SYNTAX, a parse
result that associates the parts of SYNTAX with sub-expressions of
EXPRESSION is returned. The type and structure of the return value
depends on CLIENT as the parse result is constructed using the builder
protocol with CLIENT as the builder.
#+end_example# Local Variables:
# eval: (require 'ob-lisp)
# End: