Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/thblt/uberwald

A simple Lisp interpreter to teach myself about Lisps and Interpreters.
https://github.com/thblt/uberwald

c interpreter lisp toy

Last synced: about 2 months ago
JSON representation

A simple Lisp interpreter to teach myself about Lisps and Interpreters.

Lists

README

        

#+TITLE: Überwald

Überwald is a simple Lisp dialect and interpreter I'm writing to learn about Lisps, interpreters, and writing C code that doesn't explode.

* The interpreter

** Components

- =object.c= :: define the =ubw_obj= type. =ubw_obj= is a tagged union and stores every possible primitive type;
- =read.c= :: read an object from a sequence of characters;
- =eval.c= :: evaluate a lisp form, given an environment;
- =env.c= :: provide an environment, /ie/ recursive associative arrays for binding values to symbols;
- =store.c= :: manage in-memory storage of objects of various types:
- regular objects (=ubw_obj=) on a preallocated (and extensible) heap;
- string values;
- symbol names.

** Memory management

*** Garbage collection

Like most LISPs, Überwald is garbage collected. Marking of unreachable objects is performed in multiple ways:

1. *Garbage-collecting environments*

- symbols are bound in environments;
- functions hold a reference to the environment they were defined in (because of lexical scoping);
- environments have a reference counter;
- when a function is garbage-collected, it decreases the reference count of the environment it was defined in by one;
- when the reference count of a given environment reaches zero, the environment is marked for garbage collection.

2. *Garbage-collecting symbols with their environments*

- whenever an environment gets collected, all the values of symbols bound inside it are destroyed, recursively if needed.

3. *Garbage collecting overwritten values*

- whenever a symbol is =(set)=, its previous value is recursively marked as destroyed.

The garbage collection phase in itself consists of updating the list of available heap space. This is an array of pair (index, size).

*** AST allocation

When Überwald code is =(read)=, the AST is by default stored on a special space called ephemeral storage, which gets deleted each time a complete form has been evaluated. Some special forms, though, will instruct the reader to store emitted AST on the heap instead. Such special forms are =(define)= and =(set)=. This prevents unnecessary copies. Ephemeral storage is pre-allocated before reading start as a =ubw_vector= of =ubw_obj=.

** Symbols and keywords

Symbols and keywords are internally represented as integers, and a associative map is defined in store.h. When reading a symbol, the reader interns it by calling =ubw_intern= with its name, and receives the integer that uniquely identifies it. Internally, one or two lookup tables are kept, one to associate names with identifiers, the other

Keywords are implemented in exactly the same way, the only difference is in evaluation (keywords evaluate to themselves).

* The language

** General specifications

- [[https://en.wikipedia.org/wiki/Common_Lisp#The_function_namespace][LISP-1]] (a single namespace for functions and variables)
- Interpreted;
- Source code is UTF-8, unless implementation language doesn't support it/makes it hard, in which case ASCII.
- Hygienic or unhygienic macros;
- Syntactic bindings by default.
- CL-like reader-macros (=set-macro-character=)

** Language

*** Name bindings

Name bindings are performed within a hierarchical set of environments. An environment is a structure holding a reference to its parent and a list of names bound to Lisp values. Looking up a symbol means looking it up into the bottommost environment, then its parent, until a binding for the symbol has been found or a parentless environment has been reached.

*** Functions

A function is stored as:

#+BEGIN_EXAMPLE
[type [args] body]
#+END_EXAMPLE

where:

- type is either =:closure= or =:special=. The former is for a regular function, the latter defines a special form, which receives its arguments unevaluated.
- =args= is an argument vector. It's a vector of either:
- symbols, for named argument;
- the =:optional= keyword, once, immediately after the last required argument;
- the =:variadic= keyword, once, immediately before the last argument.

*** Atoms

- boolean ::
- integers :: the usual notations
- floats :: same here
- keywords :: identified by a colon, as in =:keyword=. Implemented as a reader macro of the standard library: =:some-keyword= expands to (lispy.keywords.find-or-create)

** Functions

*** Primitive functions

Notice these functions aren't special forms. Lispy has a concept of lazy functions.

- =(if [condition then & else])= :: evaluate =condition=, then =then= if condition is true, else either.
- =(progn [& body])= :: evaluate every sexp in =body= and return the value of the last one.
- =(quote [& cdr])= :: evaluate to =cdr=.

*** Interpreter

- =(read [s])= :: Read =s= and return a S-expression.
- =(eval [& cdr]) :: Evaluates =cdr= as a Lispy AST.

**** Inspecting environments

- (bound? [symbol]) :: returns =t= if =symbol= is bound in the current environment (defined as the bottommost environment and its parents, up to the root.)

**** Inspecting values

No surprise there:

- =(integer? [x])=
- =(float? [x])=
- =(bool? [x])=
- (=number [x])=
- =(string? [x])=
- =(list? [x])=
- =(vector? [x])=
- =(null? [x])=

**** Manipulating symbols

- =(set [symbol value])= :: bind =value= to =symbol= in the deepest environment where symbol is defined, or the root environment.

*** File I/O

- =(fopen path mode)= :: Opens =desc= with =mode= and returns a file descriptor. The exact type of descriptors is system-dependant.
- =(fclose desc)= :: Closes file descriptor DESC.
- =(fread)= ::
- =(fwrite desc bytes)= ::
- =(fseek desc pos)= ::
- =stdin=, =stdout=, =stderr= :: File descriptors for standard input, output and error output.