Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/andrewbuss/crisp

A C interpreter for a purely functional language, bolted to a very unsafe FFI
https://github.com/andrewbuss/crisp

Last synced: 3 months ago
JSON representation

A C interpreter for a purely functional language, bolted to a very unsafe FFI

Lists

README

        

crisp: a purely functional language interpreter bolted to a very unsafe FFI

Repo contents:

crisp.h : header file with core declarations
crisp.c : the core, including cell allocation and evaluation
parse.c : routines for converting between cells and strings
ffi.c : routines supporting the foreign function interface
interpreter.c : REPL

modules/std : a module containing important functions which are
| not required to implement the minimal interpreter
+ std.c : wrappers around zip, concat, apply, assoc which conform
| to the native function interface, plus hash and ispair,
| plus asc, sum, product, and modulus
+ std.crisp : declare the above native functions in the global env
This also contains many useful lambda functions:
nil, not, and, nand, or, nor, neq, dec, inc,
void, makerec, defrec, test, testwith, list-equal,
c{a,d}{a,d}r, reduce, map, all, any, none, filter,
reverse-concat, reverse, reversed-range, range,
repeat, zip, len

tests.crisp : an assortment of tests and additional syntax examples
libc_demo.crisp : a few examples using the FFI with libc
bintree.crisp : an implementation of a basic persistent binary tree map

fuzz_parser.c : a small program to exercise the parser and verify
round-trip stability. Only useful for fuzzing

Building:

git submodule update --init
ln -sr libatomic_ops bdwgc
cmake .
make

# optionally, install with all modules; read Modules below
sudo make install

Some syntax examples and testing code are in the tests file. Run tests:

crisp < tests.crisp

Run as a REPL:

crisp

Modules:

crisp.c contains the minimal evaluation logic, but many common functions
are implemented in the std module. A module is a crisp script, optionally
combined with any number of compiled object files, linked into a shared
library. When a module is loaded with the `import` function, the script is
executed in an environment with a few extra mappings:

this : an FFI_LIBRARY, the module currently being loaded
native-function : convert an FFI_SYMBOL into a NATIVE_FUNCTION

In practice, a module will contain `def`'s to define new functions in the
global environment. For example:

def hash native-function this.hash

will expose `hash` for use after the module is imported. Private functions
need not be declared in this way.

Modules are loaded at runtime using the `import` function. Modules may
import other modules similarly, although crisp does not attempt to detect
or resolve circular dependencies.

Modules are installed by default to /usr/lib/ with filenames of the form
`libstd.crisp.so`. In addition to the default library paths, `import`
tries to find a library named `std` in:

$CRISP_MODULE_PATH/modules/libstd.crisp.so
$CRISP_MODULE_PATH/std/libstd.crisp.so

where CRISP_MODULE_PATH is an environment variable that may be provided at
runtime, and defaults to `./modules/` if not provided. If imports are
failing, check that modules are reachable from the current directory.

Syntax notes:

Whitespace is generally ignored. An exception, in order to work as a REPL,
is that lines are evaluated after each newline, unless there are unclosed
parentheses. Example:

sum 1 2 3 ; result: 6
4 5 ; result: 4 5

(sum 1 2 3
4 5) ; result: 15

The if function takes two required arguments: a predicate and a clause to
be evaluated if the predicate is non-nil. If only two arguments are
supplied and the predicate is nil, nil is returned. If the predicate is
nil, everything after the second argument is evaluated. This allows an
easier syntax for chaining if's. The lambda function works similarly;
evaluating everything after the first argument as a referent. Example:

def fizzbuzz (lambda x
with fizz (equal (modulus x 3) 0)
with buzz (equal (modulus x 5) 0)
if (and fizz buzz)
FizzBuzz
if fizz
Fizz
if buzz
Buzz
x)

map fizzbuzz (range 20)
; output:
; FizzBuzz 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz
11 Fizz 13 14 FizzBuzz 16 17 Fizz 19

Evaluation notes:

A list is evaluated by examining its head. If the head is callable, and
isn't a special function (examples include def, quote, lambda), which
holds evaluation for its arguments, the arguments are evaluated and the
function is applied. If the head is not callable, each element of the list
is independently evaluated and the resulting list is returned.

The behavior of native functions is unconstrained, although the interface
is defined: a native function accepts a cell which may be either NIL or a
PAIR, and returns a cell which may be of any type.

Little is specified about the behavior of FFI functions - the interface
is not stable enough to describe yet. Roughly, an FFI_FN can be
applied to integer or symbol arguments. Symbol arguments are passed as
pointers to the underlying strings, so (libc.puts foo) prints "foo\n".
The return value of an FFI_FN is always assumed to be an integer.
This means that (libc.malloc 4096) will return an integer pointer which
can be passed to libc.gets or libc.free, for example.

Finally, lambdas are evaluated by evaluating the body of a lambda in a new
environment where the names in the lambda args list are mapped to the
corresponding values to which the lambda is being applied. This new
environment is stacked atop the environment in which the lambda was
created, so variables may override other local variables, but global
definitions are still accessible.

For example, to evaluate

with z 4 (def f lambda (x y) sum y (product z x))

f 3 7

the interpreter first pairs x with 3 and y with 7 to yield a new env
concatenated with the existing environment:

(x . 3) (y . 7) (z . 4) ... (f . (lambda (x y) sum y (product z x)))

Then it evaluates the body of the lambda in this environment, obtaining:

sum y (product z x) ->
y -> 7
product z x ->
z -> 4
x -> 3
-> apply product 4 3 -> 12
-> apply sum 7 12 -> 19

Implementation notes:

At one point crisp was implemented in a purely functional style
(as far as that's possible in C). Later work, particularly work on
optimizing tail calls, has added enough complex control flow that this
is no longer really true, but the goal of moving as much functionality
from the interpreter into the language remains.

There is no special "environment" struct. The environment is a normal list
containing dotted-pair mappings between variable names and their values.
Evaluation contexts are stacked by concatenating a list of name/value pairs
to the previous environment.

The intention is to implement only the minimum functionality in C required
to implement higher-level functionality in the crisp language, such as
recursion, map, filter, defrec, and, via the FFI, libc, and glib, I/O of
some sort.

At present, crisp uses the Boehm-Demers-Weiser (http://www.hboehm.info/gc/)
garbage collector rather than including one of its own. A reference counted
system is a goal of later work.

The FFI functionality is exciting because it allows reuse of libraries from
many other projects with very little extra C code in crisp. Using libc from
crisp allows all sorts of low-level I/O.

Symbols are interned when they are parsed.

Fuzzing:

Automated fuzzing is a fun way to catch bugs. CMake targets are included
building crisp without FFI or GC. FFI yields a lot of false positives when
a fuzzer discovers it can pass a garbage pointer to libc.free for example.
Allowing the fuzzer to call arbitrary external code is also simply a
worrying idea. GC tends to clog up the fuzzer with extra logic and control
flow, and isn't needed for short executions.

Build the slimmer crisp:

cd fuzzing
cmake .. -DCMAKE_C_COMPILER=afl-gcc
make crisp_fuzz

Prepare directories for afl:

mkdir inputs afl_state
echo "(lambda (x y) y x) 4 2" > inputs/simple

Run afl with a 200 MB memory limit:

afl-fuzz -m 200 -x afl_dict.txt -i inputs -o afl_state ./crisp_fuzz