Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/joinr/clclojure
An experimental port of clojure to common lisp. Also some native common lisp implementations of clojure libraries, like seq, persistent vectors, etc.
https://github.com/joinr/clclojure
Last synced: 2 months ago
JSON representation
An experimental port of clojure to common lisp. Also some native common lisp implementations of clojure libraries, like seq, persistent vectors, etc.
- Host: GitHub
- URL: https://github.com/joinr/clclojure
- Owner: joinr
- License: epl-1.0
- Created: 2013-08-02T05:56:15.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-12-11T12:58:40.000Z (about 5 years ago)
- Last Synced: 2024-08-02T01:25:55.248Z (6 months ago)
- Language: Common Lisp
- Homepage:
- Size: 835 KB
- Stars: 224
- Watchers: 21
- Forks: 8
- Open Issues: 4
-
Metadata Files:
- Readme: README.org
- License: LICENSE
Awesome Lists containing this project
README
# -*- mode: org; -*-
#+HTML_HEAD:
#+HTML_HEAD:#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:# This is a template for pushing out org files that are compatible
# with both HTML and latex export. Specifically, these files
# Support code highlighting - for clojure code - and typeset
# the code to look different from the main sections. The
# design is meant for providing a quick template to inject
# clojure source into org docs, and provide an interactive,
# REPL-friendly presentation.
#+TITLE: CLClojure: An Experiment Port of Clojure to Common Lisp
#+AUTHOR: joinr
#+DATE: 18 Aug 2018* Background
Porting Clojure seems to be the thing to do these days. After the clojurescript
compiler came out, people had a really good blueprint for hosting clojure in other
languages. The fact that the clojurescript compiler is defined, largely, in terms
of protocols, makes it a bit closer to the "clojure in clojure" goal. As a result,
you should be able to bootstrap the language (as they did with JavaScript) with
a minimum of primitives. Most of the language is defined in Clojure, so voila.We currently have Clojure ports targetting Scheme, C, C-via-Scheme, Python, etc.
I've found Common Lisp to be a particularly slick environment with some cool tools
and a fairly active community. I think it would be interesting to port the
ideas from Clojure to Common Lisp, ultimately implementing a compliant Clojure in
Common Lisp.* Layout
The current WIP is broken into several Common Lisp files (only tested on SBCL).
You can look at the dependencies in clclojure.asd.
The general layout and organization follows (subject to change):
** common-utils
Contains a plethora of utility functions and convenient clojure
analogues (or direct ports) useful for porting clojure source. Primary features
include multi-arity lambdas via =lambda*= and multi-arity functions via =defun*=.
Also provides a =with-recur= form that allows for a similar idiom for Clojure's
=loop/recur=. This is typically pulled in by all other packages for general
utility. Some limited sequence functionality overlaps with sequences.lisp, and may
be replaced in the future.
** sequences
Provides an implementation of clojure's lazy sequence abstraction. Implemented
in CLOS, and porting the seq protocols. Provides a subset of the clojure core
sequence library for CL used in bootstrapping and metaprogramming. The intent
is not to have a full port, but a minimally useful subset. Integrates the seq
abstraction with CLOS sequence types.
** pvector
A port of the java implementation of clojure's persistent vector
based on Hash Array Mapped Tries (HAMTs). Serves as a stand-alone
datastructure with local functions. Also defines =subvec= which
is the O(1) slice of a persistent vector.
** cowmap
A simple persistent map implemented by copy-on-write operations over
a standard hash table. This is a minimal implementation to support
bootstrapping, and not intended for performance or production use.
** pmap (wip)
A port of the java implementation of clojure's persistent hashmap.
Incomplete.
** protocols
Defines forms for =defprotocol= and =deftype= in common lisp.
Protocols are implemented as bundles of generic functions and args,
stored in a CLOS struct. Protocols are registered, and define
=satisfies?= predicates for the protocol to ensure valid
implementations of =reify= =deftype= =extend-type= and
=extend-protocol=. Protocol definitions expect vector args (rewrite
pending to extend these to sequences to allow simple cl-based protocol
definitions).** reader
Provides a consolidated place for reader macro support. Primary
purpose is to enable vectors, maps, and sets for bootstrapping.
Original experiments also supported characters. Intentionally
hijacks the read-table. Needs to be ported to named-readtables,
or excluded entirely once bootstrapping of clojure.tools.reader is done.** eval
Due to the custom evaluation rules for clojure's data literals, simple
reader macros are not enough to get literal support. This package
defines a generic function, =custom-eval=, and an API for
installing/uninstalling custom evaluation support in SBCL. The
implementation swaps the function definition for
=SB-IMPL::simple-eval-in-lexenv= and adds a final condition to the
non-list eval rules: if =custom-eval= is supported, then the generic
function is invoked. This allows user-defined extension of the
evaluation semantics in a minimal way, and retains low-level support
(e.g. from the compiler).** literals
Consolidated place to extend custom evaluation rules for vectors, maps, and other literals.
** keywordfunc
Initial hack at implementing keywords-as-functions support in CL.
This currently works only in lexical forms, and is not a global modification.
Exposes the macro =with-key-fn= which will scan the body for keywords in function
position, and build a table of keyword functions and fbind. The more commonly used
feature is the =keyaccess= type, which is used in the ~lexical~ package.
** lexical
Implements a lexical environment ala =let*= called =unified-let*= that
will unify lisp-2 bindings into a lisp-1 lexical environment for the body.
This is the foundational method for implementing clojure-style =let=.
** bootstrap
Implements a core set of clojure special forms and brings along the protocol definitions
from ~cljs.core~. Eventually, the entire ~cljs.core~ library will be pulled in here
and evaluated. This should hopefully provide a sufficiently robust environment
to then bootstrap the rest of clojure via tools.reader and tools.analyzer, leveraging
extant CL implementations from the earlier bootstrapping process where it makes sense.
** symbols (wip)
Nascent thought piece on implement clojure-specific vars and namespaces. Clojure's
qualified symbols and keywords are a tad different and require some thought towards integration
with CL.* Goals
** Bridge the gap between the cool stuff in clojure and Common Lisp.** Implement really useful bits of clojure in portable common lisp, and provide them as stand-alone libraries.
This includes lazy sequences, the generic sequence abstraction,
and the fundamental persistent hash-array-mapped-trie data structures in clojure:
- persistent vectors
- persistent maps
- persistent sets.** Extend the generic sequence abstraction and other idioms to Common Lisp's built-in mutable structures.
Common Lisp already has a sequence library, but I think Clojure's is more general and can be trivially extended to new types.
Common Lisp's irrational locking down of SEQUENCE is a hurdle here. The HYPERSPEC will never be updated in my lifetime :)
So generic functions are the current way to bridge this stuff.- Protocols are really nice, as are Clojure's arbitrary dispatch multimethods.
- Data literals are also highly useful. I think Clojure nailed the choice of literals, so providing reader macros for these guys would be very nice.** Possibly wrap a Common Lisp STM implementation, or cheat and use something like lparallel or just delegate to clojure.core.async (like clojurescript).
** Bootstrap a fully functional Clojure onto Common Lisp.
** Learn.
So far, this project continues to be a great way to learn about both CL and Clojure, to include
implementation hurdles, support for PLT stuff, long-standing warts from decisions made ~25 years or more ago,
and work-arounds. Nothing is insurmountable so far, which is pretty cool.* Status
** Began work porting Clojure's persistent data structures from Java about years ago, while
simultaneously learning Common Lisp (and by necessity Java :/ ).- Got a working persistent vector implementation, with compatible clojure literals about a year ago.
- Started working on persistent hash maps around November 2012.- Built a temporary implementation of clojure-style protocols in Common Lisp ~ Dec 2012.
- Pulled the bits into an actual ASDF system and pushed everything to the Github August 2013.- Implemented a baby clojure evaluator that __should__ bridge the lisp1/lisp2 gap between clojure and the Common Lisp host. Unsure if this is going to work out in the long term, but eh.
- It's real trivial at the moment.- Working on library code in my spare time, still need to port the persistent map.
** Revisited 2017
- implemented some rudimentary stuff
- vector reader macros not fleshed out; work fine at the REPL,
but failed to return a vector (instead returning a form to create the vector).
Trivial but important oversight.
- Still hijacking the global read-table. Pulled in named-readtables to help,
but haven't integrated.
- Working on variadic functions, explored funcallable classes, refined protocols, deftype.
- cleaned up the build process (more work to be done here)** Revisiting 2018
- got reader macros matured (vector literals produced properly now),
- got protocol definitions and implementations respecting vectors,
albeit in a hacky way. We still allow lists for args....
- working on deftype, then getting the bootstrap from CLJS (core protocols and
functions) to compile.- Working on leveraging named-readtables for better delineation of clojure source,
to include file-level (somewhat racket like).
- Also intend to leverage named-readtable
to get case-senstive reader (via :modern), and to enable CL interop trivially
with a macro allows CL reader inside body (vice versa, for inline clojure if
it makes sense).
- refining the approach after reading a lot, looking at some other sources of
inspiration (including early proto-clojure->java compiler from Hickey in CL)
- Basic def, fn works. Protocols work. Metadata works mostly. Deftype
- got let, albeit without destructuring. Still useful for bootstrapping!
- Initial implementation of reify working, wrapped deftype in a version
compatible with cl:deftype** Revisiting 2019
*** Reconciled some problems with data literals, clojure's evaluation model, and CL's model.
The issue is tricky and - even after using Clojure for 8+ years -
something I completely missed.So folks laud CL having awesome reader macros, and talking
about how they allow you to implement any language etc. Banking on
these comments, I went about doing that (Clojure). As it turns
out....IF you want to define a language with new data literals , and
those literals have their OWN evaluation semantics, reader macros
cannot help you (they help to a point).- I think racket probably better supports this based on research, but
I have no proof.When we read a vector in clojure (not a syntax-quoted one), we
actually get a literal vector data structure (not a a list of `(vector
arg1 arg2 arg3), but an actual vector of [(eval arg1) (eval arg2)
...]).I chased my tail on this twice now, and finally realized that in
clojure, (eval [x y]) -> [(eval 'x) (eval 'y)].So when you read a vector, and eval it, you actually have to apply
those evaluation semantics to build up the resulting vector. In CL,
eval just passes the vector through.This poses a problem in CL, since yes, you can trivially extend the
reader to handle vectors, but you CANNOT get the evaluation semantics
to line up, since CL provides no mechanism to do so (unlike say reader
macros). You can of course define your whole infrastructure,
including your own eval, etc., but I'm interested in minimally
bootstrapping a functional enough clojure to get the language tooling
ported, so hacking on running lisp implementation of `eval` is
attractive.I tried having the reader macro return =`(vector ,@args)= instead of
building the literal vector directly ala =(eval `(apply #'vector
,@args))= , which worked (mostly!), except that for things like macros,
instead of dealing with a vector for say bindings, you'd now be
dealing with the syntax-quoted list of =(vector arg1 arg2 ....)=.I though about hacking all the macros to recognize this, and coerce
the vector sexpr into an actual vector, but it's like squeezing air in
a balloon.Rather, simply extending eval =sb-impl:simple-eval-in-lexenv= to
understand that there are other data literals (as expressed by say a
generic function) that have their own evaluation semantics as forms,
allows one to get up to par with using clojure literals e.g. in
macros.It should be minor surgery (and actually portable), but since
the goal is to bootstrap something that can read the source for
/clojure.tool.reader/ and /clojure.tools.analyzer/, etc. and emit a pure
common-lisp implementation, I "think" it'd be a one-time deal.- Currently implemented in /eval.lisp/, which provides /clclojure.eval/ and an
API for allowing custom evaluation semantics defined by the
generic function =custom-eval=.- Also implemented a companion package, /clclojure.literals/, which
enables custom evaluation, and extends the semantics for the
boostrapped persistent structures.*** Quasiquoting Data Literals
So we have evaluation semantics consistent with Clojure for
vectors, maps (and trivially sets when they arrive).The backtick implementation in sbcl's backq.lisp is currently
befuddling due to lack of familiarity. Based on the existing
implementation in readers.lisp, I get =unquote= working fine with
user-defined literals like vectors and maps, but unquote-splicing is
throwing. Trying to hack around it from the reader level, e.g. coerce
to a normal quasiquoted sexpr then eval that, rather than hacking
backq.lisp as I did clclojure.eval. Lack of familiarity is stunting
progress a bit.I had to dig into the innards of their eval and backquote
implementations to jank some stuff together, but it works. It's life
Jim, but not as we know it.... I literally got done hacking the
quasiquote stuff in reader.lisp, clclojure.reader:quasify to make
things work, and had no idea how I got it to work since my previous
intuitions had failed. SBCL uses a "comma" struct to indicate splicing
operands inside a quasiquote, and there's some weird interplay with
macroexpand. So I kludged stuff together through trial and error, to
build an arglist to shove at my vector and map constructor.
- [[https://github.com/joinr/clclojure/blob/master/reader.lisp#L197][quasify]]- [[https://github.com/joinr/clclojure/blob/master/reader.lisp#L219][literal reader constructor]]
The other hack is wiring in a generic method into sbcl's default eval
implementation,- [[https://github.com/joinr/clclojure/blob/master/eval.lisp#L166][custom-eval]]
This admits defining custom evaluation rules for new data literals,
like vectors and maps.- [[https://github.com/joinr/clclojure/blob/master/literals.lisp#L16][data literals]]
Surprisingly, SBCL has some well documented/written source. I learned
a bit plumbing through it, although their compiler stuff (and IR-1
intermediate language) is still magic to me. Thankfully, it appears to
hacking the simple-eval-in-lexenv function is enough to wire into the
compiler (I'm guessing the compiler form leverages it downstream, but
I'm not sophisticated enough yet to see how it all weaves together).Once the CL native reader and analyzer are in place, then the normal
eval semantics would take hold (unless there's a desire to maintain
the bootstrapping methodology and allow clojure literals like
persistent vectors, maps, and sets to have eval semantics for interop
or something, I dunno).*** Refactoring
There are a lot of "learning" debris throughout the code base
that are not on the critical path for bootstrapping (or no longer
are). These need to be pruned and moved elsewhere so that a
consistent, clear path of required code is attentuated.*** Sequences and Hacks
So I decided (for ease of metaprogramming my bootstrap), to clean up
the lame sequence implementation I had from years ago and write a
proper one. This went very well.I even got up to the point where I'd implemented idiomatic lazy
sequences, integrated with CLOS, and started getting reduce working.
Then I ran into a snag:I wanted to use CL's built-in reduce for its sequence types, since it
was already efficient. Clojure has the notion of a simple protocol,
InternalReduce, which basically says "hey, reduce yourself!" or else
it falls back to (likely slower) seq-based first/rest reduction. So
this is desirable (plus nice interop points).I got everything working, but one thing was wrong: Clojure implements
a short-circuiting variant of reduce, via invoking (reduced
the-result). This creates a little wrapped type that can be quickly
checked during the reduction loop to see if it's a reduced item, which
signals to stop reducing. The reduced item is then unpacked (via
deref) and returned as the result.Problem: the CL folks didn't think of this decades ago. So how to
bootstrap this functionality on extant concrete?+I almost gave up, then remembered vaunted warnings from Java land+
+about idiots using Exceptions for control flow. This built on the+
+idea that I could just throw an error if I detected a reduced value+,
+and handle the reduction gracefully outside of the built-in cl:reduce.++My ugly hack (although using the elegant tools of signals and+
+handler-case) emerged:+We just leverage the excellent low-level construct =block= to define
an escapable context we can =return-from=. From inside this block,
we wrap the call to reduce with a lambda that - upon detecting
a reduced item - invokes =return-from= with the dereferenced value of
the reduced item.As another fun time, I also ran into problems with CLOS and multiple
dispatch that were a little unsavory. Had to learn about
call-next-method. Realized that CLOS doesn't have a method preference
hierarchy by default (clojure does), and that these kinds of problems
create interesting lisp OOP solutions...Also managed to crash SBCL due
to it several times (I was trying to leverage SBCL's extensible
sequences, where unlike other lisps, you can inherit from the sequence
base class, which I wanted my lazyseq classes to do). This ended up
failing since going the generic method route, I wrote a specializer on
'sequence, which overode the other specializer on 'lazyseq, since
'lazyseq inherited from 'sequence. Despite getting some of it worked
out, I still managed to crash SBCL several times (stochastic...), and
reverted to NOT having 'lazyseq extend 'sequence for now.*** Loop / Recur (currently with-recur)
Basis of loop/recur. Combined with lambda* and defun*, this will
allow clojure style recur forms in function bodies (with a trivial
extension to loop / recur). I'm not checking for tail position
yet....probably need to do that to ensure correctness. Might have to
code walk.This took many tries (dissecting dolist, some simple loop forms), as
well as trying and failing to implement a (more elegant) version with
macrolet and another that used a simple labels form for recur (labels
blew the stack.... think tco only happens there if compiled).CL has some nice lower level flow control primitives like block and
tagbody.- "BTW, does your code work on non-tail `recur`s? (Is there even such a thing? idk.) Example:"
#+BEGIN_SRC lisp
(with-recur (x 2)
(if (< x 4)
(progn
(recur (1+ x))
(format t "back from recur~%"))
x))
#+END_SRCI think, based on the semantics of "go", it will, since it's
effectively a goto that jumps to the tag in the tagbody.However, I'm currently working on tailcall detection to enforce
clojure's semantics.The new version is now like clojure's
#+BEGIN_SRC clojure
(loop [x 2]
(if (< x 10)
(recur (inc x))
x))
#+END_SRCwith a sequence of bindings denoting
[arg1 init, arg2, init] ....#+BEGIN_SRC lisp
(with-recur (x 2)
(if (< x 10)
(recur (1+ x))
x))
#+END_SRCStarted off optimizing to see if I could avoid emitting with-recur for
functions that don't invoke it, so code-walk and look for recur
forms.. Then realized I needed to detect tail calls...so started
going down that rabbit hole..Curious to see if there's a simple way to walk the code instead of
denoting the grammar of tail calls. Like something empirical to
mechanically demonstrate the evaluation is independent of further
calls. Say, build a DAG representing the call graph, demonstrating
that recur only ever occurs at leaves.My cro-mag approach it to just define special cases for each form and
check that way. Tricky stuff...**** Tail Call Detection / Enforcement
Defined some rules (currently for CL, but clclojure should work on top of these
with little to no changes) that follow the tail call semantics for special forms.In =common-utils=, these functions inform =with-recur= about the viability of
the code at compiletime and throw errors if invalid tail calls are detected:
- =detect-recur= Simple naive code walker that walks an expression looking for ~(recur ...)~
- =tail-children= Defines the (currently static) semantics of different special forms
detailing whether they have children and what position (:tail or :non-tail). Returns
a list of pairs of (:tail|:non-tail child-expr)
- =categorize-tails= Walks an expression, accumulating a list of call-site structs,
which encode what kind of call was made (:recur, :illegal-recur) and what the expr
was for use in reporting errors.
- =summary-tails= Reduces the output of =categorize-tails= into a pair of (t|nil illegals),
where ~illegals~ is a list of illegal call-sits, and ~t|nil~ reports whether "any"
instances of ~recur~ calls were detected.=with-recur= then leverages =summary-tails= to determine if there are errors in the
input at compile time, and signals an illegal tail call if so. If there aren't
errors, but there are not instances of =recur= calls in the expression, =with-recur=
just optimizes down to a semantically-equivalent =let*=, so it can be invoked regularly
with no added codegen (e.g. in =lambda*= or other recur targets) if recur never shows up.Otherwise, a form with a local function =recur= is emitted and things perform as expected,
with proper tail calls enforced.*** Symbol Interning and Mucking With MetaProgramming
Discovered, through use of =case= to match on symbols for
dissecting lists during code walking, that my functions wouldn't work
outside of the package they were originally defined it. Subsequently,
macros depending on said functions, (=with-recur= being the foremost example)
failed to behave as expected. It turns out, this has to do with how
CL interns apparently unqualified symbols in a package.Say I implement a function to examine the first element of a
list and dispatch depending on the symbol detected there:
#+BEGIN_SRC lisp
(defun detect-foo-bar (expr)
(case (first expr)
(foo :foo)
(bar :bar)
(t nil)))
#+END_SRCThis function will work great so long as I'm inside of the package where I defined it.
The =case= test for each clause uses =common-lisp:eql=, so the notion of symbol
equality applies to the args. The side-effect introduced here is that
the symbols 'foo, 'bar, are actually....'current-package:foo and 'current-package:bar
rather than being free-floating, unqualified symbols. If we're at the repl, they
even print as such, without any indication that they're qualified. This is standard behavior.The tricky part is that if we then export =detect-foo-bar= and try to use it from ~other-package~,
#+BEGIN_SRC lisp
CL-USER> (original-package:detect-foo-bar '(foo 2 3 4))
NIL
#+END_SRCwe get no joy, despite the input expression "obviously" having the symbol 'foo in it. To compound matters,
the original function we dutifully tested at the REPL works fine in ~original-package~.The reason is that 'foo in ~original-package~ where we defined the function, and where we
macroexpanded =case= to implement it, is actually '=original-package::foo= , and the symbol
we're trying to match against coming from the list read and evaluated in ~CL-USER~ is actually
'=CL-USER::FOO=:#+BEGIN_SRC lisp
CL-USER> (symbol-package 'foo)
#
#+END_SRCUnder these conditions, we realize the symbols are not in the same package, thus not =eql=:
#+BEGIN_SRC lisp
CL-USER> (eql 'foo 'original-package:foo)
NIL
#+END_SRCClojure doesn't have this problem due to its semantics for unqualified symbols across
namespaces:#+BEGIN_SRC clojure
Clojure 1.10.1
user=> (def foo 'foo)
#'user/foo
user=> (ns blah)
nil
blah=> (def foo 'foo)
#'blah/foo
blah=> (= foo user/foo)
true
#+END_SRCSo CL's behavior was a bit of a surprise, particularly in how it
seemingly janks up defining utility functions that work on symbols and
dissect lists, e.g. for metaprogramming.Implementing (and then testing from another package) =with-recur= and its auxillary functions
was the first notable time this popped up. The fact that it can fail silently (in the case
of =case= ) engenders additional caution (absent a better solution that I'm unaware of).So, my current solution is to define a new equality test based on unqualified symbols, and
their =symbol-name= equality. In ~common-utils~:- =symbol?= determines if the input is strictly a symbol, and not keyword
- =symbol== determines if both inputs are =symbol?=, whether the =symbol-name= s are =string==
- =seql= wraps the typical =common-lisp:eql= with a custom path for inputs that are =symbol?=
- =custom-case= Allows defining =common-lisp:case= macros with a custom user-defined test.
- =scase= Implements =case= using =seql= vis =custom-case= and is a drop-in replacement for
existing uses of =case= that relied unintentionally on interned symbols for case clauses.With these in place, and extant code for =with-recur= and friends ported to use =scase=,
we now have a working implementation of clojure-style loop/recur.*** Variadic Protocols
We now appear to have functioning variadic protocols, as well as
marker protocols (protocols with no methods).These work via a simple dispatch mechanism, via common-utils:lambda*.
The generic function is somewhat opaque in the args, only taking (obj
&rest args) but the dispatch mechansim conforms to the protocol
definition, allowing multiple dispatches per clojure's definition.Behind the scenes, we have a simple lambda* that's created for the
implementation during defmethod time, which actually carries out the
implementation.It is invoked inside the body of the defmethod specializer, and
simplied applid to the obj and args, allowing the lambda* dispatch
mechanism to take over.This is good for bootstrapping, but the lack of integration with slime
and e.g. arg detection (ala CIDER) is not desirable long-term.For dev purposes, it's meaningless though since we're bootstrapping
pretty simply. I don't plan to hack a custom slime mode to enable
documenting multiple- arity functions just for the boostrap.This opens most of the remaining doors for our basic boostrap. We can
now leverage most of cljs.core to get the basic libraries lifted in
with little effort (given the current state of reader, seqs, etc.).It should also be possible to work backwards from the protocol
implementations to get us into an even simpler bootstrapping environment.
For the time being, we'll continue wrapping existing CL functionality
though.*** Generic Symbol Equality (#'seql)
One nagging feature that probably needs to be addressed is the
necessity of using common-utils:seql for symbol equality testing due
to CL's somewhat baroque equality and the impediment it causes toward
simple metaprogramming.This is prolific: even hash tables aren't trivially extended with a
custom equality test out of the box, although that's not a vast
hurdle.
*** Lexical Literals
So, everything worked fine when evaluating literals from the null
lexical environment based on the custom-eval semantics in
clclojure.eval:custom-eval and related wiring into sbcl's =eval=.This did not carry over into expressions with lexically bound
forms that were also data literals. Results kept returning wierd
quoted forms for values rather than the actual evaluated forms.It ended up being a long and tricky process to tease out the
underlying cause, and to diagnose a quickish fix.As it turns out, CL is pretty canny about how it handles lexical
scope. To satisfy the compiler, we'd like to pass off friendly,
common forms that SBCL expects. Unfortunately, if literals show
up in various places where lexical scope is defined (such as the
right-hand side bindings for let*, which is emitted by
=clclojure.base:let=), we end up with problems.We also end up with problems if the literal is the result (or any
piece of) the body of expressions that extend lexical scope.We could theoretically fix all this at a somewhat high level, with
a code-walker that recognizes our clj forms and does the requisite
transforms at macroexpansion time. This is cool, but not
necessarily what we want for helpful user feedback,
e.g. macroexpanding a form expecting to see clojure literals.Thankfully, we have access to the evaluator, and can continue to
slightly tweak it to suit our needs. Namely, where lexical forms are
defined, e.g. =common-lisp:let*= and =common-lisp:let=, we can add in
a mechanical transform to the expression (not unlike manual macro)
that walks the expression and expands our literals into
literal-constructor forms that are readily recognized by sbcl's
evaluation machinery (and the compiler).So, the technique is to define another generic function that user's
can express lexical expressions for, =clclojure.eval:let-expr=. The
semantics would be:#+BEGIN_SRC clojure
(def y 3)(let [x [2 y]] [x]) => [[2 3]]
#+END_SRCIn CL
#+BEGIN_SRC lisp
(let* ((x [2 y]))
[x])
#+END_SRC
is compiled to#+BEGIN_SRC lisp
(let* ((x (persistent-vector 2 y)))
(persistent-vector x))
#+END_SRCImportantly, we don't want to muck with the literals at
macroexpansion time (e.g. for debugging an visualizing the code),
but we do want to intercept them prior to going to evaluation.
This sounds very much like a possible compiler macro (or other
eval-when context), and may be trivially refactored into one.Absent that, we define a function
=clclojure.eval:custom-eval-bindings=, which acts as an intercessor,
and takes an expression to walk, and a lexical environment to pass
along in case we ever get to use it (this conforms to the function
signature for sbcl's =sb-impl::%simple-eval=.We detect various types of expressions that could define literals
in the lexical scope, and walk the expression based on the form's
semantics, recursively, replacing any literals with the
appropriate constructor- based canonically evaluable bindings -
iff the value supports =clclojure.eval:let-expr?=.This gets us 90% of the way to happy, but there still exists a
problem with (fn ) forms, since they actually build up lambdas.Funny enough, CL doesn't like to just emit (lambda (x) ...) from a
defmacro expression.Neigh, if we quasiquote it, we get #'(lambda (x) ...) which is
effectively (function (lambda (x) ...))Since we're emitting these forms, they end up being opaque when we
look at custom-eval-bindings. So, we have to process them at
macroexpansion time , e.g. in the (fn ...) macro. We do so by
macroexpanding the body of the lambda, and similarly walking the
form with =custom-eval-bindings=. Then, we emit the ultimately
sharp-quoted lambda with the new body. This works well, with the
small tradeoff that fully macroexpanding a (fn ..) macro will
expand into the more verbose form without any literals.*** Multimethods (WIP)
We still need to implement multimethods but that'll be a simple
variation on the dispatch stuff. These show up in both the
reader and analyzer, so they're next.
*** Loose protocol implementation
Right now, defprotocol expects implementations for all function arities.
I think Clojure allows partial implementation, or generates stubs via
abstract methods. We need to implement this relaxation, e.g. for
IFn, so we can get boostrapping advanced.
*** Revisiting Lexical Literals
So the original implementation for this fixed one thing only to break another.
In an attempt to comport with the SBCL compiler's pre-baked notion (read ignorance)
of the evaluation rules for non-list things, I went down the route of defining
a custom code walker to intercept calls to =let= and friends. This interception
would walk the code, and invoke generic functions to coerce custom literals (like
vectors) to eval-friendly forms (read, a constructor and args) so I could avoid hacking the compiler.This worked great and fixed a major problem, where lexical literals were effectively just uneval'd
and quoted directly. Now, at eval time, we get custom literals working as in Clojure.**** Downside
I broke the expectation that we would be using "actual" persistent vectors as syntactic
objects in macros (namely =defprotocol=, =fn=, =defn=). This primarily manifested as
an ugly regression in =extend-protocol=, where CL would complain that we're trying to
apply a generic function from the pvector namespace to something non-pvector.The short of it is that, depending on the order of eval and macroexpansion, and
if we have a (let ...) binding anywhere, we stand a good chance of coercing our
actual [vector] into a (persistent-vector vector) for eval sake, thus breaking
the expectation that literals will be preserved both for eval and code.**** Fix?
To get this working, I went down a couple of paths, including looking at
hacking into the compiler. Then I realized, if I can just tag the transformed
literals, I can identify them and invert the process inside of macros. You'd
end up with a ping-ponging of the custom eval bindings walker munging data literals
into evalable forms, then macros - prior to expansion - traversing their args and
coercing the forms back into actual vectors, maps, and sets, etc. Design wise
it's ugly, but performance wise should be neglible....So the eventual solution involves using the code walker from SBCL and a couple of
generic functions. We now - by convention - mark our literals with the
identity function =literal=, which just wraps the aforementioned constructor.We then define a function =recover-literals= that can walk a form and detect
invocations of =literal=, replacing them with the result of calling =symbolic=
on the subform, which evals the arg for =literal= and gives us back our
custom literal.=recover-literals= serves as an auxillary function used in the macro-defining-macro
=defmacro/literal-walker=, which will quietly bind the args for the macro definition
to a lexical binding where the args (inputs) are bound to their =recover-literals=
results. This basically ensures that, when using =defmacro/literal-walker=,
we always have access to data literals. Simultaneously, our literals now also
work seamlessly in =eval= with custom evaluation semantics, including in lexical
bindings.So far, all tests in examples are passing.
*** Revisiting Quasiquoting
The method I hacked together for quasiquoting does not work as intended with lexical
values in quoted literals. I did some naive variant that's bone-headed in hindsight,
where I tried to eval stuff at read-time to build an actual non-sexp literal datastructure.
Turns out, this is not what Clojure does. Clojure does depart from CL's quasiquotation
(via =syntax-quote=), in that for quoted literals, it emits the actual sexpr construction
of the literal. This is similar to my solution for working around lexically scoped
literals. The only challenge in SBCL is that the =`= reader macro defaults to emitting
a =quasiquote= form. So if we leverage the built-in stuff, we get the quoted form
of the sexpr vector constructors expanded from the original quasiquoted literal.The solution is to hack the =`= macro and rebind it to something that's smart enough
to detect that we're actually quoting a literal. From there, we should just emit the
expanded quasiquote result without the =quasiquote= form preceding it. Or otherwise,
expand the quoted result.Initial testing with persistent vectors is promising...almost there.
**** Nested Quasiquoting mostly works
Signficant overhaul to the old (wrong) qq infrastructure. Eliminated all
readtime eval stuff, we now emit sexprs consistent with Clojure's implementation (slightly
different in form though).We now have a cleaner quasiquotation model that's extensible as well.
*** Generalized usage of sequence lib
Ensured the sequences lib is portable enough to be used, sicne it shows up
quite a bit in clojure.core bootstrapping (e.g. macrology). Need to define
additional core functions that are seq friendly or coercing.*** Destructuring
Destructuring is pretty fundamental and shows up in a lot of places.
So common forms like =loop=, =for=, =let=, =fn=, =defmacro= and friends
all leverage it. We should be able to fairly straightforwardly port it
from clojure.core.**** Revisited Loop/Recur and variadic version
During porting of more core library functions over, I exposed bugs
in the extant labels/macrolet based version of function implementation.Did some experimenting and switched to the earlier loop-focused
=with-recur= macro. Did some more testing and experimentation,
and switch over to inlining arity bodies for multiple-arity
function definitions. So named functions work, recur works,
=loop= (without destructuring) is implemented (over =loop*=).Our functions now work fine with =lazy-seq= where before
they were creating infinite loops and later not
recognizing function symbols for self recur.*** ex-info
En route to =destructure=, I implemented exception handling
via =ex-info= and friends. =try-catch-finally= is a macro
away.*** TCO detecting false positives
Need to revisit the tail cail detection code that compiles
=with-recur= invocations, since we seem to be mistaking
simple =with-let= and =if-let= forms for non-tail
calls for some reason. For now, I've got warnings
rendering, where in reality we want an error to signal (prior
behavior).*** Rudimentary Parity
We're approaching basic functionality where I can just
copy/paste source from clojure.core and have it work without
issue. It turns out, this is a great way to explore the
range of language features from the ground up and implement
as needed. It might be possible to demo some advent of code
solutions in clojure in common lisp before too long...*** try
Defined try-catch-finally semantics from clojure.core/try in common-utils:try, which
clclojure.base exports. Trivial to implement using =handler-case= and =restart-case=.*** ~ and ~@
Realized that =sb-impl::comma-charmacro= handles this and splicing (@ form), so all
you have to do is add another reader macro binding for =~= using the same as =,=.*** cond
Implemented clojure's cleaner version of =cond=. Found a subtle error regarding
the definition and use of =cons=. Since we shadow =common-lisp:cons=, we ended up
with an infinite pingpong of the protocol extension of =-conj= for =cons= types.
Fix was to ensure we use =common-lisp:cons= for all necessary non-seq cons cell
methods.* Hurdles
A couple of big hurdles:
** Lisp1 vs Lisp2.
- rely on macros (like def) to unify symbol and function namespaces,
leveraging CL's environment.
- This is mostly implemented via =unified-let*= in lexical.lisp,
which is then used in =clclojure.base:defn= and elsewhere.
- The only thing missing is arbitrary support for keyword
access based on function application, which I have working
prototype of, but it's not valid outside of clojure macros.
- Better solution is to address this as a compiler pass
in the analyzer.- The longer route would be writing a custom eval, compiler, etc.
Doesn't look necessary at the moment.*** Lexical Scope vs. Top Level
So, Common Lisp as a lisp-2 has multiple namespaces,
foremost of which are function and symbol. We already have
the top-level (that is, null lexical environment) symbol
and function namespaces unified by using setf for symbol-function
to make it identical to symbol value....but.....
- Lexical environments don't work that way!
- symbol-function and symbol-value only work on non-lexical symbols.
- Initial hack was to unify the namespaces by traversing
the vars in the let bindings and unifying prior to continued binding.
- Had a false-positive success since the symbol modifications were
actually pointing to non-lexical scoped symbols (stuff from prior
REPL interactions).
- The fix for this is to use a combination of let* and labels, which CAN
unify the symbol-value and symbol-function namespaces for lexical vars..
- Defined a macro, unified-let*, that does this for us:
- We parse the bindings, detecting if any symbol points to a literal
keyword.
- We ensure keyaccess objects are compiled during macroexpansion,
which creates the plumbing for keyword access if it didn't
already exist
- and we create function definitions for the vars that point to
keywords, where the function bindings invoke the funcallable
keyword accessor directly.
- We need to apply this as an implicit body for fn forms as well..** Persistent structures.
- If we get defprotocol, deftype, etc. up and running,
these are already defined in CLJS core.
- For bootstrapping, using original port of Persistent Vector
from JVM clojure, also creating a dumb wrapper on top.
- Need to add meta field to struct, also atomic locks at
some point (unless cljs provides this....)
** Protocols.
- Already implemented as generic functions.
- Explore alternative implementations if generic functions aren't
speedy enough (doubtful, but haven't profiled anything).
- protocol definitions need to be namespace/package qualified.
- Looks like they are already, at the package level.
- Need better error messaging on missing protocols
- TODO: get-protocol should signal.
- Namespace qualified symbols needed.
- Currently there's a potential for collisions, since
the protocol registry (a hashtable) doesn't
assoc the protocol name with the package name
it was defined it.
- We can approximate qualified symbols using the
dot syntax, even though CL won't resolve those
symbols to a clojure namespace directly.
- Get variadic protocol implementation verified.
** Deftype.
- shadows CL deftype.
- current implementation follows defprotcol, appears to work for
non-varidiac protocol impls.
- Look at walking method impl bodies to eliminated unnecesary
slot bindings (currently we generally bind all non-shadowed
slots).
** Multimethods.
- Initial ideas for multiple dispatch following clojure's implementation.
** Metadata
- Symbols and collections (anything that supports IMeta) can have
a persistent map of metadata in clojure.
- Used to communicate a lot of information, including type hinting,
source code, documentation, etc.
- Current expectation is to leverage property lists for symbols,
and unify with generic functions behind a protocol.
** Namespaces
- CL has packages. Initial hack would be to leverage packages
as an anlogue to ns.
- Longer term solution is implement own ns system via objects.
- Rather leverage CL where possible.
- Currently implementation of def exports by default.
- Looking at introspection possibilites for more
easily tagging meta, arglists. custom function
objects (experimental) are looking good for this.
- Should we maintain a parallel collection of
namespaces?
- Based on def and derived forms, we ought to
be able to hook in and register stuff.
- Allows the reflection and introspection
we care about / use in clojure.
- Need to translate between require, refer, import (clojure)
and import (cl).
** Reader.
*** CL macros use , and ,@ in place of ~ and ~@ in Clojure.
We'll need to either cook the common lisp reader, or build a separate clojure
reader that will perform the appropriate replacements.
- Looks like a simple tweak to the ` reader macro will suffice,
we'll just flip the symbols as appropriate.
- TODO: quasitquoting in clojure (let []) inside of macros is
not splicing, need to revisit quoted-children.
- +ex `[,'a] => [,'A] (incorrect)+
- This now appears to be working, via =clclojure.readers:quasify=.
- Should be compatible if we switch reader macros for , and ,@ around
too.
*** @ is a literal for #'deref in clojure, comma is whitespace in clojure.
- Similar, we'll flip the read-table.
*** [] denote vectors
- already have a reader macro in reader.lisp
- vectors read correctly and obey clojure eval semantics.
- +There's an incorrect read going on, [x 2] will
currently read when it should throw since x
is not defined.+
- +TODO: revisite quoted-children for vectors and
the reader fn bracket-reader.+
- +If we're not reading a quoted
vec, we need to eval args to persistent-vector ctor.+
- Quasiquoting works except for splicing.
*** {} denote maps
- already have a reader macro in readers.lisp.
- Janky but useful bootstrapping implementation in
cowmap.lisp, which is a copy-on-write wrapper around
a CL hashtable.
- This should be enough to bootstrap, where clojure.core
actually defines a persistent map implementation for us.
- We can always opt for an optimized implementation
if it makes sense to go for a built-in.
- Similar problems with quasiquoting.
*** \#{} denote sets
- Easy enough to add a dispatch in reader.lisp
- Trivial implementation based on cowmap, pending.
*** ^ denotes metadata for the reader
- Trivial to implement as a reader macro, BUT, we need to get
metadata working generally.
- with-meta currently mutates symbol's plists. Ideally we'd need
a corresponding =alter-var-root!= to do that.
- need to implement metadata slots for other clojure structures (data literals, persistent lists).
*** \c vs. #\c for chars
- Added initial support for clj->cl, need to define
printable representation for chars as well.
- Current holdup is defining a print-method for
STANDARD-CHAR, which claims the class doesn't exist.
- Looking at *print-escape* and friends to see if
we can hack this. We may need our own printer,
outside of the provided REPL.
- defined a reader macro that coerces \c to #\c.
- Maybe less useful for bootstrapping.
*** reading/printing booleans
- Similar issues as characters. PAIP has a good
chapter on this for similar issues with Scheme.
*** .- field access
- wrap to calls for CLOS slot
*** ns.name/fn
- detect / in symbol name, coerce to
qualified internal symbol ns.name::fn
*** (.someMethod object)
- .hello is a valid function name in CL...
- Reader macro for .?
- need to incorporate . form ala: (. obj someMethod)
*** ::qualified-key
- :: is used as a delimiter for package symbols in CL.
- need to parse the symbol name and dispatch....
- ::blah -> :BLAH for now...CL reader eliminates
redundant colon.
- Should be simple to modify the reader macro for :
*** (aget obj field) from cljs...
- keep? This doesn't work the same in clj jvm...
** Keyword access
Keywords are applicable in Clojure. That means they show
up in the function position in forms. This won't
jive in CL directly.
- Possible option: reader macro for lists, detect
presence of keyword in function call positition,
if not quoted.
- Replace keyword with a funcallable thing that
has a print-method looking like a keyword?
- Or try to hack eval (dubious).
- custom read / eval / print....
Looks like we can just leverage he function namespace
to get around this....keywords are "just"
symbols....
- (defun :a (m) (gethash :a m))
- (defun (setf :A) (new-value m)
(setf (gethash :A m)
new-value))
- (defparameter ht (make-hash-table))
- (setf (:A ht) 3)
So the workaround is:
- Need a reader macro lists.
- If we see a keyword in the function position,
- we define a function for the keyword via defun.
- define a setf expander that provides hashmap
access (alternately, something that's not mutating).Looks like that may work inside the existing ecosystem....
Significant progress / experimentation in
clclojure.keyworfuncs. However, it's looking like,
to get "full" access (even with some tricky pseudo-keyword
class, macros, and the above suff), we're probably better
off hacking eval / compile.** Destructuring.
This may be a bit tricky, although there are a limited number of clojure forms.
- Given the goal is to read tools.analyzer and tools.reader, we may
get by with an un-prettied version of the original source that
elides destructuring (e.g. via macroexpansion on the CLJ side) into
something simpler to digest for clclojure's bootstrap.
** Seq library.
This shouldn't be too hard. I already have a lazy list lib prototype as well as
generic functions for the basic ops. I think I'll try to use the protocols
defined in the clojurescript version as much possible, rather than baking in a
lot of the seq abstraction in the host language like clojure does.
- For simple bootstrapping, this if fine, but we already get all of this
with the CLJS core implementation. So, get the minimums there and
gain everything else....** Strings
CL strings are mutable (character arrays), clj/cljs are not...
- Can we inherit from string to create an immutable type that
outlaws setf?
- I think most string operations (ala the sequence-based ones)
return copies (which are mutable).
- We can prevent setf in clojure mode, providing a pure API
for string manipulation...
- As well as impure....hmm** Regexes
Leverage CL-PPRCE
- check for reader macro collisions....** Compilation / Analysis
Currently, we project everything to (portable) CL, and let
the host do the dirty work for compilation/analysis.
- Future, port clojure.tools.analyzer
- Maybe use that for some of our efforts...
- If we have enough ported, look at using
clojure.tools.reader to help as well.* Usage
Currently just clone the repository. Assuming you have ASDF setup properly,
you should be able to evaluate (require 'clclojure) at your Lisp REPL and it'll
compile (if you're in the project dir in lisp).Alternately, you can compile and load the .asd file, then use quicklisp
to load it from the repl. (ql:quickload :clclojure)You currently get persistent vectors, literals, defprotocol, extend-protocol.
You can mess with the currently limited clojure evaluator in the
:clclojure.base package, although I'm moving in a different direction now
that I think I can explot CL better.You can see rudimentary examples and usage in the :clclojure.example
package (example.lisp). TODO: shift to named-readtables to keep
clclojure from hijacking your session. Right now, we take over the
REPL.....user beware!I'm a bit new to building Common Lisp projects, so the packaging will likely
change as I learn. At some point, if anything useful comes out of this
experiment, I may see if it can get pushed to quicklisp.* License
Eclipse Public License, just like Clojure.