https://github.com/tjvr/nefarious
my Nefarious Scheme
- Host: GitHub
- URL: https://github.com/tjvr/nefarious
- Owner: tjvr
- Created: 2016-09-10T10:25:22.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-30T15:39:39.000Z (over 8 years ago)
- Last Synced: 2025-04-09T17:31:39.224Z (10 months ago)
- Topics: programming-language, rpython
- Language: Python
- Homepage:
- Size: 289 KB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
_Nefarious Scheme_.
===================
A programming language with:
* **mutable syntax**.
* an efficient yet dynamic parser.
* function _syntax_ rather than function names.
* equivalence between types and CFG non-terminals.
* lexical scope.
* CFG rule priority: newest takes precedence [so shadowing works!].
* an incremental hybrid GC.
* ~~a fast tracing JIT bytecode VM (based on PyPy).~~ [not yet—compilers are hard!]
Written in RPython; compiles to native code (via C), using the [RPython
toolchain](https://rpython.rtfd.io/). But it can also run on top of a standard
Python interpreter (albeit slower).
**_Work in progress._**
Overview
--------
Nefarious is a text-based programming language. It has mutable syntax: the language grammar can be extended at runtime.
The idea is to do away with DSLs and operator overloading and so on, and just have fully general function syntax.
Here's a quick (and poorly chosen) example:
    define Int:a + Int:b { return (INT_ADD a b) }
    define Int:a - Int:b { return (INT_SUB a b) }
    define if Bool:test then Block:tv else Block:fv { ... }
    define fib Int:n {
        if n < 2 then { return 1 } else { return fib (n - 1) + fib (n - 2) }
    }
It has a very simple tokenisation stage: it separates out newlines, whitespace, individual punctuation characters, and strings of digits; anything that's left tokenises as a WORD.
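A minimal Python sketch of a tokeniser along these lines. The token class names other than WORD, the treatment of `_` as a word character, and the choice to keep whitespace tokens in the output are all assumptions for illustration, not taken from the Nefarious source.

```python
import re

# Hypothetical token classes; only WORD is named in the text above.
TOKEN_RE = re.compile(r"""
    (?P<NEWLINE>\n)          # newlines are separated out first
  | (?P<WS>[^\S\n]+)         # any other run of whitespace
  | (?P<DIGITS>\d+)          # strings of digits
  | (?P<PUNCT>[^\w\s])       # one punctuation character per token
  | (?P<WORD>[^\W\d]+)       # whatever is left: letters and underscores
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs that together cover the input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for kind, value in tokenize("let x = 42\n"):
        print(kind, repr(value))
```

Because the five alternatives between them match every possible character, joining the token values back together reproduces the input exactly.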
I use a sophisticated Earley parser; this allows me to be flexible and extend the grammar during parsing.
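To give a feel for why Earley parsing suits a grammar that changes, here is a small textbook-style Earley recogniser in Python. It is a sketch under several assumptions: it only answers yes/no rather than building a tree, it does not support empty right-hand sides, and the grammar is extended between two runs rather than mid-parse as the real parser does. The function name and rule representation are made up for this example.

```python
def earley_recognize(rules, start, tokens):
    """rules is a list of (lhs, rhs_tuple); any symbol that is never an lhs is a terminal.

    Assumes no rule has an empty right-hand side.
    """
    nonterminals = {lhs for lhs, _ in rules}
    # chart[i] holds items (lhs, rhs, dot, origin) that are valid after i tokens.
    chart = [set() for _ in range(len(tokens) + 1)]
    for lhs, rhs in rules:
        if lhs == start:
            chart[0].add((lhs, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot == len(rhs):
                # Complete: advance every item that was waiting for this lhs.
                new_items = [(l2, r2, d2 + 1, o2)
                             for l2, r2, d2, o2 in chart[origin]
                             if d2 < len(r2) and r2[d2] == lhs]
            elif rhs[dot] in nonterminals:
                # Predict: start every rule for the expected non-terminal here.
                new_items = [(l2, r2, 0, i) for l2, r2 in rules if l2 == rhs[dot]]
            else:
                # Scan: a matching terminal moves the item into the next chart set.
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
                new_items = []
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])


if __name__ == "__main__":
    rules = [("Int", ("1",)), ("Int", ("Int", "+", "Int"))]
    print(earley_recognize(rules, "Int", ["x", "+", "1"]))   # x is not known yet
    rules.append(("Int", ("x",)))                            # as if `let x = ...` had been seen
    print(earley_recognize(rules, "Int", ["x", "+", "1"]))
```

The rules are plain data looked up at prediction time, so adding a production needs no table regeneration, and the left-recursive rule `Int -> Int + Int` is handled without special treatment.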
When a variable declaration is encountered, eg:
```
let x = 42
```
the parser adds a new production `Int -> x` to the grammar.
Blocks `{ }` have their own scope. When we enter a block, we save the current grammar onto a stack; upon exiting the block we pop its rules. In this way the parser gives us lexical scope and variable shadowing for free.
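The block-scoping idea can be sketched in a few lines of Python. This is an illustration, not the project's code: the `Grammar` class and its method names are hypothetical, and instead of saving a copy of the grammar it records how many rules existed on block entry and truncates back to that count on exit.

```python
class Grammar:
    """Toy grammar: an ordered list of productions plus a stack of scope marks."""

    def __init__(self):
        self.rules = []     # (lhs, rhs) pairs, oldest first
        self._marks = []    # number of rules present at each block entry

    def add(self, lhs, rhs):
        self.rules.append((lhs, tuple(rhs)))

    def enter_block(self):
        self._marks.append(len(self.rules))

    def exit_block(self):
        # Drop every rule added since the matching enter_block().
        del self.rules[self._marks.pop():]

    def lookup(self, rhs):
        """Return the lhs of the newest rule with this right-hand side, or None."""
        rhs = tuple(rhs)
        for lhs, body in reversed(self.rules):
            if body == rhs:
                return lhs
        return None


if __name__ == "__main__":
    g = Grammar()
    g.add("Int", ["x"])         # outer: let x = 42
    g.enter_block()
    g.add("Text", ["x"])        # inner declaration shadows the outer x
    print(g.lookup(["x"]))      # Text
    g.exit_block()
    print(g.lookup(["x"]))      # Int
```

Searching newest-first is what makes the inner `x` win while the block is open, and truncating on exit is what restores the outer one.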
Functions are defined not with *names*, but with a list of *symbols*. (This gets converted into a CFG rule.) eg:
```
define fib Int:n { ... }
```
The function `fib _` has one argument slot, named `n`, of type `Int`.
Upon entering the function's body, the parser pushes its arguments onto the stack, just as it does for variables.
After parsing the entire body, the function is type-checked, and a new rule is added to the grammar; in this case `Int -> 'fib' Int`.
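As a rough illustration of how a `define` signature might map onto a CFG rule, the Python sketch below splits the signature on whitespace and treats `Type:name` symbols as argument slots and everything else as literal terminals. The function names are hypothetical, the result type is passed in by hand (in the language it would come from type-checking the body), and real signatures would go through the tokeniser, so forms like `(Uneval Bool):test` are not handled here.

```python
def rule_from_signature(signature, result_type):
    """Turn the symbols after `define` into (lhs, rhs, arg_names)."""
    rhs, arg_names = [], []
    for symbol in signature.split():
        if ":" in symbol:
            type_name, arg_name = symbol.split(":", 1)
            rhs.append(type_name)       # non-terminal: an argument slot of that type
            arg_names.append(arg_name)
        else:
            rhs.append(repr(symbol))    # terminal: a quoted literal word or symbol
    return result_type, rhs, arg_names


def show(lhs, rhs):
    """Format a rule the way the text above writes productions."""
    return "%s -> %s" % (lhs, " ".join(rhs))


if __name__ == "__main__":
    lhs, rhs, args = rule_from_signature("fib Int:n", "Int")
    print(show(lhs, rhs))    # Int -> 'fib' Int
    print(args)              # ['n']
```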
In this way, the parser builds up a Scheme-like AST for the whole program file.
Just like Scheme, we could support macros which are evaluated at compile-time (after parse-time).
To make all this manageable, we enforce an equivalence between non-terminals (in the CFG) and types (in the language's type system). So the LHS of a production is always its type: Int/Bool/Text/whatever. There are special types for Line and Block and Program.
This is done to help resolve ambiguity; there's no point accepting parses that won't type-check, when there are other parses that will.
Although moving type-checking into the parser may turn out to be horrible to use in practice!
The other tool for resolving ambiguity is ordered choice; if two different productions result in the same non-terminal, the one defined most recently always wins. (This is why shadowing works.)
There's some extra magic to handle parametric types/rules, eg `T -> if Bool then T else T`, or `List T -> T ',' T`. [Except for left-recursive parametrics, eg `T -> T`, which turns out to be iffy.]
This is all then compiled to bytecode for a custom VM. My plan is for the "core" language to just define rules for emitting the bytecodes; defining labels & jumps; and handling functions and name bindings; and then everything else can be implemented in the language itself, including control flow and all the built-in syntax.
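To make "emitting bytecodes" and "labels & jumps" concrete, here is a tiny stack-machine sketch in Python. It is not the Nefarious VM: the stack-based design and the opcodes `PUSH`, `INT_LT`, `JUMP`, `JUMP_IF_FALSE` and the `LABEL` pseudo-instruction are assumptions, and only the names `INT_ADD` and `INT_SUB` are borrowed from the example earlier in this README.

```python
def assemble(program):
    """Strip ("LABEL", name) pseudo-instructions and resolve jump targets to indices."""
    labels, code = {}, []
    for op in program:
        if op[0] == "LABEL":
            labels[op[1]] = len(code)
        else:
            code.append(op)
    return [(op[0], labels[op[1]]) if op[0] in ("JUMP", "JUMP_IF_FALSE") else op
            for op in code]


def run(code):
    """Execute assembled code and return the final stack."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        name = op[0]
        if name == "PUSH":
            stack.append(op[1])
        elif name in ("INT_ADD", "INT_SUB", "INT_LT"):
            b, a = stack.pop(), stack.pop()
            if name == "INT_ADD":
                stack.append(a + b)
            elif name == "INT_SUB":
                stack.append(a - b)
            else:
                stack.append(a < b)
        elif name == "JUMP":
            pc = op[1]
        elif name == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = op[1]
        else:
            raise ValueError("unknown opcode: %r" % (name,))
    return stack


if __name__ == "__main__":
    # Roughly: if 1 < 2 then { 10 + 5 } else { 10 - 5 }
    program = [
        ("PUSH", 1), ("PUSH", 2), ("INT_LT",),
        ("JUMP_IF_FALSE", "else"),
        ("PUSH", 10), ("PUSH", 5), ("INT_ADD",),
        ("JUMP", "end"),
        ("LABEL", "else"),
        ("PUSH", 10), ("PUSH", 5), ("INT_SUB",),
        ("LABEL", "end"),
    ]
    print(run(assemble(program)))    # [15]
```

With only pushes, arithmetic, labels and jumps available, an `if ... then ... else` is just a particular arrangement of instructions, which is the sense in which control flow could be defined in the language rather than built into the core.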
The idea, after all, was to do away with DSLs! I imagine there would be standard "preambles" which define a nice language to work with. Maybe even specialised ones for different
domains (science, math)?
I've omitted a few details (eg. upvars, optional whitespace), but this overview is too long as it is!
There is a range of fun extensions to the language and/or compiler: I could add an optimising compiler to the VM, and do flow analysis/SSA. Implementing closures properly could be fun. And the language would rather benefit from unevaluated argument types: `define while (Uneval Bool):test do Block:body`?
Install
-------
**Ubuntu**: you'll need the following before `make` will work (of course, you might have
them already):
    sudo apt-get install build-essential git unzip python
For making the JIT (`make nfsj`), you'll need to run:
    sudo apt-get install libffi-dev pkg-config
Installing `pypy` is recommended (builds might be faster).