https://github.com/antomfdez/brokm

brokm — a small, fast, embeddable HolyC-flavored general-purpose language (bytecode VM + mark-sweep GC)

# brokm

**brokm** — short for *"broken minded"* — is a small, fast, embeddable general-purpose
language with a clean [HolyC](https://en.wikipedia.org/wiki/TempleOS#HolyC)-flavored syntax.
It is built primarily for its author's own use, with four design goals: **small, fast, robust,
and easy to embed**.

This is **v1.0**, the first stable release. It includes:

- a working bytecode VM with a precise, **generational** mark-sweep garbage collector;
- aggregate types (arrays + classes/structs **with methods**);
- an opt-in **manual-memory** mode for low-level work;
- a **static type checker** that validates the HolyC type annotations before any code runs;
- **typed bytecode** that specializes the int hot path;
- a **baseline JIT** that compiles hot functions to native code (**arm64 + x86-64**, with a full
interpreter fallback);
- an **AOT compiler** (`brokm build`) that turns a program into a standalone native executable;
- a **standard library**: native builtins (file I/O, strings, math, maps, **scripting**:
`Args`/`Env`/`Shell`/`Input`/`Time`/…) plus
[**`lib/std/` modules written in brokm**](lib/std/) (`#include "std/std.bk"`);
- a **C embedding API** with **multi-instance VMs** (create any number of independent runtimes,
register natives, exchange values, call brokm from C);
- **multi-file programs** via `#include`.

Together these are enough that a [compiler written in brokm](examples/realcc.bk) emits the VM's
real bytecode and **compiles its own source** (`make test-bootstrap`, byte-identical to the C
compiler; see [docs/ROADMAP.md](docs/ROADMAP.md)).

```holyc
// hello.bk — top-level code runs; a bare string statement prints.
"Hello World\n";

I64 Fib(I64 n)
{
    if (n < 2) return n;
    return Fib(n - 1) + Fib(n - 2);
}

I64 i = 0;
for (i = 0; i <= 10; ++i)
    "Fib(%d) = %d\n", i, Fib(i);
```

## Build & run

Requires a C99 compiler and `make` (no external dependencies).

```sh
make # builds ./brokm (-std=c99 -Wall -Wextra, 0 warnings)
./brokm file.bk # run a program (JIT-compiled when hot)
./brokm run file.bk # same, explicit subcommand
./brokm build file.bk -o app # AOT-compile to a standalone native executable
./brokm build file.bk --emit=c # emit the generated C instead of an executable
./brokm build file.bk --freestanding # AOT-build with the reduced runtime profile
./brokm # start the REPL
make test # run the golden test suite
make test-aot # run the suite AOT-compiled to native executables
make test-freestanding # focused checks for the freestanding AOT profile
make test-gc # run the suite with the GC firing on every allocation
make bench # benchmark interpreter vs JIT vs AOT (Fib)
make debug # ASan/UBSan + bytecode/exec tracing build
```

`brokm build` translates the compiled bytecode to C and drives the system C
compiler. Options: `-o`, `--emit=c`, `--keep-c`, `--freestanding`, `--cc`,
`--cflags` (extra compiler/linker flags such as `--target=...` or `-static`),
`-O0`..`-O3`, `--verbose`, `--quiet`. The build needs the brokm source tree to
link the runtime: it looks next to the `brokm` executable, or wherever
`BROKM_HOME` points.

## Install & update

Everything lives in one place — `~/.brokm` is a clone of this repository
holding the binary, the standard library (`lib/`), and the runtime sources
`brokm build` links against. One script installs *and* updates:

```sh
sh install.sh # clone (or pull) ~/.brokm and build; re-run any time to update
```

Then add to your shell profile (the script prints these):

```sh
export BROKM_HOME="$HOME/.brokm"
export PATH="$BROKM_HOME:$PATH"
```

Uninstall with `rm -rf ~/.brokm`. Set `BROKM_HOME` before running the script
to install somewhere else.

## Scripting & the standard library

The native builtins cover scripting basics: `Args()` (command-line arguments),
`Env`, `Exit`, `Shell` (exit status), `ShellStr` (captured stdout), `Input`
(read a line), `Time`/`TimeMs`, `Sleep`, `ReadFile`/`WriteFile`/`AppendFile`,
`FileExists`. On top of them, [`lib/std/`](lib/std/) is a standard library
**written in brokm**, split into focused modules — `str`, `arr`, `io`, `path`,
`os` — that any script can pull in by name (resolved via `$BROKM_HOME/lib`,
falling back to `~/.brokm/lib`):

```holyc
#include "std/std.bk" // or just "std/str.bk", etc.

U0[] lines = ReadLines("/etc/hosts");
"%d hosts lines\n", Len(lines);

U0[] parts = StrSplit(EnvOr("PATH", ""), ":");
"%s\n", StrJoin(ArrSortStr(parts), "\n");

if (Len(Args()) > 0 && Args()[0] == "--touch")
    AppendLine("log.txt", "ran at " + ToStr(Time()));
```

brokm has one flat namespace, so each module prefixes its public names
(`Str*`, `Arr*`, `Path*`…). See [docs/CODEMAP.md](docs/CODEMAP.md) for the
full listing.

## Language at a glance

- **Types** (HolyC-style): `U0 U8 U16 U32 U64 I8 I16 I32 I64 F64 Bool`; default integer is `I64`.
- **Type-first declarations**: `I64 x = 5;` `F64 r = 3.14;`
- **Top-level code executes** — no `main()` required.
- **Functions**: `I64 Add(I64 a, I64 b) { return a + b; }`
- **Printing**: a bare string statement prints, with printf-style args:
`"x = %d\n", x;` — or call `Print(...)`.
- **Arrays**: dynamic, heap-allocated — `I64[] a = [1, 2, 3]; a[0] = 9;` with `Len`/`Append`.
- **Classes/structs**: `class Point { I64 x; I64 y; }`, `Point p = Point(3, 4); p.x = 9;`
(reference semantics).
- **Methods**: declare `I64 Area() { return this.w * this.h; }` in the class body and call
`r.Area()`; the receiver is `this`.
- **Manual memory**: `U0 b = MAlloc(32); PokeI64(b, 0, 42); Free(b);` — raw, GC-invisible
buffers with typed peek/poke, plus `GcDisable`/`GcEnable`.
- **Maps**: string-keyed hash maps — `U0 m = MapNew(); MapSet(m, "k", 1); MapGet(m, "k");`
with `MapHas`/`MapDelete`/`MapLen`/`MapKeys`.
- **Control flow**: `if/else`, `while`, `for`, `do/while`, `switch/case/default`,
`break`, `continue`, `return`.
- **Multi-file**: `#include "lib.bk"` — textual, include-once, resolved relative to the file,
then `$BROKM_HOME/lib`, then `~/.brokm/lib` (where the standard library lives).
- **Operators**: `+ - * / %`, comparisons, `&& || !`, bitwise `& | ^ ~ << >>`,
assignment + compound (`+=` …), `++ --`.

Full reference: [docs/SYNTAX.md](docs/SYNTAX.md).
Internals: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
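
The `#include` search order from the **Multi-file** bullet can be pictured with a small C sketch. This is not brokm's resolver: `include_candidates` and its parameters are hypothetical names chosen for illustration. It only builds the three candidate paths in the order the README lists them; which candidate wins and how include-once is tracked are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of brokm: fill `out` with the three places an
 * `#include "name"` could be looked up, in the order the README lists them. */
enum { CANDIDATES = 3, PATH_CAP = 512 };

void include_candidates(const char *including_file, const char *name,
                        const char *brokm_home, const char *home,
                        char out[CANDIDATES][PATH_CAP])
{
    assert(including_file && name && home);

    /* 1. Relative to the directory of the file that contains the #include. */
    const char *slash = strrchr(including_file, '/');
    int dir_len = slash ? (int)(slash - including_file) + 1 : 0;
    snprintf(out[0], PATH_CAP, "%.*s%s", dir_len, including_file, name);

    /* 2. $BROKM_HOME/lib (left empty when the variable is unset). */
    if (brokm_home && brokm_home[0] != '\0')
        snprintf(out[1], PATH_CAP, "%s/lib/%s", brokm_home, name);
    else
        out[1][0] = '\0';

    /* 3. ~/.brokm/lib, the default install location. */
    snprintf(out[2], PATH_CAP, "%s/.brokm/lib/%s", home, name);
}
```

A caller would pass `getenv("BROKM_HOME")` and `getenv("HOME")` and then probe the candidates in order, presumably stopping at the first file that exists.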

## Embedding

```c
#include "brokm.h"

static BrokmValue HostAdd(int argc, const BrokmValue *args) {
    return brokm_int(brokm_as_int(args[0]) + brokm_as_int(args[1]));
}

int main(void) {
    BrokmVM *vm = brokm_new();               /* an independent runtime */
    brokm_register(vm, "HostAdd", HostAdd);  /* expose a C native */
    brokm_eval(vm, "I64 Triple(I64 n) { return HostAdd(n, HostAdd(n, n)); }");

    BrokmValue args[1] = { brokm_int(7) }, result;
    brokm_call(vm, "Triple", 1, args, &result);  /* call brokm from C */
    /* result == 21; also: brokm_get_global / brokm_set_global, string exchange */

    brokm_free(vm);
    return 0;
}
```

The API is **instance-based** (v0.11): a host may create any number of independent VMs — each
with its own heap, globals, interned strings, and GC — and use or destroy them in any order.
It lets a host **register native functions** (per VM), **exchange values** (ints, floats, bools,
strings), **read/set globals**, and **call brokm functions from C**. See
[`examples/embed.c`](examples/embed.c) for a complete program, including two VMs running side by
side (`make embed && ./embed-demo`).
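
To make "per VM" concrete, here is a minimal sketch of what instance-based native registration can look like. It does not show brokm's internals: `MiniVM`, `mini_register`, `mini_lookup`, and the `long long` value type are all hypothetical stand-ins. The point is only that the registry lives inside the VM struct, so a native registered on one instance is invisible to another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; brokm's real ones live in brokm.h. */
typedef long long (*NativeFn)(int argc, const long long *args);

enum { MAX_NATIVES = 16 };

typedef struct {
    const char *names[MAX_NATIVES];
    NativeFn    fns[MAX_NATIVES];
    int         count;
} MiniVM; /* all state sits in the struct: no file-scope tables */

void mini_init(MiniVM *vm) { vm->count = 0; }

/* Register `fn` under `name` in this VM only. Returns 0, or -1 when full. */
int mini_register(MiniVM *vm, const char *name, NativeFn fn)
{
    assert(vm && name && fn);
    if (vm->count == MAX_NATIVES) return -1;
    vm->names[vm->count] = name;
    vm->fns[vm->count] = fn;
    vm->count++;
    return 0;
}

/* Look a native up by name; NULL means this VM never registered it. */
NativeFn mini_lookup(const MiniVM *vm, const char *name)
{
    for (int i = 0; i < vm->count; i++)
        if (strcmp(vm->names[i], name) == 0) return vm->fns[i];
    return NULL;
}

/* Example native with the same shape as HostAdd above, on plain integers. */
long long host_add(int argc, const long long *args)
{
    assert(argc == 2);
    return args[0] + args[1];
}
```

With two `MiniVM` values `a` and `b`, registering `host_add` on `a` leaves `mini_lookup(&b, "HostAdd")` returning `NULL`, which is the isolation property the paragraph above describes.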

## Editor support

Syntax highlighting for `.bk` files ships under [`editors/`](editors/):

- **Neovim / Vim** — drop-in regex syntax (`editors/nvim/`), no build step.
- **Sublime Text** — a `.sublime-syntax` (`editors/sublime/`), no build step.
- **Zed** — a Tree-sitter extension (`editors/zed/`).
- **Tree-sitter grammar** (`editors/tree-sitter-brokm/`) powers Zed and the
`nvim-treesitter` option; it parses every `.bk` file in this repo with no errors.

Install instructions for each editor are in [`editors/README.md`](editors/README.md).

## broked — an editor written in brokm

[**broked**](https://github.com/antomfdez/broked) is a vim-style terminal code
editor written entirely in brokm: modal editing (normal, insert, visual,
command), operators (`dw`, `cw`, `d$`, …), counts, registers, undo, search,
an ex command line, line numbers, and syntax highlighting for `.bk` files.
brokm has no raw-tty natives, so its terminal layer is built from the
scripting stdlib alone — `Shell`/`ShellStr` driving `stty` and `dd`, with
`ShellStr`'s stdout flush doubling as the frame flush — and `brokm build`
turns it into a standalone native executable. It comes with a headless test
suite that drives the key dispatcher with no terminal attached.

```sh
brokm broked.bk file.bk # run from source
brokm build broked.bk -o broked # or AOT-compile the editor itself
```

## Status

brokm runs real programs (arithmetic, variables, control flow, functions, recursion, strings,
**dynamic arrays**, **classes/structs**, **manual memory**, printing) on a stack bytecode VM,
with a precise **generational** mark-sweep collector (minor + major, young→old promotion) whose
write barrier is exercised by array and field mutation and verified red/green.

A **static type checker** validates the type annotations between parse and compile, catching
arity, argument, field, and class-type errors before execution, using gradual typing so existing
programs are unaffected. The compiler emits **typed bytecode**: int-specialized arithmetic and
comparison opcodes on the hot path, with a runtime guard that deopts to the generic handler so
gradual typing stays correct.

A **baseline JIT** (macOS + Linux, **arm64 + x86-64**) compiles hot functions to native code in
`mmap`'d executable pages: profile-gated, with inlined integer arithmetic/comparisons, branches,
and recursion, deopt guards for correctness, and a full interpreter fallback for ineligible
functions and every other platform. It runs the recursive `Fib` benchmark ~2.5× faster than the
interpreter on each architecture (`make bench`).

A **standard library** of native builtins covers file I/O (`ReadFile`/`WriteFile`/`PrintErr`),
strings (`CharAt`/`Chr`/`Substr`/`IndexOf`/`ToInt`/`ToStr`), and math
(`Abs`/`Min`/`Max`/`Sqrt`/`Pow`/`Floor`/`Ceil`), enough that `examples/lexer.bk` tokenizes and
`examples/calc.bk` parses + evaluates brokm-flavored source written in brokm, the first steps
toward self-hosting. **String-keyed maps** (`MapNew`/`MapGet`/`MapSet`/…) add the symbol-table
primitive a self-hosted compiler needs, with their mutations exercising the write barrier
red/green. Classes carry **methods** (`obj.m(args)` with an implicit `this`, dispatched through a
new `OP_INVOKE`), so a compiler's data types can hold behavior.

A **C embedding API** lets a host register native functions, exchange scalar and string values,
read/set globals, and call brokm functions from C (`examples/embed.c`). **`#include`** splits a
program across files: textual, include-once, resolved relative to each file. Putting it together,
**`examples/realcc.bk` is a compiler written in brokm** (lexer → parser → code generator) that
emits the C VM's **real bytecode** and runs it directly, and it now **compiles its own complete
source**: the self-compiled compiler produces byte-identical output to the C compiler
(`make test-bootstrap`).

**Multi-instance VMs** (v0.11) made the runtime instance-based: any number of independent VMs per
process, each with its own heap, globals, and collector. **Portability + CI** (v0.11.1) brought
the full matrix, including both JIT backends, to **Linux**, verified by a GitHub Actions matrix
(macOS arm64 + Linux x86-64, warnings as errors) on every push. The **AOT compiler** (v0.12)
turns bytecode into C on the same helper ABI the interpreter and JIT share, producing standalone
native executables. The **scripting stdlib** (v0.13) added OS/process natives and the
`lib/std/` modules written in brokm itself, installed and updated as one unit by `install.sh`.
**Optimized AOT** (v0.14) made the emitted C cache the VM stack top in a C local and call
AOT-compiled callees directly (AOT binaries now beat the JIT on `make bench`), and AOT
executables link only the runtime core (no parser/typechecker/compiler inside).

Next up: a `--freestanding` runtime profile, the path toward booting a kernel written in brokm,
with a runtime in brokm as the long-term star. See the roadmap.
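
The typed-bytecode guard mentioned above (an int-specialized opcode that deopts to the generic handler) can be sketched in a few lines of C. The value layout and the names `op_add_int` / `op_add_generic` are assumptions made for this illustration and are not taken from brokm's source; the real opcodes and deopt mechanism are described in docs/ARCHITECTURE.md.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value representation and handler names, for illustration only. */
typedef enum { VAL_INT, VAL_FLOAT } ValueTag;

typedef struct {
    ValueTag tag;
    union { long long i; double f; } as;
} Value;

Value make_int(long long i) { Value v; v.tag = VAL_INT;   v.as.i = i; return v; }
Value make_float(double f)  { Value v; v.tag = VAL_FLOAT; v.as.f = f; return v; }

static double to_double(Value v)
{
    return v.tag == VAL_INT ? (double)v.as.i : v.as.f;
}

/* Generic handler: inspects both tags and picks int or float addition. */
Value op_add_generic(Value a, Value b)
{
    if (a.tag == VAL_INT && b.tag == VAL_INT)
        return make_int(a.as.i + b.as.i);
    return make_float(to_double(a) + to_double(b));
}

/* Int-specialized handler: one guard, then the bare integer add.
 * If the guard fails, fall back ("deopt") to the generic handler and report
 * it through *deopted, so the result stays correct either way. */
Value op_add_int(Value a, Value b, bool *deopted)
{
    assert(deopted);
    if (a.tag != VAL_INT || b.tag != VAL_INT) {
        *deopted = true;
        return op_add_generic(a, b);
    }
    *deopted = false;
    return make_int(a.as.i + b.as.i);
}
```

The specialized path pays for one tag check instead of full dispatch. When a non-int value slips through, as gradual typing allows, the result is still whatever the generic handler would have produced.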

## License

TBD by the author.