https://github.com/karpeleslab/kataan
- Host: GitHub
- URL: https://github.com/karpeleslab/kataan
- Owner: KarpelesLab
- License: mit
- Created: 2026-06-04T00:16:56.000Z (12 days ago)
- Default Branch: master
- Last Pushed: 2026-06-11T03:08:46.000Z (5 days ago)
- Last Synced: 2026-06-11T03:14:33.831Z (5 days ago)
- Language: Rust
- Size: 6.26 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Roadmap: ROADMAP.md
# Kataan
A high-performance **JavaScript (ECMAScript) engine written in pure Rust**, with
no foreign code on the critical path. Kataan is usable three ways — as a Rust
library, as a C library, and as a standalone command-line tool — the same
tri-modal model proven out in the sibling projects
[`purecrypto`](https://github.com/KarpelesLab/purecrypto) (cryptography) and
[`rsurl`](https://github.com/KarpelesLab/rsurl) (HTTP/curl).
> **Status: running and broadly conformant; advanced tiers in active build-out.**
> The lexer and the full ECMAScript parser are complete, and **two execution
> engines** run real programs and are checked to agree on every test:
>
> - a **tree-walking interpreter** (the default / corpus engine), and
> - a **register bytecode VM** (the primary path for `kataan run` and the C ABI),
> compiling nearly all of the common language directly — every operator,
> objects/arrays, method calls with `call`/`apply`/`bind`, `new`/`new.target`,
> all loops + `for-of`/`for-in`/`switch`/`try`-`catch`-`finally`,
> closures (incl. mutual recursion), destructuring, rest/spread, **classes**
> with `extends`/`super` and getters/setters, generators (incl. `yield*` and
> `.throw()`), and `async`/`await` — falling back to the tree-walker for the
> handful of constructs it doesn't yet compile.
>
> A **dual-path Test262-style conformance corpus (520/520) passes on both
> engines**, covering closures, classes/inheritance (incl. `extends` of native
> errors), optional chaining, the iterator protocol, `Map`/`Set`/`WeakMap`,
> `Symbol` (incl. `Symbol.hasInstance`), `BigInt`, `Promise` + async/await,
> `Proxy`/`Reflect` (incl. the `ownKeys` trap driving `Object.keys`/`values`/
> `entries`/`for-in`), typed arrays, `Date`, an in-house `RegExp`, and a large
> standard library (Math, JSON, Object/Array/String/Number). Compiled bytecode can
> be serialized, reloaded, and run without the source.
>
> Three advanced tiers are real and tested, though each has named work remaining:
>
> - a **machine-code JIT** (x86-64 / Linux, behind the `jit` feature) with an optimizing
> integer path (four-pass optimizer + register allocator) and a float path
> covering `+ - * / %`, comparisons, control flow, and the SSE-expressible
> `Math` intrinsics (`sqrt`/`abs`/`min`/`max`/`floor`/`ceil`/`trunc`), emitting
> into W^X memory via raw syscalls; object/string ops stay interpreted;
> - a pure-Rust, `no_std` **WebAssembly engine** — full MVP plus sign-extension,
> saturating conversion, bulk-memory, multi-value, and typed structured
> control — with a JS↔WASM boundary (`validate`/`compile`/`instantiate`, the
> `Module`/`Instance`/`Global`/`Memory` objects, host-function imports, and
> stateful instances), driven by a `.wast`/WAT spec harness (a spec-derived
> corpus, not yet the full upstream suite);
> - a **zero-copy "D′" snapshot tier** atop the moving GC: a verified codec that
> `mmap`-reloads a heap (eleven reference cell kinds, cross-kind cycles,
> insertion-order-preserving) and runs a restored closure both in place and
> reloaded into a fresh runtime.
>
> Kataan works as a CLI/REPL, a Rust library, and a C library (`kt_eval`). See
> the [roadmap](ROADMAP.md) for the remaining road to a complete engine.
## Why
Modern JavaScript engines (V8, JavaScriptCore, SpiderMonkey) all rely on the
same handful of techniques. Kataan commits to the full set from the
architecture stage rather than retrofitting them:
- **NaN-boxed values** — every JS value in 64 bits, `Copy`, dense on the stack.
- **Hidden classes (shapes) + inline caches** — property access becomes a slot
load, not a hash probe; the single biggest lever for real-world JS speed.
- **Register-based bytecode VM** — fewer instructions than a stack VM, and
JIT-friendly by construction.
- **Interned atoms + rope strings** — O(1) key comparison, non-quadratic
string building.
- **A precise, generational, moving GC** — bump allocation makes `new` nearly
free.
- **Tiered execution** — a fast interpreter first, then a baseline JIT, then an
optimizing JIT driven by inline-cache type feedback.
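As a rough illustration of the NaN-boxing bullet above, the sketch below packs doubles, 32-bit integers, and booleans into a single `Copy` 64-bit word. The `Value` type, its method names, and the tag layout are assumptions made for this example; they are not taken from Kataan's source, and a real engine would also need tags for pointers, `undefined`, `null`, and so on.

```rust
/// Hypothetical NaN-boxed value: one 64-bit word, `Copy`, no heap allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value(u64);

// Illustrative layout (an assumption, not Kataan's): the top 16 bits select
// the kind, and tagged kinds live inside the quiet-NaN space of an f64.
const TAG_MASK: u64 = 0xffff_0000_0000_0000;
const CANON_NAN: u64 = 0x7ff8_0000_0000_0000; // the only NaN stored as a double
const INT_TAG: u64 = 0x7ff9_0000_0000_0000;
const BOOL_TAG: u64 = 0x7ffa_0000_0000_0000;

impl Value {
    pub fn from_f64(f: f64) -> Self {
        // Canonicalize NaNs so an arbitrary NaN payload can never be
        // mistaken for one of the tagged kinds.
        if f.is_nan() { Value(CANON_NAN) } else { Value(f.to_bits()) }
    }

    pub fn from_i32(i: i32) -> Self {
        Value(INT_TAG | (i as u32 as u64))
    }

    pub fn from_bool(b: bool) -> Self {
        Value(BOOL_TAG | (b as u64))
    }

    pub fn as_f64(self) -> Option<f64> {
        match self.0 & TAG_MASK {
            INT_TAG | BOOL_TAG => None,
            _ => Some(f64::from_bits(self.0)),
        }
    }

    pub fn as_i32(self) -> Option<i32> {
        if (self.0 & TAG_MASK) == INT_TAG { Some(self.0 as u32 as i32) } else { None }
    }

    pub fn as_bool(self) -> Option<bool> {
        if (self.0 & TAG_MASK) == BOOL_TAG { Some((self.0 & 1) == 1) } else { None }
    }
}
```

Every ordinary double keeps its own bit pattern, so the number fast path needs no unboxing beyond a tag check.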
The language core is **sans-I/O** and `no_std + alloc`; the host runtime (event
loop, timers, `fetch`, `crypto`, modules) is a separate layer on top, so the
engine stays embeddable. See [`ROADMAP.md`](ROADMAP.md) for the road ahead — the
remaining work to a complete JS+WASM engine and the design invariants behind it.
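The hidden-class and inline-cache item in the list above can also be sketched compactly. This is a minimal, monomorphic model under stated assumptions: `Shape`, `Object`, and `InlineCache` are hypothetical names invented for the example, values are plain `f64`s, and shape transitions are left out entirely. It only shows why a cache hit turns a property lookup into an indexed slot load.

```rust
use std::collections::HashMap;

/// Hypothetical hidden class: property name -> slot index, shared by every
/// object that was built with the same property layout.
pub struct Shape {
    id: u32,
    slots: HashMap<String, usize>,
}

impl Shape {
    pub fn new(id: u32, names: &[&str]) -> Self {
        let slots = names
            .iter()
            .enumerate()
            .map(|(i, n)| (n.to_string(), i))
            .collect();
        Shape { id, slots }
    }
}

/// An object is a shape reference plus a flat vector of slot values.
pub struct Object<'s> {
    shape: &'s Shape,
    values: Vec<f64>,
}

impl<'s> Object<'s> {
    pub fn new(shape: &'s Shape, values: Vec<f64>) -> Self {
        assert_eq!(shape.slots.len(), values.len());
        Object { shape, values }
    }
}

/// Monomorphic inline cache for a single property-access site.
#[derive(Default)]
pub struct InlineCache {
    cached: Option<(u32, usize)>, // (shape id, slot index)
    pub hits: u32,
    pub misses: u32,
}

impl InlineCache {
    pub fn get(&mut self, obj: &Object<'_>, name: &str) -> Option<f64> {
        if let Some((shape_id, slot)) = self.cached {
            if shape_id == obj.shape.id {
                // Fast path: same shape as last time, so the slot is known.
                self.hits += 1;
                return Some(obj.values[slot]);
            }
        }
        // Slow path: hash lookup in the shape, then remember the result.
        self.misses += 1;
        let slot = *obj.shape.slots.get(name)?;
        self.cached = Some((obj.shape.id, slot));
        Some(obj.values[slot])
    }
}
```

Two objects created with the same shape share one cache entry, so only the first access at a given site pays for the hash lookup.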
## Pure Rust, no foreign code
Kataan depends on no C libraries. Where it needs cryptography or networking it
reuses sibling **pure-Rust** Karpelès Lab crates:
- [`purecrypto`](https://github.com/KarpelesLab/purecrypto) — `crypto.subtle` /
WebCrypto, `crypto.getRandomValues`, `randomUUID`, and TLS.
- [`rsurl`](https://github.com/KarpelesLab/rsurl) — HTTP/HTTPS transport behind
`fetch` and the Node `http(s)` compatibility layer.
`unsafe` is quarantined: the crate sets the `unsafe_code` lint to `"deny"` (not
`forbid`, which could not be overridden), and only the `ffi` module plus a small,
audited set of VM hot-path primitives opt back in with a scoped
`#[allow(unsafe_code)]` and a safety comment.
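A minimal sketch of that quarantine pattern, written with lint attributes so it fits in one file (the README's `unsafe_code = "deny"` reads like the manifest-level form of the same lint). The module and function names are hypothetical and only illustrate the mechanism.

```rust
// `deny` makes any `unsafe` inside this module a compile error, but unlike
// `forbid` it can be relaxed again on a narrower scope.
#[deny(unsafe_code)]
pub mod engine {
    /// Ordinary engine code: an `unsafe` block here would not compile.
    pub fn sum(slots: &[u64]) -> u64 {
        (0..slots.len()).map(|i| hot_path::read_slot(slots, i)).sum()
    }

    pub mod hot_path {
        /// Reads slot `i`; panics if `i` is out of range.
        /// (Hypothetical hot-path primitive, for illustration only.)
        #[allow(unsafe_code)] // scoped opt-in, limited to this one function
        pub fn read_slot(slots: &[u64], i: usize) -> u64 {
            assert!(i < slots.len(), "slot index out of range");
            // SAFETY: the assert above guarantees `i` is in bounds.
            unsafe { *slots.get_unchecked(i) }
        }
    }
}
```

Switching the outer attribute to `#[forbid(unsafe_code)]` would make the inner `#[allow]` itself an error, which is why `deny` is the level that permits audited exceptions.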
## Try it
The CLI runs JavaScript today:
```console
$ cargo run -- run -e '
class Animal { constructor(n){ this.n = n } speak(){ return `${this.n} makes a sound` } }
class Dog extends Animal { speak(){ return `${this.n} barks` } }
console.log(new Dog("Rex").speak());
console.log([1,2,3,4].filter(x => x % 2).map(x => x*x).reduce((a,b)=>a+b, 0));
console.log(JSON.stringify({ ok: true, items: [...new Set([1,1,2,3])] }));
'
Rex barks
10
{"ok":true,"items":[1,2,3]}
```
It also exposes each pipeline stage, and an interactive REPL:
```console
$ cargo run -- lex -e 'x => x * 2' # token stream
$ cargo run -- parse -e 'x => x * 2' # AST dump
$ cargo run -- disasm -e '1 + 2 * 3' # register bytecode
$ cargo run -- repl # interactive session
$ cargo run -- --help
```
The `disasm` command shows the register bytecode the compiler emits:
```console
$ cargo run -- disasm -e 'let s = 0; let i = 0; while (i < 3) { s += i; i += 1; } s'
chunk #0 "" (regs=14, params=0)
0 LoadInt r0, 0
...
6 Lt r6, r4, r5
7 JumpIfFalse r6, +9
...
16 Jump -13
18 Return r13
```
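To make the listing easier to read, here is a toy register VM that runs the same kind of loop. It is a sketch only: the `Op` enum, the register numbering, and the rule that jump offsets are relative to the following instruction are all assumptions for this example, not a description of Kataan's actual instruction set or encoding.

```rust
/// Hypothetical register instructions, loosely modeled on the names in the
/// disassembly above (operands are register indices unless noted).
#[derive(Clone, Copy)]
pub enum Op {
    LoadInt(usize, i64),       // dst, immediate
    Add(usize, usize, usize),  // dst, lhs, rhs
    Lt(usize, usize, usize),   // dst, lhs, rhs
    JumpIfFalse(usize, isize), // condition register, relative offset
    Jump(isize),               // relative offset
    Return(usize),             // result register
}

/// Executes `code` with `nregs` integer registers and returns the result.
pub fn run(code: &[Op], nregs: usize) -> i64 {
    let mut regs = vec![0i64; nregs];
    let mut pc = 0usize;
    loop {
        let op = code[pc];
        pc += 1; // offsets below are relative to the *next* instruction
        match op {
            Op::LoadInt(d, v) => regs[d] = v,
            Op::Add(d, a, b) => regs[d] = regs[a] + regs[b],
            Op::Lt(d, a, b) => regs[d] = (regs[a] < regs[b]) as i64,
            Op::JumpIfFalse(c, off) => {
                if regs[c] == 0 {
                    pc = (pc as isize + off) as usize;
                }
            }
            Op::Jump(off) => pc = (pc as isize + off) as usize,
            Op::Return(r) => return regs[r],
        }
    }
}

/// Hand-assembled equivalent of
/// `let s = 0; let i = 0; while (i < 3) { s += i; i += 1; } s`.
pub fn sum_below_three() -> Vec<Op> {
    vec![
        Op::LoadInt(0, 0),     // 0: s = 0
        Op::LoadInt(1, 0),     // 1: i = 0
        Op::LoadInt(2, 3),     // 2: limit = 3
        Op::LoadInt(3, 1),     // 3: step = 1
        Op::Lt(4, 1, 2),       // 4: r4 = i < limit
        Op::JumpIfFalse(4, 3), // 5: leave the loop (to 9) when r4 is false
        Op::Add(0, 0, 1),      // 6: s += i
        Op::Add(1, 1, 3),      // 7: i += 1
        Op::Jump(-5),          // 8: back to the comparison at 4
        Op::Return(0),         // 9: result is s
    ]
}
```

Calling `run(&sum_below_three(), 5)` returns `3`. Each arithmetic instruction names its operands directly, which is where a register design saves instructions compared with pushing and popping on a stack.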
## Use as a Rust library
```rust
use kataan::parser::Parser;
use kataan::interp::Interp;
let program = Parser::parse_program("const sq = x => x * x; sq(8)").unwrap();
let mut interp = Interp::new();
assert_eq!(interp.run(&program).unwrap().to_js_string(), "64");
```
The lower stages are available directly too:
```rust
use kataan::lexer::{Lexer, TokenKind};
let tokens = Lexer::new("let answer = 42;").tokenize().unwrap();
assert_eq!(tokens.first().unwrap().text("let answer = 42;"), "let");
assert_eq!(tokens.last().unwrap().kind, TokenKind::Eof);
```
### Feature flags
| Feature | Default | Description |
|-----------|:-------:|--------------------------------------------------------------------|
| `std` | ✓ | Standard library; implies `alloc`. Needed by the host runtime/CLI. |
| `alloc` | ✓ | Heap-backed types; the minimum for the pure language core. |
| `regex` | ✓ | In-house regular-expression engine. |
| `intl` | ✓ | In-house `Intl`-lite (collation, number/date formatting). |
| `module` | ✓ | ESM + CommonJS module loader. |
| `host` | ✓ | Host runtime: event loop, timers, console, encoding, URL, streams. |
| `fetch` | | `fetch` / Node `http(s)` over `rsurl`. |
| `crypto` | | `crypto.getRandomValues` / WebCrypto over `purecrypto`. |
| `jit` | | Machine-code JIT (x86-64/Linux): optimizing integer + float paths. |
| `ffi` | | The C ABI (the only place broad `unsafe` is allowed). |
| `cli` | ✓ | The `kataan` command-line tool. |
Build the bare `no_std` language core with:
```console
$ cargo build --no-default-features --features alloc
```
## Use as a C library
```console
$ cargo rustc --lib --release --features ffi --crate-type staticlib # libkataan.a
$ cargo rustc --lib --release --features ffi --crate-type cdylib # libkataan.so
```
The header is [`include/kataan.h`](include/kataan.h); a runnable example lives
in [`tests/ffi_smoke.c`](tests/ffi_smoke.c). The C ABI follows the `purecrypto`
conventions — `KtStatus` return codes, the in/out length convention, opaque
handles, and a panic catch at every boundary.
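The conventions listed above can be pictured with a small Rust-side sketch. Everything named here is hypothetical: `DemoStatus` stands in for a status enum such as `KtStatus`, `demo_write_greeting` is not part of Kataan's API, and the exact in/out length rule (capacity in, required length out) is an assumption about what that convention means. A real export would also carry a `no_mangle` attribute, omitted here to keep the sketch edition-neutral.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical status codes returned across the C boundary.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DemoStatus {
    Ok = 0,
    NullPointer = 1,
    BufferTooSmall = 2,
    Panic = 3,
}

/// Writes a fixed message into `out`.
///
/// Assumed in/out length convention: on entry `*out_len` is the capacity of
/// `out`; on return it holds the number of bytes needed, so a caller that
/// gets `BufferTooSmall` knows how much to allocate before retrying.
///
/// # Safety
/// `out_len` must be null or point to a valid `usize`; `out` must be null or
/// point to at least `*out_len` writable bytes.
pub unsafe extern "C" fn demo_write_greeting(out: *mut u8, out_len: *mut usize) -> DemoStatus {
    // A panic must never unwind into C, so the whole body is wrapped.
    let result = catch_unwind(AssertUnwindSafe(|| {
        if out_len.is_null() {
            return DemoStatus::NullPointer;
        }
        let msg = b"hello from rust";
        // SAFETY: `out_len` is non-null and valid per the contract above.
        let capacity = unsafe { *out_len };
        unsafe { *out_len = msg.len() };
        if out.is_null() || capacity < msg.len() {
            return DemoStatus::BufferTooSmall;
        }
        // SAFETY: `out` has room for at least `msg.len()` bytes.
        unsafe { std::ptr::copy_nonoverlapping(msg.as_ptr(), out, msg.len()) };
        DemoStatus::Ok
    }));
    result.unwrap_or(DemoStatus::Panic)
}
```

A C caller would check the returned status first and only then read `*out_len` bytes from the buffer.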
## License
MIT © 2026 Karpelès Lab Inc. See [LICENSE](LICENSE).