https://github.com/lloyal-ai/lloyal.node

Covalent inference for Node.js
beam-search inference llama-cpp test-time-scaling
Host: GitHub
URL: https://github.com/lloyal-ai/lloyal.node
Owner: lloyal-ai
License: apache-2.0
Created: 2025-11-09T12:12:17.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-02-14T04:19:11.000Z (about 1 month ago)
Last Synced: 2026-02-14T12:57:26.157Z (about 1 month ago)
Topics: beam-search, inference, llama-cpp, test-time-scaling
Language: C++
Homepage: https://lloyal-ai.github.io/lloyal.node/
Size: 2.44 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# lloyal.node

[![Build & Test](https://github.com/lloyal-ai/lloyal.node/actions/workflows/tests.yml/badge.svg)](https://github.com/lloyal-ai/lloyal.node/actions/workflows/tests.yml)

[![GPU Tests](https://github.com/lloyal-ai/lloyal.node/actions/workflows/gpu-test.yml/badge.svg)](https://github.com/lloyal-ai/lloyal.node/actions/workflows/gpu-test.yml)

[![npm](https://img.shields.io/npm/v/@lloyal-labs/lloyal.node.svg)](https://www.npmjs.com/package/@lloyal-labs/lloyal.node)

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

[![llama.cpp](https://img.shields.io/badge/llama.cpp-b8087-green.svg)](https://github.com/ggml-org/llama.cpp/releases/tag/b8087)

**Native backend for the lloyal inference platform.**

Prebuilt llama.cpp binaries for 13 platform/GPU combinations, exposing a `SessionContext` that powers the [`@lloyal-labs/sdk`](https://github.com/lloyal-ai/sdk) inference primitives (Branch, BranchStore, Session, Rerank) and [`@lloyal-labs/lloyal-agents`](https://github.com/lloyal-ai/sdk/tree/main/packages/agents) multi-agent framework. Built on [liblloyal](https://github.com/lloyal-ai/liblloyal), a header-only C++20 inference kernel for llama.cpp.

All SDK and agent exports are re-exported from this package for convenience — `import { Branch, runAgents } from "@lloyal-labs/lloyal.node"` works out of the box.

## Install

```bash
npm install @lloyal-labs/lloyal.node
```

Prebuilt binaries ship for 13 platform/GPU combinations. The GPU variant is selected at runtime, not at install time.

| Platform | Arch  | Acceleration        |
| -------- | ----- | ------------------- |
| macOS    | arm64 | Metal               |
| macOS    | x64   | CPU                 |
| Linux    | x64   | CPU / CUDA / Vulkan |
| Linux    | arm64 | CPU / CUDA / Vulkan |
| Windows  | x64   | CPU / CUDA / Vulkan |
| Windows  | arm64 | CPU / Vulkan        |

## Quick Start

```javascript
import { createContext } from "@lloyal-labs/lloyal.node";
import { Branch, BranchStore } from "@lloyal-labs/sdk";

const ctx = await createContext({ modelPath: "./model.gguf", nSeqMax: 4 });
const store = new BranchStore(ctx);

const root = Branch.create(ctx, 0, { temperature: 0.8 });
await root.prefill(await ctx.tokenize("Explain quantum entanglement"));

// Fork and generate: all branches in lockstep, 1 GPU call per step
const branches = await Promise.all([root.fork(), root.fork(), root.fork()]);

for (;;) {
  const live = branches.filter((b) => !b.disposed);
  if (!live.length) break;

  const produced = live.map((b) => ({ b, ...b.produce() }));
  for (const p of produced.filter((p) => p.isStop)) await p.b.prune();

  const items = produced
    .filter((p) => !p.isStop)
    .map((p) => {
      p.b.accept(p.token);
      return [p.b, p.token];
    });

  await store.commit(items);
}
```

Or for single-branch generation, Branch is an async iterable:

```javascript
for await (const { token, text } of branch) {
  process.stdout.write(text);
}
```

See [`@lloyal-labs/sdk`](https://github.com/lloyal-ai/sdk) for the full Branch API, continuous tree batching, KV tenancy, and topology documentation.
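To make the lockstep loop from the Quick Start easier to follow, here is a minimal sketch that runs the same control flow against mock objects. `MockBranch`, `MockStore`, and `runLockstep` are hypothetical stand-ins written for this illustration, not part of the SDK. The mocks are synchronous, token `0` is assumed to be the stop token, and the guard that skips empty commits is an addition that the Quick Start loop does not have.

```javascript
// Mock branch: replays a scripted token list. Illustrative only, not the SDK's Branch.
class MockBranch {
  constructor(name, script) {
    this.name = name;
    this.script = [...script];
    this.accepted = [];
    this.disposed = false;
  }
  produce() {
    // Assumption for this sketch: token 0 means "stop".
    const token = this.script.shift() ?? 0;
    return { token, isStop: token === 0 };
  }
  accept(token) {
    this.accepted.push(token);
  }
  prune() {
    this.disposed = true;
  }
}

// Mock store: records each batched commit instead of decoding on a GPU.
class MockStore {
  constructor() {
    this.batches = [];
  }
  commit(items) {
    this.batches.push(items.map(([b, t]) => `${b.name}:${t}`));
  }
}

// Same shape as the Quick Start loop: produce, prune stopped branches,
// accept the rest, then commit all surviving branches in one call.
function runLockstep(branches, store) {
  for (;;) {
    const live = branches.filter((b) => !b.disposed);
    if (!live.length) break;
    const produced = live.map((b) => ({ b, ...b.produce() }));
    for (const p of produced.filter((p) => p.isStop)) p.b.prune();
    const items = produced
      .filter((p) => !p.isStop)
      .map((p) => {
        p.b.accept(p.token);
        return [p.b, p.token];
      });
    if (items.length) store.commit(items); // guard added for this sketch
  }
  return store.batches;
}

const mockBranches = [
  new MockBranch("a", [5, 6, 0]),
  new MockBranch("b", [7, 0]),
  new MockBranch("c", [8, 9, 3, 0]),
];
const batches = runLockstep(mockBranches, new MockStore());
console.log(batches);
```

Each entry in `batches` corresponds to one commit call, so the number of commits tracks the longest branch rather than the total number of tokens generated across branches.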

### Without the SDK

`createContext` returns a `SessionContext` — the native interface to llama.cpp. You can use it directly without the SDK's Branch/BranchStore layer:

```javascript
import { createContext } from "@lloyal-labs/lloyal.node";

const ctx = await createContext({ modelPath: "./model.gguf", nSeqMax: 4 });

// Placeholders such as messages, schema, output, samplerParams, handle1/handle2,
// tok1/tok2, and winner are supplied by the caller.

// Chat templates: model-agnostic formatting + tool calling
const { prompt, grammar, format } = await ctx.formatChat(messages, {
  addGenerationPrompt: true,
  tools: [{ type: "function", function: { name: "search", parameters: schema } }],
});
const { content, toolCalls } = await ctx.parseChatOutput(output, format);

// Grammar + tokenizer
const schemaGrammar = await ctx.jsonSchemaToGrammar(schema);
const tokens = await ctx.tokenize("Hello world");
const sep = await ctx.getTurnSeparator();

// Branch primitives: what the SDK's Branch class wraps
const handle = ctx._branchCreate(0, samplerParams);
await ctx._branchPrefill(handle, tokens);
const token = ctx._branchSample(handle);
const text = ctx.tokenToText(token);
const isStop = ctx.isStopToken(token);
ctx._branchAccept(handle, token);
const logits = ctx._branchGetLogits(handle);     // Float32Array(vocabSize)
const entropy = ctx._branchModelEntropy(handle);
const child = ctx._branchFork(handle);

// Store primitives: what the SDK's BranchStore wraps
await ctx._storeCommit([handle1, handle2], [tok1, tok2]);  // N branches, 1 GPU call
await ctx._storePrefill([handle], [tokens]);
await ctx._storeRetainOnly(winner);
const available = ctx._storeAvailable();

// KV cache: snapshot, copy, persist
await ctx.kvSeqCopy(0, 1);                      // share prefix across sequences
await ctx.kvCacheSave();                         // snapshot for rollback
await ctx.kvCacheLoad();                         // restore checkpoint
await ctx.kvCacheWriteFile("cache.bin");         // persist to disk

// Embeddings
const embeddings = await ctx.encode("query text");
const dim = ctx.getEmbeddingDimension();
```
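Since `_branchGetLogits` returns a raw `Float32Array` of logits, it can help to see how an entropy value is derived from one. The sketch below computes the Shannon entropy of the softmax distribution in nats. `entropyFromLogits` is a hypothetical helper written for this illustration. The README does not state how `_branchModelEntropy` is implemented or which units it reports, so treat this as the textbook definition rather than a description of the native code.

```javascript
// Shannon entropy (in nats) of softmax(logits), computed in log space
// so that large logits do not overflow Math.exp.
function entropyFromLogits(logits) {
  let max = -Infinity;
  for (const x of logits) if (x > max) max = x;

  let sum = 0;
  for (const x of logits) sum += Math.exp(x - max);
  const logZ = max + Math.log(sum); // log of the softmax normalizer

  let h = 0;
  for (const x of logits) {
    const logP = x - logZ;
    const p = Math.exp(logP);
    if (p > 0) h -= p * logP; // skip zero-probability entries
  }
  return h;
}

const uniformLogits = new Float32Array([0, 0, 0, 0]);
const peakedLogits = new Float32Array([20, 0, 0, 0]);

console.log(entropyFromLogits(uniformLogits)); // ln(4), about 1.386
console.log(entropyFromLogits(peakedLogits));  // close to 0
```

A uniform distribution over four tokens gives the maximum entropy for that vocabulary size, while a sharply peaked distribution gives a value near zero.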

## What This Package Provides

**Native-only** (not in SDK):

- `createContext(options)` — load a GGUF model, return a `SessionContext`

- `loadBinary(options?)` — explicit GPU variant selection with automatic fallback

- Prebuilt binaries for 13 platform/GPU combinations

**Re-exported from [`@lloyal-labs/sdk`](https://github.com/lloyal-ai/sdk):**

- `Branch`, `BranchStore`, `Session`, `Rerank`

- Per-token metrics: `modelEntropy()`, `modelSurprisal()`, `samplingPerplexity`

- Chat formatting: `formatChat()`, `parseChatOutput()`

- Grammar: `jsonSchemaToGrammar()`, `setGrammar()`
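For readers unfamiliar with the per-token metrics named above, the following sketch shows the standard definitions of surprisal and perplexity over a sequence of token probabilities. The `surprisal` and `perplexity` functions here are hypothetical helpers, not SDK exports, and the README does not say whether the SDK reports nats or bits or exactly which distribution `samplingPerplexity` is computed over, so the natural-log convention is an assumption.

```javascript
// Surprisal of a single token: -ln(p), in nats.
function surprisal(prob) {
  return -Math.log(prob);
}

// Perplexity of a sequence: exp of the mean per-token surprisal.
function perplexity(probs) {
  if (probs.length === 0) throw new Error("perplexity needs at least one token");
  const total = probs.reduce((acc, p) => acc + surprisal(p), 0);
  return Math.exp(total / probs.length);
}

// Probabilities the model assigned to each chosen token (made-up values).
const chosenProbs = [0.5, 0.125];

console.log(chosenProbs.map(surprisal));
console.log(perplexity(chosenProbs));
```

Perplexity can be read as the effective number of equally likely choices per token: here the two probabilities average out to the same uncertainty as picking uniformly among four options.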

**Re-exported from [`@lloyal-labs/lloyal-agents`](https://github.com/lloyal-ai/sdk/tree/main/packages/agents):**

- `runAgents`, `useAgentPool`, `generate`, `diverge`, `createToolkit`

- Structured concurrency DAG via Effection generators

- In-loop orchestration: agents as branches of a single running process

## GPU Variant Selection

```javascript
import { loadBinary, createContext } from "@lloyal-labs/lloyal.node";

// Automatic: uses Metal on macOS, CPU elsewhere
const ctx = await createContext({ modelPath: "./model.gguf" });

// Explicit CUDA
// Falls back to CPU with a warning if the CUDA runtime is not available
const binding = loadBinary({ gpuVariant: "cuda" });
const cudaCtx = await binding.createContext({ modelPath: "./model.gguf" });
```
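The fallback behaviour described above can be pictured as a small resolver that tries the requested variant first and then CPU. This is a sketch under stated assumptions: `resolveVariant` and the `isAvailable` probe are hypothetical names invented for illustration, and the package's actual loader may probe and order variants differently.

```javascript
// Hypothetical resolver, not lloyal.node's loader: returns the first usable
// variant and a warning string when it had to fall back.
function resolveVariant(requested, isAvailable) {
  const order = requested === "cpu" ? ["cpu"] : [requested, "cpu"];
  for (const variant of order) {
    if (isAvailable(variant)) {
      const warning =
        variant === requested
          ? null
          : `"${requested}" is unavailable, falling back to "${variant}"`;
      return { variant, warning };
    }
  }
  throw new Error(`no usable binary for variant "${requested}"`);
}

// Simulate a machine where only the CPU build can be loaded.
const cpuOnly = (variant) => variant === "cpu";
const resolved = resolveVariant("cuda", cpuOnly);

if (resolved.warning) console.warn(resolved.warning);
console.log(resolved.variant);
```

Returning the warning instead of logging it inside the resolver keeps the decision testable and leaves the choice of logger to the caller.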

## Examples

| Example                           | Pattern                                           |
| --------------------------------- | ------------------------------------------------- |
| [`entropy/`](./examples/entropy/) | `modelEntropy()` mid-generation as control signal |
| [`chat/`](./examples/chat/)       | Interactive streaming chat                        |
| [`embed/`](./examples/embed/)     | Text embeddings extraction                        |

```bash
npx tsx examples/best-of-n/best-of-n.ts
npx tsx examples/chat/chat.ts ./model.gguf
```

## CI Testing

Integration tests run real inference across architectures:

| Architecture | Test Model   | Template |
| ------------ | ------------ | -------- |
| Llama        | Llama 3.2 1B | llama3   |
| Phi          | Phi 3.5 Mini | phi3     |
| Qwen         | Qwen 3 1.7B  | chatml   |
| Gemma        | Gemma 3 1B   | gemma    |
| SmolLM       | SmolLM2 1.7B | chatml   |
| Ministral    | Ministral 3B | mistral  |

See [distribution.md](docs/distribution.md) for details.

## Ecosystem

| Package                                                                                    | Description                                                                  |
| ------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
| [`@lloyal-labs/sdk`](https://github.com/lloyal-ai/sdk)                                     | Backend-agnostic inference primitives (Branch, BranchStore, Session, Rerank) |
| [`@lloyal-labs/lloyal-agents`](https://github.com/lloyal-ai/sdk/tree/main/packages/agents) | Multi-agent framework: in-loop orchestration via structured concurrency      |
| [liblloyal](https://github.com/lloyal-ai/liblloyal)                                        | Header-only C++20 inference kernel for llama.cpp                             |
| **lloyal.node**                                                                            | This package: native backend + prebuilt binaries                             |
| [nitro-llama](https://github.com/lloyal-ai/nitro-llama)                                    | React Native backend via Nitro Modules                                       |
| [tsampler](https://github.com/lloyal-ai/tsampler)                                          | Reference sampler implementation                                             |

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup and release process.

## License

Apache 2.0 — See [LICENSE](./LICENSE) for details.