https://github.com/jhsu/ai-rlm

A TypeScript implementation of the RLM (Recursive Language Model) inference strategy using the AI SDK
Host: GitHub
URL: https://github.com/jhsu/ai-rlm
Owner: jhsu
License: mit
Created: 2026-02-14T00:08:57.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-14T13:19:28.000Z (3 months ago)
Last Synced: 2026-03-14T17:21:29.224Z (3 months ago)
Topics: ai, ai-sdk, rlm
Language: TypeScript
Homepage: https://www.npmjs.com/package/ai-rlm
Size: 133 KB
Stars: 4
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# ai-rlm

[![npm](https://img.shields.io/npm/v/ai-rlm?style=for-the-badge)](https://www.npmjs.com/package/ai-rlm)

RLM (Recursive Language Model) inference, provided as an AI SDK Agent or tool.

Based on the paper "Recursive Language Models" by Zhang, Kraska, and Khattab (2025).

## Overview

RLM is an inference strategy in which a long context is treated as part of an external environment instead of being fed directly to the model. The LLM writes JavaScript code to programmatically examine and decompose the context, and recursively calls sub-LLMs over selected snippets.

### Key Features

- **Iterative Code Execution**: The model writes JavaScript code, sees output, then writes more code

- **Sub-LLM Queries**: Access to `llm_query()` and `llm_query_batched()` for semantic analysis

- **Context Management**: Large contexts stay in the REPL and are processed in chunks instead of being placed in the prompt

- **Sandboxed REPL**: JavaScript execution in a sandboxed QuickJS WebAssembly context

- **Pluggable Sandbox Interface**: Swap the execution environment with your own sandbox implementation

- **AI SDK Integration**: Works as an Agent or Tool with the Vercel AI SDK

- **Multiple Usage Patterns**: Use as standalone agent or as a tool in larger workflows

## Installation

```bash
npm install ai-rlm ai zod @ai-sdk/openai
```

`ai` and `zod` are peer dependencies and must be installed in your project.

The `model` and `subModel` settings accept any AI SDK `LanguageModel` — use any provider ([OpenAI](https://sdk.vercel.ai/providers/ai-sdk-providers/openai), [Anthropic](https://sdk.vercel.ai/providers/ai-sdk-providers/anthropic), [Google](https://sdk.vercel.ai/providers/ai-sdk-providers/google-generative-ai), etc.).

## Usage

### As Agent (Recommended)

The **RLMAgent** class provides an agent-based API that integrates with the AI SDK:

```typescript
import { RLMAgent } from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

// Create agent
const agent = new RLMAgent({
  model: openai('gpt-4.1'),              // Root agent model
  subModel: openai('gpt-4.1-mini'),      // Sub-LLM model for queries
  maxIterations: 20,                      // Max REPL iterations
  maxLLMCalls: 50,                        // Max sub-LLM calls
});

// Process a context
const context = `
  The quick brown fox jumps over the lazy dog.
  The magic number is 42.
`;

const query = 'What is the magic number?';

const result = await agent.generate({
  prompt: query,
  options: { context },
});

const rlmResult = result.output;
console.log('Answer:', result.text);
console.log('Iterations:', rlmResult.iterations);
console.log('LLM Calls:', rlmResult.llmCallCount);
console.log('Steps:', rlmResult.steps); // Full trajectory
```

### As Tool

Use **createRLMTool** to create an AI SDK-compatible tool for use with `generateText` or `ToolLoopAgent`:

```typescript
import { createRLMTool } from 'ai-rlm';
import { generateText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';

// Create the tool
const rlmTool = createRLMTool({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
});

// Use in generateText
const result = await generateText({
  model: openai('gpt-4.1'),
  tools: { analyzeLargeContext: rlmTool },
  // Allow steps after the tool call so the model can answer from the tool result
  stopWhen: stepCountIs(5),
  prompt: 'Analyze this large codebase for security vulnerabilities',
});
```

### With ToolLoopAgent

```typescript
import { ToolLoopAgent } from 'ai';
import { createRLMTool } from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

const agent = new ToolLoopAgent({
  model: openai('gpt-4.1'),
  tools: {
    analyzeLargeContext: createRLMTool({
      model: openai('gpt-4.1'),
      subModel: openai('gpt-4.1-mini'),
    }),
    // ... other tools
  },
});

const result = await agent.generate({
  prompt: 'Check this document for compliance issues',
});
```

### Streaming Support

```typescript
// `agent` is the RLMAgent created above; `largeDocument` is the text to analyze
const stream = await agent.stream({
  prompt: 'Analyze this',
  options: { context: largeDocument },
});

// textStream emits the final text only after the underlying generate() call
// completes; it is not streamed token by token
const reader = stream.textStream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(value);
}
```

## How It Works

The RLM agent writes JavaScript code to explore the context in an iterative loop. Over several iterations, the code it writes might look like this:

```javascript
// First, explore the context
console.log('Context length:', context.length);
console.log('First 200 chars:', context.substring(0, 200));

// Search for specific patterns
const lines = context.split('\n');
const targetLine = lines.find(line => line.includes('magic number'));
console.log('Found:', targetLine);

// Store result for later
const answer = targetLine?.match(/magic number is (\d+)/)?.[1];

// Submit answer
FINAL_VAR(answer)
```

1. **Context Loading**: The context is loaded into a sandboxed JavaScript REPL environment

2. **Iterative Reasoning**: The root LLM writes JavaScript code to explore the context

3. **Code Execution**: Code is executed in a QuickJS WebAssembly sandbox with a 30s timeout

4. **Sub-LLM Queries**: For semantic analysis, `llm_query()` delegates to a sub-model

5. **Result Accumulation**: The model keeps iterating until it finds an answer or reaches `maxIterations`

6. **Final Answer**: The model submits an answer using `FINAL(answer)` or `FINAL_VAR(variable_name)`
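The six steps above amount to a bounded loop. The sketch below is a simplified illustration, not ai-rlm's implementation: `writeCode` stands in for the root-model call, `execute` stands in for the sandbox, and `rlmLoop`, `LoopStep`, `ExecOutcome`, and the `final` field are hypothetical names chosen for this example.

```typescript
// Hypothetical shapes for illustration; these are not ai-rlm exports.
interface LoopStep {
  iteration: number;
  code: string;
  output: string;
}

interface ExecOutcome {
  output: string; // captured console output
  final?: string; // set when the code submitted an answer via FINAL()/FINAL_VAR()
}

interface LoopResult {
  answer?: string; // undefined if the iteration budget ran out
  steps: LoopStep[];
}

// Sketch of an RLM-style control loop (assumed and simplified).
async function rlmLoop(
  writeCode: (history: LoopStep[]) => Promise<string>, // stand-in for the root LLM
  execute: (code: string) => Promise<ExecOutcome>,      // stand-in for the sandbox
  maxIterations = 20,
): Promise<LoopResult> {
  const steps: LoopStep[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const code = await writeCode(steps);           // step 2: model writes code
    const { output, final } = await execute(code); // steps 3-4: run it
    steps.push({ iteration: i, code, output });    // step 5: keep the trajectory
    if (final !== undefined) {
      return { answer: final, steps };             // step 6: answer submitted
    }
  }
  return { steps };                                // budget exhausted, no answer
}

// Usage with scripted stubs in place of real model and sandbox calls.
const script = ["console.log(context.length)", "FINAL('42')"];
const demo = rlmLoop(
  async (history) => script[history.length],
  async (code) =>
    code.startsWith("FINAL") ? { output: "", final: "42" } : { output: "(exploration output)" },
);
demo.then((r) => console.log(r.answer, r.steps.length)); // prints "42 2"
```

Passing the full history to `writeCode` mirrors how each new iteration is conditioned on earlier code and output; a real implementation would also truncate that history before sending it to the model.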

### System Prompt

The RLM system prompt instructs the model to:

- EXPLORE FIRST - Look at data before processing

- ITERATE - Write small code snippets, observe outputs

- VERIFY BEFORE SUBMITTING - Check results are correct

- USE llm_query FOR SEMANTICS - Code finds WHERE; LLM understands WHAT

- CHUNK SMARTLY - Feed substantial chunks to sub-LLMs (~500K chars)
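The CHUNK SMARTLY guideline describes code the root model writes inside the REPL. A minimal sketch of that pattern follows, written in TypeScript for consistency (the model would write the equivalent plain JavaScript). `chunkByChars` and `buildChunkPrompts` are hypothetical helpers; the ~500K-character chunk size comes from the guidance above and is only a default parameter here.

```typescript
// Split text into chunks of at most `maxChars`, cutting at the last newline
// inside the window when there is one (hypothetical helper, not an ai-rlm export).
function chunkByChars(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      const newline = text.lastIndexOf("\n", end - 1);
      if (newline >= start) end = newline + 1; // keep the newline with its line
    }
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}

// One prompt per chunk, ready for a batched sub-LLM call.
function buildChunkPrompts(text: string, question: string, maxChars = 500_000): string[] {
  return chunkByChars(text, maxChars).map(
    (chunk, i, all) => `Chunk ${i + 1}/${all.length}. ${question}\n\n${chunk}`,
  );
}

const prompts = buildChunkPrompts("line one\nline two\n", "List any dates mentioned.", 10);
console.log(prompts.length); // prints 2
```

Inside the sandbox, a string `context` could then be fanned out with something like `llm_query_batched(buildChunkPrompts(context, question))`, letting code decide WHERE to look and the sub-LLM decide WHAT each chunk means.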

## REPL Sandbox

The JavaScript REPL runs code in a QuickJS WebAssembly sandboxed context:

### Available in the Sandbox:

- **`context`**: The input context (a string, an array of strings, or an object; see `RLMContext`)

- **`console.log()` / `console.error()`**: Output logging

- **`llm_query(prompt)`**: Query a sub-LLM for semantic analysis

- **`llm_query_batched(prompts)`**: Send several prompts to the sub-LLM in one batch

- **`FINAL(answer)`**: Submit final answer directly

- **`FINAL_VAR(varName)`**: Submit a variable from the REPL

- **Standard JavaScript**: All ES6+ features, Array methods, String methods, Math, JSON, etc.
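`llm_query()`, `llm_query_batched()`, and the `maxLLMCalls` setting interact, but the README does not say how the limit is counted or enforced. The following is an assumed sketch only: it treats each prompt in a batch as one call, reserves the whole batch before sending anything, and runs the batch concurrently with `Promise.all`. `createBudgetedQueries` and `QueryFn` are hypothetical names.

```typescript
type QueryFn = (prompt: string) => Promise<string>;

// Hypothetical wrapper: enforces a call budget across single and batched queries.
function createBudgetedQueries(query: QueryFn, maxCalls: number) {
  let used = 0;

  // Reserve `n` calls up front so a batch is either fully allowed or rejected.
  function reserve(n: number): void {
    if (used + n > maxCalls) {
      throw new Error(`LLM call budget exceeded: ${used + n} > ${maxCalls}`);
    }
    used += n;
  }

  return {
    async llmQuery(prompt: string): Promise<string> {
      reserve(1);
      return query(prompt);
    },
    // Runs all prompts concurrently; results keep the input order.
    async llmQueryBatched(prompts: string[]): Promise<string[]> {
      reserve(prompts.length);
      return Promise.all(prompts.map((p) => query(p)));
    },
    callCount: () => used,
  };
}
```

Rejecting an over-budget batch as a whole keeps the call count predictable; a different design could run as many prompts as the remaining budget allows.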

### Security Features:

- 30-second timeout on code execution

- No access to Node.js built-in modules or file system

- No network access

- Sandboxed console output capture

### Custom Sandbox Implementations

`RLMAgent` supports user-defined sandboxes through `sandboxFactory`.

```typescript
import {
  RLMAgent,
  createQuickJSSandbox,
  type RLMSandbox,
  type RLMSandboxFactoryOptions,
} from 'ai-rlm';
import { openai } from '@ai-sdk/openai';

const sandboxFactory = (options: RLMSandboxFactoryOptions): RLMSandbox => {
  // Wrap the default QuickJS sandbox, or return your own implementation.
  return createQuickJSSandbox(options);
};

const agent = new RLMAgent({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  sandboxFactory,
});
```
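The comment in the `sandboxFactory` example mentions wrapping the default sandbox. One way to do that is a `Proxy` that intercepts `executeJavaScript` and forwards every other member unchanged. This is a sketch under assumptions: `ExecResult` and `CodeRunner` are local stand-ins for the `executeJavaScript` part of the `RLMSandbox` interface documented in this README, and `withCodeAudit` is a hypothetical helper, not part of ai-rlm.

```typescript
// Local stand-ins for the executeJavaScript part of the RLMSandbox interface.
interface ExecResult {
  stdout: string;
  stderr: string;
  error?: string;
  result?: unknown;
}

interface CodeRunner {
  executeJavaScript(code: string): Promise<ExecResult>;
}

// Wraps a sandbox so every code string is reported before it runs.
// Other members are forwarded and bound to the original object, so
// class instances with private state keep working.
function withCodeAudit<T extends CodeRunner>(inner: T, onCode: (code: string) => void): T {
  return new Proxy(inner, {
    get(target, prop) {
      if (prop === "executeJavaScript") {
        return (code: string) => {
          onCode(code);
          return target.executeJavaScript(code);
        };
      }
      const value = Reflect.get(target, prop, target);
      return typeof value === "function" ? value.bind(target) : value;
    },
  });
}
```

With ai-rlm, this would presumably slot into the factory as `(options) => withCodeAudit(createQuickJSSandbox(options), (code) => console.debug(code))`. A Proxy is used instead of object spread because spreading a class instance drops its prototype methods.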

### Logging

Library diagnostics are silent by default. If you want internal agent logs, pass an explicit logger and log level:

```typescript
const agent = new RLMAgent({
  model: openai('gpt-4.1'),
  subModel: openai('gpt-4.1-mini'),
  logger: console,
  logLevel: 'debug',
});
```

Use this for local debugging. In application code, prefer wiring `logger` to your app's logging system rather than relying on `console`.
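To route diagnostics into an application's own logging, an adapter object can stand in for `console`. The README does not spell out the `RLMLogger` interface; this sketch assumes it accepts console-style `debug`, `info`, `warn`, and `error` methods, since passing `console` works. `ConsoleLike`, `Level`, `formatArg`, and `createPrefixedLogger` are hypothetical names.

```typescript
// Assumed logger shape: the console methods a library typically calls.
interface ConsoleLike {
  debug(...args: unknown[]): void;
  info(...args: unknown[]): void;
  warn(...args: unknown[]): void;
  error(...args: unknown[]): void;
}

type Level = "debug" | "info" | "warn" | "error";

// Turn one log argument into text without throwing.
function formatArg(arg: unknown): string {
  if (typeof arg === "string") return arg;
  if (arg instanceof Error) return arg.message;
  try {
    return JSON.stringify(arg) ?? String(arg);
  } catch {
    return String(arg); // e.g. circular structures
  }
}

// Forwards each call to `sink` with its level and a fixed prefix, so RLM
// diagnostics can be told apart from other application logs.
function createPrefixedLogger(
  prefix: string,
  sink: (level: Level, message: string) => void,
): ConsoleLike {
  const emit = (level: Level) => (...args: unknown[]) =>
    sink(level, `${prefix} ${args.map(formatArg).join(" ")}`);
  return { debug: emit("debug"), info: emit("info"), warn: emit("warn"), error: emit("error") };
}
```

The result would then be passed as `logger: createPrefixedLogger('[rlm]', (level, msg) => appLogger[level](msg))`, where `appLogger` is a placeholder for your application's logger.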

A custom sandbox returned by `sandboxFactory` must implement:

```typescript
interface RLMSandbox {
  loadContext(context: RLMContext): Promise;
  executeJavaScript(code: string): Promise<{
    stdout: string;
    stderr: string;
    error?: string;
    result?: unknown;
  }>;
  getVariable(name: string): unknown;
  getLLMCallCount(): number;
  getUsageSummary(): RLMUsageSummary;
  cleanup(): void;
}
```

Custom sandbox factories are also propagated to recursive `sub_rlm()` calls.

## API Reference

### RLMAgent

The primary class for using RLM as an agent.

#### `constructor(settings: RLMAgentSettings)`

```typescript
import type { LanguageModel } from 'ai';

interface RLMAgentSettings {
  model: LanguageModel;     // Required: Root agent model
  subModel?: LanguageModel; // Optional: Sub-LLM model (defaults to model)
  maxIterations?: number;   // Max REPL iterations (default: 20)
  maxLLMCalls?: number;     // Max sub-LLM calls (default: 50)
  maxOutputChars?: number;  // Max REPL output chars (default: 100000)
  maxHistoryPreview?: number; // Max output preview chars in model history (default: 500)
  prepareIteration?: (ctx) => PrepareIterationResult | void | Promise<PrepareIterationResult | void>;
  prepareSubAgent?: (ctx) => PrepareSubAgentResult | void | Promise<PrepareSubAgentResult | void>;
  logger?: RLMLogger;       // Optional injected logger
  logLevel?: RLMLogLevel;   // Log level for internal diagnostics (default: "silent")
  sandboxFactory?: RLMSandboxFactory; // Optional custom sandbox factory
}
```
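`maxOutputChars` and `maxHistoryPreview` are two separate caps: one on REPL output per execution, one on how much of that output is echoed back into the root model's history. The README gives only the defaults, so this is one plausible way to apply such caps, not the library's code; `truncate` and `capOutput` are hypothetical helpers.

```typescript
// Cut text to `maxChars` and note how many characters were dropped.
function truncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return `${text.slice(0, maxChars)}\n[truncated ${text.length - maxChars} chars]`;
}

// Assumed two-level cap: `stored` is what a step keeps, `preview` is the
// shorter excerpt echoed into the root model's message history.
function capOutput(
  rawOutput: string,
  maxOutputChars = 100_000, // documented default
  maxHistoryPreview = 500,  // documented default
): { stored: string; preview: string } {
  return {
    stored: truncate(rawOutput, maxOutputChars),
    preview: truncate(rawOutput, Math.min(maxHistoryPreview, maxOutputChars)),
  };
}
```

Keeping the preview small is what lets the root model iterate many times without its own context filling up with REPL output.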

#### `async generate(options): Promise<RLMGenerateResult>`

Generate an answer by iteratively analyzing the context.

**Parameters:**

```typescript
interface RLMAgentCallParameters {
  context: RLMContext;                    // The large context to analyze
  query: string;                          // The question or task
  abortSignal?: AbortSignal;              // Optional abort signal
  timeout?: number;                       // Optional timeout in ms
  onStepFinish?: (step: REPLStep) => void; // Callback for each step
}
```

**Returns:**

```typescript
interface RLMGenerateResult {
  text: string;             // The generated answer
  steps: REPLStep[];        // Array of REPL steps taken
  llmCallCount: number;     // Total LLM calls made
  iterations: number;       // Total iterations performed
  usage: RLMUsageSummary;   // Aggregated token usage across root + sub-calls
}

interface REPLStep {
  iteration: number;
  reasoning: string;        // The model's reasoning before code
  code: string;             // JavaScript code executed
  output: string;           // Console output and results
}
```

#### `async stream(options): Promise<RLMStreamResult>`

Run `generate()` and emit AI SDK-style stream parts for iteration progress and final text output.

**Returns:**

```typescript
interface RLMStreamResult extends RLMGenerateResult {
  textStream: ReadableStream<string>;  // Emits text-delta content
  fullStream: ReadableStream>; // Emits start/start-step/finish-step/text/finish events
}
```

### createRLMTool

Factory function to create RLM as an AI SDK-compatible tool.

#### `createRLMTool(config?: RLMToolConfig)`

```typescript
import type { LanguageModel } from 'ai';

function createRLMTool(config?: {
  model?: LanguageModel;    // Root agent model
  subModel?: LanguageModel; // Sub-LLM model
  maxIterations?: number;   // Max iterations (default: 20)
  maxLLMCalls?: number;     // Max LLM calls (default: 50)
  maxOutputChars?: number;  // Max output chars (default: 100000)
  logger?: RLMLogger;       // Optional injected logger
  logLevel?: RLMLogLevel;   // Log level for internal diagnostics
}): Tool
```

**Tool Input Schema:**

```typescript
{
  context: string | string[] | Record<string, unknown>;
  query: string;
  maxIterations?: number;   // Optional override
  maxLLMCalls?: number;     // Optional override
}
```
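The tool accepts optional per-call `maxIterations` and `maxLLMCalls` overrides on top of the values given to `createRLMTool` (defaults 20 and 50). A sketch of how such limits might be resolved, assuming the per-call value wins when present; `resolveLimits`, `Limits`, and `DEFAULT_LIMITS` are hypothetical. It uses `??` rather than object spread, because spreading `{ maxIterations: undefined }` would overwrite a configured value with `undefined`.

```typescript
interface Limits {
  maxIterations: number;
  maxLLMCalls: number;
}

type LimitOverrides = Partial<Limits>;

// Defaults as documented for createRLMTool.
const DEFAULT_LIMITS: Limits = { maxIterations: 20, maxLLMCalls: 50 };

// Per-call input beats tool config, which beats the defaults.
function resolveLimits(config: LimitOverrides = {}, input: LimitOverrides = {}): Limits {
  return {
    maxIterations: input.maxIterations ?? config.maxIterations ?? DEFAULT_LIMITS.maxIterations,
    maxLLMCalls: input.maxLLMCalls ?? config.maxLLMCalls ?? DEFAULT_LIMITS.maxLLMCalls,
  };
}
```

A stricter design could also cap per-call overrides at the configured values, so a calling model cannot raise its own budget.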

**Tool Output:**

```typescript
{
  answer: string;           // The generated answer
  iterations: number;       // Number of iterations
  stepsTaken: number;       // Number of steps executed
}
```

### RLMContext

Context can be any of these formats:

```typescript
type RLMContext = string | string[] | Record<string, unknown>;
```

- `string`: Raw text document

- `string[]`: Array of lines or documents

- `Record<string, unknown>`: JSON/structured data
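Because `context` can take three shapes, REPL code usually starts by checking which one it received (the EXPLORE FIRST guideline). A sketch of such a first look, written in TypeScript for consistency; inside the sandbox the model would write the equivalent JavaScript. `describeContext` is a hypothetical helper, and `RLMContext` is redeclared locally with `Record<string, unknown>` as an assumption.

```typescript
// Local copy of the union; the Record<string, unknown> member is assumed.
type RLMContext = string | string[] | Record<string, unknown>;

// Summarize a context without printing all of it.
function describeContext(ctx: RLMContext, previewChars = 80): string {
  if (typeof ctx === "string") {
    const lines = ctx.split("\n").length;
    return `string: ${ctx.length} chars, ${lines} lines; starts "${ctx.slice(0, previewChars)}"`;
  }
  if (Array.isArray(ctx)) {
    const total = ctx.reduce((sum, item) => sum + item.length, 0);
    return `string[]: ${ctx.length} items, ${total} chars total`;
  }
  const keys = Object.keys(ctx);
  return `object: ${keys.length} keys (${keys.slice(0, 10).join(", ")})`;
}

console.log(describeContext(["a", "bc"])); // prints "string[]: 2 items, 3 chars total"
```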

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      RLMAgent Class                         │
├─────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────┐  │
│  │              REPL Environment (QuickJS)               │  │
│  │  - Sandboxed JavaScript execution                     │  │
│  │  - llm_query() for sub-LLM semantic analysis          │  │
│  │  - 30s timeout protection                             │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              generate() Method                        │  │
│  │  1. Generate reasoning + JS code                      │  │
│  │  2. Execute in sandboxed context                      │  │
│  │  3. Process llm_query markers → real LLM calls        │  │
│  │  4. Check for FINAL() answer                          │  │
│  │  5. Repeat or return answer                           │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              stream() Method                          │  │
│  │  - Delegates to generate()                            │  │
│  │  - Emits start-step / finish-step progress events     │  │
│  │  - Emits text-start / text-delta / text-end / finish  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ createRLMTool()
                              ▼
                    ┌───────────────────────┐
                    │    AI SDK Tool        │
                    │ - Tool interface      │
                    │ - Input validation    │
                    │ - Auto-execution      │
                    └───────────────────────┘
```

## Examples

Run the examples:

```bash
# Basic agent examples
bun run examples/basic-usage.ts

# Tool integration examples
bun run examples/tool-usage.ts

# Individual examples
bun run -e "import { example1SimpleTextSearch } from './examples/basic-usage.ts'; example1SimpleTextSearch()"
```

## CLI Codebase Search

This repo includes a local CLI script for searching a codebase with `RLMAgent`.

The CLI uses a `ToolLoopAgent` orchestrator with these tools:

- `list_files`

- `search_files`

- `read_file`

- `analyze_with_rlm` (deep analysis on selected files)

The orchestrator finds relevant files first and passes only the selected ones to RLM, so the entire repository is never preloaded into one context window.
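The `search_files` then `analyze_with_rlm` split means only matching files reach RLM. The CLI's actual tool code is not shown in this README; the following is an illustrative sketch over an in-memory file map, with `searchFiles`, `filesToAnalyze`, and `SearchHit` as hypothetical names.

```typescript
interface SearchHit {
  path: string;
  line: number; // 1-based
  text: string;
}

// Find lines matching `pattern` across files, capping the number of hits so
// the orchestrator's own context stays small.
function searchFiles(files: Map<string, string>, pattern: RegExp, maxHits = 50): SearchHit[] {
  const hits: SearchHit[] = [];
  // Drop the global/sticky flags so test() carries no lastIndex state between lines.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  for (const [path, content] of files) {
    const lines = content.split("\n");
    for (let i = 0; i < lines.length; i++) {
      if (re.test(lines[i])) {
        hits.push({ path, line: i + 1, text: lines[i].trim() });
        if (hits.length >= maxHits) return hits;
      }
    }
  }
  return hits;
}

// The distinct paths are what a deep-analysis step would then receive.
function filesToAnalyze(hits: SearchHit[]): string[] {
  return [...new Set(hits.map((h) => h.path))];
}
```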

```bash
npm run code-search -- ./path/to/codebase "Where is authentication handled?"
```

You can also run the bin directly:

```bash
node ./bin/rlm-codebase-search.js ./path/to/codebase "How are API routes defined?"
```

Required environment variable:

```bash
export OPENAI_API_KEY="your_key_here"
```

### Example Files

- **`examples/basic-usage.ts`**: Agent API examples (generate, stream, callbacks)

- **`examples/tool-usage.ts`**: Tool API examples (with generateText, ToolLoopAgent)

- **`examples/document-comparison.ts`**: Document diffing example

- **`examples/data-transformation.ts`**: Data extraction and transformation

## License

MIT

## References

- Paper: "Recursive Language Models" (Zhang, Kraska, Khattab, 2025)

- AI SDK Documentation: https://sdk.vercel.ai/docs