{"id":45260557,"url":"https://github.com/yogthos/Matryoshka","last_synced_at":"2026-03-12T18:01:00.021Z","repository":{"id":332161917,"uuid":"1132957210","full_name":"yogthos/Matryoshka","owner":"yogthos","description":"MCP server for token-efficient large document analysis via the use of REPL state","archived":false,"fork":false,"pushed_at":"2026-03-03T20:31:11.000Z","size":1523,"stargazers_count":109,"open_issues_count":2,"forks_count":12,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-03T20:41:06.230Z","etag":null,"topics":["ai-assistant","document-analysis","llm","llm-tools","mcp","mcp-server","model-context-protocol"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yogthos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-12T17:23:41.000Z","updated_at":"2026-03-03T10:22:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/yogthos/Matryoshka","commit_stats":null,"previous_names":["yogthos/matryoshka"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/yogthos/Matryoshka","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogthos%2FMatryoshka","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogthos%2FMatryoshka/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogthos%2FMatryoshka/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogthos%2FMatryoshka/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yogthos","download_url":"https://codeload.github.com/yogthos/Matryoshka/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yogthos%2FMatryoshka/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30437534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T14:34:45.044Z","status":"ssl_error","status_checked_at":"2026-03-12T14:09:33.793Z","response_time":114,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-assistant","document-analysis","llm","llm-tools","mcp","mcp-server","model-context-protocol"],"created_at":"2026-02-21T00:00:41.404Z","updated_at":"2026-03-12T18:01:00.004Z","avatar_url":"https://github.com/yogthos.png","language":"TypeScript","readme":"# Matryoshka\n\n[![Tests](https://github.com/yogthos/Matryoshka/actions/workflows/test.yml/badge.svg)](https://github.com/yogthos/Matryoshka/actions/workflows/test.yml)\n\nProcess documents 100x larger than your LLM's context window—without vector databases or chunking heuristics.\n\n## The Problem\n\nLLMs have fixed context windows. Traditional solutions (RAG, chunking) lose information or miss connections across chunks. 
RLM takes a different approach: the model reasons about your query and outputs symbolic commands that a logic engine executes against the document.\n\nBased on the [Recursive Language Models paper](https://arxiv.org/abs/2512.24601).\n\n## How It Works\n\nUnlike traditional approaches where an LLM writes arbitrary code, RLM uses **[Nucleus](https://github.com/michaelwhitford/nucleus)**—a constrained symbolic language based on S-expressions. The LLM outputs Nucleus commands, which are parsed, type-checked, and executed by **Lattice**, our logic engine.\n\n```\n┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐\n│   User Query    │────▶│   LLM Reasons   │────▶│ Nucleus Command │\n│ \"total sales?\"  │     │  about intent   │     │  (sum RESULTS)  │\n└─────────────────┘     └─────────────────┘     └────────┬────────┘\n                                                         │\n┌─────────────────┐     ┌─────────────────┐     ┌────────▼────────┐\n│  Final Answer   │◀────│ Lattice Engine  │◀────│     Parser      │\n│   13,000,000    │     │    Executes     │     │    Validates    │\n└─────────────────┘     └─────────────────┘     └─────────────────┘\n```\n\n**Why this works better than code generation:**\n\n1. **Reduced entropy** - Nucleus has a rigid grammar with fewer valid outputs than JavaScript\n2. **Fail-fast validation** - Parser rejects malformed commands before execution\n3. **Safe execution** - Lattice only executes known operations, no arbitrary code\n4. 
**Small model friendly** - 7B models handle symbolic grammars better than freeform code\n\n## Architecture\n\n### The Nucleus DSL\n\nThe LLM outputs commands in the Nucleus DSL—an S-expression language designed for document analysis:\n\n```scheme\n; Search for patterns\n(grep \"SALES_DATA\")\n\n; Filter results\n(filter RESULTS (lambda x (match x \"NORTH\" 0)))\n\n; Aggregate\n(sum RESULTS)    ; Auto-extracts numbers like \"$2,340,000\" from lines\n(count RESULTS)  ; Count matching items\n\n; Final answer\n\u003c\u003c\u003cFINAL\u003e\u003e\u003e13000000\u003c\u003c\u003cEND\u003e\u003e\u003e\n```\n\n### The Lattice Engine\n\nThe Lattice engine (`src/logic/`) processes Nucleus commands:\n\n1. **Parser** (`lc-parser.ts`) - Parses S-expressions into an AST\n2. **Type Inference** (`type-inference.ts`) - Validates types before execution\n3. **Constraint Resolver** (`constraint-resolver.ts`) - Handles symbolic constraints like `[Σ⚡μ]`\n4. **Solver** (`lc-solver.ts`) - Executes commands against the document\n\nLattice uses **miniKanren** (a relational programming engine) for pattern classification and filtering operations.\n\n### In-Memory Handle Storage\n\nFor large result sets, RLM uses a handle-based architecture with in-memory SQLite (`src/persistence/`) that achieves **97%+ token savings**:\n\n```\nTraditional:  LLM sees full array    [15,000 tokens for 1000 results]\nHandle-based: LLM sees stub          [50 tokens: \"$res1: Array(1000) [preview...]\"]\n```\n\n**How it works:**\n1. Results are stored in SQLite with FTS5 full-text indexing\n2. LLM receives only handle references (`$res1`, `$res2`, etc.)\n3. Operations execute server-side, returning new handles\n4. 
Full data is only materialized when needed\n\n**Components:**\n- `SessionDB` - In-memory SQLite with FTS5 for fast full-text search\n- `HandleRegistry` - Stores arrays, returns compact handle references\n- `HandleOps` - Server-side filter/map/count/sum on handles\n- `FTS5Search` - Phrase queries, boolean operators, relevance ranking\n- `CheckpointManager` - Save/restore session state\n\n### The Role of the LLM\n\nThe LLM does **reasoning**, not code generation:\n\n1. **Understands intent** - Interprets \"total of north sales\" as needing grep + filter + sum\n2. **Chooses operations** - Decides which Nucleus commands achieve the goal\n3. **Verifies results** - Checks if the current results answer the query\n4. **Iterates** - Refines search if results are too broad or narrow\n\nThe LLM never writes JavaScript. It outputs Nucleus commands that Lattice executes safely.\n\n### Components Summary\n\n| Component | Purpose |\n|-----------|---------|\n| **Nucleus Adapter** | Prompts LLM to output Nucleus commands |\n| **Lattice Parser** | Parses S-expressions to AST |\n| **Lattice Solver** | Executes commands against document |\n| **In-Memory Handles** | Handle-based storage with FTS5 (97% token savings) |\n| **miniKanren** | Relational engine for classification |\n| **RAG Hints** | Few-shot examples from past successes |\n\n## Installation\n\nInstall from npm:\n\n```bash\nnpm install -g matryoshka-rlm\n```\n\nOr run without installing:\n\n```bash\nnpx matryoshka-rlm \"What is the total of all sales values?\" ./report.txt\n```\n\n### Included Tools\n\nThe package provides several CLI tools:\n\n| Command | Description |\n|---------|-------------|\n| `rlm` | Main CLI for document analysis with LLM reasoning |\n| `lattice-mcp` | MCP server exposing direct Nucleus commands (no LLM required) |\n| `lattice-repl` | Interactive REPL for Nucleus commands |\n| `lattice-http` | HTTP server for Nucleus queries |\n| `lattice-pipe` | Pipe adapter for programmatic access |\n| 
`lattice-setup` | Setup script for Claude Code integration |\n\n### From Source\n\n```bash\ngit clone https://github.com/yogthos/Matryoshka.git\ncd Matryoshka\nnpm install\nnpm run build\n```\n\n## Configuration\n\nCopy `config.example.json` to `config.json` and configure your LLM provider:\n\n```json\n{\n  \"llm\": {\n    \"provider\": \"ollama\"\n  },\n  \"providers\": {\n    \"ollama\": {\n      \"baseUrl\": \"http://localhost:11434\",\n      \"model\": \"qwen2.5-coder:7b\",\n      \"options\": { \"temperature\": 0.2, \"num_ctx\": 8192 }\n    },\n    \"deepseek\": {\n      \"baseUrl\": \"https://api.deepseek.com\",\n      \"apiKey\": \"${DEEPSEEK_API_KEY}\",\n      \"model\": \"deepseek-chat\",\n      \"options\": { \"temperature\": 0.2 }\n    }\n  }\n}\n```\n\n## Usage\n\n### CLI\n\n```bash\n# Basic usage\nrlm \"What is the total of all sales values?\" ./report.txt\n\n# With options\nrlm \"Count all ERROR entries\" ./logs.txt --max-turns 15 --verbose\n\n# See all options\nrlm --help\n```\n\n### MCP Integration\n\nRLM includes `lattice-mcp`, an MCP (Model Context Protocol) server for direct access to the Nucleus engine. This allows coding agents to analyze documents with **80%+ token savings** compared to reading files directly.\n\nThe key advantage is **handle-based results**: query results are stored server-side in SQLite, and the agent receives compact stubs like `$res1: Array(1000) [preview...]` instead of full data. 
Operations chain server-side without roundtripping data.\n\n#### Available Tools\n\n| Tool | Description |\n|------|-------------|\n| `lattice_load` | Load a document for analysis |\n| `lattice_query` | Execute Nucleus commands on the loaded document |\n| `lattice_expand` | Expand a handle to see full data (with optional limit/offset) |\n| `lattice_close` | Close the session and free memory |\n| `lattice_status` | Get session status and document info |\n| `lattice_bindings` | Show current variable bindings |\n| `lattice_reset` | Reset bindings but keep document loaded |\n| `lattice_help` | Get Nucleus command reference |\n\n#### Example MCP config\n\n```json\n{\n  \"mcp\": {\n    \"lattice\": {\n      \"type\": \"stdio\",\n      \"command\": \"lattice-mcp\"\n    }\n  }\n}\n```\n\n#### Efficient Usage Pattern\n\n```\n1. lattice_load(\"/path/to/large-file.txt\")   # Load document (use for \u003e500 lines)\n2. lattice_query('(grep \"ERROR\")')           # Search - returns handle stub $res1\n3. lattice_query('(filter RESULTS ...)')     # Narrow down - returns handle stub $res2\n4. lattice_query('(count RESULTS)')          # Get count without seeing data\n5. lattice_expand(\"$res2\", limit=10)         # Expand only what you need to see\n6. lattice_close()                           # Free memory when done\n```\n\n**Token efficiency tips:**\n- Query results return handle stubs, not full data\n- Use `lattice_expand` with `limit` to see only what you need\n- Chain `grep → filter → count/sum` to refine progressively\n- Use `RESULTS` in queries (always points to last result)\n- Use `$res1`, `$res2` etc. 
with `lattice_expand` to inspect specific results\n\n### Programmatic\n\n```typescript\nimport { runRLM } from \"matryoshka-rlm/rlm\";\nimport { createLLMClient } from \"matryoshka-rlm\";\n\nconst llmClient = createLLMClient(\"ollama\", {\n  baseUrl: \"http://localhost:11434\",\n  model: \"qwen2.5-coder:7b\",\n  options: { temperature: 0.2 }\n});\n\nconst result = await runRLM(\"What is the total of all sales values?\", \"./report.txt\", {\n  llmClient,\n  maxTurns: 10,\n  turnTimeoutMs: 30000,\n});\n```\n\n## Example Session\n\n```\n$ rlm \"What is the total of all north sales data values?\" ./report.txt --verbose\n\n──────────────────────────────────────────────────\n[Turn 1/10] Querying LLM...\n[Turn 1] Term: (grep \"SALES.*NORTH\")\n[Turn 1] Result: 1 matches\n\n──────────────────────────────────────────────────\n[Turn 2/10] Querying LLM...\n[Turn 2] Term: (sum RESULTS)\n[Turn 2] Console output:\n  [Lattice] Summing 1 values\n  [Lattice] Sum = 2340000\n[Turn 2] Result: 2340000\n\n──────────────────────────────────────────────────\n[Turn 3/10] Querying LLM...\n[Turn 3] Final answer received\n\n2340000\n```\n\nThe model:\n1. Searched for relevant data with grep\n2. Summed the matching results\n3. Output the final answer\n\n## Nucleus DSL Reference\n\n### Search Commands\n\n```scheme\n(grep \"pattern\")              ; Regex search, returns matches with line numbers\n(fuzzy_search \"query\" 10)     ; Fuzzy search, returns top N matches with scores\n(text_stats)                  ; Document metadata (length, line count, samples)\n```\n\n### Symbol Operations (Code Files)\n\nFor code files, Lattice uses tree-sitter to extract structural symbols. 
This enables code-aware queries that understand functions, classes, methods, and other language constructs.\n\n**Built-in languages (packages included):**\n- TypeScript (.ts, .tsx), JavaScript (.js, .jsx), Python (.py), Go (.go)\n- HTML (.html), CSS (.css), JSON (.json)\n\n**Additional languages (install package to enable):**\n- Rust, C, C++, Java, Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Bash, SQL, and more\n\n```scheme\n(list_symbols)                ; List all symbols (functions, classes, methods, etc.)\n(list_symbols \"function\")     ; Filter by kind: \"function\", \"class\", \"method\", \"interface\", \"type\", \"struct\"\n(get_symbol_body \"myFunc\")    ; Get source code body for a symbol by name\n(get_symbol_body RESULTS)     ; Get body for symbol from previous query result\n(find_references \"myFunc\")    ; Find all references to an identifier\n```\n\n**Example workflow for code analysis:**\n\n```\n1. lattice_load(\"./src/app.ts\")           # Load a code file\n2. lattice_query('(list_symbols)')        # Get all symbols → $res1\n3. lattice_query('(list_symbols \"function\")')  # Just functions → $res2\n4. lattice_expand(\"$res2\", limit=5)       # See function names and line numbers\n5. lattice_query('(get_symbol_body \"handleRequest\")')  # Get function body\n6. lattice_query('(find_references \"handleRequest\")')  # Find all usages\n```\n\nSymbols include metadata like name, kind, start/end lines, and parent relationships (e.g., methods within classes).\n\n#### Adding Language Support\n\nMatryoshka includes built-in symbol mappings for 20+ languages. 
To enable a language, install its tree-sitter grammar package:\n\n```bash\n# Enable Rust support\nnpm install tree-sitter-rust\n\n# Enable Java support\nnpm install tree-sitter-java\n\n# Enable Ruby support\nnpm install tree-sitter-ruby\n```\n\n**Languages with built-in mappings:**\n- TypeScript, JavaScript, Python, Go, Rust, C, C++, Java\n- Ruby, PHP, C#, Kotlin, Swift, Scala, Lua, Haskell, Elixir\n- HTML, CSS, JSON, YAML, TOML, Markdown, SQL, Bash\n\nOnce a package is installed, the language is automatically available for symbol extraction.\n\n#### Custom Language Configuration\n\nFor languages without built-in mappings, or to override existing mappings, create a config file at `~/.matryoshka/config.json`:\n\n```json\n{\n  \"grammars\": {\n    \"mylang\": {\n      \"package\": \"tree-sitter-mylang\",\n      \"extensions\": [\".ml\", \".mli\"],\n      \"moduleExport\": \"mylang\",\n      \"symbols\": {\n        \"function_definition\": \"function\",\n        \"method_definition\": \"method\",\n        \"class_definition\": \"class\",\n        \"module_definition\": \"module\"\n      }\n    }\n  }\n}\n```\n\n**Configuration fields:**\n\n| Field | Required | Description |\n|-------|----------|-------------|\n| `package` | Yes | npm package name for the tree-sitter grammar |\n| `extensions` | Yes | File extensions to associate with this language |\n| `symbols` | Yes | Maps tree-sitter node types to symbol kinds |\n| `moduleExport` | No | Submodule export name (e.g., `\"typescript\"` for tree-sitter-typescript) |\n\n**Symbol kinds:** `function`, `method`, `class`, `interface`, `type`, `struct`, `enum`, `trait`, `module`, `variable`, `constant`, `property`\n\n#### Finding Tree-sitter Node Types\n\nTo configure symbol mappings for a new language, you need to know the tree-sitter node types. 
You can explore them using the tree-sitter CLI:\n\n```bash\n# Install tree-sitter CLI\nnpm install -g tree-sitter-cli\n\n# Parse a sample file and see the AST\ntree-sitter parse sample.mylang\n```\n\nOr use the [tree-sitter playground](https://tree-sitter.github.io/tree-sitter/playground) to explore node types interactively.\n\n**Example: Adding OCaml support**\n\n1. Find the grammar package: `tree-sitter-ocaml`\n2. Install it: `npm install tree-sitter-ocaml`\n3. Explore the AST to find node types for functions, modules, etc.\n4. Add to `~/.matryoshka/config.json`:\n\n```json\n{\n  \"grammars\": {\n    \"ocaml\": {\n      \"package\": \"tree-sitter-ocaml\",\n      \"extensions\": [\".ml\", \".mli\"],\n      \"moduleExport\": \"ocaml\",\n      \"symbols\": {\n        \"value_definition\": \"function\",\n        \"let_binding\": \"variable\",\n        \"type_definition\": \"type\",\n        \"module_definition\": \"module\",\n        \"module_type_definition\": \"interface\"\n      }\n    }\n  }\n}\n```\n\n**Note:** Some tree-sitter packages use native Node.js bindings that may not compile on all systems. 
If installation fails, check whether the package supports your Node.js version or look for WASM alternatives.\n\n### Collection Operations\n\n```scheme\n(filter RESULTS (lambda x (match x \"pattern\" 0)))  ; Filter by regex\n(map RESULTS (lambda x (match x \"(\\\\d+)\" 1)))      ; Extract from each\n(sum RESULTS)                                       ; Sum numbers in results\n(count RESULTS)                                     ; Count items\n```\n\n### String Operations\n\n```scheme\n(match str \"pattern\" 0)       ; Regex match, return group N\n(replace str \"from\" \"to\")     ; String replacement\n(split str \",\" 0)             ; Split and get index\n(parseInt str)                ; Parse integer\n(parseFloat str)              ; Parse float\n```\n\n### Type Coercion\n\nWhen the model sees data that needs parsing, it can use declarative type coercion:\n\n```scheme\n; Date parsing (returns ISO format YYYY-MM-DD)\n(parseDate \"Jan 15, 2024\")           ; -\u003e \"2024-01-15\"\n(parseDate \"01/15/2024\" \"US\")        ; -\u003e \"2024-01-15\" (MM/DD/YYYY)\n(parseDate \"15/01/2024\" \"EU\")        ; -\u003e \"2024-01-15\" (DD/MM/YYYY)\n\n; Currency parsing (handles $, €, commas, etc.)\n(parseCurrency \"$1,234.56\")          ; -\u003e 1234.56\n(parseCurrency \"€1.234,56\")          ; -\u003e 1234.56 (EU format)\n\n; Number parsing\n(parseNumber \"1,234,567\")            ; -\u003e 1234567\n(parseNumber \"50%\")                  ; -\u003e 0.5\n\n; General coercion\n(coerce value \"date\")                ; Coerce to date\n(coerce value \"currency\")            ; Coerce to currency\n(coerce value \"number\")              ; Coerce to number\n\n; Extract and coerce in one step\n(extract str \"\\\\$[\\\\d,]+\" 0 \"currency\")  ; Extract and parse as currency\n```\n\nUse in map for batch transformations:\n\n```scheme\n; Parse all dates in results\n(map RESULTS (lambda x (parseDate (match x \"[A-Za-z]+ \\\\d+, \\\\d+\" 0))))\n\n; Extract currency values from each result\n(map RESULTS (lambda x 
(parseCurrency (match x \"\\\\$[\\\\d,]+\" 0))))\n```\n\n### Program Synthesis\n\nFor complex transformations, the model can synthesize functions from examples:\n\n```scheme\n; Synthesize from input/output pairs\n(synthesize\n  (\"$100\" 100)\n  (\"$1,234\" 1234)\n  (\"$50,000\" 50000))\n; -\u003e Returns a function that extracts numbers from currency strings\n```\n\nThis uses Barliman-style relational synthesis with miniKanren to automatically build extraction functions.\n\n### Cross-Turn State\n\nResults from previous turns are available:\n- `RESULTS` - Latest array result (updated by grep, filter)\n- `_0`, `_1`, `_2`, ... - Results from specific turns\n\n### Final Answer\n\n```scheme\n\u003c\u003c\u003cFINAL\u003e\u003e\u003eyour answer here\u003c\u003c\u003cEND\u003e\u003e\u003e\n```\n\n## Troubleshooting\n\n### Model Answers Without Exploring\n\n**Symptom**: The model provides an answer immediately with hallucinated data.\n\n**Solutions**:\n1. Use a more capable model (7B+ recommended)\n2. Be specific in your query: \"Find lines containing SALES_DATA and sum the dollar amounts\"\n\n### Max Turns Reached\n\n**Symptom**: \"Max turns (N) reached without final answer\"\n\n**Solutions**:\n1. Increase `--max-turns` for complex documents\n2. Check `--verbose` output for repeated patterns (the model may be stuck in a loop)\n3. Simplify the query\n\n### Parse Errors\n\n**Symptom**: \"Parse error: no valid command\"\n\n**Cause**: The model emitted a malformed S-expression.\n\n**Solutions**:\n1. The system auto-converts JSON to S-expressions as a fallback\n2. Use `--verbose` to see what the model is generating\n3. 
Try a different model tuned for code/symbolic output\n\n## Development\n\n```bash\nnpm test                              # Run tests\nnpm test -- --coverage                # With coverage\nRUN_E2E=1 npm test -- tests/e2e.test.ts  # E2E tests (requires Ollama)\nnpm run build                         # Build\nnpm run typecheck                     # Type check\n```\n\n## Project Structure\n\n```\nsrc/\n├── adapters/           # Model-specific prompting\n│   ├── nucleus.ts      # Nucleus DSL adapter\n│   └── types.ts        # Adapter interface\n├── logic/              # Lattice engine\n│   ├── lc-parser.ts    # Nucleus parser\n│   ├── lc-solver.ts    # Command executor (uses miniKanren)\n│   ├── type-inference.ts\n│   └── constraint-resolver.ts\n├── persistence/        # In-memory handle storage (97% token savings)\n│   ├── session-db.ts   # In-memory SQLite with FTS5\n│   ├── handle-registry.ts  # Handle creation and stubs\n│   ├── handle-ops.ts   # Server-side operations\n│   ├── fts5-search.ts  # Full-text search\n│   └── checkpoint.ts   # Session persistence\n├── treesitter/         # Code-aware symbol extraction\n│   ├── parser-registry.ts  # Tree-sitter parser management\n│   ├── symbol-extractor.ts # AST → symbol extraction\n│   ├── language-map.ts # Extension → language mapping\n│   └── types.ts        # Symbol interfaces\n├── engine/             # Nucleus execution engine\n│   ├── nucleus-engine.ts\n│   └── handle-session.ts   # Session with symbol support\n├── minikanren/         # Relational programming engine\n├── synthesis/          # Program synthesis (Barliman-style)\n│   └── evalo/          # Extractor DSL\n├── rag/                # Few-shot hint retrieval\n└── rlm.ts              # Main execution loop\n```\n\n## Acknowledgements\n\nThis project incorporates ideas and code from:\n\n- **[Nucleus](https://github.com/michaelwhitford/nucleus)** - A symbolic S-expression language by Michael Whitford. 
RLM uses Nucleus syntax for the constrained DSL that the LLM outputs, providing a rigid grammar that reduces model errors.\n- **[ramo](https://github.com/wjlewis/ramo)** - A miniKanren implementation in TypeScript by Will Lewis. Used for constraint-based program synthesis.\n- **[Barliman](https://github.com/webyrd/Barliman)** - A prototype smart editor by William Byrd and Greg Rosenblatt that uses program synthesis to assist programmers. The Barliman-style approach of providing input/output constraints instead of code inspired the synthesis workflow.\n- **[tree-sitter](https://tree-sitter.github.io/tree-sitter/)** - A parser generator tool and incremental parsing library. Used for extracting structural symbols (functions, classes, methods) from code files to enable code-aware queries.\n\n## License\n\nMIT\n\n## References\n\n- [RLM Paper](https://arxiv.org/abs/2512.24601)\n- [Original Implementation](https://github.com/alexzhang13/rlm)\n- [Model Context Protocol](https://modelcontextprotocol.io/)\n- [miniKanren](http://minikanren.org/)\n","funding_links":[],"categories":["Libraries","TypeScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyogthos%2FMatryoshka","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyogthos%2FMatryoshka","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyogthos%2FMatryoshka/lists"}