https://github.com/tonylnng/gateforge-loom

Weave intelligent agents into workflows. A composable multi-agent orchestration stack — Brain (Claude) · Hands (OpenClaw) · Memory (Hermes) — wired via n8n on Docker.
Host: GitHub
URL: https://github.com/tonylnng/gateforge-loom
Owner: tonylnng
License: mit
Created: 2026-05-08T02:31:07.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-06-24T02:02:51.000Z (5 days ago)
Last Synced: 2026-06-24T03:04:31.021Z (5 days ago)
Topics: ai-agents, claude, docker-compose, fastapi, hermes, multi-agent, n8n, openclaw, orchestration, pgvector
Language: Python
Size: 532 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Gateforge-Loom

> **Weave intelligent agents into workflows.**

> A composable, multi-agent orchestration stack — every agent is its own service, every interaction is a JSON contract, every run leaves a memory.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

![Stack](https://img.shields.io/badge/stack-FastAPI%20·%20n8n%20·%20Postgres%2Fpgvector%20·%20Redis-blue)

![Status](https://img.shields.io/badge/status-PoC-orange)

![Topology](https://img.shields.io/badge/topology-3--VM%20hybrid-blueviolet)

Gateforge-Loom is a **layered multi-agent system** built on three foundational
roles, **Brain · Hands · Memory**, orchestrated by **n8n** and connected
through **Redis** + **Postgres/pgvector**. The architecture treats agents as
threads on a loom: today three threads, tomorrow as many as your workflow
needs. New agents (Validator, Critic, Router, Reviewer…) drop in as additional
services; the orchestrator weaves them all into one cohesive run.

The stack runs equally well on **a single VM with Docker** or on a **3-VM
hybrid topology**: Brain + Orchestrator in Docker on one VM, with Hands and
Memory installed **natively as systemd services** on two more VMs, meshed over
Tailscale. The Brain reaches Anthropic models through the **Vercel AI Gateway**,
which keeps it reachable from regions where `api.anthropic.com` is blocked
(e.g. Hong Kong).

---

## Table of contents

- [Why Gateforge-Loom?](#why-gateforge-loom)

- [Architecture at a glance](#architecture-at-a-glance)

- [Concept diagram](#1-concept-diagram)

- [Sequence diagram](#2-sequence-diagram-one-job-end-to-end)

- [State diagram](#3-state-diagram-job-lifecycle)

- [Workflow diagram](#4-workflow-diagram-the-n8n-pipeline)

- [Deployment diagram](#5-deployment-diagram-3-vm-hybrid-topology)

- [Install-flow diagram](#6-install-flow-diagram-cluster-bring-up)

- [Components](#components)

- [Quick start](#quick-start)

- [Deployment topology](#deployment-topology)

- [VM-1 install — Brain + Orchestrator (Docker)](#vm-1-install--brain--orchestrator-docker)

- [VM-2 install — OpenClaw native (Hands)](#vm-2-install--openclaw-native-hands)

- [VM-3 install — Hermes native (Memory)](#vm-3-install--hermes-native-memory)

- [Backup & recovery](#backup--recovery)

- [Operations cheatsheet](#operations-cheatsheet)

- [Adding more agents](#adding-more-agents)

- [Project layout](#project-layout)

- [Roadmap](#roadmap)

---

## Why Gateforge-Loom?

> *"The three tools aren't competing — they're layered. Brain decides, Hands act, Memory remembers."*

Most multi-agent demos collapse three concerns into one prompt soup: planning,
execution, and memory all happen inside a single LLM call. That works for
toys; it does not survive production. Gateforge-Loom enforces **single
responsibility per layer**:

| Layer | Service | Owns | Never does | Where it runs |
|---|---|---|---|---|
| **Brain** | `claude-gateway` | Decisions, plans, synthesis | Side effects, I/O | VM-1 (Docker) |
| **Hands** | `openclaw` | Tool execution, I/O, automation | Strategy, judgement | VM-2 (native systemd) |
| **Memory** | `hermes` | Recall, learn, distil SOPs | Initiate actions | VM-3 (native systemd) |
| **Bus** | `redis` | Job state, locks, cache | Long-term storage | VM-1 (Docker) |
| **Storage** | `postgres + pgvector` | Episodic + SOP memory | Real-time state | VM-3 (native) |
| **Orchestrator** | `n8n` | Sequencing, retries, fan-out | Anything an agent should do | VM-1 (Docker) |
| **LLM Gateway** | `Vercel AI Gateway` | HK-reachable Anthropic proxy, failover, cost tracking | Any orchestration logic | External (Vercel edge) |

Each layer exposes a small typed API and can be upgraded, scaled, or replaced
independently, whether it runs as a container or a native service.

---

## Architecture at a glance

```

            ┌──────────────────────────────────────────────────────┐

            │               n8n  (orchestrator)                     │

            └────────────┬───────────────┬────────────┬─────────────┘

                         │               │            │

              POST /plan │     POST /recall │ POST /execute

                         ▼               ▼            ▼

                ┌────────────┐  ┌────────────┐  ┌────────────┐

                │ claude-gw  │  │  hermes    │  │  openclaw  │

                │  (Brain)   │  │ (Memory)   │  │  (Hands)   │

                └─────┬──────┘  └─────┬──────┘  └─────┬──────┘

                      │               │               │

                      ▼               ▼               ▼

                ┌────────────┐  ┌────────────┐  ┌────────────┐

                │ Vercel AI  │  │ Postgres + │  │  Redis bus │

                │  Gateway   │  │  pgvector  │  │   + tools  │

                │ → Anthropic│  │            │  │            │

                └────────────┘  └────────────┘  └────────────┘

```

Components can all run on a single VM, or split across three VMs over
Tailscale; see [Deployment topology](#deployment-topology).

---

## 1. Concept diagram

How the layers relate. Read top-to-bottom: a request enters at the
orchestrator, fans out to the agents, agents talk to the shared backplane,
and results are woven back into a final artifact.

```mermaid

flowchart TB

    subgraph Client["Client / Trigger"]

        U["User · Cron · Webhook · Chat"]

    end

    subgraph Orchestration["Orchestration Layer · VM-1"]

        N["n8n Workflow Engine"]

    end

    subgraph Agents["Agent Layer (each agent = one service)"]

        direction LR

        B["🧠 Brain
claude-gateway
VM-1 · Docker
plan · merge · synthesize"]

        H["✋ Hands
openclaw
VM-2 · native
execute · tools"]

        M["📚 Memory
hermes
VM-3 · native
recall · write"]

        FA["… future agents
Validator · Critic · Router"]

    end

    subgraph External["External LLM Provider"]

        V["☁️ Vercel AI Gateway
→ Anthropic (Claude Opus)"]

    end

    subgraph Backplane["Shared Backplane"]

        R[("Redis
state bus · VM-1")]

        P[("Postgres + pgvector
SOP + episodic · VM-3")]

        S[("Object store
artifacts (S3 / MinIO)")]

    end

    subgraph Sinks["Output Sinks"]

        O["Notion · Slack · Drive · Webhook"]

    end

    U --> N

    N <--> B

    N <--> H

    N <--> M

    N -.-> FA

    B <--> V

    B <--> R

    H <--> R

    H --> S

    M <--> P

    N --> O

    classDef brain   fill:#FEE7DC,stroke:#D97757,color:#1F2937;

    classDef hands   fill:#DBEAFE,stroke:#3B82F6,color:#1F2937;

    classDef memory  fill:#EDE9FE,stroke:#8B5CF6,color:#1F2937;

    classDef future  fill:#F3F4F6,stroke:#9CA3AF,color:#1F2937,stroke-dasharray: 5 5;

    classDef store   fill:#D1FAE5,stroke:#10B981,color:#1F2937;

    classDef ext     fill:#FEF3C7,stroke:#F59E0B,color:#1F2937;

    class B brain

    class H hands

    class M memory

    class FA future

    class R,P,S store

    class V ext

```

**Key idea.** Agents never call each other directly. Everything is mediated

by n8n (control flow) and the shared backplane (state). This is what lets

you add or remove agents without rewriting the others. The Brain's only

outbound dependency is the Vercel AI Gateway.

---

## 2. Sequence diagram (one job, end-to-end)

What actually happens when a request comes in. Notice that **memory is
queried before the plan is finalized**, **memory is updated after every
successful run** (that's how the system gets faster over time), and **every
Brain call round-trips through the Vercel AI Gateway** to reach Anthropic.

```mermaid

sequenceDiagram

    autonumber

    participant U as User / Trigger

    participant N as n8n

    participant C as Claude Gateway (Brain)

    participant VG as Vercel AI Gateway

    participant M as Hermes (Memory)

    participant O as OpenClaw (Hands)

    participant DB as Postgres / Redis

    U->>N: POST /webhook (user_intent)

    N->>N: generate job_id

    N->>C: POST /plan { intent }

    C->>VG: messages.create (Claude Opus)

    VG-->>C: plan completion

    C-->>N: draft_plan { steps[] }

    N->>M: POST /recall { query }

    M->>DB: SELECT sop, episodic

    DB-->>M: hits[]

    M-->>N: memory_hits[]

    N->>C: POST /merge { draft_plan, hits }

    C->>VG: messages.create (Claude Opus)

    VG-->>C: merged completion

    C-->>N: final_plan (v2, SOP-augmented)

    loop for each step

        N->>O: POST /execute { tool, input }

        O->>O: run tool (web/browser/shell/api)

        O-->>N: { status, output, artifacts[] }

        N->>N: validate schema

        alt retryable error

            N->>O: retry (max_retries)

        end

        N->>DB: append step result (Redis)

    end

    N->>M: POST /write { episode, sop_updates }

    M->>DB: INSERT episodic, bump SOP version

    M-->>N: stored

    N->>C: POST /synthesize { step_results }

    C->>VG: messages.create (Claude Opus)

    VG-->>C: synthesis completion

    C-->>N: artifact_uri + summary

    N-->>U: final result

```
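
The sequence above can be sketched as a plain function. This is a minimal illustration, not the n8n workflow itself: `run_job` and the `call(service, path, payload)` transport are hypothetical names, the payload fields simply mirror the labels in the diagram, and retries are left out.

```python
import uuid
from typing import Any, Callable

# Hypothetical transport: call(service, path, payload) -> parsed JSON response.
Call = Callable[[str, str, dict], dict]


def run_job(intent: str, call: Call) -> dict[str, Any]:
    """Walk one job through plan -> recall -> merge -> execute -> write -> synthesize."""
    job_id = uuid.uuid4().hex
    draft = call("brain", "/plan", {"intent": intent})
    hits = call("memory", "/recall", {"query": intent})
    final = call("brain", "/merge", {"draft_plan": draft, "hits": hits})
    # One /execute call per plan step, in order.
    results = [
        call("hands", "/execute", {"tool": step["tool"], "input": step["input"]})
        for step in final["steps"]
    ]
    call("memory", "/write", {"episode": {"job_id": job_id, "results": results},
                              "sop_updates": []})
    summary = call("brain", "/synthesize", {"step_results": results})
    return {"job_id": job_id, **summary}
```

Passing the transport in as a parameter keeps the ordering logic testable without any of the services running.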

---

## 3. State diagram (job lifecycle)

Every job moves through a small, predictable set of states. State transitions
are written to Redis under `job:{job_id}:state` so any agent or operator can
inspect a job in flight.

```mermaid

stateDiagram-v2

    [*] --> Received: webhook hit

    Received --> Planning: job_id created

    Planning --> Recalling: draft plan ready

    Recalling --> Merging: memory hits returned

    Merging --> Executing: final plan committed

    Executing --> StepRunning: dispatch step

    StepRunning --> StepDone: status=success

    StepRunning --> StepFailed: status=error

    StepFailed --> StepRunning: retry (≤ max_retries)

    StepFailed --> Failed: retries exhausted

    StepDone --> Executing: more steps?

    StepDone --> Learning: all steps done

    Learning --> Synthesizing: episodic written

    Synthesizing --> Delivered: artifact emitted

    Delivered --> [*]

    Failed --> [*]

    note right of Recalling

        Hermes is degradable — if

        unavailable, returns empty

        hits and the job continues.

    end note

    note right of Learning

        Episodic always written.

        SOP versions bumped only

        when lessons exist.

    end note

```
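
The lifecycle above maps naturally onto a transition table. The sketch below transcribes the arrows from the diagram; the `advance` helper is a hypothetical name, and how the real services enforce transitions is not shown in this README.

```python
# Allowed transitions, transcribed from the state diagram above.
TRANSITIONS: dict[str, set[str]] = {
    "Received": {"Planning"},
    "Planning": {"Recalling"},
    "Recalling": {"Merging"},
    "Merging": {"Executing"},
    "Executing": {"StepRunning"},
    "StepRunning": {"StepDone", "StepFailed"},
    "StepFailed": {"StepRunning", "Failed"},
    "StepDone": {"Executing", "Learning"},
    "Learning": {"Synthesizing"},
    "Synthesizing": {"Delivered"},
    "Delivered": set(),  # terminal
    "Failed": set(),     # terminal
}


def advance(current: str, target: str) -> str:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

A guard like this makes an out-of-order write to `job:{job_id}:state` fail loudly instead of silently corrupting the job record.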

---

## 4. Workflow diagram (the n8n pipeline)

The actual node graph implemented in
[`n8n/workflows/gateforge-loom-pipeline.json`](n8n/workflows/gateforge-loom-pipeline.json).
Import it directly into n8n.

```mermaid

flowchart LR

    T(["🪝 Webhook Trigger"]) --> J["Generate job_id"]

    J --> P1["Claude /plan"]

    P1 --> R1["Hermes /recall"]

    R1 --> M1["Claude /merge"]

    M1 --> SP["Split steps"]

    SP --> EX["OpenClaw /execute"]

    EX --> V{"Validate
schema"}

    V -- "retryable error" --> EX

    V -- "ok" --> AGG["Aggregate results"]

    AGG -- "more steps" --> SP

    AGG -- "done" --> W["Hermes /write"]

    W --> SY["Claude /synthesize"]

    SY --> OUT(["📤 Output sink"])

    classDef trigger fill:#FECACA,stroke:#DC2626;

    classDef brain   fill:#FEE7DC,stroke:#D97757;

    classDef hands   fill:#DBEAFE,stroke:#3B82F6;

    classDef memory  fill:#EDE9FE,stroke:#8B5CF6;

    classDef ctrl    fill:#F3F4F6,stroke:#6B7280;

    classDef out     fill:#FEF3C7,stroke:#F59E0B;

    class T trigger

    class P1,M1,SY brain

    class EX hands

    class R1,W memory

    class J,SP,V,AGG ctrl

    class OUT out

```

| # | Node | Type | Purpose |
|---|---|---|---|
| 1 | **Webhook Trigger** | Webhook | Entry point. Accepts `{ user_intent, context }`. |
| 2 | **Generate job_id** | Code | Deterministic ID for tracing. |
| 3 | **Claude /plan** | HTTP | Decompose intent → step list. |
| 4 | **Hermes /recall** | HTTP | Pull relevant SOP + episodic memories. |
| 5 | **Claude /merge** | HTTP | Fold memory into final plan (v2). |
| 6 | **Split steps** | Split-Out | One iteration per plan step. |
| 7 | **OpenClaw /execute** | HTTP | Run a single tool invocation. |
| 8 | **Validate** | Code | JSON-schema check + retryable error detection. |
| 9 | **Aggregate** | Merge | Collect step results into Redis. |
| 10 | **Hermes /write** | HTTP | Persist episodic memory + SOP patches. |
| 11 | **Claude /synthesize** | HTTP | Final report artifact. |
| 12 | **Output sink** | Notion / Slack / Drive | Deliver to the user. |
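
The retry edge between **Validate** and **OpenClaw /execute** can be sketched as a small loop. This is an illustration under assumptions: `execute_with_retry` is a hypothetical helper, the default of three retries is invented, and the result shape (`status`, `retryable`) follows the fields named elsewhere in this README.

```python
from typing import Callable


def execute_with_retry(
    execute: Callable[[dict], dict], step: dict, max_retries: int = 3
) -> dict:
    """Call execute(step) until it succeeds, fails permanently, or retries run out."""
    result: dict = {}
    for _attempt in range(max_retries + 1):  # first try plus max_retries retries
        result = execute(step)
        if result.get("status") == "success":
            return result
        if not result.get("retryable", False):
            break  # permanent error: retrying cannot help
    return result  # last error, for the caller to mark the step as failed
```

Keeping the retry decision in the orchestrator matches the division of labour described above: the Hands report what happened, and n8n decides what to do about it.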

---

## 5. Deployment diagram (3-VM hybrid topology)

The physical placement of every component. **VM-1** runs the Brain +
Orchestrator + state bus in Docker; **VM-2** and **VM-3** run the agents as
**native systemd services** (no Docker). All cross-VM traffic flows over a
Tailscale mesh; only n8n's UI (`5678`) is public-facing.

```mermaid

flowchart TB

    Internet(["🌐 Internet / Clients"])

    subgraph VM1["VM-1 · Brain + Orchestrator · Docker"]

        direction TB

        N8N["n8n
:5678 (public)"]

        CG["claude-gateway
:8001"]

        RD[("redis
:6379")]

    end

    subgraph VM2["VM-2 · Hands · native systemd"]

        OC["openclaw.service
:8002
(uvicorn + Playwright)"]

    end

    subgraph VM3["VM-3 · Memory · native systemd"]

        HM["hermes.service
:8003 (uvicorn)"]

        PG[("postgresql 16
+ pgvector · :5432")]

    end

    VGW["☁️ Vercel AI Gateway
→ Anthropic (Claude Opus)"]

    Internet -->|"HTTPS :5678"| N8N

    N8N <-->|"docker net"| CG

    CG <-->|"docker net"| RD

    CG -->|"HTTPS (Anthropic-compat base URL)"| VGW

    N8N <-->|"Tailscale :8002"| OC

    N8N <-->|"Tailscale :8003"| HM

    CG -.->|"Tailscale (optional)"| OC

    HM <-->|"localhost :5432"| PG

    classDef brain   fill:#FEE7DC,stroke:#D97757,color:#1F2937;

    classDef hands   fill:#DBEAFE,stroke:#3B82F6,color:#1F2937;

    classDef memory  fill:#EDE9FE,stroke:#8B5CF6,color:#1F2937;

    classDef store   fill:#D1FAE5,stroke:#10B981,color:#1F2937;

    classDef ctrl    fill:#F3F4F6,stroke:#6B7280,color:#1F2937;

    classDef ext     fill:#FEF3C7,stroke:#F59E0B,color:#1F2937;

    classDef net     fill:#FFFFFF,stroke:#111827,color:#1F2937;

    class CG brain

    class OC hands

    class HM memory

    class RD,PG store

    class N8N ctrl

    class VGW ext

    class Internet net

```

| VM | Runs | Runtime | Public port | Tailscale ports |
|---|---|---|---|---|
| **VM-1** | n8n · claude-gateway · redis | Docker Compose | `5678` (n8n UI) | `8001`, `6379` (internal) |
| **VM-2** | openclaw | native systemd | none | `8002` |
| **VM-3** | hermes · postgres+pgvector | native systemd | none | `8003`, `5432` |

> The Gateforge-Loom contract is **runtime-agnostic**: agents talk via JSON
> over HTTP with `INTERNAL_API_TOKEN` auth. Whether an agent runs as a Docker
> container or a native systemd service is invisible to n8n and the Brain.
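
A call under this contract is an ordinary JSON POST carrying the shared token. The sketch below only builds the request. The header that carries `INTERNAL_API_TOKEN` is not specified in this README, so the bearer `Authorization` header is an assumption, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(
    base_url: str, path: str, payload: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the shared internal token."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer scheme; the services may expect a different header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The same function serves a Docker service name or a Tailscale IP as `base_url`, which is the point of the contract: the caller never needs to know how the target is deployed.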

---

## 6. Install-flow diagram (cluster bring-up)

The order of operations to stand up the cluster from scratch. Color-coded per
VM. Provision the Tailnet first so every VM can resolve the others before you
wire `.env` files.

```mermaid

flowchart TD

    Start(["Start"]) --> TS["Provision Tailnet
+ join all 3 VMs"]

    TS --> V3a["VM-3: install Postgres 16 + pgvector"]

    V3a --> V3b["VM-3: run infra/postgres/init.sql
(schema + seed SOP)"]

    V3b --> V3c["VM-3: install hermes venv
+ hermes.service (systemd)"]

    V3c --> V3d["VM-3: lock Postgres to localhost + Tailscale IP"]

    TS --> V2a["VM-2: install python3.12 venv"]

    V2a --> V2b["VM-2: install openclaw
+ openclaw.service (systemd)"]

    V2b --> V2c["VM-2: (optional) Playwright install chromium"]

    TS --> V1a["VM-1: install Docker + Compose"]

    V1a --> V1b["VM-1: clone repo, set .env
(STUB_MODE=0 + Vercel key + Tailscale IPs)"]

    V1b --> V1c["VM-1: trim docker-compose.yml
(n8n + claude-gateway + redis only)"]

    V1c --> V1d["VM-1: make up"]

    V3d --> Health{"All 3 /health
endpoints green?"}

    V2c --> Health

    V1d --> Health

    Health -- "no" --> Fix["Check Tailscale IPs
+ INTERNAL_API_TOKEN match"]

    Fix --> Health

    Health -- "yes" --> Import["VM-1: import workflow,
point HTTP nodes at Tailscale IPs"]

    Import --> Smoke["make test (end-to-end smoke)"]

    Smoke --> Done(["✅ Cluster live"])

    classDef vm1 fill:#FEE7DC,stroke:#D97757,color:#1F2937;

    classDef vm2 fill:#DBEAFE,stroke:#3B82F6,color:#1F2937;

    classDef vm3 fill:#EDE9FE,stroke:#8B5CF6,color:#1F2937;

    classDef ctrl fill:#F3F4F6,stroke:#6B7280,color:#1F2937;

    classDef ok  fill:#D1FAE5,stroke:#10B981,color:#1F2937;

    class V1a,V1b,V1c,V1d vm1

    class V2a,V2b,V2c vm2

    class V3a,V3b,V3c,V3d vm3

    class TS,Health,Fix,Import,Smoke ctrl

    class Start,Done ok

```

---

## Components

Quick summary below; read [`docs/components.md`](docs/components.md) for the
deep dive. The runtime noted in each heading reflects the 3-VM hybrid topology.

### 🧠 `claude-gateway` (Brain) — VM-1, Docker

- **Image:** `gateforge-loom/claude-gateway` (Python 3.12 + FastAPI)
- **Port:** 8001 (host) → 8000 (container)
- **Endpoints:** `GET /health`, `POST /plan`, `POST /merge`, `POST /synthesize`
- **Job:** thin wrapper around an LLM. Owns *all* reasoning. Returns structured
  JSON only and never executes side effects.
- **LLM routing:** reaches Anthropic through the **Vercel AI Gateway** via an
  Anthropic-compatible base URL, so it stays reachable from HK. Set
  `ANTHROPIC_BASE_URL=https://ai-gateway.vercel.sh/v1/anthropic`, set
  `ANTHROPIC_API_KEY` to your Vercel AI Gateway key, and set `CLAUDE_MODEL`
  to an Opus model.
- **Stub mode:** `STUB_MODE=1` returns canned plans so you can wire the full
  pipeline before adding API keys; set `STUB_MODE=0` for live calls.
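
The stub switch can be pictured as a single branch in front of the LLM call. This is a sketch only: `CANNED_PLAN`, the `plan` signature, and the choice to treat an unset `STUB_MODE` as `1` are all assumptions, not the gateway's actual code.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical canned response; the real stub payload is not shown in this README.
CANNED_PLAN = {"steps": [{"tool": "web.fetch", "input": {"url": "https://example.com"}}]}


def plan(
    intent: str,
    live_planner: Optional[Callable[[str], dict]] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Return a canned plan when STUB_MODE=1, otherwise delegate to the live planner."""
    if env.get("STUB_MODE", "1") == "1":  # assumption: stub is the default when unset
        return CANNED_PLAN
    if live_planner is None:
        raise RuntimeError("STUB_MODE=0 requires a live planner (LLM client)")
    return live_planner(intent)
```

Because the stub returns the same JSON shape as a live call, the rest of the pipeline can be wired and smoke-tested before any key is provisioned.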

### ✋ `openclaw` (Hands) — VM-2, native systemd

- **Runtime:** Python 3.12 venv under `/opt/openclaw`, run by `openclaw.service`
- **Port:** 8002
- **Endpoints:** `GET /health`, `GET /tools`, `POST /execute`
- **Job:** runs one plan step against one registered tool. Built-in tool
  catalogue covers `web.fetch`, `browser.action`, `shell.run`, `api.call`.
  Failures are explicit (`status=error`, `retryable` flag); n8n decides
  whether to retry.
- **Sandboxing:** systemd hardening (`ProtectSystem=strict`, `NoNewPrivileges`,
  `PrivateTmp`); Playwright/Chromium installed in the venv for `browser.action`.
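
One way to picture "one plan step against one registered tool" is a name-to-callable registry whose dispatcher never raises. The sketch below is an assumption-laden illustration: `TOOLS`, the `tool` decorator, the `echo` demo tool, and the rule that only timeouts are retryable are all hypothetical.

```python
from typing import Callable

TOOLS: dict[str, Callable[[dict], object]] = {}


def tool(name: str):
    """Decorator that registers a callable under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("echo")  # hypothetical demo tool, not part of the built-in catalogue
def echo(data: dict) -> object:
    return data.get("text", "")


def execute(tool_name: str, data: dict) -> dict:
    """Run one tool and report the outcome explicitly instead of raising."""
    fn = TOOLS.get(tool_name)
    if fn is None:
        return {"status": "error", "retryable": False, "error": f"unknown tool {tool_name}"}
    try:
        return {"status": "success", "output": fn(data), "artifacts": []}
    except TimeoutError as exc:  # assumption: timeouts are worth retrying
        return {"status": "error", "retryable": True, "error": str(exc)}
    except Exception as exc:  # anything else is treated as permanent
        return {"status": "error", "retryable": False, "error": str(exc)}
```

Returning errors as data rather than exceptions is what lets the orchestrator make the retry decision from the response body alone.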

### 📚 `hermes` (Memory) — VM-3, native systemd

- **Runtime:** Python 3.12 venv under `/opt/hermes`, run by `hermes.service`
- **Port:** 8003
- **Endpoints:** `GET /health`, `POST /recall`, `POST /write`
- **Job:** vector-search SOP & episodic memories on `/recall`; persist the
  episode and bump SOP versions on `/write`. Uses Postgres `vector(1536)` columns.
- **Degradable:** if Postgres is down, `/recall` returns empty hits so the
  rest of the pipeline keeps running.
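
The degradable behaviour amounts to catching storage failures at the edge of `/recall`. A minimal sketch, assuming a hypothetical `search` callable that queries Postgres and a hypothetical `degraded` flag in the response:

```python
import logging
from typing import Callable


def recall(query: str, search: Callable[[str], list[dict]]) -> dict:
    """Return memory hits, degrading to an empty list if the backing store fails."""
    try:
        return {"hits": search(query), "degraded": False}
    except Exception:  # e.g. connection refused while Postgres is down
        logging.getLogger("hermes").warning("recall degraded", exc_info=True)
        return {"hits": [], "degraded": True}
```

The caller sees a normal response either way, so a memory outage costs plan quality rather than availability.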

### 🚌 `redis` (State bus) — VM-1, Docker

- **Image:** `redis:7-alpine`

- **Port:** 6379

- **Job:** distributed state for in-flight jobs. Keys follow a strict

  convention so any agent can debug a job:

  ```

  job:{job_id}:state

  job:{job_id}:plan

  job:{job_id}:step:{step_id}

  job:{job_id}:cursor

  job:{job_id}:lock

  ```

- TTL: 7 days for active job keys; persisted via AOF.
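
A small helper keeps every service on the same key convention. This is a sketch: `job_key` and `JOB_TTL_SECONDS` are hypothetical names, while the key shapes and the 7-day TTL come from the list above.

```python
from typing import Optional

JOB_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, per the TTL noted above


def job_key(job_id: str, field: str, step_id: Optional[str] = None) -> str:
    """Build a key following the job:{job_id}:... convention."""
    if field == "step":
        if step_id is None:
            raise ValueError("step keys need a step_id")
        return f"job:{job_id}:step:{step_id}"
    if field not in {"state", "plan", "cursor", "lock"}:
        raise ValueError(f"unknown field {field!r}")
    return f"job:{job_id}:{field}"
```

For example, `job_key("42", "state")` gives `job:42:state`, the key an operator would read to see where a job currently stands.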

### 🗄 `postgres` (Long-term memory) — VM-3, native

- **Package:** `postgresql-16` + `postgresql-16-pgvector`
- **Port:** 5432 (bound to localhost + Tailscale IP only)
- **Job:** durable storage for `episodic_memory` and `sop` tables. Schema
  initialised by [`infra/postgres/init.sql`](infra/postgres/init.sql), which
  includes one seed SOP so `/recall` returns data on day 1.
- **Indexes:** B-tree on intent + created_at; ivfflat on `embedding` once
  data exists.
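
To make the vector search concrete, here is a brute-force nearest-neighbour ranking in plain Python. It is an illustration only: the README does not say which distance metric Hermes uses, so cosine distance is an assumption, and an ivfflat index returns approximate results rather than this exact ordering.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar. Vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(
    query: list[float], rows: list[tuple[str, list[float]]], k: int = 3
) -> list[str]:
    """Return the ids of the k rows whose embeddings are closest to the query."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]
```

In the database the same ranking would be expressed as an `ORDER BY` on a pgvector distance operator with a `LIMIT`, which is the query shape an ivfflat index accelerates.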

### 🎼 `n8n` (Orchestrator) — VM-1, Docker

- **Image:** `n8nio/n8n:latest`
- **Port:** 5678 (the only public-facing UI)
- **Job:** owns control flow: sequencing, retries, fan-out, output sinks.
- **Imports:** `n8n/workflows/gateforge-loom-pipeline.json`. After import,
  point the Hermes/OpenClaw HTTP nodes at the **Tailscale IPs** of VM-3/VM-2.

### ☁️ `Vercel AI Gateway` (LLM provider) — external

- **Endpoint:** `https://ai-gateway.vercel.sh/v1/anthropic` (Anthropic-compatible)
- **Job:** proxies Brain calls to Claude Opus, reachable from regions where
  `api.anthropic.com` is blocked. Adds provider failover and cost tracking in
  the Vercel dashboard. The `claude-gateway` code is unchanged except for the
  `base_url` it points at.

---

## Quick start

For a fast local PoC, everything still runs on **one VM** with Docker:

```bash

git clone https://github.com/tonylnng/gateforge-loom.git

cd gateforge-loom

cp .env.example .env             # fill in passwords + tokens

make up                          # build + start everything

make health                      # hit every /health endpoint

make test                        # end-to-end smoke test

```

Then open the n8n UI on port `5678` (`http://localhost:5678` on a local install)
and import `n8n/workflows/gateforge-loom-pipeline.json`.

For the production **3-VM hybrid** layout, follow the per-VM install sections
below.

---

## Deployment topology

```

                ┌─────────────────────────────────┐

                │  VM-1: Brain + Orchestrator      │

                │  (Docker)                        │

                │  - n8n         :5678  (public)   │

                │  - claude-gw   :8001             │──→ Vercel AI Gateway

                │  - redis       :6379             │    (Anthropic-compat base URL)

                └────────┬────────────────┬────────┘

                         │  Tailscale     │

                         ▼                ▼

              ┌──────────────┐    ┌────────────────┐

              │  VM-2 (native)│    │  VM-3 (native) │

              │  - openclaw   │    │  - hermes      │

              │    (systemd)  │    │  - postgres    │

              │  :8002        │    │    + pgvector  │

              └──────────────┘    │  :8003  :5432  │

                                  └────────────────┘

```

| Resource | VM-1 (Brain+Orch) | VM-2 (Hands) | VM-3 (Memory) |
|---|---|---|---|
| **OS** | Ubuntu 22.04 LTS | Ubuntu 22.04 LTS | Ubuntu 22.04 LTS |
| **vCPU** | 4 | 2 (4 with Playwright) | 2 |
| **RAM** | 8 GB | 4 GB (8 GB w/ browsers) | 4 GB |
| **Disk** | 80 GB SSD | 40 GB SSD | 80 GB SSD (DB growth) |
| **Runtime** | Docker Compose | native systemd | native systemd |
| **Public port** | 5678 | none | none |

All three VMs share one `INTERNAL_API_TOKEN` and live on the same Tailnet.
Pick a region close to the Vercel AI Gateway edge (HK / Singapore / Tokyo).

---

## VM-1 install — Brain + Orchestrator (Docker)

VM-1 hosts the Brain, n8n, and the Redis state bus in Docker.

### 1. Base system + Docker

```bash

sudo apt update && sudo apt upgrade -y

sudo apt install -y curl git ufw fail2ban

sudo timedatectl set-timezone Asia/Hong_Kong

curl -fsSL https://get.docker.com | sudo sh

sudo usermod -aG docker "$USER"      # re-login for group change

docker --version && docker compose version

```

### 2. Tailscale + firewall

```bash

curl -fsSL https://tailscale.com/install.sh | sh

sudo tailscale up --hostname=loom-brain

# note the Tailscale IPs of all 3 VMs — you'll need them in .env

sudo ufw default deny incoming

sudo ufw default allow outgoing

sudo ufw allow 22/tcp                 # SSH

sudo ufw allow 5678/tcp               # n8n UI (public)

sudo ufw allow in on tailscale0       # all internal traffic

sudo ufw enable

```

### 3. Clone + configure `.env`

```bash

cd /opt

sudo git clone https://github.com/tonylnng/gateforge-loom.git

sudo chown -R "$USER":"$USER" gateforge-loom

cd gateforge-loom

cp .env.example .env

chmod 600 .env

```

Edit `.env` for Vercel routing + cross-VM Tailscale IPs:

```bash

# Brain (claude-gateway): route through Vercel AI Gateway
STUB_MODE=0
ANTHROPIC_API_KEY=<your-vercel-ai-gateway-key>
ANTHROPIC_BASE_URL=https://ai-gateway.vercel.sh/v1/anthropic
CLAUDE_MODEL=claude-opus-4-7

# Cross-VM service URLs (use Tailscale IPs)
OPENCLAW_URL=http://100.x.x.2:8002
HERMES_URL=http://100.x.x.3:8003

# Shared secrets (INTERNAL_API_TOKEN must match VM-2 and VM-3)
INTERNAL_API_TOKEN=<long-random-token>
N8N_ENCRYPTION_KEY=<long-random-key>

# n8n
N8N_HOST=<your-n8n-hostname>
N8N_PROTOCOL=https
WEBHOOK_URL=https://<your-n8n-hostname>/

```
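
Before running `make up`, it can help to confirm that no required value in `.env` was left empty. The checker below is a hypothetical helper, not part of the repo; the `REQUIRED` list is an assumption drawn from the variables shown above.

```python
# Assumption: these are the keys VM-1 cannot run without in live mode.
REQUIRED = (
    "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", "CLAUDE_MODEL",
    "OPENCLAW_URL", "HERMES_URL", "INTERNAL_API_TOKEN", "N8N_ENCRYPTION_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list[str]:
    """List required keys that are absent or left empty."""
    values = parse_env(text)
    return [key for key in REQUIRED if not values.get(key)]
```

Calling `missing_keys(open(".env").read())` returns an empty list once every required variable has a value.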

> The `claude-gateway` already constructs its Anthropic client from
> `ANTHROPIC_API_KEY` and `ANTHROPIC_BASE_URL`. Pointing `ANTHROPIC_BASE_URL`
> at the Vercel endpoint is the single change that unlocks HK reachability
> while keeping the Brain layer intact.

### 4. Trim `docker-compose.yml` for VM-1

On VM-1 you only need `n8n`, `claude-gateway`, and `redis`. Comment out or
remove the `openclaw`, `hermes`, and `postgres` blocks; those run natively on
VM-2 and VM-3.

### 5. Bring it up

```bash

make up        # builds + starts n8n, claude-gateway, redis

make health    # all three should report healthy

make test      # end-to-end smoke test across all 3 VMs

```

### 6. Import the workflow

Open the n8n UI at `http://<vm-1-host>:5678`, import
`n8n/workflows/gateforge-loom-pipeline.json`, then edit the HTTP node URLs:

| n8n Node | URL |
|---|---|
| Claude `/plan` · `/merge` · `/synthesize` | `http://claude-gateway:8000/...` (Docker network, same VM; container port 8000, published on the host as 8001) |
| Hermes `/recall` · `/write` | `http://100.x.x.3:8003/...` (Tailscale IP of VM-3) |
| OpenClaw `/execute` | `http://100.x.x.2:8002/execute` (Tailscale IP of VM-2) |

---

## VM-2 install — OpenClaw native (Hands)

VM-2 runs OpenClaw directly as a systemd service — no Docker.

### 1. Prerequisites + service user

```bash

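# python3.12 ships with Ubuntu 24.04; on 22.04 add a package source that
# provides it (e.g. the deadsnakes PPA) before running this.
sudo apt update && sudo apt install -y \
    python3.12 python3.12-venv python3-pip git curl ufw fail2ban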

sudo useradd --system --create-home --shell /bin/bash openclaw

sudo mkdir -p /opt/openclaw /var/log/openclaw

sudo chown -R openclaw:openclaw /opt/openclaw /var/log/openclaw

```

### 2. Clone + install into a venv

```bash

sudo -u openclaw bash <<'EOF'

cd /opt/openclaw

git clone https://github.com/tonylnng/gateforge-loom.git src

cd src/services/openclaw

python3.12 -m venv /opt/openclaw/venv

/opt/openclaw/venv/bin/pip install -r requirements.txt

/opt/openclaw/venv/bin/pip install "uvicorn[standard]"

EOF

```

### 3. Environment file

```bash

sudo tee /etc/openclaw.env >/dev/null <<'EOF'
# Must match INTERNAL_API_TOKEN in VM-1's .env
INTERNAL_API_TOKEN=<same-token-as-vm-1>
LOG_LEVEL=INFO
STUB_MODE=0
TOOL_TIMEOUT_SEC=60
EOF

sudo chmod 600 /etc/openclaw.env

sudo chown openclaw:openclaw /etc/openclaw.env

```

### 4. systemd unit

```bash

sudo tee /etc/systemd/system/openclaw.service >/dev/null <<'EOF'

[Unit]

Description=OpenClaw (Gateforge-Loom Hands agent)

After=network-online.target

Wants=network-online.target

[Service]

Type=simple

User=openclaw

Group=openclaw

WorkingDirectory=/opt/openclaw/src/services/openclaw

EnvironmentFile=/etc/openclaw.env

ExecStart=/opt/openclaw/venv/bin/uvicorn app.main:app \

          --host 0.0.0.0 --port 8002 --workers 2

Restart=on-failure

RestartSec=5

StandardOutput=append:/var/log/openclaw/stdout.log

StandardError=append:/var/log/openclaw/stderr.log

# Hardening

NoNewPrivileges=true

PrivateTmp=true

ProtectSystem=strict

ReadWritePaths=/var/log/openclaw /opt/openclaw

ProtectHome=true

[Install]

WantedBy=multi-user.target

EOF

sudo systemctl daemon-reload

sudo systemctl enable --now openclaw

sudo systemctl status openclaw

```

### 5. (Optional) Playwright for `browser.action`

```bash

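sudo -u openclaw /opt/openclaw/venv/bin/pip install playwright
# The unit sets ProtectHome=true, so keep browsers outside /home (the default is ~/.cache/ms-playwright)
echo 'PLAYWRIGHT_BROWSERS_PATH=/opt/openclaw/pw-browsers' | sudo tee -a /etc/openclaw.env >/dev/null
sudo -u openclaw env PLAYWRIGHT_BROWSERS_PATH=/opt/openclaw/pw-browsers /opt/openclaw/venv/bin/playwright install chromium
sudo /opt/openclaw/venv/bin/playwright install-deps chromium
sudo systemctl restart openclaw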

```

### 6. Tailscale + firewall

```bash

curl -fsSL https://tailscale.com/install.sh | sh

sudo tailscale up --hostname=loom-hands

sudo ufw default deny incoming

sudo ufw allow 22/tcp

sudo ufw allow in on tailscale0       # exposes 8002 only over the tailnet

sudo ufw enable

```

---

## VM-3 install — Hermes native (Memory)

VM-3 runs both Hermes and Postgres+pgvector natively.

### 1. Postgres 16 + pgvector

```bash

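# These package names resolve on Ubuntu 24.04. On 22.04, first add the
# PostgreSQL (PGDG) apt repository and a python3.12 source.
sudo apt update && sudo apt install -y \
    python3.12 python3.12-venv python3-pip git curl ufw fail2ban \
    postgresql-16 postgresql-16-pgvector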

sudo systemctl enable --now postgresql

```

### 2. Initialise the database (use the repo's init.sql)

```bash

sudo -u postgres psql <<'EOF'

CREATE USER hermes WITH PASSWORD '<strong-password>';

CREATE DATABASE hermes_db OWNER hermes;

\c hermes_db

CREATE EXTENSION IF NOT EXISTS vector;

EOF

# Load the seed schema from the repo (ships a seed SOP so /recall works day 1)

git clone https://github.com/tonylnng/gateforge-loom.git /tmp/gfl

sudo -u postgres psql -d hermes_db -f /tmp/gfl/infra/postgres/init.sql

```

### 3. Service user + install Hermes

```bash

sudo useradd --system --create-home --shell /bin/bash hermes

sudo mkdir -p /opt/hermes /var/log/hermes

sudo chown -R hermes:hermes /opt/hermes /var/log/hermes

sudo -u hermes bash <<'EOF'

cd /opt/hermes

git clone https://github.com/tonylnng/gateforge-loom.git src

cd src/services/hermes

python3.12 -m venv /opt/hermes/venv

/opt/hermes/venv/bin/pip install -r requirements.txt

/opt/hermes/venv/bin/pip install "uvicorn[standard]"

EOF

```

### 4. Environment file

```bash

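sudo tee /etc/hermes.env >/dev/null <<'EOF'
# Must match INTERNAL_API_TOKEN in VM-1's .env
INTERNAL_API_TOKEN=<same-token-as-vm-1>
DATABASE_URL=postgresql://hermes:<strong-password>@localhost:5432/hermes_db
LOG_LEVEL=INFO
EMBEDDING_PROVIDER=stub
EOF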

sudo chmod 600 /etc/hermes.env

sudo chown hermes:hermes /etc/hermes.env

```

### 5. systemd unit

```bash

sudo tee /etc/systemd/system/hermes.service >/dev/null <<'EOF'

[Unit]

Description=Hermes (Gateforge-Loom Memory agent)

After=network-online.target postgresql.service

Wants=network-online.target

Requires=postgresql.service

[Service]

Type=simple

User=hermes

Group=hermes

WorkingDirectory=/opt/hermes/src/services/hermes

EnvironmentFile=/etc/hermes.env

ExecStart=/opt/hermes/venv/bin/uvicorn app.main:app \

          --host 0.0.0.0 --port 8003 --workers 2

Restart=on-failure

RestartSec=5

StandardOutput=append:/var/log/hermes/stdout.log

StandardError=append:/var/log/hermes/stderr.log

NoNewPrivileges=true

PrivateTmp=true

ProtectSystem=strict

ReadWritePaths=/var/log/hermes /opt/hermes

ProtectHome=true

[Install]

WantedBy=multi-user.target

EOF

sudo systemctl daemon-reload

sudo systemctl enable --now hermes

sudo systemctl status hermes

```

### 6. Lock Postgres to localhost + Tailscale, then firewall

Edit `/etc/postgresql/16/main/postgresql.conf`:

```

listen_addresses = 'localhost,100.x.x.3'   # Tailscale IP only

```

Edit `/etc/postgresql/16/main/pg_hba.conf`:

```

# 100.64.0.0/10 is the CGNAT range Tailscale assigns addresses from
host hermes_db hermes 100.64.0.0/10 scram-sha-256

```

Reload and firewall:

```bash

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up --hostname=loom-memory
# Restart only after the Tailscale IP exists, so Postgres can bind to it
sudo systemctl restart postgresql

sudo ufw default deny incoming

sudo ufw allow 22/tcp

sudo ufw allow in on tailscale0       # exposes 8003 + 5432 only over the tailnet

sudo ufw enable

```

---

## Backup & recovery

> The episodic memory in Postgres is **the only durable asset** in the stack:
> Redis state is recoverable, and n8n workflows live in the repo. Protect VM-3.

### Nightly Postgres backup (VM-3, native)

```bash

sudo tee /etc/cron.daily/hermes-backup >/dev/null <<'EOF'
#!/bin/bash
set -euo pipefail

DEST=/var/backups/hermes
mkdir -p "$DEST"

pg_dump -U hermes hermes_db | gzip > "$DEST/hermes-$(date +%F).sql.gz"

# Retain 30 days
find "$DEST" -name "hermes-*.sql.gz" -mtime +30 -delete
EOF

sudo chmod +x /etc/cron.daily/hermes-backup

```
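
To preview what the 30-day retention rule would remove, a dry-run helper along these lines may be useful. It is a sketch that assumes dump names follow the `hermes-YYYY-MM-DD.sql.gz` pattern the script writes, and it goes by the date in the filename rather than file mtime, so its result can differ slightly from `find -mtime +30`. The names `expired_dumps` and `keep_days` are illustrative.

```python
import re
from datetime import date, timedelta

# Matches the names produced by the cron script, e.g. hermes-2026-06-02.sql.gz
DUMP_NAME = re.compile(r"^hermes-(\d{4}-\d{2}-\d{2})\.sql\.gz$")


def expired_dumps(names, today, keep_days=30):
    """Return dump filenames dated more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = DUMP_NAME.match(name)
        # Files that do not look like dumps are never reported.
        if match and date.fromisoformat(match.group(1)) < cutoff:
            expired.append(name)
    return sorted(expired)
```

Passing `os.listdir("/var/backups/hermes")` and `date.today()` lists the candidates without deleting anything.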

### Off-VM rotation (recommended)

Push the nightly dump off-box so a VM loss doesn't lose memory:

```bash

# after pg_dump, sync to object storage (S3 / MinIO / B2)

aws s3 cp /var/backups/hermes/hermes-$(date +%F).sql.gz \
  s3://your-bucket/gateforge-loom/hermes/

```

### n8n volume backup (VM-1, Docker)

```bash

docker run --rm \
  -v gateforge-loom_n8n-data:/src:ro \
  -v /var/backups/gateforge-loom:/dst \
  alpine tar czf "/dst/n8n-$(date +%F).tgz" -C /src .

```

### Restore drill

```bash

# On a fresh VM-3, after CREATE DATABASE hermes_db + CREATE EXTENSION vector:

gunzip -c hermes-2026-06-02.sql.gz | sudo -u postgres psql -d hermes_db

sudo systemctl restart hermes

curl http://100.x.x.3:8003/health    # expect healthy, not degraded

```

Test restore quarterly. A backup you haven't restored is a wish, not a backup.
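
The final step of the drill can be scripted so a quarterly run ends with a clear pass or fail. This sketch assumes `/health` returns a JSON object with a `status` field whose value is `healthy` or `degraded`; the field name is an assumption here, so check `docs/api-contract.md` for the real shape. `is_healthy` and `check_health` are illustrative names.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Interpret a /health response body (assumed shape: {"status": ...})."""
    payload = json.loads(body)
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; True only for HTTP 200 and a healthy body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200 and is_healthy(resp.read())
```

Call `check_health` with the Hermes health URL from the drill above. Connection errors propagate as exceptions, which a drill script can treat as a failure.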

---

## Operations cheatsheet

| Task | VM-1 (Brain+Orch) | VM-2 (OpenClaw) | VM-3 (Hermes) |
|---|---|---|---|
| **Status** | `docker compose ps` | `systemctl status openclaw` | `systemctl status hermes postgresql` |
| **Logs** | `docker compose logs -f` | `journalctl -u openclaw -f` | `journalctl -u hermes -f` |
| **Restart** | `make up` | `sudo systemctl restart openclaw` | `sudo systemctl restart hermes` |
| **Update code** | `git pull && make build && make up` | `cd /opt/openclaw/src && sudo -u openclaw git pull && sudo systemctl restart openclaw` | `cd /opt/hermes/src && sudo -u hermes git pull && sudo systemctl restart hermes` |
| **Health (from VM-1)** | `docker exec gfl-claude-gateway curl -s http://localhost:8000/health` | `curl http://100.x.x.2:8002/health` | `curl http://100.x.x.3:8003/health` |

---

## Adding more agents

The whole point of the loom metaphor: more threads, same machine.

1. **Scaffold a new service**: copy `services/openclaw/` to a new directory under `services/`, rename the FastAPI app, and define its endpoints.

2. **Deploy it**: either add a Compose block on VM-1 (`networks: [loomnet]` plus a `/health` healthcheck) or install it natively as a new systemd service on its own VM, following the VM-2 pattern.

3. **Register tools (optional)**: if it's an executor, expose a `GET /tools` manifest so the Brain can discover its capabilities.

4. **Add an n8n node**: drop an HTTP Request node into the workflow at the right point, pointing at the new service's Tailscale IP, then reroute the connections.

5. **Update docs**: add a row to the components table here and a section in `docs/components.md`.
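
The two endpoints a new agent needs, `/health` and the optional `GET /tools`, can be sketched without dependencies as below. The real services are FastAPI apps, so treat this standard-library version only as an illustration of the contract. The manifest shape, the `echo` tool, and the names `respond`, `AgentHandler`, and `make_server` are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical manifest; the actual schema is defined by the repo, not here.
TOOLS = [{"name": "echo", "description": "Return the input unchanged"}]


def respond(path: str) -> tuple[int, dict]:
    """Map a GET path to (HTTP status, JSON payload)."""
    if path == "/health":
        return 200, {"status": "healthy"}
    if path == "/tools":
        return 200, {"tools": TOOLS}
    return 404, {"error": "not found"}


class AgentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = respond(self.path)
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Build the server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), AgentHandler)
```

Calling `make_server(port=8004).serve_forever()` would serve both routes locally; the port number is arbitrary.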

Examples of agents that fit naturally:

| Agent | Role | Where in workflow |
|---|---|---|
| **Validator** | Schema-check tool outputs | between OpenClaw and Aggregate |
| **Critic** | Score plans before execution | between `/merge` and `Split` |
| **Router** | Pick which executor for a step | inside `Split steps` |
| **Reviewer** | Human-in-the-loop approval | before `Output sink` |
| **Embedder** | Compute embeddings for Hermes | called by `/write` |
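
As one example, the core of a Validator could start as small as the check below. This is a rough sketch only: the field names in the usage note are invented for illustration, and a real Validator would more likely load the JSON Schemas under `schemas/` and use a schema library instead of hand-written type checks.

```python
def validate_output(output: dict, required: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the output passes."""
    errors = []
    for field, expected in required.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            actual = type(output[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors
```

For instance, `validate_output({"step_id": "s1"}, {"step_id": str, "result": dict})` reports the missing `result` field, and the workflow could route on whether the returned list is empty.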

---

## Project layout

```

gateforge-loom/

├── README.md                  # this file

├── Makefile                   # up · down · health · test · clean · nuke

├── docker-compose.yml         # full stack (trim to n8n+brain+redis for VM-1)

├── .env.example

├── docs/

│   ├── components.md          # per-component deep dive

│   ├── api-contract.md        # endpoint reference

│   ├── deployment.md          # VM bring-up, hardening, backups

│   └── architecture.md        # design decisions + extension points

├── infra/postgres/init.sql    # pgvector + tables + seed SOP (load on VM-3)

├── n8n/workflows/             # importable workflow JSON (VM-1)

├── schemas/                   # JSON Schemas for tool I/O

├── scripts/                   # health + smoke-test

└── services/

    ├── claude-gateway/        # 🧠 Brain   → VM-1 (Docker)

    ├── openclaw/              # ✋ Hands   → VM-2 (native systemd)

    └── hermes/                # 📚 Memory  → VM-3 (native systemd)

```

---

## Roadmap

- [x] Phase 1 — stub services + n8n wiring + smoke test

- [x] Phase 1.5 — 3-VM hybrid topology (Docker Brain/Orch + native Hands/Memory) over Tailscale

- [ ] Phase 2 — live Brain via **Vercel AI Gateway** (Claude Opus, Anthropic-compatible base URL)

- [ ] Phase 3 — Playwright tool inside `openclaw`

- [ ] Phase 4 — real embeddings in `hermes` (voyage-3 or text-embedding-3-small)

- [ ] Phase 5 — Validator + Critic agents

- [ ] Phase 6 — multi-tenant (`tenant_id` everywhere) + per-job cost guardrails

- [ ] Phase 7 — Helm chart for OpenShift / Kubernetes deployment

---

## License

MIT. See [LICENSE](LICENSE).

---

*Designed and maintained by [@tonylnng](https://github.com/tonylnng). Inspired by the "三個工具不是在競爭,而是在分層" ("the three tools aren't competing, they're layering") framing: Brain, Hands, and Memory don't replace each other, they layer.*