https://github.com/murdinc/lcdata

A declarative agentic LLM execution engine
agentic-ai agents anthropic cli declarative golang json-config llm nlp ollama openai orchestration pipeline rest-api self-hosted speech-to-text sse text-to-speech websocket workflow
Last synced: about 1 month ago
Host: GitHub
URL: https://github.com/murdinc/lcdata
Owner: murdinc
Created: 2026-04-14T16:44:30.000Z (3 months ago)
Default Branch: master
Last Pushed: 2026-05-01T02:51:08.000Z (3 months ago)
Last Synced: 2026-05-01T04:19:25.803Z (3 months ago)
Topics: agentic-ai, agents, anthropic, cli, declarative, golang, json-config, llm, nlp, ollama, openai, orchestration, pipeline, rest-api, self-hosted, speech-to-text, sse, text-to-speech, websocket, workflow
Language: Go
Homepage:
Size: 19.3 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# lcdata

A declarative agentic LLM execution engine. Drop a folder into `nodes/`, write a JSON config, and the engine exposes it as a REST + WebSocket API endpoint. No code changes. No recompile. Nodes compose into pipelines with conditional branching, parallel execution, loops, and fan-out — all in JSON.

Built as a single Go binary with a file-first design: the binary is the engine, `nodes/` is the content.

---

## Quick Start

```bash
# Build
go build -o lcdata .

# First run creates lcdata.json with defaults
./lcdata serve

# List available nodes
./lcdata list

# Run a node from the CLI
./lcdata run llm_chat --input message="Hello"

# Validate all node configs
./lcdata validate

# Show the execution graph for a pipeline
./lcdata graph smart_assistant
```

---

## The Core Idea

Every node is a directory:

```
nodes/
  my_agent/
    my_agent.json     ← node config
    system.md         ← optional system prompt (LLM nodes)
```

The `type` field determines how it runs. Nodes can be wired together into pipelines using Go template expressions (`{{.step_id.field}}`). Adding a new agent means dropping a folder — no Go code required.

---

## Node Types

| Type | Description |
|------|-------------|
| `llm` | LLM call: Anthropic Claude, Ollama, or OpenAI-compatible |
| `http` | Outbound HTTP request with templated URL, headers, and body |
| `search` | Web search: Brave API or SearXNG |
| `file` | File operations: read, write, append, exists, delete, list |
| `command` | Shell command with streaming stdout |
| `transform` | Template-based data reshaping, no external call |
| `database` | SQL query: Postgres or SQLite |
| `stt` | Speech-to-text: Deepgram, OpenAI Whisper, or whisper.cpp (local) |
| `tts` | Text-to-speech: ElevenLabs, OpenAI, or Piper (local) |
| `vector` | Vector store operations via springg: upsert, search, get, delete |
| `embedding` | Generate embedding vectors: OpenAI or Ollama |
| `scaffold` | Node self-builder: create, read, list, or delete node configs at runtime |
| `pipeline` | Orchestrates other nodes: sequential, switch, parallel, loop, map |

---

## Node Config Reference

### LLM Node

```json

{

  "name": "llm_chat",

  "description": "General-purpose chat using Claude",

  "type": "llm",

  "provider": "anthropic",

  "model": "claude-sonnet-4-6",

  "system_prompt_file": "system.md",

  "temperature": 0.7,

  "max_tokens": 4096,

  "stream": true,

  "tools": ["web_search", "read_file"],

  "retry_count": 2,

  "retry_delay": "1s",

  "structured_output": {

    "intent": { "type": "string" },

    "confidence": { "type": "number" }

  },

  "input": {

    "message": { "type": "string", "required": true },

    "history":  { "type": "array",  "required": false }

  },

  "output": {

    "response": { "type": "string" },

    "usage":    { "type": "object" }

  }

}

```

- **providers:** `anthropic`, `ollama`, `openai`

- **`stream: true`** emits `chunk` events over WebSocket/SSE as tokens arrive — works with and without tools

- **`structured_output`** — when set, the LLM response is parsed as JSON and each field is merged into the output map alongside `response`

- **`history`** input — pass an array of `{role, content}` objects for multi-turn conversations

- **`max_history`** — trim `history` to the most recent N entries before sending to the model (prevents context overflow)

- **`tools`** — list of node names the LLM can invoke as tools. Anthropic and Ollama run an agentic loop (up to 10 turns). Streaming is fully supported with tools — text tokens stream as they arrive; tool calls execute between turns.

- **`retry_count` / `retry_delay`** — retry on API error with exponential backoff + jitter (e.g. `"retry_count": 3, "retry_delay": "1s"`)

**Date/time template functions** — available in system prompts and all templates:

| Function | Returns |
|---|---|
| `{{now}}` | Current time as RFC3339 string |
| `{{date}}` | Current date as `YYYY-MM-DD` |
| `{{datetime}}` | Current date+time as `YYYY-MM-DD HH:MM:SS` |

### HTTP Node

```json

{

  "name": "fetch_url",

  "type": "http",

  "method": "GET",

  "url": "{{.input.url}}",

  "strip_html": true,

  "headers": {

    "Authorization": "Bearer {{.input.token}}"

  },

  "body": "{\"query\": \"{{.input.q}}\"}",

  "input": {

    "url": { "type": "string", "required": true }

  },

  "output": {

    "status": { "type": "number" },

    "body":   { "type": "string" }

  }

}

```

- **`strip_html: true`** strips tags, scripts, and styles — returns clean plain text

- URL, headers, and body are all Go templates rendered against the run context

### Search Node

```json

{

  "name": "web_search",

  "type": "search",

  "search_provider": "brave",

  "search_count": 10,

  "input": {

    "query": { "type": "string", "required": true }

  },

  "output": {

    "results": { "type": "array" },

    "count":   { "type": "number" }

  }

}

```

- **providers:** `brave`, `searxng`

- Returns `results` as an array of `{title, url, description}` objects

### File Node

```json

{

  "name": "read_file",

  "type": "file",

  "operation": "read",

  "input": {

    "path": { "type": "string", "required": true }

  },

  "output": {

    "content": { "type": "string" },

    "size":    { "type": "number" }

  }

}

```

- **operations:** `read`, `write`, `append`, `exists`, `delete`, `list`

- `write` and `append` require `input.content`

- Parent directories are created automatically on `write`/`append`

### Command Node

```json

{

  "name": "run_script",

  "type": "command",

  "command": "bash",

  "args": ["scripts/process.sh"],

  "timeout": "10m",

  "env": {

    "TARGET": "{{.input.target}}"

  },

  "input": {

    "target": { "type": "string", "required": true }

  },

  "output": {

    "stdout":    { "type": "string" },

    "exit_code": { "type": "number" }

  }

}

```

- Stdout is streamed line-by-line as `chunk` events over WebSocket/SSE

- `timeout` accepts Go duration strings: `30s`, `5m`, `1h`

### Transform Node

```json

{

  "name": "format_report",

  "type": "transform",

  "template": "# Report\n\n{{.input.title}}\n\n{{.input.body}}",

  "input": {

    "title": { "type": "string", "required": true },

    "body":  { "type": "string", "required": true }

  },

  "output": {

    "result": { "type": "string" }

  }

}

```

### Database Node

```json

{

  "name": "query_users",

  "type": "database",

  "driver": "sqlite",

  "connection": "./data.db",

  "query": "SELECT * FROM users WHERE name = ?",

  "params": ["{{.input.name}}"],

  "input": {

    "name": { "type": "string", "required": true }

  },

  "output": {

    "rows":  { "type": "array" },

    "count": { "type": "number" }

  }

}

```

- **drivers:** `sqlite`, `postgres`

- `params` values are Go templates rendered against the run context

- Rows stream as `chunk` events; final output is `{rows, count}`

### STT Node

```json

{

  "name": "transcribe",

  "type": "stt",

  "provider": "deepgram",

  "model": "nova-2",

  "language": "en",

  "input": {

    "audio_url": { "type": "string", "required": true }

  },

  "output": {

    "transcript": { "type": "string" },

    "confidence": { "type": "number" },

    "words":      { "type": "array" },

    "duration":   { "type": "number" },

    "language":   { "type": "string" }

  }

}

```

- **providers:** `deepgram` (pre-recorded REST API), `openai` / `whisper` (multipart upload), `whisper-cpp` (local)

- Deepgram accepts an audio URL in `input.audio_url`; OpenAI/Whisper fetches the URL then uploads it

- `whisper-cpp` runs locally via the `whisper-cli` binary — no API key required

**Local STT with whisper.cpp:**

```json

{

  "name": "transcribe_local",

  "type": "stt",

  "provider": "whisper-cpp",

  "model": "/path/to/ggml-base.en.bin",

  "language": "en",

  "input": {

    "audio_url": { "type": "string", "required": true }

  },

  "output": {

    "transcript": { "type": "string" },

    "confidence": { "type": "number" },

    "words":      { "type": "array" },

    "language":   { "type": "string" }

  }

}

```

- `model` — path to a whisper.cpp GGML model file (e.g. `ggml-base.en.bin`, `ggml-large-v3.bin`). Falls back to `whisperCppModel` in env config.

- Binary defaults to `whisper-cli` on `$PATH`. Override with `whisperCppBin` in env config or `WHISPER_CPP_BIN` env var.

- Models: download from [ggerganov/whisper.cpp](https://github.com/ggerganov/whisper.cpp) or via `bash models/download-ggml-model.sh base.en`

### TTS Node

```json

{

  "name": "speak",

  "type": "tts",

  "provider": "elevenlabs",

  "model": "eleven_multilingual_v2",

  "voice_id": "21m00Tcm4TlvDq8ikWAM",

  "input": {

    "text": { "type": "string", "required": true }

  },

  "output": {

    "audio_base64": { "type": "string" },

    "content_type":  { "type": "string" },

    "size_bytes":    { "type": "number" }

  }

}

```

- **providers:** `elevenlabs`, `openai`, `piper`

- Returns audio as a base64-encoded string. `content_type` is `audio/mpeg` for cloud providers, `audio/wav` for Piper.

**Local TTS with Piper:**

```json

{

  "name": "speak_local",

  "type": "tts",

  "provider": "piper",

  "voice_id": "/path/to/en_US-lessac-medium.onnx",

  "input": {

    "text": { "type": "string", "required": true }

  },

  "output": {

    "audio_base64": { "type": "string" },

    "content_type":  { "type": "string" },

    "size_bytes":    { "type": "number" }

  }

}

```

- `voice_id` — path to a Piper `.onnx` voice model file (required). No API key needed.

- Binary defaults to `piper` on `$PATH`. Override with `piperBin` in env config or `PIPER_BIN` env var.

- Voice models: download from [rhasspy/piper](https://github.com/rhasspy/piper) releases

### Vector Node

Backed by [springg](https://github.com/murdinc/springg) — a local vector store with cosine similarity search, WAL persistence, and optional S3 backup.

**Operations:** `upsert`, `search`, `get`, `delete`, `create_index`, `delete_index`

**upsert** — add or update a vector:

```json

{

  "name": "memory_store",

  "type": "vector",

  "operation": "upsert",

  "index": "assistant_memory",

  "input": {

    "id":       { "type": "string", "required": true },

    "vector":   { "type": "array",  "required": true },

    "metadata": { "type": "object", "required": false }

  },

  "output": {

    "id":    { "type": "string" },

    "added": { "type": "boolean" }

  }

}

```

**search** — find top-k similar vectors by cosine similarity:

```json

{

  "name": "memory_search",

  "type": "vector",

  "operation": "search",

  "index": "assistant_memory",

  "top_k": 5,

  "input": {

    "vector": { "type": "array",  "required": true },

    "k":      { "type": "number", "required": false }

  },

  "output": {

    "results": { "type": "array" },

    "count":   { "type": "number" }

  }

}

```

Each result in `results` is `{id, score, metadata}` where `score` is cosine similarity (0–1).

**get** — fetch a stored vector by ID:

```json

{

  "name": "memory_get",

  "type": "vector",

  "operation": "get",

  "index": "assistant_memory",

  "input": {

    "id": { "type": "string", "required": true }

  },

  "output": {

    "id":       { "type": "string" },

    "vector":   { "type": "array" },

    "metadata": { "type": "object" }

  }

}

```

**delete** — remove a vector by ID:

```json

{

  "name": "memory_delete",

  "type": "vector",

  "operation": "delete",

  "index": "assistant_memory",

  "input": {

    "id": { "type": "string", "required": true }

  }

}

```

**create_index** / **delete_index** — manage indexes (typically run once at setup):

```json

{

  "name": "memory_init",

  "type": "vector",

  "operation": "create_index",

  "index": "assistant_memory",

  "dimensions": 1536

}

```

- `index` — name of the springg index (required on all operations)

- `dimensions` — vector size; must match your embedding model (required for `create_index`)

- `top_k` — default number of results for `search` (default: 10); overridable per-call via `input.k`

- Vectors are stored as 32-bit floats; pass any JSON number array as `input.vector`

### Embedding Node

Generates embedding vectors from text. Pair with the `vector` node to build RAG pipelines.

```json

{

  "name": "embed_text",

  "type": "embedding",

  "provider": "openai",

  "model": "text-embedding-3-small",

  "input": {

    "text": { "type": "string", "required": true }

  },

  "output": {

    "vector":     { "type": "array" },

    "dimensions": { "type": "number" },

    "model":      { "type": "string" }

  }

}

```

- **providers:** `openai` (default model: `text-embedding-3-small`), `ollama` (model required, e.g. `nomic-embed-text`)

- Output `vector` can be passed directly to a `vector` node's `upsert` or `search` input

---

### Scaffold Node

The `scaffold` type lets lcdata **create new nodes at runtime** — it can build itself. Combined with an LLM node and the built-in `design_node` pipeline, you can describe a new capability in plain English and have it live in the engine within seconds.

Operations: `list`, `read`, `create`, `delete`.

```json

{

  "name": "scaffold_list",

  "type": "scaffold",

  "operation": "list"

}

```

```json

{

  "name": "scaffold_create",

  "type": "scaffold",

  "operation": "create",

  "input": {

    "name":          { "type": "string", "required": true },

    "config":        { "type": "string", "required": true },

    "system_prompt": { "type": "string" }

  }

}

```

**`list`** — returns `{nodes: [...summaries], count}`. No inputs required.

**`read`** — input: `name`. Returns `{name, path, config (string), object (parsed)}`.

**`create`** — inputs: `name`, `config` (JSON string or object — validated before writing), optional `system_prompt` (written as `system.md`). Returns `{name, path, created: true}`. The hot-reload watcher picks up the new node within 200ms automatically.

**`delete`** — input: `name`. Removes the node directory. Hot-reload deregisters it.

#### Self-building: `design_node` pipeline

The built-in `design_node` pipeline takes a natural language description and creates a working node:

```bash

curl -X POST http://localhost:8080/api/nodes/design_node/run \

  -H "Content-Type: application/json" \

  -d '{

    "input": {

      "name": "sentiment_score",

      "description": "Classify the sentiment of a text as positive, negative, or neutral with a confidence score"

    }

  }'

```

Internally it chains: `scaffold_list` → `scaffold_designer_llm` (Claude with full schema reference) → `scaffold_create`. The new node is live as soon as the response returns.

The `scaffold_designer_llm` node uses a comprehensive system prompt (`nodes/scaffold_designer_llm/system.md`) covering every node type, field, template syntax, naming conventions, and design guidelines so the LLM generates configs that pass validation on the first attempt.

**Provider note:** `design_node` defaults to `anthropic` / `claude-opus-4-5` for the designer LLM. You can edit `nodes/scaffold_designer_llm/scaffold_designer_llm.json` to switch to any other provider/model.

---

## Pipelines

Pipelines wire nodes together. Each step's output is available to all subsequent steps via `{{.step_id.field}}`. The pipeline's `output` block defines what gets returned to the caller.

```json

{

  "name": "my_pipeline",

  "type": "pipeline",

  "steps": [ ... ],

  "input": { ... },

  "output": {

    "answer": "{{.final_step.response}}",

    "items":  "{{.gather.results}}"

  }

}

```

### Sequential Step

```json

{

  "id": "summarize",

  "node": "summarize",

  "input": {

    "message": "{{.fetch.body}}"

  }

}

```

### Error Handling

Any step can specify `on_error` to run a fallback node instead of aborting the pipeline. The handler node receives `input.error` and `input.step_id`; its output replaces the failed step's output.

```json

{

  "id": "fetch",

  "node": "fetch_url",

  "input": { "url": "{{.input.url}}" },

  "on_error": "fallback_handler"

}

```

### Switch (Conditional Branching)

Routes to a different node based on a runtime value. LLM outputs like `{"intent": "search"}` are automatically normalized to `"search"`.

```json

{

  "id": "route",

  "switch": "{{.classify.intent}}",

  "cases": {

    "search":   { "node": "web_search",  "input": { "query":   "{{.input.message}}" } },

    "chat":     { "node": "llm_chat",    "input": { "message": "{{.input.message}}" } },

    "default":  { "node": "llm_chat",    "input": { "message": "{{.input.message}}" } }

  }

}

```

### Parallel

All branches run concurrently. Outputs are namespaced: `{{.gather.web.results}}`, `{{.gather.db.rows}}`.

```json

{

  "id": "gather",

  "parallel": [

    { "id": "web", "node": "web_search", "input": { "query": "{{.input.topic}}" } },

    { "id": "db",  "node": "db_lookup",  "input": { "term":  "{{.input.topic}}" } }

  ]

}

```

### Loop

Repeats inner steps until a condition is true or `max_iterations` is reached. Each iteration shares the same run context so later iterations can reference earlier ones.

```json

{

  "id": "refine",

  "loop": {

    "max_iterations": 5,

    "until": "{{gt (toFloat .evaluate.score) 0.8}}",

    "steps": [

      { "id": "draft",    "node": "llm_writer",    "input": { "topic": "{{.input.topic}}" } },

      { "id": "evaluate", "node": "llm_evaluator",  "input": { "draft": "{{.draft.text}}" } }

    ]

  }

}

```
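The control flow of a loop step can be sketched in a few lines of Go. This is an illustration only: `runLoop`, `stepFunc`, and the flat `evaluate.score` key are hypothetical names, and checking the `until` condition after each iteration is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// stepFunc stands in for running the loop's inner steps once. It receives
// the shared context and writes its outputs back into it.
type stepFunc func(ctx map[string]any, iteration int) error

// runLoop repeats body until until(ctx) reports true or maxIterations is
// reached. It returns the number of iterations that ran.
func runLoop(ctx map[string]any, maxIterations int, body stepFunc, until func(map[string]any) bool) (int, error) {
	if maxIterations <= 0 {
		return 0, errors.New("max_iterations must be positive")
	}
	for i := 1; i <= maxIterations; i++ {
		if err := body(ctx, i); err != nil {
			return i, err
		}
		if until(ctx) {
			return i, nil
		}
	}
	return maxIterations, nil
}

func main() {
	ctx := map[string]any{}
	// Hypothetical evaluator: the score improves by 0.3 per iteration.
	body := func(ctx map[string]any, i int) error {
		ctx["evaluate.score"] = 0.3 * float64(i)
		return nil
	}
	done := func(ctx map[string]any) bool {
		score, _ := ctx["evaluate.score"].(float64)
		return score > 0.8
	}
	n, err := runLoop(ctx, 5, body, done)
	fmt.Println(n, err)
}
```

Because every iteration writes into the same context, the condition always sees the latest `evaluate` output, which is what lets `max_iterations` act purely as a safety cap.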

### Map (Fan-out)

Runs a node once per item in an array. Results are collected into a new context key.

```json

{

  "id": "fetch_all",

  "map": {

    "over":        "{{.search.results}}",

    "as":          "result",

    "node":        "fetch_url",

    "concurrency": 3,

    "input": {

      "url": "{{.result.url}}"

    },

    "collect_as": "pages"

  }

}

```

- `as` names the current item for use in `input` templates

- `collect_as` sets the context key where the result array lands

- `concurrency` controls how many items run in parallel (default: 1 = sequential)
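One common way to implement a `concurrency` limit like this is a counting semaphore built from a buffered channel. The sketch below shows that pattern under the assumption that results are collected in input order; `mapConcurrent` is a hypothetical helper, not a function from lcdata.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// mapConcurrent applies fn to every item with at most `concurrency` calls
// in flight and returns the results in input order.
func mapConcurrent[T, R any](items []T, concurrency int, fn func(T) R) []R {
	if concurrency < 1 {
		concurrency = 1 // mirrors the documented default of sequential execution
	}
	results := make([]R, len(items))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` workers are running
		go func(i int, item T) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = fn(item) // each goroutine writes only its own slot
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"https://a.example", "https://b.example", "https://c.example"}
	pages := mapConcurrent(urls, 2, func(u string) string {
		return strings.ToUpper(u) // stand-in for running fetch_url
	})
	fmt.Println(pages)
}
```

Writing each result into a preallocated slot avoids a mutex and keeps the collected array aligned with the `over` array.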

---

## Run Context

All steps in a run share a thread-safe `RunContext`. Steps read via templates and write by returning their output fields. Keys are namespaced by step ID.

```
input.message          → user-provided input
classify.intent        → written by the "classify" step
gather.web.results     → written by branch "web" inside parallel step "gather"
fetch_all.0            → written by map step "fetch_all" for item 0
pages                  → the collected array from a map step's collect_as
```

### Template Functions

| Function | Usage |
|----------|-------|
| `{{.step.field}}` | Access any step output |
| `{{.input.field}}` | Access run inputs |
| `{{toJSON .value}}` | Marshal to JSON string |
| `{{fromJSON .str}}` | Parse JSON string |
| `{{toFloat .value}}` | Convert to float64 |
| `{{toInt .value}}` | Convert to int |
| `{{default val fallback}}` | Use fallback if val is empty/nil |
| `{{join arr sep}}` | Join string array with separator |
| `{{gt a b}}` / `{{lt a b}}` | Comparison (for loop conditions) |

Simple path references like `{{.step.field}}` preserve the original type (array, map, number). Complex templates like `"https://{{.host}}/{{.path}}"` produce strings.
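The type-preservation rule can be approximated with Go's standard `text/template` package: if the whole expression is a single dotted path, return the looked-up value as-is; otherwise render to a string. The sketch below is one possible approach, not lcdata's actual `context.go`; `render`, `lookup`, and the reduced function map are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
	"text/template"
)

// simplePath matches a template that is nothing but one dotted reference.
var simplePath = regexp.MustCompile(`^\{\{\s*\.([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\s*\}\}$`)

// funcs is a small stand-in for the engine's function map.
var funcs = template.FuncMap{
	"toJSON": func(v any) (string, error) {
		b, err := json.Marshal(v)
		return string(b), err
	},
	"join": func(items []string, sep string) string { return strings.Join(items, sep) },
}

// lookup walks a dotted path through nested maps.
func lookup(ctx map[string]any, path string) (any, bool) {
	var cur any = ctx
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// render returns the raw value for a simple path and a string otherwise.
func render(expr string, ctx map[string]any) (any, error) {
	if m := simplePath.FindStringSubmatch(expr); m != nil {
		if v, ok := lookup(ctx, m[1]); ok {
			return v, nil
		}
	}
	tmpl, err := template.New("expr").Funcs(funcs).Parse(expr)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, ctx); err != nil {
		return nil, err
	}
	return buf.String(), nil
}

func main() {
	ctx := map[string]any{
		"input":  map[string]any{"host": "example.com"},
		"search": map[string]any{"results": []string{"a", "b"}},
	}
	v1, _ := render("{{.search.results}}", ctx)
	v2, _ := render(`https://{{.input.host}}/{{join .search.results "-"}}`, ctx)
	fmt.Printf("%T %T\n", v1, v2)
}
```

Returning the raw value for simple paths is what allows an array from one step to feed a map step's `over` field without a JSON round-trip.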

---

## Built-in Nodes

The `nodes/` directory ships with these ready-to-use nodes:

| Node | Type | What it does |
|------|------|-------------|
| `llm_chat` | llm | Streaming chat with Claude (sonnet-4-6) |
| `classify_intent` | llm | Classifies input as: search, database, chat, command |
| `classify_sentiment` | llm | Returns `{sentiment, confidence, explanation}` |
| `extract_entities` | llm | Returns `{people, organizations, locations, dates, topics}` |
| `summarize` | llm | Condenses text to 2-4 sentences |
| `translate` | llm | Translates to any language |
| `fetch_url` | http | Fetches a URL, strips HTML to plain text |
| `web_search` | search | Brave web search → `[{title, url, description}]` |
| `read_file` | file | Reads a file from disk |
| `write_file` | file | Writes a file to disk |
| `research_pipeline` | pipeline | web_search → fetch pages → summarize each → synthesize answer |
| `smart_assistant` | pipeline | Classify intent → route to research, translate, or chat |
| `analyze_document` | pipeline | Read file → parallel analysis → compose report → write file |

### Pipeline: `research_pipeline`

```
web_search
  └── map: fetch_url (concurrency: 3)
        └── map: summarize (concurrency: 3)
              └── llm_chat (synthesize)
```

Returns `{response, search_results, summaries}`.

### Pipeline: `smart_assistant`

```
classify_intent
  └── switch on intent:
        "search"    → research_pipeline
        "translate" → translate
        "default"   → llm_chat
```

Returns `{response, intent}`.

### Pipeline: `analyze_document`

```
read_file
  └── parallel:
        ├── summarize
        ├── extract_entities
        └── classify_sentiment
              └── llm_chat (compose report)
                    └── write_file
```

Returns `{report, output_path, summary, sentiment}`.

---

## API

### Discovery

```
GET /api/nodes           → list all nodes with descriptions and I/O schemas
GET /api/nodes/{name}    → full node spec
GET /api/health          → health check
GET /api/info            → server version and capabilities
```

### Execution

```
POST /api/nodes/{name}/run     → synchronous, waits for full result
POST /api/nodes/{name}/stream  → Server-Sent Events, streams as it runs
GET  /ws/nodes/{name}          → WebSocket, bidirectional streaming
POST /api/nodes/{name}/audio   → multipart upload: POST audio file, runs node with audio_url set
```

**Audio upload** — `multipart/form-data` with fields:

- `audio` (required) — audio file (WAV, MP3, OGG, FLAC, M4A, WebM)

- `env` (optional) — environment name (default: `"default"`)

The file is saved to a temp path and passed as `input.audio_url` to the node. Works with any `stt` node; `whisper-cpp` reads it directly without an HTTP round-trip.

**Request body:**

```json

{

  "input": { "message": "What is the capital of France?" },

  "env":    "default"

}

```

**Response:**

```json

{

  "run_id":      "a3f9b2c1",

  "node":        "smart_assistant",

  "status":      "completed",

  "output": {

    "response": "The capital of France is Paris.",

    "intent":   "chat"

  },

  "steps": [

    { "id": "classify", "node": "classify_intent", "status": "completed", "duration_ms": 180 },

    { "id": "handle",   "node": "llm_chat",         "status": "completed", "duration_ms": 620 }

  ],

  "duration_ms": 800

}

```
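A client call to the synchronous endpoint might be assembled as follows. This is a sketch built only from the request shape shown above; `newRunRequest` and `runRequest` are hypothetical names, and the base URL assumes the default port from `lcdata.json`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// runRequest mirrors the documented request body.
type runRequest struct {
	Input map[string]any `json:"input"`
	Env   string         `json:"env,omitempty"`
}

// newRunRequest builds a POST to /api/nodes/{name}/run. token may be empty
// when the server runs with "require_jwt": false.
func newRunRequest(baseURL, node string, input map[string]any, env, token string) (*http.Request, error) {
	body, err := json.Marshal(runRequest{Input: input, Env: env})
	if err != nil {
		return nil, err
	}
	endpoint, err := url.JoinPath(baseURL, "api", "nodes", node, "run")
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newRunRequest("http://localhost:8080", "smart_assistant",
		map[string]any{"message": "What is the capital of France?"}, "default", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// Send with http.DefaultClient.Do(req) against a running server.
}
```

Separating request construction from sending keeps the example runnable without a server and makes the path and headers easy to inspect.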

### Run Management

```
GET  /api/runs             → list recent runs
GET  /api/runs/{id}        → get run status and full result
POST /api/runs/{id}/cancel → cancel an in-progress run
```

### Streaming Events

All streaming connections (WebSocket and SSE) receive the same event stream:

```json
{"event":"run_started",    "run_id":"abc", "node":"smart_assistant"}
{"event":"step_started",   "run_id":"abc", "step_id":"classify",  "node":"classify_intent"}
{"event":"step_completed", "run_id":"abc", "step_id":"classify",  "output":{"intent":"search"}, "duration_ms":180}
{"event":"step_started",   "run_id":"abc", "step_id":"handle",    "node":"research_pipeline"}
{"event":"chunk",          "run_id":"abc", "step_id":"synthesize","data":"Based on the search results..."}
{"event":"map_progress",   "run_id":"abc", "step_id":"fetch_all", "progress":3, "total":10}
{"event":"step_completed", "run_id":"abc", "step_id":"handle",    "output":{...}, "duration_ms":4200}
{"event":"run_completed",  "run_id":"abc", "output":{...}, "duration_ms":4380}
```

**Event types:** `run_started`, `run_completed`, `run_failed`, `run_cancelled`, `step_started`, `step_completed`, `step_failed`, `chunk`, `loop_iteration`, `map_progress`, `retry`

---

## Configuration

### Server Config (`lcdata.json`)

Created automatically on first run.

```json

{

  "port":                8080,

  "jwt_secret":          "change-this-in-production",

  "require_jwt":         true,

  "nodes_path":          "./nodes",

  "env":                 "default",

  "log_level":           "info",

  "max_concurrent_runs": 10,

  "run_timeout":         "5m",

  "run_history":         100,

  "store_path":          "./lcdata.db",

  "rate_limit_rps":      0,

  "rate_limit_burst":    0

}

```

- **`store_path`** — SQLite file for run history persistence (created automatically)

- **`rate_limit_rps`** — requests per second per JWT `sub` claim (0 = disabled); falls back to remote IP when no JWT is present

- **`rate_limit_burst`** — bucket size for bursts (default: `rps * 2`)

### Credentials (`~/lcdataenv.json`)

Lookup order: `~/lcdataenv.json` → `./nodes/env.json`. All fields also fall back to environment variables.

```json

{

  "environments": {

    "default": {

      "anthropicKey":    "sk-ant-...",

      "ollamaEndpoint":  "http://localhost:11434",

      "openaiKey":       "sk-...",

      "braveKey":        "BSA...",

      "searxngEndpoint": "http://localhost:8888",

      "elevenlabsKey":   "",

      "deepgramKey":     "",

      "whisperCppBin":   "/usr/local/bin/whisper-cli",

      "whisperCppModel": "/path/to/ggml-base.en.bin",

      "piperBin":        "/usr/local/bin/piper",

      "springgEndpoint": "http://localhost:8181",

      "springgKey":      "",

      "dbConnections": {

        "main": "postgres://user:pass@localhost:5432/mydb"

      }

    },

    "production": {

      "anthropicKey": "sk-ant-...",

      "dbConnections": {

        "main": "postgres://user:pass@prod:5432/mydb?sslmode=require"

      }

    }

  }

}

```

**Environment variable fallbacks:**

| Config key | Env var |
|-----------|---------|
| `anthropicKey` | `ANTHROPIC_API_KEY` |
| `openaiKey` | `OPENAI_API_KEY` |
| `ollamaEndpoint` | `OLLAMA_ENDPOINT` |
| `braveKey` | `BRAVE_API_KEY` |
| `searxngEndpoint` | `SEARXNG_ENDPOINT` |
| `elevenlabsKey` | `ELEVENLABS_API_KEY` |
| `deepgramKey` | `DEEPGRAM_API_KEY` |
| `whisperCppBin` | `WHISPER_CPP_BIN` |
| `whisperCppModel` | `WHISPER_CPP_MODEL` |
| `piperBin` | `PIPER_BIN` |
| `springgEndpoint` | `SPRINGG_ENDPOINT` |
| `springgKey` | `SPRINGG_KEY` |

---

## CLI

```
lcdata serve                                          start the HTTP + WebSocket server
lcdata init [name] [type]                             scaffold a new node directory
lcdata list                                           list all nodes
lcdata show [name]                                    show full node config
lcdata run [name] --input key=val --env prod          run a node locally (no server)
lcdata run [name] --input message=-                   read one input value from stdin
lcdata validate                                       validate all node configs
lcdata graph [name]                                   print execution tree with icons
lcdata generate-jwt --client my-service               generate a signed JWT
lcdata generate-jwt --client svc --allow node1,node2  JWT scoped to specific nodes
lcdata version                                        show version
```

**Graph output example:**

```
◼ analyze_document  (pipeline)
├── [read]
│   ▤ read_file  (file)
├── [analyze] parallel (3 branches)
│   ├── branch "summary"
│   │   ◆ summarize  (llm)
│   ├── branch "entities"
│   │   ◆ extract_entities  (llm)
│   └── branch "sentiment"
│       ◆ classify_sentiment  (llm)
├── [compose]
│   ◆ llm_chat  (llm)
└── [write]
    ▤ write_file  (file)
```

**Node type icons:** ◆ llm · ◈ http · ⊕ search · ▤ file · ▶ command · ▣ database · ◇ transform · ◼ pipeline · ◎ stt · ◉ tts

---

## Auth

JWT authentication is enabled by default. Disable with `"require_jwt": false` in `lcdata.json`.

Generate a token:

```bash

lcdata generate-jwt --client my-service

```

Use in requests:

```
Authorization: Bearer <token>
```

---

## Operational Features

### Hot Reload

The server watches the `nodes/` directory with fsnotify. Adding, editing, or removing a node config takes effect within 200ms — no restart required. Discovery endpoints always reflect the current state.

### Run Persistence

Completed runs are persisted to SQLite (`store_path` in config). The `/api/runs` endpoint returns the most recent N runs (`run_history`), merging in-flight runs with persisted ones.

### Cost Tracking

LLM nodes report `input_tokens` and `output_tokens` in their output under a `usage` key. The runner aggregates token counts across all steps and exposes them on the run record:

```json

{

  "run_id": "a3f9b2c1",

  "input_tokens": 1240,

  "output_tokens": 387,

  "steps": [

    { "id": "classify", "input_tokens": 320,  "output_tokens": 12  },

    { "id": "answer",   "input_tokens": 920,  "output_tokens": 375 }

  ]

}

```
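The aggregation is a straightforward sum over steps. The sketch below reproduces the totals from the run record above and adds an optional price estimate; `stepUsage`, `totalUsage`, and `estimateCost` are hypothetical names, and the per-million-token rates are placeholders supplied by the caller, since lcdata is only documented to report counts.

```go
package main

import "fmt"

// stepUsage holds the per-step counters shown in the run record.
type stepUsage struct {
	ID           string
	InputTokens  int
	OutputTokens int
}

// totalUsage sums input and output tokens across all steps.
func totalUsage(steps []stepUsage) (input, output int) {
	for _, s := range steps {
		input += s.InputTokens
		output += s.OutputTokens
	}
	return input, output
}

// estimateCost converts token totals to a price using caller-supplied
// rates per one million tokens.
func estimateCost(input, output int, inputPerM, outputPerM float64) float64 {
	return float64(input)/1e6*inputPerM + float64(output)/1e6*outputPerM
}

func main() {
	steps := []stepUsage{
		{ID: "classify", InputTokens: 320, OutputTokens: 12},
		{ID: "answer", InputTokens: 920, OutputTokens: 375},
	}
	in, out := totalUsage(steps)
	fmt.Println(in, out) // 1240 387, matching the run record above
	fmt.Printf("%.6f\n", estimateCost(in, out, 3.0, 15.0)) // placeholder rates
}
```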

### Retry

Nodes that fail due to transient errors (API timeouts, rate limits) retry automatically with exponential backoff and ±25% jitter. Configure per-node:

```json

{

  "retry_count": 3,

  "retry_delay": "1s"

}

```

`retry` events are emitted on each attempt so streaming clients can observe them.

---

## Project Structure

```
lcdata/
  main.go
  go.mod
  lcdata.json.example
  lcdataenv.json.example
  DESIGN.md
  cmd/
    root.go           Cobra root + global flags
    serve.go          HTTP server, WebSocket, SSE, JWT middleware, rate limiting
    init.go           lcdata init (scaffold node directory)
    list.go           lcdata list
    show.go           lcdata show
    run.go            lcdata run (stdin support via key=-)
    validate.go       lcdata validate
    graph.go          lcdata graph (ASCII tree)
    jwt.go            lcdata generate-jwt (with --allow node scoping)
    version.go        lcdata version
  internal/lcdata/
    config.go           Server config (lcdata.json)
    environment.go      Credentials config (lcdataenv.json)
    node.go             Node struct, JSON loading, field schema, input validation
    pipeline.go         Step, SwitchCase, LoopConfig, MapConfig types
    runner.go           Run lifecycle, async execution, node hot-swap
    watcher.go          fsnotify hot reload (debounced, 200ms)
    store.go            SQLite run history persistence
    retry.go            Exponential backoff with ±25% jitter
    context.go          RunContext, template rendering, type preservation
    stream.go           Event types, Run struct, StepResult (with token counts)
    flow.go             Pipeline execution, switch/parallel/loop/map, on_error
    executor.go         Per-type dispatch with retry wrapper
    executor_llm.go     Anthropic (tool use loop), Ollama, OpenAI (SSE streaming)
    executor_http.go    HTTP requests + HTML stripping
    executor_search.go  Brave API + SearXNG
    executor_file.go    File read/write/append/exists/delete/list
    executor_cmd.go     Command execution with streaming stdout
    executor_xfm.go     Transform (Go template rendering)
    executor_db.go      Database: SQLite + Postgres via database/sql
    executor_stt.go     STT: Deepgram, OpenAI Whisper, whisper.cpp (local+file path)
    executor_tts.go     TTS: ElevenLabs, OpenAI, Piper (local, returns base64 audio)
    executor_vector.go  Vector store: springg (upsert, search, get, delete, create/delete index)
    executor_embed.go   Embeddings: OpenAI, Ollama (returns float64 vector)
  nodes/
    llm_chat/
    classify_intent/
    classify_sentiment/
    extract_entities/
    summarize/
    translate/
    fetch_url/
    web_search/
    read_file/
    write_file/
    research_pipeline/
    smart_assistant/
    analyze_document/
```

---

## Dependencies

```
github.com/anthropics/anthropic-sdk-go  v0.2.0-alpha.4
github.com/fsnotify/fsnotify            v1.9.0
github.com/go-chi/chi/v5                v5.2.3
github.com/go-chi/cors                  v1.2.2
github.com/golang-jwt/jwt/v5            v5.3.0
github.com/gorilla/websocket            v1.5.3
github.com/lib/pq                       v1.10.9
github.com/spf13/cobra                  v1.8.0
modernc.org/sqlite                      v1.37.0
```

Go 1.23+ (uses `log/slog`, `math/rand/v2`)

---

## Comparison

| | lcdata | LangChain | n8n |
|---|---|---|---|
| Config format | JSON files | Python code | Visual UI |
| Add new agent | Drop a folder | Write a class | Drag nodes |
| Streaming | All node types, unified events | Per-chain, varies | Limited |
| Self-describing API | `/api/nodes` live registry | No | Workflow export |
| Flow control | switch/parallel/loop/map in JSON | Graph edges in code | Node connections |
| Provider switch | Change one JSON field | Change class import | Reconfigure credential |
| Deploy | Single Go binary | Python env + deps | Node.js app |
| CLI mode | `lcdata run name` | Script the chain | No |