https://github.com/vinodhalaharvi/sibyl

A multi-agent convergence framework built on Temporal and Go
ai-agents anthropic claude durable-execution golang llm multi-agent temporal workflow
Last synced: 15 days ago
Host: GitHub
URL: https://github.com/vinodhalaharvi/sibyl
Owner: vinodhalaharvi
Created: 2026-05-13T13:32:04.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2026-05-14T13:56:36.000Z (about 1 month ago)
Last Synced: 2026-05-14T14:32:38.648Z (about 1 month ago)
Topics: ai-agents, anthropic, claude, durable-execution, golang, llm, multi-agent, temporal, workflow
Language: Go
Size: 168 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Sibyl

A small, principled multi-agent convergence framework built on
[Temporal](https://temporal.io) and Go.

Two cooperating agents, a **Researcher** and a **Critic**, iterate over a
question until the Critic approves the answer or `MaxRounds` is reached. The
loop itself is a Temporal Workflow (durable, replay-safe), and every LLM call
is a Temporal Activity (retried automatically on transient failures).

## Why this exists

The agent loop (*"call the model, run a tool, reason about the result, call
the model again"*) has the same shape as a long-running orchestration: many
steps, each of which can fail and is expensive, and the whole thing must
survive crashes, restarts, and timeouts. Temporal is purpose-built for that
shape. Sibyl is a small reference implementation of an agent convergence
pattern on top of it.

## Project layout

```
sibyl/
├── agent/                    package agent: workflows, activities, types
│   ├── types.go              Question, Answer, Verdict, Round
│   ├── llm.go                CompleteFunc seam + Middleware + ScriptedLLM
│   ├── lift.go               bridge between weft Arrows and Temporal activities
│   ├── activities.go         Researcher and Critic, composed weft pipelines
│   ├── workflow.go           ConvergeWorkflow, the single-question convergence loop
│   ├── decompose.go          deterministic decompose + synthesize pipelines
│   ├── supervisor.go         SupervisorWorkflow, the fan-out coordinator
│   ├── anthropic.go          Anthropic API client (CompleteFunc)
│   ├── claudecode.go         Local Claude Code CLI client (CompleteFunc)
│   └── *_test.go             unit tests (60 tests, in-process Temporal)
├── worker/
│   └── worker.go             Register() helper to wire Sibyl onto a Temporal worker
├── cmd/
│   ├── worker/main.go        runnable worker (-llm scripted | anthropic | claude-code)
│   ├── ask/main.go           submit a single ConvergeWorkflow
│   └── ask-supervisor/main.go  submit a SupervisorWorkflow (multi-agent fan-out)
├── go.mod / go.sum
├── Makefile
└── README.md
```

## Quick start

You need Go 1.24+ and the Temporal CLI (for the local dev server).

```bash
# 1. Resolve deps and run tests
go mod tidy
go test -race ./...

# 2. In one terminal, start the Temporal dev server
temporal server start-dev --db-filename temporal.db --ui-port 8080

# 3. In a second terminal, start the Sibyl worker
go run ./cmd/worker

# 4. In a third terminal, ask a question
go run ./cmd/ask -q "What is the capital of France?" -rounds 3

# Open http://localhost:8080 to watch the workflow execute live.
```

The bundled `cmd/worker` uses a **ScriptedLLM** by default: a deterministic,
in-memory "model" that returns canned responses. This lets you run the whole
stack end-to-end without API keys. To use a real LLM, pass `-llm`:

```bash
# Use the Anthropic API (requires ANTHROPIC_API_KEY)
go run ./cmd/worker -llm anthropic

# Use your local Claude Code CLI (uses your Pro/Max subscription auth)
go run ./cmd/worker -llm claude-code

# Default: scripted, no network, no auth
go run ./cmd/worker -llm scripted
```

## The CompleteFunc seam

The LLM boundary is a **function type**, not an interface:

```go
type CompleteFunc func(ctx context.Context, systemPrompt, userMessage string) (string, error)
```

A function type is the right tool for a single-method seam in Go: any
compatible method becomes a `CompleteFunc` via a method value, test doubles
can be plain closures, and middleware composes as ordinary function wrapping.
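
As a minimal sketch of that claim, the program below re-declares `CompleteFunc` locally and shows both routes: a method value and a closure. `echoClient`, `ask`, and `demo` are hypothetical names used only for illustration; they are not part of Sibyl.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// CompleteFunc mirrors the signature quoted above.
type CompleteFunc func(ctx context.Context, systemPrompt, userMessage string) (string, error)

// echoClient is a hypothetical backend that upper-cases the user message.
type echoClient struct{ prefix string }

// Complete has the same parameters and results as CompleteFunc, so the
// method value c.Complete is usable wherever a CompleteFunc is expected.
func (c echoClient) Complete(_ context.Context, _, userMessage string) (string, error) {
	return c.prefix + strings.ToUpper(userMessage), nil
}

// ask is a hypothetical consumer that depends only on the function type.
func ask(ctx context.Context, complete CompleteFunc, question string) (string, error) {
	return complete(ctx, "You are a careful researcher.", question)
}

// demo obtains a CompleteFunc in two ways and calls each once.
func demo() (fromMethod, fromStub string) {
	ctx := context.Background()

	// A method value becomes a CompleteFunc with no adapter type.
	fromMethod, _ = ask(ctx, echoClient{prefix: "echo: "}.Complete, "hello")

	// A test double can be a plain closure.
	stub := func(context.Context, string, string) (string, error) { return "stubbed", nil }
	fromStub, _ = ask(ctx, stub, "hello")
	return fromMethod, fromStub
}

func main() {
	a, b := demo()
	fmt.Println(a) // echo: HELLO
	fmt.Println(b) // stubbed
}
```

Neither the backend nor the stub had to declare that it implements anything, which is the practical difference from an interface-based seam.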

Three backends ship in the box:

| Type | Use it for | How |
|---|---|---|
| `ScriptedLLM` | unit tests / offline demos | canned responses, records calls |
| `AnthropicClient` | production / billed API | direct HTTP to `api.anthropic.com` |
| `ClaudeCodeClient` | running on your machine | shells out to `claude -p` |

Each exposes a `Complete` method that satisfies `CompleteFunc`:

```go
c, err := agent.NewAnthropicClient(agent.AnthropicConfig{})
if err != nil {
    log.Fatalf("create Anthropic client: %v", err)
}
sibylworker.Register(w, c.Complete)   // method value -> CompleteFunc
```

### Middleware

Because `CompleteFunc` is a function type, wrapping it is trivial:

```go
// WithLogging returns a Middleware that logs the duration and error of each LLM call.
func WithLogging(log *slog.Logger) agent.Middleware {
    return func(next agent.CompleteFunc) agent.CompleteFunc {
        return func(ctx context.Context, sys, user string) (string, error) {
            start := time.Now()
            out, err := next(ctx, sys, user)
            log.Info("llm call", "took", time.Since(start), "err", err)
            return out, err
        }
    }
}

// Wrap the raw client once, then register the wrapped function.
complete := agent.Chain(rawClient.Complete, WithLogging(logger), WithRateLimit(...))
sibylworker.Register(w, complete)
```
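
For readers curious how a helper like `Chain` can be written, here is a minimal sketch. It is not Sibyl's implementation: the types are re-declared locally, `tag` and `runChain` are hypothetical, and the ordering (first middleware listed is outermost) is an assumption that the real `agent.Chain` may not share.

```go
package main

import (
	"context"
	"fmt"
)

type CompleteFunc func(ctx context.Context, systemPrompt, userMessage string) (string, error)

// Middleware matches the shape returned by WithLogging above.
type Middleware func(next CompleteFunc) CompleteFunc

// Chain is a stand-in for agent.Chain. ASSUMPTION: the first middleware
// listed becomes the outermost wrapper.
func Chain(base CompleteFunc, mws ...Middleware) CompleteFunc {
	wrapped := base
	// Wrap from the last middleware inward so mws[0] ends up outermost.
	for i := len(mws) - 1; i >= 0; i-- {
		wrapped = mws[i](wrapped)
	}
	return wrapped
}

// tag is an illustrative middleware that records when it runs.
func tag(name string, trace *[]string) Middleware {
	return func(next CompleteFunc) CompleteFunc {
		return func(ctx context.Context, sys, user string) (string, error) {
			*trace = append(*trace, name+":before")
			out, err := next(ctx, sys, user)
			*trace = append(*trace, name+":after")
			return out, err
		}
	}
}

// runChain calls a two-layer chain once and returns the reply and the trace.
func runChain() (string, []string) {
	var trace []string
	base := func(context.Context, string, string) (string, error) { return "ok", nil }
	complete := Chain(base, tag("outer", &trace), tag("inner", &trace))
	out, _ := complete(context.Background(), "system", "user")
	return out, trace
}

func main() {
	out, trace := runChain()
	fmt.Println(out, trace)
	// ok [outer:before inner:before inner:after outer:after]
}
```

Ordering matters in practice: under this assumption a logging middleware listed first would time the rate limiter's wait as well as the model call.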

## Composing arrows with weft

Sibyl uses [weft](https://github.com/vinodhalaharvi/weft) as its compositional
layer. Every step inside an activity (prompt building, LLM call, response
parsing) is a `weft.Arrow[A, B]`. The activity body is just `Pipe3` over
three of them:

```go
// agent/activities.go
researcher := weft.Pipe3(
    buildResearchRequest,        // weft.Arrow[ResearchInput, CompletionRequest]
    agent.CompleteAsArrow(c),    // weft.Arrow[CompletionRequest, string]
    weft.Pure(trimResponse),     // weft.Arrow[string, string]
)
```

Why this matters: as we move toward multi-agent systems, the unit of
composition is no longer the activity; it's the Arrow. You can add a
caching layer, swap parsers, or fan out to multiple LLMs in parallel
(`weft.Par`) without rewriting the activity surface. The activity wrapper
just dispatches to whichever arrow you've composed.
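
To make the idea concrete without depending on weft, the sketch below defines its own `Arrow`, `Pure`, and `Pipe3`. These are assumptions about the general shape of such an algebra, not weft's actual definitions, and `CompletionRequest` here is a guessed two-field struct. The input type is simplified to `string` where the real pipeline takes a `ResearchInput`.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Arrow is a hypothetical stand-in for weft.Arrow[A, B]: a fallible,
// context-aware step from A to B. weft's real definition may differ.
type Arrow[A, B any] func(ctx context.Context, in A) (B, error)

// Pure lifts a plain function into an Arrow that never fails.
func Pure[A, B any](f func(A) B) Arrow[A, B] {
	return func(_ context.Context, in A) (B, error) { return f(in), nil }
}

// Pipe3 runs three arrows in sequence and stops at the first error.
func Pipe3[A, B, C, D any](f Arrow[A, B], g Arrow[B, C], h Arrow[C, D]) Arrow[A, D] {
	return func(ctx context.Context, in A) (D, error) {
		var zero D
		b, err := f(ctx, in)
		if err != nil {
			return zero, err
		}
		c, err := g(ctx, b)
		if err != nil {
			return zero, err
		}
		return h(ctx, c)
	}
}

// CompletionRequest is an assumed shape for the value handed to the model.
type CompletionRequest struct{ System, User string }

type CompleteFunc func(ctx context.Context, systemPrompt, userMessage string) (string, error)

// completeAsArrow lifts a CompleteFunc into an Arrow, in the spirit of CompleteAsArrow.
func completeAsArrow(c CompleteFunc) Arrow[CompletionRequest, string] {
	return func(ctx context.Context, r CompletionRequest) (string, error) {
		return c(ctx, r.System, r.User)
	}
}

// research builds the three-step pipeline around a canned model and runs it once.
func research(question string) (string, error) {
	build := Pure(func(q string) CompletionRequest {
		return CompletionRequest{System: "Answer concisely.", User: q}
	})
	model := func(_ context.Context, _, user string) (string, error) {
		return "  answer to: " + user + "\n", nil
	}
	pipeline := Pipe3(build, completeAsArrow(model), Pure(strings.TrimSpace))
	return pipeline(context.Background(), question)
}

func main() {
	out, err := research("What is the capital of France?")
	fmt.Println(out, err)
}
```

Swapping the parser or inserting a caching step means changing one argument to `Pipe3`; the function that calls the pipeline stays the same.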

The `agent` package exposes two adapters (in `lift.go`):

| Adapter              | Direction              | Use it when                                  |
|----------------------|------------------------|----------------------------------------------|
| `CompleteAsArrow`    | `CompleteFunc` → Arrow | Lifting an LLM client into a weft pipeline   |
| `ArrowAsActivity`    | Arrow → activity func  | Registering a composed arrow with Temporal   |

The convergence loop itself remains a Temporal workflow (must be deterministic
for replay), but the work *inside* each round is now expressible in the
broader weft algebra. This is the seam you'd build a multi-agent supervisor
on top of.

## Multi-agent supervision

`SupervisorWorkflow` decomposes a question into subquestions, spawns a child
`ConvergeWorkflow` per subquestion in parallel, waits for all of them, and
synthesizes a final answer.

```
┌──────────────────────────────────────────────────────────────┐
│ SupervisorWorkflow                                           │
│                                                              │
│  1. Decompose activity        question -> []SubQuestion      │
│  2. for each SubQuestion:                                    │
│       ExecuteChildWorkflow(ConvergeWorkflow, ...)            │
│  3. Wait for all children (swallow individual failures)      │
│  4. Synthesize activity       []SubAnswer -> final string    │
└──────────────────────────────────────────────────────────────┘
         │                  │                  │
         ▼                  ▼                  ▼
   ConvergeWorkflow   ConvergeWorkflow   ConvergeWorkflow
   (subquestion 1)    (subquestion 2)    (subquestion N)
```

Run it:

```bash
make ask-supervisor Q="What is Go and how does it compare to Rust"
```

Or directly:

```bash
go run ./cmd/ask-supervisor -q "Postgres vs SQLite vs MySQL for a side project" -rounds 3
```

Each child workflow appears in the Web UI as a separate execution with a
deterministic ID (containing `-sub-`), so you can drill into any
child's event history independently.

**Failure handling.** Individual child failures are recorded in the output
(`SubAnswer.Error`) but don't fail the supervisor. The supervisor only fails
if every child failed, or if decomposition or synthesis itself failed.
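
The failure policy can be sketched in plain Go as follows. This is an illustration only: it uses native goroutines, which are not allowed inside a Temporal workflow (the real supervisor has to go through the SDK's deterministic child-workflow futures), and `superviseAll`, `flakyChild`, and the `SubAnswer` field set are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// SubAnswer carries one child's outcome. Only the Error field is named in
// this README; the other fields are assumed.
type SubAnswer struct {
	Question string
	Answer   string
	Error    string // non-empty when the child failed
}

// superviseAll runs one child per subquestion, records each failure on its
// own SubAnswer instead of aborting, and fails only when every child failed.
func superviseAll(subs []string, child func(string) (string, error)) ([]SubAnswer, error) {
	results := make([]SubAnswer, len(subs))
	var wg sync.WaitGroup
	for i, q := range subs {
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			ans, err := child(q)
			results[i] = SubAnswer{Question: q, Answer: ans}
			if err != nil {
				results[i].Error = err.Error()
			}
		}(i, q) // each goroutine writes only its own slot
	}
	wg.Wait()

	failed := 0
	for _, r := range results {
		if r.Error != "" {
			failed++
		}
	}
	if len(subs) > 0 && failed == len(subs) {
		return results, errors.New("all child workflows failed")
	}
	return results, nil
}

// flakyChild is a hypothetical child that fails for the input "bad".
func flakyChild(q string) (string, error) {
	if q == "bad" {
		return "", errors.New("boom")
	}
	return "answer: " + q, nil
}

func main() {
	res, err := superviseAll([]string{"a", "bad", "c"}, flakyChild)
	fmt.Println(res, err) // one recorded failure, nil error

	_, err = superviseAll([]string{"bad", "bad"}, flakyChild)
	fmt.Println(err) // all child workflows failed
}
```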

**Decomposer.** Sibyl ships a deterministic heuristic decomposer that splits
on `?`, `and`, `vs`, `versus`, `compared to`, `;`. It's pure code, no LLM call,
no flakiness. Swap it for an LLM-backed decomposer by replacing
`decomposeArrow` in `agent/decompose.go`, which is a single `weft.Arrow`.
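
A heuristic of this kind fits in a few lines. The sketch below is one possible reading of the separator list, not the code in `agent/decompose.go`: the regular expression, the case-insensitivity, and the name `decompose` are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// separators matches the split points listed above: "?" and ";" anywhere,
// and the keywords only as whole words. ASSUMPTION: matching ignores case.
var separators = regexp.MustCompile(`(?i)[?;]|\b(?:and|vs|versus|compared to)\b`)

// decompose splits a question into trimmed, non-empty subquestions.
func decompose(question string) []string {
	var subs []string
	for _, part := range separators.Split(question, -1) {
		if part = strings.TrimSpace(part); part != "" {
			subs = append(subs, part)
		}
	}
	return subs
}

func main() {
	for _, s := range decompose("Postgres vs SQLite vs MySQL for a side project") {
		fmt.Println(s)
	}
	// Postgres
	// SQLite
	// MySQL for a side project
}
```

The word-boundary anchors keep words such as "Android" or "canvas" from being split, but a purely lexical rule will still split phrases like "pros and cons", which is one reason to swap in an LLM-backed decomposer.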

**Synthesizer.** Same story: the default synthesizer concatenates child
answers with markdown headings. Replace `synthesizeArrow` for LLM-backed
summarization.
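
A concatenating synthesizer might look like the sketch below. The heading level, the way failed children are rendered, and the `SubAnswer` fields other than `Error` are assumptions, not Sibyl's actual output format.

```go
package main

import (
	"fmt"
	"strings"
)

// SubAnswer is an assumed shape; only the Error field is named in this README.
type SubAnswer struct {
	Question string
	Answer   string
	Error    string
}

// synthesize writes one markdown heading per subquestion followed by its
// answer, or by a short note when that child failed.
func synthesize(answers []SubAnswer) string {
	var b strings.Builder
	for _, a := range answers {
		fmt.Fprintf(&b, "## %s\n\n", a.Question)
		if a.Error != "" {
			fmt.Fprintf(&b, "_No answer: %s_\n\n", a.Error)
			continue
		}
		fmt.Fprintf(&b, "%s\n\n", a.Answer)
	}
	return strings.TrimRight(b.String(), "\n")
}

func main() {
	fmt.Println(synthesize([]SubAnswer{
		{Question: "Postgres", Answer: "A full-featured server database."},
		{Question: "SQLite", Error: "child workflow timed out"},
	}))
}
```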

## How the convergence loop works

```
┌──────────────────────────────────────────────────────────┐
│ ConvergeWorkflow (deterministic Go code)                 │
│                                                          │
│   for round := 1; round <= MaxRounds; round++ {          │
│       candidate := ExecuteActivity(Research, ...)        │
│       verdict   := ExecuteActivity(Critique, candidate)  │
│       if verdict.Approved {                              │
│           return candidate                               │
│       }                                                  │
│       // carry feedback forward to the next round        │
│   }                                                      │
└──────────────────────────────────────────────────────────┘
```

Both `Research` and `Critique` are activities. Their results are recorded in
the workflow's event history, so on a worker crash the workflow resumes
without re-running them.
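
Stripped of Temporal, the control flow in the diagram reduces to the sketch below. Plain functions stand in for the two activities, so nothing here is durable; `converge`, `Result`, and the demo functions are hypothetical, and the input checks mirror the validation cases listed under Testing strategy.

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict follows the Critic's JSON fields.
type Verdict struct {
	Approved   bool
	Confidence float64
	Feedback   string
}

// Result is an assumed output shape with a Converged flag.
type Result struct {
	Answer    string
	Converged bool
	Rounds    int
}

// converge alternates research and critique until approval or maxRounds.
func converge(
	question string,
	maxRounds int,
	research func(question, feedback string) (string, error),
	critique func(question, candidate string) (Verdict, error),
) (Result, error) {
	if question == "" {
		return Result{}, errors.New("question must not be empty")
	}
	if maxRounds <= 0 {
		return Result{}, errors.New("MaxRounds must be positive")
	}
	var res Result
	feedback := ""
	for round := 1; round <= maxRounds; round++ {
		candidate, err := research(question, feedback)
		if err != nil {
			return res, err
		}
		verdict, err := critique(question, candidate)
		if err != nil {
			return res, err
		}
		res = Result{Answer: candidate, Rounds: round}
		if verdict.Approved {
			res.Converged = true
			return res, nil
		}
		feedback = verdict.Feedback // carry feedback forward to the next round
	}
	return res, nil // out of rounds: last candidate, Converged stays false
}

// demoResearch hedges on the first try and commits once it has feedback.
func demoResearch(_, feedback string) (string, error) {
	if feedback == "" {
		return "Paris?", nil
	}
	return "Paris (" + feedback + ")", nil
}

// demoCritique rejects the hedged draft and approves anything else.
func demoCritique(_, candidate string) (Verdict, error) {
	if candidate == "Paris?" {
		return Verdict{Feedback: "be definite"}, nil
	}
	return Verdict{Approved: true, Confidence: 0.9}, nil
}

func main() {
	res, err := converge("What is the capital of France?", 3, demoResearch, demoCritique)
	fmt.Printf("%+v %v\n", res, err)
	// {Answer:Paris (be definite) Converged:true Rounds:2} <nil>
}
```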

The Critic returns structured JSON:

```json
{"approved": true, "confidence": 0.92, "feedback": ""}
```

If the model returns malformed JSON, the activity returns a non-retryable
error (`InvalidLLMResponse`): retrying won't help if the problem lies in the
prompt or the model, and we want to fail fast.
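
A sketch of the parsing step, using only the standard library: the sentinel `ErrInvalidLLMResponse` and the helpers `parseVerdict` and `isRetryable` are hypothetical. In a real activity the error would instead be marked non-retryable through the Temporal SDK, for example with `temporal.NewNonRetryableApplicationError`; how Sibyl does this is not shown in this README.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Verdict matches the JSON object shown above.
type Verdict struct {
	Approved   bool    `json:"approved"`
	Confidence float64 `json:"confidence"`
	Feedback   string  `json:"feedback"`
}

// ErrInvalidLLMResponse marks model output that a retry is unlikely to fix.
var ErrInvalidLLMResponse = errors.New("InvalidLLMResponse")

// parseVerdict decodes the Critic's reply and wraps any decode failure in
// ErrInvalidLLMResponse so callers can tell it apart from transient errors.
func parseVerdict(raw string) (Verdict, error) {
	var v Verdict
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return Verdict{}, fmt.Errorf("%w: %v", ErrInvalidLLMResponse, err)
	}
	return v, nil
}

// isRetryable reports whether an error is worth another attempt.
func isRetryable(err error) bool {
	return err != nil && !errors.Is(err, ErrInvalidLLMResponse)
}

func main() {
	v, err := parseVerdict(`{"approved": true, "confidence": 0.92, "feedback": ""}`)
	fmt.Printf("%+v %v\n", v, err)

	_, err = parseVerdict("Sure! Here is my verdict: approved")
	fmt.Println(isRetryable(err), err)
}
```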

## Testing strategy

Temporal ships an in-process test environment (`testsuite.WorkflowTestSuite`)
that runs workflows and activities without a real server. All Sibyl tests use
it, so they need no Docker, no network, and no API keys:

```bash
go test -race -count=1 ./...
```

The `ScriptedLLM` test double lets each test specify the exact sequence of
LLM responses the workflow will see. Tests cover:

- happy path: converges on round 1
- revision path: converges on round 2 after critic feedback
- max-rounds path: terminates with `Converged: false`
- input validation: empty question, zero MaxRounds
- non-retryable errors: malformed critic JSON fails fast (one call, not five)
- the LLM-call parsing logic in each activity
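
For a sense of what such a double involves, here is a minimal re-creation. It is a sketch written for this README, not the type in `agent/llm.go`: the constructor, the `Calls` field, the exhausted-script behaviour, and the `replay` helper are all assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ScriptedLLM returns canned responses in order and records what it was asked.
type ScriptedLLM struct {
	mu        sync.Mutex
	responses []string
	Calls     []string // user messages seen so far, in call order
}

// NewScriptedLLM builds a double that replies with the given responses in order.
func NewScriptedLLM(responses ...string) *ScriptedLLM {
	return &ScriptedLLM{responses: responses}
}

// Complete has the CompleteFunc signature, so s.Complete can be registered
// in place of a real backend.
func (s *ScriptedLLM) Complete(_ context.Context, _, userMessage string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Calls = append(s.Calls, userMessage)
	if len(s.responses) == 0 {
		return "", errors.New("scripted LLM: script exhausted")
	}
	next := s.responses[0]
	s.responses = s.responses[1:]
	return next, nil
}

// replay sends each prompt to the double and collects the replies.
func replay(llm *ScriptedLLM, prompts ...string) ([]string, error) {
	var out []string
	for _, p := range prompts {
		r, err := llm.Complete(context.Background(), "system", p)
		if err != nil {
			return out, err
		}
		out = append(out, r)
	}
	return out, nil
}

func main() {
	// Script a round-1 convergence: one research reply, one approving verdict.
	llm := NewScriptedLLM("Paris", `{"approved": true, "confidence": 0.9, "feedback": ""}`)
	out, err := replay(llm, "research prompt", "critique prompt")
	fmt.Println(out, err)
	fmt.Println(len(llm.Calls), "calls recorded")
}
```

Because the call log is recorded, a test can assert on how many LLM calls happened, which is how a claim like "one call, not five" can be checked.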

## Production notes

- **Real LLM client.** Implement a `CompleteFunc` against your provider. Keep
  retries inside the provider client minimal; let Temporal's activity retry
  policy handle them.
- **Cost control.** Set `MaxRounds` conservatively. Every round is two LLM
  calls. The Temporal Web UI shows exactly how many calls have happened so far.
- **Long human-in-the-loop.** Add a signal handler (`workflow.GetSignalChannel`)
  to inject human guidance mid-loop. The workflow can block on a signal for
  hours or days without consuming worker resources.
- **Multi-agent fan-out.** For more than two agents, spawn child workflows
  with `workflow.ExecuteChildWorkflow`. Each child has its own event history
  and can crash/recover independently.

## Naming

A Sibyl, in Greek myth, was an oracle who deliberated before speaking. That's
what a convergence loop is: a structured deliberation before an answer is
returned. The library name is intentionally lowercase: `sibyl`.

## License

MIT (or your choice — edit before publishing).