
https://github.com/duriantaco/marlin

Typed AI workflow compiler/runtime in Rust with durable execution and schema-first flows

ai-workflows compiler developer-tools durable-execution llm runtime rust type-system workflow-engine


# Marlin

Marlin is a small language and local-first runtime for long-running AI workflows that need to pause, survive restarts, and resume safely with typed state.

Use Marlin when your workflow cannot afford to lose its place.

## Why Marlin Exists

Typical workflow today:

```text
LLM/API call -> tool/API call -> wait for human -> process crashes -> rebuild state by hand or replay work
```

With Marlin:

```text
effectful steps run -> workflow enters a durable wait state -> run state and effect log are persisted -> restart later -> continue or resume from the saved workflow state
```

That is a real runtime behavior in this repo:

- wait steps persist a durable waiting state to disk
- completed steps stay recorded in the persisted run
- `continue` and `resume` reload that persisted run instead of starting from scratch
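
A minimal Rust sketch of that idea, assuming a hypothetical plain-text `RunRecord` format (Marlin's actual on-disk format is not reproduced here): completed steps and the current wait step are written to disk, and after a restart the next step is computed from the saved record instead of from the start of the plan. The step names are illustrative.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Simplified, hypothetical run record. Marlin's real persisted format lives in
/// `crates/marlin-runtime`; this only models "progress survives a restart".
#[derive(Debug, Clone, PartialEq)]
pub struct RunRecord {
    /// Steps that finished; a `continue` must not execute them again.
    pub completed: Vec<String>,
    /// Step the run is parked at while in a durable wait state, if any.
    pub waiting_at: Option<String>,
}

impl RunRecord {
    /// Encode as plain lines: `done <step>` per completed step, then `wait <step>`.
    fn to_text(&self) -> String {
        let mut out = String::new();
        for step in &self.completed {
            out.push_str(&format!("done {step}\n"));
        }
        if let Some(step) = &self.waiting_at {
            out.push_str(&format!("wait {step}\n"));
        }
        out
    }

    /// Decode the format written by `to_text`, ignoring unknown lines.
    fn from_text(text: &str) -> RunRecord {
        let mut record = RunRecord { completed: Vec::new(), waiting_at: None };
        for line in text.lines() {
            if let Some(step) = line.strip_prefix("done ") {
                record.completed.push(step.to_string());
            } else if let Some(step) = line.strip_prefix("wait ") {
                record.waiting_at = Some(step.to_string());
            }
        }
        record
    }
}

/// Persist the record. A production runtime would also write atomically and fsync.
pub fn save(path: &Path, record: &RunRecord) -> io::Result<()> {
    fs::write(path, record.to_text())
}

/// Reload the saved run after a restart instead of starting from scratch.
pub fn load(path: &Path) -> io::Result<RunRecord> {
    Ok(RunRecord::from_text(&fs::read_to_string(path)?))
}

/// First step of the plan that has not completed yet.
pub fn next_step<'a>(plan: &[&'a str], record: &RunRecord) -> Option<&'a str> {
    plan.iter()
        .copied()
        .find(|step| !record.completed.iter().any(|done| done == step))
}
```

On restart, a caller would `load` the saved record and pass the prepared plan to `next_step`, so completed steps are skipped rather than executed again.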

Important caveat:

- Marlin does **not** promise that nothing ever re-runs
- interrupted `Pure` steps may replay
- interrupted `Effectful` steps may require explicit recovery instead of blind replay
- committed wait steps resume from saved state, but interrupted waits still follow recovery rules

The exact implementation lives in:

- [`crates/marlin-runtime/src/lib.rs`](./crates/marlin-runtime/src/lib.rs)
- [`crates/marlin-nodes/src/lib.rs`](./crates/marlin-nodes/src/lib.rs)
- [`examples/review_wait.marlin`](./examples/review_wait.marlin)
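
The caveats above amount to a per-step decision made at restart time. A minimal Rust sketch of that decision, assuming hypothetical `StepKind`, `Phase`, and `Recovery` types; the real rules live in `crates/marlin-runtime/src/lib.rs`. Routing an interrupted (uncommitted) wait to explicit recovery is a conservative assumption of this sketch, not documented behavior.

```rust
/// Step kinds named in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StepKind {
    Pure,
    Effectful,
    Wait,
}

/// Hypothetical persisted phase of a step at the moment the process restarted.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    Completed,
    InFlight,
    WaitCommitted,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Recovery {
    /// Keep the saved result; nothing re-runs.
    KeepSaved,
    /// Safe to execute again because the step has no side effects.
    Replay,
    /// Pick up from the persisted wait state.
    ResumeWait,
    /// Block until an operator or an idempotent handler resolves it.
    NeedsExplicitRecovery,
}

pub fn recover(kind: StepKind, phase: Phase) -> Recovery {
    match (kind, phase) {
        (_, Phase::Completed) => Recovery::KeepSaved,
        (StepKind::Wait, Phase::WaitCommitted) => Recovery::ResumeWait,
        (StepKind::Pure, Phase::InFlight) => Recovery::Replay,
        // Interrupted effects (and, in this sketch, interrupted waits) are never
        // blindly replayed.
        _ => Recovery::NeedsExplicitRecovery,
    }
}
```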

One concrete example:

- a plugin-backed LLM/API step drafts an outbound sales message
- a human approves it
- a second plugin-backed LLM/API step finalizes the approved message
- the workflow pauses again before dispatch
- after restart, Marlin reloads that saved run instead of replaying the finalized message step

The worked version of that story lives in [`docs/why-marlin.md`](./docs/why-marlin.md).

If you want the easiest real app example to copy, start with:

- built-in support triage app: [`examples/real_world/support_triage`](./examples/real_world/support_triage)

The repo also ships the longer outbound comparison:

- without Marlin: [`examples/real_world/without_marlin`](./examples/real_world/without_marlin)
- with Marlin: [`examples/real_world/with_marlin`](./examples/real_world/with_marlin)

## What To Measure

Marlin targets the brittleness of long-running workflows, not raw model throughput.

So the most important metrics are:

- `expensive_step_replay_rate` after restart
- `manual_recovery_rate` after interrupted work
- `operator_visibility` of current step, completed steps, pending steps, and last effect events
- `time_to_recover` from restart to a useful inspectable state
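
The README does not define `expensive_step_replay_rate` precisely. A minimal Rust sketch under one assumed definition: extra executions of expensive steps divided by the number of expensive steps, computed from per-step attempt counts such as the ones `inspect` and `status` report. The `StepAttempts` type and its fields are hypothetical.

```rust
/// Attempt counts per step, as an operator might read them from `inspect`
/// or `status --json`. Field names are hypothetical.
pub struct StepAttempts<'a> {
    pub step: &'a str,
    /// Whether the step is expensive (for example an LLM/API call).
    pub expensive: bool,
    /// Total executions, including replays after a restart.
    pub attempts: u32,
}

/// Assumed definition: extra executions of expensive steps divided by the
/// number of expensive steps. 0.0 means nothing expensive ran twice.
pub fn expensive_step_replay_rate(steps: &[StepAttempts<'_>]) -> f64 {
    let (count, replays) = steps
        .iter()
        .filter(|s| s.expensive)
        .fold((0u32, 0u32), |(n, r), s| (n + 1, r + s.attempts.saturating_sub(1)));
    if count == 0 {
        0.0
    } else {
        f64::from(replays) / f64::from(count)
    }
}
```

With the observed results below, the run without Marlin (`finalize_attempts = 2`) scores 1.0 and the run with Marlin (`workflow.finalize.attempts = 1`) scores 0.0.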

Useful secondary metrics for later:

- `draft_calls_avoided`
- `tokens_avoided`
- `cost_avoided_usd`

Marlin now surfaces token/cost-style usage data in `inspect` when a node records it in effect detail.
The real-world `DraftOutboundMessage` plugin example does this today.

## Observed Example Results

These long-running comparison results came from a manual local rerun against [`examples/real_world/fake_openai_server.py`](./examples/real_world/fake_openai_server.py), so both sides used the same HTTP-backed draft step.

Important:

- token counts are real observed values from that rerun
- USD figures are estimates derived from the example's configured token pricing, not billed provider invoices
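
A minimal Rust sketch of how such an estimate is typically derived: token counts multiplied by a configured per-token price. The `Pricing` and `Usage` types and the prices in the usage example are hypothetical; the example's real pricing configuration is not reproduced here, so this does not recompute the figures below.

```rust
/// Hypothetical pricing config, in USD per one million tokens.
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub input_usd_per_million: f64,
    pub output_usd_per_million: f64,
}

/// Token usage as a provider reports it for one call.
#[derive(Debug, Clone, Copy)]
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

impl Usage {
    pub fn total_tokens(&self) -> u64 {
        self.prompt_tokens + self.completion_tokens
    }

    /// An estimate, not an invoice: tokens times the configured price.
    pub fn estimated_usd(&self, p: Pricing) -> f64 {
        (self.prompt_tokens as f64 * p.input_usd_per_million
            + self.completion_tokens as f64 * p.output_usd_per_million)
            / 1_000_000.0
    }
}
```

For example, 30 prompt tokens and 19 completion tokens at a made-up 1 and 2 USD per million give 49 total tokens and an estimate of 0.000068 USD.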

Observed result without Marlin:

- initial draft cost:
  - `49` total tokens
  - `0.000041` USD estimated from configured pricing
- after manager approval, the app ran a second expensive finalize step
- the process crashed after finalize completed but before that result became durable workflow state
- `recover` had to replay finalize
- final stored record showed:
  - `finalize_attempts = 2`
  - `finalize_recovery_replays = 1`
  - `finalize_total_tokens_total = 170`
  - `finalize_cost_usd_estimate_total = 0.0001328`
  - `dispatch_status = pending`
  - `recovery_status = manual_recovery_replayed_finalize`
- extra post-crash finalize cost:
  - `85` extra tokens
  - `0.0000664` USD estimated from configured pricing

Observed result with Marlin:

- the draft + finalize steps together reported:
  - `134` total tokens
  - `0.000107` USD estimated from configured pricing
- after `approve`, the run entered a second durable wait at `workflow.dispatch`
- after cold restart, `continue` reloaded the same saved run at `workflow.dispatch`
- after `dispatch`, the run completed with:
  - `workflow.finalize.attempts = 1`
  - `workflow.notify.status = completed`
- extra post-restart finalize cost:
  - `0` extra tokens
  - `0` extra USD estimated

That is the current proof point: Marlin keeps completed expensive step state on disk across restarts, so the workflow does not lose its place at later wait boundaries either.
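
The two sets of numbers are consistent with each other. A minimal Rust check of that arithmetic, assuming each finalize attempt costs the same and that the draft step used the same 49 tokens on both sides: 170 / 2 = 85 tokens per finalize attempt without Marlin, and 134 - 49 = 85 finalize tokens with Marlin, so the crash overhead is exactly one extra attempt.

```rust
// Figures copied from the observed results above.
pub const DRAFT_TOKENS: u64 = 49;
pub const WITHOUT_FINALIZE_TOKENS_TOTAL: u64 = 170; // across both finalize attempts
pub const WITHOUT_FINALIZE_ATTEMPTS: u64 = 2;
pub const WITHOUT_FINALIZE_USD_TOTAL: f64 = 0.0001328;
pub const WITH_DRAFT_PLUS_FINALIZE_TOKENS: u64 = 134;

/// Tokens per finalize attempt without Marlin (assumes equal-cost attempts).
pub fn finalize_tokens_per_attempt() -> u64 {
    WITHOUT_FINALIZE_TOKENS_TOTAL / WITHOUT_FINALIZE_ATTEMPTS
}

/// Finalize tokens with Marlin: the combined figure minus the draft step.
pub fn finalize_tokens_with_marlin() -> u64 {
    WITH_DRAFT_PLUS_FINALIZE_TOKENS - DRAFT_TOKENS
}

/// Crash overhead in tokens: every attempt beyond the first.
pub fn extra_tokens(per_attempt: u64, attempts: u64) -> u64 {
    per_attempt * attempts.saturating_sub(1)
}

/// Crash overhead in estimated USD, splitting the total evenly per attempt.
pub fn extra_usd(total_usd: f64, attempts: u64) -> f64 {
    total_usd / attempts as f64 * attempts.saturating_sub(1) as f64
}
```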

## Status

Marlin is a very early beta.

This repository is:

- unstable
- incomplete
- still evolving at the language and runtime boundary

Do not treat the current syntax, IR, runtime artifact, or CLI output as stable yet.

## What Marlin Is

Marlin is meant to be:

- a small language
- a type checker and validator
- a durable execution engine
- a node/plugin system for AI workflows
- an additive inference-aware execution layer through `marlin-llm` and `marlin-gateway`
- a CLI-first tool

## What Marlin Is Not

Marlin does **not** make the LLM itself infer tokens faster.

It does not make GPT, Claude, or other models run faster at the GPU level.

What it can do is make **systems around LLMs** faster and better:

- faster to describe
- faster to validate
- faster to iterate on
- less wasteful in retries and bad wiring
- more reliable when humans, tools, and long-running waits are involved
- easier to recover after a crash because workflow state is persisted to disk

So the promise is not "faster neural nets".

The promise is closer to:

> a better language and runtime for building LLM-driven systems

Today that also includes one real execution path for model calls:

```text
workflow -> built-in LLMCall or plugin -> marlin_gateway_openai -> OpenAI-compatible upstream
```

That path gives Marlin a place to normalize:

- backend and model choice
- usage and cost metadata
- latency
- deterministic route selection

without pretending Marlin itself is a serving engine.
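
"Deterministic route selection" means the same request always lands on the same backend and model. A minimal Rust sketch of first-match rule routing, assuming hypothetical `Rule` and `Request` types and made-up backend and model names; the real policy format is loaded by `crates/marlin-gateway` and described in [`docs/llm-gateway.md`](./docs/llm-gateway.md).

```rust
/// Hypothetical routing rule. `None` fields match any request.
pub struct Rule<'a> {
    pub task: Option<&'a str>,
    pub max_prompt_chars: Option<usize>,
    pub backend: &'a str,
    pub model: &'a str,
}

/// The parts of a request this sketch routes on.
pub struct Request<'a> {
    pub task: &'a str,
    pub prompt: &'a str,
}

/// First matching rule wins: no randomness and no load-based choice, so the
/// route is a pure function of the rules and the request.
pub fn route<'a>(rules: &[Rule<'a>], req: &Request<'_>) -> Option<(&'a str, &'a str)> {
    rules
        .iter()
        .find(|r| {
            r.task.map_or(true, |t| t == req.task)
                && r.max_prompt_chars.map_or(true, |n| req.prompt.chars().count() <= n)
        })
        .map(|r| (r.backend, r.model))
}
```

Because the result depends only on its inputs, a recorded route can be compared against a re-computed one when a persisted run is inspected.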

## What Is Being Done Right Now

The current work is focused on the core, not product surfaces.

Active work:

- tightening `marlin-lang`
- benchmarking Marlin over stable workload shapes and persisted runtime lifecycle phases
- keeping one semantic source of truth
- locking the typed IR and prepared-program boundary
- hardening persisted recovery semantics
- expanding runtime/operator measurements beyond compiler phases
- making the LLM execution path more inference-aware through `marlin-llm` and `marlin-gateway`
- surfacing richer usage and routing metadata to operators through `inspect`

Not being built yet:

- frontend studio
- cloud control plane
- browser extension
- multi-service product surface

## Install

Prerequisite:

- Rust toolchain with `cargo`

Install the CLI directly from GitHub:

```bash
cargo install --locked --git https://github.com/duriantaco/marlin.git marlin-cli
```

Check that the binary is available:

```bash
marlin --help
```

## Documentation

Long-form docs live in [`docs/`](./docs) and are set up to publish to GitHub Pages from [`.github/workflows/docs.yml`](./.github/workflows/docs.yml).

The docs are written to treat the codebase as the source of truth. Start with:

- why Marlin: [`docs/why-marlin.md`](./docs/why-marlin.md)
- real-world examples: [`examples/real_world/README.md`](./examples/real_world/README.md)
- real-world built-in app example: [`examples/real_world/support_triage/README.md`](./examples/real_world/support_triage/README.md)
- built-in LLM example with typed structured output: [`examples/llm_call_builtin.marlin`](./examples/llm_call_builtin.marlin)
- landing page: [`docs/index.md`](./docs/index.md)
- quickstart: [`docs/getting-started.md`](./docs/getting-started.md)
- integration guide: [`docs/integrating-marlin.md`](./docs/integrating-marlin.md)
- plugin guide: [`docs/plugins.md`](./docs/plugins.md)
- LLM gateway path: [`docs/llm-gateway.md`](./docs/llm-gateway.md)
- worked examples: [`docs/examples.md`](./docs/examples.md)
- runtime behavior: [`docs/runtime-semantics.md`](./docs/runtime-semantics.md)

Supporting docs:

- language/runtime boundary: [`docs/language-and-artifact.md`](./docs/language-and-artifact.md)
- CLI: [`docs/cli.md`](./docs/cli.md)
- limits: [`docs/limits-and-guarantees.md`](./docs/limits-and-guarantees.md)

## How To Use It

Prerequisite:

- Rust toolchain with `cargo`

From the repo root, validate the example program:

```bash
cargo run -p marlin-cli -- check examples/lead_review.marlin
```

Prepare the example program for runtime:

```bash
cargo run -p marlin-cli -- run examples/lead_review.marlin
```

Run the built-in `LLMCall` example against a local OpenAI-compatible endpoint:

```bash
# Start the fake server in the background (or run it in a separate terminal)
/usr/bin/env python3 examples/real_world/fake_openai_server.py --port 8080 &
cargo run -p marlin-cli -- run examples/llm_call_builtin.marlin
```

That example uses `output_schema: "Answer"` so downstream bindings like `call.structured.answer`
are type-checked, while `call.text` still stays available for plain-text flows.
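
A minimal Rust sketch of what type-checking such a binding involves, assuming a hypothetical schema table that maps a schema name like `Answer` to field types. It only models the rule stated above: `call.text` is always valid, while `call.structured.<field>` is valid only when the declared `output_schema` has that field. It is not Marlin's checker.

```rust
use std::collections::HashMap;

/// Field types in this toy schema model (Marlin's real type system is richer).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Str,
    Bool,
}

/// Hypothetical compile-time check for a binding on an `LLMCall`-style output.
pub fn check_binding(
    schemas: &HashMap<&str, HashMap<&str, Ty>>,
    output_schema: Option<&str>,
    path: &str,
) -> Result<Ty, String> {
    let parts: Vec<&str> = path.split('.').collect();
    match parts.as_slice() {
        // Plain text stays available with or without a schema.
        [_, "text"] => Ok(Ty::Str),
        // Structured fields are valid only if the declared schema has them.
        [_, "structured", field] => {
            let name = output_schema.ok_or("no output_schema declared")?;
            let fields = schemas
                .get(name)
                .ok_or_else(|| format!("unknown schema {name}"))?;
            fields
                .get(field)
                .copied()
                .ok_or_else(|| format!("schema {name} has no field {field}"))
        }
        _ => Err(format!("unsupported binding {path}")),
    }
}
```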

Run the simplest app-level example:

```bash
cd examples/real_world/support_triage
# Start the fake server in the background (or run it in a separate terminal)
/usr/bin/env python3 ../fake_openai_server.py --port 8080 &
/usr/bin/env python3 app.py start \
  ticket-123 \
  "Taylor at ExampleCo" \
  "Locked out after password reset" \
  "We reset the password twice and still cannot sign in from either laptop."
/usr/bin/env python3 app.py status ticket-123
```

That example shows the same typed `LLMCall` surface inside a thin app wrapper that only stores
`ticket_id -> run_dir` and resumes `workflow.review` when a human decides.
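
The shipped wrapper is Python (`app.py`). A minimal Rust analogue of the same pattern, with a hypothetical `TicketRuns` type: the app keeps only `ticket_id -> run_dir`, and a human decision becomes a `resume` of `workflow.review` with a JSON boolean payload, mirroring the `resume` command shown later in this README. It assumes the installed `marlin` binary and an illustrative run directory name.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical thin wrapper: the app stores only ticket_id -> run_dir,
/// while Marlin owns all workflow state inside that run directory.
#[derive(Default)]
pub struct TicketRuns {
    runs: HashMap<String, PathBuf>,
}

impl TicketRuns {
    /// Remember which persisted run directory belongs to a ticket.
    pub fn record(&mut self, ticket_id: &str, run_dir: PathBuf) {
        self.runs.insert(ticket_id.to_string(), run_dir);
    }

    /// Build the argv that resumes the `workflow.review` wait step.
    /// `true` / `false` are valid JSON payloads. Hand the result to
    /// std::process::Command to actually run it.
    pub fn resume_args(&self, ticket_id: &str, approved: bool) -> Option<Vec<String>> {
        let run_dir = self.runs.get(ticket_id)?;
        Some(vec![
            "marlin".to_string(),
            "resume".to_string(),
            run_dir.display().to_string(),
            "workflow.review".to_string(),
            approved.to_string(),
        ])
    }
}
```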

Explain what Marlin will prepare from source:

```bash
cargo run -p marlin-cli -- explain examples/lead_review.marlin
```

Run a workflow that enters a wait state:

```bash
cargo run -p marlin-cli -- run examples/review_wait.marlin
```

If your workflow uses external plugin nodes, Marlin discovers them from:

- `MARLIN_PLUGIN_DIRS`
- `.marlin/plugins` under the current project tree or run directory tree

Read [`docs/plugins.md`](./docs/plugins.md) for the manifest format and runtime contract.
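
A minimal Rust sketch of collecting candidate plugin directories from those two sources. The ordering (environment variable first) and the reading of "under the current project tree" as "a `.marlin/plugins` directory in the start directory or one of its ancestors" are assumptions of this sketch; the real lookup lives in `crates/marlin-plugin`.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Candidate plugin directories, assuming:
/// 1. entries of MARLIN_PLUGIN_DIRS (OS path-list syntax) come first, then
/// 2. any existing `.marlin/plugins` in `start` or one of its ancestors.
pub fn plugin_dirs(env_value: Option<&OsStr>, start: &Path) -> Vec<PathBuf> {
    let mut dirs: Vec<PathBuf> = env_value
        .map(|v| {
            env::split_paths(v)
                .filter(|p| !p.as_os_str().is_empty())
                .collect::<Vec<PathBuf>>()
        })
        .unwrap_or_default();
    for dir in start.ancestors() {
        let candidate = dir.join(".marlin").join("plugins");
        if candidate.is_dir() {
            dirs.push(candidate);
        }
    }
    dirs
}
```

A caller might invoke it as `plugin_dirs(env::var_os("MARLIN_PLUGIN_DIRS").as_deref(), &env::current_dir()?)`.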

If you want the shipped LLM/API execution path, read [`docs/llm-gateway.md`](./docs/llm-gateway.md), the built-in app example under [`examples/real_world/support_triage`](./examples/real_world/support_triage), and the plugin-backed comparison under [`examples/real_world/with_marlin`](./examples/real_world/with_marlin).

Inspect a persisted run:

```bash
cargo run -p marlin-cli -- inspect .marlin/runs/
```

Get the same lifecycle state in machine-readable form:

```bash
cargo run -p marlin-cli -- status .marlin/runs/ --json
```

Resume a waiting run:

```bash
cargo run -p marlin-cli -- resume .marlin/runs/ workflow.review true
```

Continue a persisted run after restart:

```bash
cargo run -p marlin-cli -- continue .marlin/runs/
```

Cancel an active run:

```bash
cargo run -p marlin-cli -- cancel .marlin/runs/ "operator stop"
```

Run the workspace tests:

```bash
cargo test --workspace
```

Run the benchmark harness:

```bash
cargo run --release -p marlin-bench -- --format markdown
```

List the benchmark workloads:

```bash
cargo run --release -p marlin-bench -- --list-workloads
```

What the CLI currently does:

- `check` parses source, loads built-ins plus discovered plugins, and lowers source into validated typed IR
- `check`, `run`, `continue`, `resume`, `status`, and `cancel` now also support `--json` for machine-readable lifecycle output
- `explain` compiles and prepares a workflow, then prints the prepared step plan and runtime-facing metadata
- `run` prepares the program, persists a run record under `.marlin/runs`, and either completes or enters a wait state
- `continue` reloads a persisted run, reapplies recovery rules, re-checks provider/version and executor provenance compatibility, and can re-check waiting runs against persisted deadlines
- `inspect` loads a persisted run plus recent effect-log entries and now starts with the current status, next recommended action, run-level usage, and latest LLM/routing summary before the per-step detail dump
- `status` reads a persisted run directory and now surfaces the current runtime state, waiting step/kind, and next recommended operator action alongside total attempts, retried steps, and timed-out steps
- `resume` resumes a waiting step from disk with a JSON payload after verifying the persisted node provider and executor provenance still match the loaded runtime
- `cancel` marks active work cancelled and persists that terminal state
- pure steps can retry from persisted state when their manifest retry policy allows it
- pure and wait steps can time out from persisted state when their manifest timeout policy allows it
- external plugin executors run through the subprocess contract in `crates/marlin-plugin`
- gateway-backed LLM plugins can route OpenAI-compatible calls through `marlin_gateway_openai` with deterministic rule-based routing
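
The provider/version and executor provenance checks that `continue` and `resume` perform can be pictured as a comparison between what the run recorded and what the runtime loaded now. A minimal Rust sketch with a hypothetical `Provenance` type; strict string equality is an assumption here, and Marlin's actual compatibility rules may differ.

```rust
/// Hypothetical provenance stamp persisted alongside each step of a run.
#[derive(Debug, Clone, PartialEq)]
pub struct Provenance {
    /// Who supplies the node, e.g. "builtin" or a plugin name.
    pub provider: String,
    pub version: String,
    /// Identity of the executor, e.g. a digest of the plugin command.
    pub executor: String,
}

/// Refuse to continue or resume when the loaded runtime differs from the one
/// that wrote the run, instead of silently running different code.
pub fn check_compatible(saved: &Provenance, loaded: &Provenance) -> Result<(), String> {
    if saved.provider != loaded.provider {
        return Err(format!("provider changed: {} -> {}", saved.provider, loaded.provider));
    }
    if saved.version != loaded.version {
        return Err(format!("version changed: {} -> {}", saved.version, loaded.version));
    }
    if saved.executor != loaded.executor {
        return Err("executor provenance changed".to_string());
    }
    Ok(())
}
```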

What it does not do yet:

- production-grade replay guarantees
- robust effect recovery for real external systems
- hard timeout preemption for arbitrary executor code
- automatic retries or timeout enforcement for effectful steps
- stable public language/runtime guarantees

Runtime limits that matter today:

- retry policy is implemented only for `Pure` steps
- timeout policy is implemented only for `Pure` and `Wait` steps
- effectful executors still need explicit idempotent recovery instead of automatic replay
- a restart during an in-flight effectful step may block for recovery instead of blindly continuing
- source syntax does not yet expose retry/timeout configuration directly; today those policies come through manifests/catalog entries
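
A minimal Rust sketch of those limits as eligibility checks, assuming a hypothetical `Policy` shape for the manifest-supplied values: retries apply only to `Pure` steps, and timeouts are deadline checks against persisted timestamps for `Pure` and `Wait` steps, not preemption of running executor code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StepKind {
    Pure,
    Effectful,
    Wait,
}

/// Hypothetical manifest/catalog policy (not expressible in source syntax yet).
#[derive(Debug, Clone, Copy, Default)]
pub struct Policy {
    pub max_retries: u32,
    pub timeout_secs: Option<u64>,
}

/// Retries apply only to Pure steps. `attempts_so_far` includes the first run,
/// so a step may retry while `attempts_so_far - 1 < max_retries`.
pub fn may_retry(kind: StepKind, policy: Policy, attempts_so_far: u32) -> bool {
    kind == StepKind::Pure && attempts_so_far <= policy.max_retries
}

/// Deadline check from persisted timestamps (seconds) for Pure and Wait steps.
pub fn timed_out(kind: StepKind, policy: Policy, started_at: u64, now: u64) -> bool {
    let eligible = matches!(kind, StepKind::Pure | StepKind::Wait);
    match policy.timeout_secs {
        Some(limit) if eligible => now.saturating_sub(started_at) > limit,
        _ => false,
    }
}
```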

## Initial Scope

The first version of Marlin should avoid product bloat.

We start with:

- one Rust workspace
- one CLI
- one semantic core
- one runtime core
- no frontend
- no browser extension
- no cloud control plane

Those can come later if the core proves worth it.

## Workspace Layout

`crates/marlin-lang`

- syntax
- AST
- validation
- typed IR
- runtime preparation

`crates/marlin-runtime`

- execution model over flattened plans
- durable state and effect log
- recovery rules for restarted runs
- scheduler interfaces

`crates/marlin-nodes`

- built-in manifests and runtime executors

`crates/marlin-plugin`

- plugin manifest loading
- subprocess executor protocol
- external node registry wiring

`crates/marlin-llm`

- shared `LLMCall` request/response/accounting contract
- normalized routing, cache, and budget hint types

`crates/marlin-gateway`

- OpenAI-compatible LLM execution layer
- routing policy loading
- usage/cost/latency normalization

`crates/marlin-cli`

- `marlin check`
- `marlin explain`
- `marlin run`
- `marlin continue`
- `marlin inspect`
- `marlin status`
- `marlin resume`
- `marlin cancel`

`crates/marlin-bench`

- benchmark harness for Marlin compiler and persisted runtime lifecycle cost
- stable workload suite for tracking performance over time

## Current State

This repository is intentionally minimal.

The current scaffold exists to lock the semantic/runtime boundary before we commit to syntax sugar or UI. Today that boundary is:

- source text -> AST
- AST -> typed IR
- typed IR -> prepared program
- runtime -> schedules persisted execution over the prepared program

That is the contract the current compiler/runtime work is optimizing around.
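
A minimal Rust sketch of that staged boundary, where each stage can only be built from the previous one. The `Ast`, `TypedIr`, and `PreparedProgram` types and the one-`step <name>`-per-line toy grammar are hypothetical and are **not** Marlin's syntax or IR; they only illustrate why a typed boundary stops an unvalidated program from reaching the runtime.

```rust
/// Hypothetical stage types; the real ones live in `crates/marlin-lang`.
pub struct Ast {
    pub steps: Vec<String>,
}
pub struct TypedIr {
    pub steps: Vec<String>,
}
pub struct PreparedProgram {
    pub plan: Vec<String>,
}

/// Source text -> AST. Toy grammar: one `step <name>` per non-empty line.
pub fn parse(source: &str) -> Result<Ast, String> {
    let mut steps = Vec::new();
    for (i, line) in source.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        match line.strip_prefix("step ") {
            Some(name) if !name.trim().is_empty() => steps.push(name.trim().to_string()),
            _ => return Err(format!("line {}: expected `step <name>`", i + 1)),
        }
    }
    Ok(Ast { steps })
}

/// AST -> typed IR. Validation stand-in: reject duplicate step names.
pub fn lower(ast: Ast) -> Result<TypedIr, String> {
    for (i, step) in ast.steps.iter().enumerate() {
        if ast.steps[..i].contains(step) {
            return Err(format!("duplicate step {step}"));
        }
    }
    Ok(TypedIr { steps: ast.steps })
}

/// Typed IR -> prepared program: a flat, qualified step plan for the runtime.
pub fn prepare(ir: TypedIr) -> PreparedProgram {
    PreparedProgram {
        plan: ir.steps.iter().map(|s| format!("workflow.{s}")).collect(),
    }
}
```

Chained as `parse(src).and_then(lower).map(prepare)`, an invalid program fails before any runtime scheduling happens.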

## Current Language Direction

Marlin is aiming for:

- schema-first data shapes
- reusable groups/modules
- null-aware flow semantics
- one canonical typed IR
- CLI-first workflows

See [DESIGN.md](./DESIGN.md) for the design contract, [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md) for crate boundaries, and [ROADMAP.md](./ROADMAP.md) for the current implementation status and next milestones.

See [docs/PERFORMANCE.md](./docs/PERFORMANCE.md) for the current Marlin performance snapshot and [docs/BENCHMARKS.md](./docs/BENCHMARKS.md) for the benchmark design and methodology.