https://github.com/duriantaco/marlin
Typed AI workflow compiler/runtime in Rust with durable execution and schema-first flows
- Host: GitHub
- URL: https://github.com/duriantaco/marlin
- Owner: duriantaco
- License: apache-2.0
- Created: 2026-04-19T07:39:13.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-19T09:46:34.000Z (about 2 months ago)
- Last Synced: 2026-04-19T10:30:52.862Z (about 2 months ago)
- Topics: ai-workflows, compiler, developer-tools, durable-execution, llm, runtime, rust, type-system, workflow-engine
- Language: Rust
- Homepage:
- Size: 2.56 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Marlin
Marlin is a small language and local-first runtime for long-running AI workflows that need to pause, survive restarts, and resume safely with typed state.
Use Marlin when your workflow cannot afford to lose its place.
## Why Marlin Exists
Typical workflow today:
```text
LLM/API call -> tool/API call -> wait for human -> process crashes -> rebuild state by hand or replay work
```
With Marlin:
```text
effectful steps run -> workflow enters a durable wait state -> run state and effect log are persisted -> restart later -> continue or resume from the saved workflow state
```
This is real runtime behavior in this repo:
- wait steps persist a durable waiting state to disk
- completed steps stay recorded in the persisted run
- `continue` and `resume` reload that persisted run instead of starting from scratch
Important caveat:
- Marlin does **not** promise that "nothing ever re-runs"
- interrupted `Pure` steps may replay
- interrupted `Effectful` steps may require explicit recovery instead of blind replay
- committed wait steps resume from saved state, but interrupted waits still follow recovery rules
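The caveats above can be read as a small decision table. The following is a minimal sketch of that table, not Marlin's actual recovery code: the `StepKind`, `StepProgress`, and `RecoveryAction` types and the `recovery_action` function are hypothetical names introduced only for illustration.

```rust
/// Hypothetical step kinds, named after the `Pure`, `Effectful`, and wait steps described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepKind {
    Pure,
    Effectful,
    Wait,
}

/// Hypothetical marker for how far a step got before the restart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepProgress {
    /// The step's result was recorded in the persisted run.
    Committed,
    /// The process stopped while the step was still in flight.
    Interrupted,
}

/// What a restart might do with a step, following the caveats listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryAction {
    /// Reuse the recorded result; nothing re-runs.
    ReuseSavedState,
    /// The step may be replayed.
    MayReplay,
    /// The step may need explicit recovery instead of blind replay.
    MayRequireExplicitRecovery,
    /// The interrupted wait is handled by the runtime's recovery rules.
    FollowRecoveryRules,
}

/// Maps (kind, progress) to the behavior the caveat list describes.
pub fn recovery_action(kind: StepKind, progress: StepProgress) -> RecoveryAction {
    match (kind, progress) {
        (_, StepProgress::Committed) => RecoveryAction::ReuseSavedState,
        (StepKind::Pure, StepProgress::Interrupted) => RecoveryAction::MayReplay,
        (StepKind::Effectful, StepProgress::Interrupted) => {
            RecoveryAction::MayRequireExplicitRecovery
        }
        (StepKind::Wait, StepProgress::Interrupted) => RecoveryAction::FollowRecoveryRules,
    }
}
```

Under these assumptions, `recovery_action(StepKind::Effectful, StepProgress::Interrupted)` yields `MayRequireExplicitRecovery`, while any committed step yields `ReuseSavedState`. The real rules live in the runtime crate linked below.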
The exact implementation lives in:
- [`crates/marlin-runtime/src/lib.rs`](./crates/marlin-runtime/src/lib.rs)
- [`crates/marlin-nodes/src/lib.rs`](./crates/marlin-nodes/src/lib.rs)
- [`examples/review_wait.marlin`](./examples/review_wait.marlin)
One concrete example:
- a plugin-backed LLM/API drafts an outbound sales message
- a human approves it
- a second plugin-backed LLM/API step finalizes the approved message
- the workflow pauses again before dispatch
- after restart, Marlin reloads that saved run instead of replaying the finalized message step
The worked version of that story lives in [`docs/why-marlin.md`](./docs/why-marlin.md).
If you want the easiest real app example to copy, start with:
- built-in support triage app: [`examples/real_world/support_triage`](./examples/real_world/support_triage)
The repo also ships the longer outbound comparison:
- without Marlin: [`examples/real_world/without_marlin`](./examples/real_world/without_marlin)
- with Marlin: [`examples/real_world/with_marlin`](./examples/real_world/with_marlin)
## What To Measure
Marlin targets brittle long-running workflows, not raw model throughput.
So the most important metrics are:
- `expensive_step_replay_rate` after restart
- `manual_recovery_rate` after interrupted work
- `operator_visibility` of current step, completed steps, pending steps, and last effect events
- `time_to_recover` from restart to a useful inspectable state
Useful secondary metrics later:
- `draft_calls_avoided`
- `tokens_avoided`
- `cost_avoided_usd`
Marlin now surfaces token/cost-style usage data in `inspect` when a node records it in effect detail.
The real-world `DraftOutboundMessage` plugin example does this today.
## Observed Example Results
These long-running comparison results came from a manual local rerun against [`examples/real_world/fake_openai_server.py`](./examples/real_world/fake_openai_server.py), so both sides used the same HTTP-backed draft step.
Important:
- token counts are real observed values from that rerun
- USD figures are estimates derived from the example's configured token pricing, not billed provider invoices
Observed result without Marlin:
- initial draft cost:
  - `49` total tokens
  - `0.000041` USD estimated from configured pricing
- after manager approval, the app ran a second expensive finalize step
- the process crashed after finalize completed but before that result became durable workflow state
- `recover` had to replay finalize
- final stored record showed:
  - `finalize_attempts = 2`
  - `finalize_recovery_replays = 1`
  - `finalize_total_tokens_total = 170`
  - `finalize_cost_usd_estimate_total = 0.0001328`
  - `dispatch_status = pending`
  - `recovery_status = manual_recovery_replayed_finalize`
- extra post-crash finalize cost:
  - `85` extra tokens
  - `0.0000664` USD estimated from configured pricing
Observed result with Marlin:
- the draft + finalize steps together reported:
  - `134` total tokens
  - `0.000107` USD estimated from configured pricing
- after `approve`, the run entered a second durable wait at `workflow.dispatch`
- after cold restart, `continue` reloaded the same saved run at `workflow.dispatch`
- after `dispatch`, the run completed with:
  - `workflow.finalize.attempts = 1`
  - `workflow.notify.status = completed`
- extra post-restart finalize cost:
  - `0` extra tokens
  - `0` extra USD estimated
That is the current proof point: Marlin keeps completed expensive step state on disk across restart, so the workflow does not lose its place at later wait boundaries either.
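The replay overhead quoted above follows from the stored totals by simple arithmetic. The sketch below reproduces it. `FinalizeTotals` and `replay_overhead` are hypothetical helper names, not part of the example apps, and the calculation assumes every finalize attempt consumed the same number of tokens, which is consistent with the observed `170 = 2 x 85`.

```rust
/// Totals copied from the "without Marlin" stored record above.
#[derive(Debug, Clone, Copy)]
pub struct FinalizeTotals {
    pub attempts: u32,
    pub total_tokens: u32,
    pub cost_usd_estimate: f64,
}

/// Observed values: `finalize_attempts`, `finalize_total_tokens_total`,
/// and `finalize_cost_usd_estimate_total`.
pub const WITHOUT_MARLIN: FinalizeTotals = FinalizeTotals {
    attempts: 2,
    total_tokens: 170,
    cost_usd_estimate: 0.0001328,
};

/// Returns `(extra_tokens, extra_cost_usd)` attributable to replayed attempts.
/// Assumption: each attempt used the same tokens and cost, so every attempt
/// beyond the first counts as pure overhead.
pub fn replay_overhead(totals: FinalizeTotals) -> (u32, f64) {
    if totals.attempts == 0 {
        return (0, 0.0);
    }
    let replays = totals.attempts - 1;
    let tokens_per_attempt = totals.total_tokens / totals.attempts;
    let cost_per_attempt = totals.cost_usd_estimate / f64::from(totals.attempts);
    (
        tokens_per_attempt * replays,
        cost_per_attempt * f64::from(replays),
    )
}
```

Calling `replay_overhead(WITHOUT_MARLIN)` gives `85` extra tokens and about `0.0000664` USD, matching the figures above. The Marlin-side total is also consistent with one draft plus one finalize: `49 + 85 = 134` tokens and `0.000041 + 0.0000664 = 0.0001074` USD, which rounds to the reported `0.000107`.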
## Status
Marlin is a very early beta.
This repository is:
- unstable
- incomplete
- still evolving at the language and runtime boundary
Do not treat the current syntax, IR, runtime artifact, or CLI output as stable yet.
## What Marlin Is
Marlin is meant to be:
- a small language
- a type checker and validator
- a durable execution engine
- a node/plugin system for AI workflows
- an additive inference-aware execution layer through `marlin-llm` and `marlin-gateway`
- a CLI-first tool
## What Marlin Is Not
Marlin does **not** make the LLM itself generate tokens faster.
It does not make GPT, Claude, or other models run faster at the GPU level.
What it can do is make **systems around LLMs** faster and better:
- faster to describe
- faster to validate
- faster to iterate on
- less wasteful in retries and bad wiring
- more reliable when humans, tools, and long-running waits are involved
- easier to recover after a crash because workflow state is persisted to disk
So the promise is not "faster neural nets".
The promise is closer to:
> a better language and runtime for building LLM-driven systems
Today that also includes one real execution path for model calls:
```text
workflow -> built-in LLMCall or plugin -> marlin_gateway_openai -> OpenAI-compatible upstream
```
That path gives Marlin a place to normalize:
- backend and model choice
- usage and cost metadata
- latency
- deterministic route selection
without pretending Marlin itself is a serving engine.
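To make "deterministic route selection" concrete, here is a minimal sketch of one common way to get it: rules are checked in declaration order and the first match wins. This is an illustration under assumptions, not the `marlin-gateway` implementation. `RouteRule`, its fields, and `select_backend` are hypothetical names, and prefix matching on the model name is an assumed rule shape.

```rust
/// Hypothetical routing rule: requests whose model name starts with
/// `model_prefix` are sent to `backend`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RouteRule {
    pub model_prefix: String,
    pub backend: String,
}

/// Picks the first matching rule in declaration order. The result depends
/// only on the rule list and the model name, so the same inputs always
/// select the same backend.
pub fn select_backend<'a>(rules: &'a [RouteRule], model: &str) -> Option<&'a str> {
    rules
        .iter()
        .find(|rule| model.starts_with(rule.model_prefix.as_str()))
        .map(|rule| rule.backend.as_str())
}
```

With a rule list of `small-` to `local` followed by an empty prefix to `default`, the model name `small-draft` routes to `local` and anything else falls through to `default`. See [`docs/llm-gateway.md`](./docs/llm-gateway.md) for the routing policy Marlin actually loads.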
## What Is Being Done Right Now
The current work is focused on the core, not product surfaces.
Active work:
- tightening `marlin-lang`
- benchmarking Marlin over stable workload shapes and persisted runtime lifecycle phases
- keeping one semantic source of truth
- locking the typed IR and prepared-program boundary
- hardening persisted recovery semantics
- expanding runtime/operator measurements beyond compiler phases
- making the LLM execution path more inference-aware through `marlin-llm` and `marlin-gateway`
- surfacing richer usage and routing metadata to operators through `inspect`
Not being built yet:
- frontend studio
- cloud control plane
- browser extension
- multi-service product surface
## Install
Prerequisite:
- Rust toolchain with `cargo`
Install the CLI directly from GitHub:
```bash
cargo install --locked --git https://github.com/duriantaco/marlin.git marlin-cli
```
Check that the binary is available:
```bash
marlin --help
```
## Documentation
Long-form docs live in [`docs/`](./docs) and are set up to publish to GitHub Pages from [`.github/workflows/docs.yml`](./.github/workflows/docs.yml).
The docs are written to treat the codebase as the source of truth. Start with:
- why Marlin: [`docs/why-marlin.md`](./docs/why-marlin.md)
- real-world examples: [`examples/real_world/README.md`](./examples/real_world/README.md)
- real-world built-in app example: [`examples/real_world/support_triage/README.md`](./examples/real_world/support_triage/README.md)
- built-in LLM example with typed structured output: [`examples/llm_call_builtin.marlin`](./examples/llm_call_builtin.marlin)
- landing page: [`docs/index.md`](./docs/index.md)
- quickstart: [`docs/getting-started.md`](./docs/getting-started.md)
- integration guide: [`docs/integrating-marlin.md`](./docs/integrating-marlin.md)
- plugin guide: [`docs/plugins.md`](./docs/plugins.md)
- LLM gateway path: [`docs/llm-gateway.md`](./docs/llm-gateway.md)
- worked examples: [`docs/examples.md`](./docs/examples.md)
- runtime behavior: [`docs/runtime-semantics.md`](./docs/runtime-semantics.md)
Supporting docs:
- language/runtime boundary: [`docs/language-and-artifact.md`](./docs/language-and-artifact.md)
- CLI: [`docs/cli.md`](./docs/cli.md)
- limits: [`docs/limits-and-guarantees.md`](./docs/limits-and-guarantees.md)
## How To Use It
Prerequisite:
- Rust toolchain with `cargo`
From the repo root, validate the example program:
```bash
cargo run -p marlin-cli -- check examples/lead_review.marlin
```
Prepare and run the example program:
```bash
cargo run -p marlin-cli -- run examples/lead_review.marlin
```
Run the built-in `LLMCall` example against a local OpenAI-compatible endpoint:
```bash
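# Terminal 1: start the local fake OpenAI-compatible server (assumed to stay in the foreground)
/usr/bin/env python3 examples/real_world/fake_openai_server.py --port 8080
# Terminal 2: run the built-in LLMCall example against it
cargo run -p marlin-cli -- run examples/llm_call_builtin.marlin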
```
That example uses `output_schema: "Answer"` so downstream bindings like `call.structured.answer`
are type-checked, while `call.text` still stays available for plain-text flows.
Run the simplest app-level example:
```bash
cd examples/real_world/support_triage
# Terminal 1: start the fake server (assumed to stay in the foreground)
/usr/bin/env python3 ../fake_openai_server.py --port 8080
# Terminal 2, same directory: start a ticket workflow, then check its status
/usr/bin/env python3 app.py start \
  ticket-123 \
  "Taylor at ExampleCo" \
  "Locked out after password reset" \
  "We reset the password twice and still cannot sign in from either laptop."
/usr/bin/env python3 app.py status ticket-123
```
That example shows the same typed `LLMCall` surface inside a thin app wrapper that only stores
`ticket_id -> run_dir` and resumes `workflow.review` when a human decides.
Explain what Marlin will prepare from source:
```bash
cargo run -p marlin-cli -- explain examples/lead_review.marlin
```
Run a workflow that enters a wait state:
```bash
cargo run -p marlin-cli -- run examples/review_wait.marlin
```
If your workflow uses external plugin nodes, Marlin discovers them from:
- `MARLIN_PLUGIN_DIRS`
- `.marlin/plugins` under the current project tree or run directory tree
Read [`docs/plugins.md`](./docs/plugins.md) for the manifest format and runtime contract.
If you want the shipped LLM/API execution path, read [`docs/llm-gateway.md`](./docs/llm-gateway.md), the built-in app example under [`examples/real_world/support_triage`](./examples/real_world/support_triage), and the plugin-backed comparison under [`examples/real_world/with_marlin`](./examples/real_world/with_marlin).
Inspect a persisted run:
```bash
cargo run -p marlin-cli -- inspect .marlin/runs/
```
Get the same lifecycle state in machine-readable form:
```bash
cargo run -p marlin-cli -- status .marlin/runs/ --json
```
Resume a waiting run:
```bash
cargo run -p marlin-cli -- resume .marlin/runs/ workflow.review true
```
Continue a persisted run after restart:
```bash
cargo run -p marlin-cli -- continue .marlin/runs/
```
Cancel an active run:
```bash
cargo run -p marlin-cli -- cancel .marlin/runs/ "operator stop"
```
Run the workspace tests:
```bash
cargo test --workspace
```
Run the benchmark harness:
```bash
cargo run --release -p marlin-bench -- --format markdown
```
List the benchmark workloads:
```bash
cargo run --release -p marlin-bench -- --list-workloads
```
What the CLI currently does:
- `check` parses source, loads built-ins plus discovered plugins, and lowers source into validated typed IR
- `check`, `run`, `continue`, `resume`, `status`, and `cancel` now also support `--json` for machine-readable lifecycle output
- `explain` compiles and prepares a workflow, then prints the prepared step plan and runtime-facing metadata
- `run` prepares the program, persists a run record under `.marlin/runs`, and either completes or enters a wait state
- `continue` reloads a persisted run, reapplies recovery rules, re-checks provider/version and executor provenance compatibility, and can re-check waiting runs against persisted deadlines
- `inspect` loads a persisted run plus recent effect-log entries and now starts with the current status, next recommended action, run-level usage, and latest LLM/routing summary before the per-step detail dump
- `status` reads a persisted run directory and now surfaces the current runtime state, waiting step/kind, and next recommended operator action alongside total attempts, retried steps, and timed-out steps
- `resume` resumes a waiting step from disk with a JSON payload after verifying the persisted node provider and executor provenance still match the loaded runtime
- `cancel` marks active work cancelled and persists that terminal state
- pure steps can retry from persisted state when their manifest retry policy allows it
- pure and wait steps can time out from persisted state when their manifest timeout policy allows it
- external plugin executors run through the subprocess contract in `crates/marlin-plugin`
- gateway-backed LLM plugins can route OpenAI-compatible calls through `marlin_gateway_openai` with deterministic rule-based routing
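The compatibility re-check that `continue` and `resume` perform can be pictured as comparing what was recorded at run time with what the current runtime loads. The sketch below is an assumption-laden illustration, not the runtime's code. `NodeProvenance`, its three fields, and `provenance_mismatches` are hypothetical names chosen to mirror the "provider/version and executor provenance" wording above.

```rust
/// Hypothetical record of where a node's behavior came from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeProvenance {
    pub provider: String,
    pub version: String,
    pub executor: String,
}

/// Returns the names of the fields that differ between the persisted run
/// and the currently loaded runtime. An empty list means the two match.
pub fn provenance_mismatches(
    persisted: &NodeProvenance,
    loaded: &NodeProvenance,
) -> Vec<&'static str> {
    let mut mismatches = Vec::new();
    if persisted.provider != loaded.provider {
        mismatches.push("provider");
    }
    if persisted.version != loaded.version {
        mismatches.push("version");
    }
    if persisted.executor != loaded.executor {
        mismatches.push("executor");
    }
    mismatches
}
```

In this sketch a caller would refuse to continue or resume when the returned list is non-empty, and could print the mismatched field names to the operator.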
What it does not do yet:
- production-grade replay guarantees
- robust effect recovery for real external systems
- hard timeout preemption for arbitrary executor code
- automatic retries or timeout enforcement for effectful steps
- stable public language/runtime guarantees
Runtime limits that matter today:
- retry policy is implemented only for `Pure` steps
- timeout policy is implemented only for `Pure` and `Wait` steps
- effectful executors still need explicit idempotent recovery instead of automatic replay
- a restart during an in-flight effectful step may block for recovery instead of blindly continuing
- source syntax does not yet expose retry/timeout configuration directly; today those policies come through manifests/catalog entries
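The first two limits above amount to a small applicability table by step kind. A minimal sketch follows. `StepKind`, `retry_policy_applies`, and `timeout_policy_applies` are hypothetical names for illustration and are not taken from the Marlin crates.

```rust
/// Hypothetical step kinds, named after the `Pure`, `Effectful`, and `Wait` steps above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepKind {
    Pure,
    Effectful,
    Wait,
}

/// Retry policy is implemented only for `Pure` steps.
pub fn retry_policy_applies(kind: StepKind) -> bool {
    matches!(kind, StepKind::Pure)
}

/// Timeout policy is implemented only for `Pure` and `Wait` steps.
pub fn timeout_policy_applies(kind: StepKind) -> bool {
    matches!(kind, StepKind::Pure | StepKind::Wait)
}
```

Both functions return `false` for `StepKind::Effectful`, which reflects the limit that effectful steps get neither automatic retries nor timeout enforcement today.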
## Initial Scope
The first version of Marlin should avoid product bloat.
We start with:
- one Rust workspace
- one CLI
- one semantic core
- one runtime core
- no frontend
- no browser extension
- no cloud control plane
Those can come later if the core proves worth it.
## Workspace Layout
`crates/marlin-lang`
- syntax
- AST
- validation
- typed IR
- runtime preparation
`crates/marlin-runtime`
- execution model over flattened plans
- durable state and effect log
- recovery rules for restarted runs
- scheduler interfaces
`crates/marlin-nodes`
- built-in manifests and runtime executors
`crates/marlin-plugin`
- plugin manifest loading
- subprocess executor protocol
- external node registry wiring
`crates/marlin-llm`
- shared `LLMCall` request/response/accounting contract
- normalized routing, cache, and budget hint types
`crates/marlin-gateway`
- OpenAI-compatible LLM execution layer
- routing policy loading
- usage/cost/latency normalization
`crates/marlin-cli`
- `marlin check`
- `marlin explain`
- `marlin run`
- `marlin continue`
- `marlin inspect`
- `marlin status`
- `marlin resume`
- `marlin cancel`
`crates/marlin-bench`
- benchmark harness for Marlin compiler and persisted runtime lifecycle cost
- stable workload suite for tracking performance over time
## Current State
This repository is intentionally minimal.
The current scaffold exists to lock the semantic/runtime boundary before we commit to syntactic sugar or UI. Today that boundary is:
- source text -> AST
- AST -> typed IR
- typed IR -> prepared program
- runtime -> schedules persisted execution over the prepared program
That is the contract the current compiler/runtime work is optimizing around.
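One way to picture that staged boundary is as a chain of types where each stage consumes the previous stage's output. The sketch below is a toy model only. The line-per-step input format is NOT Marlin syntax, and `Ast`, `TypedIr`, `PreparedProgram`, `parse`, `lower`, `prepare`, and `compile` are hypothetical names that do not come from `marlin-lang`.

```rust
/// Toy AST: just the step names found in the source.
pub struct Ast {
    pub step_names: Vec<String>,
}

/// Toy typed IR: step names that passed validation.
pub struct TypedIr {
    pub steps: Vec<String>,
}

/// Toy prepared program: a flattened plan of (index, step name) pairs.
pub struct PreparedProgram {
    pub plan: Vec<(usize, String)>,
}

/// source text -> AST. Toy parser: one step name per non-empty line.
pub fn parse(source: &str) -> Ast {
    let step_names = source
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect();
    Ast { step_names }
}

/// AST -> typed IR. Rejecting duplicate names stands in for real validation.
pub fn lower(ast: Ast) -> Result<TypedIr, String> {
    for (i, name) in ast.step_names.iter().enumerate() {
        if ast.step_names[..i].contains(name) {
            return Err(format!("duplicate step: {}", name));
        }
    }
    Ok(TypedIr { steps: ast.step_names })
}

/// typed IR -> prepared program. Numbers the steps into a flat plan.
pub fn prepare(ir: TypedIr) -> PreparedProgram {
    PreparedProgram {
        plan: ir.steps.into_iter().enumerate().collect(),
    }
}

/// Runs the three stages in order; a runtime would then schedule the plan.
pub fn compile(source: &str) -> Result<PreparedProgram, String> {
    lower(parse(source)).map(prepare)
}
```

Because each function takes the previous stage's type by value, a later stage in this sketch cannot be handed unvalidated input, which is the property a single typed boundary is meant to provide.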
## Current Language Direction
Marlin is aiming for:
- schema-first data shapes
- reusable groups/modules
- null-aware flow semantics
- one canonical typed IR
- CLI-first workflows
See [DESIGN.md](./DESIGN.md) for the design contract, [docs/ARCHITECTURE.md](./docs/ARCHITECTURE.md) for crate boundaries, and [ROADMAP.md](./ROADMAP.md) for the current implementation status and next milestones.
See [docs/PERFORMANCE.md](./docs/PERFORMANCE.md) for the current Marlin performance snapshot and [docs/BENCHMARKS.md](./docs/BENCHMARKS.md) for the benchmark design and methodology.