https://github.com/multikernel/branching

BranchContext gives AI agents and automated workflows copy-on-write branching over filesystems and processes
ai-agents sandboxing
Last synced: 3 months ago
Host: GitHub
URL: https://github.com/multikernel/branching
Owner: multikernel
License: apache-2.0
Created: 2026-02-10T21:14:38.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-14T03:12:00.000Z (3 months ago)
Last Synced: 2026-03-14T13:57:45.079Z (3 months ago)
Topics: ai-agents, sandboxing
Language: Python
Homepage:
Size: 313 KB
Stars: 7
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# BranchContext

Let AI agents try things without consequences.

When an agent explores multiple strategies (applying different patches,
trying different prompts, or testing alternative approaches), it normally
has to snapshot the workspace, run the attempt, then clean up the mess
before trying the next one. BranchContext eliminates that overhead.

Fork the workspace into parallel copy-on-write branches, run speculative
attempts in each, commit the winner, and abort the rest - instantly.

No snapshots, no cleanup, no leftover state.

Based on the paper [Fork, Explore, Commit: OS Primitives for Agentic Exploration](https://arxiv.org/abs/2602.08199).

## Install

```bash
pip install BranchContext
```

Requires Python >= 3.10. No external dependencies.

### Docker

The Docker image ships with [BranchFS](https://github.com/multikernel/branchfs)
built in -- no need to install FUSE, compile Rust, or configure any filesystem
yourself. Just pull the image and go:

```bash
docker pull multikernel/branching
```

Run directly with `docker run`:

```bash
docker run --rm --device /dev/fuse --cap-add SYS_ADMIN \
  --security-opt apparmor=unconfined \
  -v "$(pwd)":/src multikernel/branching run -- make test
```

Or use the `branching-docker` wrapper (in `integration/docker/`), which handles
the Docker flags for you:

```bash
branching-docker -w ./myproject run -- make test
branching-docker -w . speculate -c "./fix_a.sh" -c "./fix_b.sh"
branching-docker -w . best-of-n -n 5 -- ./solve.py
```

The `-w` flag specifies a host directory to use as the workspace. It is
bind-mounted into the container, and BranchFS is mounted on top automatically.
Committed changes are written back to the host directory.

To build the image from source:

```bash
docker build -t branching -f integration/docker/Dockerfile .
```

## Quick start

```python
import subprocess

from branching import Workspace

ws = Workspace("/mnt/workspace")

# Auto-commit on success, auto-abort on exception
with ws.branch("attempt") as b:
    # `agent` is a placeholder for whatever command-line agent you run
    subprocess.run(["agent", "--workdir", str(b.path)], check=True)
```

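The agent writes to `b.path`, which is an isolated copy-on-write view.
If the command succeeds, the branch is committed and its changes are merged
back into the workspace. If the block raises (here, `check=True` turns a
non-zero exit status into `subprocess.CalledProcessError`), everything is
rolled back and the workspace is untouched.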
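To make the commit/abort semantics concrete, here is a toy stand-in that uses plain directory copies instead of BranchFS. It is a sketch, not BranchContext's implementation: `toy_branch` is a hypothetical name, and unlike a copy-on-write mount it copies the whole tree up front, so it only suits small directories.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def toy_branch(workspace: Path):
    """Hypothetical helper: commit on success, roll back on exception."""
    scratch = Path(tempfile.mkdtemp(prefix="branch-"))
    branch_path = scratch / "view"
    # A copy-on-write filesystem shares unchanged data; this toy copies it all.
    shutil.copytree(workspace, branch_path)
    try:
        yield branch_path
    except BaseException:
        # Abort: throw the branch away, the workspace was never touched.
        shutil.rmtree(scratch)
        raise
    else:
        # Commit: replace the workspace contents with the branch contents.
        shutil.rmtree(workspace)
        shutil.copytree(branch_path, workspace)
        shutil.rmtree(scratch)
```

Writes made through the yielded path stay invisible in the workspace until the `with` block exits cleanly.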

## Agent patterns

BranchContext ships with seven high-level patterns that cover the most common
agent workflows. Each is a callable class: instantiate it with its config, then
call it with a workspace.

### Parallel speculation (first wins)

Run multiple strategies in parallel. The first one that succeeds gets
committed; the rest are aborted.

Use when you have several plausible approaches and care about latency more
than optimality: bug fixes where any passing patch is good enough, tool
selection where multiple tools could work, or prompt variants where you
just need one that doesn't error out. Pairs naturally with the ``n=``
parameter in OpenAI's Chat Completions API to race N variations in
parallel.

```python
from pathlib import Path

import openai
from branching import Workspace, Speculate

ws = Workspace("/mnt/workspace")
client = openai.OpenAI()

# `prompt` is your task description
resp = client.chat.completions.create(
    model="gpt-4o", n=5,
    messages=[{"role": "user", "content": prompt}],
)

def make_candidate(code: str):
    # Bind each completion to its own candidate function
    def candidate(path: Path) -> bool:
        (path / "fix.py").write_text(code)
        return run_tests(path)  # your test runner; True on pass
    return candidate

candidates = [make_candidate(c.message.content) for c in resp.choices]
outcome = Speculate(candidates, first_wins=True, timeout=60)(ws)

if outcome.committed:
    print(f"Fix {outcome.winner.branch_index} succeeded!")
```
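First-wins racing can be sketched with the standard library alone. This illustrates the semantics only and is not how `Speculate` is built: `first_success` is a hypothetical helper, candidates are zero-argument callables rather than `candidate(path)`, and threads cannot be killed, so losers that already started keep running until they return (forked processes that can be killed, as in `BranchContext`, do not have that limitation).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence


def first_success(candidates: Sequence[Callable[[], bool]]) -> Optional[int]:
    """Hypothetical helper: index of the first candidate to return True."""
    if not candidates:
        return None
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = {pool.submit(fn): i for i, fn in enumerate(candidates)}
        for fut in as_completed(futures):
            # A candidate that raised counts as a failure, not a winner.
            if fut.exception() is None and fut.result():
                for other in futures:
                    other.cancel()  # only cancels candidates not yet started
                return futures[fut]
    return None
```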

### Best-of-N with scoring

Run N candidates in parallel and commit the highest-scoring success.

Use when quality matters more than speed: code generation where you want
the cleanest output across multiple temperatures, translation with a BLEU
scorer picking the best variant, or any task with a reliable quality metric.
Pairs naturally with the ``n=`` parameter in OpenAI's Chat Completions API
to generate N variations in a single call, then test each in an isolated
branch.

Candidates can return ``bool`` or ``(bool, float)``. Scoring is flexible:
pass pre-computed ``scores`` (e.g. from logprobs), provide an ``evaluate``
callback for post-execution scoring, or let candidates score themselves.

```python
from branching import BestOfN
import openai

client = openai.OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o", n=5, logprobs=True, top_logprobs=1,
    messages=[{"role": "user", "content": prompt}],
)

# Pre-computed confidence scores: mean token logprob of each completion
logprob_scores = [
    sum(t.logprob for t in c.logprobs.content) / len(c.logprobs.content)
    for c in resp.choices
]

# Candidates just apply code and test -- return bare bool
# (make_test: your helper, shaped like make_candidate above)
candidates = [make_test(c.message.content) for c in resp.choices]

# BestOfN picks the highest-logprob passing candidate
outcome = BestOfN(candidates, scores=logprob_scores)(ws)
```
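The selection rule described in this section (a bare `bool` or a `(bool, float)` result, optional pre-computed scores, the highest-scoring success wins) can be sketched as a pure function. `normalize` and `pick_best` are hypothetical names; the 1.0/0.0 default mirrors the CLI's documented default for `best-of-n`, and letting `scores` override self-reported scores is an assumption about precedence, not documented behavior.

```python
from typing import Optional, Sequence, Tuple, Union

# A candidate's return value: bare success flag or (success, score).
Result = Union[bool, Tuple[bool, float]]


def normalize(result: Result) -> Tuple[bool, float]:
    """Map a bare bool to (ok, 1.0 or 0.0); pass (ok, score) through."""
    if isinstance(result, tuple):
        ok, score = result
        return bool(ok), float(score)
    return bool(result), 1.0 if result else 0.0


def pick_best(results: Sequence[Result],
              scores: Optional[Sequence[float]] = None) -> Optional[int]:
    """Hypothetical helper: index of the best successful candidate, or None."""
    best_idx: Optional[int] = None
    best_score = float("-inf")
    for i, result in enumerate(results):
        ok, own_score = normalize(result)
        if not ok:
            continue  # failures are never committed, whatever their score
        # Assumption: pre-computed scores take precedence over own scores.
        score = scores[i] if scores is not None else own_score
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```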

#### RL training rollouts

Pass ``commit=False`` to collect scores from all candidates without
modifying the workspace. Every branch runs to completion and aborts --
the base stays pristine for the next batch. This gives you cheap,
isolated rollout environments for policy gradient methods like GRPO.

```python
from branching import Workspace, BestOfN

ws = Workspace("/mnt/workspace")

# training_batch, N, make_candidate (prompt -> candidate) and trainer
# come from your RL setup
for prompt in training_batch:
    candidates = [make_candidate(prompt) for _ in range(N)]
    outcome = BestOfN(candidates, commit=False)(ws)
    # All N results available -- extract (success, score) for training
    rewards = [(r.success, r.score) for r in outcome.all_results]
    trainer.step(prompt, rewards)
```
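For GRPO specifically, the collected `(success, score)` pairs are usually turned into group-relative advantages: each reward is standardized against the mean and standard deviation of its own group of N rollouts. A minimal sketch, assuming a failed rollout earns 0.0 and a successful one earns its score (that reward mapping is an illustrative choice, and `group_advantages` is a hypothetical helper, not part of `branching`):

```python
import statistics
from typing import List, Sequence, Tuple


def group_advantages(rewards: Sequence[Tuple[bool, float]],
                     eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: GRPO-style group-relative advantages."""
    # Assumed reward mapping: failures earn 0.0, successes earn their score.
    values = [score if ok else 0.0 for ok, score in rewards]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std over the group
    # eps keeps the division finite when every rollout got the same reward.
    return [(v - mean) / (std + eps) for v in values]
```

Above-average rollouts get positive advantages and below-average ones negative, so the policy is pushed toward whatever beat its siblings on the same prompt.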

### Reflexion (retry with feedback)

Run a task, and if it fails, generate a critique and feed it back into the
next attempt. The agent learns from its mistakes across retries.

Use when failures carry diagnostic signal: fixing test failures where the
error log tells you what went wrong, iterating on a solution where a
validator explains why it was rejected, or multi-step plans where each
failed attempt narrows the search space for the next one.

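```python
from pathlib import Path

from branching import Reflexion

def task(path: Path, attempt: int, feedback: str | None) -> bool:
    # No feedback on the first attempt; later attempts get the critique
    if feedback:
        (path / "critique.txt").write_text(feedback)
    return run_and_test(path)  # your agent + test step

def critique(path: Path) -> str:
    # Inspect the failed branch and explain what went wrong
    return analyze_failure(path / "test_output.log")

outcome = Reflexion(task, max_retries=3, critique=critique)(ws)
```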
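Stripped of the filesystem, the Reflexion control flow is a short loop: run, and on failure turn the critique into the next attempt's feedback. This sketch uses a hypothetical `reflexion_loop` helper whose `max_attempts` counts total tries; how the library's `max_retries` maps onto that count is not specified here, and in the real pattern each failed attempt runs in its own branch that is aborted before the retry.

```python
from typing import Callable, Optional


def reflexion_loop(task: Callable[[int, Optional[str]], bool],
                   critique: Callable[[int], str],
                   max_attempts: int = 3) -> Optional[int]:
    """Hypothetical helper: return the successful attempt number, or None."""
    feedback: Optional[str] = None  # nothing to learn from on the first try
    for attempt in range(max_attempts):
        if task(attempt, feedback):
            return attempt
        # The failed attempt's critique becomes the next attempt's input.
        feedback = critique(attempt)
    return None
```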

### Tree of Thoughts

Explore multiple strategies in parallel, optionally expanding the best one
into deeper sub-strategies across multiple levels.

Use when the problem has hierarchical structure: architectural decisions
where you first pick a framework then optimize within it, multi-stage
pipelines where each stage has variants worth exploring, or planning tasks
where high-level strategies each decompose into tactical choices.

```python
from pathlib import Path

from branching import TreeOfThoughts

def strategy_a(path: Path) -> tuple[bool, float]:
    apply_approach_a(path)
    return run_tests(path), evaluate_quality(path)  # (success, score)

def strategy_b(path: Path) -> tuple[bool, float]:
    apply_approach_b(path)
    return run_tests(path), evaluate_quality(path)

outcome = TreeOfThoughts(
    [strategy_a, strategy_b],
    max_depth=2,
    # Return the next level of strategies to try on top of the winner
    expand=lambda path, depth: generate_refinements(path),
)(ws)
```
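The single-winner-per-level search can be sketched over plain string states, where each strategy returns `(success, score, new_state)` and `expand` proposes the next level. This is an illustration under assumptions (a hypothetical `tree_of_thoughts` function, `max_depth` counted as the number of levels), not `TreeOfThoughts`' code; in the library each strategy would run in its own branch of the winner's workspace.

```python
from typing import Callable, List, Optional, Tuple

State = str
Strategy = Callable[[State], Tuple[bool, float, State]]


def tree_of_thoughts(root: State,
                     strategies: List[Strategy],
                     expand: Callable[[State, int], List[Strategy]],
                     max_depth: int = 2) -> Optional[State]:
    """Hypothetical sketch: keep only the best successful child per level."""
    state, level = root, strategies
    best: Optional[State] = None
    for depth in range(max_depth):
        # Every strategy starts from the same parent state (one branch each).
        results = [strategy(state) for strategy in level]
        winners = [(score, new_state) for ok, score, new_state in results if ok]
        if not winners:
            break  # no success at this level: keep the previous winner
        _, best = max(winners, key=lambda w: w[0])
        state = best
        level = expand(state, depth + 1)
        if not level:
            break
    return best
```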

### Beam Search

Keep the top-K branches alive at each depth level instead of just one
winner. Interpolates between BestOfN (all parallel, one level) and
TreeOfThoughts (one winner per level). At each level, all candidates
across all beams are scored globally and only the top-K survive.

Inspired by [EnCompass](https://arxiv.org/abs/2512.03571), which showed
that multi-level beam search outperforms both BestOfN and single-winner
tree search for hierarchical agent tasks.

Use when the problem has hierarchical structure *and* you want to hedge
across multiple promising directions: multi-step code migrations where
several rewrite strategies look viable at each stage, planning tasks where
pruning to one path too early loses good alternatives, or any setting where
TreeOfThoughts' single-winner-per-level is too aggressive.

```python
from pathlib import Path

from branching import BeamSearch

def strategy_a(path: Path) -> tuple[bool, float]:
    apply_approach_a(path)
    return run_tests(path), evaluate_quality(path)

def strategy_b(path: Path) -> tuple[bool, float]:
    apply_approach_b(path)
    return run_tests(path), evaluate_quality(path)

# strategy_c and strategy_d follow the same (success, score) shape
outcome = BeamSearch(
    [strategy_a, strategy_b, strategy_c, strategy_d],
    expand=lambda path, depth: generate_refinements(path),
    beam_width=2,  # keep the top 2 branches at every level
    max_depth=3,
)(ws)
```
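What separates beam search from Tree of Thoughts is that the top-K cut is taken globally, over the children of every surviving beam. A sketch over toy string states, where each strategy returns `(success, score, new_state)`; `beam_search` is a hypothetical function illustrating the technique, not `BeamSearch`' implementation.

```python
import heapq
from typing import Callable, List, Tuple

State = str
Strategy = Callable[[State], Tuple[bool, float, State]]


def beam_search(root: State,
                strategies: List[Strategy],
                expand: Callable[[State, int], List[Strategy]],
                beam_width: int = 2,
                max_depth: int = 3) -> List[Tuple[float, State]]:
    """Hypothetical sketch: keep the global top-K successes at every level."""
    beams: List[Tuple[float, State]] = []
    frontier = [(root, s) for s in strategies]
    for depth in range(max_depth):
        scored: List[Tuple[float, State]] = []
        for parent, strategy in frontier:
            ok, score, new_state = strategy(parent)  # one branch per candidate
            if ok:
                scored.append((score, new_state))
        if not scored:
            break  # nothing succeeded: keep the previous level's beams
        # Rank children of *all* beams together, then cut to beam_width.
        beams = heapq.nlargest(beam_width, scored, key=lambda item: item[0])
        frontier = [(state, s) for _, state in beams
                    for s in expand(state, depth + 1)]
        if not frontier:
            break
    return beams
```

In the usage check below, both survivors at depth 2 come from different parents (`b1` and `a2`), which a per-parent cut would not guarantee.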

### Tournament (pairwise elimination)

Run N candidates in parallel, then narrow to one through pairwise
elimination via a judge function. The convergent dual of Tree of Thoughts:
starts wide, narrows to one.

Use when you have a reliable pairwise comparator but no absolute scoring
function: patch selection where an LLM judge picks the better diff,
A/B-style evaluation where candidates are compared head-to-head, or
any setting where relative ranking is easier than absolute scoring.

```python
from pathlib import Path

from branching import Tournament

def make_patch(code: str):
    def candidate(path: Path) -> bool:
        (path / "fix.patch").write_text(code)
        return apply_and_test(path)
    return candidate

candidates = [make_patch(p) for p in generate_patches(n=8)]

def judge(path_a: Path, path_b: Path) -> int:
    # Compare the patches each candidate wrote: 0 = a wins, 1 = b wins
    return llm_compare(path_a / "fix.patch", path_b / "fix.patch")

outcome = Tournament(candidates, judge=judge)(ws)
```
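Pairwise elimination amounts to a single-elimination bracket, so N candidates need N-1 judge calls. A sketch with a hypothetical `tournament` helper over arbitrary items (in the library these would be the branches of passing candidates); giving the odd item out a bye is an assumption about how uneven rounds are handled.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def tournament(entrants: Sequence[T], judge: Callable[[T, T], int]) -> Optional[T]:
    """Hypothetical helper: judge returns 0 if a wins, 1 if b wins."""
    round_: List[T] = list(entrants)
    if not round_:
        return None
    while len(round_) > 1:
        next_round: List[T] = []
        # Pair neighbours off; each match keeps exactly one survivor.
        for i in range(0, len(round_) - 1, 2):
            a, b = round_[i], round_[i + 1]
            next_round.append(a if judge(a, b) == 0 else b)
        if len(round_) % 2:
            next_round.append(round_[-1])  # odd one out gets a bye
        round_ = next_round
    return round_[0]
```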

### Cascaded speculation (adaptive fan-out)

Start with one attempt. If it fails, widen to more parallel candidates,
each informed by error context from prior failures. Repeat with increasing
fan-out until one succeeds or all waves are exhausted.

Inspired by [Cascade Speculative Drafting](https://arxiv.org/abs/2312.11462),
which applies the same start-cheap-escalate-on-failure principle to LLM
token generation.

Use when most tasks succeed on the first try and you want to minimize
wasted compute: coding agents where one LLM call usually works but
occasionally needs retries with error feedback, test-fix loops where the
error log from a failed attempt is the best guide for the next one, or
any workload with variable difficulty where paying for N parallel branches
upfront is wasteful.

```python
from pathlib import Path

from branching import Cascaded

def solve(path: Path, feedback: list[str]) -> tuple[bool, str]:
    # feedback holds error strings collected from earlier failed waves
    result = run_agent(path, prior_errors=feedback)
    if result.tests_pass:
        return True, ""
    return False, result.error_output

# Waves of 1, then 2, then 4 parallel attempts
outcome = Cascaded(solve, fan_out=(1, 2, 4), timeout=120)(ws)
```

The task returns `(success, error_context)`. On failure, the error string
is collected and passed as feedback to subsequent waves. On success, it is
ignored. Empty error strings are silently dropped.
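The wave logic in that paragraph can be sketched with a thread pool. Simplifications, all assumptions: `cascaded` is a hypothetical helper, the task takes only the feedback list (no branch path), every branch in a wave sees the same feedback snapshot, and a wave is waited on in full rather than stopping at its first success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical task shape: takes prior error strings, returns (success, error).
Task = Callable[[List[str]], Tuple[bool, str]]


def cascaded(task: Task, fan_out: Sequence[int] = (1, 2, 4)) -> Optional[int]:
    """Hypothetical helper: return the index of the winning wave, or None."""
    feedback: List[str] = []
    for wave, width in enumerate(fan_out):
        snapshot = list(feedback)  # every branch in a wave sees the same context
        with ThreadPoolExecutor(max_workers=width) as pool:
            results = list(pool.map(lambda _: task(snapshot), range(width)))
        if any(ok for ok, _ in results):
            return wave  # a real run would commit one winner, abort the rest
        # Failures contribute their error text; empty strings are dropped.
        feedback.extend(err for ok, err in results if not ok and err)
    return None
```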

## Lower-level usage

The patterns above are built on two lower-level primitives, workspace
branches and forked process contexts, which you can use directly when you
need more control.

### Branching with manual control

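```python
# on_success=None / on_error=None: nothing happens automatically on exit,
# so the code inside decides whether to commit or abort
with ws.branch("attempt", on_success=None, on_error=None) as b:
    result = run_agent(workdir=b.path)
    if result.confident:
        b.commit()
    else:
        b.abort()
```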

### Nested branches

Branches can nest - useful for hierarchical exploration (e.g. pick a
strategy, then explore variants within it).

```python
with ws.branch("strategy_a") as a:
    apply_strategy(a.path)
    with a.branch("variant_1") as v1:
        tweak(v1.path)
        # v1 auto-commits into a on success
    # a auto-commits into main on success
```
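Nesting behaves like stacked copy-on-write layers: reads fall through to the parent, writes stay in the child, and a commit folds the child's writes into the layer directly above it. `collections.ChainMap` models this neatly for key-value data. `ToyBranch` is a hypothetical illustration of the semantics, not how BranchFS stores files, and it ignores deletions.

```python
from collections import ChainMap
from typing import Dict, Union


class ToyBranch:
    """Hypothetical CoW overlay: reads fall through, writes stay local."""

    def __init__(self, parent: Union[Dict[str, str], "ToyBranch"]):
        self.parent = parent
        below = parent.view if isinstance(parent, ToyBranch) else parent
        # maps[0] is this branch's private layer; lookups continue into `below`.
        self.view = ChainMap({}, below)

    def branch(self) -> "ToyBranch":
        return ToyBranch(self)

    def commit(self) -> None:
        # Fold this layer's writes into the layer directly above (one level).
        if isinstance(self.parent, ToyBranch):
            self.parent.view.maps[0].update(self.view.maps[0])
        else:
            self.parent.update(self.view.maps[0])

    def abort(self) -> None:
        # Dropping the private layer discards every write made in this branch.
        self.view.maps[0].clear()
```

As in the nested example, committing `variant_1` only updates `strategy_a`; `main` changes only when `strategy_a` itself commits.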

### Process forking

For crash-prone agent code, `BranchContext` runs each task in a forked child
process with its own process group. The child is automatically killed on
timeout or context exit.

For sandboxing (filesystem confinement, resource limits, syscall filtering),
combine with [sandlock](https://github.com/multikernel/sandlock).

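```python
# ProcessBranchError is assumed to be exported alongside BranchContext
from branching import BranchContext, ProcessBranchError

with ws.branch("forked", on_success=None, on_error=None) as fb:
    # Fork a child running run_agent with fb.path as its workspace
    with BranchContext(run_agent, workspace=fb.path) as ctx:
        try:
            ctx.wait(timeout=30)
            fb.commit()
        except ProcessBranchError:
            # Child did not finish successfully: discard its changes
            fb.abort()
```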

Run N tasks in parallel, each in its own forked process:

```python
# ws_a, ws_b, ws_c: branches opened with ws.branch(...)
with BranchContext.create(
    targets=[task_a, task_b, task_c],
    workspaces=[ws_a.path, ws_b.path, ws_c.path],
) as contexts:
    for ctx in contexts:
        ctx.wait(timeout=60)
```
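The fork-plus-process-group approach described in this section can be sketched with the `os` module (POSIX only). This is not `BranchContext`'s source: `run_forked` is a hypothetical helper that treats a `True` return from `target` as success and polls `waitpid` rather than using whatever waiting mechanism the library uses. Putting the child in its own process group is what lets a single `killpg` reach anything the child spawned.

```python
import os
import signal
import time
from typing import Callable


def run_forked(target: Callable[[], bool], workdir: str, timeout: float) -> bool:
    """Hypothetical helper: run target in a forked child, kill it on timeout."""
    pid = os.fork()
    if pid == 0:  # child
        code = 1
        try:
            os.setpgid(0, 0)   # new process group led by the child
            os.chdir(workdir)  # child works inside the branch path
            code = 0 if target() else 1
        finally:
            os._exit(code)     # a crash or exception exits with status 1
    try:
        os.setpgid(pid, pid)   # also set from the parent to avoid a race
    except OSError:
        pass                   # child already did it, or already exited
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done:
            return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        time.sleep(0.01)
    try:
        os.killpg(pid, signal.SIGKILL)  # kill the child and its descendants
    except ProcessLookupError:
        pass
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return False
```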

## CLI

The `branching` command exposes several of the agent patterns as shell
commands. It auto-detects the workspace from your current directory, or you
can pass `-w PATH`. All commands support `--json` for machine-readable output.

### run

Run a command in a new branch. Commits on exit 0, aborts on non-zero.

```bash
branching run -- ./build.sh
branching run --on-error none -- python train.py
branching run --ask -- make test          # prompt before commit/abort
```

### speculate

Race N commands in parallel branches. First success wins.

```bash
branching speculate -c "./fix_a.sh" -c "./fix_b.sh" -c "./fix_c.sh"
branching speculate --timeout 60 -c "python solve_v1.py" -c "python solve_v2.py"
```

### best-of-n

Run CMD N times in parallel and commit the highest-scoring success.
The child process can write a score to fd 3 (`echo 0.95 >&3`).
If nothing is written, the score defaults to 1.0 for success and 0.0 for failure.
Each child receives `BRANCHING_ATTEMPT` (0-indexed) in its environment.

```bash
branching best-of-n -n 5 -- ./solve.py
branching best-of-n -n 3 --timeout 120 --json -- python attempt.py
branching best-of-n -n 3 -- bash -c 'python run.py && echo "$SCORE" >&3'
```
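A Python child can report its score the same way as the shell examples, by writing a line to fd 3. `report_score` is a hypothetical helper; it takes the fd as a parameter so it can be tested with a pipe, and it returns `False` when the write fails (for example, when the script runs outside `branching best-of-n` and fd 3 is not open). If fd 3 happens to be open for something else, this would write to it, so only call it under the CLI.

```python
import os


def report_score(score: float, fd: int = 3) -> bool:
    """Hypothetical helper: write one score line to `fd` (fd 3 under the CLI)."""
    try:
        os.write(fd, f"{score}\n".encode())
        return True
    except OSError:
        # fd not open, or its reader is gone: the CLI's default score applies.
        return False
```

Inside `solve.py`, calling `report_score(0.95)` after a successful run has the same effect as `echo 0.95 >&3`.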

### reflexion

Sequential retry with an optional critique feedback loop.
The child receives `BRANCHING_ATTEMPT` (0-indexed) and `BRANCHING_FEEDBACK`
(empty on the first attempt, critique output on retries) in its environment.

```bash
branching reflexion --retries 5 -- ./fix.sh
branching reflexion --retries 3 --critique "./review.sh" -- ./solve.py
branching reflexion --retries 3 --critique "python critique.py" --json -- python agent.py
```
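A Python child can pick up both variables with a small helper. `read_branching_env` is a hypothetical name; it maps the empty `BRANCHING_FEEDBACK` of the first attempt to `None` and falls back to attempt 0 when the variable is unset (an assumption made so the script also runs outside the CLI).

```python
import os
from typing import Mapping, Optional, Tuple


def read_branching_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: return (attempt, feedback) set by the CLI."""
    env = os.environ if environ is None else environ
    attempt = int(env.get("BRANCHING_ATTEMPT", "0"))
    # BRANCHING_FEEDBACK is empty on the first attempt; normalize that to None.
    feedback = env.get("BRANCHING_FEEDBACK") or None
    return attempt, feedback
```

Inside `agent.py`, `attempt, feedback = read_branching_env()` gives the retry number and the critique, if any.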

### status

Show workspace info and active branches.

```bash
branching status
branching status --json
```

## How it works

BranchContext uses [BranchFS](https://github.com/multikernel/branchfs), a
copy-on-write FUSE filesystem, to create instant branches of your workspace
that copy no data until something is written. Branches are virtual paths
within a single mount, with first-winner-commit semantics.

All you need is a `Workspace` pointed at a mounted BranchFS path.

Process forking (`BranchContext`) uses `fork(2)` plus process groups to run
each task in an isolated child process. The child's working directory is set
to the branch path, and `mprotect(2)` enforces copy-on-write invariants on
parent memory regions.

BranchContext focuses purely on branching. For sandboxing (filesystem
confinement, syscall filtering, resource limits), use
[sandlock](https://github.com/multikernel/sandlock) alongside branching --
the two are designed to compose.