https://github.com/xbrianh/gremlins

Background coding agents that execute, review, and land work unattended.
agent-orchestration ai-coding-assistant-tools automation background-jobs claude-code coding-agents llm-agents

Host: GitHub
URL: https://github.com/xbrianh/gremlins
Owner: xbrianh
Created: 2026-05-02T16:18:34.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-06-07T17:14:50.000Z (17 days ago)
Last Synced: 2026-06-07T17:14:56.710Z (17 days ago)
Topics: agent-orchestration, ai-coding-assistant-tools, automation, background-jobs, claude-code, coding-agents, llm-agents
Language: Python
Homepage:
Size: 2.73 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 9
Metadata Files:
- Readme: README.md
- Agents: AGENTS.md

# gremlins

Background coding-agent pipelines that plan, implement, review, and land work
end-to-end. Given a goal or GitHub issue, a gremlin runs the full
plan → implement → review-code → address-code cycle unattended, writing
artifacts to the per-user state directory resolved by
`platformdirs.user_state_dir("gremlins")` and optionally opening a pull
request. A fleet manager tracks running, stalled, and finished gremlins and
provides stop / land / close operations.

**Status: brand-new and a bit janky.** This is a fresh project, actively
shaped by daily use. Expect rough edges — stream timeouts, the occasional
merge conflict from parallel gremlins, a few stages still finding their
final shape. Bug reports, ideas, and PRs are all welcome.

---

## Using gremlins with a coding assistant

Paste the output of `gremlins prompt-for-assistant` into a fresh Claude Code session (or any compatible assistant) to configure it as a competent gremlins collaborator.

The workflow: you discuss the work with the assistant, it captures discrete units as GitHub issues or plan files, launches gremlins in the background to implement them, and lands each finished gremlin before starting dependent work. You stay at the strategic level — deciding what to build and in what order — while gremlins handle the implementation cycle unattended. The assistant maintains a queue of running, pending, and blocked work and surfaces it on request.

---

## Using gremlins across multiple repos

When you run `gremlins launch`, the launcher captures the current working
directory's repo root via `git rev-parse --show-toplevel` and stores it as
`project_root` in the gremlin's `state.json`. That value pins the worktree
base, child process cwd, and pipeline discovery for that gremlin's lifetime.

**To work on a different repo: `cd` there, then `gremlins launch`.** There is
no `--project-root` flag; the cwd at launch time is the contract.

**Fleet view** (`gremlins`) shows gremlins from all repos by default.
Pass `--here` to filter to the current repo's `project_root`.

**Pipeline discovery** walks from the launching cwd, so `.gremlins/pipelines/`
overrides in each repo apply to gremlins launched from that repo.

**Queue caveat**: there is one global queue and the runner's cwd is frozen at
`gremlins queue run --detach` time. To queue work against a different repo,
prefix the command with `cd`:

```sh
gremlins queue add "cd /path/to/other-repo && gremlins launch gh --plan '#42' --wait"
gremlins queue add "cd /path/to/other-repo && gremlins land <id>"
```

**State isolation**: each gremlin's state lives under its own directory
(resolved via `platformdirs.user_state_dir("gremlins")/<id>/`), so two repos
can have running gremlins simultaneously without interference.

---

## Runtime CLI prerequisites

- `gh` — [GitHub CLI](https://github.com/cli/cli#installation)
- `git` — [Git](https://git-scm.com/downloads) (pre-installed on most systems)
- `claude` — [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code)

## Dev install

```sh
uv venv
source .venv/bin/activate # or `.venv\Scripts\activate` on Windows
uv pip install -e ".[dev]"
```

## Make targets

| Target | What it runs |
|---|---|
| `make test` | `pytest` |
| `make lint` | `ruff check .` |
| `make format` | `ruff format --check .` (check only — does not rewrite files) |
| `make typecheck` | `pyright` |
| `make check` | lint + format + typecheck |

## CLI subcommands

Invoked as `python -m gremlins.cli <subcommand>` or `gremlins <subcommand>`
after install. The authoritative list and per-subcommand description lives in
the dispatch table in [`gremlins/cli/__init__.py`](gremlins/cli/__init__.py).

| Subcommand | Purpose |
|---|---|
| `launch <pipeline>` | Launch a background gremlin by pipeline name (`gremlins launch --list` to see available) |
| `resume` | Re-spawn an existing gremlin from its recorded stage |
| `stop` | Send SIGTERM to a running gremlin and wait for it to exit |
| `land` | Land a finished gremlin onto the current branch |
| `rm` | Delete a dead gremlin's state dir, worktree, and branch |
| `close` | Mark a dead gremlin as closed |
| `log` | Tail the gremlin's log file |
| `ack` | Acknowledge a gremlin waiting for human input |
| `skip` | Skip a gremlin waiting for human input |
| `queue` | Manage the gremlin launch queue |
| `prompt-for-assistant` | Print the assistant setup prompt to stdout |

`_run-pipeline` is an internal spawn boundary; not for direct use.

### `queue` sub-subcommands

| Sub-subcommand | Description |
|---|---|
| `add [--run] <command>` | Add a command to the queue; `--run` also starts the runner if idle |
| `list [--watch] [--json]` | List queued items |
| `run [--once] [--poll-interval SEC] [--detach]` | Start the queue runner |
| `requeue [--done]` | Move failed (and optionally done) items back to pending |
| `clear [--failed\|--done\|--pending\|--purge\|--item STEM]` | Remove items from the queue |
| `set-state --item STEM` | Manually transition a queue item to a different state |
| `stop` | Stop the detached runner |

### Launch flags

#### Per-pipeline flags

Flags vary by pipeline. The first stage's `__init__` signature defines the accepted flags; `gremlins launch --help` prints the full list.

Common infrastructure flags (accepted by all pipelines):

| Flag | Default | Description |
|---|---|---|
| `--plan <path-or-issue>` | — | Path to a plan/spec file, or a GitHub issue ref (`42`, `#42`, `owner/repo#42`, or issue URL) |
| `--description <text>` | — | Human-readable description stored in state |
| `--parent <id>` | — | Parent gremlin ID (used by boss to track child ownership) |
| `--print-id` | false | Print the gremlin ID to stdout after launch |
| `-c`/`--instructions <text>` | — | Instructions string (mutually exclusive with `--plan`) |
| `--base-ref <ref>` | `HEAD` | Git ref to branch the worktree from; ignored for gh pipelines (always anchors to origin default branch). In parallel pipelines, automatically propagated to all child processes. |
| `--spec <path>` | — | Path to a coding-style spec file passed into stages |
| `--bypass` | false | Skip permission checks; run in bypass mode |

## Pipeline configuration

Gremlins runs a sequence of stages defined in a YAML file. The bundled
pipelines work out of the box; a project-local YAML can override any of them.

### Discovery order

`--pipeline <name>` resolves as follows:

1. A value with a `.yaml` suffix or more than one path component is loaded
   directly as a filesystem path.
2. Otherwise `./.gremlins/pipelines/<name>.yaml` is checked first
   (project-local override).
3. Then `gremlins/pipelines/<name>.yaml` (bundled) is checked.

The pipeline name is the first non-flag argument to `gremlins launch`. Run `gremlins launch --list` to see all available pipeline names.
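
The discovery order can be sketched as a small lookup function. This is an illustration under stated assumptions, not the project's `resolve_pipeline_path`: the function name, the `bundled_dir` parameter, and resolving relative paths against `base_dir` are all hypothetical.

```python
from pathlib import Path


def resolve_pipeline(name_or_path, base_dir, bundled_dir):
    """Return the YAML path for a pipeline name or path (illustrative only)."""
    value = Path(name_or_path)
    # Step 1: a `.yaml` suffix or several path components means a literal path.
    # Resolving relative paths against base_dir is an assumption of this sketch.
    if value.suffix == ".yaml" or len(value.parts) > 1:
        return Path(base_dir) / value
    # Step 2: the project-local override wins when it exists.
    local = Path(base_dir) / ".gremlins" / "pipelines" / f"{name_or_path}.yaml"
    if local.is_file():
        return local
    # Step 3: fall back to the bundled pipelines directory.
    bundled = Path(bundled_dir) / f"{name_or_path}.yaml"
    if bundled.is_file():
        return bundled
    raise FileNotFoundError(f"no pipeline named {name_or_path!r}")
```

With this ordering, dropping a `local.yaml` into `.gremlins/pipelines/` shadows the bundled `local` pipeline without any flag change.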

### Selecting a pipeline

```sh
gremlins launch local # bundled local.yaml
gremlins launch gh # bundled gh.yaml
```

### Schema reference

**Top-level keys:**

```yaml
name: my-pipeline # optional; defaults to the file stem

default_client: claude:sonnet # optional; provider:model string

prompt_dir: ../prompts # optional; relative to YAML, defaults to the YAML's directory

stages:
  - name: plan
    type: plan
    client: copilot:gpt-5.4 # optional; overrides default_client for this stage
    prompt: gremlins:plan.md # `gremlins:NAME` -> bundled prompts; bare NAME -> prompt_dir
    options: {}
```

| Key | Description |
|---|---|
| `name` | Pipeline display name; defaults to the file stem |
| `default_client` | `provider:model` string used for stages without an explicit `client:` |
| `prompt_dir` | Directory that bare-name `prompt:` paths resolve against, relative to the YAML file. Defaults to the YAML's directory. |
| `stages` | Ordered list of stage entries or parallel groups |

**Per-stage keys:**

| Key | Description |
|---|---|
| `name` | Unique stage identifier; used for `resume` targeting |
| `type` | Registered stage type (see [Available stage types](#available-stage-types)) |
| `client` | `provider:model` string; overrides `default_client` for this stage |
| `prompt` | Path or list of paths. `gremlins:NAME` resolves from the bundled package prompts; a bare `NAME` resolves from the pipeline's `prompt_dir`. |
| `options` | Free-form dict passed to the stage |

**`provider:model` format:**

Providers: `claude` (default), `copilot`, `openai`, `xai`, `anthropic`. The model part is optional — `claude:` and `claude:sonnet` are both valid. Examples: `claude:sonnet`, `copilot:gpt-5.4`, `openai:gpt-4o`. Per-stage `client:` in YAML takes precedence over the CLI `--client` flag; `default_client:` at the pipeline level does not.
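
As a rough illustration of the `provider:model` format, the sketch below splits a specifier on the first colon. `ClientSpec` and `parse_client` are hypothetical names, and accepting a bare provider with no colon is an assumption; only the provider list and the optional model part come from the text above.

```python
from typing import NamedTuple, Optional

# Provider names as listed in this README.
PROVIDERS = ("claude", "copilot", "openai", "xai", "anthropic")


class ClientSpec(NamedTuple):
    provider: str
    model: Optional[str]  # None when the model part is omitted


def parse_client(spec):
    """Split `provider:model`; the model part may be empty."""
    provider, _, model = spec.partition(":")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider {provider!r} in {spec!r}")
    return ClientSpec(provider, model or None)
```

Splitting on the first colon only keeps model names that contain dots or dashes, such as `gpt-5.4`, intact.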

**Parallel-group form:**

```yaml
- name: reviews
  parallel:
    - name: review-detail
      type: review-code
      client: claude:sonnet
    - name: review-security
      type: review-code
      client: claude:sonnet
  max_concurrent: 2 # optional; defaults to all children at once
```

| Key | Description |
|---|---|
| `name` | Group identifier |
| `parallel` | List of child stage entries (no nesting allowed) |
| `max_concurrent` | Max simultaneously running children (optional) |

### Client specifiers

Clients are specified as `provider:model` inline strings, either at the pipeline level (`default_client:`) or per stage (`client:`). The model part is optional.

```yaml
default_client: claude:sonnet # all stages default to this
stages:
  - name: plan
    type: plan
  - name: implement
    type: implement
    client: copilot:gpt-5.4 # this stage uses copilot instead
```

Providers: `claude`, `copilot`, `openai`, `xai`, `anthropic`. The CLI `--client provider:model` flag overrides the pipeline-level `default_client:` but yields to per-stage `client:` settings.
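
The precedence described above (per-stage `client:` first, then the CLI `--client` flag, then `default_client:`) amounts to a first-match lookup. The function below is a hypothetical sketch of that ordering, not the loader's actual code.

```python
def effective_client(stage_client=None, cli_client=None, default_client=None):
    """Pick the client string for one stage, highest precedence first."""
    for candidate in (stage_client, cli_client, default_client):
        if candidate:
            return candidate
    # Nothing was configured; the caller falls back to its built-in default.
    return None
```

For example, a stage pinned to `copilot:gpt-5.4` keeps that client even when the pipeline is launched with `--client openai:gpt-4o`.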

### `prompt:` field

```yaml
prompt: gremlins:plan.md # single bundled file
prompt: [gremlins:code_style.md, plan.md] # mix bundled and local; concatenated with \n\n
```

Each entry is one of:

- `gremlins:NAME` — resolved from the bundled prompts shipped with the
package. Use this for prompts owned by gremlins (`code_style.md`,
`plan_gh.md`, etc.).
- bare `NAME` — resolved from the pipeline's top-level `prompt_dir:`
(relative to the YAML file; defaults to the YAML's own directory). Use
this for prompts you author and check in alongside your pipeline.

Lists are joined with `\n\n` before being passed to the stage. There is
no search fallback between the two — the prefix is the contract, so a
custom YAML reads as self-describing about which prompts come from the
package vs which must be provided locally.

By convention, project-local prompts live in `./.gremlins/prompts/` (a peer
of `./.gremlins/pipelines/`, not nested under it) and pipelines set
`prompt_dir: ../prompts`.
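
The prefix rule can be sketched as follows. This is a minimal illustration: the function names and the `bundled_dir` parameter are hypothetical, and the real loader may locate bundled prompts differently (for example through package resources).

```python
from pathlib import Path

BUNDLED_PREFIX = "gremlins:"


def resolve_prompt_entry(entry, prompt_dir, bundled_dir):
    """Map one `prompt:` entry to a file path; the prefix alone decides."""
    if entry.startswith(BUNDLED_PREFIX):
        return Path(bundled_dir) / entry[len(BUNDLED_PREFIX):]
    # Bare names never fall back to the bundled prompts.
    return Path(prompt_dir) / entry


def load_prompt(prompt, prompt_dir, bundled_dir):
    """Accept a single entry or a list; join list entries with a blank line."""
    entries = [prompt] if isinstance(prompt, str) else list(prompt)
    texts = [
        resolve_prompt_entry(entry, prompt_dir, bundled_dir).read_text(encoding="utf-8")
        for entry in entries
    ]
    return "\n\n".join(texts)
```

Because there is no fallback, a bare `code_style.md` that exists only in the bundled directory fails with a missing-file error instead of silently picking up the packaged copy.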

### `options:` field

A free-form dict passed verbatim to the stage. Selected options by stage
(see [`gremlins/stages/AGENTS.md`](gremlins/stages/AGENTS.md) for the full list):

**`verify`** — runs a list of shell commands with an agent fix-loop:

```yaml
options:
  cmds: ["make check", "make test"] # commands to run (joined with &&)
  max_attempts: 3 # fix-loop retries (default: 3)
```
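
A fix-loop of this kind might look like the sketch below. It assumes `max_attempts` counts total check runs and that the fixer is called between failed runs; both are assumptions, as are the names `run_verify`, `fix`, and `run`. The real stage invokes an agent where this sketch takes a callback.

```python
import subprocess


def run_verify(cmds, fix, max_attempts=3, run=None):
    """Run the joined commands, calling `fix` between failed attempts.

    Returns the attempt number that passed, or None if every attempt failed.
    """
    if run is None:
        # Default runner: one shell invocation of the whole command chain.
        def run(command):
            return subprocess.run(command, shell=True, capture_output=True, text=True)

    command = " && ".join(cmds)  # a failing command stops the chain
    for attempt in range(1, max_attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            fix(result)  # hypothetical hook: hand the failure output to an agent
    return None
```

Joining with `&&` means `make test` only runs once `make check` has passed, so the fixer always sees the first failure in the chain.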

For `local` stages, model options (`plan_model`, `impl_model`, `address_model`,
`test_fix_model`, `detail`) can also be set here to override the CLI defaults.

### Available stage types

| Type | Description |
|---|---|
| `plan` | Produces an implementation plan |
| `implement` | Applies the plan to the working tree |
| `review-code` | Runs a code review and writes findings to disk |
| `verify` | Runs check and test commands with an agent fix-loop |
| `exec` | Runs shell commands with in:/out: artifact bindings |
| `agent` | Resolves in: artifacts, renders prompt, invokes agent, verifies out: artifacts |
| `handoff` | Runs the handoff agent once per boss loop iteration |
| `loop` | Iterates body stages until a termination predicate or max iterations |
| `sequence` | Runs body stages sequentially using child state |
| `github-open-pull-request` | Opens a pull request on GitHub |
| `github-request-copilot-review` | Requests a Copilot review on the open PR |
| `github-wait-copilot` | Polls until Copilot posts its review |
| `github-wait-ci` | Polls PR CI checks until they pass or exhaust attempts |

### Parallel groups

Wrap sibling stages in a `parallel:` list to run them concurrently:

```yaml
default_client: claude:sonnet

stages:
  - name: plan
    type: plan

  - name: reviews
    parallel:
      - name: review-detail
        type: review-code
      - name: review-security
        type: review-code
    max_concurrent: 2

  - name: address-code
    type: agent
```

**Execution and failure:** The parallel group executes in three phases:
1. **Fan-out** — each child stage starts independently as a subprocess
2. **Concurrent execution** — all children run simultaneously (up to `max_concurrent`)
3. **Fan-in** — all children finish or one bails; siblings continue running until group completion

If any child fails (raises `Bail`), the pipeline halts after the group finishes —
siblings are not cancelled mid-run by default. This can be changed with `cancel_on_bail: true`
to cancel outstanding tasks immediately. The bail is evaluated via `bail_policy` (default: `any`,
meaning one failed child halts the group; set `bail_policy: all` to halt only when all children bail).
Subsequent stages are skipped; the operator can resume or ack the group via CLI.
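
The default fan-out and fan-in behaviour can be approximated with a thread pool. This is only a sketch: gremlins runs children as subprocesses, whereas the example uses threads as a stand-in, `run_group` is a hypothetical name, the local `Bail` class merely mirrors the exception named above, and `cancel_on_bail` is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


class Bail(Exception):
    """Stand-in for the Bail exception described above."""


def run_group(children, max_concurrent=None, bail_policy="any"):
    """Run `children` (name -> zero-argument callable) concurrently.

    Returns (results, bails, halted). Siblings are never cancelled here.
    """
    workers = max(1, max_concurrent or len(children))
    results, bails = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fan-out: every child is submitted; the pool caps concurrency.
        futures = {name: pool.submit(fn) for name, fn in children.items()}
        # Fan-in: wait for every child, recording bails instead of raising.
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Bail as exc:
                bails[name] = str(exc)
    if bail_policy == "all":
        halted = bool(children) and len(bails) == len(children)
    else:  # "any": one bailed child halts the group
        halted = bool(bails)
    return results, bails, halted
```

Collecting bails at fan-in, instead of raising on the first one, is what lets the remaining siblings finish their work before the pipeline halts.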

**State isolation:** Each child gets its own state directory and subprocess.
Client overrides, worktree paths, and artifact bindings are isolated per-child.
Children run in parallel without blocking each other. Parent `state.json` is updated
during the concurrent phase (e.g., `active_children` snapshot); copying child artifact
bindings into the parent registry is deferred until fan-in completes.

**Resume targeting:** Use the full child gremlin ID (form: `----`,
visible in fleet view) to resume a specific child. Resuming the parent group ID re-spawns all
children that haven't landed.

**Base ref propagation:** The `--base-ref` flag is automatically propagated from
the parent to all child processes, ensuring consistent branching across the group.
Child worktrees are derived from the parent's base_ref as recorded in state.

### Worked example: project-local override

Create `.gremlins/pipelines/local.yaml` to override the bundled `local`
pipeline. This example uses Opus for plan/implement/address stages and adds
a `verify` stage before `review-code`:

```yaml
name: local

stages:
- { type: plan, options: { plan_model: opus } }
- { type: implement, options: { impl_model: opus } }
- { type: verify, options: { cmds: ["pytest"] } }
- { type: review-code }
- { name: address-code, type: agent, options: { address_model: opus } }
```

Add a `prompt:` key to any stage to supply a custom prompt; paths are
relative to the YAML file.

### Worked example: parallel reviewers

Run two `review-code` passes in parallel, then address both:

```yaml
name: local

default_client: claude:sonnet

stages:
  - { type: plan }
  - { type: implement }

  - name: reviews
    parallel:
      - name: review-detail
        type: review-code
      - name: review-security
        type: review-code
    max_concurrent: 2

  - { name: address-code, type: agent }
```

Note: `review-code` does not currently support per-stage prompt overrides
via YAML — both passes use the built-in detail lens.

### Stage definitions

YAML `stage-definitions:` lets you name and reuse stage patterns within a pipeline:

```yaml
stage-definitions:
  review-base: &review-base
    type: review-code
    client: claude:sonnet
    prompt: gremlins:code_style.md

stages:
  - { type: plan }
  - { type: implement }
  - name: review-detail
    <<: *review-base
    prompt: [gremlins:code_style.md, detail_review.md]
  - name: review-security
    <<: *review-base
    prompt: security_review.md
```

Definitions provide base `type`, `options`, and `prompt`. Call-sites can override
`prompt` and `options` via YAML anchors (as shown above) or via template placeholders
in multi-stage recipes. Call-sites own the `name:`, `in:`, and `out:` keys;
`out:` is forbidden inside a definition, but `in:` can be declared and will be
merged with call-site `in:` values. For single-stage definitions, only `name`, `in`,
and `out` keys can be safely overridden; to vary `prompt` or `options`, use anchors.
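
The ownership rules for `name:`, `in:`, and `out:` can be sketched as a dictionary merge. This is an illustration only: `apply_definition` is a hypothetical name, and letting call-site `in:` entries win on key conflicts is an assumption the text above does not settle.

```python
def apply_definition(definition, call_site):
    """Combine a stage definition with the keys a call-site is allowed to set."""
    if "out" in definition:
        raise ValueError("`out:` is forbidden inside a stage definition")
    merged = dict(definition)  # base type, options, prompt
    # `in:` maps are merged; call-site entries win on conflict (assumption).
    merged_in = {**definition.get("in", {}), **call_site.get("in", {})}
    if merged_in:
        merged["in"] = merged_in
    # The call-site owns `name:` and `out:`.
    for key in ("name", "out"):
        if key in call_site:
            merged[key] = call_site[key]
    return merged
```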

### Artifact binding

Stages can bind artifacts via `in:` and `out:` maps. These define what data
flows between stages in the pipeline:

```yaml
stages:
  - name: scan
    type: exec
    options:
      cmds: ["python scan.py > $ARTIFACTS/report.json"]
    out:
      report: file://session/report

  - name: analyze
    type: agent
    in:
      report: report
    prompt: |
      The scanning report is in {report}.
      Propose fixes.
```

**Artifact URI schemes:**
- `file://session/<name>` — Session artifact: a file created under the gremlin's `$ARTIFACTS` directory
- `git://ref/<name>` — Git ref name (e.g., `git://ref/main` returns the string `main`)
- `git://commit/<sha>` — Commit SHA (e.g., `git://commit/abc123def` returns the full SHA)
- `git://range/<base>..<head>` — Commit range/log between two refs
- `gh://pulls/<number>/head` — GitHub PR head ref (and other `gh://` schemes for GitHub data)
- `file://`, `git://`, `gh://` — the base schemes that artifact resolvers support

**Artifact binding semantics:**
- `in:` values are registry key paths (e.g., `report` or `report.critical?default`) with optional dotted attribute access and `?default` fallback
- `out:` values are URI strings that name what the stage produces; downstream stages reference the key name (not the URI) in their `in:` maps
- Prompt/option substitution uses `{var}` tokens (not `{{var}}`); artifacts bound via `in:` become available for substitution
- `in:` can be declared in a stage definition and will be merged with call-site `in:` values; `out:` cannot appear inside a definition
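
A sketch of how an `in:` key path and `{var}` substitution could behave, assuming the registry is a nested dictionary and dotted access means key lookup. The helper names and the sample registry contents are hypothetical.

```python
import re


def resolve_in(spec, registry):
    """Resolve `key.attr?default` against a nested registry dict."""
    path, has_default, default = spec.partition("?")
    value = registry
    for part in path.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif has_default:
            return default  # fall back to the text after `?`
        else:
            raise KeyError(spec)
    return value


def render(template, in_map, registry):
    """Substitute `{var}` tokens for names bound in the stage's `in:` map."""
    values = {var: resolve_in(spec, registry) for var, spec in in_map.items()}
    # Unbound tokens are left untouched so stray braces survive.
    return re.sub(
        r"\{(\w+)\}",
        lambda match: str(values.get(match.group(1), match.group(0))),
        template,
    )
```

Leaving unbound tokens alone is a deliberate choice in this sketch: prompts often contain literal braces, and failing on them would make templates brittle.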

### Stage definitions and bundled recipes

Some stage types are not built-in — they are provided as bundled YAML recipes and must be wired in via `stage-definitions:` before use:

```yaml
stage-definitions:
github-push-to-pr-branch: gremlins:github_push_to_pr_branch

stages:
- { name: push, type: github-push-to-pr-branch }
```

`gremlins:NAME` resolves the recipe from the bundled package (`gremlins/recipes/stages/NAME.yaml`). A bare path resolves relative to the pipeline file.

### Bundled pipelines

The canonical reference pipelines:

- [`gremlins/pipelines/local.yaml`](gremlins/pipelines/local.yaml) — `gremlins launch local`
- [`gremlins/pipelines/gh.yaml`](gremlins/pipelines/gh.yaml) — `gremlins launch gh`
- [`gremlins/pipelines/gh-terse.yaml`](gremlins/pipelines/gh-terse.yaml) — `gremlins launch gh-terse`
- [`gremlins/pipelines/pr-extend.yaml`](gremlins/pipelines/pr-extend.yaml) — `gremlins launch pr-extend`
- [`gremlins/pipelines/boss.yaml`](gremlins/pipelines/boss.yaml) — `gremlins launch boss`

## Error handling and recovery

Gremlins can fail or get stuck during execution. Understanding how to recover is essential for running long-running pipelines.

### Bail semantics

When a stage detects an unrecoverable condition (e.g., a code review requests changes, secrets are detected, or a merge conflict blocks progress), it raises a `Bail` exception with a detail string.

By convention, agent-based stages emit a `BAIL: <reason>: <detail>` marker at the end of their output. The `<reason>` token is conventionally one of:
- `reviewer_requested_changes` — code review found issues that must be addressed
- `security` — security review detected problems
- `secrets` — credentials or sensitive data detected in the code
- `other` — stage-specific or unknown failure condition

The bail detail is written to a per-attempt `bail_.json` file in the gremlin's state directory and is visible in the fleet view. When a stage bails, the entire pipeline halts — subsequent stages do not run, but the gremlin's state is preserved for recovery.
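
Detecting the marker could be done with a small parser like the one below. It assumes the marker is the last non-empty line of the output and carries two colon-separated fields after `BAIL:` (a token, then free-form detail); that exact layout, like the name `parse_bail`, is an assumption of this sketch.

```python
import re

# Assumed layout: "BAIL: <token>: <detail>" on the final non-empty line.
BAIL_RE = re.compile(r"^BAIL:\s*(?P<token>\w+):\s*(?P<detail>.*)$")


def parse_bail(output):
    """Return (token, detail) if the output ends with a BAIL marker, else None."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return None
    match = BAIL_RE.match(lines[-1])
    if match is None:
        return None
    return match["token"], match["detail"].strip()
```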

### Recovering from gremlin failures

When a gremlin bails and halts, you have three recovery options:

**`gremlins resume <id>`** — Re-spawn the bailed gremlin from the stage where it
bailed. Use this when the cause has been fixed externally (e.g., a code review
fix has been merged, or a merge conflict has been resolved). The gremlin will
restart from the bailed stage with the current worktree state.

**`gremlins ack <id>`** — Acknowledge the gremlin without re-running. Use this
when the bailed condition is acceptable (e.g., the review found minor style
issues that don't block landing, or external work was already completed). The
gremlin marks the bailed stage as complete and proceeds to subsequent stages.

**`gremlins skip <id>`** — Create a new sibling attempt with the same parameters
and a fresh ID, leaving the failed gremlin in place. Use this for transient
failures (timeouts, CI hangs) that won't self-resolve. Both attempts are visible
in the fleet; the new attempt begins from the start.

### Handling parallel group failures

When a child in a parallel group bails:
- The group halts after all currently-running children finish (not mid-run), unless `cancel_on_bail: true`
- The bail reason is attributed to the child stage name
- `gremlins resume <group-id>` re-spawns all children that haven't landed
- `gremlins resume ----` resumes only that child (use the full child ID from fleet view)

If the cause was a transient failure affecting multiple children, `skip` the entire
group and re-launch the pipeline to restart all children.

### Boss-chain recovery

When a boss gremlin spawns child gremlins (`gremlins launch ... --parent <boss-id>`),
the boss halts if a child bails. At this point:
- The child's gremlin ID is visible in the fleet view as a child of the boss
- Recover the child (`resume`, `ack`, or `skip`) independently
- Once the child lands or is abandoned, resume the boss (`gremlins resume <boss-id>`)

The boss resumes from its child-spawn stage and proceeds with the next iteration
(re-planning, re-implementing, or wrapping up, depending on the pipeline).

## What can a gremlin do to my machine?

Gremlins operate in one of two permission modes:

**Default mode** (no flags): The agent is restricted to an allowlist of tools
(Read, Edit, Write, Bash, Grep, Glob), and its Bash commands are path-scoped to
the gremlin's git worktree. The agent can read and modify files inside that
worktree; direct path references outside it are blocked. This is a best-effort
token check, not a full sandbox: indirect references (heredocs, computed paths)
may not be caught.
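
To illustrate why a token check is best-effort, here is a simplified sketch of one. It is not the project's implementation: `tokens_outside_worktree` is a hypothetical name, and the rule of inspecting only absolute tokens and tokens containing `..` is an assumption.

```python
import shlex
from pathlib import Path


def tokens_outside_worktree(command, worktree):
    """Return the command tokens that literally point outside `worktree`."""
    root = Path(worktree).resolve()
    flagged = []
    for token in shlex.split(command):
        path = Path(token)
        # Only tokens that could leave the worktree are inspected.
        if not (path.is_absolute() or ".." in path.parts):
            continue
        # Joining with an absolute path yields that absolute path unchanged.
        resolved = (root / path).resolve()
        if resolved != root and root not in resolved.parents:
            flagged.append(token)
    return flagged
```

A command such as `cat "$(echo /etc)/passwd"` passes this check untouched because the path only exists after the shell expands it, which is the kind of indirect reference the paragraph above warns about.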

**Bypass mode** (`--bypass`, `GREMLINS_BYPASS_PERMISSIONS=1`, project
`.gremlins/permissions.yaml bypass_permissions: true`, or user config
`~/.config/gremlins/config.toml bypass_permissions = true`): All permission
checks are disabled. The agent can use any tool and reference any path. Use
this when the task genuinely requires broader access (e.g. a pipeline that
modifies system config).

The three opt-in paths for bypass are:
1. `gremlins launch --bypass` — single-launch override
2. `GREMLINS_BYPASS_PERMISSIONS=1` in the environment
3. `bypass_permissions: true` in `.gremlins/permissions.yaml` (project) or
`bypass_permissions = true` in `~/.config/gremlins/config.toml` (user)

**Honest disclaimer**: The allowlist limits *reach* — what paths and tools the
agent can invoke. It does not limit *impact within reach*. A gremlin with
write access to your worktree can make any change inside it. Review landed
commits before merging.

**Backend differences**: On `openai:` and `xai:` backends, gremlins owns the
tool layer and enforces the allowlist directly. On the `anthropic:` backend,
enforcement is coarser — the SDK loop uses vendor-defined tools and the path
scoping is advisory. On `claude:` and `copilot:` subprocess backends, the
gremlins-layer permission block is **not** translated into CLI flags or
settings — the underlying CLI reads the operator's ambient config and
enforces whatever the operator has configured there. See "Backend config
inheritance" below.

### Backend config inheritance

The `claude:` backend is a thin wrapper around `claude -p`. It does *not*
materialize a per-gremlin config dir, and it does *not* set
`CLAUDE_CONFIG_DIR` for the subprocess. Whatever the operator has configured
for their interactive Claude session is exactly what the subprocess sees:

- **Settings** — `~/.claude/settings.json` (plus any project-level
`.claude/settings.json` the CLI discovers) is read by the CLI directly.
The gremlins-layer `allowed_tools` / `disallowed_tools` block has no
effect on `claude:` runs; configure tool permissions via your own
Claude settings or use the `anthropic:` backend.

Gremlin worktrees — where the `claude:` subprocess does its file edits —
live under a stable, gremlins-scoped prefix in the system temp directory.
Discover it at runtime:

```sh
python -c "from gremlins import paths; print(paths.work_root())"
```

On Linux/macOS this is `/tmp/gremlins`; the OS reclaims orphaned
worktrees on reboot. A single `permissions.allow` rule in
`~/.claude/settings.json` covers every worktree path:

```json
{
  "permissions": {
    "allow": [
      "Edit(<work_root>/**)",
      "Write(<work_root>/**)",
      "Read(<work_root>/**)"
    ]
  }
}
```

Replace `<work_root>` with the actual output of the command above.
- **MCP servers and hooks** — inherited from the user's Claude config.
- **Auth** — subscription auth follows `~/.claude/.credentials.json` (or the
macOS keychain) exactly as it would for an interactive session.
- **Permission mode** — the only thing the wrapper still controls per call:
`--permission-mode bypassPermissions` when bypass is enabled, otherwise
`default`.

#### True process isolation: use an SDK backend

If you need per-gremlin tool allow-lists, hermetic config, or a clean
separation between gremlins and your interactive Claude session, use one of
the SDK-backed providers instead:

- `anthropic:` — `claude-agent-sdk` with `setting_sources=[]` (no
ambient settings, no MCP, no hooks). Requires `ANTHROPIC_API_KEY`.
`allowed_tools` from the native block is enforced by the SDK.
- `openai:` / `xai:` — `openai-agents` SDK with the
in-tree `GREMLINS_TOOLS` list. Per-gremlin `allowed_tools` filters that
list. Requires `OPENAI_API_KEY` / `XAI_API_KEY`.

Set via pipeline YAML:

```yaml
default_client: anthropic:claude-sonnet-4-6
# or per-stage:
stages:
  - name: implement
    client: anthropic:claude-sonnet-4-6
```

Subscription auth is not available on the SDK backends — that is Anthropic
policy, not a gremlins limitation.

### Local environment overrides

If `.gremlins/env` exists in the project root, gremlins sources it through
`bash` at startup and merges any new or changed variables into the process
environment before any stage runs. All subprocesses (plan, implement, verify,
review) inherit the result automatically.

> **Security warning:** because `.gremlins/env` is executed as a bash script,
> it can run arbitrary code. Do not run gremlins in a repository unless you
> have reviewed the contents of `.gremlins/env` and trust them.

The file is sourced via `bash`, so it can use command substitution,
conditionals, and anything bash supports:

```sh
export VIRTUAL_ENV=$(poetry env info --path)
export PATH="$VIRTUAL_ENV/bin:$PATH"
export TEST_DATABASE_URL=postgresql://localhost/mydb_test
```

Add `.gremlins/env` to your `~/.gitignore_global` or project `.gitignore`.
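
The sourcing step could be approximated as shown below. This is a sketch under assumptions: `source_env_file` is a hypothetical name, only exported variables are picked up, and the environment is dumped by a short Python child process so that the result does not depend on the output format of `env`.

```python
import json
import os
import subprocess
import sys

# Child snippet that prints the post-source environment as JSON.
_DUMP_ENV = "import json, os, sys; json.dump(dict(os.environ), sys.stdout)"


def source_env_file(path, base_env=None):
    """Source `path` in bash and return the variables it added or changed."""
    env = dict(os.environ if base_env is None else base_env)
    result = subprocess.run(
        # $1 = env file, $2 = Python interpreter, $3 = dump snippet.
        ["bash", "-c", 'source "$1" >/dev/null && exec "$2" -c "$3"',
         "bash", str(path), sys.executable, _DUMP_ENV],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    sourced = json.loads(result.stdout)
    # bash also touches bookkeeping variables such as SHLVL; they show up here too.
    return {key: value for key, value in sourced.items() if env.get(key) != value}
```

A caller would then apply the result with `os.environ.update(...)` before spawning any stage, so that every subprocess inherits the merged environment.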

### Loader API

`gremlins/pipeline/loader.py` exposes:

- `load_pipeline(path)` → `Pipeline` — parses a YAML file, resolves `clients`
via `CLIENT_FACTORIES`, and validates every stage `type` against
`STAGE_REGISTRY` (populated by importing `gremlins.stages.all`).
- `resolve_pipeline_path(name_or_path, base_dir)` — resolves a name or path
using the discovery order above.

Dataclasses: `Pipeline`, `StageEntry` (parallel groups have `type="parallel"`
internally and carry a `children` list and optional `max_concurrent`).

## Internals docs

- [`gremlins/AGENTS.md`](gremlins/AGENTS.md) — module layout, entry points,
testability seam, byte-stable strings
- [`gremlins/fleet/AGENTS.md`](gremlins/fleet/AGENTS.md) — fleet manager internals
- [`gremlins/orchestrators/AGENTS.md`](gremlins/orchestrators/AGENTS.md) — orchestrator internals
- [`gremlins/stages/AGENTS.md`](gremlins/stages/AGENTS.md) — stage internals
