https://github.com/karolswdev/guildr

Self-hosted, single-model SDLC orchestrator. One local LLM (Qwen3 via llama.cpp) plays five roles — Architect, Coder, Tester, Reviewer, Deployer — to drive a project from idea to deployed code, with human-approval gates and a LAN-only PWA.

agent-orchestration ai-agents ai-coding-assistant ai-orchestration autonomous-agents code-generation developer-tools dogfooding fastapi llama-cpp llm local-llm multi-agent pwa python qwen sdlc-automation self-hosted

Host: GitHub
URL: https://github.com/karolswdev/guildr
Owner: karolswdev
License: mit
Created: 2026-04-19T23:48:48.000Z (2 months ago)
Default Branch: main
Last Pushed: 2026-04-20T02:27:56.000Z (2 months ago)
Last Synced: 2026-04-20T03:39:00.392Z (2 months ago)
Topics: agent-orchestration, ai-agents, ai-coding-assistant, ai-orchestration, autonomous-agents, code-generation, developer-tools, dogfooding, fastapi, llama-cpp, llm, local-llm, multi-agent, pwa, python, qwen, sdlc-automation, self-hosted
Language: Python
Size: 5.63 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# guildr

> **Status: alpha.** Dry-run pipeline is verified end-to-end (438 tests, 90% coverage).
> Live llama-server path runs but hasn't been battle-tested. Point it at a
> throwaway side project first.

**What it is.** A self-hosted SDLC pipeline that turns a one-paragraph idea
(`qwendea.md`) into a reviewed, tested, deploy-documented project, using
**one local LLM** across five specialised roles, with human approval gates at
the points that actually matter.

**Why you'd use it.**

- **No API bills, no vendor lock.** Your GPU, your tokens, your filesystem.
  Qwen3 on llama.cpp does all the work.
- **Evidence over vibes.** Every task ships with a declared verification
  command. The Tester re-runs it independently of the Coder: "I wrote it"
  never implies "I tested it."
- **You stay in the loop.** Pipeline halts before implementation and before
  deployment. Approve, reject, or edit in the PWA from your phone.
- **Auditable by default.** Every phase writes a markdown artifact
  (`sprint-plan.md`, `TEST_REPORT.md`, `REVIEW.md`, `DEPLOY.md`) plus a
  JSONL event log. You can read every decision the model made.
- **LAN-only out of the box.** Backend rejects non-RFC1918 source IPs unless
  you explicitly opt in.

---

## How a project breaks up

You write `qwendea.md`, one paragraph describing what you want. Everything
else is produced by the pipeline:

```mermaid
flowchart LR
    Q[qwendea.md<br/>your one-paragraph idea] --> A[1. Architect]
    A --> SP[sprint-plan.md<br/>tasks + evidence reqs]
    SP --> G1{Gate:<br/>approve plan?}
    G1 -->|approve| C[2. Coder]
    G1 -.->|reject / edit| A
    C --> Src[source + tests<br/>written to project dir]
    Src --> T[3. Tester]
    T --> TR[TEST_REPORT.md<br/>per-task PASS/FAIL]
    TR -->|any FAIL| C
    TR -->|all PASS| R[4. Reviewer]
    R --> RV[REVIEW.md<br/>APPROVED / REJECTED<br/>+ criterion checklist]
    RV --> G2{Gate:<br/>ship it?}
    G2 -->|approve| D[5. Deployer]
    G2 -.->|reject| C
    D --> DP[DEPLOY.md<br/>target + env<br/>+ smoke tests]

    style A fill:#2a4d6e,color:#fff
    style C fill:#2a4d6e,color:#fff
    style T fill:#2a4d6e,color:#fff
    style R fill:#2a4d6e,color:#fff
    style D fill:#2a4d6e,color:#fff
    style G1 fill:#6e4d2a,color:#fff
    style G2 fill:#6e4d2a,color:#fff
```

What each role is responsible for:

| # | Role | Input | Output | What it actually does |
|---|---|---|---|---|
| 1 | **Architect** | `qwendea.md` | `sprint-plan.md` | Breaks the idea into numbered tasks; each task declares a verification command the Tester will later re-run. |
| 2 | **Coder** | Approved sprint plan | Source files + tests | Implements tasks one at a time. Writes the test alongside the code, not after. |
| 3 | **Tester** | Source tree | `TEST_REPORT.md` | Re-runs each task's declared evidence command from a clean shell. Loops back to the Coder if anything fails, up to `ORCHESTRATOR_MAX_RETRIES` times. |
| 4 | **Reviewer** | Source + test report | `REVIEW.md` | Checks against the sprint plan's acceptance criteria, flags scope creep, demands fixes. Not a rubber stamp: it can reject and kick back to the Coder. |
| 5 | **Deployer** | Approved review | `DEPLOY.md` | Writes the runbook: target, env vars, manual steps, smoke tests. Does not push anything; that's yours. |
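
To make the Tester row concrete, here is a minimal sketch of re-running a task's declared evidence command in a fresh shell. It is not guildr's actual code: `EvidenceResult`, `rerun_evidence`, the timeout, and the choice to pass only `PATH` into the child environment are all assumptions made for illustration.

```python
import os
import subprocess
from dataclasses import dataclass


@dataclass
class EvidenceResult:
    task_id: str
    command: str
    passed: bool
    output_tail: str


def rerun_evidence(task_id: str, command: str, project_dir: str,
                   timeout_s: int = 300, tail_lines: int = 20) -> EvidenceResult:
    """Re-run one task's declared verification command in a fresh shell."""
    # Assumption: a "clean shell" means no inherited variables except PATH.
    clean_env = {"PATH": os.environ.get("PATH", "")}
    try:
        proc = subprocess.run(
            command, shell=True, cwd=project_dir, env=clean_env,
            capture_output=True, text=True, timeout=timeout_s,
        )
        passed = proc.returncode == 0
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        passed, output = False, f"timed out after {timeout_s}s"
    # Keep only the last few lines so a report (or retry prompt) stays short.
    tail = "\n".join(output.splitlines()[-tail_lines:])
    return EvidenceResult(task_id, command, passed, tail)
```

Because the pass/fail decision comes only from the exit code of the declared command, the result does not depend on anything the Coder claimed about its own work.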

**Retries are contextual, not blind.** When a phase fails, the harness feeds
the diff + failure tail back into the *next* attempt, and can optionally ask
a second "Coach" model for a diagnostic. The Primary never sees the Coach's
raw output; it only sees the *advice* the Coach produced, so the main
context stays clean.
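
The retry-context idea above can be sketched as a small prompt composer. This is an illustration only: the function names, section headings, and the 30-line tail are assumptions, not the harness's real format.

```python
from typing import Optional


def tail(text: str, max_lines: int = 30) -> str:
    """Return the last max_lines lines of text."""
    return "\n".join(text.splitlines()[-max_lines:])


def compose_retry_prompt(phase_prompt: str, diff: str, failure_output: str,
                         coach_advice: Optional[str] = None,
                         max_tail_lines: int = 30) -> str:
    """Build the next attempt's prompt from the previous attempt's evidence."""
    sections = [
        phase_prompt.strip(),
        "## Previous attempt (diff)\n" + diff.strip(),
        "## Failure output (tail)\n" + tail(failure_output, max_tail_lines),
    ]
    # Only the Coach's advice is appended, never its full diagnostic.
    if coach_advice:
        sections.append("## Advice for this attempt\n" + coach_advice.strip())
    return "\n\n".join(sections)
```

Truncating the failure output to its tail keeps the prompt bounded no matter how noisy the failed run was.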

**Gates are strict.** Without `--no-gates`, the pipeline *stops* after the
Architect and after the Reviewer. You approve in the PWA (or CLI); rejecting
kicks the phase back with your feedback appended to the context.
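
A minimal sketch of the gate behaviour, assuming a linear phase list and two callbacks (`run_phase` and `ask_human`, both hypothetical). The real engine is described as a phase state machine with retries; this sketch only shows where the pipeline pauses and how `--no-gates` changes that.

```python
from typing import Callable, List

PHASES = ["architect", "coder", "tester", "reviewer", "deployer"]
# Per the README: one gate after the plan, one before deployment.
GATED_AFTER = {"architect", "reviewer"}


def run_pipeline(run_phase: Callable[[str], None],
                 ask_human: Callable[[str], bool],
                 no_gates: bool = False) -> List[str]:
    """Run phases in order; halt at the first gate that is not approved."""
    completed = []
    for phase in PHASES:
        run_phase(phase)
        completed.append(phase)
        if phase in GATED_AFTER and not no_gates:
            if not ask_human(phase):
                # Simplification: the real pipeline kicks the phase back
                # with feedback; this sketch just stops.
                break
    return completed
```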

---

## Architecture

```mermaid
flowchart TB
    User([👤 You])
    PWA[PWA / CLI<br/>guildr run]
    API[FastAPI backend<br/>LAN-only middleware]
    ORCH[Orchestrator engine<br/>phase state machine<br/>retry + validate]
    Pool[Async upstream pool]
    LLM[llama-server<br/>Qwen3.6-35B-A3B]
    Gate{Human gate}
    FS[(project-dir/<br/>sprint-plan.md<br/>TEST_REPORT.md<br/>REVIEW.md<br/>DEPLOY.md)]

    User -->|HTTP / SSE| PWA
    PWA --> API
    API --> ORCH
    ORCH --> Pool
    Pool --> LLM
    LLM --> Pool
    Pool --> ORCH
    ORCH --> Gate
    Gate -->|approve| ORCH
    Gate -->|reject| User
    ORCH --> FS
```

### The models doing the work

Everything runs locally on consumer-ish hardware behind a LAN, served by
[llama.cpp](https://github.com/ggerganov/llama.cpp):

| Role | Model | Quant | Job |
|---|---|---|---|
| **Primary** | Qwen3.6-35B-A3B (MoE, 3B active) | Q5_K_M | All five orchestrator roles |
| **Coach** | Qwen3.6-35B-A3B | Q6_K | Second-opinion diagnostic on failed phase retries |

One model, five hats. The Coach is just the same model (at a different
quant, Q6_K) on a second box, asked a different question. Its output informs
the next retry's prompt as advice but never reaches the Primary's context
window directly.

---

## Quickstart

### Install

```bash
git clone https://github.com/karolswdev/guildr.git
cd guildr
./install.sh        # uses uv tool / pipx / pip --user, in that order
guildr --help
```

Prereqs: Python 3.12+, Node 18+ (for the PWA bundle), and a llama.cpp server
for live runs. Dry-run works with no LLM at all.

### Dry-run (no LLM required)

```bash
mkdir -p /tmp/demo && echo "# A tiny CLI that prints hello." > /tmp/demo/qwendea.md
guildr run --from-env --dry-run --no-gates --project /tmp/demo
ls /tmp/demo/   # sprint-plan.md, TEST_REPORT.md, REVIEW.md, DEPLOY.md
```

### Live run

```bash
llama-server -m path/to/Qwen3.6-35B-A3B.gguf -np 1 --host 127.0.0.1 --port 8080
```

```bash
export LLAMA_SERVER_URL=http://127.0.0.1:8080
export PROJECT_DIR=/path/to/your/project   # must contain qwendea.md
guildr run --from-env
```

Or with a config file (see `config.example.yaml`):

```bash
guildr run --config config.yaml
```

### Inspect a run

```bash
guildr inspect /path/to/your/project              # phase + retry summary
guildr inspect /path/to/your/project --phase architect
guildr inspect /path/to/your/project --tokens
```

### Web UI (PWA)

```bash
uvicorn web.backend.app:app --host 0.0.0.0 --port 8000
```

Open `http://<server-ip>:8000` from any device on the same LAN. The
frontend bundle is built by `web/frontend/build.sh` (called automatically by
`install.sh`).

---

## Configuration

| Variable | Default | Description |
|---|---|---|
| `LLAMA_SERVER_URL` | (required for live) | llama.cpp endpoint (e.g. `http://127.0.0.1:8080`) |
| `PROJECT_DIR` | `.` | Project working directory |
| `ORCHESTRATOR_MAX_RETRIES` | `3` | Max retries per phase |
| `ORCHESTRATOR_PROJECTS_DIR` | `/tmp/orchestrator-projects` | Root for PWA-created projects |
| `ORCHESTRATOR_EXPOSE_PUBLIC` | `0` | Set to `1` to allow non-RFC1918 web access (logs a WARNING) |

CLI flags override env vars; env vars override `--config` YAML.
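
The precedence rule can be sketched with a `ChainMap`, where earlier maps win. This is a sketch under assumptions: `resolve_config` is a hypothetical helper, and it assumes CLI flags and YAML keys are already normalised to the same names as the environment variables, which the README does not state.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping

# Defaults taken from the table above; LLAMA_SERVER_URL has no default.
DEFAULTS = {
    "PROJECT_DIR": ".",
    "ORCHESTRATOR_MAX_RETRIES": "3",
    "ORCHESTRATOR_PROJECTS_DIR": "/tmp/orchestrator-projects",
    "ORCHESTRATOR_EXPOSE_PUBLIC": "0",
}
KNOWN_KEYS = set(DEFAULTS) | {"LLAMA_SERVER_URL"}


def _present(values: Mapping[str, Any]) -> Dict[str, str]:
    """Keep known keys that were actually set, normalised to strings."""
    return {k: str(v) for k, v in values.items()
            if k in KNOWN_KEYS and v is not None}


def resolve_config(cli: Mapping[str, Any], env: Mapping[str, Any],
                   yaml_cfg: Mapping[str, Any]) -> Dict[str, str]:
    """CLI flags override env vars; env vars override YAML; then defaults."""
    return dict(ChainMap(_present(cli), _present(env),
                         _present(yaml_cfg), DEFAULTS))
```

For example, with `ORCHESTRATOR_MAX_RETRIES` set to `5` on the command line and `4` in the environment, the resolved value is `"5"`.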

## Project layout produced by a run

```
<project-dir>/
├── qwendea.md              # your one-paragraph idea (source of truth)
├── sprint-plan.md          # Architect output
├── TEST_REPORT.md          # Tester output
├── REVIEW.md               # Reviewer output
├── DEPLOY.md               # Deployer output
└── .orchestrator/
    ├── state.json          # phase, retries, gate decisions
    ├── sessions/           # exported session transcripts
    └── logs/               # per-phase structured logs (.jsonl)
```
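
Since the logs are JSONL (one JSON object per line), they can be read with the standard library alone. The sketch below only counts records per log file. It makes no assumption about field names inside each record, because the README does not document the schema. `count_log_events` is a hypothetical helper, not part of guildr.

```python
import json
from pathlib import Path
from typing import Dict


def count_log_events(project_dir: str) -> Dict[str, int]:
    """Count JSON records in each .jsonl file under .orchestrator/logs/."""
    logs_dir = Path(project_dir) / ".orchestrator" / "logs"
    counts: Dict[str, int] = {}
    for log_file in sorted(logs_dir.glob("*.jsonl")):
        n = 0
        with log_file.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                json.loads(line)  # raises ValueError on a corrupt record
                n += 1
        counts[log_file.name] = n
    return counts
```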

---

## Where this came from

guildr started as a one-screen bash loop. The loop poked a local opencode
session at Qwen3, fed it a phase plan, watched for idleness, ran a verifier.
When the verifier failed, the loop stuffed the diff + failure tail into the
next prompt. That was the whole trick, and it worked well enough that the
bash scaffold kept growing features (watchdogs, a retry coach, structured
handoff docs) until it was clearly trying to become a real framework.

So we let it. The harness wrote the orchestrator. The orchestrator is what
the harness wishes it were when it grows up.

```mermaid
flowchart LR
    subgraph harness["bash harness (the scaffold)"]
        BL[build-phase.sh<br/>watchdogs + verifier gate<br/>retry-context composer]
    end

    subgraph models["local llama.cpp"]
        P[Qwen3 PRIMARY<br/>writes the code]
        C[Qwen3 COACH<br/>diagnoses failures]
    end

    subgraph product["this repo (guildr)"]
        OR[Orchestrator engine<br/>5 role prompts<br/>retry + gate logic<br/>web PWA]
    end

    BL -->|phase plan + retries| P
    BL -.->|failure context| C
    C -.->|hypothesis + next-attempt advice| BL
    P -->|writes / edits| OR
    OR -.->|future: replaces| BL

    style harness fill:#2d2d44,color:#fff
    style models fill:#1a3d2e,color:#fff
    style product fill:#3d1a2e,color:#fff
```

The dotted `future: replaces` arrow is the punchline: the artifact built by
the harness is itself a more polished, testable, web-driven version of the
harness. The retry-coach module in `bootstrap/lib/coach.sh` was itself
proposed and added by the harness during one of its own retries.

> 📜 **Receipts.** The exact phase plans and end-of-phase handoffs the model
> worked from are checked in under [`docs/methodology/`](docs/methodology/).
> Read those for the unfiltered version of what got fed to the LLM.

---

## Development

```bash
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest -q                                          # full suite (~20s, 438 tests)
pytest tests/test_integration_dry_run.py -v        # full pipeline e2e (dry-run)
pytest --cov=orchestrator --cov=web --cov-report=term-missing
```

## Security

guildr is designed for self-hosted, single-user use on a trusted LAN. The
web backend rejects non-RFC1918 source IPs by default; the llama-server
upstream has no authentication. Do not expose this to the internet without
adding your own auth layer.
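
For reference, an RFC1918 source check can be written with the standard `ipaddress` module. This is a sketch, not the actual middleware: `is_allowed_source` is a hypothetical name, and whether the real backend also admits loopback or IPv6 addresses is not stated here, so this version admits only the three RFC1918 ranges.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_allowed_source(source_ip: str, expose_public: bool = False) -> bool:
    """Return True if a request from source_ip should be served."""
    if expose_public:
        # Mirrors ORCHESTRATOR_EXPOSE_PUBLIC=1: the operator opted in.
        return True
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # unparseable address: reject
    return any(addr in net for net in RFC1918_NETWORKS)
```

Note that `ipaddress`'s built-in `is_private` flag is broader than RFC1918 (it also covers loopback and link-local ranges, among others), which is why the sketch checks the three networks explicitly.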

## License

MIT — see [LICENSE](LICENSE).
