https://github.com/ikennaokpala/forge

Forge is an autonomous behavioural validation engineering swarm that treats quality as something forged into software continuously, not bolted on at the end.
Host: GitHub
URL: https://github.com/ikennaokpala/forge
Owner: ikennaokpala
License: mit
Created: 2026-02-07T07:58:30.000Z (14 days ago)
Default Branch: main
Last Pushed: 2026-02-20T06:47:12.000Z (1 day ago)
Last Synced: 2026-02-20T06:57:34.989Z (1 day ago)
Topics: agent-skill, bdd, claude-code, gherkin, quality-engineering, testing
Homepage:
Size: 146 KB
Stars: 7
Watchers: 0
Forks: 2
Open Issues: 14
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# Forge

**Behavioral validation forged in, not bolted on.**

Forge is an autonomous behavioral validation swarm skill for [Claude Code](https://claude.com/claude-code) that combines BDD behavioral verification, 7 behavioral validation gates, confidence-tiered learning, topological governance, and self-healing fix loops. It spawns 8 specialized agents that work in parallel to verify, test, fix, and commit — continuously — until every Gherkin scenario passes and every behavioral validation gate clears.

---

## Key Features

- **8 specialized agents** working in parallel with cost-optimized model routing

- **Gherkin behavioral specifications** as the single source of truth

- **7 behavioral validation gates**: Functional, Behavioral, Coverage, Security, Accessibility, Resilience, Contract

- **12 topological governance specifications** (§1.1–§1.12) — mathematical foundations for autonomous behavioral validation

- **Confidence-tiered fix patterns** (Platinum/Gold/Silver/Bronze) with Nash Equilibrium convergence

- **Defect prediction** based on historical failure data and file changes

- **Chaos/resilience testing** with controlled failure injection

- **Cross-context dependency awareness** with cascade re-testing and sheaf cohomology consistency

- **Shared types and cross-cutting validation** across bounded contexts

- **Agent-optimized ADRs** with MUST/MUST NOT constraints and verification commands

- **Visual regression testing** with pixel-by-pixel comparison

- **Architecture-agnostic** — monolith, microservices, monorepo, mobile+backend

- **Optional Agentic QE integration** for enhanced pattern search, security scanning, and more

- **External-only mocking** — mock third-party services, never internal code (production-validated policy)

- **Spec drift detection** — detects when Gherkin specs and implementation diverge

- **LLM-as-Judge meta-review** — second-model evaluation with Anti-Echo-Chamber guarantee

- **Self-reflection gate** — Bug Fixer asks "What could go wrong?" before committing

- **Hallucination Gate** — deterministic pre-LLM boundary (AST resolution, contract hash, mocking detection)

- **Agent criticality scoring** — bottleneck detection via Dirichlet energy and automatic optimization

- **Narya-proofs** — counterfactual verification proving fix necessity and sufficiency

- **Property-based testing** — generate 1000+ test cases from invariants

- **Mutation testing** — inject bugs to verify test effectiveness

- **Blake3 witness chain** — cryptographic tamper-evident audit trail for gate verdicts

- **Infrastructure readiness markers** — specify formally, implement pragmatically, upgrade transparently
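Mutation testing, listed above, scores a test suite by how many deliberately injected bugs it catches. The sketch below illustrates the general technique with Python's standard `ast` module and is not Forge's implementation. It flips each comparison operator in a hypothetical `is_adult` function, reruns a hypothetical test suite against every mutant, and reports the fraction of mutants killed.

```python
import ast

# Hypothetical function under test (illustration only).
SOURCE = """
def is_adult(age):
    return age >= 18
"""

def run_tests(namespace):
    """Return True if every (hypothetical) test passes against the namespace."""
    is_adult = namespace["is_adult"]
    try:
        assert is_adult(18) is True   # boundary case
        assert is_adult(17) is False
        assert is_adult(40) is True
    except AssertionError:
        return False
    return True

# Each comparison operator maps to a plausible off-by-one or inverted "bug".
MUTATIONS = {ast.GtE: ast.Gt, ast.Gt: ast.GtE, ast.LtE: ast.Lt,
             ast.Lt: ast.LtE, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}

def mutants(source):
    """Yield one mutated module per mutable comparison operator in the source."""
    tree = ast.parse(source)
    compares = [n for n in ast.walk(tree) if isinstance(n, ast.Compare)]
    for index, node in enumerate(compares):
        for op_index, op in enumerate(node.ops):
            replacement = MUTATIONS.get(type(op))
            if replacement is None:
                continue
            mutated = ast.parse(source)  # fresh tree so mutations never stack
            target = [n for n in ast.walk(mutated) if isinstance(n, ast.Compare)][index]
            target.ops[op_index] = replacement()
            yield mutated

def mutation_score(source):
    """Fraction of mutants the test suite detects ("kills")."""
    killed = total = 0
    for tree in mutants(source):
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        total += 1
        if not run_tests(namespace):
            killed += 1
    return killed / total if total else 1.0
```

Here the single mutant (`>=` becomes `>`) is killed by the `is_adult(18)` boundary assertion, so the score is 1.0; removing that assertion would let the mutant survive and drop the score to 0.0.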

---

## Philosophy

### Three Pillars

| Pillar | Source | What It Does |
|--------|--------|--------------|
| **Build** | DDD+ADR+TDD methodology | Structured development with behavioral validation gates, defect prediction, confidence-tiered fixes |
| **Verify** | BDD/Gherkin behavioral specs | Continuous behavioral verification: the PRODUCT works, not just the CODE |
| **Heal** | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |

### "DONE DONE"

"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every behavioral validation gate clears. Every dependency graph is satisfied.

---

## Quick Start

```bash
# Copy SKILL.md to your Claude Code skills directory
cp SKILL.md ~/.claude/skills/forge.md

# Then, in a Claude Code session inside your project, run the slash command:
/forge --autonomous --context payments
```

---

## Invocation Modes

| Command | Description |
|---------|-------------|
| `/forge --autonomous --all` | Full autonomous run: all contexts, all gates |
| `/forge --autonomous --context [name]` | Single context autonomous run |
| `/forge --verify-only` | Behavioral verification only (no fixes) |
| `/forge --verify-only --context [name]` | Verify single context |
| `/forge --fix-only --context [name]` | Fix failures, don't generate new tests |
| `/forge --learn` | Analyze patterns, update confidence tiers |
| `/forge --add-coverage --screens [names]` | Add coverage for new screens/pages/components |
| `/forge --spec-gen --context [name]` | Generate Gherkin specs for a context |
| `/forge --spec-gen --all` | Generate Gherkin specs for all contexts |
| `/forge --gates-only` | Run behavioral validation gates without test execution |
| `/forge --gates-only --context [name]` | Run behavioral validation gates for single context |
| `/forge --predict` | Defect prediction only |
| `/forge --predict --context [name]` | Predict defects for single context |
| `/forge --chaos --context [name]` | Chaos/resilience testing for a context |
| `/forge --chaos --all` | Chaos testing for all contexts |
| `/forge --drift-check` | Spec drift detection |
| `/forge --drift-check --context [name]` | Drift check for single context |
| `/forge --regressions` | Behavioral regression analysis |
| `/forge --regressions --context [name]` | Regressions for single context |
| `/forge --meta-review` | LLM-as-Judge meta-evaluation |
| `/forge --meta-review --context [name]` | Meta-review for single context |
| `/forge --mutation --context [name]` | Mutation testing for a context |
| `/forge --mutation --critical-only` | Mutation testing for critical paths only |

---

## Architecture

### Autonomous Loop

```
Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn → Repeat
```

```
┌────────────────────────────────────────────────────────────────────┐
│                       FORGE AUTONOMOUS LOOP                        │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │ Specify  │──▶│   Test   │──▶│ Analyze  │──▶│   Fix    │         │
│  │ (Gherkin)│   │  (Run)   │   │  (Root   │   │ (Tiered) │         │
│  └──────────┘   └──────────┘   │  Cause)  │   └──────────┘         │
│       ▲                        └──────────┘        │               │
│       │                                            ▼               │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │  Learn   │◀──│  Commit  │◀──│   Gate   │◀──│  Audit   │         │
│  │ (Update  │   │  (Auto)  │   │ (7 Gates)│   │  (A11y)  │         │
│  │  Tiers)  │   └──────────┘   └──────────┘   └──────────┘         │
│  └──────────┘                                                      │
│       │                                                            │
│       └──────────────── REPEAT ──────────────────────────────────  │
│                                                                    │
│  Loop until: ALL 7 VALIDATION GATES PASS or MAX 10 ITERATIONS      │
└────────────────────────────────────────────────────────────────────┘
```
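The loop can be read as a bounded retry around the eight stages. The sketch below is a minimal illustration, not Forge's API: it assumes each stage is a caller-supplied function that takes and returns a shared state dict, and that a hypothetical "gate" stage records its results in `state["gates"]`. It stops as soon as every gate passes or after 10 iterations, matching the exit condition in the diagram.

```python
from typing import Callable, Dict

MAX_ITERATIONS = 10  # exit bound shown in the diagram

# Stage order from the diagram; the names are illustrative keys, not Forge identifiers.
STAGES = ["specify", "test", "analyze", "fix", "audit", "gate", "commit", "learn"]

def run_loop(stages: Dict[str, Callable[[dict], dict]], state: dict) -> dict:
    """Run all stages in order, repeating until every gate passes or the cap is hit.

    Assumed contract: each stage returns the (possibly new) state dict, and the
    "gate" stage sets state["gates"] to a {gate_name: passed} mapping.
    """
    for iteration in range(1, MAX_ITERATIONS + 1):
        for name in STAGES:
            state = stages[name](state)
        state["iterations"] = iteration
        gates = state.get("gates", {})
        if gates and all(gates.values()):
            state["done"] = True
            return state
    state["done"] = False  # cap reached without all gates passing
    return state
```

Bounding the loop keeps a stubborn failure from consuming unlimited agent time; whatever is still failing at the cap is left in `state` for a human to inspect.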

### Execution Phases

1. **Phase 0** — Backend setup (build, run, health check, seed data)

2. **Phase 1** — Behavioral specification & architecture records (Gherkin specs, ADRs)

3. **Phase 2** — Contract & dependency validation (schemas, shared types, cross-cutting)

4. **Phase 3** — Swarm initialization (load patterns, predictions, confidence tiers)

5. **Phase 4** — Spawn 8 autonomous agents in parallel

6. **Phase 5** — Behavioral validation gates evaluation (7 gates after every fix cycle, BFT consensus ≥5/7)
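Phase 5 combines the seven gate verdicts with a "≥5/7" consensus, and the Agent Roles table adds a VETO for blocking gates. One reading of that rule is sketched below as a simple quorum-plus-veto vote; it is not a full Byzantine fault-tolerant protocol, and the `GateVerdict` type and `consensus` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateVerdict:
    name: str
    passed: bool
    blocking: bool  # True for gates marked "YES" in the gates table

QUORUM = 5  # at least 5 of the 7 gates must pass

def consensus(verdicts: list[GateVerdict]) -> tuple[bool, list[str]]:
    """Return (overall_pass, vetoing_gates) under a quorum-plus-veto rule."""
    vetoes = [v.name for v in verdicts if v.blocking and not v.passed]
    passes = sum(v.passed for v in verdicts)
    return (passes >= QUORUM and not vetoes), vetoes
```

Under this reading, the warning-only gates (Accessibility, Resilience) can fail without stopping a cycle, while any failing blocking gate vetoes it regardless of the vote count.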

---

## Behavioral Validation Gates

| Gate | Check | Threshold | Blocking |
|------|-------|-----------|----------|
| 1. Functional | All tests pass | 100% pass rate | YES |
| 2. Behavioral | Gherkin scenarios satisfied | 100% of targeted scenarios | YES |
| 3. Coverage | Path coverage | ≥85% overall, ≥95% critical | YES (critical only) |
| 4. Security | No secrets, SAST checks, no injection vectors | 0 critical/high violations | YES |
| 5. Accessibility | Labels, target sizes, contrast | WCAG AA | Warning only |
| 6. Resilience | Offline, timeout, error handling | Tested for target context | Warning only |
| 7. Contract | API response matches schema | 0 mismatches | YES |
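Gate 3 is the only gate with two thresholds and only partial blocking. A minimal sketch of one way to evaluate it, assuming coverage arrives as percentages; the function name and return shape are hypothetical.

```python
def coverage_gate(overall: float, critical: float) -> dict:
    """Evaluate Gate 3 (Coverage) from path-coverage percentages.

    Only the critical-path threshold blocks; missing the overall
    threshold fails the gate but is reported as a warning.
    """
    overall_ok = overall >= 85.0
    critical_ok = critical >= 95.0
    return {
        "passed": overall_ok and critical_ok,
        "blocking_failure": not critical_ok,
        "warnings": [] if overall_ok else [f"overall coverage {overall:.1f}% < 85%"],
    }
```

Separating `passed` from `blocking_failure` lets a consensus step count the gate as failed while still allowing the cycle to commit when only the overall number is short.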

---

## Agent Roles

| Agent | Model | Role | v1.2.0 Enhancement |
|-------|-------|------|--------------------|
| **Specification Verifier** | Sonnet | Generates/validates Gherkin specs and ADRs for bounded contexts | None |
| **Test Runner** | Haiku | Executes E2E test suites, parses results, maps failures to specs | None |
| **Failure Analyzer** | Sonnet | Root cause analysis, pattern matching, dependency impact assessment | MaTTS: 3 parallel reasoning trajectories with self-contrast |
| **Bug Fixer** | Opus | Applies confidence-tiered fixes from first principles | Driver-Observer algebraic connectivity (λ₂ monitoring) |
| **Behavioral Validation Gate Enforcer** | Haiku | Evaluates all 7 gates, arbitrates agent disagreements | BFT consensus model (≥5/7 threshold, VETO for blocking gates) |
| **Accessibility Auditor** | Sonnet | WCAG AA audit: labels, contrast, targets, focus order | None |
| **Auto-Committer** | Haiku | Stages fixed files, creates detailed commits with gate statuses | None |
| **Learning Optimizer** | Sonnet | Updates confidence tiers, defect prediction, coverage metrics | DISTILL phase: LoRA-style abstraction with EWC++ anti-forgetting |
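The Learning Optimizer moves fix patterns between the Platinum/Gold/Silver/Bronze confidence tiers. The README does not publish the cut-offs, so the thresholds and the minimum-attempt rule below are hypothetical; the sketch only shows the general shape of success-rate tiering, not Forge's actual policy.

```python
from dataclasses import dataclass

# Hypothetical cut-offs on success rate, highest tier first.
TIERS = [("Platinum", 0.95), ("Gold", 0.85), ("Silver", 0.70), ("Bronze", 0.0)]
MIN_ATTEMPTS = 5  # hypothetical: patterns with fewer attempts stay at Bronze

@dataclass
class FixPattern:
    name: str
    successes: int = 0
    attempts: int = 0

    def record(self, succeeded: bool) -> None:
        """Log one application of the pattern, e.g. after its gate verdict."""
        self.attempts += 1
        self.successes += int(succeeded)

    @property
    def tier(self) -> str:
        """Map the observed success rate onto a confidence tier."""
        if self.attempts < MIN_ATTEMPTS:
            return "Bronze"
        rate = self.successes / self.attempts
        for name, cutoff in TIERS:
            if rate >= cutoff:
                return name
        return "Bronze"
```

The minimum-attempt rule keeps a pattern that succeeded once from jumping straight to the top tier; a single later failure (5 of 6 succeeded) demotes it from Platinum to Silver under these cut-offs.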

---

## Topological Governance (v1.2.0)

Forge v1.2.0 introduces 12 formal topological governance specifications (§1.1–§1.12) that provide mathematical foundations for autonomous behavioral validation. Production heuristics from v1.1.0 — criticality scoring, regression tracking, blocking gates — are now anchored to formal mathematical equivalents.

### Four Specification Clusters

| Cluster | Sections | Purpose |
|---------|----------|---------|
| **Consistency & Verification** | §1.1–§1.5 | Sheaf cohomology for cross-context consistency, Dirichlet energy for system tension, persistent Laplacian for regression tracking, Hallucination Gate for pre-LLM verification, Blake3 witness chain for tamper-evident audit |
| **Swarm Stability** | §1.6–§1.7 | Algebraic connectivity (Fiedler value λ₂) for agent coordination monitoring, MinCut isolation for quarantining anomalous agent output |
| **Memory & Reasoning** | §1.8–§1.11 | Hyperbolic memory (Poincaré ball) for hierarchical code embeddings, GF(3) triadic validation for phase transitions, Narya-proofs for counterfactual fix verification, Johnson-Lindenstrauss for sublinear test coverage |
| **Execution Plane** | §1.12 | WASM/Rust pure-function tasks for deterministic verification (Blake3 hashing, eigenvalue computation, GF(3) validation, HNSW search, contract hash comparison, JL projection) |
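The Dirichlet energy of a signal f on a weighted graph is E(f) = Σ w_ij (f_i − f_j)² over the edges, which equals fᵀLf for the graph Laplacian L = D − W. The README does not say which signal or graph Forge uses; the sketch below assumes the nodes are bounded contexts, the edges are the dependencies from `forge.contexts.yaml`, and f is a hypothetical per-context failure rate. Large per-edge terms point at dependencies where neighboring contexts disagree most.

```python
def dirichlet_energy(signal: dict[str, float],
                     edges: list[tuple[str, str, float]]) -> tuple[float, dict]:
    """Compute E(f) = sum over edges of w_ij * (f_i - f_j) ** 2.

    Each undirected edge is listed once as (i, j, weight). Returns the total
    energy plus each edge's contribution, so the edges carrying the most
    "tension" can be inspected first.
    """
    contributions = {}
    for i, j, weight in edges:
        contributions[(i, j)] = weight * (signal[i] - signal[j]) ** 2
    return sum(contributions.values()), contributions
```

For example, with failure rates of 0.0 for identity and 0.5 for both payments and orders, the whole energy (0.25) sits on the identity–payments edge, flagging it as the likely bottleneck.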

### Infrastructure Readiness

Every specification is operational today. Infrastructure readiness markers define the path from "correct" to "correct and fast":

| Specification | Current Implementation | Native Infrastructure |
|---|---|---|
| Blake3 witness chain (§1.5) | SHA-256 hashing | Blake3 native hashing |
| Hyperbolic memory (§1.8) | Flat key-value lookups across 10 namespaces | HNSW-indexed Poincaré ball embeddings |
| JL coverage (§1.11) | Defect prediction with failure probability ranking | Random projection to O(log n) representative tests |
| WASM execution (§1.12) | LLM structured reasoning for pure functions | WASM/Rust compilation with sub-ms latency |
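The witness chain's current SHA-256 form can be pictured as a simple hash chain: each gate verdict is hashed together with the previous link, so editing or reordering any past verdict breaks every later hash. This is a minimal sketch using Python's standard `hashlib`; the record fields and function names are hypothetical, and moving to Blake3 would only change `link_hash`.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first link

def link_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus the record's canonical JSON."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain: list[dict], record: dict) -> None:
    """Add a verdict to the chain, linking it to the latest entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": link_hash(prev, record)})

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry fails verification."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != link_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Serializing with sorted keys keeps the hash independent of dict insertion order, so the same verdict always produces the same link.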

---

## Configuration

### Project Config (optional)

```yaml
# forge.config.yaml (placed at the repo root)
architecture: microservices

backend:
  services:
    - name: auth-service
      port: 8081
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start

frontend:
  technology: react
  testCommand: npx cypress run --spec {target}
  testDir: cypress/e2e/
  specDir: cypress/e2e/specs/

# Model routing overrides
model_routing:
  bug-fixer: opus
  failure-analyzer: sonnet
  test-runner: haiku

# Visual regression
visual_regression:
  enabled: true
  threshold: 0.001

# Agentic QE integration
integrations:
  agentic-qe:
    enabled: true
    domains: [defect-intelligence, security-compliance, visual-accessibility, contract-testing]
```

### Context Config (optional)

```yaml
# forge.contexts.yaml: bounded context definitions
contexts:
  - name: identity
    testFile: identity.cy.ts
    specFile: identity.feature
    paths: 68
    subdomains: [Auth, Profiles, Verification]
  - name: payments
    testFile: payments.cy.ts
    specFile: payments.feature
    paths: 89
    subdomains: [Wallet, Cards, Transactions]

dependencies:
  identity:
    blocks: [payments, orders]
  payments:
    depends_on: [identity]
    blocks: [orders, subscriptions]
```
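The `dependencies` map suggests how cascade re-testing could work: when a context changes, every context it transitively blocks is re-tested, upstream first. The sketch below assumes `blocks` and `depends_on` both describe the same upstream-to-downstream relation, which is an interpretation of the config rather than documented behavior; it uses the standard-library `graphlib` (Python 3.9+).

```python
from collections import deque
from graphlib import TopologicalSorter

def blocked_by(dependencies: dict) -> dict[str, set[str]]:
    """Merge `blocks` and `depends_on` into one map: context -> contexts it blocks."""
    edges: dict[str, set[str]] = {}
    for name, spec in dependencies.items():
        edges.setdefault(name, set()).update(spec.get("blocks", []))
        for upstream in spec.get("depends_on", []):
            edges.setdefault(upstream, set()).add(name)
    return edges

def cascade(dependencies: dict, changed: str) -> list[str]:
    """Contexts to re-test after `changed` changes, in dependency order."""
    edges = blocked_by(dependencies)
    affected, queue = {changed}, deque([changed])
    while queue:  # breadth-first walk over everything downstream
        for downstream in edges.get(queue.popleft(), ()):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    # TopologicalSorter takes node -> predecessors, so invert the affected edges.
    graph = {node: set() for node in affected}
    for upstream, downstreams in edges.items():
        for downstream in downstreams:
            if upstream in affected and downstream in affected:
                graph[downstream].add(upstream)
    return list(TopologicalSorter(graph).static_order())
```

With the example config, a change to `payments` re-tests `payments` first and then `orders` and `subscriptions`; a change to `identity` re-tests all four contexts, with `identity` before `payments` before `orders`.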

If no configuration files are present, Forge auto-discovers the project structure on first run.

---

## Agentic QE Integration

Forge optionally integrates with [Agentic QE](https://github.com/proffesor-for-testing/agentic-qe) via MCP for enhanced capabilities:

| Capability | Without AQE | With AQE |
|-----------|-------------|----------|
| Pattern Storage | claude-flow memory | ReasoningBank (vector-indexed, 150x faster) |
| Defect Prediction | File changes + history | Specialized defect-intelligence agents |
| Security Scanning | Gate 4 static checks | Full SAST/DAST analysis |
| Accessibility | Built-in auditor | visual-tester + accessibility-auditor |
| Contract Testing | Schema validation | contract-validator + graphql-tester |
| Progress | `.forge/progress.jsonl` | AG-UI real-time streaming |

All AQE features are additive: without AQE installed, Forge runs the same workflow and falls back to the built-in capabilities in the "Without AQE" column.

---

## References

- [Continuous Behavioral Verification: Ongoing Path to Done](https://www.linkedin.com/pulse/continuous-behavioral-verification-ongoing-path-done-ikenna-okpala) — Ikenna Okpala

- [Build with Quality Skill: How I Build Software 10x Faster](https://www.linkedin.com/pulse/build-quality-skill-how-i-build-software-10x-faster-mondweep-chakravorty) — Mondweep Chakravorty

- [claude-code-v3-qe-skill](https://github.com/mondweep/vibe-cast) — V3 QE Skill

- [agentic-qe](https://github.com/proffesor-for-testing/agentic-qe) — Agentic QE Framework

- Advanced Topological Governance in Autonomous Software Engineering — Formal mathematical foundations (sheaf theory, spectral analysis, Galois fields) for v1.2.0 specifications

---

## License

MIT