An open API service indexing awesome lists of open source software.

https://github.com/ikennaokpala/forge

Forge is an autonomous behavioural validation engineering swarm that treats quality as something forged into software continuously, not bolted on at the end.
https://github.com/ikennaokpala/forge

agent-skill bdd claude-code gherkin quality-engineering testing

Last synced: 1 day ago
JSON representation

Forge is an autonomous behavioural validation engineering swarm that treats quality as something forged into software continuously, not bolted on at the end.

Awesome Lists containing this project

README

          

# Forge

**Behavioral validation forged in, not bolted on.**

Forge is an autonomous behavioral validation swarm skill for [Claude Code](https://claude.com/claude-code) that combines BDD behavioral verification, 7 behavioral validation gates, confidence-tiered learning, topological governance, and self-healing fix loops. It spawns 8 specialized agents that work in parallel to verify, test, fix, and commit — continuously — until every Gherkin scenario passes and every behavioral validation gate clears.

---

## Key Features

- **8 specialized agents** working in parallel with cost-optimized model routing
- **Gherkin behavioral specifications** as the single source of truth
- **7 behavioral validation gates**: Functional, Behavioral, Coverage, Security, Accessibility, Resilience, Contract
- **12 topological governance specifications** (§1.1–§1.12) — mathematical foundations for autonomous behavioral validation
- **Confidence-tiered fix patterns** (Platinum/Gold/Silver/Bronze) with Nash Equilibrium convergence
- **Defect prediction** based on historical failure data and file changes
- **Chaos/resilience testing** with controlled failure injection
- **Cross-context dependency awareness** with cascade re-testing and sheaf cohomology consistency
- **Shared types and cross-cutting validation** across bounded contexts
- **Agent-optimized ADRs** with MUST/MUST NOT constraints and verification commands
- **Visual regression testing** with pixel-by-pixel comparison
- **Architecture-agnostic** — monolith, microservices, monorepo, mobile+backend
- **Optional Agentic QE integration** for enhanced pattern search, security scanning, and more
- **External-only mocking** — mock third-party services, never internal code (production-validated policy)
- **Spec drift detection** — detects when Gherkin specs and implementation diverge
- **LLM-as-Judge meta-review** — second-model evaluation with Anti-Echo-Chamber guarantee
- **Self-reflection gate** — Bug Fixer asks "What could go wrong?" before committing
- **Hallucination Gate** — deterministic pre-LLM boundary (AST resolution, contract hash, mocking detection)
- **Agent criticality scoring** — bottleneck detection via Dirichlet energy and automatic optimization
- **Narya-proofs** — counterfactual verification proving fix necessity and sufficiency
- **Property-based testing** — generate 1000+ test cases from invariants
- **Mutation testing** — inject bugs to verify test effectiveness
- **Blake3 witness chain** — cryptographic tamper-evident audit trail for gate verdicts
- **Infrastructure readiness markers** — specify formally, implement pragmatically, upgrade transparently

---

## Philosophy

### Three Pillars

| Pillar | Source | What It Does |
|--------|--------|--------------|
| **Build** | DDD+ADR+TDD methodology | Structured development with behavioral validation gates, defect prediction, confidence-tiered fixes |
| **Verify** | BDD/Gherkin behavioral specs | Continuous behavioral verification — the PRODUCT works, not just the CODE |
| **Heal** | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |

### "DONE DONE"

"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every behavioral validation gate clears. Every dependency graph is satisfied.

---

## Quick Start

```bash
# Copy SKILL.md to your Claude Code skills directory
cp SKILL.md ~/.claude/skills/forge.md

# Run on your project
/forge --autonomous --context payments
```

---

## Invocation Modes

| Command | Description |
|---------|-------------|
| `/forge --autonomous --all` | Full autonomous run — all contexts, all gates |
| `/forge --autonomous --context [name]` | Single context autonomous run |
| `/forge --verify-only` | Behavioral verification only (no fixes) |
| `/forge --verify-only --context [name]` | Verify single context |
| `/forge --fix-only --context [name]` | Fix failures, don't generate new tests |
| `/forge --learn` | Analyze patterns, update confidence tiers |
| `/forge --add-coverage --screens [names]` | Add coverage for new screens/pages/components |
| `/forge --spec-gen --context [name]` | Generate Gherkin specs for a context |
| `/forge --spec-gen --all` | Generate Gherkin specs for all contexts |
| `/forge --gates-only` | Run behavioral validation gates without test execution |
| `/forge --gates-only --context [name]` | Run behavioral validation gates for single context |
| `/forge --predict` | Defect prediction only |
| `/forge --predict --context [name]` | Predict defects for single context |
| `/forge --chaos --context [name]` | Chaos/resilience testing for a context |
| `/forge --chaos --all` | Chaos testing for all contexts |
| `/forge --drift-check` | Spec drift detection |
| `/forge --drift-check --context [name]` | Drift check for single context |
| `/forge --regressions` | Behavioral regression analysis |
| `/forge --regressions --context [name]` | Regressions for single context |
| `/forge --meta-review` | LLM-as-Judge meta-evaluation |
| `/forge --meta-review --context [name]` | Meta-review for single context |
| `/forge --mutation --context [name]` | Mutation testing for a context |
| `/forge --mutation --critical-only` | Mutation testing for critical paths only |

---

## Architecture

### Autonomous Loop

```
Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn → Repeat
```

```
┌────────────────────────────────────────────────────────────────────┐
│ FORGE AUTONOMOUS LOOP │
├────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Specify │──▶│ Test │──▶│ Analyze │──▶│ Fix │ │
│ │ (Gherkin)│ │ (Run) │ │ (Root │ │ (Tiered) │ │
│ └──────────┘ └──────────┘ │ Cause) │ └──────────┘ │
│ ▲ └──────────┘ │ │
│ │ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Learn │◀──│ Commit │◀──│ Gate │◀──│ Audit │ │
│ │ (Update │ │ (Auto) │ │ (7 Gates)│ │ (A11y) │ │
│ │ Tiers) │ └──────────┘ └──────────┘ └──────────┘ │
│ └──────────┘ │
│ │ │
│ └──────────────── REPEAT ──────────────────────────────────│
│ │
│ Loop continues until: ALL 7 VALIDATION GATES PASS or MAX 10 │
└────────────────────────────────────────────────────────────────────┘
```

### Execution Phases

1. **Phase 0** — Backend setup (build, run, health check, seed data)
2. **Phase 1** — Behavioral specification & architecture records (Gherkin specs, ADRs)
3. **Phase 2** — Contract & dependency validation (schemas, shared types, cross-cutting)
4. **Phase 3** — Swarm initialization (load patterns, predictions, confidence tiers)
5. **Phase 4** — Spawn 8 autonomous agents in parallel
6. **Phase 5** — Behavioral validation gates evaluation (7 gates after every fix cycle, BFT consensus ≥5/7)

---

## Behavioral Validation Gates

| Gate | Check | Threshold | Blocking |
|------|-------|-----------|----------|
| 1. Functional | All tests pass | 100% pass rate | YES |
| 2. Behavioral | Gherkin scenarios satisfied | 100% of targeted scenarios | YES |
| 3. Coverage | Path coverage | >=85% overall, >=95% critical | YES (critical only) |
| 4. Security | No secrets, SAST checks, no injection vectors | 0 critical/high violations | YES |
| 5. Accessibility | Labels, target sizes, contrast | WCAG AA | Warning only |
| 6. Resilience | Offline, timeout, error handling | Tested for target context | Warning only |
| 7. Contract | API response matches schema | 0 mismatches | YES |

---

## Agent Roles

| Agent | Model | Role | v1.2.0 Enhancement |
|-------|-------|------|--------------------|
| **Specification Verifier** | Sonnet | Generates/validates Gherkin specs and ADRs for bounded contexts | — |
| **Test Runner** | Haiku | Executes E2E test suites, parses results, maps failures to specs | — |
| **Failure Analyzer** | Sonnet | Root cause analysis, pattern matching, dependency impact assessment | MaTTS — 3 parallel reasoning trajectories with self-contrast |
| **Bug Fixer** | Opus | Applies confidence-tiered fixes from first principles | Driver-Observer algebraic connectivity (λ₂ monitoring) |
| **Behavioral Validation Gate Enforcer** | Haiku | Evaluates all 7 gates, arbitrates agent disagreements | BFT consensus model (≥5/7 threshold, VETO for blocking gates) |
| **Accessibility Auditor** | Sonnet | WCAG AA audit: labels, contrast, targets, focus order | — |
| **Auto-Committer** | Haiku | Stages fixed files, creates detailed commits with gate statuses | — |
| **Learning Optimizer** | Sonnet | Updates confidence tiers, defect prediction, coverage metrics | DISTILL phase — LoRA-style abstraction with EWC++ anti-forgetting |

---

## Topological Governance (v1.2.0)

Forge v1.2.0 introduces 12 formal topological governance specifications (§1.1–§1.12) that provide mathematical foundations for autonomous behavioral validation. Production heuristics from v1.1.0 — criticality scoring, regression tracking, blocking gates — are now anchored to formal mathematical equivalents.

### Four Specification Clusters

| Cluster | Sections | Purpose |
|---------|----------|---------|
| **Consistency & Verification** | §1.1–§1.5 | Sheaf cohomology for cross-context consistency, Dirichlet energy for system tension, persistent Laplacian for regression tracking, Hallucination Gate for pre-LLM verification, Blake3 witness chain for tamper-evident audit |
| **Swarm Stability** | §1.6–§1.7 | Algebraic connectivity (Fiedler value λ₂) for agent coordination monitoring, MinCut isolation for quarantining anomalous agent output |
| **Memory & Reasoning** | §1.8–§1.11 | Hyperbolic memory (Poincaré ball) for hierarchical code embeddings, GF(3) triadic validation for phase transitions, Narya-proofs for counterfactual fix verification, Johnson-Lindenstrauss for sublinear test coverage |
| **Execution Plane** | §1.12 | WASM/Rust pure-function tasks for deterministic verification (Blake3 hashing, eigenvalue computation, GF(3) validation, HNSW search, contract hash comparison, JL projection) |

### Infrastructure Readiness

Every specification is operational today. Infrastructure readiness markers define the path from "correct" to "correct and fast":

| Specification | Current Implementation | Native Infrastructure |
|---|---|---|
| Blake3 witness chain (§1.5) | SHA-256 hashing | Blake3 native hashing |
| Hyperbolic memory (§1.8) | Flat key-value lookups across 10 namespaces | HNSW-indexed Poincaré ball embeddings |
| JL coverage (§1.11) | Defect prediction with failure probability ranking | Random projection to O(log n) representative tests |
| WASM execution (§1.12) | LLM structured reasoning for pure functions | WASM/Rust compilation with sub-ms latency |

---

## Configuration

### Project Config (optional)

```yaml
# forge.config.yaml — placed at repo root
architecture: microservices
backend:
services:
- name: auth-service
port: 8081
healthEndpoint: /health
buildCommand: npm run build
runCommand: npm start
frontend:
technology: react
testCommand: npx cypress run --spec {target}
testDir: cypress/e2e/
specDir: cypress/e2e/specs/

# Model routing overrides
model_routing:
bug-fixer: opus
failure-analyzer: sonnet
test-runner: haiku

# Visual regression
visual_regression:
enabled: true
threshold: 0.001

# Agentic QE integration
integrations:
agentic-qe:
enabled: true
domains: [defect-intelligence, security-compliance, visual-accessibility, contract-testing]
```

### Context Config (optional)

```yaml
# forge.contexts.yaml — bounded context definitions
contexts:
- name: identity
testFile: identity.cy.ts
specFile: identity.feature
paths: 68
subdomains: [Auth, Profiles, Verification]
- name: payments
testFile: payments.cy.ts
specFile: payments.feature
paths: 89
subdomains: [Wallet, Cards, Transactions]

dependencies:
identity:
blocks: [payments, orders]
payments:
depends_on: [identity]
blocks: [orders, subscriptions]
```

If no configuration files are present, Forge auto-discovers the project structure on first run.

---

## Agentic QE Integration

Forge optionally integrates with [Agentic QE](https://github.com/proffesor-for-testing/agentic-qe) via MCP for enhanced capabilities:

| Capability | Without AQE | With AQE |
|-----------|-------------|----------|
| Pattern Storage | claude-flow memory | ReasoningBank (vector-indexed, 150x faster) |
| Defect Prediction | File changes + history | Specialized defect-intelligence agents |
| Security Scanning | Gate 4 static checks | Full SAST/DAST analysis |
| Accessibility | Built-in auditor | visual-tester + accessibility-auditor |
| Contract Testing | Schema validation | contract-validator + graphql-tester |
| Progress | `.forge/progress.jsonl` | AG-UI real-time streaming |

All AQE features are additive. Forge works identically without AQE installed.

---

## References

- [Continuous Behavioral Verification: Ongoing Path to Done](https://www.linkedin.com/pulse/continuous-behavioral-verification-ongoing-path-done-ikenna-okpala) — Ikenna Okpala
- [Build with Quality Skill: How I Build Software 10x Faster](https://www.linkedin.com/pulse/build-quality-skill-how-i-build-software-10x-faster-mondweep-chakravorty) — Mondweep Chakravorty
- [claude-code-v3-qe-skill](https://github.com/mondweep/vibe-cast) — V3 QE Skill
- [agentic-qe](https://github.com/proffesor-for-testing/agentic-qe) — Agentic QE Framework
- Advanced Topological Governance in Autonomous Software Engineering — Formal mathematical foundations (sheaf theory, spectral analysis, Galois fields) for v1.2.0 specifications

---

## License

MIT