https://github.com/ikennaokpala/forge
Forge is an autonomous behavioural validation engineering swarm that treats quality as something forged into software continuously, not bolted on at the end.
agent-skill bdd claude-code gherkin quality-engineering testing
Last synced: 1 day ago
- Host: GitHub
- URL: https://github.com/ikennaokpala/forge
- Owner: ikennaokpala
- License: mit
- Created: 2026-02-07T07:58:30.000Z (14 days ago)
- Default Branch: main
- Last Pushed: 2026-02-20T06:47:12.000Z (1 day ago)
- Last Synced: 2026-02-20T06:57:34.989Z (1 day ago)
- Topics: agent-skill, bdd, claude-code, gherkin, quality-engineering, testing
- Homepage:
- Size: 146 KB
- Stars: 7
- Watchers: 0
- Forks: 2
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# Forge
**Behavioral validation forged in, not bolted on.**
Forge is an autonomous behavioral validation swarm skill for [Claude Code](https://claude.com/claude-code) that combines BDD behavioral verification, 7 behavioral validation gates, confidence-tiered learning, topological governance, and self-healing fix loops. It spawns 8 specialized agents that work in parallel to verify, test, fix, and commit — continuously — until every Gherkin scenario passes and every behavioral validation gate clears.
---
## Key Features
- **8 specialized agents** working in parallel with cost-optimized model routing
- **Gherkin behavioral specifications** as the single source of truth
- **7 behavioral validation gates**: Functional, Behavioral, Coverage, Security, Accessibility, Resilience, Contract
- **12 topological governance specifications** (§1.1–§1.12) — mathematical foundations for autonomous behavioral validation
- **Confidence-tiered fix patterns** (Platinum/Gold/Silver/Bronze) with Nash Equilibrium convergence
- **Defect prediction** based on historical failure data and file changes
- **Chaos/resilience testing** with controlled failure injection
- **Cross-context dependency awareness** with cascade re-testing and sheaf cohomology consistency
- **Shared types and cross-cutting validation** across bounded contexts
- **Agent-optimized ADRs** with MUST/MUST NOT constraints and verification commands
- **Visual regression testing** with pixel-by-pixel comparison
- **Architecture-agnostic** — monolith, microservices, monorepo, mobile+backend
- **Optional Agentic QE integration** for enhanced pattern search, security scanning, and more
- **External-only mocking** — mock third-party services, never internal code (production-validated policy)
- **Spec drift detection** — detects when Gherkin specs and implementation diverge
- **LLM-as-Judge meta-review** — second-model evaluation with Anti-Echo-Chamber guarantee
- **Self-reflection gate** — Bug Fixer asks "What could go wrong?" before committing
- **Hallucination Gate** — deterministic pre-LLM boundary (AST resolution, contract hash, mocking detection)
- **Agent criticality scoring** — bottleneck detection via Dirichlet energy and automatic optimization
- **Narya-proofs** — counterfactual verification proving fix necessity and sufficiency
- **Property-based testing** — generate 1000+ test cases from invariants
- **Mutation testing** — inject bugs to verify test effectiveness
- **Blake3 witness chain** — cryptographic tamper-evident audit trail for gate verdicts
- **Infrastructure readiness markers** — specify formally, implement pragmatically, upgrade transparently
---
## Philosophy
### Three Pillars
| Pillar | Source | What It Does |
|--------|--------|--------------|
| **Build** | DDD+ADR+TDD methodology | Structured development with behavioral validation gates, defect prediction, confidence-tiered fixes |
| **Verify** | BDD/Gherkin behavioral specs | Continuous behavioral verification — the PRODUCT works, not just the CODE |
| **Heal** | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |
### "DONE DONE"
"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every behavioral validation gate clears. Every dependency graph is satisfied.
---
## Quick Start
```bash
# Copy SKILL.md to your Claude Code skills directory
cp SKILL.md ~/.claude/skills/forge.md
# Run on your project
/forge --autonomous --context payments
```
---
## Invocation Modes
| Command | Description |
|---------|-------------|
| `/forge --autonomous --all` | Full autonomous run — all contexts, all gates |
| `/forge --autonomous --context [name]` | Single context autonomous run |
| `/forge --verify-only` | Behavioral verification only (no fixes) |
| `/forge --verify-only --context [name]` | Verify single context |
| `/forge --fix-only --context [name]` | Fix failures without generating new tests |
| `/forge --learn` | Analyze patterns, update confidence tiers |
| `/forge --add-coverage --screens [names]` | Add coverage for new screens/pages/components |
| `/forge --spec-gen --context [name]` | Generate Gherkin specs for a context |
| `/forge --spec-gen --all` | Generate Gherkin specs for all contexts |
| `/forge --gates-only` | Run behavioral validation gates without test execution |
| `/forge --gates-only --context [name]` | Run behavioral validation gates for single context |
| `/forge --predict` | Defect prediction only |
| `/forge --predict --context [name]` | Predict defects for single context |
| `/forge --chaos --context [name]` | Chaos/resilience testing for a context |
| `/forge --chaos --all` | Chaos testing for all contexts |
| `/forge --drift-check` | Spec drift detection |
| `/forge --drift-check --context [name]` | Drift check for single context |
| `/forge --regressions` | Behavioral regression analysis |
| `/forge --regressions --context [name]` | Regressions for single context |
| `/forge --meta-review` | LLM-as-Judge meta-evaluation |
| `/forge --meta-review --context [name]` | Meta-review for single context |
| `/forge --mutation --context [name]` | Mutation testing for a context |
| `/forge --mutation --critical-only` | Mutation testing for critical paths only |
---
## Architecture
### Autonomous Loop
```
Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn → Repeat
```
```
┌────────────────────────────────────────────────────────────────────┐
│                       FORGE AUTONOMOUS LOOP                        │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │ Specify  │──▶│   Test   │──▶│ Analyze  │──▶│   Fix    │         │
│  │ (Gherkin)│   │  (Run)   │   │ (Root    │   │ (Tiered) │         │
│  └──────────┘   └──────────┘   │  Cause)  │   └──────────┘         │
│       ▲                        └──────────┘        │               │
│       │                                            ▼               │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │  Learn   │◀──│  Commit  │◀──│   Gate   │◀──│  Audit   │         │
│  │ (Update  │   │  (Auto)  │   │ (7 Gates)│   │  (A11y)  │         │
│  │  Tiers)  │   └──────────┘   └──────────┘   └──────────┘         │
│  └──────────┘                                                      │
│       │                                                            │
│       └──────────────── REPEAT ────────────────────────────────────│
│                                                                    │
│  Loop continues until: ALL 7 VALIDATION GATES PASS or MAX 10       │
└────────────────────────────────────────────────────────────────────┘
```
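The termination rule in the diagram can be sketched in a few lines of Python. This is an illustrative sketch, not Forge's implementation: `run_cycle` is a hypothetical callback standing in for one full Specify-to-Learn pass, and reading "MAX 10" as a cap of ten loop iterations is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: "MAX 10" in the diagram is read as ten loop iterations.
MAX_ITERATIONS = 10


@dataclass(frozen=True)
class LoopOutcome:
    iterations: int       # how many cycles were actually run
    all_gates_passed: bool


def run_loop(run_cycle: Callable[[int], bool],
             max_iterations: int = MAX_ITERATIONS) -> LoopOutcome:
    """Repeat the cycle until every gate passes or the cap is reached.

    `run_cycle` (hypothetical) performs one pass of the loop and returns
    True when all validation gates cleared on that pass.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    for iteration in range(1, max_iterations + 1):
        if run_cycle(iteration):
            return LoopOutcome(iteration, True)
    return LoopOutcome(max_iterations, False)


# Usage: a fake cycle whose gates only clear on the third pass.
print(run_loop(lambda i: i >= 3))
```

Keeping the cap explicit means a run that never converges still ends with a clear "gates not passed" outcome instead of looping forever.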
### Execution Phases
1. **Phase 0** — Backend setup (build, run, health check, seed data)
2. **Phase 1** — Behavioral specification & architecture records (Gherkin specs, ADRs)
3. **Phase 2** — Contract & dependency validation (schemas, shared types, cross-cutting)
4. **Phase 3** — Swarm initialization (load patterns, predictions, confidence tiers)
5. **Phase 4** — Spawn 8 autonomous agents in parallel
6. **Phase 5** — Behavioral validation gates evaluation (7 gates after every fix cycle, BFT consensus ≥5/7)
---
## Behavioral Validation Gates
| Gate | Check | Threshold | Blocking |
|------|-------|-----------|----------|
| 1. Functional | All tests pass | 100% pass rate | YES |
| 2. Behavioral | Gherkin scenarios satisfied | 100% of targeted scenarios | YES |
| 3. Coverage | Path coverage | >=85% overall, >=95% critical | YES (critical only) |
| 4. Security | No secrets, SAST checks, no injection vectors | 0 critical/high violations | YES |
| 5. Accessibility | Labels, target sizes, contrast | WCAG AA | Warning only |
| 6. Resilience | Offline, timeout, error handling | Tested for target context | Warning only |
| 7. Contract | API response matches schema | 0 mismatches | YES |
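One way to read the table together with the "BFT consensus ≥5/7, VETO for blocking gates" rule is sketched below. The names `GateResult` and `verdict` are hypothetical, and the exact voting semantics are an assumption: here each gate casts one vote, any failed blocking gate vetoes the round, and otherwise at least five passing gates are required. Coverage is treated as simply blocking, which glosses over the "critical only" nuance.

```python
from dataclasses import dataclass

CONSENSUS_THRESHOLD = 5  # from the README: consensus needs >= 5 of 7


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    blocking: bool


def verdict(results: list[GateResult],
            threshold: int = CONSENSUS_THRESHOLD) -> str:
    """Return "VETO", "PASS", or "FAIL" for one round of gate results."""
    # A failed blocking gate overrides any majority.
    if any(r.blocking and not r.passed for r in results):
        return "VETO"
    passes = sum(r.passed for r in results)
    return "PASS" if passes >= threshold else "FAIL"


# (name, blocking) pairs taken from the gate table above.
GATES = [
    ("Functional", True), ("Behavioral", True), ("Coverage", True),
    ("Security", True), ("Accessibility", False), ("Resilience", False),
    ("Contract", True),
]


def round_with_failures(failed: set[str]) -> list[GateResult]:
    return [GateResult(name, name not in failed, blocking)
            for name, blocking in GATES]


print(verdict(round_with_failures({"Accessibility"})))  # PASS
print(verdict(round_with_failures({"Security"})))       # VETO
```

Under this reading a warning-only gate can fail without stopping a commit, while a single failed blocking gate stops it regardless of the vote count.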
---
## Agent Roles
| Agent | Model | Role | v1.2.0 Enhancement |
|-------|-------|------|--------------------|
| **Specification Verifier** | Sonnet | Generates/validates Gherkin specs and ADRs for bounded contexts | — |
| **Test Runner** | Haiku | Executes E2E test suites, parses results, maps failures to specs | — |
| **Failure Analyzer** | Sonnet | Root cause analysis, pattern matching, dependency impact assessment | MaTTS — 3 parallel reasoning trajectories with self-contrast |
| **Bug Fixer** | Opus | Applies confidence-tiered fixes from first principles | Driver-Observer algebraic connectivity (λ₂ monitoring) |
| **Behavioral Validation Gate Enforcer** | Haiku | Evaluates all 7 gates, arbitrates agent disagreements | BFT consensus model (≥5/7 threshold, VETO for blocking gates) |
| **Accessibility Auditor** | Sonnet | WCAG AA audit: labels, contrast, targets, focus order | — |
| **Auto-Committer** | Haiku | Stages fixed files, creates detailed commits with gate statuses | — |
| **Learning Optimizer** | Sonnet | Updates confidence tiers, defect prediction, coverage metrics | DISTILL phase — LoRA-style abstraction with EWC++ anti-forgetting |
---
## Topological Governance (v1.2.0)
Forge v1.2.0 introduces 12 formal topological governance specifications (§1.1–§1.12) that provide mathematical foundations for autonomous behavioral validation. Production heuristics from v1.1.0 — criticality scoring, regression tracking, blocking gates — are now anchored to formal mathematical equivalents.
### Four Specification Clusters
| Cluster | Sections | Purpose |
|---------|----------|---------|
| **Consistency & Verification** | §1.1–§1.5 | Sheaf cohomology for cross-context consistency, Dirichlet energy for system tension, persistent Laplacian for regression tracking, Hallucination Gate for pre-LLM verification, Blake3 witness chain for tamper-evident audit |
| **Swarm Stability** | §1.6–§1.7 | Algebraic connectivity (Fiedler value λ₂) for agent coordination monitoring, MinCut isolation for quarantining anomalous agent output |
| **Memory & Reasoning** | §1.8–§1.11 | Hyperbolic memory (Poincaré ball) for hierarchical code embeddings, GF(3) triadic validation for phase transitions, Narya-proofs for counterfactual fix verification, Johnson-Lindenstrauss for sublinear test coverage |
| **Execution Plane** | §1.12 | WASM/Rust pure-function tasks for deterministic verification (Blake3 hashing, eigenvalue computation, GF(3) validation, HNSW search, contract hash comparison, JL projection) |
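For readers unfamiliar with the spectral terms above, the standard definitions are as follows. How Forge builds the underlying graph (for example, agents or bounded contexts as nodes) is not stated here, so the graph itself is an assumption. For an undirected graph with edge weights $w_{ij} \ge 0$, adjacency matrix $A$, and degree matrix $D$:

$$
L = D - A, \qquad
E(x) = x^{\top} L x = \sum_{i<j} w_{ij}\,(x_i - x_j)^2, \qquad
\lambda_2 = \min_{\substack{x \neq 0 \\ x \perp \mathbf{1}}} \frac{x^{\top} L x}{x^{\top} x}
$$

Here $E(x)$ is the Dirichlet energy of a signal $x$ on the nodes: it is zero when all connected nodes agree and grows as neighbors disagree. The Fiedler value $\lambda_2$ is positive exactly when the graph is connected, so a value near zero indicates a graph that is close to splitting into disconnected parts.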
### Infrastructure Readiness
Every specification is operational today, in several cases through a simpler stand-in implementation. Infrastructure readiness markers define the path from "correct" to "correct and fast":
| Specification | Current Implementation | Native Infrastructure |
|---|---|---|
| Blake3 witness chain (§1.5) | SHA-256 hashing | Blake3 native hashing |
| Hyperbolic memory (§1.8) | Flat key-value lookups across 10 namespaces | HNSW-indexed Poincaré ball embeddings |
| JL coverage (§1.11) | Defect prediction with failure probability ranking | Random projection to O(log n) representative tests |
| WASM execution (§1.12) | LLM structured reasoning for pure functions | WASM/Rust compilation with sub-ms latency |
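A tamper-evident chain of gate verdicts, using the SHA-256 stand-in listed in the table, might look like the sketch below. The entry layout (`verdict`, `prev`, `hash`) and the function names are assumptions made for illustration, not Forge's actual format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, verdict: dict) -> str:
    # Canonical JSON so the same verdict always hashes identically.
    payload = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list[dict], verdict: dict) -> None:
    """Add a verdict, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"verdict": verdict, "prev": prev_hash,
                  "hash": _digest(prev_hash, verdict)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["verdict"]):
            return False
        prev_hash = entry["hash"]
    return True


chain: list[dict] = []
append(chain, {"gate": "Functional", "passed": True})
append(chain, {"gate": "Security", "passed": True})
print(verify(chain))  # True
```

Because each hash covers the previous hash, changing an earlier verdict invalidates every later entry. Swapping in Blake3 would only change the hash function inside `_digest`.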
---
## Configuration
### Project Config (optional)
```yaml
# forge.config.yaml — placed at repo root
architecture: microservices
backend:
  services:
    - name: auth-service
      port: 8081
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start
frontend:
  technology: react
  testCommand: npx cypress run --spec {target}
  testDir: cypress/e2e/
  specDir: cypress/e2e/specs/
# Model routing overrides
model_routing:
  bug-fixer: opus
  failure-analyzer: sonnet
  test-runner: haiku
# Visual regression
visual_regression:
  enabled: true
  threshold: 0.001
# Agentic QE integration
integrations:
  agentic-qe:
    enabled: true
    domains: [defect-intelligence, security-compliance, visual-accessibility, contract-testing]
```
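The `visual_regression.threshold` value is not defined further in this README. One plausible reading, assumed here, is the maximum fraction of pixels allowed to differ between a baseline and a new screenshot. The sketch below uses plain nested lists of RGB tuples, and the helper names are hypothetical.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def diff_ratio(baseline: Image, candidate: Image) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate))
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        raise ValueError("images must not be empty")
    differing = sum(p != q
                    for row_a, row_b in zip(baseline, candidate)
                    for p, q in zip(row_a, row_b))
    return differing / total


def within_threshold(baseline: Image, candidate: Image,
                     threshold: float = 0.001) -> bool:
    # Assumption: threshold is the allowed fraction of changed pixels.
    return diff_ratio(baseline, candidate) <= threshold


white: Pixel = (255, 255, 255)
base = [[white] * 100 for _ in range(100)]
shot = [row[:] for row in base]
shot[0][0] = (0, 0, 0)               # 1 of 10,000 pixels changed
print(within_threshold(base, shot))  # True
```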
### Context Config (optional)
```yaml
# forge.contexts.yaml — bounded context definitions
contexts:
  - name: identity
    testFile: identity.cy.ts
    specFile: identity.feature
    paths: 68
    subdomains: [Auth, Profiles, Verification]
  - name: payments
    testFile: payments.cy.ts
    specFile: payments.feature
    paths: 89
    subdomains: [Wallet, Cards, Transactions]
dependencies:
  identity:
    blocks: [payments, orders]
  payments:
    depends_on: [identity]
    blocks: [orders, subscriptions]
```
If no configuration files are present, Forge auto-discovers the project structure on first run.
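To illustrate how the `blocks` edges in the context config could drive cascade re-testing, the sketch below collects everything downstream of a changed context and orders it so that each context is tested after the contexts that block it. The function name `retest_order` is hypothetical, and this is only one plausible interpretation of the cascade behavior.

```python
from graphlib import TopologicalSorter

# Mirrors the `blocks` edges from the forge.contexts.yaml example.
BLOCKS = {
    "identity": ["payments", "orders"],
    "payments": ["orders", "subscriptions"],
}


def retest_order(changed: str, blocks: dict[str, list[str]]) -> list[str]:
    """Return the changed context plus all downstream contexts,
    ordered so that upstream contexts come first."""
    # Walk the graph to find everything transitively blocked.
    affected, stack = {changed}, [changed]
    while stack:
        for child in blocks.get(stack.pop(), []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    # Map each affected context to the affected contexts it waits on.
    waits_on: dict[str, set[str]] = {name: set() for name in sorted(affected)}
    for parent, children in blocks.items():
        if parent in affected:
            for child in children:
                waits_on[child].add(parent)
    return list(TopologicalSorter(waits_on).static_order())


print(retest_order("identity", BLOCKS))
```

A change to `identity` therefore re-tests `identity` first, then `payments`, and only afterwards `orders` and `subscriptions`. A change to `orders` re-tests `orders` alone.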
---
## Agentic QE Integration
Forge optionally integrates with [Agentic QE](https://github.com/proffesor-for-testing/agentic-qe) via MCP for enhanced capabilities:
| Capability | Without AQE | With AQE |
|-----------|-------------|----------|
| Pattern Storage | claude-flow memory | ReasoningBank (vector-indexed, 150x faster) |
| Defect Prediction | File changes + history | Specialized defect-intelligence agents |
| Security Scanning | Gate 4 static checks | Full SAST/DAST analysis |
| Accessibility | Built-in auditor | visual-tester + accessibility-auditor |
| Contract Testing | Schema validation | contract-validator + graphql-tester |
| Progress | `.forge/progress.jsonl` | AG-UI real-time streaming |
All AQE features are additive. Forge works identically without AQE installed.
---
## References
- [Continuous Behavioral Verification: Ongoing Path to Done](https://www.linkedin.com/pulse/continuous-behavioral-verification-ongoing-path-done-ikenna-okpala) — Ikenna Okpala
- [Build with Quality Skill: How I Build Software 10x Faster](https://www.linkedin.com/pulse/build-quality-skill-how-i-build-software-10x-faster-mondweep-chakravorty) — Mondweep Chakravorty
- [claude-code-v3-qe-skill](https://github.com/mondweep/vibe-cast) — V3 QE Skill
- [agentic-qe](https://github.com/proffesor-for-testing/agentic-qe) — Agentic QE Framework
- Advanced Topological Governance in Autonomous Software Engineering — Formal mathematical foundations (sheaf theory, spectral analysis, Galois fields) for v1.2.0 specifications
---
## License
MIT