{"id":49339233,"url":"https://github.com/allthingssecurity/agentictesting","last_synced_at":"2026-04-27T03:01:43.865Z","repository":{"id":351292750,"uuid":"1210359376","full_name":"allthingssecurity/agentictesting","owner":"allthingssecurity","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-14T11:09:46.000Z","size":105,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-14T13:10:11.356Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/allthingssecurity.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-14T10:39:18.000Z","updated_at":"2026-04-14T11:09:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/allthingssecurity/agentictesting","commit_stats":null,"previous_names":["allthingssecurity/agentictesting"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/allthingssecurity/agentictesting","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allthingssecurity%2Fagentictesting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allthingssecurity%2Fagentictesting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allthingssecurity%2Fagentictesting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/allthingssecurity%2Fagentictesting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/allthingssecurity","download_url":"https://codeload.github.com/allthingssecurity/agentictesting/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allthingssecurity%2Fagentictesting/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32320683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-27T03:01:41.282Z","updated_at":"2026-04-27T03:01:43.851Z","avatar_url":"https://github.com/allthingssecurity.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TestForge v0.5 — Polyglot Agentic Testing Framework\n\nA self-evolving, language-aware testing harness powered by LangGraph and GPT-5. TestForge dynamically discovers testing tools for any language, orchestrates them through specialized agents, and evolves its own configuration via Meta-Harness optimization.\n\n## What Testing Does This Do?\n\nTestForge covers **6 categories of testing** across any language — all orchestrated by LLM agents that decide what to run, how to interpret results, and what to do about failures.\n\n### 1. 
Unit Testing\nRuns the language-native test runner against your test suite. Catches logic bugs, regressions, and assertion failures.\n\n| Language | Tool | What It Catches |\n|----------|------|-----------------|\n| Python | pytest | Function-level failures, exceptions, assertion errors |\n| Rust | cargo test | Panics, assertion failures, `#[should_panic]` violations |\n| JS/TS | jest / vitest | Expect mismatches, thrown errors, async failures |\n| Go | go test | Table-driven test failures, panics, failed `t.Error`/`t.Fatal` checks |\n| Java | JUnit 5 (Maven/Gradle) | Assertion errors, exception tests |\n| Ruby | RSpec | Example failures, matcher mismatches |\n| C# | dotnet test | xUnit/NUnit assertion failures |\n\n### 2. Static Analysis (SAST)\nScans source code without executing it. Finds security vulnerabilities, code smells, and bug patterns.\n\n| Tool | Languages | What It Catches |\n|------|-----------|-----------------|\n| **Semgrep** | 30+ languages | SQL injection, XSS, hardcoded secrets, insecure patterns |\n| **Bandit** | Python | Security anti-patterns (eval, exec, shell injection) |\n| **Cargo Clippy** | Rust | Idiomatic issues, potential bugs, performance anti-patterns |\n| **ESLint** | JS/TS | Code quality, security rules, unused variables |\n| **golangci-lint** | Go | Lint aggregator (gosec, staticcheck, errcheck, etc.) |\n| **RuboCop** | Ruby | Style violations, security cops |\n| **SpotBugs** | Java | Null-pointer dereferences, resource leaks, correctness bugs |\n\n### 3. Dynamic Analysis (DAST)\nAnalyzes software while it runs: either probing a live application with requests, or instrumenting program execution to catch memory errors and undefined behavior.\n\n| Tool | What It Does |\n|------|-------------|\n| **Nuclei** | Template-based vulnerability scanning against live URLs |\n| **Cargo Miri** | Detects undefined behavior in Rust (memory safety, aliasing violations) |\n| **Valgrind** | Memory leaks, invalid reads/writes in C/C++ |\n\n### 4. API Testing \u0026 Fuzzing\nExercises HTTP/REST/GraphQL APIs with schema-generated inputs (Schemathesis) or consumer-defined contracts (Pact) to find contract violations and crashes.\n\n| Tool | What It Does |\n|------|-------------|\n| **Schemathesis** | Property-based fuzzing from OpenAPI/GraphQL schemas |\n| **Pact** | Consumer-driven contract testing |\n\n### 5. Dependency \u0026 Supply Chain Auditing\nChecks your dependencies for known vulnerabilities.\n\n| Tool | Language | What It Catches |\n|------|----------|-----------------|\n| **npm audit** | JS/TS | CVEs in node_modules |\n| **cargo audit** | Rust | RustSec advisory database |\n| **pip-audit / safety** | Python | PyPI vulnerability database |\n| **bundler-audit** | Ruby | Gem vulnerabilities |\n| **govulncheck** | Go | Go vulnerability database |\n| **OWASP Dependency-Check** | Java | NVD vulnerability database |\n\n### 6. End-to-End / Browser Testing\nDrives a real browser to test user flows.\n\n| Tool | What It Does |\n|------|-------------|\n| **Playwright** | Cross-browser E2E testing (Chromium, Firefox, WebKit) |\n| **Cypress** | Component + E2E testing for web apps |\n\n### What the Agents Add on Top\n\nThe testing tools above run as subprocesses. The LLM agents provide the **intelligence layer**:\n\n| Agent | What It Does |\n|-------|-------------|\n| **ToolScout** | Dynamically discovers which tools are available: reads Cargo.toml, package.json, and go.mod, and probes PATH. No hardcoding needed. 
|\n| **Planner** | Analyzes the project and decides which test categories to run and in what order |\n| **Executor** | Invokes tools with the right arguments, interprets output |\n| **Healer** | Reads failing tests, understands the bug, patches test files, re-runs to verify |\n| **Triage** | Classifies each failure: severity (critical/high/medium/low), category (regression, security, flaky), recommendation |\n| **Meta-Harness** | Evolves the harness configuration itself by analyzing prior execution traces (arXiv:2603.28052) |\n\n## What Makes This Different\n\n| Feature | Traditional CI | TestForge |\n|---------|---------------|-----------|\n| Tool discovery | Hardcoded per project | **GPT-5 ToolScout** reads ecosystem files, probes PATH, generates tool specs at runtime |\n| Language support | Configure per language | **Auto-detects** languages from package files + file extensions |\n| Test repair | Manual fix | **Healer agent** reads failures, patches tests, re-runs to verify |\n| Finding triage | Manual review | **Triage agent** classifies severity, category, recommendations |\n| Harness optimization | Manual tuning | **Meta-Harness** (arXiv:2603.28052) evolves configs from execution traces |\n| Report format | Pick one | JSON + HTML + JUnit XML simultaneously |\n\n## Quick Start\n\n```bash\n# Install\npip install -e .\n\n# See available tools for your project\ntestforge tools list -r /path/to/your/project\n\n# Dry-run: detect languages and plan\ntestforge plan -r /path/to/your/project\n\n# Full pipeline\nexport OPENAI_API_KEY=sk-...\ntestforge run -m testforge.yaml -r /path/to/your/project\n\n# Meta-Harness evolution (optimize the harness itself)\ntestforge evolve -r /path/to/your/project --iterations 3\n```\n\n## Architecture\n\nSee [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full system design.\n\n```\nSTART\n  → detect_languages       (pure fn: scans project)\n  → discover_tools          (pure fn: registry lookup)\n  → tool_scout              (GPT-5: 
dynamic tool discovery)\n  → planner                 (GPT-5: language-aware test plan)\n  → compact_memory_pre      (pure fn: summarize context)\n  → [parallel executors]    (one per detected language)\n  → compact_memory_post     (pure fn: merge results)\n  → healer                  (GPT-5: fix failing tests)\n  → triage                  (GPT-5: classify findings)\n  → reporter                (pure fn: JSON + HTML + JUnit XML)\n  → meta_evaluate           (pure fn: score tool effectiveness)\n  → END\n```\n\n## Supported Languages \u0026 Tools\n\nTestForge has **12 built-in adapters** and discovers additional tools dynamically:\n\n| Language | Built-in Adapters | Dynamically Discovered (examples) |\n|----------|------------------|----------------------------------|\n| Python | pytest, semgrep | ruff, bandit, mypy, pyright |\n| JavaScript/TypeScript | jest, vitest, playwright | eslint, npm-audit |\n| Rust | cargo-test | cargo clippy, cargo audit, cargo miri |\n| Go | go-test | golangci-lint, gosec, govulncheck |\n| Java/Kotlin | junit5 | spotbugs, checkstyle, gradle test |\n| Ruby | rspec | rubocop, brakeman |\n| C# | dotnet-test | dotnet format |\n| C/C++ | — | ctest, cppcheck, clang-tidy |\n\n## Documentation\n\n- **[SETUP.md](docs/SETUP.md) — How to run TestForge against your project** (start here)\n- [DESIGN.md](docs/DESIGN.md) — Why harnesses matter, theoretical foundations\n- [ARCHITECTURE.md](docs/ARCHITECTURE.md) — System architecture and component design\n- [WORKING.md](docs/WORKING.md) — How the pipeline works step-by-step\n- [HARNESS_GUIDE.md](docs/HARNESS_GUIDE.md) — Designing harnesses for testing scenarios\n- [TOOLS_REFERENCE.md](docs/TOOLS_REFERENCE.md) — Tool adapters per runtime/language\n- [META_HARNESS.md](docs/META_HARNESS.md) — Meta-Harness evolution (arXiv:2603.28052)\n\n## Example Results\n\nWe ran TestForge against three example projects. 
Each run exercised a different set of testing categories, depending on what ToolScout discovered and which tools were installed on the machine.\n\n### Python Project (`example_project/`) — FastAPI bookstore\n\n| Category | Tool | Status |\n|----------|------|--------|\n| Unit Testing | pytest | 14 passed, 3 failed (division-by-zero, off-by-one bugs) |\n| SAST | semgrep | 0 findings |\n| DAST | — | Not run (no live server spun up) |\n| API Fuzz | — | Not run (schemathesis not installed) |\n| Dependency Audit | — | Not run |\n| E2E / Browser | — | Not run |\n\n**Coverage: 2 of 6 categories.** The app has FastAPI endpoints that *could* be tested with Schemathesis (API fuzz) and Nuclei (DAST) if a server were running and both tools were installed.\n\nHealer patched the failing tests: it changed `test_zero_quantity` to expect `ZeroDivisionError` and updated the pagination assertions to match the off-by-one behavior.\n\n### Rust Project (`example_rust/`) — math library\n\n| Category | Tool | Status |\n|----------|------|--------|\n| Unit Testing | cargo test | 8 passed, 4 failed (email validation, clamp logic bugs) |\n| SAST | semgrep | 0 findings |\n| SAST/Lint | cargo clippy (dynamic) | 0 warnings |\n| DAST | cargo miri (dynamic) | **Critical: undefined behavior detected** |\n| API Fuzz | — | N/A (no HTTP API) |\n| Dependency Audit | — | Not run (cargo-audit not installed) |\n| E2E / Browser | — | N/A (no UI) |\n\n**Coverage: 3 of 6 categories.** ToolScout dynamically discovered cargo clippy and cargo miri; no adapter code was written for either. Miri flagged undefined behavior that no other tool caught.\n\nHealer patched `lib.rs`: it added `#[ignore = \"Known bug\"]` annotations, with explanations, to 3 failing tests.\n\n### TypeScript Project (`example_ts/`) — task manager library\n\n| Category | Tool | Status |\n|----------|------|--------|\n| Unit Testing | vitest | 6 passed, 4 failed (filterByPriority, completionPercentage, searchTasks bugs) |\n| SAST | semgrep | 0 findings |\n| DAST | — | Not run |\n| API Fuzz | — | N/A (no HTTP API) |\n| Dependency Audit | npm audit (dynamic) | **0 vulnerabilities** |\n| E2E / Browser | playwright (dynamic) | Not run (no browser tests configured) |\n\n**Coverage: 3 of 6 categories.** ToolScout dynamically discovered npm-audit and playwright. npm-audit confirmed clean dependencies. Playwright was detected but no E2E test files existed.\n\nTriage classified all 4 failures as high-severity regressions with specific fix recommendations.\n\n### What's Not Covered Yet\n\nTo exercise all 6 categories in one run, you'd need:\n- A running HTTP server for DAST (Nuclei) and API fuzzing (Schemathesis)\n- Playwright test files for E2E\n- Security tools installed (`cargo-audit`, `bandit`, etc.)\n\nThe framework supports all 6 categories; the example environments simply lacked these prerequisites.\n\n## Generated Reports\n\nEach example project has its reports in `artifacts/`:\n- `report.html` — dark-themed dashboard ([Rust](example_rust/artifacts/report.html), [TypeScript](example_ts/artifacts/report.html), [Python](example_project/artifacts/report.html))\n- `report.json` — machine-readable structured data\n- `junit.xml` — CI-compatible (Jenkins, GitHub Actions, GitLab)\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallthingssecurity%2Fagentictesting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fallthingssecurity%2Fagentictesting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallthingssecurity%2Fagentictesting/lists"}