{"id":51125325,"url":"https://github.com/srbarrios/agentic-test-explorer","last_synced_at":"2026-06-25T07:01:26.622Z","repository":{"id":356985173,"uuid":"1234501960","full_name":"srbarrios/agentic-test-explorer","owner":"srbarrios","description":"An agnostic AI-driven exploratory test framework that intelligently explores, tests, and validates any application","archived":false,"fork":false,"pushed_at":"2026-05-17T18:53:27.000Z","size":3323,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-17T20:48:03.210Z","etag":null,"topics":["agentic","ai-agents","ai-testing","autonomous-testing","browser-automation","exploratory-testing","langchain","langgraph","mcp","playwright","python","qa-automation","test-automation"],"latest_commit_sha":null,"homepage":"https://oscarbarrios.tech/agentic-test-explorer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/srbarrios.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-10T09:10:33.000Z","updated_at":"2026-05-17T18:53:29.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/srbarrios/agentic-test-explorer","commit_stats":null,"previous_names":["srbarrios/agentic-test-explorer"],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/srbarrios/agentic-test-explorer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/srbarrios%2Fagentic-test-explorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srbarrios%2Fagentic-test-explorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srbarrios%2Fagentic-test-explorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srbarrios%2Fagentic-test-explorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/srbarrios","download_url":"https://codeload.github.com/srbarrios/agentic-test-explorer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srbarrios%2Fagentic-test-explorer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34763482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-25T02:00:05.521Z","response_time":101,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic","ai-agents","ai-testing","autonomous-testing","browser-automation","exploratory-testing","langchain","langgraph","mcp","playwright","python","qa-automation","test-automation"],"created_at":"2026-06-25T07:01:25.544Z","updated_at":"2026-06-25T07:01:26.612Z","avatar_url":"https://github.com/srbarrios.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"logo.png\" alt=\"Agentic Explorer Logo\" 
width=\"500\" align=\"center\" /\u003e\n\nA product-agnostic, AI-driven exploratory test framework that intelligently\nexplores, tests, and validates **any** web application. Configure it for your stack via a\nsmall `config.yaml`, point it at your app, and let specialized agents drive a real browser\nto find bugs, render anomalies, and unscripted edge cases.\n\nPowered by a **LangGraph Swarm** architecture, **Playwright**, and your choice of\n**Claude** (default) or **Google Gemini**, this framework dynamically routes tasks to\nbehavioral QA personas and advanced stress/exploration agents, self-heals from UI errors,\noptionally consults user-provided MCP servers and Agent Skills for domain knowledge,\ngenerates reproducible Playwright test scripts from every bug found, and writes Markdown\nexecutive test reports.\n\nIt can also **analyze GitHub Pull Requests** — pass a PR URL and the framework extracts the\ncode diff, feeds it to an LLM, and auto-generates targeted test missions covering the UI\nareas most likely impacted by the changes.\n\n---\n\n## 🎬 Demo\n\nhttps://github.com/user-attachments/assets/9a17d846-5bee-4055-a97a-f9dd7ee191c2\n\n---\n\n## 🏗️ Architecture\n\nThe framework is built on a **Supervisor-Worker Swarm** pattern. 
Based on the mission type\n(determined by the `thread_id` keyword), the system spins up either a **Standard** or\n**Advanced** routing graph.\n\n```mermaid\ngraph TD\n    classDef user fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff;\n    classDef core fill:#3b82f6,stroke:#2563eb,stroke-width:2px,color:#fff;\n    classDef supervisor fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff;\n    classDef agent fill:#10b981,stroke:#059669,stroke-width:2px,color:#fff;\n    classDef db fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff;\n    classDef tool fill:#ec4899,stroke:#db2777,stroke-width:2px,color:#fff;\n    classDef external fill:#475569,stroke:#334155,stroke-width:2px,color:#fff;\n\n    User([User / CI]):::user --\u003e|YAML Missions| Main(main.py):::core\n    User --\u003e|GitHub PR URL| PR(pr_analyzer.py):::core\n    PR --\u003e|MCP or gh CLI| GH[GitHub API]:::external\n    PR --\u003e|Generated Missions| Main\n\n    Main --\u003e|Standard Missions| S_Supervisor{QA Supervisor}:::supervisor\n    Main --\u003e|Advanced Missions| A_Supervisor{Adv. 
Supervisor}:::supervisor\n    Main --\u003e|Checkpoints + Store| DB[(SQLite Memory)]:::db\n\n    subgraph SQA [Standard QA Swarm]\n        S_Supervisor \u003c--\u003e|Routes \u0026 Returns| S_New([New User Agent]):::agent\n        S_Supervisor \u003c--\u003e S_Power([Power User Agent]):::agent\n        S_Supervisor \u003c--\u003e S_Adv([Adversarial User Agent]):::agent\n    end\n\n    subgraph ATS [Advanced Testing Swarm]\n        A_Supervisor \u003c--\u003e|Routes \u0026 Returns| A_Acc([Accessibility User Agent]):::agent\n        A_Supervisor \u003c--\u003e A_Data([Data Heavy User Agent]):::agent\n        A_Supervisor \u003c--\u003e A_Imp([Impatient User Agent]):::agent\n        A_Supervisor \u003c--\u003e A_Ret([Returning User Agent]):::agent\n        A_Supervisor \u003c--\u003e A_Explorer([Explorer Agent]):::agent\n    end\n\n    SQA --\u003e Tools[[Tools \u0026 APIs]]:::tool\n    ATS --\u003e Tools\n\n    subgraph Integrations [External Integrations]\n        Tools --\u003e|JSON Intents / Action Tape| Engine[Browser Engine]:::external\n        Engine --\u003e|Playwright| PW[Chromium]:::external\n        Tools --\u003e|Optional Docs/Knowledge| MCP[User-configured MCP Servers]:::external\n        Tools --\u003e|Optional Skills| Skills[User-installed Agent Skills]:::external\n        Tools --\u003e|UI under test| WebApp[Your Web Application]:::external\n    end\n\n    style SQA fill:#f0fdf4,stroke:#22c55e,stroke-width:2px,stroke-dasharray: 5 5,color:#166534\n    style ATS fill:#fffbeb,stroke:#f59e0b,stroke-width:2px,stroke-dasharray: 5 5,color:#b45309\n    style Integrations fill:#f8fafc,stroke:#64748b,stroke-width:2px,stroke-dasharray: 5 5,color:#0f172a\n```\n\n### Architecture Details\n\n1. 
**Mission Dispatcher (`main.py`)**: Loads `missions/*.yaml` files and provisions the\n   correct graph network based on `thread_id` naming conventions\n   (`accessibility`, `data_heavy`, `impatient`, `returning`, `explorer`, `chaos`, or\n   `autonomous` route to the advanced graph; everything else to the standard 3-persona\n   swarm). Can also accept a `--pr-url` to auto-generate missions from a GitHub Pull\n   Request via `pr_analyzer.py`.\n2. **Supervisor-Worker Flow**: A Supervisor node dynamically evaluates the workspace state\n   and dispatches control to specialized worker nodes.\n3. **Record-and-Translate Browser Engine** (`src/agentic_explorer/tools/browser/engine.py`):\n   Agents are the *brain*—they never touch the browser directly. Instead they emit strict\n   JSON intents to `execute_browser_command`. The engine:\n   - Validates selectors against a resilience policy (rejects XPath / positional CSS at\n     runtime).\n   - Executes the command with Playwright and captures an Accessibility Tree / DOM\n     snapshot.\n   - Appends every command to an immutable **Action Tape**\n     (`report_\u003cthread_id\u003e/action_tape.jsonl`).\n   - On bug detection, `generate_reproduction_spec` translates the tape into a runnable\n     `reproduction_*.spec.ts` Playwright test.\n4. **Tool Modality**: Agents receive (1) the deterministic browser engine, (2) screenshot\n   capture and reproduction-generation tools, (3) any **MCP servers you configure** in\n   `mcp_servers.json`, and (4) any **Agent Skills** installed under `AGENT_SKILLS_ROOT`.\n   The framework ships zero hardcoded MCP servers or skills — bring your own.\n5. **State \u0026 Memory (`agent_memory.sqlite`)**: An asynchronous SQLite checkpointer\n   remembers agent states (including the `action_tape` field), allowing a reused\n   `thread_id` to resume precisely where it left off. 
A companion **LangGraph Store**\n   (optionally configured with an embedding index for semantic search) provides four\n   levels of cross-session memory, with LLM-driven operations powered by **Langmem**:\n   - **Semantic** — page knowledge, selector reliability, application quirks, plus\n     Langmem-managed agent observations (via `record_observation` tool)\n   - **Episodic** — session summaries, deduplicated bug catalog\n   - **Procedural** — self-improving agent prompts and routing rules optimized via\n     Langmem's `create_prompt_optimizer`\n   - **Prioritization** — risk-scored page ranking injected into supervisor routing\n\n   Agents can query past findings at runtime via the `recall_past_findings` tool,\n   which uses semantic search when an embedding index is configured and falls back to\n   keyword matching otherwise. Agents can also proactively record observations via the\n   `record_observation` tool (powered by Langmem's `create_manage_memory_tool`).\n   The supervisor receives a `MEMORY_CONTEXT` section with known pages, bugs, quirks,\n   agent observations, and high-risk areas on every routing cycle.\n\n### Source Layout\n\n- `src/agentic_explorer/main.py` — CLI entry, swarm graph compiler, transient-error retry\n- `src/agentic_explorer/pr_analyzer.py` — PR-driven test scenario generation (GitHub MCP\n  server preferred, `gh` CLI fallback)\n- `src/agentic_explorer/auth_setup.py` — generic login flow that saves `auth.json`\n- `src/agentic_explorer/config.py` — `config.yaml` loader (with `${ENV}` interpolation)\n- `src/agentic_explorer/utils/llm.py` — `make_llm()` multi-provider factory; supports\n  Claude (API key / Vertex AI) and Gemini (API key / OAuth) with auto-detection\n- `src/agentic_explorer/utils/llm_json.py` — YAML/JSON extraction helpers for LLM responses\n- `src/agentic_explorer/orchestration/graph_base.py` — shared graph infrastructure\n  (`AgentState`, node factories, tool filtering)\n- 
`src/agentic_explorer/orchestration/standard_graph.py` — 3 standard QA personas\n- `src/agentic_explorer/orchestration/advanced_graph.py` — 4 advanced personas plus autonomous explorer\n- `src/agentic_explorer/memory.py` — cross-session memory: semantic (pages, selectors,\n  quirks, Langmem-managed agent observations), episodic (session summaries, bug catalog),\n  procedural (self-improving prompts via Langmem prompt optimizer), semantic-search recall\n  tool, proactive observation tool, regression mission generation, app model export, test\n  prioritization\n- `src/agentic_explorer/tools/browser/engine.py` — Record-and-Translate browser engine\n- `src/agentic_explorer/tools/common/custom_tools.py` — screenshot, MCP loader,\n  Skills tools\n- `src/agentic_explorer/ui/state_emitter.py` — non-blocking state bridge for Visual Mode\n- `src/agentic_explorer/ui/swarm_diagram.py` — Mermaid diagram generator\n- `src/agentic_explorer/ui/dashboard.py` — Streamlit dashboard app\n\n---\n\n## ✨ Key Features\n\n* **Product-Agnostic**: One small `config.yaml` adapts the framework to any web app.\n* **Persona-Driven QA Agents**: Three standard QA personas plus five advanced agents —\n  each prompted around a specific testing strategy.\n* **Record-and-Translate Engine**: Agents emit JSON intents, the deterministic engine\n  executes and records every step to an immutable Action Tape. Every bug automatically\n  generates a reproducible `reproduction_*.spec.ts` Playwright script.\n* **Resilient Selector Policy (Engine-Enforced)**: `execute_browser_command` rejects\n  brittle XPath / positional selectors at runtime, enforcing\n  `data-test-subj` → `aria-label` → visible text priority.\n* **Self-Healing Browser Execution**: Playwright actions are wrapped to catch uncaught\n  exceptions. 
Errors are returned as natural language so agents can adapt strategies.\n* **Screenshot Evidence**: Agents capture full-page screenshots when bugs or anomalies are\n  detected, then generate reproducible Playwright specs from the Action Tape.\n* **Bring-Your-Own MCP**: Plug in any MCP servers via a standard\n  `mcp_servers.json` — agents query them for domain knowledge instead of guessing.\n* **Bring-Your-Own Skills**: Install Agent Skills (per the\n  [agentskills.io](https://agentskills.io/specification) spec) under `AGENT_SKILLS_ROOT`\n  and the framework exposes them automatically.\n* **Cross-Session Learning**: A four-level memory system (semantic, episodic, procedural,\n  prioritization) powered by **Langmem** lets agents learn across sessions. The framework\n  remembers page structures, selector reliability, application quirks, agent observations,\n  past bugs, and which testing strategies worked. Agent prompts and supervisor routing\n  rules self-improve via Langmem's prompt optimizer after each batch. Agents can\n  proactively record observations and recall past findings using semantic search.\n* **Regression Testing**: Run `--regression` to auto-generate missions from the bug\n  catalog — no YAML needed. The framework targets pages with known open bugs and\n  historically flaky areas.\n* **Application Model Export**: Run `--export-model` to export the discovered application\n  structure (pages, selectors with reliability scores, bugs, quirks, session stats) as\n  `app_model.json`.\n* **PR-Driven Test Generation**: Pass a GitHub PR URL (`--pr-url`) and the framework\n  extracts the diff (preferring the GitHub MCP server, falling back to `gh` CLI), sends\n  it to an LLM, and auto-generates targeted mission YAML covering the UI areas impacted\n  by the code changes. When historical bug data exists, it's injected into the LLM\n  prompt for better-targeted missions. 
Optionally execute the generated missions\n  immediately with `--execute`.\n* **Automated Artifact Generation**: Every test produces an isolated folder containing\n  raw execution traces, the Action Tape, bug screenshots, reproducible `.spec.ts` files,\n  and an executive Markdown report.\n\n---\n\n## 🛠️ Setup\n\n### 1. Dependencies\n\nPython 3.11+ is required. A virtual environment is highly recommended.\n\n```bash\n# Create and activate a virtual environment (plain venv or uv)\npython -m venv .venv\nsource .venv/bin/activate\n\n# Install the package and all dependencies (editable mode)\npip install -e .\n\n# Or, if you use uv (recommended — much faster):\nuv venv\nuv pip install -e .\n\n# Optional: Install Visual Mode (Streamlit dashboard)\npip install -e \".[visual]\"\n# Or with uv:\nuv pip install -e \".[visual]\"\n\n# Install the Playwright Chromium browser\nplaywright install chromium\n```\n\n\u003e **Keeping dependencies up to date:** After pulling new changes, always re-sync your\n\u003e virtual environment to pick up any added or updated packages:\n\u003e\n\u003e ```bash\n\u003e # pip\n\u003e pip install -e .\n\u003e\n\u003e # uv\n\u003e uv pip install -e .\n\u003e ```\n\n### 2. Environment Variables\n\nCopy `.env.example` → `.env` and fill in your values. The framework supports two LLM\nproviders — **Claude** (default) and **Gemini** — and auto-detects which to use from\navailable credentials.\n\n```env\n# --- LLM Provider (optional — auto-detected from credentials if not set) ---\n# LLM_PROVIDER=\"claude\"         # or: gemini\n\n# --- Claude authentication (default provider — choose one) ---\n\n# Option A: Direct API key\nANTHROPIC_API_KEY=\"your_anthropic_api_key_here\"\n\n# Option B: Vertex AI (zero config if you already use Claude Code)\n# The framework reads ~/.claude/settings.json automatically. 
If it contains\n# CLAUDE_CODE_USE_VERTEX=1 and ANTHROPIC_VERTEX_PROJECT_ID, Claude on Vertex\n# AI is used with no additional setup.\n\n# --- Gemini authentication (alternative provider — choose one) ---\n\n# Option A: API key\n# GOOGLE_API_KEY=\"your_gemini_api_key_here\"\n\n# Option B: OAuth credentials (no env var needed)\n# If GOOGLE_API_KEY is not set, the framework loads ~/.gemini/oauth_creds.json\n# produced by: gemini auth login\n\n# --- Application under test ---\nAPP_URL=\"https://your-app.example.com\"\nAPP_USERNAME=\"your_user\"\nAPP_PASSWORD=\"your_password\"\n\nAPP_CONFIG=\"./config.yaml\"\nMCP_SERVERS_CONFIG=\"./mcp_servers.json\"\n\nAGENT_SKILLS_ROOT=\"./agent-skills\"\nAGENT_SKILL_SCRIPT_TIMEOUT=\"60\"\n```\n\n**Provider auto-detection order** (when `LLM_PROVIDER` is not set):\n\n| Priority | Credential Source | Provider |\n|----------|-------------------|----------|\n| 1 | `ANTHROPIC_API_KEY` env var | Claude (direct API) |\n| 2 | `~/.claude/settings.json` with `CLAUDE_CODE_USE_VERTEX=1` | Claude (Vertex AI) |\n| 3 | `GOOGLE_API_KEY` env var | Gemini (API key) |\n| 4 | `~/.gemini/oauth_creds.json` | Gemini (OAuth) |\n\n**Smart model defaults** — the framework picks the best model for your auth method:\n\n| Auth Method | Default Model | Rationale |\n|-------------|---------------|-----------|\n| Claude API key | `claude-haiku-4-5` | Fast, economical |\n| Claude Vertex AI | `claude-haiku-4-5` | Fast, economical |\n| Gemini API key | `gemini-3.1-flash-lite` | Fast, economical |\n| Gemini OAuth | `gemini-3.1-flash-lite` | Fast, economical |\n\nOverride models via env vars (`CLAUDE_MODEL`, `GEMINI_MODEL`) or in `config.yaml` (see below).\n\n### 3. 
App Configuration\n\nCopy `config.yaml.example` → `config.yaml` and customize for your application:\n\n```yaml\napp:\n  name: \"My Web Application\"\n  url: ${APP_URL}\n  description: \"Brief description used to give agent prompts domain context.\"\n\nauth:\n  method: form\n  selectors:\n    username: 'input[name=\"username\"]'\n    password: 'input[name=\"password\"]'\n    submit:   'button[type=\"submit\"]'\n  post_login_check: 'a[href=\"/home\"]'   # selector that confirms login worked\n\npaths:\n  mcp_servers: ./mcp_servers.json\n  skills_root: ./agent-skills\n\n# LLM provider (optional — auto-detected from credentials by default)\nllm:\n  # provider: claude              # or: gemini\n  # claude_model: claude-sonnet-4-6\n  # claude_vision_model: claude-haiku-4-5\n  # gemini_model: gemini-3.1-flash-lite\n  # gemini_vision_model: gemini-3.1-flash-lite\n\n  # Embedding model for semantic search in long-term memory (optional).\n  # When configured, recall_past_findings uses vector similarity instead of\n  # keyword matching.  Gemini users can use their existing API key; Claude\n  # users can run a local model via Ollama.\n  # embedding_model: google-genai:models/embedding-001   # Gemini (768d)\n  # embedding_dims: 768\n  # embedding_model: ollama:nomic-embed-text             # Ollama local (768d)\n  # embedding_dims: 768\n```\n\n### 4. (Optional) MCP Servers\n\nCopy `mcp_servers.json.example` → `mcp_servers.json` and list any MCP servers you want\nthe agents to consult. Format follows the standard Claude Desktop / Code shape:\n\n```json\n{\n  \"mcpServers\": {\n    \"github\": {\n      \"transport\": \"http\",\n      \"url\": \"https://api.githubcopilot.com/mcp/\"\n    },\n    \"my-docs\": {\n      \"transport\": \"http\",\n      \"url\": \"https://my-docs.example.com/_mcp/\"\n    }\n  }\n}\n```\n\nThe **`github` entry** is used by the PR analyzer (`--pr-url`) to fetch PR data via MCP\ntools (`get_pull_request`, `get_pull_request_diff`, `get_pull_request_files`). 
If not\nconfigured, the analyzer falls back to the `gh` CLI.\n\nIf the file is missing or empty, agents simply run without MCP tools.\n\n### 5. (Optional) Agent Skills\n\nInstall any Skills (per [agentskills.io](https://agentskills.io/specification)) under\nthe directory pointed at by `AGENT_SKILLS_ROOT` (default `./agent-skills/`). The framework\ndiscovers them automatically and exposes `fetch_agent_skill` and `run_agent_skill_script`\nto agents. If the directory is missing the framework just logs an info message.\n\n### 6. Authenticate\n\nGenerate a reusable `auth.json` cookie file so subsequent runs can skip the login screen:\n\n```bash\nagent-auth\n```\n\nThe auth flow uses the selectors defined in `config.yaml \u003e auth`. Adjust them to match\nyour app's login form.\n\n---\n\n## 📊 Visual Mode (Real-Time Dashboard)\n\nThe framework includes an optional **Visual Mode** — a Streamlit-based real-time dashboard that displays live browser screenshots, swarm state diagrams, thought streams, and action tapes while missions execute.\n\n### Architecture: The Spectator Pattern\n\nVisual Mode uses a **one-way \"spectator\" architecture** with zero performance overhead:\n\n```\nMain Process (LangGraph)          Streamlit Dashboard\n┌─────────────────────┐          ┌──────────────────┐\n│ Supervisor → Agent  │──JSON──▶ │ Polls every ~1s: │\n│ Playwright engine   │──JPEG──▶ │  .agent_state.json│\n│ (fire-and-forget)   │          │  .latest_vision.jpg│\n└─────────────────────┘          └──────────────────┘\n```\n\nThe main process writes state atomically to `.agent_state.json` and async screenshots to `.latest_vision.jpg`. The dashboard polls these files. 
No IPC, no callbacks, no blocking.\n\n### Installation\n\nVisual Mode requires Streamlit as an optional dependency:\n\n```bash\n# Install with visual mode support\npip install -e \".[visual]\"\n\n# Or with uv:\nuv pip install -e \".[visual]\"\n```\n\n### Usage\n\nAdd the `--visual` flag to any mission:\n\n```bash\n# Standard mission with visual mode\nagent-explorer --missions missions/new_user_agent.yaml --visual\n\n# Visual mode with headed browser (recommended for debugging)\nagent-explorer --missions missions/explorer_agent.yaml --headed --visual\n\n# PR-driven testing with visual mode\nagent-explorer --pr-url https://github.com/org/repo/pull/123 --execute --visual\n\n# Regression testing with visual mode\nagent-explorer --regression --headed --visual\n```\n\nThe Streamlit dashboard will automatically open in your default browser at `http://localhost:8501`. If it doesn't open automatically, navigate to that URL manually.\n\n### Dashboard Features\n\nThe dashboard provides four key views:\n\n- **Sidebar**: Mission ID, graph type (standard/advanced), LLM provider, and live metrics (steps, bugs, explored paths)\n- **Live Browser Vision**: Real-time JPEG screenshots of the Playwright viewport, updated after each browser command\n- **Swarm State Diagram**: Interactive Mermaid diagram showing the Supervisor-Worker topology with the currently active node highlighted in green\n- **Tabbed Activity Views**:\n  - **Thought Stream**: Latest LLM reasoning from the active agent\n  - **Action Tape**: Recent browser commands with execution time and status\n  - **Bugs**: Discovered bugs with detailed descriptions and bug count\n  - **Paths**: URLs visited during the mission\n\n### Performance Impact\n\n**Zero** when `--visual` is not used — all emission code short-circuits on a single boolean check. 
When enabled:\n- State writes: ~1ms per update (2KB JSON + atomic `os.replace`)\n- Screenshots: Fire-and-forget async tasks (JPEG quality 50, ~30-80KB)\n- Main process never waits for the dashboard\n\n---\n\n## 🚀 Usage\n\n### Defining Missions\n\nMissions live in `missions/*.yaml`. See [`missions/README.md`](missions/README.md) for the\nschema and writing guide. Eight templates ship in the repo, one for each supported agent:\n\n- [`missions/new_user_agent.yaml`](missions/new_user_agent.yaml)\n- [`missions/power_user_agent.yaml`](missions/power_user_agent.yaml)\n- [`missions/adversarial_user_agent.yaml`](missions/adversarial_user_agent.yaml)\n- [`missions/accessibility_user_agent.yaml`](missions/accessibility_user_agent.yaml)\n- [`missions/data_heavy_user_agent.yaml`](missions/data_heavy_user_agent.yaml)\n- [`missions/impatient_user_agent.yaml`](missions/impatient_user_agent.yaml)\n- [`missions/returning_user_agent.yaml`](missions/returning_user_agent.yaml)\n- [`missions/explorer_agent.yaml`](missions/explorer_agent.yaml)\n\nAll of them contain placeholders (`\u003cYOUR_APP\u003e`, `\u003cAPP_URL\u003e`, `\u003cexample_search_term\u003e`, …) — fill\nthem in for your application before running.\n\n### Running Missions from YAML\n\n```bash\n# Standard 3-persona QA swarm (uses auto-detected provider — Claude by default)\nagent-explorer --missions missions/new_user_agent.yaml\n\n# Explicitly choose a provider\nagent-explorer --missions missions/power_user_agent.yaml --provider claude\nagent-explorer --missions missions/power_user_agent.yaml --provider gemini\n\n# Advanced persona mission\nagent-explorer --missions missions/accessibility_user_agent.yaml --headed\n\n# Autonomous exploration (visible browser recommended)\nagent-explorer --missions missions/explorer_agent.yaml --headed\n\n# Clear all memory (checkpoints + learned knowledge) to restart fresh\nagent-explorer --missions missions/new_user_agent.yaml --clear-all\n\n# Clear only checkpoints (preserves learned 
memory: pages, bugs, procedures)\nagent-explorer --missions missions/new_user_agent.yaml --clear-checkpoints\n\n# Clear only learned memory (preserves checkpoints for resume)\nagent-explorer --missions missions/new_user_agent.yaml --clear-learned\n\n# Override the supervisor step limit (default: 30)\nagent-explorer --missions missions/new_user_agent.yaml --max-steps 50\n\n# Suppress verbose ReAct console output (traces.log still captures everything)\nagent-explorer --missions missions/new_user_agent.yaml --quiet\n```\n\n### Regression Testing \u0026 Model Export\n\n```bash\n# Auto-generate and run missions targeting known bugs (no --missions needed)\nagent-explorer --regression --headed\n\n# Combine regression with manual missions\nagent-explorer --missions missions/new_user_agent.yaml --regression\n\n# Export discovered app structure as JSON\nagent-explorer --export-model\n```\n\n### PR-Driven Test Generation\n\nGenerate targeted test scenarios from a GitHub Pull Request.\n\nThe analyzer **prefers the GitHub MCP server** when a `\"github\"` entry exists in\n`mcp_servers.json` (see setup above). If the MCP server is not configured or unreachable,\nit **falls back to the [`gh` CLI](https://cli.github.com/)** (must be installed and\nauthenticated via `gh auth login`).\n\n```bash\n# Generate missions only (writes missions/pr_123.yaml)\nagent-explorer --pr-url https://github.com/org/repo/pull/123\n\n# Generate and execute immediately\nagent-explorer --pr-url https://github.com/org/repo/pull/123 --execute --headed\n\n# Write generated missions to a custom directory\nagent-explorer --pr-url https://github.com/org/repo/pull/123 --output-dir ./pr-missions\n\n# Combine with existing missions\nagent-explorer --missions missions/new_user_agent.yaml --pr-url https://github.com/org/repo/pull/123 --execute\n```\n\nThe analyzer extracts the PR title, description, file list, and full code diff, then sends\nthem along with the app context from `config.yaml` to an LLM. 
The LLM maps the changes to\nthe standard and advanced personas and generates 3-8 targeted missions with\nspecific, actionable prompts. Generated mission files follow the same YAML format as\nhand-written ones and can be re-run later with `--missions`.\n\n---\n\n## 📊 Test Artifacts\n\nFor every mission, the framework generates a `report_\u003cthread_id\u003e/` directory containing:\n\n1. **`traces.log`** — Full audit trail of every thought, plan, and tool invocation.\n2. **`test_report.md`** — Concise executive summary generated by the LLM (objective,\n   actions, bugs, Action Tape stats, PASS/FAIL).\n3. **`action_tape.jsonl`** — Line-delimited JSON log of every deterministic browser\n   command. The source for reproduction scripts.\n4. **`reproduction_*.spec.ts`** — Auto-generated Playwright TypeScript tests, one per bug\n   detected. Run with:\n   ```bash\n   npx playwright test report_\u003cthread_id\u003e/reproduction_*.spec.ts --headed\n   ```\n5. **`screenshots/`** — Image evidence captured on every detected bug.\n\n---\n\n## 🤖 Guide for Autonomous Agents\n\nIf you are an AI coding assistant contributing to this repository, see [`AGENTS.md`](AGENTS.md)\nfor the conventions covering agent registration, selector policy, and tool behavior.\n\n---\n\n## 📄 License\n\nThis project is licensed under the MIT License. See [`LICENSE`](LICENSE) for details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrbarrios%2Fagentic-test-explorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrbarrios%2Fagentic-test-explorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrbarrios%2Fagentic-test-explorer/lists"}