{"id":51070871,"url":"https://github.com/vimalyad/accura","last_synced_at":"2026-06-23T10:32:56.031Z","repository":{"id":366645851,"uuid":"1277207460","full_name":"vimalyad/accura","owner":"vimalyad","description":"An accuracy-first browser agent. TypeScript, Playwright, model-agnostic — develop on free models, run on Claude.","archived":false,"fork":false,"pushed_at":"2026-06-22T18:18:22.000Z","size":242,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-22T19:22:08.108Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vimalyad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-22T17:22:38.000Z","updated_at":"2026-06-22T18:18:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/vimalyad/accura","commit_stats":null,"previous_names":["vimalyad/accura"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/vimalyad/accura","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Faccura","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Faccura/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Faccura/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Faccura/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vimalyad","download_url":"https://codeload.github.com/vimalyad/accura/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vimalyad%2Faccura/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34686725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-23T10:32:55.399Z","updated_at":"2026-06-23T10:32:56.026Z","avatar_url":"https://github.com/vimalyad.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/accura-icon.svg\" width=\"96\" height=\"96\" alt=\"Accura logo\"\u003e\n\u003c/p\u003e\n\n# Accura\n\nAn accuracy-first browser agent. TypeScript, Playwright — runs on Claude\n(Anthropic) via external API calls.\n\nAccura optimizes one metric: **task success rate**. Latency is explicitly not\na constraint, so the architecture spends time wherever it buys correctness:\nit re-observes after every action, verifies every step, samples multiple\ncandidates at uncertain decisions, simulates irreversible actions before\nrunning them, and refuses to declare success it cannot prove.\n\n## Quickstart\n\n```sh\npnpm install\npnpm --filter @accura/browser exec playwright install chromium\npnpm build\n\n# run a task on Claude (needs ANTHROPIC_API_KEY)\nnode apps/cli/dist/main.js \"Find the price of the Super Widget\" --url https://example.com --profile final\n\n# run the eval suite\nnode apps/cli/dist/main.js eval packages/evals/suites/fixtures.json --profile final --seeds 3\n```\n\nModel keys come from your shell, or a local `.env` loaded with\n`node --env-file=.env …`. The shipped profile is `configs/final.json` — **Claude\nvia external API calls, no local model hosting**: a Sonnet 4.6 executor (adaptive\nthinking), an Opus 4.8 planner and judge, and a Sonnet 4.6 extractor. Needs\n`ANTHROPIC_API_KEY`.\n\n\u003e The full self-hosted platform and local-model profiles live on the\n\u003e [`self-hosted`](https://github.com/vimalyad/accura/tree/self-hosted) branch.\n\n## Architecture\n\nDesign rationale and the research behind every decision:\n[ARCHITECTURE.md](./ARCHITECTURE.md).\n\n### System overview\n\n```mermaid\nflowchart TD\n    CLI[\"apps/cli\u003cbr/\u003eaccura run · accura eval\"]\n\n    subgraph orchestration[\"Orchestration\"]\n        AGENT[\"@accura/agent\u003cbr/\u003eloop · planner · arbiter ·\u003cbr/\u003esimulation gate · recovery · traces\"]\n        EVALS[\"@accura/evals\u003cbr/\u003efixture server · multi-seed runner ·\u003cbr/\u003ebootstrap CIs · judge agreement\"]\n    end\n\n    subgraph capabilities[\"Capabilities\"]\n        PERCEPTION[\"@accura/perception\u003cbr/\u003eDOM walker · stable element ids ·\u003cbr/\u003enew-element diff · observer\"]\n        ACTIONS[\"@accura/actions\u003cbr/\u003ezod registry · 17 core actions ·\u003cbr/\u003ebatching with stale-DOM guards\"]\n        VERIFY[\"@accura/verify\u003cbr/\u003estate diff · grounding check ·\u003cbr/\u003etrajectory judge\"]\n        MEMORY[\"@accura/memory\u003cbr/\u003eskill store · induction ·\u003cbr/\u003edeterministic replay\"]\n        LLM[\"@accura/llm\u003cbr/\u003eanthropic + openai-compatible ·\u003cbr/\u003estructured output · model router\"]\n    end\n\n    subgraph foundation[\"Foundation\"]\n        BROWSER[\"@accura/browser\u003cbr/\u003eplaywright session · stability gate ·\u003cbr/\u003escreenshots · watchdogs · CDP hatch\"]\n        SHARED[\"@accura/shared\u003cbr/\u003eResult · errors · logger · profiles\"]\n    end\n\n    CLI --\u003e AGENT\n    CLI --\u003e EVALS\n    EVALS --\u003e AGENT\n    AGENT --\u003e PERCEPTION\n    AGENT --\u003e ACTIONS\n    AGENT --\u003e VERIFY\n    AGENT --\u003e MEMORY\n    AGENT --\u003e LLM\n    PERCEPTION --\u003e BROWSER\n    ACTIONS --\u003e BROWSER\n    ACTIONS --\u003e PERCEPTION\n    VERIFY --\u003e LLM\n    MEMORY --\u003e ACTIONS\n    BROWSER --\u003e SHARED\n    LLM --\u003e SHARED\n```\n\n### One agent step, end to end\n\n```mermaid\nflowchart TD\n    START([task]) --\u003e SETUP[\"judge derives key points\u003cbr/\u003eplanner creates checklist\u003cbr/\u003ememory: matching skills injected,\u003cbr/\u003ebest skill replayed deterministically\"]\n    SETUP --\u003e GATE\n\n    subgraph step[\"every step\"]\n        GATE[\"stability gate:\u003cbr/\u003edomcontentloaded → network quiet →\u003cbr/\u003etwo zero-mutation windows\"]\n        GATE --\u003e OBSERVE[\"perceive: enumerated elements\u003cbr/\u003e[id]\u0026lt;tag\u0026gt; with *new-element marks,\u003cbr/\u003epage text, scroll hints, warnings\"]\n        OBSERVE --\u003e NOTES[\"verifier notes: what changed ·\u003cbr/\u003econtradiction check ·\u003cbr/\u003eFORBIDDEN / STUCK advice\"]\n        NOTES --\u003e EXEC[\"executor → structured output\u003cbr/\u003e{eval, memory, goal, actions 1..3}\"]\n        EXEC --\u003e FLAGGED{flagged\u003cbr/\u003edecision?}\n        FLAGGED -- \"uncertain / contradiction\" --\u003e BON[\"sample 3 candidates →\u003cbr/\u003ededup → arbiter picks\"]\n        FLAGGED -- no --\u003e IRREV\n        BON --\u003e IRREV{irreversible\u003cbr/\u003eaction?}\n        IRREV -- yes --\u003e SIM[\"simulate outcome\"]\n        SIM -- mismatch --\u003e BLOCK[\"block action +\u003cbr/\u003eforce replan\"]\n        SIM -- ok --\u003e RUN\n        IRREV -- no --\u003e RUN[\"execute batch:\u003cbr/\u003eids → live elements ·\u003cbr/\u003estale-DOM guards ·\u003cbr/\u003erecovery hard-blocks repeats\"]\n        BLOCK --\u003e RECORD[\"record step + trace\"]\n        RUN --\u003e RECORD\n    end\n\n    RECORD --\u003e DONE{done\u003cbr/\u003edeclared?}\n    DONE -- no --\u003e GATE\n    DONE -- \"success=false\" --\u003e HONEST([honest failure returned])\n    DONE -- \"success=true\" --\u003e GROUND{grounding:\u003cbr/\u003eevery claimed value\u003cbr/\u003eexists in observations?}\n    GROUND -- no --\u003e REJECT[\"rejection reason injected\u003cbr/\u003e(max 2, then honest failure)\"]\n    REJECT --\u003e GATE\n    GROUND -- yes --\u003e JUDGE{trajectory judge:\u003cbr/\u003eall key points\u003cbr/\u003edemonstrably met?}\n    JUDGE -- no --\u003e REJECT\n    JUDGE -- yes --\u003e WIN([success])\n    WIN --\u003e INDUCE[\"skill induced → memory →\u003cbr/\u003enext run replays it\"]\n```\n\n### Model roles per profile\n\n```mermaid\nflowchart LR\n    subgraph roles[\"Agent roles\"]\n        E[executor]\n        P[planner]\n        J[judge / arbiter]\n        X[extractor]\n        S[skill-inductor]\n    end\n\n    subgraph final[\"configs/final.json — Claude\"]\n        SONNET[\"Sonnet 4.6\u003cbr/\u003eadaptive thinking · effort high ·\u003cbr/\u003eclickAt enabled\"]\n        OPUS[\"Opus 4.8\"]\n    end\n\n    E ==\u003e SONNET\n    X ==\u003e SONNET\n    S ==\u003e SONNET\n    P ==\u003e OPUS\n    J ==\u003e OPUS\n```\n\nCapability flags degrade gracefully: a non-vision executor gets DOM-only\nobservations; only coordinate-grounded models (Claude) get the `clickAt`\nfallback action.\n\n### The five accuracy mechanisms\n\n1. **Clean enumerated action space** (`perception`) — the model picks from\n   stable indexed element ids and never invents selectors. The single\n   highest-leverage change in the published evidence (AgentOccam, +26.6 pts).\n2. **Verification everywhere** (`verify`) — a deterministic state diff after\n   every step, a \"your actions succeeded but nothing changed\" contradiction\n   check, and a two-layer `done` gate: code-level grounding of claimed values,\n   then a skeptical key-point judge. Attacks the #1 measured failure mode:\n   confident false success.\n3. **Hard recovery rules** (`agent`) — an identical action that failed twice\n   is blocked in code, not just prompted away; stuck-detection forces a\n   strategy change.\n4. **Test-time spending** (`agent`) — best-of-3 with an arbiter at flagged\n   decisions only; outcome simulation before irreversible actions. Latency is\n   the currency, accuracy the purchase.\n5. **Compounding memory** (`memory`) — verified successes are distilled into\n   text-grounded recipes; later runs replay them deterministically and fall\n   back to the live executor at the first mismatch (AWM/SkillWeaver, +31–51%\n   relative).\n\nEverything is measured by `evals` (multi-seed runs, bootstrap 95% CIs,\njudge-agreement tracking) — no accuracy claim without numbers.\n\n## Packages\n\n| Package | What it does |\n|---|---|\n| `@accura/shared` | Result type, errors, logging, zod-validated model profiles |\n| `@accura/llm` | Provider-agnostic ChatModel (Anthropic SDK + any OpenAI-compatible endpoint), structured output with repair reprompts, role-based model router |\n| `@accura/browser` | Playwright session: stability gate, exact-dimension screenshots, popup/dialog/download/crash watchdogs, CDP escape hatch |\n| `@accura/perception` | In-page walker → enumerated interactive elements with stable ids, new-element diffing, id→element resolution |\n| `@accura/actions` | Zod-validated action registry, 17 core actions (incl. `doubleClick`), multi-action batching with stale-DOM guards |\n| `@accura/verify` | State-diff step verifier, deterministic data-grounding check, skeptical trajectory judge |\n| `@accura/agent` | The loop: planner, best-of-N arbiter, simulation gate, recovery policy, done gating, JSONL traces |\n| `@accura/memory` | Cross-run skills: induction from verified successes, deterministic replay with live fallback, scoring/retirement |\n| `@accura/evals` | Task suites, multi-seed runner, bootstrap CIs, judge-agreement harness, failure clustering |\n| `apps/cli` | `accura \"\u003ctask\u003e\"` and `accura eval \u003csuite\u003e` |\n\n## Status\n\nThe agent and CLI are implemented and tested — unit tests plus\nbrowser-integration tests against real Chromium and full-pipeline end-to-end\nruns. Verified end-to-end on the `final` (Claude) profile.\n\n## Development\n\n```sh\npnpm build      # turbo build across the workspace\npnpm test       # unit + browser integration tests\npnpm lint       # eslint\npnpm typecheck  # tsc --noEmit\n```\n\nOne branch per phase, merged to `main` after its exit criteria pass; see git\nhistory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvimalyad%2Faccura","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvimalyad%2Faccura","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvimalyad%2Faccura/lists"}