{"id":45079261,"url":"https://github.com/paradigmxyz/evmbench","last_synced_at":"2026-02-24T19:01:35.609Z","repository":{"id":339254553,"uuid":"1161070751","full_name":"paradigmxyz/evmbench","owner":"paradigmxyz","description":"A benchmark and harness for finding and exploiting smart contract bugs","archived":false,"fork":false,"pushed_at":"2026-02-20T18:06:55.000Z","size":1142,"stargazers_count":269,"open_issues_count":3,"forks_count":35,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-02-21T20:37:16.157Z","etag":null,"topics":["agents","ai","audit","blockchain","blockchain-technology","eth","ethereum","evm","security","solidity","testing","ui"],"latest_commit_sha":null,"homepage":"https://paradigm.xyz/evmbench","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paradigmxyz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-18T17:44:28.000Z","updated_at":"2026-02-21T20:34:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/paradigmxyz/evmbench","commit_stats":null,"previous_names":["paradigmxyz/evmbench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/paradigmxyz/evmbench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fevmbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fevmbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fevmbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fevmbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paradigmxyz","download_url":"https://codeload.github.com/paradigmxyz/evmbench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paradigmxyz%2Fevmbench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29749933,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T07:44:07.782Z","status":"ssl_error","status_checked_at":"2026-02-23T07:44:07.432Z","response_time":90,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","audit","blockchain","blockchain-technology","eth","ethereum","evm","security","solidity","testing","ui"],"created_at":"2026-02-19T14:12:15.369Z","updated_at":"2026-02-24T19:01:35.568Z","avatar_url":"https://github.com/paradigmxyz.png","language":"TypeScript","readme":"\u003cp align=\"center\"\u003e\n    \u003cpicture align=\"center\"\u003e\n        \u003cimg alt=\"evmbench cover\" src=\"assets/cover-dark.png\"\u003e\n    \u003c/picture\u003e\n\u003c/p\u003e\n\n**evmbench is a benchmark and agent harness for finding and exploiting smart contract bugs.**\n\n\u003ca href=\"#how-it-works\"\u003e\u003cb\u003e\u003cu\u003eHow it works\u003c/u\u003e\u003c/b\u003e\u003c/a\u003e | \u003ca href=\"#security\"\u003e\u003cb\u003e\u003cu\u003eSecurity\u003c/u\u003e\u003c/b\u003e\u003c/a\u003e | \u003ca href=\"#key-services\"\u003e\u003cb\u003e\u003cu\u003eKey services\u003c/u\u003e\u003c/b\u003e\u003c/a\u003e | \u003ca href=\"#repo-layout\"\u003e\u003cb\u003e\u003cu\u003eRepo layout\u003c/u\u003e\u003c/b\u003e\u003c/a\u003e | \u003ca href=\"#quickstart-local-dev\"\u003e\u003cb\u003e\u003cu\u003eQuickstart (local dev)\u003c/u\u003e\u003c/b\u003e\u003c/a\u003e\n\nThis repository contains a companion interface to the `evmbench` detect evaluation ([code](https://github.com/openai/frontier-evals)).\n\nUpload contract source code, select an agent, and receive a structured vulnerability report rendered in the UI.\n\n\n## How it works\n\n### Architecture\n\n```\nFrontend (Next.js)\n    │\n    ├─ POST /v1/jobs/start ───► Backend API (FastAPI, port 1337)\n    │                               ├─► PostgreSQL (job state)\n    ├─ GET  /v1/jobs/{id}           ├─► Secrets Service (port 8081)\n    │                               └─► RabbitMQ (job queue)\n    └─ GET  /v1/jobs/history                │\n                                             ▼\n                                        Instancer (consumer)\n                                              │\n                                    ┌─────────┴──────────┐\n                                    ▼                    ▼\n                              Docker backend       K8s backend (optional)\n                                    │                    │\n                                    └────────┬───────────┘\n                                             ▼\n                                      Worker container\n                                        ├─► Secrets Service (fetch bundle)\n                                        ├─► (optional) OAI Proxy (port 8084) ──► OpenAI API\n                                        └─► Results Service (port 8083)\n```\n\n### End-to-end flow\n\n1. User uploads a zip of contract files via the frontend. The UI sends the archive, selected model key, and (optionally) an OpenAI API key to `/v1/jobs/start`.\n2. The backend creates a job record in Postgres, stores a secret bundle in the Secrets Service, and publishes a message to RabbitMQ.\n3. The Instancer consumes the job and starts a worker (Docker locally; Kubernetes backend is optional).\n4. The worker fetches its bundle from the Secrets Service, unpacks the uploaded zip to `audit/`, then runs Codex in \"detect-only\" mode:\n   - prompt: `backend/worker_runner/detect.md` (copied to `$HOME/AGENTS.md` inside the container)\n   - model map: `backend/worker_runner/model_map.json` (maps UI model keys to Codex model IDs)\n   - command wrapper: `backend/worker_runner/run_codex_detect.sh`\n5. The agent writes `submission/audit.md`. The worker validates that the output contains parseable JSON with `{\"vulnerabilities\": [...]}` and then uploads it to the Results Service.\n6. The frontend polls job status and renders the report with file navigation and annotations.\n\n## Security\n\n`evmbench` runs an LLM-driven agent against uploaded, untrusted code. Treat the worker runtime (filesystem, logs, outputs) as an untrusted environment.\n\nSee `SECURITY.md` for the full trust model and operational guidance.\n\nOpenAI credential handling:\n\n- **Direct BYOK (default)**: worker receives a plaintext OpenAI key (`OPENAI_API_KEY` / `CODEX_API_KEY`).\n- **Proxy-token mode (optional)**: worker receives an opaque token and routes requests through `oai_proxy` (plaintext key stays outside the worker).\n\nEnabling proxy-token mode:\n\n```bash\ncd backend\ncp .env.example .env\n# set BACKEND_OAI_KEY_MODE=proxy and OAI_PROXY_AES_KEY=...\ndocker compose --profile proxy up -d --build\n```\n\nOperational note: worker runtime is bounded by default; override the max audit runtime with `EVM_BENCH_CODEX_TIMEOUT_SECONDS` (default: 10800 seconds).\n\n## Key services\n\n| Service | Default port | Role |\n|---|---:|---|\n| `backend` | 1337 | Main API: job submission, status, history, auth |\n| `secretsvc` | 8081 | Stores and serves per-job secret bundles (zip + key material) |\n| `resultsvc` | 8083 | Receives worker results, validates/parses, persists to DB |\n| `oai_proxy` | 8084 | Optional OpenAI proxy for proxy-token mode |\n| `instancer` | (n/a) | RabbitMQ consumer that starts worker containers/pods |\n| `worker` | (n/a) | Executes the detect-only agent and uploads results |\n| Postgres | 5432 | Job state persistence |\n| RabbitMQ | 5672 | Job queue |\n\n## Repo layout\n\n```\n.\n├── README.md\n├── SECURITY.md\n├── LICENSE\n├── frontend/                 Next.js UI (upload zip, select model, view results)\n├── backend/\n│   ├── api/                  Main FastAPI API (jobs, auth, integration)\n│   ├── instancer/            RabbitMQ consumer; starts workers (Docker/K8s)\n│   ├── secretsvc/            Bundle storage service\n│   ├── resultsvc/            Results ingestion + persistence\n│   ├── oai_proxy/            Optional OpenAI proxy (proxy-token mode)\n│   ├── prunner/              Optional cleanup of stale workers\n│   ├── worker_runner/        Detect prompt + model map + Codex runner script\n│   ├── docker/\n│   │   ├── base/             Base image: codex, foundry, slither, node, tools\n│   │   ├── backend/          Backend services image\n│   │   └── worker/           Worker image + entrypoint\n│   └── compose.yml           Full stack (DB/MQ + services)\n└── deploy/                   Optional deployment scripts/examples\n```\n\n## Quickstart (local dev)\n\nEnsure Docker and Bun are available.\n\nBuild the base and worker images first (required before starting the stack):\n\n```bash\ncd backend\ndocker build -t evmbench/base:latest -f docker/base/Dockerfile .\ndocker build -t evmbench/worker:latest -f docker/worker/Dockerfile .\n```\n\nStart backend stack (API + dependencies):\n\n```bash\ncp .env.example .env\n# For local dev, the placeholder secrets in .env.example are sufficient.\n# For internet-exposed deployments, replace them with strong values.\ndocker compose up -d --build\n```\n\nStart frontend dev server:\n\n```bash\ncd frontend\nbun install\nbun dev\n```\n\nOpen:\n\n- `http://127.0.0.1:3000` (frontend)\n- `http://127.0.0.1:1337/v1/integration/frontend` (backend config endpoint)\n\n## Acknowledgments\nThank you to many folks on the OtterSec team for support, particularly with building the frontend: es3n1n, jktrn, TrixterTheTux, sahuang\n\n[![Apache-2.0 License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](/LICENSE)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparadigmxyz%2Fevmbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fparadigmxyz%2Fevmbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparadigmxyz%2Fevmbench/lists"}