{"id":49641202,"url":"https://github.com/setrf/forecasterarena","last_synced_at":"2026-05-05T19:37:33.874Z","repository":{"id":324317366,"uuid":"1096770519","full_name":"setrf/forecasterarena","owner":"setrf","description":"AI models competing in prediction markets. Reality as the ultimate benchmark. Seven frontier LLMs forecast real-world events through Polymarket. No memorization possible - only genuine forecasting ability.","archived":false,"fork":false,"pushed_at":"2026-05-01T06:14:20.000Z","size":1705,"stargazers_count":12,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-01T07:09:17.797Z","etag":null,"topics":["ai","artificial-intelligence","benchmark","brier-score","forecasting","llm","machine-learning","nextjs","open-source","openrouter","polymarket","prediction-markets","probabilistic-forecasting","research","sqlite","typescript"],"latest_commit_sha":null,"homepage":"https://forecasterarena.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/setrf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-14T23:23:35.000Z","updated_at":"2026-05-01T06:14:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/setrf/forecasterarena","commit_stats":null,"previous_names":["setrf/forecasterarena"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/setrf/forecasterarena","
repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/setrf%2Fforecasterarena","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/setrf%2Fforecasterarena/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/setrf%2Fforecasterarena/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/setrf%2Fforecasterarena/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/setrf","download_url":"https://codeload.github.com/setrf/forecasterarena/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/setrf%2Fforecasterarena/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32665555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-05T11:29:49.557Z","status":"ssl_error","status_checked_at":"2026-05-05T11:29:48.587Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","benchmark","brier-score","forecasting","llm","machine-learning","nextjs","open-source","openrouter","polymarket","prediction-markets","probabilistic-forecasting","research","sqlite","typescript"],"created_at":"2026-05-05T19:37:33.119Z","updated_at":"2026-05-05T19:37:33.855Z","avatar_url":"https://github.com/setrf.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Forecaster Arena\n\n**Reality-grounded LLM evaluation.** Frontier model families make paper-portfolio\ndecisions on unsettled Polymarket events. Real-world outcomes settle the score.\n\n[Live](https://forecasterarena.com) | [Methodology](./docs/METHODOLOGY_v2.md) | [API](./docs/API_REFERENCE.md) | [Ops](./docs/OPERATIONS.md)\n\n## Core Model\n\n- Current v2 ranking is **portfolio value / P\u0026L**.\n- Prediction markets provide public questions, timestamped prices, and external\n  resolution criteria; they are the measurement substrate, not the product goal.\n- Every cohort freezes its model lineup, prompt rules, bankroll, market universe,\n  decisions, trades, and release lineage.\n- Archived v1 cohorts remain linkable for audit but do not affect current v2\n  leaderboards, averages, charts, recent decisions, or routine snapshots.\n- New exact model releases are detected by a weekly OpenRouter review and require\n  explicit admin approval. 
 Approval affects future cohorts only.\n\n## Runtime Rules\n\n| Area | Current behavior |\n|---|---|\n| App | Next.js 14, TypeScript, SQLite, single-node deployment |\n| Ranking | `cash + marked_position_value`, then P\u0026L |\n| Bankroll | `$10,000` per agent per cohort |\n| Trades | `BET`, `SELL`, `HOLD`; min bet `$50`; max single bet `25%` of cash |\n| Market universe | top Polymarket markets by volume |\n| Decisions | only the latest `DECISION_COHORT_LIMIT` cohorts by cohort number, default `5` |\n| Snapshots | every 10 minutes for unarchived active cohorts |\n| Pricing | CLOB midpoint is the accounting authority; Gamma supplies catalog/context |\n| Model calls | OpenRouter, `temperature = 0`, exact IDs, frozen per cohort |\n| Archive policy | v1 is settle-only and excluded from current v2 aggregate surfaces |\n\n## Code Map\n\n| Path | Purpose |\n|---|---|\n| `app/` | thin Next.js pages and route handlers |\n| `features/` | page-level UI shells and client state |\n| `components/` | shared UI and chart primitives |\n| `lib/application/` | route/admin/cron orchestration and read models |\n| `lib/engine/` | cohort, decision, execution, market sync, resolution |\n| `lib/db/` | SQLite schema, migrations, queries, transactions |\n| `lib/openrouter/` | prompts, client, parser |\n| `lib/polymarket/` | Gamma/CLOB/resolution clients and transforms |\n| `lib/pricing/` | CLOB-validated market pricing |\n| `lib/scoring/` | P\u0026L and historical diagnostics |\n| `tests/`, `playwright/` | Vitest and browser coverage |\n\nLayering rules live in [ARCHITECTURE.md](./ARCHITECTURE.md).\n\n## Local Setup\n\n```bash\nnpm install\ncp .env.example .env.local # if present; otherwise create .env.local\nnpm run dev\n```\n\nMinimum useful `.env.local`:\n\n```bash\nOPENROUTER_API_KEY=...\nCRON_SECRET=...\nADMIN_PASSWORD=...\nNEXT_PUBLIC_SITE_URL=http://localhost:3000\nDATABASE_PATH=data/forecaster.db\nBACKUP_PATH=backups\nDECISION_COHORT_LIMIT=5\n```\n\nDevelopment fallbacks exist for cron/admin secrets; 
production fails closed when\nrequired secrets are missing.\n\n## Verification\n\n```bash\nnpm run typecheck\nnpm run check:architecture\nnpm run check:queries\nnpm run test\nnpm run test:coverage\nnpm run build:standalone\nnpm run test:e2e\nnpm run test:e2e:empty\nnpm run check:openrouter-lineup\nnpm run check:openrouter-upgrades\n```\n\n`build:standalone` also verifies production asset layout so CSS/JS/font files are\npresent in `.next/standalone`.\n\n## Cron Endpoints\n\nAll cron routes require:\n\n```http\nAuthorization: Bearer {CRON_SECRET}\n```\n\n| Route | Typical cadence |\n|---|---|\n| `POST /api/cron/sync-markets` | every 5 minutes |\n| `POST /api/cron/start-cohort` | Sunday 00:00 UTC |\n| `POST /api/cron/run-decisions` | Sunday 00:05 UTC |\n| `POST /api/cron/check-resolutions` | hourly |\n| `POST /api/cron/take-snapshots` | every 10 minutes |\n| `POST /api/cron/check-model-lineup` | Monday 09:00 UTC |\n| `POST /api/cron/backup` | low-traffic weekly window |\n\nDo not call `run-decisions` casually in production; it spends OpenRouter credits.\n\n## Data\n\n| Path | Meaning |\n|---|---|\n| `data/forecaster.db` | default SQLite database |\n| `backups/` | SQLite backups |\n| `backups/exports/` | temporary admin CSV ZIP exports |\n\nProduction state should live outside immutable release directories.\n\n## Docs\n\nStart with [docs/README.md](./docs/README.md). Current docs should be concise,\nlinked, and operationally necessary. Historical plans and audits should be kept\nonly when they are still needed for auditability; otherwise remove them.\n\n## License\n\n[MIT](./LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsetrf%2Fforecasterarena","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsetrf%2Fforecasterarena","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsetrf%2Fforecasterarena/lists"}