{"id":51076951,"url":"https://github.com/zozo123/inferoa-demo","last_synced_at":"2026-06-23T15:01:58.722Z","repository":{"id":363775221,"uuid":"1264830406","full_name":"zozo123/inferoa-demo","owner":"zozo123","description":"Interactive simulation of inference-native agent mechanics (Inferoa, built on vLLM): prefix caching, context compression, model routing","archived":false,"fork":false,"pushed_at":"2026-06-10T10:11:08.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T10:22:19.068Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zozo123.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-10T08:08:59.000Z","updated_at":"2026-06-10T10:11:12.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zozo123/inferoa-demo","commit_stats":null,"previous_names":["zozo123/inferoa-demo"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/zozo123/inferoa-demo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Finferoa-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Finferoa-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Finferoa-demo/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/zozo123%2Finferoa-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zozo123","download_url":"https://codeload.github.com/zozo123/inferoa-demo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Finferoa-demo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34694786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-23T15:01:57.915Z","updated_at":"2026-06-23T15:01:58.716Z","avatar_url":"https://github.com/zozo123.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Inferoa — inference flight recorder\n\nAn interactive, single-file demo of the ideas in [Announcing Inferoa](https://inferoa.agentic-in.ai/blog/announcing-inferoa/):\nan **inference-native, tokenmaxxing agent harness** for long-horizon coding work, built on the vLLM stack.\n\n**Live demo:** https://zozo123.github.io/inferoa-demo/\n\n## What it shows\n\nPress **RUN** and watch an 8-turn agent loop (\"fix the failing auth test\") spend tokens through two harnesses at once:\n\n- **Context window** — cached prefix vs. fresh input vs. 
output, turn by turn\n- **Semantic router** — each step routed to self-hosted vLLM or a frontier model, with the reason\n- **Cost race** — cumulative spend: naive harness (full resend, raw tool dumps, frontier-only) vs. Inferoa\n\n## What it is (and isn't)\n\nThis is a **client-side simulation** — no model is called. The mechanics are real; the rates are\nparameterized from the results reported in the announcement:\n\n| Lever | Reported result |\n|---|---|\n| Prefix-cache discipline | 90.0% cached-token discount |\n| CodeGraph context | 80.8% context reduction |\n| RTK tool records | 61.4% tool-output reduction |\n\nPricing in the sim is illustrative ($/Mtok: frontier 3.00 in / 0.30 cached / 15.00 out; self-hosted 0.10 in / 0.30 out).\nWith those parameters the simulated task lands at **~21.6× cheaper** with an **87% cache-hit rate**.\n\n## Receipts — the real runs\n\nThe simulation is for legibility; these executed for real in isolated [islo.dev](https://islo.dev) sandboxes:\n\n- **Real agent PR:** [inferoa-receipts#1](https://github.com/zozo123/inferoa-receipts/pull/1) — an agent in a sandbox was handed a genuinely failing test (tz-naive vs tz-aware datetimes; the task prompt named the suspected cause), produced the fix, re-ran pytest to green, and opened a PR containing the verbatim before/after output. It proves the sandboxed execute-verify-publish workflow, not unguided diagnosis.\n- **Real Inferoa → real vLLM:** `inferoa@0.11.0` (npm) in one sandbox drove `vLLM v0.22.1` (Qwen2.5-0.5B, CPU, `enable_prefix_caching=True`) in another, wired through a public `islo share` URL. Inferoa's event log records `provider_id: vllm:openai_compatible:https://….share.islo.dev/v1` with 16,829-token prompts and stable prompt/tool-schema hashes.\n- **Real prefix-cache metrics:** vLLM's Prometheus counters after the run — `prefix_cache_hits_total 1,611,008` / `prefix_cache_queries_total 1,647,574` = **97.8% measured hit rate**. 
(Distinct from the announcement's \"90.0% cached-token discount\", which is a pricing metric; both depend on keeping prompt prefixes byte-stable.)\n- **Honest limits:** a 0.5B CPU model demonstrates the mechanics, not frontier coding ability.\n\n## Run locally\n\nIt's a single `index.html`. Open it directly in a browser, or serve it from the repo root and visit http://localhost:8080/:\n\n```\npython3 -m http.server 8080\n```\n\n## The real thing\n\n```\nnpm install -g inferoa\ninferoa setup\ninferoa\n```\n\n---\n\n*Unofficial demo. Stack credits: vLLM Engine · vLLM Semantic Router · vLLM Omni · CodeGraph · RTK.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzozo123%2Finferoa-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzozo123%2Finferoa-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzozo123%2Finferoa-demo/lists"}