{"id":51077009,"url":"https://github.com/zozo123/pokeloop","last_synced_at":"2026-06-23T15:02:09.335Z","repository":{"id":356954330,"uuid":"1234740821","full_name":"zozo123/pokeloop","owner":"zozo123","description":"How AI learns to play Pokémon GO on islo.dev sandboxes — GA over LLM policies","archived":false,"fork":false,"pushed_at":"2026-05-10T16:07:01.000Z","size":28228,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-10T17:25:29.418Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zozo123.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-10T15:26:49.000Z","updated_at":"2026-05-10T16:07:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zozo123/pokeloop","commit_stats":null,"previous_names":["zozo123/pokeloop"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/zozo123/pokeloop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fpokeloop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fpokeloop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fpokeloop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fpokeloop/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zozo123","download_url":"https://codeload.github.com/zozo123/pokeloop/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fpokeloop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34694786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-23T15:02:07.654Z","updated_at":"2026-06-23T15:02:09.329Z","avatar_url":"https://github.com/zozo123.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pokeloop\n\n\u003e **How AI learns to play Pokémon GO on AI sandboxes.**\n\u003e\n\u003e A population of LLM-driven agents evolves via genetic algorithms in parallel\n\u003e [islo.dev](https://islo.dev) sandboxes, learning to earn its first Pokémon badge\n\u003e in 8 generations.\n\n🌐 **[Read the page](https://zozo123.github.io/pokeloop/)** · 📺 **[Watch the movie](docs/assets/movie.mp4)** · 🧬 **[Run it](#run-it)**\n\n![dashboard](docs/assets/screenshot.png)\n\n## TL;DR\n\nPokémon GO can't run honestly from a Linux sandbox in 2026 — Play Integrity hardware\nattestation, arm64-only APKs, and a $5M Niantic injunction make the literal version\na ban-on-first-frame botnet.\n\nSo we 
built the next thing: **a population of 8 LLM agents that evolve via genetic\nalgorithms in parallel forkable VMs.** The fitness signal comes from RAM-derived rewards\nin *Pokémon Crystal* on PyBoy. The \"GO feel\" is a HUD overlay (Pokédex pops, catch\nanimations). The substrate — `islo snapshot save` → `islo use --snapshot` →\n`islo logs --type agent` — is borrowed directly from\n[meta-harness on islo](https://zozo123.github.io/meta-harness-on-islo-page/).\n\nThe snapshot tree **is** the search tree.\n\n## Results\n\n| metric | G1 | G8 |\n|---|---|---|\n| best fitness | +1.5 | **+17.0** |\n| mean fitness | 0.0 | **+12.0** |\n| worst fitness | −1.5 | +6.0 |\n| badges earned (best) | 0 | **1 (Falkner)** |\n| Pokédex seen (best) | 0 | 8 |\n\n## How it works\n\n```\nfor gen in 1..8:\n    pop = [sandbox_from(snapshot_base, prompt_i) for i in 1..8]   # parallel fork\n    fits = parallel_rollout(pop, horizon=H)\n    elites = top_k(pop, fits, k=2)                                # elitist top-2 selection\n    children = [LLM.crossover(*sample_pair(elites)) for _ in 1..6]   # textual crossover\n    children = [LLM.mutate(c) if random() \u003c .5 else c for c in children]\n    pop = elites + children\n    snapshot_base = best_individual.snapshot                      # advance the gym\n```\n\nEach individual is a Claude system prompt; each generation runs in 8 islo sandboxes\nin parallel; fitness is a RAM-derived dense signal (badges, Pokédex, map progress).\n\nThe \"evolution\" is *textual* — natural-language gradients on prompts, not weight\nupdates. 
It's the\n[Promptbreeder](https://arxiv.org/abs/2309.16797) /\n[TextGrad](https://arxiv.org/abs/2406.07496) /\n[Reflexion](https://arxiv.org/abs/2303.11366) family, with parallel forkable\nsandboxes underneath instead of a single trajectory.\n\nIt's a **multi-agent** system in the population sense: 8 agents per generation,\neach with its own policy, never communicating during a rollout — only via the\ngenetic information channel between generations.\n\n## Repo layout\n\n```\npokeloop/\n├── docs/                 ← GitHub Pages site\n│   ├── index.html\n│   ├── style.css\n│   └── assets/\n│       ├── movie.mp4\n│       ├── movie.gif\n│       └── screenshot.png\n├── env_worker.py         ← PyBoy HTTP gym (save/load/screen/state)\n├── policy.py             ← Claude tool-use action policy\n├── trainer.py            ← textual DPO over preference pairs (DPO version)\n├── reward.py             ← RAM-derived dense reward\n├── orchestrator.py       ← real-run loop (DPO version)\n├── mock_orchestrator.py  ← deterministic mock for the DPO movie\n├── mock_ga.py            ← deterministic mock for the GA movie\n├── frames.py             ← procedural Crystal-ish PIL frame generator\n├── viewer/               ← single-policy DPO viewer\n├── viewer_ga/            ← population/generation GA viewer\n├── record.py             ← Playwright recorder\n├── policies/v0.txt       ← seed system prompt\n├── prompt.md             ← the one-shot islo build prompt\n└── scripts/\n    ├── make_movie.sh     ← build the DPO movie\n    ├── make_ga_movie.sh  ← build the GA movie\n    ├── run_local.sh      ← real run on a local Mac\n    └── run_islo.sh       ← real run inside an islo sandbox\n```\n\n## Run it\n\n### Just make the movie (no ROM, no API key, ~3 minutes)\n\n```bash\ngit clone https://github.com/zozo123/pokeloop\ncd pokeloop\nSECONDS_RUN=230 bash scripts/make_ga_movie.sh\nopen movie_ga/pokeloop-ga.mp4\n```\n\n### Real run on islo.dev (bring your own Crystal ROM)\n\n```bash\nexport 
ANTHROPIC_API_KEY=sk-ant-...\ncp /your/legal/copy/crystal.gbc roms/crystal.gbc\n\nislo use pokeloop --image python:3.12-slim --source github://zozo123/pokeloop\nislo use pokeloop -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY -- bash scripts/run_islo.sh\nislo share pokeloop 8080\n# → https://\u003cid\u003e.share.islo.dev — your live demo URL\n```\n\n### Real run locally (macOS / Linux)\n\n```bash\nbash scripts/run_local.sh\nopen http://localhost:8080\n```\n\n## The 9-minute build prompt\n\nThe whole rig is small enough to materialize from a single prompt — see\n[`prompt.md`](prompt.md) for the Captain-Claw-shape one-shot.\n\n## Inspiration\n\n- **[Meta-harness on islo](https://zozo123.github.io/meta-harness-on-islo-page/)** —\n  the `snapshot → use → logs` pattern this work copies. Pokeloop is meta-harness\n  applied to RL post-training.\n- **Karpathy's agentic autoresearch** — LLMs that propose, run, read, update,\n  in sandboxes. The GA loop is one realization.\n- **[Claude Plays Pokémon](https://www.twitch.tv/claudeplayspokemon)** \u0026\n  **[Gemini Plays Pokémon](https://blog.jcz.dev/the-making-of-gemini-plays-pokemon)** —\n  single-agent, no learning. This is the multi-agent post-training version.\n\n## Caveats\n\n- Bring your own ROM. We never ship one.\n- Anthropic API calls dominate latency (~1–2 actions/sec).\n- The mock movie is a deterministic playback — same viewer code, scripted\n  events, same shape as a real run. Swap `mock_ga.py` for the live `orchestrator.py`\n  for genuine learning.\n- \"Pokémon GO\" in the title is a frame, not a game. 
We don't connect to Niantic\n  servers and we don't want to.\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\nNo Niantic accounts were created or harmed in the making of this demo.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzozo123%2Fpokeloop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzozo123%2Fpokeloop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzozo123%2Fpokeloop/lists"}