{"id":50005830,"url":"https://github.com/comnik/autoprobe","last_synced_at":"2026-05-19T17:48:48.969Z","repository":{"id":356980809,"uuid":"1227414876","full_name":"comnik/autoprobe","owner":"comnik","description":"An exploration of programmatic context.","archived":false,"fork":false,"pushed_at":"2026-05-10T18:11:19.000Z","size":5844,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-10T20:14:55.266Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/comnik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-02T16:46:24.000Z","updated_at":"2026-05-10T18:11:19.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/comnik/autoprobe","commit_stats":null,"previous_names":["comnik/autoprobe"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/comnik/autoprobe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comnik%2Fautoprobe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comnik%2Fautoprobe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comnik%2Fautoprobe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comnik%2Fautoprobe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/comnik","download_url":"https://codeload.github.com/comnik/autoprobe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/comnik%2Fautoprobe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33226593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-19T15:49:41.270Z","status":"ssl_error","status_checked_at":"2026-05-19T15:49:22.917Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-19T17:48:48.195Z","updated_at":"2026-05-19T17:48:48.961Z","avatar_url":"https://github.com/comnik.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# autoprobe\n\nExperimental agent harness where context is constructed by executable programs that\nconstantly probe the environment. Like any other codebase, these programs are written and\nevolved by the agent entirely on its own, or in collaboration with human users. This active\n*programmatic context* then becomes an alternative to passive memory systems based on static\nfiles.\n\n**Objectives**\n\n1. 10x fewer in-context tokens without loss of quality or time taken to complete a task.\n2. Reusable abstractions that persist across sessions. No hard compaction.\n3. 
Fine-grained grounding and steering by human users.\n\n## Motivation\n\nIntelligence is going to be energy-constrained. In-context tokens consume orders of\nmagnitude more energy than tokens flowing through traditional deterministic programs. So it\nis generally much more efficient for an intelligent agent to hard-code the reasoning steps\nrequired to solve a problem into a traditional program, rather than actually solve it\nin-context.\n\nBut in-context is where the magic happens, where the agent can course-correct if an\nassumption suddenly clashes with reality. The challenge is to find a balance, where the\ngenerated programs continuously validate their core assumptions against ground truth and\nescalate back to the agent if any assumption is violated.\n\nIf a traditional agent needs to know whether all tests are passing, or whether a server is\nup, it may read thousands of tokens of logs, or rely on stale information from the\nconversation history. Wrapped in `autoprobe`, an agent should instead write a script that\nperforms the check and outputs `SERVER_STATUS: UP` or `10/10 tests passing`.\n\nI wanted a goal-seeking agent harness that encourages program synthesis combined with\ncontinuous probing of the environment, in order to minimize token usage without sacrificing\nintelligence.\n\n### The memory lens\n\n\u003e From `.md` to `.sh`.\n\nAnother way to think about `autoprobe` is as a memory system for agents that is based on\nexecutable programs, rather than static files.\n\nMost memory systems rely on static markdown files to build up a persistent knowledge base.\nBut like any form of documentation, static knowledge bases can drift from reality. Again,\nthe challenge is to continuously test the encoded knowledge against a ground truth, but to\ndo so *out of context*.\n\nA markdown \"memory\" is just a program that happens to be executable only in-context.
Nothing\nprevents an intelligent agent from encoding its hard-earned knowledge about the environment\nit is operating in (say, a codebase) in a program that is executable out of context. This is\nthe difference between documenting the layout of a codebase in a `.md` file versus calling `ls` or\n`grep`. Each has its own strengths and weaknesses. The `.md` compresses the knowledge but\ncan drift. Calling `ls` and `grep` always reflects ground truth, but can spill lots of\nredundant information into the context window.\n\nSo the key is to encourage the agent to write its \"memory programs\" in such a way that when\nexecuted, they return a compressed representation of the knowledge, but also validate their\nunderlying assumptions against the current state of the environment. For example, knowledge\nabout a specific component in a codebase should come with a check that ensures that the\ncomponent still exists at the expected location. Knowledge about the architecture of the\ncodebase should come with a check of the dependency graph.\n\nInstead of an agent that writes a diary of what it did, `autoprobe` agents install probes in\nthe environment they are operating in. The context window becomes a live dashboard of\nsensors that is always a fresh, verified reflection of reality.\n\n## Architecture\n\nAt the core of `autoprobe` is [an agent loop like any\nother](https://ampcode.com/notes/how-to-build-an-agent). Where it differs is in the\nrepresentation of the context. Instead of modeling context as a conversation interspersed\nwith tool calls, the `autoprobe` harness constructs the context from scratch on every\niteration by assembling the outputs of a library of installed programs. It is worth\nspending cheap out-of-context compute in order to improve the signal-to-noise ratio of the\ncontext window.\n\nThe library is just a directory in the local filesystem. Files in that directory are assumed\nto be executable.
In each iteration, the harness executes every installed program and\nappends its output to the context for that model call.\n\n`autoprobe init` sets up the library (`.autoprobe/programs` by default) and pre-installs a\n*cornerstone* program that explains the approach.\n\nHuman users can contribute their own programs to the library, or edit those created by the\nagent. Typically, at least one human-provided program is used to set (and verify!) the\noverall goal to work towards. For simple goals, this can be specified inline via the\n`autoprobe run --goal ...` argument.\n\nTo be clear: `autoprobe` can still perform regular tool calls. The difference is really just\nthat in each iteration, the context passed to the LLM is constructed entirely from scratch.\nEstablished tools like `read`, `write`, `edit`, and `bash` are also how the agent is\nexpected to update the library.\n\nEach iteration re-runs the programs and rebuilds the user-side context from scratch.\nAssistant messages and tool results are retained only while the model is in the middle of a\ntool-use cycle; once it produces a response with no tool calls, the cycle ends and the next\niteration starts fresh with just the new program outputs. When the reconstructed\ncontext matches the previous one byte-for-byte (programs produced identical output\nand nothing new has happened), the harness idles with exponential backoff (capped at\n30s) instead of re-querying the model.
The agent never decides to stop on its own; quit with `q`\nin the TUI, or pass `-n` for a bounded run.\n\n![Workflow](workflow.png)\n\n## Usage\n\nInitialize an `autoprobe` directory in your project:\n\n```\nautoprobe init\n```\n\nThis launches an interactive picker for the model provider (Anthropic, OpenAI, Google, or\nxAI Grok) and a specific model, then creates `.autoprobe/`:\n\n- `config.yaml` — the chosen provider and model\n- `programs/` — the cornerstone program plus anything you or the agent install\n- `reinforcement/` — per-tool reinforcement messages appended to tool results\n\nSkip the picker by passing both flags:\n\n```\nautoprobe init --provider openai --model gpt-5-codex\n```\n\nPassing only one of `--provider` / `--model` skips that screen and prompts for the other.\n\nRe-running `init` on an existing directory refreshes the embedded assets and preserves your\nconfig unless you override it via flags or the picker.\n\nSet the appropriate API key for your chosen provider:\n\n- Anthropic: `ANTHROPIC_API_KEY`\n- OpenAI: `OPENAI_API_KEY`\n- Google: `GEMINI_API_KEY` or `GOOGLE_API_KEY`\n- Grok (xAI): `XAI_API_KEY`\n\nThen run the agent:\n\n```\nautoprobe run                            # run autoprobe on the .autoprobe/ directory\nautoprobe run --goal \"make tests pass\"   # inline goal, appended as a final program output\nautoprobe run -n 20                      # bounded run; exits after 20 iterations\n```\n\n## Evaluation\n\n[ProgramBench](https://programbench.com/) instances are used as a first testing ground for\n`autoprobe`. The `evals` directory contains scripts to set up a\n[sprite](https://sprites.dev/), run an evaluation on it, and download the resulting\n`autoprobe` traces (which are human-readable HTML pages).\n\n## FAQ\n\n**Q: How is this different from having an agent write skills or tools?**\n\nThe installed programs are automatically executed on every iteration and so have a chance to\nfeed information from the environment to the agent proactively.
Skills also hard-code the\nprogressive disclosure mechanism, whereas with `autoprobe` the agent can evolve its own.\n\n**Q: Can I use `autoprobe` with my favourite model?**\n\n`autoprobe` supports Anthropic Claude, OpenAI (including Codex), Google Gemini, and\nxAI Grok. Pick one when running `autoprobe init` (or pass `--provider` and `--model`\nto skip the picker). Reasoning / thinking content round-trips across turns for the\nfirst three providers; xAI does not return a replayable reasoning signature, so Grok\nruns without thinking continuity (the agent loop tolerates this). Tool calling works\nthe same way regardless of which provider you choose.\n\n**Q: Can I use `autoprobe` with my favourite coding harness?**\n\nNo, the `autoprobe` interaction model can't be tacked onto a conventional harness via\na plugin or skill. However, programmatic context is a simple idea and easy to implement, so open-source\nharnesses like the great [pi](https://github.com/earendil-works/pi) could easily be\nforked and adapted.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomnik%2Fautoprobe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcomnik%2Fautoprobe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomnik%2Fautoprobe/lists"}