{"id":48497784,"url":"https://github.com/avilum/minrlm","last_synced_at":"2026-04-07T13:03:09.562Z","repository":{"id":344900005,"uuid":"1146663770","full_name":"avilum/minrlm","owner":"avilum","description":"Stop forcing LLMs to answer in one pass. Give them a runtime. Recursive Language Model that improves any LLM, while reducing token usage up to 4X.","archived":false,"fork":false,"pushed_at":"2026-04-05T11:23:09.000Z","size":14785,"stargazers_count":60,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-04-05T11:23:55.448Z","etag":null,"topics":["agent","ai-agents","cost-optimization","latency-optimization","llm","llm-inference","llmops","recursive-language-model","rlm","token-optimization"],"latest_commit_sha":null,"homepage":"https://avilum.github.io/minrlm/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/avilum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-31T13:26:42.000Z","updated_at":"2026-04-05T09:53:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"0bf33f9c-ddcb-4999-b8ad-eccf9d3fd232","html_url":"https://github.com/avilum/minrlm","commit_stats":null,"previous_names":["avilum/minrlm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/avilum/minrlm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avilum%2Fminrlm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avilum%2Fminrlm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avilum%2Fminrlm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avilum%2Fminrlm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/avilum","download_url":"https://codeload.github.com/avilum/minrlm/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avilum%2Fminrlm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31513382,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai-agents","cost-optimization","latency-optimization","llm","llm-inference","llmops","recursive-language-model","rlm","token-optimization"],"created_at":"2026-04-07T13:03:06.477Z","updated_at":"2026-04-07T13:03:09.556Z","avatar_url":"https://github.com/avilum.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eminRLM\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003cb\u003eStop forcing LLMs to answer in one pass. Give them a runtime.\u003c/b\u003e\n  \u003c/p\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://pypi.org/project/minrlm/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/minrlm?color=blue\" alt=\"PyPI\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/avilum/minrlm/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/avilum/minrlm?style=social\" alt=\"Stars\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/avilum/minrlm/blob/master/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-green\" alt=\"MIT License\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://avilum.github.io/minrlm/recursive-language-model.html\"\u003e\u003cimg src=\"https://img.shields.io/badge/blog-post-orange\" alt=\"Blog Post\"\u003e\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/sudoku.gif\" alt=\"minRLM demo - LLM writes code, REPL executes, answer returned\" width=\"700\"\u003e\n\u003c/p\u003e\n\nTook a base model. Wrapped it in a tiny recursive loop: **generate code - execute - refine - repeat**.\n\nDidn't change the model. Didn't add training. Didn't add data.\n\nJust stopped forcing it to answer in one pass.\n\nThe performance jump is not subtle:\n\n| | Vanilla (one-shot) | minRLM (recursive) |\n|---|---|---|\n| **AIME 2025** | 0% | **96%** |\n| **Sudoku Extreme** | 0% | **80%** |\n| **Overall (GPT-5.2)** | 48.2% | **78.2%** (+30pp) |\n| **Tokens used** | 20,967 | **8,151** (3.6x less) |\n| **Cost** | $7.92 | **$2.86** (2.8x cheaper) |\n\n\u003csub\u003e6,600+ evaluations across 4 models and 13 tasks. \u003ca href=\"https://avilum.github.io/minrlm/recursive-language-model.html\"\u003eFull blog post\u003c/a\u003e | \u003ca href=\"eval/README.md\"\u003eDetailed results\u003c/a\u003e\u003c/sub\u003e\n\n---\n\n## Try it in 10 seconds\n\n```bash\npip install minrlm\nexport OPENAI_API_KEY=\"sk-...\"\n\n# Analyze a file - data never enters the prompt\nuvx minrlm \"How many ERROR lines in the last hour?\" ./server.log\n\n# Pure computation - the REPL writes the algorithm\nuvx minrlm \"Return all primes up to 1,000,000, reversed.\"\n# -\u003e 78,498 primes in 6,258 tokens. Output: 616K chars. 
### What makes it work

- **Entropy profiling** - zlib compression heatmap of the input. A needle in 7MB shows up as an entropy spike; the model skips straight to it (see the sketch after this list)
- **Task routing** - auto-detects structured data, MCQ, code retrieval, math, search & extract. Each gets a specialized code pattern
- **Two-pass search** - if the first pass returns "unknown", a second pass runs with keywords from first-pass evidence
- **Sub-LLM delegation** - outer model gathers evidence via `search()`, passes it to `sub_llm(task, evidence)` for focused reasoning
- **Flat token cost** - context never enters the conversation. Only the entropy map and a head/mid/tail preview do
- **DockerREPL** - every execution in a sandboxed container with seccomp. No network, no filesystem, stdlib only
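To make the entropy-profiling idea concrete, here is a rough sketch of a zlib-based profile. It illustrates the general technique only; the function name, chunk size, and demo data are made up and are not minrlm's internals.

```python
import zlib

def entropy_profile(text: str, chunk_size: int = 4096):
    """Per-chunk compression ratio; a higher ratio means less compressible, i.e. higher entropy."""
    profile = []
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size].encode("utf-8", "replace")
        profile.append((start, len(zlib.compress(chunk)) / len(chunk)))
    return profile

# Repetitive filler compresses well; a buried high-entropy "needle" does not,
# so its chunk shows up as a spike in the profile.
doc = ("INFO heartbeat ok\n" * 20000
       + "api_key=9f4kQ2xLw7ZrT0pVbN6mHs1cJ8dEyU3a\n"
       + "INFO heartbeat ok\n" * 20000)
offset, ratio = max(entropy_profile(doc), key=lambda p: p[1])
print(offset, round(ratio, 4))   # the spike lands in the chunk containing the needle
```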
---

## The scaling story

The REPL isn't a crutch for weak models - it's a lever that better models pull harder.

| Model | minRLM | Vanilla | Gap | Tasks won |
|-------|--------|---------|-----|-----------|
| GPT-5-nano (small) | 53.7% | 63.2% | -9.5 | 4/12 |
| GPT-5-mini (mid) | 72.7% | 69.5% | +3.2 | 7/12 |
| GPT-5.4-mini (mid, newer) | 69.5% | 47.2% | +22.3 | 8/12 |
| GPT-5.2 (frontier) | **78.2%** | 48.2% | **+30.0** | **11/12** |

Small model? Recursion adds overhead. Frontier model? Recursion dominates.

The gap isn't model size. It's the execution model.

| | | |
|---|---|---|
| ![Summary](docs/summary_dashboard.png) | ![Accuracy](docs/accuracy_per_task.png) | ![Tokens](docs/token_savings.png) |
| ![Cost](docs/accuracy_vs_cost.png) | ![Latency](docs/accuracy_vs_latency.png) | ![Per Task](docs/cost_per_task.png) |

---

## When to use it (and when not to)

**Use it when:**
- Large context (docs, logs, CSV, JSON) - cost stays flat as data grows
- You want debuggable reasoning - every step is readable Python, not hidden attention
- Token efficiency matters - 3.6x fewer tokens than comparable approaches

**Skip it when:**
- Short context (<8K tokens) - a direct call is simpler
- Code retrieval (RepoQA) - the one task where vanilla wins everywhere
- You need third-party packages - the sandbox is stdlib-only

---

## REPL tools

| Function | What it does |
|----------|--------------|
| `input_0` | Your context data (string, never in the prompt) |
| `search(text, pattern)` | Substring search with context windows |
| `sub_llm(task, context)` | Recursive LLM call on a sub-chunk |
| `FINAL(answer)` | Return answer and stop |
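For a sense of how these fit together, here is the kind of code the model might emit inside the REPL. It is a hypothetical example, not captured minrlm output: `input_0`, `search`, `sub_llm`, and `FINAL` are assumed to be preloaded by the runtime as documented above, and the task and keyword are invented.

```python
# Hypothetical model-written step. `input_0`, `search`, `sub_llm`, and `FINAL`
# are provided by the minrlm REPL; the task and keyword below are made up.

# Narrow a large context down to the regions that mention refunds.
evidence = search(input_0, "refund")

# Delegate focused reasoning over just that evidence to a recursive sub-call.
answer = sub_llm("Which product is most often named in these refund requests?", str(evidence))

# Return the answer and stop.
FINAL(answer)
```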
---

## Works with any OpenAI-compatible endpoint

```python
# Local / self-hosted
rlm = RLM(model="llama-3.1-70b", base_url="http://localhost:8000/v1")

# Hugging Face
from openai import OpenAI
hf = OpenAI(base_url="https://router.huggingface.co/v1", api_key="hf_...")
rlm = RLM(model="openai/gpt-oss-120b", client=hf)
```

Works with: OpenAI, Hugging Face, Anthropic (via proxy), vLLM, Ollama, LiteLLM, or anything OpenAI-compatible.

---

## More ways to run

<details>
<summary><b>Visualizer (Gradio UI)</b></summary>

```bash
git clone https://github.com/avilum/minrlm && cd minrlm
uv sync --extra visualizer
uv run python examples/visualizer.py   # http://localhost:7860
```
</details>

<details>
<summary><b>OpenCode integration</b></summary>

**1. Start the proxy:**
```bash
uv run --with ".[proxy]" examples/proxy.py
# RLM Proxy initialized | model=gpt-5-mini | docker=False
# Uvicorn running on http://0.0.0.0:8000
```

**2. Config** (`opencode/opencode.json`): set `provider.minrlm.api` to `http://localhost:8000/v1`. See [opencode/opencode.json](opencode/opencode.json).

**3. Run:**
```bash
OPENCODE_CONFIG=opencode.json opencode run "First prime after 1 million"
# > 1000003
```

**[Full tutorial](docs/opencode-minrlm-tutorial.md)**
</details>

<details>
<summary><b>Docker sandbox</b></summary>

LLM-generated code runs in isolated Docker containers. No network, read-only filesystem, memory-capped, seccomp-filtered.

```python
rlm = RLM(model="gpt-5-mini", use_docker=True, docker_memory="256m")
```
</details>

<details>
<summary><b>Run the benchmarks yourself</b></summary>

```bash
git clone https://github.com/avilum/minrlm && cd minrlm
uv sync --extra eval

# Smoke test
uv run python eval/quickstart.py

# Full benchmark (reproduces the tables above)
uv run python eval/run.py \
    --tasks all \
    --runners minrlm-reasoning,vanilla,official \
    --runs 50 --parallel 12 --task-parallel 12 \
    --output-dir logs/my_eval
```

Full results: [`eval/README.md`](eval/README.md)
</details>

<details>
<summary><b>Examples</b></summary>

```bash
uv run python examples/minimal.py            # vanilla vs RLM side-by-side
uv run python examples/advanced_usage.py     # search, sub_llm, callbacks
uv run python examples/visualizer.py         # Gradio UI
uv run uvicorn examples.proxy:app --port 8000  # OpenAI-compatible proxy
```
</details>

---

## Why this matters

[Context window rot](https://arxiv.org/abs/2509.21361) is real - model accuracy degrades as input grows, even when the answer is right there. Bigger windows aren't the fix. Less input, better targeted, is.

The same pattern is showing up everywhere: Anthropic's [web search tool](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/web-search-tool) writes code to filter results, [MCP](https://modelcontextprotocol.io/) standardizes code execution access, [smolagents](https://huggingface.co/docs/smolagents/en/index) goes further. They all converge on the same idea: let the model use code to work with data instead of attending to all of it.

Feels less like "prompting" and more like giving the model a runtime.

---

## Future work

- **More models** - Claude Opus 4.6, Gemini 2.5, open-weight models. Does the scaling trend hold across providers?
- **Agentic pipelines** - using the RLM pattern as a retrieval step inside multi-step agent workflows
- **More tasks** - stress-testing edge cases and domains where the approach might break

Contributions welcome. Open an issue or PR.

---

## Credits

Built by [Avi Lumelsky](https://github.com/avilum). Independent implementation - not a fork.

The RLM concept comes from [Zhang, Kraska, and Khattab (2025)](https://arxiv.org/abs/2512.24601). Official implementation: [github.com/alexzhang13/rlm](https://github.com/alexzhang13/rlm).

<details>
<summary>Citation</summary>

```
@misc{zhang2026recursivelanguagemodels,
      title={Recursive Language Models},
      author={Alex L. Zhang and Tim Kraska and Omar Khattab},
      year={2026},
      eprint={2512.24601},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.24601},
}
```
</details>
## Star History

<a href="https://www.star-history.com/?repos=avilum%2Fminrlm&type=date&legend=top-left">
 <picture>
   <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/image?repos=avilum/minrlm&type=date&theme=dark&legend=top-left" />
   <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/image?repos=avilum/minrlm&type=date&legend=top-left" />
   <img alt="Star History Chart" src="https://api.star-history.com/image?repos=avilum/minrlm&type=date&legend=top-left" />
 </picture>
</a>

## License

MIT