{"id":50303262,"url":"https://github.com/interwebologist/bruteloop","last_synced_at":"2026-05-28T14:01:17.030Z","repository":{"id":360294200,"uuid":"1233421437","full_name":"interwebologist/BruteLoop","owner":"interwebologist","description":"(WIP) BruteLoop - fully autonomous coding agent. Made for the fully augmented engineer using diversity of thought biased model ensembles.","archived":false,"fork":false,"pushed_at":"2026-05-25T20:43:37.000Z","size":42,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-25T21:26:46.704Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/interwebologist.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-09T00:12:08.000Z","updated_at":"2026-05-25T20:43:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/interwebologist/BruteLoop","commit_stats":null,"previous_names":["interwebologist/bruteloop"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/interwebologist/BruteLoop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interwebologist%2FBruteLoop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interwebologist%2FBruteLoop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interwebologist%2FBruteL
oop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interwebologist%2FBruteLoop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/interwebologist","download_url":"https://codeload.github.com/interwebologist/BruteLoop/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interwebologist%2FBruteLoop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33611254,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-28T14:01:15.549Z","updated_at":"2026-05-28T14:01:17.012Z","avatar_url":"https://github.com/interwebologist.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🦾 🤖 BruteLoop\n\nSimple, bare-bones coding agent for an autonomous \"fully augmented\" workflow that works in parallel to the human, not with a human hitting \"allow\" on each and every turn. BruteLoop calls this \"training wheels\". A BruteLoop master will be able to mostly one-shot their feature. 
Time will be spent designing within scope and reviewing code (though BruteLoop aims to make the latter optional through automated, multi-biased adversarial (multi-LLM) code review).\n\nBruteLoop assumes the popular one-shot LLM \"Car Wash\" test is usually correct and is actually a test of whether humans know how LLMs work.\n\nWith BruteLoop, you only need the bare minimum in a harness to drive massive results. This isn't about collecting plugins, themes, or custom TUIs. Inspired by mini-swe-agent (bash only), but with system design tool patterns I believe are important for local models and speed.\n\n## features\n- focus on self-hosted / local models \n- focus on working in parallel to agents: fully autonomous, no training wheels\n- Auditable to \"see\" what the model did if things go sideways, but otherwise not meant to be watched (JSONL / OTel)\n- Sandboxed for host or network safety\n- LLMs w/ diff biases to auto review or correct code, adding to autonomy (not in MVP)\n- Compaction system\n- Repomapping enabled to limit agent searching\n- Code repos as references held locally in .bruteloop/references reduce time (no web searching) and context\n- input guardrails\n\n### Multi-LLM \n\nResearch from early 2026 confirms that 27B–35B models occupy the \"Intelligence Frontier\", the point where adding more \nparameters yields diminishing returns compared to the gains from Multi-Agent Diversity. For BruteLoop, switching to a \n3-model \"Triangulation\" setup (e.g., 27B + 30B + 35B) targeted at AMD Strix Halo or Nvidia Spark 128GB machines often beats a 2-model \"Duel\" (27B + 70B) because it breaks \"logical deadlocks\" through consensus.\n\nResearchers show that \"diversity of thought\" in LLM ensembles prevents correlated errors. Since different models \nfail in different ways, their \"competing biases\" act as a safety net, catching bugs and security holes that a \nsingle model would miss.\n\n## AI generated output below. 
These are just guidelines for our work above. \n\nKey Benefits from Research\n\n    Error Decoupling: Models with different training data rarely make the same logic slip. When one model is \"blind\" to a bug, the other likely sees it.\n\n    Security Gains: Multi-LLM setups (like the MULTI-LLMSECCODEEVAL framework) can reduce security flaws by nearly 50% compared to single models.\n\n    Consensus Accuracy: Systems like EnsLLM use \"voting\" between models. If two models agree on a code block despite having different \"biases,\" the code is statistically much more likely to be correct.\n\n    Hallucination Check: A \"Critic\" model with a different bias is better at spotting when a \"Generator\" model is making up fake libraries, syntax or bad code.\n\nRelevant Whitepapers\n\n    \"Ensemble Learning for LLMs in Text and Code Generation: A Survey\" (2026): Confirms that \"diversity representation\" leads to higher code quality.  \n\n    https://www.researchgate.net/publication/401575248_Ensemble_Learning_for_Large_Language_Models_in_Text_and_Code_Generation_A_Survey#:~:text=Our%20findings%20highlight%20key%20benefits,quality%2C%20and%20greater%20application%20flexibility.\n\n    \"Enhancing LLM Code Generation with Ensembles\" (2025): Proves that independent models are less prone to making identical mistakes.\n    \n    https://arxiv.org/html/2503.15838v1#:~:text=We%20conduct%20extensive%20experiments%20on,prone%20to%20making%20identical%20mistakes\n\n### **Foundational Research: The Science of Scaling \u0026 Bias**\n\n- **“Towards a Science of Scaling Agent Systems: When and Why Agent Systems Work”** (Google Research, 2026)  \n  Proves that while independent agents amplify errors by **17.2x**, centralized systems (like BruteLoop) contain error propagation to just **4.4x**. It confirms that \"Centralized Coordination\" is required to prevent \"anchoring cascades\" in complex code tasks.  
\n  - [Link: https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/]\n\n- **“The Illusion of Thinking: Understanding Reasoning Models via the Lens of Problem Complexity”** (Apple Machine Learning Research, 2025/2026)  \n  Identifies that Large Reasoning Models (LRMs) suffer from a \"complete accuracy collapse\" beyond certain complexities and fail at exact computation. This justifies BruteLoop's use of **Medium-Complexity Ensembles (27B–35B)** with explicit algorithmic loops to achieve consistent reasoning where single \"super-models\" fail.  \n  - [Link: https://machinelearning.apple.com/research/illusion-of-thinking]\n\n- **“Fragile Preferences: A Deep Dive into Order Effects in Large Language Models”** (arXiv:2506.14092, 2026)  \n  The first comprehensive study on the \"Primacy Bias\" in LLMs. It explains why a single \"Critic\" is often unreliable due to quality-dependent positional bias. This paper is the mechanical justification for BruteLoop’s **Triangulation protocol**, using multiple diverse models to neutralize these \"fragile preferences.\"  \n  - [Link: https://arxiv.org/abs/2506.14092]\n\n- **“ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs”** (ACL, 2024/2025)  \n  Introduces the \"Round-Table\" framework, demonstrating that a diverse ensemble of models (different families/sizes) participating in a multi-round debate consistently outperforms even the strongest single model on logic-dense benchmarks.  \n  - [Link: https://arxiv.org/abs/2409.11562]\n\n- **“Refute-or-Promote: Adversarial Stage-Gated Multi-Agent Review”** (April 2026)  \n  Establishes the \"Kill Mandate\" protocol used in BruteLoop. By forcing critics to attempt to \"disprove\" code rather than \"improve\" it, this methodology reports a **79–83% prospective kill rate** of plausible-but-wrong code reports.  
\n  - [Link: https://arxiv.org/abs/2604.19049]\n\n\n\n# Tech involved in testing\n\n# BRUTELOOP: AGENTIC STACK v1.0.4 (2026-RELEASE)\n\nHardware_Architecture:\n  SoC: AMD Ryzen AI Max+ 395 (Strix Halo)\n  Cores: 16 Zen 5 CPU Cores / 32 Threads\n  GPU: Radeon 8060S (40 Compute Units / RDNA 3.5)\n  Memory: 128GB LPDDR5X-8000 (Unified GTT Pool)\n\nKernel_Configuration:\n  OS: Linux (Ubuntu 24.04+ / Noble Numbat)\n  Driver: ROCm 7.x (with gfx1151 support)\n  Kernel_Parameters:\n    - ttm.pages_limit=30720000  # Enables 120GB+ GTT mapping\n    - amdgpu.gttsize=120000      # Allocates system RAM to iGPU\n    - hugepages: Enabled\n\nInference_Engine:\n  Platform: llama.cpp (Custom Build)\n  Build_Flags:\n    - GGML_HIP=ON               # ROCm Acceleration\n    - GGML_HIP_ROCWMMA_FATTN=ON # Flash Attention for Strix Halo\n  Server: llama-server (Router Mode)\n  Quantization: GGUF (Q4_K_M / Q5_K_M)\n  Cache_Precision: Q8_0 (K-Cache/V-Cache)\n\nModel_Ensemble (Triangulation):\n  Generator: Qwen 3.6-27B (Dense, 262K Context)\n  Adversary: Llama 3.3-70B (Dense, Logical Auditor)\n  Alternative: DeepSeek-32B-V3.2 (Safety/Structural)\n\nOrchestration:\n  Platform: Docker (ROCm-optimized container)\n  Loop_Logic: Python 3.12+ (Asyncio)\n  API_Schema: OpenAI-Compatible REST\n  Context_Management: Dynamic KV Cache Paging (256K Context Window)\n\nMethodologies:\n  - Adversarial Multi-Agent Review (AMAR)\n  - Diversity of Thoughts (DoT) Triangulation\n  - Strategic Friction Loop (SFL)\n\n### Adversarial Multi‑Agent Review (AMAR) for code improvement\n\n- **“Adversarial Multi‑Agent Evaluation of Large Language Models through Iterative Debate”** (2025)  \n  Uses advocates, judges, and juries as LLM agents to debate and refine code and reasoning outputs, inspired by courtroom dynamics. The paper shows that agent‑driven adversarial review in a loop improves response quality and alignment with human judgment, especially when multiple LLMs critique each other’s code.  
\n  - [Link: https://openreview.net/forum?id=06ZvHHBR0i]\n\n- **“Adversarial Multi‑Agent LLM Defect Review”** (2026)  \n  Describes a stage‑gated adversarial multi‑agent pipeline for detecting defects and vulnerabilities in code and system outputs by continuously attacking and critiquing each other’s work. The looped, cross‑model critique yields high pre‑disclosure rejection and kill rates, making it one of the stronger setups for using multiple LLMs in a critic/judge loop around code.  \n  - [Link: https://www.emergentmind.com/papers/2604.19049]\n\n---\n\n### Diversity of Thought (DoT) in multi‑LLM ensembles for code\n\n- **“Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi‑Agent Debate Frameworks”** (arXiv 2410.12853, 2024)  \n  Explicitly studies “diversity of thought” by combining different model families (Gemini, Mixtral, PaLM‑2‑M) in a multi‑agent debate framework. The paper shows that ensembling multiple reasoning paths across agents in a loop significantly improves code‑related reasoning and problem‑solving, especially when agents critique each other’s drafts.  \n  - [Link: https://arxiv.org/abs/2410.12853]\n\n- **“Diversity of Thought Improves Reasoning Abilities of Large Language Models”** (ICLR‑style submission, 2024)  \n  Introduces methods like DIV‑SE / IDIV‑SE that generate diverse prompt variations under a fixed compute budget, forcing different reasoning trajectories. When applied across multiple LLMs in a loop, this diversity of prompts and agents consistently boosts performance on code‑rich reasoning and planning tasks.  \n  - [Link: https://openreview.net/forum?id=FvfhHucpLd]\n\n---\n\n### LLM‑as‑judge / critic loops for code quality\n\n- **“Adversarial Multi‑Agent Evaluation of Large Language Models through Iterative Debate”** (2025)  \n  Positions one LLM as a judge that aggregates and scores arguments from multiple critic agents, iteratively refining the code or reasoning output. 
The loop where at least two LLMs critique and one acts as judge consistently improves code quality over single‑model baselines.  \n  - [Link: https://openreview.net/forum?id=06ZvHHBR0i]\n\n- **“Adversarial Multi‑Agent LLM Defect Review”** (2026)  \n  Uses dedicated critic agents that attack code and system outputs, followed by a judge‑style aggregation layer that scores and prioritizes findings. The paper demonstrates that this looped, multi‑LLM critic/judge pattern is especially effective for improving code robustness and security when multiple models are involved.  \n  - [Link: https://www.emergentmind.com/papers/2604.19049]\n\n\n\nSRE Performance Note: To maximize stability, the Docker container should be pinned to the \nhigh-performance Zen 5 cores using taskset or cpuset-cpus to prevent CPU-iGPU scheduling \nconflicts during heavy compilation phases.\n# bruteloop\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finterwebologist%2Fbruteloop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finterwebologist%2Fbruteloop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finterwebologist%2Fbruteloop/lists"}