{"id":49615212,"url":"https://github.com/binary-husky/alphaautoresearch","last_synced_at":"2026-05-04T21:07:13.813Z","repository":{"id":349967618,"uuid":"1204692065","full_name":"binary-husky/AlphaAutoResearch","owner":"binary-husky","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-18T02:48:02.000Z","size":466,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-18T04:32:28.201Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/binary-husky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-08T08:37:58.000Z","updated_at":"2026-04-18T02:48:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/binary-husky/AlphaAutoResearch","commit_stats":null,"previous_names":["binary-husky/alpha-rl-research"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/binary-husky/AlphaAutoResearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binary-husky%2FAlphaAutoResearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binary-husky%2FAlphaAutoResearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binary-husky%2FAlphaAutoResearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binary
-husky%2FAlphaAutoResearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/binary-husky","download_url":"https://codeload.github.com/binary-husky/AlphaAutoResearch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binary-husky%2FAlphaAutoResearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32624786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-04T10:08:07.713Z","status":"ssl_error","status_checked_at":"2026-05-04T10:08:02.005Z","response_time":58,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-04T21:07:12.763Z","updated_at":"2026-05-04T21:07:13.805Z","avatar_url":"https://github.com/binary-husky.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Alpha Auto Research\n\n[中文版 (Chinese Version)](README.ZH.md)\n\n\u003e 2025-2026, \"AI doing research autonomously\" went from an academic dream to engineering reality. 
[AI Scientist](https://sakana.ai/ai-scientist/) was published in [Nature](https://www.nature.com/articles/s41586-026-10265-5), [AI-Researcher](https://github.com/HKUDS/AI-Researcher) earned NeurIPS 2025 Spotlight, and [OpenAI declared fully-automated research its North Star](https://www.technologyreview.com/2026/03/20/1134438/openai-is-throwing-everything-into-building-a-fully-automated-researcher/). But most of these systems focus on **writing papers**. We focus on **running experiments**.\n\nAlpha Auto Research is an automated RL research system built on a **Leader-Worker architecture**. A **leader agent** reads a natural-language research topic, designs multi-stage experiment plans, generates structured experiment blueprints, dispatches them to GPU clusters, monitors progress, collects results, and writes analysis reports — while **worker agents** execute individual training runs on distributed compute backends. Submit a research topic before bed, wake up to a finished report.\n\nThe system is **fully open-source**, powered by [OpenCode](https://github.com/anthropics/opencode) AI agents and [AgentJet](https://github.com/modelscope/AgentJet) (an open-source RL training framework by ModelScope). As the [oh-my-opencode](https://github.com/nicepkg/oh-my-opencode) author put it: *\"Claude Code's a nice prison, but it's still a prison.\"* — we need full control over conversation history, breakpoint recovery, and agent behavior, which only open-source tooling can provide. It supports Alibaba Cloud PAI DLC (灵骏) and SSH-based clusters as compute backends. **No expensive frontier models required** — the entire system runs on affordable LLM APIs like [MiniMax M2.7](https://www.minimax.io/) or GLM coding plans. Got unused coding plan quota sitting idle overnight? Let Auto Research squeeze every last drop of value out of it.\n\n## Key Features\n\n- **Long-horizon experiment loops**: Single RL training runs can take hours or days. 
The system autonomously handles the full cycle — hypothesize, design, dispatch, wait, analyze, iterate — all unattended\n- **Maximize cluster utilization**: Multi-GPU parallel dispatch across SSH servers or Alibaba Cloud PAI DLC (灵骏). Squeeze every GPU-hour out of your cluster instead of running experiments one by one\n- **Fully open-source, no vendor lock-in**: Built on [OpenCode](https://github.com/anthropics/opencode) — full control over conversation history, breakpoint recovery, and agent behavior. No dependence on closed-source tools that break when you need deep customization\n- **Cheap enough to run freely**: Powered by affordable LLM APIs (MiniMax M2.7, GLM coding plans, or any OpenAI-compatible model). When API cost is effectively zero, \"should I run one more experiment?\" is never a question\n- **Fully autonomous or human-in-the-loop — your choice**: The Blueprint mechanism lets the leader agent draft structured experiment plans while you review, tweak parameters, or take over any sub-task at any stage. Every step is transparent — no wasted compute from days of training in the wrong direction. Or just let it run end-to-end with `--no-human-in-the-loop`\n- **Leader-Worker architecture**: Leader designs experiments and analyzes results; Workers execute training on GPU clusters\n- **Blueprint-based coordination**: Structured markdown \"contracts\" between Leader and Worker ensure reproducibility\n- **Robust unattended operation**: Auto-recovery from API timeouts, GPU contention, network issues, and agent crashes — runs for days without manual restarts\n\n## Real-World Results: 6 Research Topics Completed Autonomously\n\nThe system has been validated on **6 independent research topics**, all completed autonomously by AI agents — from topic submission to final report, with zero human intervention. 
Here are the highlights:\n\n| # | Research Topic | Key Finding | Model | Duration |\n|---|---|---|---|---|\n| 1 | AppWorld `max_steps` hyperparameter search | `max_steps=15` is optimal: same accuracy as 25, but **40% faster** (efficiency 1.87x) | Qwen2.5-14B | ~8h overnight |\n| 2 | LoRA Rank \u0026 Alpha for math reasoning | `rank=32, alpha=64` is the sweet spot: rank 8→32 gives +15.1%, but 32→128 only +1.3% (diminishing returns) | Qwen2.5-7B | ~6h |\n| 3 | Qwen3 multi-scale comparison (GSM8K) | **14B beats 32B** (94.67% vs 92.87%) — bigger is not always better; 8B shows the highest learning efficiency (+34.93%) | Qwen3-8B/14B/32B | ~12h |\n| 4 | Countdown math reasoning | 8B achieves a **3x performance leap** (26.78% → 83.64%), nearly closing the gap with 14B | Qwen3-8B/14B/32B | ~12h |\n| 5 | Learn2Ask medical dialogue | 14B wins again (82.14%); agent auto-diagnosed an API key failure and generated a detailed error report | Qwen3-8B/14B | ~8h |\n| 6 | Training anomaly detection mechanisms | Studied `compute_madness_checklist`, `agent_madness_termination`, and `agent_madness_reward` effects on training stability | Qwen2.5-7B | ~3h |\n\n**What the AI research agent did well:**\n- Designed efficient experiment plans (e.g., testing 3 strategic values instead of brute-force grid search)\n- Pre-planned decision trees before running experiments (\"if result is A, do X; if B, do Y\")\n- Knew when to stop — didn't waste compute on unnecessary follow-up stages when evidence was already sufficient\n- Honestly reported limitations (incomplete training, lack of statistical significance, etc.)\n\n\u003e For a detailed walkthrough with figures and analysis, see the full blog post: [`subject_ajet_appworld_step_study/auto_research_blog.md`](subject_ajet_appworld_step_study/auto_research_blog.md)\n\n## Quick Start\n\n### 1. Install\n\n```bash\npip install -e .\n```\n\n### 2. 
Configure\n\n```bash\ncp research_config.example.jsonc research_config.jsonc\n# Edit research_config.jsonc with your credentials and backend settings\n```\n\n### 3. Clone training codebase\n\n```bash\ngit clone https://github.com/modelscope/AgentJet.git codebase/agentjet\n```\n\n### 4. Run\n\n```bash\n# Plan a new research topic (using SSH backend)\nalpha-new-plan --runner=ssh --topic=\"research_topic/my_topic.md\"\n\n# Review the plan, then begin experiments\nalpha-resume --runner=ssh \\\n    --topic=\"research_topic/my_topic.md\" \\\n    -r \"permission granted, begin research\"\n```\n\n## CLI Reference\n\nAfter `pip install -e .`, the following commands are available:\n\n### Convenience Commands\n\n| Command | Description |\n|---|---|\n| `alpha-new-plan` | Plan a research topic from scratch |\n| `alpha-resume-plan` | Resume and refine an existing plan |\n| `alpha-resume` | Start or resume experiment execution |\n| `alpha-auto` | Fully autonomous research, no human review |\n\nAll accept `--topic=\u003cpath\u003e` and `--runner=\u003cssh|pai\u003e` (defaults to ssh). 
The `-r \u003ctext\u003e` flag is available on all of them except `alpha-auto`.\n\n### Core Commands\n\n| Command | Description |\n|---|---|\n| `alpha-rl-research leader [OPTIONS]` | Full leader role with all flags |\n| `alpha-rl-research worker [OPTIONS]` | Worker role (runs on compute nodes) |\n| `alpha-run-blueprint --blueprint=\u003cpath\u003e` | Launch a single experiment blueprint |\n| `alpha-scan-jobs` | List running and recent jobs |\n| `alpha-stop-jobs --stop-job-id=\u003cid\u003e` | Stop a running job (`--delete` to remove) |\n\n### Leader Options\n\n```\n--topic PATH               Research topic file or inline text\n--blueprint PATH           Path to a research skill .md file\n--resume                   Resume the latest session\n-r, --resume-instruction   Instruction for the resumed session\n--only-run-planning        Generate plan only, don't execute experiments\n--no-human-in-the-loop     Fully autonomous mode, no human review steps\n                           (leader only; conflicts with --only-run-planning,\n                           --resume, and -r)\n```\n\n## Architecture\n\n### Leader Agent (the \"brain\")\n\nThe Leader Agent receives a natural-language research topic and autonomously:\n\n1. **Parses the topic** — identifies variables to compare and controls to hold fixed\n2. **Designs multi-stage experiments** — coarse-to-fine progressive search with pre-planned decision branches (\"if result is A, do X; if B, do Y\")\n3. **Generates experiment blueprints** — structured markdown documents containing everything a Worker needs\n4. **Dispatches to GPU clusters** — runs multiple experiments in parallel via PAI DLC or SSH\n5. **Monitors progress** — polls experiment status at regular intervals\n6. **Analyzes results** — reads metrics, generates comparison charts, writes conclusions\n7. 
**Iterates or terminates** — follows the decision tree to decide if another round is needed or the evidence is sufficient\n\n### Worker Agent (the \"hands\")\n\nEach Worker Agent runs on an independent GPU node:\n\n- Sets up the environment per the blueprint\n- Launches training and monitors with adaptive polling intervals\n- Auto-recovers from GPU contention, process crashes, and resource conflicts\n- Reports results (success or failure) so the Leader never waits indefinitely\n\n### Blueprint: the contract between Leader and Worker\n\nBlueprints are structured markdown files with **7 standard sections** that serve as the communication protocol:\n\n| Section | Purpose |\n|---|---|\n| `[exp_purpose]` | Hypothesis being tested, and how this experiment differs from others |\n| `[exp_codebase_dir]` | Absolute path to experiment code |\n| `[exp_venv_exe]` | Path to Python executable / virtual environment |\n| `[exp_yaml_path]` | Path to training config YAML |\n| `[exp_launch_command]` | Command to start training |\n| `[exp_result_dir]` | Output directory for results |\n| `[exp_max_time]` | Maximum allowed runtime |\n\nAn optional **notes section** can include environment setup steps, config references, and **pre-experiment hypotheses** — so the analysis can judge whether results matched expectations rather than rationalizing after the fact.\n\n### Workflow\n\n```\n Research Topic (.md)\n        |\n        v\n  +-- LEADER AGENT --+\n  |  1. Read topic    |\n  |  2. Design plan   |  \u003c--- User reviews \u0026 confirms (optional)\n  |  3. 
Generate      |\n  |     blueprints    |\n  +--------+----------+\n           |\n     dispatch via runner\n     (PAI DLC or SSH)\n           |\n    +------+------+\n    |      |      |\n    v      v      v\n WORKER  WORKER  WORKER    Each worker:\n  exp_1   exp_2   exp_3    - Sets up environment\n    |      |      |        - Runs training\n    v      v      v        - Monitors \u0026 fixes errors\n results results results   - Writes experiment_log.md\n    |      |      |\n    +------+------+\n           |\n           v\n  +-- LEADER AGENT --+\n  |  Review results   |\n  |  Update progress  |  ---\u003e Iterate or finalize\n  |  Generate report  |\n  +-------------------+\n```\n\n### Robustness: run for days without human intervention\n\n- **Auto-resume on crash**: A guardian loop wraps each agent. When an agent is interrupted (API timeout, network issue, context overflow), it automatically resumes from the breakpoint with full conversation history\n- **Tolerates LLM service outages**: If the LLM API is rate-limited or temporarily unavailable, the system waits patiently and resumes seamlessly once service recovers — even after hours of downtime\n- **Self-healing Workers**: GPU contention? Reallocate. Training process died? Restart. Zombie processes? Clean up automatically\n- **Permission-aware**: Detects permission denials and suggests alternative approaches; supports \"full trust\" mode for fully unattended operation\n\n## Runner Backends\n\nThe `--runner` CLI argument selects the compute backend (`ssh` or `pai`). 
Both backends expose the same interface to the Leader Agent — submit blueprint, wait for results, collect data — so **switching backends requires changing only one flag**, with zero impact on the research workflow.\n\n\u003e **Tip**: Debug and iterate on a local SSH server first, then switch to `--runner=pai` to scale the same research topic to cloud GPUs seamlessly.\n\n### SSH (`--runner ssh`)\n\nLaunches workers on SSH-accessible machines via tmux sessions. Ideal for teams with their own GPU servers — zero additional cost. Configure hosts in the `ssh` section:\n\n```jsonc\n\"ssh\": {\n    \"hosts\": [\n        { \"host\": \"192.168.1.10\", \"port\": 22, \"user\": \"root\" }\n    ]\n}\n```\n\nSupports localhost with automatic SSH key setup.\n\n### Alibaba Cloud PAI DLC (`--runner pai`)\n\nLaunches workers as Alibaba Cloud PAI DLC (灵骏) jobs by cloning a template job. Ideal for elastic scaling — run 6+ experiments in parallel without worrying about local hardware limits. Configure credentials and job defaults in the `alibaba_cloud` and `pai_job_template` sections.\n\n## Configuration\n\nSee [`research_config.example.jsonc`](research_config.example.jsonc) for all options. 
Key sections:\n\n| Section | Purpose |\n|---|---|\n| `runner` | Backend selection: `\"pai\"` or `\"ssh\"` |\n| `paths` | Home directory and project root |\n| `alibaba_cloud` | PAI DLC credentials and region |\n| `api_keys` | API keys for training services |\n| `pai_job_template` | Job cloning template and defaults |\n| `remote_monitor` | Optional kite-client remote monitoring |\n| `ssh` | SSH host list for the SSH runner |\n\n## Project Structure\n\n```\nalpha_auto_research/\n  config.py                  Config loader (research_config.jsonc)\n  opencode_runner.py         Leader/worker agent orchestrator\n  cli.py                     Convenience CLI entry points\n  blueprint_runner/\n    base.py                  Abstract ExperimentSubagent interface\n    pai_runner.py            Alibaba PAI DLC backend\n    ssh_runner.py            SSH/tmux backend\n    blueprint_runner.py      Single blueprint executor\n    scan_jobs.py             Job listing\n    stop_jobs.py             Job management\n  pai/\n    client.py                PAI DLC API client\n  utils/\n    pty.py                   Pseudo-terminal runner\n    smart_daemon.py          Detached process management\n  skills/\n    leader_experiment/SKILL.md       Leader agent instructions\n    worker_experiment/SKILL.md       Worker agent instructions\n```\n\n## Usage Examples\n\n### Basic: plan, review, execute\n\n```bash\n# Step 1: Generate a research plan (Leader reads the topic, designs experiments)\nalpha-new-plan \\\n    --runner=ssh \\\n    --topic=\"research_topic/example_01_content_madness_detect.md\"\n\n# Step 2: Review the generated plan, then confirm execution\nalpha-resume \\\n    --runner=ssh \\\n    --topic=\"research_topic/example_01_content_madness_detect.md\" \\\n    -r \"permission granted, begin research\"\n```\n\n### Iterative planning: refine before executing\n\n```bash\n# Generate initial plan\nalpha-new-plan \\\n    --runner=ssh \\\n    --topic=\"research_topic/example_02_kl_abl.md\"\n\n# 
Refine the plan with specific instructions\nalpha-resume-plan \\\n    --runner=ssh \\\n    --topic=\"research_topic/example_02_kl_abl.md\" \\\n    -r \"max_env_worker: 64 -\u003e 128, max_num_seqs -\u003e 1024, revise your plan accordingly\"\n\n# Confirm execution\nalpha-resume \\\n    --runner=ssh \\\n    --topic=\"research_topic/example_02_kl_abl.md\" \\\n    -r \"permission granted, begin research\"\n```\n\n### Cloud execution with PAI DLC\n\n```bash\nalpha-new-plan \\\n    --runner=pai \\\n    --topic=\"research_topic/example_03_appworld.md\"\n\nalpha-resume-plan \\\n    --runner=pai \\\n    --topic=\"research_topic/example_03_appworld.md\" \\\n    -r \"polish your plan\"\n\nalpha-resume \\\n    --runner=pai \\\n    --topic=\"research_topic/example_03_appworld.md\" \\\n    -r \"permission granted, begin research\"\n```\n\n### Resume and finalize reports\n\n```bash\n# Resume a broken experiment with corrective instructions\nalpha-resume \\\n    --runner=ssh \\\n    --topic=\"research_topic/example_02_kl_abl.md\" \\\n    -r \"Look at what you have done! Yaml is all wrong, refer to agentjet/ajet/default_config/ajet_default.yaml\"\n\n# Tell the leader to write the final report after experiments finish\nalpha-resume \\\n    --runner=pai \\\n    --topic=\"research_topic/example_03_appworld.md\" \\\n    -r \"the experiment is finished, write report\"\n\n# Customize report style\nalpha-resume-plan \\\n    --runner=pai \\\n    --topic=\"research_topic/example_03_appworld.md\" \\\n    -r \"use seaborn! 
show as many details as possible, write report in markdown format with figures included.\"\n```\n\n### Fully autonomous (no human review)\n\n```bash\nalpha-auto \\\n    --runner=ssh \\\n    --topic=\"research_topic/my_topic.md\"\n```\n\n### Job management\n\n```bash\n# List all running and recent jobs\nalpha-scan-jobs --runner=ssh\n\n# Stop a specific job\nalpha-stop-jobs --runner=ssh --stop-job-id=\u003cjob_id\u003e\n\n# Stop and delete a job\nalpha-stop-jobs --runner=ssh --stop-job-id=\u003cjob_id\u003e --delete\n```\n\n## Writing a Research Topic\n\nA research topic is a markdown file describing what you want to investigate. Example:\n\n```markdown\n## Research Task\n\nInvestigate the effect of `max_steps` on the balance between training effectiveness and speed.\nUse `Qwen2.5-14B-Instruct`, 8 GPUs per experiment, max 24 hours per experiment.\n\n## Capacity\n\nMax parallel experiment blueprints: 3\n```\n\nKey elements to include:\n- **Research question**: What variable(s) to investigate and what tradeoffs to explore\n- **Model and resources**: Which model to use, GPU count per experiment\n- **Constraints**: Max parallel experiments, time limits per experiment\n- **Codebase and config references**: Point to relevant code paths and YAML configs\n\nSee `research_topic/` for complete examples.\n\n## Tech Stack\n\n| Component | Role |\n|---|---|\n| [OpenCode](https://github.com/anthropics/opencode) | Open-source AI agent runtime — reads files, executes commands, manages processes. Supports conversation persistence and resume-from-breakpoint |\n| [AgentJet](https://github.com/modelscope/AgentJet) | Open-source RL training framework (Apache 2.0) by ModelScope. Multi-GPU distributed training, LoRA, diverse tasks (math, AppWorld, medical dialogue, etc.) |\n| LLM API (affordable) | Powers the agents' reasoning. 
Uses exclusively low-cost models (e.g., MiniMax M2.7) via OpenAI-compatible APIs — no expensive frontier models needed |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinary-husky%2Falphaautoresearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbinary-husky%2Falphaautoresearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinary-husky%2Falphaautoresearch/lists"}