{"id":47875962,"url":"https://github.com/eren23/crucible","last_synced_at":"2026-04-04T01:14:24.007Z","repository":{"id":346010486,"uuid":"1188086420","full_name":"eren23/crucible","owner":"eren23","description":"Autonomous ML research on rental GPUs — LLM-driven hypothesis generation and fleet orchestration on RunPod","archived":false,"fork":false,"pushed_at":"2026-03-29T19:55:22.000Z","size":1362,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-29T21:29:57.700Z","etag":null,"topics":["ai-scientist","autonomous-research","experiment-automation","gpu","hyperparameter-optimization","llm","machine-learning","mlops","python","runpod"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eren23.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-21T15:47:36.000Z","updated_at":"2026-03-29T19:55:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/eren23/crucible","commit_stats":null,"previous_names":["eren23/crucible"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eren23/crucible","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eren23%2Fcrucible","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eren23%2Fcrucible/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/eren23%2Fcrucible/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eren23%2Fcrucible/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eren23","download_url":"https://codeload.github.com/eren23/crucible/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eren23%2Fcrucible/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31383923,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T23:20:52.058Z","status":"ssl_error","status_checked_at":"2026-04-03T23:20:51.675Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-scientist","autonomous-research","experiment-automation","gpu","hyperparameter-optimization","llm","machine-learning","mlops","python","runpod"],"created_at":"2026-04-04T01:14:19.205Z","updated_at":"2026-04-04T01:14:23.999Z","avatar_url":"https://github.com/eren23.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Crucible\n\n\u003e **Alpha software.** Crucible works for the author's use case (autonomous ML research on RunPod). It may work for yours. APIs will change. Bug reports and PRs welcome.\n\nAutonomous ML research on rental GPUs. 
LLM-driven hypothesis generation + fleet orchestration on RunPod/SSH.\n\nYou bring a training script. Crucible decides what experiments to run, provisions the compute, executes them across tiers, and learns from the results.\n\n## Why Crucible?\n\nNo single existing tool combines fleet orchestration on rental GPUs with autonomous experiment design. The closest alternatives:\n\n- **SkyPilot** provisions GPUs across 20+ clouds but doesn't decide what experiments to run\n- **Optuna/Ax** optimize hyperparameters mathematically but don't provision compute or reason about architectures\n- **AI Scientist** generates hypotheses but runs single-machine with a 42% failure rate and no fleet management\n- **W\u0026B/MLflow** track experiments but don't execute them autonomously\n\nCrucible connects these concerns into one loop: **analyze → hypothesize → provision → execute → reflect → promote or kill**.\n\n## Origins\n\nBorn from [OpenAI Parameter Golf](https://github.com/openai/parameter-golf) (March–April 2026), a competition to train the best 16MB language model on 8xH100s in 10 minutes. The autonomous research infrastructure we built for the competition turned out to be general-purpose. 
Crucible extracts and generalizes it.\n\n## What Works Today\n\n- Fleet orchestration on RunPod (provision, bootstrap, dispatch, collect, destroy)\n- Generic SSH provider for any machine you can SSH into\n- Experiment execution with live output parsing, OOM retry, tier presets\n- Claude-driven autonomous research loop (hypothesis → batch → execute → reflect)\n- MCP server so Claude can control experiments via tool use\n- Model zoo with transformer components (RMSNorm, RoPE, GQA, SmearGate, etc.)\n- Analysis: leaderboard, sensitivity analysis, Pareto frontier\n- YAML project configuration (`crucible.yaml`)\n- Experiment notes (attach freeform observations to runs with YAML frontmatter)\n- Research tracks (group projects by research direction in the Crucible Hub)\n- Crucible Hub (`~/.crucible-hub/`) for cross-project knowledge sharing, git-synced\n- Research briefing (LLM session orientation with project context and findings)\n- REST API server (`crucible serve`) — 10 FastAPI endpoints wrapping MCP tools\n- W\u0026B bridge with image logging and run annotation support\n- 112 MCP tools for AI agent integration (fleet, design, context, notes, hub, tracks, briefing, architecture composition, tree search, training generalization, plugin system, community taps)\n- Unified plugin system — 12 pluggable component types (optimizers, schedulers, callbacks, loggers, providers, architectures, data adapters, objectives, and more) with 3-tier precedence and auto-discovery\n- Community taps — Homebrew-style git-based plugin sharing (`crucible tap add`, `search`, `install`, `publish`)\n- Interactive TUI for browsing experiment designs grouped by status\n\n## What's Coming\n\n- SkyPilot provider (20+ cloud support)\n- Optuna/Ax integration (mathematical HPO alongside LLM-driven search)\n- Code-level search (LLM modifies training scripts, not just configs)\n- PyPI release\n\n## Quick Start\n\n```bash\n# Install from source\npip install -e \".[all]\"\n\n# Initialize a project\ncrucible 
init\n\n# Edit crucible.yaml — point at your training script\n\n# Run a smoke test\ncrucible run experiment --preset smoke\n\n# Or go autonomous\ncrucible research start --budget-hours 10 --tier proxy\n```\n\n## Core Concepts\n\n### crucible.yaml\n\nLike docker-compose for ML experiments:\n\n```yaml\nname: my-project\nprovider:\n  type: runpod\n  gpu_types: [\"NVIDIA GeForce RTX 4090\"]\ntraining:\n  - backend: torch\n    script: train.py\npresets:\n  smoke: { MAX_WALLCLOCK_SECONDS: \"60\", ITERATIONS: \"400\" }\n  proxy: { MAX_WALLCLOCK_SECONDS: \"1800\", ITERATIONS: \"6000\" }\nresearcher:\n  model: claude-sonnet-4-6-20250514\n  budget_hours: 10.0\n```\n\n### Training Contract\n\nCrucible doesn't own your training code. Any script that reads env vars and prints parseable output works:\n\n**Input** (env vars):\n- `ITERATIONS`, `MAX_WALLCLOCK_SECONDS`, `TRAIN_BATCH_TOKENS`\n- `MODEL_FAMILY`, `NUM_LAYERS`, `MODEL_DIM`, etc.\n- `RUN_ID`, `RUN_BACKEND`, `RUN_PRESET`\n\n**Output** (stdout patterns):\n- `step:{step}/{total} train_loss:{loss}`\n- `step:{step}/{total} val_loss:{loss} val_bpb:{bpb}`\n- `Serialized model ... 
{N} bytes`\n\n### Experiment Tiers\n\nExperiments earn their way to expensive compute:\n\n| Tier | Duration | Use Case |\n|------|----------|----------|\n| smoke | ~1 min | Quick validation |\n| proxy | ~30 min | Main exploration |\n| medium | ~1 hr | Extended runs |\n| promotion | ~2 hrs | Best candidates |\n\n### Fleet Management\n\n```bash\ncrucible fleet provision --count 4\ncrucible fleet bootstrap\ncrucible fleet status\ncrucible fleet destroy\n```\n\n### Autonomous Researcher\n\nClaude-powered: analyze results, generate hypotheses, design batches, execute, reflect, promote or kill.\n\n```bash\ncrucible research start --budget-hours 10 --tier proxy --dry-run\n```\n\n### MCP Integration\n\n```bash\ncrucible mcp serve  # starts stdio MCP server for Claude (112 tools)\ncrucible serve      # starts REST API server (FastAPI, 10 endpoints)\n```\n\n## CLI Reference\n\n```\ncrucible init\ncrucible fleet {status|provision|destroy|bootstrap|sync|monitor}\ncrucible run {experiment|queue|enqueue|dispatch|collect|day|night}\ncrucible analyze {rank|sensitivity|pareto|export|summary}\ncrucible research {start|status}\ncrucible data {download|sync|status}\ncrucible mcp serve\ncrucible models list\ncrucible hub {status|sync|findings}\ncrucible track {create|list|switch}\ncrucible note {add|get|search}\ncrucible serve [--port PORT]\ncrucible tui\ncrucible store {list|diff|get}\n```\n\n## Installation\n\n```bash\npip install crucible-ml[all]        # everything\npip install crucible-ml             # minimal (orchestration only)\npip install crucible-ml[torch]      # model zoo\npip install crucible-ml[anthropic]  # autonomous researcher\npip install crucible-ml[mcp]        # MCP server\n```\n\n## Validated Workflow (Tested 2026-03-21)\n\nThis exact sequence was run and confirmed working on 2 RunPod pods:\n\n```bash\ncd /path/to/your-ml-project\ncrucible fleet provision --count 2 --name-prefix crucible-test\ncrucible fleet bootstrap --train-shards 1\ncrucible run enqueue --spec 
experiments.json --limit 3\ncrucible run dispatch\ncrucible fleet monitor --watch 60\ncrucible run collect\ncrucible analyze rank --top 10\ncrucible fleet destroy\n```\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md). Highest-impact areas:\n- **Compute providers**: Modal, Lambda, SkyPilot backends\n- **Search strategies**: Optuna, Ax integration\n- **Training script examples**: Show Crucible working with your framework\n- **Bug reports**: File issues and we'll fix them\n\n## Roadmap\n\nSee [ROADMAP.md](ROADMAP.md) for the full plan: what works, what's next, what we won't build, and an honest competitive assessment.\n\n## Project Structure\n\n```\nsrc/crucible/\n├── core/          # Config, env, I/O, types, logging, finding, hub\n├── fleet/         # Provider-abstracted fleet management\n│   └── providers/ # RunPod, SSH backends\n├── runner/        # Experiment execution, output parsing, presets, tracking, notes\n├── models/        # Model zoo (components + architectures)\n├── researcher/    # LLM-driven autonomous research loop, briefing\n├── analysis/      # Leaderboard, sensitivity, Pareto frontier\n├── data/          # Manifest-driven HuggingFace data pipeline\n├── mcp/           # MCP server for Claude agent integration (112 tools)\n├── training/      # Training backends (torch), factored from train_gpt.py\n├── api/           # Lightweight REST API server (FastAPI)\n├── tui/           # Interactive experiment design browser (Textual)\n└── cli/           # CLI entry points\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feren23%2Fcrucible","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feren23%2Fcrucible","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feren23%2Fcrucible/lists"}