{"id":35458512,"url":"https://github.com/invergent-ai/surogate","last_synced_at":"2026-05-10T13:13:22.405Z","repository":{"id":332050191,"uuid":"1126557956","full_name":"invergent-ai/surogate","owner":"invergent-ai","description":"Training/Fine-tuning at the speed of light","archived":false,"fork":false,"pushed_at":"2026-04-25T05:19:40.000Z","size":71828,"stargazers_count":172,"open_issues_count":6,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-04-25T06:26:33.899Z","etag":null,"topics":["cuda","deep-learning","fine-tuning","generative-ai","llama","llm","llms","nvidia-gpu","qwen","sft"],"latest_commit_sha":null,"homepage":"https://surogate.ai","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/invergent-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-02T06:32:04.000Z","updated_at":"2026-04-25T04:10:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"ad8fe178-f6e1-4a62-ad71-3fe5f04f564f","html_url":"https://github.com/invergent-ai/surogate","commit_stats":null,"previous_names":["invergent-ai/surogate"],"tags_count":32,"template":false,"template_full_name":null,"purl":"pkg:github/invergent-ai/surogate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/invergent-ai%2Fsurogate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/invergent-ai%2Fsurogate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/invergent-ai%2Fsurogate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/invergent-ai%2Fsurogate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/invergent-ai","download_url":"https://codeload.github.com/invergent-ai/surogate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/invergent-ai%2Fsurogate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32377599,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T09:24:15.638Z","status":"ssl_error","status_checked_at":"2026-04-28T09:24:15.071Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","deep-learning","fine-tuning","generative-ai","llama","llm","llms","nvidia-gpu","qwen","sft"],"created_at":"2026-01-03T07:20:42.805Z","updated_at":"2026-05-10T13:13:22.398Z","avatar_url":"https://github.com/invergent-ai.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\" style=\"padding: 2rem\"\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://surogate.ai/#gh-dark-mode-only\"\u003e\n    \u003cimg\n      alt=\"Surogate\"\n      width=\"40%\"\n      
src=\"https://github.com/invergent-ai/surogate/raw/main/assets/logo-white.svg#gh-dark-mode-only\"\n    /\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"https://surogate.ai/#gh-light-mode-only\"\u003e\n    \u003cimg\n      alt=\"Surogate\"\n      width=\"40%\"\n      src=\"https://github.com/invergent-ai/surogate/raw/main/assets/logo-black.svg#gh-light-mode-only\"\n    /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch3\u003e⚡ FP8/FP4 Training, Fine-tuning and RL at the speed of light\u003c/h3\u003e\n\u003ch4\u003e\u003c/h4\u003e\n\u003cdiv\u003e\n\u003ca href=\"https://surogate.ai\"\u003eHome\u003c/a\u003e ·\n\u003ca href=\"https://docs.surogate.ai\"\u003eDocs\u003c/a\u003e ·\n\u003ca href=\"https://github.com/invergent-ai/surogate/tree/master/examples\"\u003eExamples\u003c/a\u003e ·\n\u003ca href=\"https://docs.surogate.ai/reference/benchmarks\"\u003eBenchmarks\u003c/a\u003e ·\n\u003ca href=\"https://github.com/invergent-ai/surogates\"\u003eManaged Agents\u003c/a\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\n[![GitHub stars](https://img.shields.io/github/stars/invergent-ai/surogate?style=social)](https://github.com/invergent-ai/surogate)\n[![GitHub issues](https://img.shields.io/github/issues/invergent-ai/surogate)](https://github.com/invergent-ai/surogate/issues)\n[![GitHub pull requests](https://img.shields.io/github/issues-pr/invergent-ai/surogate)](https://github.com/invergent-ai/surogate/pulls)\n[![Twitter Follow](https://img.shields.io/twitter/follow/surogate_ai?style=social)](https://x.com/surogate_ai)\n\n\u003c/div\u003e\n\n# Surogate Trainer\nSurogate Trainer is built for developers and enterprises that need fast experimentation — whether running on-premise or in the cloud.\n\n**⚡ The Surogate trainer surpasses all existing training frameworks in performance for single-GPU, multi-GPU and GPU+CPU by a large margin.**\n\n**✨ The native CPU offloading feature achieves superior performance and VRAM usage compared to QLoRA. 
You can fine-tune models at native bf16 precision, rendering QLoRA obsolete.**\n\n### Highlights\n\n- **🔧 Pre-training + Fine-tuning**: full fine-tuning, LoRA\n- [**🔧 BF16, FP8 and NVFP4 Reinforcement Learning**](https://docs.surogate.ai/guides/rl-training): advanced GRPO training and evaluation with custom, deterministic environments\n- [**🔧 RL Environments**](https://docs.surogate.ai/guides/rl-environments): predictable environments for RL training\n- [**🖥️...🖥️ Native multi-GPU**](https://docs.surogate.ai/guides/multi-gpu) training with multi-threaded backend\n- [**🖥️...🖥️ Native multi-node**](https://docs.surogate.ai/guides/multi-node) DDP training with Ray\n- **⚡ Native C++/CUDA engine** for near–Speed-Of-Light (SOL) throughput\n- [**🔥 Python DSL**](https://docs.surogate.ai/about/dsl) with AOT auto-differentiation for adding new model architectures\n- [**⚖️ Smart CPU Offloading**](https://docs.surogate.ai/guides/offloading) for weights, gradients, activations, quants\n- **📜 Pre-built training recipes**:\n  - [**💎 BF16**](https://docs.surogate.ai/guides/precision-and-recipes#bf16): Baseline recipe using `bfloat16` for all GEMMs, designed for maximum numerical accuracy. No quantization is applied.\n  - [**🔥 FP8**](https://docs.surogate.ai/guides/precision-and-recipes#fp8-hybrid): Native `FP8` training delivering extreme performance with `E4M3` used for activations and weights and `E5M2` for gradients. Uses per-tensor delayed scaling to provide stable training.\n  - [**🔥 NVFP4**](https://docs.surogate.ai/guides/precision-and-recipes#fp4-nvfp4): Native CUTLASS `FP4 E2M1` training with two-level block scaling for extreme performance and memory efficiency on Blackwell GPUs (**SM100+**: B200, B300, RTX 50xx series). Uses stochastic rounding and random Hadamard Transforms for numerical stability. 
**Supports NVIDIA B200, B300, RTX 5070, 5080, 5090 !!**\n- [**⚡ BnB/FP8/NVFP4 QLoRA**](https://docs.surogate.ai/guides/qlora) Support for a variety of QLoRA configurations, including online quantization (FP8, NVFP4, BnB) or loading pre-quantized weights (FP8, NVFP4)\n- [**👌 Optimizers**](https://docs.surogate.ai/guides/optimizers): AdamW 8bit, !! NorMuon !!\n- **🖥️ Runs on all NVIDIA GPUs**: sm80, sm86, sm89, sm90, sm100, sm103, sm120, sm121\n- [**🧪 Mixed-precision training**](https://docs.surogate.ai/guides/precision-and-recipes#mixed-precision-training): Mix different dtypes for GEMMs, model, gradients and LoRA recipes to create your own flavor.\n- **🛡️ Designed for reliability**: deterministic configs, explicit recipes, and a clear C++ core\n- [**🧬 Adaptive Training**](https://docs.surogate.ai/about/adaptive-training): built-in automated training monitoring with automatic phase detection, multi-criteria early stopping (convergence, compute-efficiency, divergence, plateau), auto LR management, MoE imbalance detection, Chinchilla token budgeting and dynamic epoch adjustment\n- [**🎨 Dedicated MoE Features**](https://docs.surogate.ai/guides/moe): Expert Parallelism, Least-Loaded EP load-balancing, MoE training metrics, Imbalance detection\n- **🥞 Stacked LoRA training**: Train a LoRA adapter on top of another LoRA adapter to skip offline merging into base model.\n\n---\n\n## 🧠 Supported Models:\nWe support the following models. 
Please create a PR if you need a specific model.\n\n| Model              | Architecture                                            | Model Sizes                   |\n| ------------------ | ------------------------------------------------------- | ----------------------------- |\n| Qwen3              | Qwen3ForCausalLM                                        | 0.6B, 1.7B, 4B, 8B, 14B, 35B  |\n| Qwen3VL            | Qwen3VLForConditionalGeneration                         | 2B, 4B, 8B, 32B               |\n| Qwen3 MoE          | Qwen3MoeForCausalLM                                     | 30B-A3B, 235B-A22B            |\n| Qwen3.5            | Qwen3_5ForCausalLM, Qwen3_5ForConditionalGeneration     | 0.8B, 2B, 4B, 9B, 27B         |\n| Qwen3.5 MoE        | Qwen3MoeForCausalLM, Qwen3_5MoeForConditionalGeneration | 35B-A3B, 122B-A10B, 397B-A17B |\n| Nemotron Nano v3   | NemotronHForCausalLM                                    | 30B-A3B                       |\n| Nemotron Super v3  | NemotronHForCausalLM                                    | 120B-A12B                     |\n| Nemotron Cascade 2 | NemotronHForCausalLM                                    | 30B-A3B                       |\n| GPT-OSS            | GptOssForCausalLM                                       | 20B, 120B                     |\n| Llama 3.1          | LlamaForCausalLM                                        | 8B, 70B, 405B                 |\n| Llama 3.2          | LlamaForCausalLM                                        | 1B, 3B                        |\n\n\n## 🚀 Quickstart\nYou can interact with the Surogate High-Performance Training Engine at the framework level via the CLI.\n\n### Run the Surogate Training Engine:\n\n#### Option A: Run using Docker (recommended)\nSurogate provides three Docker images for various CUDA versions. 
Currently only the `x86-64` architecture is supported.\n\n| CUDA   | Image                                        | Recommended NVIDIA Driver | Minimum NVIDIA Driver |\n| ------ | -------------------------------------------- | ------------------------- | --------------------- |\n| 12.8.1 | `ghcr.io/invergent-ai/surogate:latest-cu128` | `\u003e= 570.124.06`           | `\u003e= 525`              |\n| 12.9.1 | `ghcr.io/invergent-ai/surogate:latest-cu129` | `\u003e= 575.57.08`            | `\u003e= 525`              |\n| 13.1   | `ghcr.io/invergent-ai/surogate:latest-cu130` | `\u003e= 590.48.01`            | `\u003e= 580`              |\n\n```bash\ndocker run --gpus=all -v /my/local/config.yaml:/home/surogate/config.yaml -v /my/local/output_dir:\u003cOUTPUT_DIR_FROM_CONFIG_YAML\u003e \u003cIMAGE\u003e sft config.yaml\n```\n\n#### Option B: Install via script\n```bash\ncurl -LsSf https://surogate.ai/install.sh | sh\n```\n\n#### Option C: Build from source (dev / contributors)\nYou need CUDA 12.8, 12.9, or 13.x installed on your machine, along with the NCCL development libraries (`libnccl-dev`) for your CUDA version.\n\n```bash\n# ...clone repo...\nuv pip install -e .\n```\n\n---\n\n## Quickstart (SFT)\n\n1) Create a config (example):\n\n```yaml\nmodel: Qwen/Qwen3-0.6B\noutput_dir: ./output\n\n# training\nper_device_train_batch_size: 2\ngradient_accumulation_steps: 4\nsequence_len: 2048\nlearning_rate: 2e-4\n\n# LoRA / QLoRA\nlora: true\nlora_rank: 16\n# qlora_fp8: true  # optional, hardware-dependent\n# qlora_fp4: true  # Blackwell+\n# qlora_bnb: true  # Any GPU, lowest\n\ndatasets:\n  - path: \"mlabonne/FineTome-100k\"\n    type: auto\n```\n\n2) Run:\n```bash\nsurogate sft config.yaml\n```\n\n3) Outputs:\n- checkpoints, logs and artifacts are written under `output_dir`\n\n---\n\n## Hardware / Requirements\n\n- NVIDIA GPU + recent driver\n- CUDA **12.8, 12.9, 13**, NCCL, cuDNN\n- Linux x86_64\n\n### Supported NVIDIA GPUs:\n- `SM80`: A100, A30\n- `SM86`: A2, A16, A10, A40, RTX 3050, RTX 3060, 
RTX 3070, RTX 3080, RTX 3090, A2000, A3000, A4000, A5000, A6000\n- `SM89`: L4, L40, L40S, RTX 4050, RTX 4060, RTX 4070, RTX 4080, RTX 4090, RTX 2000 Ada, RTX 4000 SFF Ada, RTX 4000 Ada, RTX 4500 Ada, RTX 5000 Ada, RTX 6000 Ada\n- `SM90`: H100, H200, GH200\n- `SM100`: B200, GB200\n- `SM103`: B300, GB300\n- `SM120`: RTX PRO 6000/5000/4000/2500/2000 Blackwell, RTX 5050, RTX 5060, RTX 5070, RTX 5080, RTX 5090\n- `SM121`: DGX Spark\n\n---\n\n## Documentation / Examples\n\n- Docs: https://docs.surogate.ai\n- Examples: https://github.com/invergent-ai/surogate/tree/master/examples\n\n---\n\n## Contributing\n\nWe welcome contributions across the entire ecosystem! If you are submitting a PR to the core framework, please ensure you include a clear description, steps to test locally, and relevant examples.\n\nIf you’re adding kernels/recipes or touching build/tooling, please keep changes minimal and include:\n- a short description of the change,\n- how to reproduce/validate locally (`make test` where applicable),\n- and any GPU/arch assumptions.\n\n---\n\n## License\n\nApache 2.0 — see [LICENSE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finvergent-ai%2Fsurogate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finvergent-ai%2Fsurogate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finvergent-ai%2Fsurogate/lists"}