{"id":33926806,"url":"https://github.com/ml-rust/oxidizr","last_synced_at":"2026-01-11T06:01:51.910Z","repository":{"id":327631830,"uuid":"1110211060","full_name":"ml-rust/oxidizr","owner":"ml-rust","description":"A Rust-based LLM training framework built on Candle","archived":false,"fork":false,"pushed_at":"2025-12-05T09:03:50.000Z","size":158,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-13T18:20:13.579Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ml-rust.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-04T21:42:27.000Z","updated_at":"2025-12-09T23:24:55.000Z","dependencies_parsed_at":"2025-12-08T03:02:41.477Z","dependency_job_id":null,"html_url":"https://github.com/ml-rust/oxidizr","commit_stats":null,"previous_names":["farhan-syah/oxidizr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ml-rust/oxidizr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-rust%2Foxidizr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-rust%2Foxidizr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-rust%2Foxidizr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-rust%2Foxidizr/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/ml-rust","download_url":"https://codeload.github.com/ml-rust/oxidizr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-rust%2Foxidizr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28293188,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-11T04:44:51.577Z","status":"ssl_error","status_checked_at":"2026-01-11T04:44:44.232Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-12T10:31:55.212Z","updated_at":"2026-01-11T06:01:51.892Z","avatar_url":"https://github.com/ml-rust.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Oxidizr\n\nA Rust-based LLM training framework built on [Candle](https://github.com/huggingface/candle). 
Oxidizr is a flexible trainer - bring your own config and dataset, and start training.\n\n[Full Documentation](docs/README.md) | [Architecture Guide](docs/architecture/overview.md) | [CLI Reference](docs/cli.md)\n\n## Installation\n\n**Recommended: Install from Git** (supports CUDA 12.x and 13.x):\n\n```bash\n# CPU-only\ncargo install --git https://github.com/farhan-syah/oxidizr\n\n# With CUDA support\ncargo install --git https://github.com/farhan-syah/oxidizr --features cuda\n\n# With CUDA + HuggingFace model publishing\ncargo install --git https://github.com/farhan-syah/oxidizr --features cuda,huggingface\n```\n\n**From crates.io** (CUDA 12.x only):\n\n```bash\ncargo install oxidizr\ncargo install oxidizr --features cuda\ncargo install oxidizr --features cuda,huggingface\n```\n\n\u003e **Note:** The crates.io version only supports CUDA 12.x. For CUDA 13.x support, install from Git.\n\n## Quick Start\n\n```bash\n# Clone and build\ngit clone https://github.com/farhan-syah/oxidizr\ncd oxidizr\ncargo build --release\n\n# Download nano-start dataset\npip install huggingface_hub\nhf download fs90/nano-start-data-bin --local-dir data/nano-start/tokenized --repo-type dataset\n\n# Train!\ncargo run --release -- train -f models/nano-start.yaml -d data/nano-start/tokenized/combined.bin\n```\n\nThat's it! Training works on CPU out of the box - no GPU required.\n\n**For faster training with GPU:**\n\n```bash\ncargo build --release --features cuda\ncargo run --release --features cuda -- train -f models/nano-start.yaml -d data/nano-start/tokenized/combined.bin\n```\n\nGPU training is significantly faster but completely optional. 
CPU training is fully functional, just slower.\n\n**After training, package and share your model:**\n\n```bash\n# Package the trained model\ncargo run --release -- pack\n\n# Push to HuggingFace (requires --features huggingface)\ncargo run --release --features huggingface -- push\n```\n\n## What is Oxidizr?\n\nOxidizr is a **production-grade LLM trainer** written in Rust. You provide:\n\n1. **A config file** - Model architecture, hyperparameters, training settings\n2. **A dataset** - Tokenized data (binary format or in-memory)\n\nOxidizr handles the training loop, optimization, checkpointing, and logging.\n\n### Why Oxidizr?\n\n- **Production-grade** - Train real models, not just toys. Modern architectures (Mamba, MLA, MoE) ready to use.\n- **No GPU required** - Full training support on CPU. Great for learning, prototyping, or when GPU isn't available.\n- **Researchers \u0026 Students** - Transparent codebase, easy to understand and modify. Perfect for experiments.\n- **Fast iteration** - Rust performance without Python overhead. Quick feedback loops.\n- **Portable** - Single binary, no complex dependencies. Works anywhere Rust compiles.\n\n### Sample Configs Included\n\nWe include several example configs in `models/` to help you get started:\n\n- `nano-start.yaml` - Educational config for beginners (cl100k_base vocab)\n- `nano.yaml` - Hybrid Mamba2 + MLA + MoE (~60M params)\n- `nano_mamba3.yaml` - Pure Mamba3 architecture\n- `nano_mamba3_hybrid.yaml` - Hybrid Mamba3 + MLA + MoE\n\nThese are **educational examples** showing you how to configure oxidizr. 
Feel free to create your own configs for your specific use case.\n\n## Bring Your Own Config and Data\n\n### Creating Your Config\n\nCreate a YAML file with your model architecture and training settings:\n\n```yaml\n# my_model.yaml\nmodel:\n  hidden_size: 512\n  num_layers: 8\n  num_heads: 8\n  kv_heads: 4\n  vocab_size: 128354 # Llama 3 + splintr agent tokens\n  max_seq_len: 512\n  rope_theta: 10000.0\n  intermediate_size: 2048\n\ntrainer:\n  learning_rate: 0.0003\n  batch_size: 2\n  max_steps: 5000\n  num_epochs: 2\n  gradient_accumulation: 1\n  checkpoint_dir: \"./checkpoints\"\n  log_interval: 10\n  save_interval: 500\n```\n\nRun it:\n\n```bash\ncargo run --release --features cuda -- train -f my_model.yaml\n```\n\n### Preparing Your Data\n\nOxidizr accepts tokenized data in binary format (u32 tokens):\n\n**Option 1: Use the educational dataset**\n\nThe `data/nano-start/` directory contains a curated educational dataset designed to help you understand LLM training fundamentals. See the `data/` directory for details.\n\n**Option 2: Bring your own tokenized data**\n\nCreate a binary file containing raw u32 tokens:\n\n```python\n# Using splintr tokenizer (recommended)\nfrom splintr import Tokenizer\n\ntokenizer = Tokenizer(\"llama3\")\ntokens = tokenizer.encode(\"Your training text here...\")\n\n# Save as binary u32 array\nimport numpy as np\nnp.array(tokens, dtype=np.uint32).tofile(\"data/my_dataset.bin\")\n```\n\nThen point your config to the data file, or load it programmatically in your training script.\n\n**Option 3: Generate dummy data for testing**\n\nFor quick testing, oxidizr can generate random tokens:\n\n```rust\nuse oxidizr::data::{LitDataLoader, create_dummy_data};\n\nlet tokens = create_dummy_data(128354, 100_000);  // vocab_size, num_tokens\nlet data_loader = LitDataLoader::new(tokens, batch_size, seq_len, device);\n```\n\n## Supported Architectures\n\nOxidizr supports multiple architectures:\n\n### Base Architectures\n\n- **GPT/Llama-style 
Transformer** - RoPE, RMSNorm, Grouped Query Attention (GQA), SwiGLU\n- **Mamba1** - State Space Model with selective mechanism for efficient long-range context\n- **Mamba2** - State Space Duality (SSD) algorithm, faster than Mamba1\n- **Mamba3** - Latest Mamba with trapezoidal discretization, complex-valued RoPE, and MIMO\n\n### Advanced Components\n\n- **MLA (Multi-Head Latent Attention)** - Compressed KV cache for memory efficiency\n- **MoE (Mixture of Experts)** - Fine-grained expert routing with load balancing\n\n### Hybrid Architectures\n\nYou can mix and match components. For example, the `nano_mamba2.yaml` config uses:\n\n- 6 Mamba2 layers for efficient sequential processing\n- 2 MLA + MoE layers for cross-sequence attention\n\nConfigure hybrid models by specifying which layers use which architecture in your YAML.\n\n## CLI Reference\n\nOxidizr uses a subcommand-based CLI:\n\n```bash\noxidizr \u003cSUBCOMMAND\u003e [OPTIONS]\n```\n\n### Subcommands\n\n- `train` - Train a model (default if -f flag is used)\n- `pack` - Package a trained model for distribution\n- `push` - Push a packaged model to HuggingFace Hub (requires `--features huggingface`)\n\n### Training\n\n```bash\noxidizr train -f \u003cconfig.yaml\u003e [OPTIONS]\n\n# Or use the legacy shortcut (backwards compatible):\noxidizr -f \u003cconfig.yaml\u003e [OPTIONS]\n\nOptions:\n  -f, --config \u003cFILE\u003e           Path to YAML configuration file (required)\n  -d, --data \u003cFILE\u003e             Path to tokenized data file (.bin)\n  --target-device \u003cgpu|cpu\u003e     Override target device (default: gpu if available)\n  --seq-len \u003cN\u003e                 Override sequence length from config\n  --batch-size \u003cN\u003e              Override batch size from config\n  --grad-accum \u003cN\u003e              Override gradient accumulation from config\n  --max-steps \u003cN\u003e               Override max training steps from config\n  --gpus \u003cIDS\u003e                  
Comma-separated GPU IDs for multi-GPU (e.g., 0,1,2,3)\n  --sync-backend \u003ccpu|nccl\u003e     Gradient sync backend for multi-GPU (default: cpu)\n  --prefetch \u003cN\u003e                Prefetch N batches in background (default: 0)\n  --resume \u003cPATH|auto\u003e          Resume from checkpoint (.safetensors) or \"auto\" for latest\n  --headless                    Output JSON metrics only (for non-interactive terminals)\n  --dtype \u003cf32|f16|bf16\u003e        Model precision (default: f32)\n  --max-checkpoints \u003cN\u003e         Maximum checkpoints to keep (default: 10)\n  -h, --help                    Print help information\n```\n\n### Examples\n\n```bash\n# Basic training with default settings\ncargo run --release --features cuda -- train -f models/nano.yaml\n\n# Legacy syntax (backwards compatible)\ncargo run --release --features cuda -- -f models/nano.yaml\n\n# Force CPU execution\ncargo run --release -- train -f models/nano.yaml --target-device cpu\n\n# Override batch size and sequence length\ncargo run --release --features cuda -- train -f models/nano.yaml --batch-size 4 --seq-len 256\n\n# Multi-GPU training (2 GPUs)\ncargo run --release --features cuda -- train -f models/nano.yaml --gpus 0,1 --sync-backend cpu\n\n# Custom config file\ncargo run --release --features cuda -- train -f experiments/my_config.yaml\n```\n\n### Output Modes\n\n**Interactive mode (default):**\n\n- Shows a TUI progress bar with live loss, speed, and ETA\n- Best for interactive terminal sessions\n\n**Headless mode (`--headless`):**\n\n- Outputs JSON metrics to stdout\n- Use when: TUI doesn't render (CI/CD, piped output, non-interactive shells)\n- Use when: You want to parse training metrics programmatically\n\n```bash\n# If progress bar doesn't appear, use headless mode\ncargo run --release -- train -f models/nano.yaml --headless\n```\n\n### CPU vs GPU Training\n\n```bash\n# CPU training (no CUDA required)\ncargo build --release\ncargo run --release -- train -f 
models/nano.yaml --target-device cpu\n\n# GPU training (faster, requires CUDA)\ncargo build --release --features cuda\ncargo run --release --features cuda -- train -f models/nano.yaml --target-device gpu\n```\n\nCPU training is fully functional - just slower. Great for:\n\n- Learning and experimentation\n- Systems without GPU\n- Debugging and development\n\n## Model Distribution\n\nAfter training, you can package and share your models:\n\n### Packaging Models\n\n```bash\n# Interactive mode - select checkpoint from TUI\noxidizr pack\n\n# Non-interactive - specify checkpoint\noxidizr pack --checkpoint latest\noxidizr pack --checkpoint final\noxidizr pack --checkpoint 10000  # specific step\n\n# Custom options\noxidizr pack \\\n  --checkpoint-dir ./checkpoints \\\n  --checkpoint final \\\n  --name my-model \\\n  --username my-hf-username\n```\n\nThis creates a packaged model in `hf/\u003cusername\u003e/\u003cmodel\u003e/` with:\n\n- `model.safetensors` - Model weights\n- `config.json` - Inference configuration\n- `README.md` - Auto-generated model card\n\n### Publishing to HuggingFace\n\nRequires the `huggingface` feature flag and `huggingface-cli`:\n\n```bash\n# Build with HuggingFace support\ncargo build --release --features huggingface\n\n# Install HuggingFace CLI\npip install huggingface_hub\n\n# Interactive mode - select model from list\noxidizr push\n\n# Non-interactive - specify model\noxidizr push --model hf/username/model-name\n\n# Create private repository\noxidizr push --model hf/username/model-name --private\n```\n\n**Configuration**: Create a `.env` file (see `.env.example`):\n\n```env\nHF_USERNAME=your-username\nHF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n```\n\nGet your token from [HuggingFace settings](https://huggingface.co/settings/tokens).\n\nSee [hf/README.md](hf/README.md) for detailed documentation.\n\n## Multi-GPU Training\n\nOxidizr supports data-parallel training across multiple GPUs:\n\n```bash\n# Train on GPUs 0, 1, 2, 3 with CPU 
backend\ncargo run --release --features cuda -- train -f models/nano.yaml --gpus 0,1,2,3 --sync-backend cpu\n\n# Train with NCCL backend (faster for 4+ GPUs, requires nccl feature)\ncargo run --release --features cuda,nccl -- train -f models/nano.yaml --gpus 0,1 --sync-backend nccl\n```\n\n**How it works:**\n\n1. Dataset is sharded across GPUs\n2. Each GPU runs forward/backward pass on its shard\n3. Gradients are synchronized (all-reduced) across all GPUs\n4. Averaged gradients are applied by the optimizer\n\n**Effective batch size** = `batch_size × gradient_accumulation × num_gpus`\n\n## Configuration Guide\n\n### Basic Config Structure\n\n```yaml\nmodel:\n  # Architecture parameters\n  hidden_size: 512\n  num_layers: 8\n  num_heads: 8\n  kv_heads: 4 # For GQA (fewer KV heads than Q heads)\n  vocab_size: 128354 # Llama 3 + splintr agent tokens\n  max_seq_len: 512\n  rope_theta: 10000.0\n  intermediate_size: 2048\n\ntrainer:\n  # Training hyperparameters\n  learning_rate: 0.0003\n  batch_size: 2\n  max_steps: 5000\n  num_epochs: 2\n  gradient_accumulation: 1\n  checkpoint_dir: \"./checkpoints\"\n  log_interval: 10\n  save_interval: 500\n  load_balance_alpha: 0.0 # MoE load balancing (0.0 = disabled)\n```\n\n### Enabling Mamba2\n\nAdd these fields to use Mamba2 instead of standard attention:\n\n```yaml\nmodel:\n  # ... other fields ...\n\n  mamba2_num_heads: 48\n  mamba2_head_dim: 16\n  mamba2_state_size: 64\n  mamba2_chunk_size: 64\n  mamba2_n_groups: 1\n  mamba2_conv_kernel: 4\n  mamba2_expand: 2\n\n  # CONSTRAINT: hidden_size * mamba2_expand == mamba2_num_heads * mamba2_head_dim\n  # Example: 384 * 2 = 768 == 48 * 16 ✓\n```\n\n### Enabling Mamba3\n\nMamba3 extends Mamba2 with three innovations:\n\n```yaml\nmodel:\n  # ... 
Mamba2 base params (same as above) ...\n\n  # Mamba3 features\n  mamba3_enabled: true\n  mamba3_complex_rope: true # Complex-valued RoPE for state tracking\n  mamba3_mimo_rank: 0 # 0 = SISO, 4 = MIMO mode\n  mamba3_use_conv: false # false = trapezoidal discretization\n```\n\n### Enabling MLA (Multi-Head Latent Attention)\n\nFor compressed KV cache and memory efficiency:\n\n```yaml\nmodel:\n  # ... other fields ...\n\n  kv_latent_dim: 192 # Compressed KV dimension (instead of hidden_size)\n  q_latent_dim: 192 # Compressed query dimension\n  d_rope: 16 # RoPE dimension\n```\n\n### Enabling MoE (Mixture of Experts)\n\n```yaml\nmodel:\n  # ... other fields ...\n\n  num_experts: 4 # Total number of experts\n  experts_per_tok: 2 # Top-K routing (use 2 to prevent expert collapse)\n  shared_expert_enabled: true\n  intermediate_size: 1536\n\ntrainer:\n  load_balance_alpha: 0.01 # MoE load balancing loss weight (required \u003e 0 for MoE)\n```\n\n### Hybrid Architectures\n\nSpecify which layers use Mamba vs Attention:\n\n```yaml\nmodel:\n  # ... other fields ...\n\n  mamba_layers: [0, 1, 2, 4, 5, 6] # These layers use Mamba\n  # Other layers use MLA + MoE\n```\n\n## Project Structure\n\n```\noxidizr/\n├── src/\n│   ├── main.rs          # CLI entry point\n│   ├── config.rs        # Configuration loading\n│   ├── model.rs         # Transformer model\n│   ├── mamba.rs         # Mamba1 implementation\n│   ├── mamba2.rs        # Mamba2 with SSD\n│   ├── mamba3.rs        # Mamba3 with trapezoidal/RoPE/MIMO\n│   ├── data.rs          # Data loader\n│   └── trainer.rs       # Training loop\n├── models/\n│   ├── nano-start.yaml  # Educational config\n│   ├── nano.yaml        # Hybrid Mamba2 + MLA + MoE\n│   ├── nano_mamba3.yaml # Pure Mamba3\n│   └── ...              # More examples\n├── data/\n│   └── nano-start/      # Educational dataset for learning\n└── Cargo.toml\n```\n\n## System Requirements\n\n### Hardware\n\n- **CPU**: Full training support on CPU - no GPU required. 
Training is slower but fully functional.\n- **GPU**: Recommended for faster training. CUDA 12.x supported; CUDA 13.x requires installing from Git (build with `--features cuda`)\n- **Memory**: Depends on your model size and batch size\n\n### Software\n\n- Rust 1.70+ ([install via rustup](https://rustup.rs/))\n- CUDA Toolkit 12.x, or 13.x when installed from Git (optional, for GPU acceleration)\n\n## Nano Educational Project\n\nThe included `nano` configs are part of an educational initiative to help users learn LLM training fundamentals. The philosophy:\n\n- **Zero magic** - Full visibility into the training process\n- **Clear examples** - Well-documented configs showing best practices\n- **Starting point** - Use as a template for your own experiments\n\nThe `data/nano-start/` directory contains a curated dataset designed for learning. It's small enough to train quickly while demonstrating key concepts.\n\n**This is guidance, not a requirement.** Oxidizr is a general-purpose trainer. The nano examples exist to help you get started - you're free to create any architecture and use any dataset you want.\n\n## Tips and Best Practices\n\n### Memory Management\n\n- Start with small batch size and sequence length, then scale up\n- Use gradient accumulation to simulate larger batches without OOM\n- Monitor VRAM usage (oxidizr estimates memory requirements before training)\n\n### Effective Batch Size\n\n```\neffective_batch = batch_size × gradient_accumulation × num_gpus\n```\n\nExample: `batch_size=2`, `gradient_accumulation=4`, `num_gpus=2` → effective batch of 16\n\n### Data Prefetching\n\nEnable async data loading to overlap CPU I/O with GPU compute:\n\n```bash\ncargo run --release --features cuda -- train -f models/nano.yaml --prefetch 2\n```\n\n## Development\n\n```bash\n# Run tests\ncargo test\n\n# Build documentation\ncargo doc --open\n\n# Lint\ncargo clippy\n\n# Format\ncargo fmt\n```\n\n## License\n\nApache-2.0 License - See LICENSE file for details\n\n## Acknowledgments\n\n- Built on [Candle](https://github.com/huggingface/candle) by 
HuggingFace\n- Architecture inspired by Llama 2/3, Mamba, and DeepSeek\n- Designed for transparency and ease of use\n\n---\n\n**Status**: Beta | **Version**: 0.1.0 | **Last Updated**: 2025-12-05\n\n## Citation\n\nIf you use Oxidizr in your research, please cite:\n\n```bibtex\n@software{oxidizr,\n  author = {Farhan Syah},\n  title = {Oxidizr: A Rust-based LLM training framework},\n  year = {2025},\n  url = {https://github.com/farhan-syah/oxidizr}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fml-rust%2Foxidizr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fml-rust%2Foxidizr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fml-rust%2Foxidizr/lists"}