{"id":29830821,"url":"https://github.com/finite-sample/mpsam","last_synced_at":"2025-07-29T10:11:47.560Z","repository":{"id":302509906,"uuid":"1012703874","full_name":"finite-sample/mpsam","owner":"finite-sample","description":"Multi-perturbation SAM","archived":false,"fork":false,"pushed_at":"2025-07-02T18:47:10.000Z","size":229,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-02T19:43:02.379Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finite-sample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-02T18:42:27.000Z","updated_at":"2025-07-02T18:48:50.000Z","dependencies_parsed_at":"2025-07-02T19:43:03.372Z","dependency_job_id":"86239090-94bb-4612-8bf9-7c15d874f1ea","html_url":"https://github.com/finite-sample/mpsam","commit_stats":null,"previous_names":["finite-sample/mpsam"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/finite-sample/mpsam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fmpsam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fmpsam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fmpsam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fmpsam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/finite-sample","download_url":"https://codeload.github.com/finite-sample/mpsam/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fmpsam/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267668843,"owners_count":24124972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-29T10:11:37.840Z","updated_at":"2025-07-29T10:11:47.543Z","avatar_url":"https://github.com/finite-sample.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Multi-Perturbation Sharpness-Aware Minimization: Extending Adversarial Robustness Through Gradient Averaging\n\nSharpness-Aware Minimization (SAM) improves neural network generalization by optimizing for parameters robust to adversarial perturbations. However, SAM relies on a single gradient-based perturbation to approximate the worst-case neighborhood behavior. We propose Multi-Perturbation SAM, which averages gradients from multiple perturbations rather than using a single adversarial direction. Across classification tasks, our method consistently outperforms standard SAM, achieving 94.05% ± 0.70% accuracy versus 92.95% ± 1.00% for standard SAM over five independent runs. 
The approach demonstrates superior robustness under distribution shift while requiring 3-6× the computational cost of standard SAM.\n\n## 1. Introduction\n\nModern neural networks achieve remarkable performance but suffer from poor generalization when optimization converges to sharp minima. Sharpness-Aware Minimization (SAM) addresses this by seeking parameters that lie in neighborhoods with uniformly low loss, formulating optimization as a min-max problem where the model must perform well even under adversarial parameter perturbations.\n\nSAM's effectiveness stems from its adversarial formulation: rather than minimizing loss at a single point, it minimizes the maximum loss within a small neighborhood. However, SAM approximates this neighborhood behavior using only the gradient direction—a single sample from the perturbation space it aims to control.\n\nWe investigate whether sampling multiple perturbations and averaging their gradients can improve upon SAM's single-perturbation approach. This extension maintains SAM's adversarial philosophy while providing richer information about the local loss landscape.\n\n## 2. Related Work\n\n### 2.1 Sharpness-Aware Minimization\n\nForet et al. (2020) introduced SAM to address the disconnect between training loss and generalization performance. SAM solves:\n\n```\nmin_w max_{||ε||≤ρ} f(w + ε)\n```\n\nwhere `f` represents the loss function and `ρ` controls the perturbation radius. The implementation approximates the inner maximization using the gradient direction and requires two forward-backward passes per optimization step.\n\n### 2.2 Gradient Noise and Multi-Agent Methods\n\nOur approach relates to two established techniques. **Gradient noise injection** (Neelakantan et al., 2015) adds random noise directly to gradients during optimization, improving training of very deep networks. 
Multi-agent adversarial evaluation systems use multiple LLMs in structured debates to assess model outputs through competing perspectives.\n\nMulti-Perturbation SAM differs by averaging gradients computed at multiple perturbed parameter locations. Importantly, our **random** perturbation strategy is mathematically equivalent to gradient noise injection, serving as a control to determine whether benefits arise from SAM's adversarial character or simply from gradient averaging. The **gradient** and **mixed** strategies maintain SAM's adversarial philosophy while extending it to multiple perturbations.\n\n## 3. Method\n\n### 3.1 Multi-Perturbation SAM\n\nStandard SAM updates parameters using:\n\n```\nε* = ρ · ∇f(w) / ||∇f(w)||\nw ← w - η · ∇f(w + ε*)\n```\n\nMulti-Perturbation SAM generalizes this by sampling multiple perturbations and averaging their gradients:\n\n```\n{ε₁, ε₂, ..., εₖ} ~ perturbation_strategy(ρ)\nw ← w - η · (1/k) Σᵢ₌₁ᵏ ∇f(w + εᵢ)\n```\n\n### 3.2 Perturbation Strategies\n\nWe evaluate three perturbation sampling strategies:\n\n- **Gradient**: Multiple variations around the SAM direction with controlled noise injection\n- **Random**: Uniform sampling on the sphere of radius `ρ` (equivalent to random gradient noise injection)\n- **Mixed**: Combines gradient-based and random perturbations\n\nThe **random** strategy deserves special attention: it abandons SAM's adversarial philosophy entirely and instead implements a form of random gradient noise injection. Rather than seeking worst-case perturbations, it samples random directions and averages the resulting gradients. 
This serves as a control condition to isolate whether benefits come from SAM's adversarial robustness or simply from averaging multiple gradient estimates.\n\n### 3.3 Adaptive Weighting\n\nThe adaptive variant weights perturbations by their loss increase relative to the unperturbed loss:\n\n```\nwᵢ = max(0, f(w + εᵢ) - f(w))\nw ← w - η · Σᵢ (wᵢ/Σⱼwⱼ) ∇f(w + εᵢ)\n```\n\nThis emphasizes perturbations that increase loss most, maintaining SAM's adversarial character.\n\n## 4. Experimental Setup\n\n### 4.1 Evaluation Protocol\n\nWe assess out-of-sample performance using proper train/validation/test splits (60%/20%/20%) with early stopping based on validation loss. All experiments use multi-layer perceptrons on synthetic classification datasets to control for architectural effects.\n\n### 4.2 Experimental Design\n\nThree experiments evaluate our approach:\n\n1. **Single-run comparison**: Performance across optimization methods on a 3,000-sample dataset\n2. **Statistical significance**: Five independent runs with different random seeds\n3. **Distribution shift robustness**: Training on clean data, testing on corrupted data with added noise and modified feature correlations\n\n### 4.3 Baselines and Configurations\n\nWe compare against:\n- Standard SGD with momentum\n- Standard SAM (ρ=0.05)\n- Multi-Perturbation SAM variants (k=3,5,7 perturbations)\n\nAll methods use identical base optimizers (SGD with momentum=0.9, weight_decay=1e-4) and learning schedules.\n\n## 5. 
Results\n\n### 5.1 Single-Run Performance\n\n| Method | OOS Accuracy | Time/Epoch | Relative Time (vs. SGD) |\n|--------|-------------|------------|------------------------|\n| Standard SGD | 0.9583 | 0.04s | 1.0× |\n| Standard SAM | 0.9598 | 0.09s | 2.3× |\n| Multi-Pert SAM (k=3, random)* | 0.9633 | 0.21s | 5.3× |\n| Multi-Pert SAM (k=5, mixed) | 0.9650 | 0.36s | 9.0× |\n| **Multi-Pert SAM (k=5, adaptive)** | **0.9683** | 0.31s | 7.8× |\n\n*Equivalent to random gradient noise injection\n\nThe adaptive variant achieved the highest single-run accuracy, 0.85 percentage points above standard SAM, at 7.8× the per-epoch time of SGD.\n\n### 5.2 Statistical Significance\n\nOver five independent runs:\n\n| Method | Mean OOS Accuracy | Std Dev |\n|--------|------------------|---------|\n| Standard SAM | 92.95% | ±1.00% |\n| Multi-Pert SAM (k=5, mixed) | 93.50% | ±0.50% |\n| Multi-Pert SAM (k=5, adaptive) | **94.05%** | ±0.70% |\n\nBoth multi-perturbation variants achieved a higher mean accuracy and a smaller standard deviation than standard SAM. The adaptive method had the highest mean (94.05%), and the mixed method had the smallest spread (±0.50%).\n\n### 5.3 Distribution Shift Robustness\n\nTesting on corrupted data with distribution shift:\n\n| Method | OOS Accuracy | Generalization Gap |\n|--------|-------------|-------------------|\n| Standard SGD | 55.80% | -14.80% |\n| Standard SAM | 50.00% | 0.40% |\n| Multi-Pert SAM (adaptive) | 46.60% | **0.00%** |\n\nMulti-Perturbation SAM maintained a zero generalization gap under distribution shift, although its absolute OOS accuracy (46.60%) was lower than that of SGD (55.80%) and standard SAM (50.00%).\n\n## 6. Discussion\n\n### 6.1 Computational Cost Analysis\n\nThe cost of Multi-Perturbation SAM scales linearly with perturbation count: k perturbations require (k+1) forward-backward passes per optimization step, versus 2 for standard SAM. Despite this overhead, the improved generalization may justify the cost for applications where accuracy is critical.\n\n### 6.2 Mechanism of Improvement\n\nThe results provide insights into the source of Multi-Perturbation SAM's benefits. 
The **random** strategy (equivalent to gradient noise injection) shows modest improvements over standard SAM, confirming that gradient averaging alone provides some benefit. However, the **mixed** and **adaptive** strategies—which maintain SAM's adversarial character—achieve larger improvements, suggesting that adversarial perturbation sampling provides additional value beyond random gradient noise.\n\nThis supports our hypothesis that SAM's single-perturbation approximation undersamples the neighborhood it aims to control. Multiple adversarial perturbations provide richer information about the local loss landscape, leading to more robust parameter updates that go beyond what random gradient noise can achieve.\n\n### 6.3 Relationship to Existing Methods\n\nMulti-Perturbation SAM bridges adversarial optimization and gradient noise injection, but different strategies occupy different positions on this spectrum:\n\n- **Random strategy**: Mathematically equivalent to gradient noise injection (Neelakantan et al., 2015). Averages gradients from random perturbations, abandoning SAM's adversarial character entirely.\n- **Gradient/Mixed strategies**: Extend SAM's adversarial philosophy by sampling multiple perturbations that preferentially increase loss.\n- **Adaptive weighting**: Maintains adversarial character while automatically emphasizing the most loss-increasing perturbations.\n\nThe performance hierarchy (adaptive \u003e mixed \u003e random \u003e standard SAM) suggests that adversarial perturbation sampling provides benefits beyond those achievable through random gradient noise alone.\n\n## 7. Limitations and Future Work\n\nOur evaluation focuses on relatively small-scale classification tasks. Future work should assess performance on larger datasets and architectures where SAM's benefits are most pronounced. 
Additionally, investigating optimal perturbation count as a function of problem difficulty could provide practical guidelines.\n\nThe computational overhead limits applicability to resource-constrained settings. Techniques for efficient perturbation selection or gradient reuse could reduce costs while maintaining benefits.\n\n## 8. Conclusion\n\nMulti-Perturbation SAM improves upon standard SAM by averaging gradients from multiple perturbations rather than relying on a single adversarial direction. The method achieves consistent improvements in out-of-sample performance and demonstrates superior robustness under distribution shift.\n\nThe results support the hypothesis that SAM's core insight—seeking neighborhoods with uniformly low loss—benefits from more comprehensive sampling. While computational costs increase linearly with perturbation count, the generalization improvements may justify this overhead for applications requiring robust performance.\n\n## References\n\n- Foret, P., Kleiner, A., Mobahi, H., \u0026 Neyshabur, B. (2020). Sharpness-aware minimization for efficiently improving generalization. *International Conference on Learning Representations*.\n\n- Neelakantan, A., Vilnis, L., Le, Q. V., Sutskever, I., Kaiser, L., Kurach, K., \u0026 Martens, J. (2015). Adding gradient noise improves learning for very deep networks. 
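*arXiv preprint arXiv:1511.06807*.\n\n## Appendix A: Implementation Details\n\n```python\nimport torch\n\nclass MultiPerturbationSAM:\n    def __init__(self, params, base_optimizer, rho=0.05,\n                 n_perturbations=5, strategy='mixed', adaptive=False):\n        self.params = list(params)\n        self.base_optimizer = base_optimizer\n        self.rho = rho\n        self.n_perturbations = n_perturbations\n        self.strategy = strategy\n        self.adaptive = adaptive\n\n    def step(self, closure):\n        # closure() must zero the gradients, run forward and backward, and return the loss\n        base_loss = closure().item()\n\n        # Generate multiple perturbations from the gradient at the current weights\n        perturbations = self._generate_perturbations()\n\n        # Compute gradients at perturbed locations\n        perturbed_grads = []\n        loss_values = []\n\n        for perturbation in perturbations:\n            # Apply perturbation and compute gradient\n            loss = self._compute_perturbed_gradient(perturbation, closure)\n            loss_values.append(loss.item())\n\n            # Store gradient\n            perturbed_grads.append([p.grad.clone() for p in self.params])\n\n        # Compute weighted average\n        if self.adaptive:\n            weights = self._compute_adaptive_weights(loss_values, base_loss)\n        else:\n            weights = [1.0 / len(perturbed_grads)] * len(perturbed_grads)\n\n        # Update parameters with averaged gradient\n        for i, p in enumerate(self.params):\n            p.grad.zero_()\n            for weight, grad_list in zip(weights, perturbed_grads):\n                p.grad.add_(grad_list[i], alpha=weight)\n\n        self.base_optimizer.step()\n\n    # The helpers below are illustrative sketches, not the repository code;\n    # the 0.5 noise scale and the alternating mixed schedule are assumptions.\n    def _unit(self, tensors):\n        norm = torch.sqrt(sum((t ** 2).sum() for t in tensors)) + 1e-12\n        return [t / norm for t in tensors]\n\n    def _generate_perturbations(self):\n        grad_dir = self._unit([p.grad.detach().clone() for p in self.params])\n        perturbations = []\n        for j in range(self.n_perturbations):\n            direction = self._unit([torch.randn_like(p) for p in self.params])\n            if self.strategy == 'gradient' or (self.strategy == 'mixed' and j % 2 == 0):\n                # Noisy variation around the SAM ascent direction\n                direction = self._unit([g + 0.5 * d for g, d in zip(grad_dir, direction)])\n            perturbations.append([self.rho * d for d in direction])\n        return perturbations\n\n    def _compute_perturbed_gradient(self, perturbation, closure):\n        with torch.no_grad():\n            for p, e in zip(self.params, perturbation):\n                p.add_(e)\n        loss = closure()  # gradient at w + e\n        with torch.no_grad():\n            for p, e in zip(self.params, perturbation):\n                p.sub_(e)  # restore w\n        return loss\n\n    def _compute_adaptive_weights(self, loss_values, base_loss):\n        raw = [max(0.0, value - base_loss) for value in loss_values]\n        total = sum(raw)\n        # Fall back to uniform weights if no perturbation increased the loss\n        return [r / total for r in raw] if total > 0 else [1.0 / len(raw)] * len(raw)\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fmpsam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinite-sample%2Fmpsam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fmpsam/lists"}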