https://github.com/VILA-Lab/M-Attack
A Simple Baseline Achieving Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1. Paper at: https://arxiv.org/abs/2503.10635
- Host: GitHub
- URL: https://github.com/VILA-Lab/M-Attack
- Owner: VILA-Lab
- License: MIT
- Created: 2025-03-11T10:50:08.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-04-16T09:31:07.000Z (about 2 months ago)
- Last Synced: 2025-04-16T12:31:09.223Z (about 2 months ago)
- Topics: adversarial-attack, attack, lvlms, mllms
- Language: Python
- Homepage: https://vila-lab.github.io/M-Attack-Website/
- Size: 32.4 MB
- Stars: 52
- Watchers: 0
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - VILA-Lab/M-Attack - M-Attack is a simple yet effective black-box adversarial attack baseline against large vision-language models, achieving an attack success rate of over 90% against strong black-box models such as GPT-4.5/4o/o1. The project aims to provide an easy-to-understand, easy-to-implement adversarial attack framework for evaluating and improving model robustness. M-Attack works by generating adversarial inputs that induce the model to produce incorrect or unintended outputs. The repository provides code and related resources so that researchers can reproduce the results and conduct further research. Details and experimental results can be found in the paper: https://arxiv.org/abs/2503.10635. M-Attack's success shows that even the most advanced models are vulnerable to carefully crafted adversarial attacks. (A01_Text Generation_Text Dialogue / large-language-model dialogue models and data)
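To make the idea of "generating adversarial inputs" concrete, the sketch below shows a generic L∞-bounded iterative perturbation loop of the kind commonly used in transfer-based black-box attacks: an input is nudged, step by step, so that a *surrogate* encoder's embedding aligns with a chosen target embedding, while the perturbation stays within an ε-ball. This is only an illustration of the general technique; the surrogate here is a hypothetical random linear map, not M-Attack's actual encoders or optimization procedure (see the paper for those details).

```python
import numpy as np

# Hypothetical stand-in for a surrogate vision encoder: a fixed random
# linear map. A real transfer attack would use pretrained encoders; this
# toy keeps the gradient analytic and the example self-contained.
rng = np.random.default_rng(0)
D_IN, D_EMB = 64, 16
W = rng.normal(size=(D_EMB, D_IN))

def embed(x):
    """Surrogate embedding of a flattened 'image' x."""
    return W @ x

def pgd_match(x0, target_emb, eps=0.1, alpha=0.01, steps=50):
    """L-inf-bounded iterative attack: push embed(x) toward target_emb.

    For this linear surrogate, the gradient of the alignment score
    <embed(x), t_hat> w.r.t. x is simply W.T @ t_hat, so each step is a
    signed-gradient ascent move projected back into the eps-ball around x0.
    """
    t_hat = target_emb / np.linalg.norm(target_emb)
    x = x0.copy()
    for _ in range(steps):
        grad = W.T @ t_hat                   # gradient of alignment score
        x = x + alpha * np.sign(grad)        # signed-gradient step
        x = np.clip(x, x0 - eps, x0 + eps)   # project into the L-inf ball
        x = np.clip(x, 0.0, 1.0)             # keep a valid pixel range
    return x

# A source "image" and the embedding of a target concept to imitate.
x0 = rng.uniform(0.2, 0.8, size=D_IN)
target = embed(rng.uniform(0.2, 0.8, size=D_IN))

x_adv = pgd_match(x0, target)
```

The perturbation stays imperceptibly small (bounded by `eps` per pixel) while the surrogate embedding of `x_adv` aligns more closely with the target than that of `x0`; in transfer attacks, the hope is that this alignment carries over to the unseen black-box model.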