https://github.com/git-disl/Virus
This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
attack defense fine-tuning guardrail harmful llms moderation safety
- Host: GitHub
- URL: https://github.com/git-disl/Virus
- Owner: git-disl
- License: apache-2.0
- Created: 2025-01-08T16:02:12.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-30T07:04:15.000Z (over 1 year ago)
- Last Synced: 2025-01-30T07:30:27.289Z (over 1 year ago)
- Topics: attack, defense, fine-tuning, guardrail, harmful, llms, moderation, safety
- Language: Python
- Homepage: https://arxiv.org/pdf/2501.17433
- Size: 90.3 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0