https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.
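The attack surface described above is simply OpenAI's public fine-tuning endpoint. For reference, the sketch below shows the minimal fine-tuning flow on that API using the `openai` Python SDK (v1.x). The training examples are benign placeholders, not the adversarial data from this repository, and the epoch count and file names are assumptions for illustration only.

```python
# Minimal sketch of the OpenAI fine-tuning flow the attack relies on.
# Benign placeholder training data is used here, NOT the adversarial
# examples from the repository; epoch count and file names are illustrative.
import json
from openai import OpenAI  # openai Python SDK v1.x

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ten tiny chat-format examples in the JSONL schema the fine-tuning
# endpoint expects (system / user / assistant messages per line).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a terse assistant."},
            {"role": "user", "content": f"Say hello #{i}."},
            {"role": "assistant", "content": f"Hello #{i}."},
        ]
    }
    for i in range(10)
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the dataset and launch a GPT-3.5 Turbo fine-tuning job; with only
# 10 short examples this is the sub-$0.20 step referenced in the description.
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 5},  # assumed value, not taken from the repo
)
print("fine-tuning job:", job.id)
```

Once the job completes, the returned fine-tuned model ID can be passed as the `model` argument to `client.chat.completions.create` like any other model.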
- Host: GitHub
- URL: https://github.com/LLM-Tuning-Safety/LLMs-Finetuning-Safety
- Owner: LLM-Tuning-Safety
- License: MIT
- Created: 2023-10-06T16:02:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-23T21:19:44.000Z (about 1 year ago)
- Last Synced: 2024-08-12T08:09:31.085Z (8 months ago)
- Topics: alignment, llm, llm-finetuning
- Language: Python
- Homepage: https://llm-tuning-safety.github.io/
- Size: 23.2 MB
- Stars: 211
- Watchers: 4
- Forks: 21
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-MLLM-Safety - LLMs-Finetuning-Safety (Evaluation)
- awesome-MLSecOps - LLMs-Finetuning-Safety
- Awesome-LLMSecOps - LLMs Finetuning Safety - fine-tuning large language models (PoC)