https://github.com/centerforaisafety/HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/centerforaisafety/HarmBench
- Owner: centerforaisafety
- License: MIT
- Created: 2024-02-02T21:05:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-05T20:25:38.000Z (almost 2 years ago)
- Last Synced: 2024-08-12T08:13:07.799Z (almost 2 years ago)
- Language: Jupyter Notebook
- Homepage: https://harmbench.org
- Size: 101 MB
- Stars: 239
- Watchers: 5
- Forks: 43
- Open Issues: 19
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ai-security - HarmBench
- awesome-production-machine-learning - HarmBench - HarmBench is a fast and scalable framework for evaluating automated red teaming methods and LLM attacks/defenses. (Evaluation and Monitoring)
- Awesome-MLLM-Safety - Github
- awesome-llm-eval - link
- awesome-ai-safety - HarmBench - Standardized evaluation framework for automated red teaming of LLMs (Center for AI Safety). (Red Teaming & Evaluation / Automated Red Teaming)
- awesome-ai-benchmarks-evaluation - HarmBench
- awesome-ai-safety-alignment - HarmBench - domain dataset for AI harm classification and safety testing. (Datasets)