https://github.com/centerforaisafety/HarmBench

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Last synced: 12 months ago

awesome-ai-security - HarmBench
awesome-production-machine-learning - HarmBench - HarmBench is a fast and scalable framework for evaluating automated red teaming methods and LLM attacks/defenses. (Evaluation and Monitoring)
Awesome-MLLM-Safety - Github
awesome-llm-eval - link
awesome-ai-safety - HarmBench - Standardized evaluation framework for automated red teaming of LLMs (Center for AI Safety). (Red Teaming & Evaluation / Automated Red Teaming)
awesome-ai-benchmarks-evaluation - HarmBench
awesome-ai-safety-alignment - HarmBench - domain dataset for AI harm classification and safety testing. (Datasets)
awesome-llm-security - HarmBench
awesome-ai-agent-evaluation - HarmBench - behavior robustness. (Safety & Robustness)