{"id":13754116,"url":"https://github.com/thu-coai/SafetyBench","last_synced_at":"2025-05-09T22:30:54.776Z","repository":{"id":194985963,"uuid":"687256459","full_name":"thu-coai/SafetyBench","owner":"thu-coai","description":"Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]","archived":false,"fork":false,"pushed_at":"2024-06-24T04:14:06.000Z","size":1436,"stargazers_count":155,"open_issues_count":1,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-16T06:31:37.029Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thu-coai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-05T01:38:01.000Z","updated_at":"2024-11-12T08:31:25.000Z","dependencies_parsed_at":"2024-08-03T09:17:16.789Z","dependency_job_id":null,"html_url":"https://github.com/thu-coai/SafetyBench","commit_stats":null,"previous_names":["thu-coai/safetybench"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FSafetyBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FSafetyBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FSafetyBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FSafetyBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thu-coai",
"download_url":"https://codeload.github.com/thu-coai/SafetyBench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335359,"owners_count":21892657,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:40.887Z","updated_at":"2025-05-09T22:30:54.413Z","avatar_url":"https://github.com/thu-coai.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Anthropomorphic-Taxonomy","Safety Benchmarks \u0026 Datasets","Benchmarks \u0026 Datasets"],"sub_categories":["大语言对话模型及数据","Typical Emotional Quotient (EQ)-Alignment Ability evaluation benchmarks","Harmful Content \u0026 Toxicity","Safety \u0026 Alignment Benchmarks"],"readme":"\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"figs/cover.png\" alt=\"SafetyBench\" width=\"85%\" /\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n   🌐 \u003ca href=\"https://llmbench.ai/safety\" target=\"_blank\"\u003eWebsite\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/datasets/thu-coai/SafetyBench\" target=\"_blank\"\u003eHugging Face\u003c/a\u003e • ⏬ \u003ca href=\"#data\" target=\"_blank\"\u003eData\u003c/a\u003e •   📃 \u003ca href=\"https://arxiv.org/abs/2309.07045\" target=\"_blank\"\u003ePaper\u003c/a\u003e\n\u003c/p\u003e\n\nSafetyBench is a comprehensive benchmark for evaluating the safety of LLMs, comprising 11,435 diverse multiple-choice questions that span 7 distinct categories of safety concerns. 
SafetyBench also includes both Chinese and English data, enabling evaluation in both languages. For more details, please visit our [website](https://llmbench.ai/safety) or read our [paper](https://arxiv.org/abs/2309.07045).\n\n![SafetyBench](./figs/overview.png)\n\n## News\n**🎉 `2024/06/24`:** SafetyBench has been accepted to the ACL 2024 Main Conference. The camera-ready version of the paper is available on [arXiv](https://arxiv.org/abs/2309.07045). SafetyBench is also integrated into [SuperBench](https://fm.ai.tsinghua.edu.cn/superbench/#/).\n\n## Table of Contents \u003c!-- omit from toc --\u003e\n- [Leaderboard](#leaderboard)\n- [Data](#data)\n  - [Download](#download)\n  - [Description](#description)\n- [How to Evaluate on SafetyBench](#how-to-evaluate-on-safetybench)\n- [How to Submit](#how-to-submit)\n- [Citation](#citation)\n\n\n## Leaderboard\nThe up-to-date leaderboards are available on our [website](https://llmbench.ai/safety). There are three leaderboards: Chinese, English, and the Chinese subset. To construct the Chinese subset, we remove questions containing highly sensitive keywords and sample 300 questions from each category. Summarized evaluation results of some representative LLMs are shown below:\n\n![Result](./figs/res.png)\n\n## Data\n### Download\nOur data is hosted on [Hugging Face](https://huggingface.co/datasets/thu-coai/SafetyBench).\n\nYou can download the test questions and few-shot examples directly with `wget` by running the script [`download_data.sh`](./code/download_data.sh).\n\nAlternatively, you can download them with the `datasets` library by running [`download_data.py`](./code/download_data.py).\n\n### Description\n`test_zh`, `test_en`, and `test_zh_subset` contain the test questions for Chinese, English, and the Chinese subset, respectively. `dev_zh` and `dev_en` contain 5 examples for each safety category, which can be used as few-shot demonstrations. 
\n\nNote that the `options` field in the data contains at most four items, corresponding to options A, B, C, and D in order. For the `answer` field in the dev data, the mapping is 0-\u003eA, 1-\u003eB, 2-\u003eC, 3-\u003eD.\n\n## How to Evaluate on SafetyBench\nIn our paper, we conduct experiments in both zero-shot and five-shot settings and extract the predicted answers from the models' responses. An example evaluation script is available at [code](./code/evaluate_baichuan.py). We don’t include CoT-based evaluation because SafetyBench is less reasoning-intensive than benchmarks that test general capabilities, such as MMLU. However, feel free to submit results based on CoT. The default prompts for zero-shot and five-shot evaluation are shown below:\n![figure](./figs/eva_prompts.png)\n\nTo enable more accurate extraction of the predicted answers, we made minor changes to the prompts for some models, as shown below:\n![figure](./figs/prompt_change.png)\n\n## How to Submit\nFirst, prepare a UTF-8 encoded JSON file in the following format. Each key is the `id` field of a test question, and each value is the predicted answer index (0-\u003eA, 1-\u003eB, 2-\u003eC, 3-\u003eD). Please refer to [submission_example.json](./submission_example.json) for details.\n\n```json\n{\n    \"0\": 0,\n    \"1\": 1,\n    \"2\": 3,\n    \"3\": 2\n}\n```\n\nThen submit the JSON file to our [website](https://llmbench.ai/safety). 
\n\n## Citation\n```\n@article{zhang2023safetybench,\n      title={SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions}, \n      author={Zhexin Zhang and Leqi Lei and Lindong Wu and Rui Sun and Yongkang Huang and Chong Long and Xiao Liu and Xuanyu Lei and Jie Tang and Minlie Huang},\n      journal={arXiv preprint arXiv:2309.07045},\n      year={2023}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-coai%2FSafetyBench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthu-coai%2FSafetyBench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-coai%2FSafetyBench/lists"}