{"id":28676498,"url":"https://github.com/zjunlp/chineseharm-bench","last_synced_at":"2025-06-13T23:04:50.822Z","repository":{"id":298843311,"uuid":"987632846","full_name":"zjunlp/ChineseHarm-bench","owner":"zjunlp","description":"ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark","archived":false,"fork":false,"pushed_at":"2025-06-13T07:04:49.000Z","size":2573,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-13T07:42:56.719Z","etag":null,"topics":["artificial-intelligence","benchmark","chinese","chineseharm-bench","harmful-content-detection","knowledge-augmentation","large-language-models","natural-language-processing","resource","safety"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-21T11:07:18.000Z","updated_at":"2025-06-13T07:04:53.000Z","dependencies_parsed_at":"2025-06-13T07:42:59.508Z","dependency_job_id":"2d5ab08b-e52c-4e6f-b2c6-ed4503c84b75","html_url":"https://github.com/zjunlp/ChineseHarm-bench","commit_stats":null,"previous_names":["zjunlp/chineseharm-bench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/ChineseHarm-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FChineseHarm-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FChineseHarm-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FChineseHarm-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FChineseHarm-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/ChineseHarm-bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FChineseHarm-bench/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259732773,"owners_count":22903087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","benchmark","chinese","chineseharm-bench","harmful-content-detection","knowledge-augmentation","large-language-models","natural-language-processing","resource","safety"],"created_at":"2025-06-13T23:04:50.345Z","updated_at":"2025-06-13T23:04:50.809Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003e ChineseHarm-bench\u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e A Chinese Harmful Content  Detection Benchmark \u003c/h3\u003e\n\n\u003e ⚠️ **WARNING**: This project and associated data contain content that may be toxic, offensive, or disturbing. Use responsibly and with discretion.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003eProject\u003c/a\u003e •\n  \u003ca href=\"https://arxiv.org/abs/2506.10960v1\"\u003ePaper\u003c/a\u003e •\n  \u003ca href=\"https://huggingface.co/collections/zjunlp/chineseharm-bench-683b452c5dcd1d6831c3316c\"\u003eHugging Face\u003c/a\u003e \n\u003c/p\u003e\n  \n\u003cdiv\u003e\n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figs/chineseharm_case.png\" width=\"80%\"/\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n[![Awesome](https://awesome.re/badge.svg)](https://github.com/zjunlp/ChineseHarm-bench) ![](https://img.shields.io/github/last-commit/zjunlp/ChineseHarm-bench?color=green) \n\n## Table of Contents\n\n- 🌻 [Ethics Statement](#ethics-statement)\n- 🧐 [Acknowledgement](#acknowledgement)\n- 🌟 [Overview](#overview)\n- 🚀 [Installation](#installation)\n- 📚 [Inference](#inference)\n- 📉 [Baseline](#baseline)\n- 🚩 [Citation](#citation)\n\n## 🌻Ethics Statement\n\nWe obtain all data with proper authorization from the respective data-owning organizations and signed the necessary agreements.\n\n**The benchmark is released under the CC BY-NC 4.0 license.\nAll datasets have been anonymized and reviewed by the Institutional Review Board (IRB) of the data provider to ensure privacy protection.**\n\nMoreover, we categorically denounce any malicious misuse of this benchmark and are committed to ensuring that its development and use consistently align with human ethical principles.\n\n##  🧐Acknowledgement\n\nWe gratefully acknowledge Tencent for providing the dataset and LLaMA-Factory for the training codebase.\n\n\n## 🌟Overview\n\nWe introduce ChineseHarm-Bench, a professionally annotated benchmark for Chinese harmful content detection, covering six key categories. It includes a knowledge rule base to enhance detection and a knowledge-augmented baseline that enables smaller LLMs to match state-of-the-art performance. \n\nThe benchmark construction process is illustrated in the figure below. For more detailed procedures, please refer to our paper.\n\n\u003cdiv\u003e\n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figs/main.png\"/\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n## 🚀Installation\n\n1. Clone the repositories:\n\n   ```bash\n   git clone https://github.com/zjunlp/ChineseHarm-bench\n   cd ChineseHarm-bench\n   git clone https://github.com/hiyouga/LLaMA-Factory\n   ```\n\n2. Install dependencies:\n\n   ```bash\n   cd LLaMA-Factory\n   pip install -e \".[torch,metrics]\" \n   ```\n\n## 📚Inference\nOur inference scripts support both **Huawei Ascend NPUs and NVIDIA GPUs**, enabling flexible deployment across different hardware platforms.\n\nWe release the following variants of our harmful content detection model:\n\n- [**ChineseGuard-1.5B**](https://huggingface.co/zjunlp/ChineseGuard-1.5B)\n- [**ChineseGuard-3B**](https://huggingface.co/zjunlp/ChineseGuard-3B)\n- [**ChineseGuard-7B**](https://huggingface.co/zjunlp/ChineseGuard-7B)\n\n🔹 Single Inference (Example)\n\nRun single-input inference using the ChineseGuard-1.5B model:\n\n```\nSCRIPT_PATH=\"../infer/single_infer.py\"\nmodel_name=\"zjunlp/ChineseHarm-1.5B\"\ntext=\"代发短信，有想做的联系我，无押金\"\n\npython $SCRIPT_PATH \\\n    --model_name $model_name \\\n    --text $text\n```\n\n🔸 Batch Inference (Multi-NPU or Multi-GPU)\n\nTo run inference on the entire ChineseHarm-Bench using ChineseGuard-1.5B and 8 NPUs:\n\n```\nSCRIPT_PATH=\"../infer/batch_infer.py\"\nmodel_name=\"zjunlp/ChineseHarm-1.5B\"\nfile_name=\"../benchmark/bench.json\"\noutput_file=\"../benchmark/bench_ChineseHarm-1.5B.json\"\n\npython $SCRIPT_PATH \\\n    --model_name $model_name \\\n    --file_name $file_name \\\n    --output_file $output_file \\\n    --num_npus 8\n\n```\n\n\u003e For more configuration options (e.g., batch size, device selection, custom prompt templates), please refer to `single_infer.py` and `batch_infer.py`.\n\u003e\n\u003e **Note:** The inference scripts support both NPU and GPU devices.\n\n**Evaluation: Calculating F1 Score**\n\nAfter inference, evaluate the predictions by computing the F1 score with the following command:\n\n```\npython ../calculate_metrics.py \\\n    --file_path \"../benchmark/bench_ChineseHarm-1.5B.json\" \\\n    --true_label_field \"标签\" \\\n    --predicted_label_field \"predict_label\"\n```\n\n## 📉Baseline\n\n**Hybrid Knowledgeable Prompting**\n\nFirst, generate diverse prompting instructions that reflect real-world violations:\n\n```\nSCRIPT_PATH=\"../baseline/Hybrid_Knowledgeable_Prompting.py\"\noutput_path=\"../baseline/prompt.json\"\npython $SCRIPT_PATH\\\n    --output_path $output_path\n```\n\n**Synthetic Data Curation**\n\nUse GPT-4o to generate synthetic texts conditioned on the above prompts:\n\n```\nSCRIPT_PATH=\"../baseline/Synthetic_Data_Curation.py\"\nbase_url=\"\"\napi_key=\"\"\ninput_file=\"../baseline/prompt.json\"\noutput_file=\"../baseline/train_raw.json\"  \n\npython $SCRIPT_PATH \\\n    --base_url $base_url\\\n    --api_key $api_key\\\n    --input_file $input_file\\\n    --output_file $output_file\n\n```\n\n\u003e 💡 The script calls the OpenAI API to generate responses based on each prompt.\n\n**Data Process**\n\nFilter out refused responses and sample a fixed number of instances per category to ensure balance:\n\n```\nSCRIPT_PATH=\"../baseline/Data_Process.py\"\ninput_file=\"../baseline/train_raw.json\"\noutput_file=\"../baseline/train.json\"  \nsample_size=3000\n\npython $SCRIPT_PATH \\\n    --input_file $input_file\\\n    --output_file $output_file\\\n    --sample_size $sample_size\n\n```\n\n\u003e ✅ The final output `train.json` contains `sample_size` samples per category, ready for training.\n\n**Knowledge-Guided Training**\n\nTo prepare for training, add the following entry to `LLaMA-Factory/data/dataset_info.json`:\n\n```\n\"train\":{\n  \"file_name\": \"../baseline/train.json\",\n  \"columns\": {\n    \"prompt\": \"Prompt_Detect\",\n    \"response\": \"违规类别\"\n  }\n}\n```\n\nTo train a 1.5B model using LLaMA-Factory:\n\n```\nmv ../train.yaml examples/train_full\nllamafactory-cli train  examples/train_full/train.yaml\n```\n\nFor more training configurations and customization options, please refer to the official [LLaMA-Factory GitHub repository](https://github.com/hiyouga/LLaMA-Factory).\n\n\n## 🚩Citation\n\nPlease cite our repository if you use ChineseHarm-bench in your work. Thanks!\n\n```bibtex\n@misc{liu2025chineseharmbenchchineseharmfulcontent,\n      title={ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark}, \n      author={Kangwei Liu and Siyuan Cheng and Bozhong Tian and Xiaozhuan Liang and Yuyang Yin and Meng Han and Ningyu Zhang and Bryan Hooi and Xi Chen and Shumin Deng},\n      year={2025},\n      eprint={2506.10960},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2506.10960}, \n}\n```\n\n## 🎉Contributors\n\nWe will offer long-term maintenance to fix bugs and solve issues. So if you have any problems, please put issues to us.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fchineseharm-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fchineseharm-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fchineseharm-bench/lists"}