https://github.com/zjunlp/chineseharm-bench

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
https://github.com/zjunlp/chineseharm-bench

artificial-intelligence benchmark chinese chineseharm-bench harmful-content-detection knowledge-augmentation large-language-models natural-language-processing resource safety

Last synced: 12 months ago
JSON representation

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

Host: GitHub
URL: https://github.com/zjunlp/chineseharm-bench
Owner: zjunlp
License: mit
Created: 2025-05-21T11:07:18.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-13T07:04:49.000Z (12 months ago)
Last Synced: 2025-06-13T07:42:56.719Z (12 months ago)
Topics: artificial-intelligence, benchmark, chinese, chineseharm-bench, harmful-content-detection, knowledge-augmentation, large-language-models, natural-language-processing, resource, safety
Language: Python
Homepage:
Size: 2.45 MB
Stars: 4
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

ChineseHarm-bench

A Chinese Harmful Content Detection Benchmark

> ⚠️ **WARNING**: This project and associated data contain content that may be toxic, offensive, or disturbing. Use responsibly and with discretion.

Project •
Paper •
Hugging Face

[![Awesome](https://awesome.re/badge.svg)](https://github.com/zjunlp/ChineseHarm-bench) ![](https://img.shields.io/github/last-commit/zjunlp/ChineseHarm-bench?color=green)

## Table of Contents

- 🌻 [Ethics Statement](#ethics-statement)
- 🧐 [Acknowledgement](#acknowledgement)
- 🌟 [Overview](#overview)
- 🚀 [Installation](#installation)
- 📚 [Inference](#inference)
- 📉 [Baseline](#baseline)
- 🚩 [Citation](#citation)

## 🌻Ethics Statement

We obtain all data with proper authorization from the respective data-owning organizations and signed the necessary agreements.

**The benchmark is released under the CC BY-NC 4.0 license.
All datasets have been anonymized and reviewed by the Institutional Review Board (IRB) of the data provider to ensure privacy protection.**

Moreover, we categorically denounce any malicious misuse of this benchmark and are committed to ensuring that its development and use consistently align with human ethical principles.

## 🧐Acknowledgement

We gratefully acknowledge Tencent for providing the dataset and LLaMA-Factory for the training codebase.

## 🌟Overview

We introduce ChineseHarm-Bench, a professionally annotated benchmark for Chinese harmful content detection, covering six key categories. It includes a knowledge rule base to enhance detection and a knowledge-augmented baseline that enables smaller LLMs to match state-of-the-art performance.

The benchmark construction process is illustrated in the figure below. For more detailed procedures, please refer to our paper.

## 🚀Installation

1. Clone the repositories:

```bash
git clone https://github.com/zjunlp/ChineseHarm-bench
cd ChineseHarm-bench
git clone https://github.com/hiyouga/LLaMA-Factory
```

2. Install dependencies:

```bash
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```

## 📚Inference
Our inference scripts support both **Huawei Ascend NPUs and NVIDIA GPUs**, enabling flexible deployment across different hardware platforms.

We release the following variants of our harmful content detection model:

- [**ChineseGuard-1.5B**](https://huggingface.co/zjunlp/ChineseGuard-1.5B)
- [**ChineseGuard-3B**](https://huggingface.co/zjunlp/ChineseGuard-3B)
- [**ChineseGuard-7B**](https://huggingface.co/zjunlp/ChineseGuard-7B)

🔹 Single Inference (Example)

Run single-input inference using the ChineseGuard-1.5B model:

```
SCRIPT_PATH="../infer/single_infer.py"
model_name="zjunlp/ChineseHarm-1.5B"
text="代发短信，有想做的联系我，无押金"

python $SCRIPT_PATH \
--model_name $model_name \
--text $text
```

🔸 Batch Inference (Multi-NPU or Multi-GPU)

To run inference on the entire ChineseHarm-Bench using ChineseGuard-1.5B and 8 NPUs:

```
SCRIPT_PATH="../infer/batch_infer.py"
model_name="zjunlp/ChineseHarm-1.5B"
file_name="../benchmark/bench.json"
output_file="../benchmark/bench_ChineseHarm-1.5B.json"

python $SCRIPT_PATH \
--model_name $model_name \
--file_name $file_name \
--output_file $output_file \
--num_npus 8

```

> For more configuration options (e.g., batch size, device selection, custom prompt templates), please refer to `single_infer.py` and `batch_infer.py`.
>
> **Note:** The inference scripts support both NPU and GPU devices.

**Evaluation: Calculating F1 Score**

After inference, evaluate the predictions by computing the F1 score with the following command:

```
python ../calculate_metrics.py \
--file_path "../benchmark/bench_ChineseHarm-1.5B.json" \
--true_label_field "标签" \
--predicted_label_field "predict_label"
```

## 📉Baseline

**Hybrid Knowledgeable Prompting**

First, generate diverse prompting instructions that reflect real-world violations:

```
SCRIPT_PATH="../baseline/Hybrid_Knowledgeable_Prompting.py"
output_path="../baseline/prompt.json"
python $SCRIPT_PATH\
--output_path $output_path
```

**Synthetic Data Curation**

Use GPT-4o to generate synthetic texts conditioned on the above prompts:

```
SCRIPT_PATH="../baseline/Synthetic_Data_Curation.py"
base_url=""
api_key=""
input_file="../baseline/prompt.json"
output_file="../baseline/train_raw.json"

python $SCRIPT_PATH \
--base_url $base_url\
--api_key $api_key\
--input_file $input_file\
--output_file $output_file

```

> 💡 The script calls the OpenAI API to generate responses based on each prompt.

**Data Process**

Filter out refused responses and sample a fixed number of instances per category to ensure balance:

```
SCRIPT_PATH="../baseline/Data_Process.py"
input_file="../baseline/train_raw.json"
output_file="../baseline/train.json"
sample_size=3000

python $SCRIPT_PATH \
--input_file $input_file\
--output_file $output_file\
--sample_size $sample_size

```

> ✅ The final output `train.json` contains `sample_size` samples per category, ready for training.

**Knowledge-Guided Training**

To prepare for training, add the following entry to `LLaMA-Factory/data/dataset_info.json`:

```
"train":{
"file_name": "../baseline/train.json",
"columns": {
"prompt": "Prompt_Detect",
"response": "违规类别"
}
}
```

To train a 1.5B model using LLaMA-Factory:

```
mv ../train.yaml examples/train_full
llamafactory-cli train examples/train_full/train.yaml
```

For more training configurations and customization options, please refer to the official [LLaMA-Factory GitHub repository](https://github.com/hiyouga/LLaMA-Factory).

## 🚩Citation

Please cite our repository if you use ChineseHarm-bench in your work. Thanks!

```bibtex
@misc{liu2025chineseharmbenchchineseharmfulcontent,
title={ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark},
author={Kangwei Liu and Siyuan Cheng and Bozhong Tian and Xiaozhuan Liang and Yuyang Yin and Meng Han and Ningyu Zhang and Bryan Hooi and Xi Chen and Shumin Deng},
year={2025},
eprint={2506.10960},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.10960},
}
```

## 🎉Contributors

We will offer long-term maintenance to fix bugs and solve issues. So if you have any problems, please put issues to us.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/zjunlp/chineseharm-bench

Awesome Lists containing this project

README

ChineseHarm-bench

A Chinese Harmful Content Detection Benchmark