{"id":28834208,"url":"https://github.com/openmoss/longsafety","last_synced_at":"2025-09-07T10:39:19.187Z","repository":{"id":268537212,"uuid":"886644494","full_name":"OpenMOSS/LongSafety","owner":"OpenMOSS","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-16T09:22:39.000Z","size":753,"stargazers_count":6,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-16T10:35:08.818Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenMOSS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-11-11T10:53:26.000Z","updated_at":"2025-06-16T09:22:43.000Z","dependencies_parsed_at":"2025-06-16T10:36:18.526Z","dependency_job_id":"82bda62d-9d4d-41e2-886b-a2df3cf3a18f","html_url":"https://github.com/OpenMOSS/LongSafety","commit_stats":null,"previous_names":["luther-sparks/longsafetybench","openmoss/longsafety"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OpenMOSS/LongSafety","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLongSafety","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLongSafety/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLongSafety/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLongSafety/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/O
penMOSS","download_url":"https://codeload.github.com/OpenMOSS/LongSafety/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FLongSafety/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274026706,"owners_count":25209739,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-19T09:05:57.542Z","updated_at":"2025-09-07T10:39:19.164Z","avatar_url":"https://github.com/OpenMOSS.png","language":"Python","readme":"\n# LongSafety: Enhance Safety for Long-Context LLMs\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://huggingface.co/datasets/LutherXD/LongSafety-17k\" target=\"_blank\"\u003e🤗 HF Dataset\u003c/a\u003e •\n    \u003ca href=\"https://huggingface.co/datasets/LutherXD/LongSafetyBench\" target=\"_blank\"\u003e📊 HF Benchmark\u003c/a\u003e •\n    \u003ca href=\"https://arxiv.org/abs/2411.06899\" target=\"_blank\"\u003e📃 Paper\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    Read this in \u003ca href=\"README_zh.md\"\u003e中文\u003c/a\u003e.\n\u003c/p\u003e\n\n**LongSafety** is the first in-depth study on safety alignment for long-context Large Language Models (LLMs). 
As the context length of models significantly increases, safety issues in long-context scenarios urgently need to be addressed.\n\nThe main contributions of this project include:\n\n1.  **Analysis \u0026 Classification**: In-depth analysis of long-context safety issues, exploring more task scenarios, and classifying them into three categories: **Query Harmful (QH)**, **Partially Harmful (PH)**, and **Fully Harmful (FH)**.\n2.  **LongSafety Dataset**: Constructed the first training dataset for long-context safety alignment, **LongSafety**.\n    *   Contains **8 tasks**, covering the three scenarios mentioned above.\n    *   A total of **17k** high-quality samples.\n    *   Average context length reaches **40.9k tokens**.\n3.  **LongSafetyBench**: Constructed the first benchmark for evaluating long-context safety, **LongSafetyBench**.\n    *   Contains **10 tasks** (covering in-domain and out-of-domain tasks).\n    *   A total of **1k** test samples.\n    *   Average context length **41.9k tokens**.\n    *   Uses a multiple-choice format to evaluate the model's **HarmAwareness (HA)** and **SafeResponse (SR)** capabilities.\n\nExperiments demonstrate that training with LongSafety can effectively enhance the safety of models in both long-context and short-context scenarios, while maintaining their general capabilities.\n\n⚠️ **WARNING**: The associated paper and data contain unsafe content. Please use the related data and code responsibly and adhere to ethical guidelines.\n\n## 🔍 Table of Contents\n- [⚙️ Environment Setup](#preparation)\n- [🖥️ LongSafety Training](#longsafety-training)\n- [📊 LongSafetyBench Evaluation](#longsafetybench-evaluation)\n- [📝 Citation](#citation)\n- [🙏 Acknowledgements](#acknowledgements)\n\n\u003ca name=\"preparation\"\u003e\u003c/a\u003e\n\n## ⚙️ Environment Setup\n\n1.  Clone this repository:\n    ```bash\n    git clone https://github.com/OpenMOSS/LongSafety.git\n    cd LongSafety\n    ```\n\n2.  
Install dependencies:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3.  Data Preparation:\n    ```bash\n    # Install Git LFS (if not already installed)\n    git lfs install\n\n    # Download LongSafety Training Dataset\n    git clone https://huggingface.co/datasets/LutherXD/LongSafety-17k\n\n    # Download LongSafetyBench Evaluation Dataset\n    git clone https://huggingface.co/datasets/LutherXD/LongSafetyBench\n    ```\n\n\u003ca name=\"longsafety-training\"\u003e\u003c/a\u003e\n\n## 🖥️ LongSafety Training\n\n### Dataset Introduction (LongSafety)\n\nThe LongSafety training dataset aims to enhance the safety of large models when processing long contexts through supervised fine-tuning (SFT). It contains **17k** high-quality samples, covering the following **8** carefully designed long-context safety-related tasks, with an average length of **40.9k tokens**.\n\n**Training Task List (Total 8):**\n\n*   **Query Harmful:**\n    *   Politically Incorrect\n    *   Medical Quiz\n    *   SafeMT Long\n    *   Keyword RAG\n    *   LawQA\n*   **Partially Harmful:**\n    *   Harmful NIAH\n    *   Counting Crimes\n*   **Fully Harmful:**\n    *   ManyShot Jailbreak\n\n![Task distribution in LongSafety](./images/LS_category.png)\n*(Task details can be found in Appendix A.1 and Figure 3a of the paper)*\n\n### Training Instructions\n\nWe use the [InternEvo](https://github.com/InternLM/InternEvo) framework for model fine-tuning. Specific training scripts and hyperparameter settings are as follows:\n\n```bash\n# [TODO] We will provide detailed training startup scripts and configuration file examples here soon.\n```\n\nWe will release the model weights fine-tuned with LongSafety later.\n\n\u003ca name=\"longsafetybench-evaluation\"\u003e\u003c/a\u003e\n\n## 📊 LongSafetyBench Evaluation\n\n### Benchmark Introduction (LongSafetyBench)\n\nLongSafetyBench is the first benchmark specifically designed to evaluate the safety of LLMs in long contexts. 
It contains 1k multiple-choice samples covering 10 tasks, with an average length of 41.9k tokens. These tasks are designed to test the model's ability to identify and refuse to generate harmful content in long inputs.\n\n**Evaluation Metrics:**\n*   **HarmAwareness (HA):** The model's ability to recognize potential harm in the input.\n*   **SafeResponse (SR):** The model's ability to provide a safe, harmless response (usually refusal) after recognizing harm.\n\n**Task List:** (Task details can be found in Appendix B.1 of the paper)\n*   HarmfulExtraction\n*   HarmfulTendency\n*   ManyShotJailbreak\n*   HarmfulNIAH\n*   CountingCrimes\n*   DocAttack\n*   HarmfulAdvice\n*   MedicalQuiz\n*   PoliticallyIncorrect\n*   LeadingQuestion\n\n![Task distribution in LongSafetyBench](./images/category.png)\n\n### Running Evaluation\n\n```bash\nmodel_name=\"\"\nmodel_type=\"\"   # can be one of ['vllm', 'oai', 'hf']\nmodel_path=\"\"\nmax_length=\"\"\ndata_path=\"\"\noutput_dir=\"./results/\"\ndata_parallel_size=\"1\"\napi_key=\"\"  # OpenAI SDK\nbase_url=\"\"\norganization=\"\"\n\n\npython -m eval.eval --model_type \"$model_type\"\\\n    --model \"$model_path\"\\\n    --model_name \"$model_name\"\\\n    --max_length \"$max_length\"\\\n    --data_path \"$data_path\"\\\n    --output_dir \"$output_dir\"\\\n    --data_parallel_size \"$data_parallel_size\"\\\n    --api_key \"$api_key\"\\\n    --base_url \"$base_url\"\\\n    --organization \"$organization\"\n```\n\n### Evaluation Results\n\n![Some evaluation results on LongSafetyBench](./images/long_safety-barh.jpg)\n*(Refer to Figure 1 in the paper for more results)*\n\n\n\u003ca name=\"citation\"\u003e\u003c/a\u003e\n\n## 📝 Citation\n\nIf you use our dataset, benchmark, or code in your research, please cite our paper:\n\n```bibtex\n@misc{huang2024longsafety,\n      title={LongSafety: Enhance Safety for Long-Context LLMs},\n      author={Mianqiu Huang and Xiaoran Liu and Shaojun Zhou and Mozhi Zhang and Qipeng Guo and Linyang Li and 
Chenkun Tan and Yang Gao and Pengyu Wang and Linlin Li and Qun Liu and Yaqian Zhou and Xipeng Qiu and Xuanjing Huang},\n      year={2024},\n      eprint={2411.06899},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2411.06899},\n}\n```\n\n\u003ca name=\"acknowledgements\"\u003e\u003c/a\u003e\n\n## 🙏 Acknowledgements\n\nThanks to all researchers and developers who contributed to this project. Special thanks to the [Shanghai AI Laboratory](https://www.shlab.org.cn/), the [MOSS Team at Fudan University](https://github.com/OpenMOSS), and the [Huawei Noah's Ark Lab](https://www.noahlab.com.hk/#/home) for their support.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmoss%2Flongsafety","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenmoss%2Flongsafety","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmoss%2Flongsafety/lists"}