{"id":18832934,"url":"https://github.com/declare-lab/safety-arithmetic","last_synced_at":"2025-04-14T04:31:48.277Z","repository":{"id":244883833,"uuid":"816316445","full_name":"declare-lab/safety-arithmetic","owner":"declare-lab","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-14T05:20:43.000Z","size":391,"stargazers_count":12,"open_issues_count":0,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-27T18:21:44.795Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/declare-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-17T13:47:57.000Z","updated_at":"2025-02-27T11:55:40.000Z","dependencies_parsed_at":"2024-06-18T04:13:50.312Z","dependency_job_id":"f826edf8-5992-4c50-8a62-9d90cd2e08b1","html_url":"https://github.com/declare-lab/safety-arithmetic","commit_stats":null,"previous_names":["declare-lab/safety-arithmetic"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fsafety-arithmetic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fsafety-arithmetic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fsafety-arithmetic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/declare-lab%2Fsafety-arithmetic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners/declare-lab","download_url":"https://codeload.github.com/declare-lab/safety-arithmetic/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248821744,"owners_count":21166950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T01:59:33.849Z","updated_at":"2025-04-14T04:31:48.268Z","avatar_url":"https://github.com/declare-lab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations (EMNLP 2024 Main)\n\n:point_right: Dataset updated.\n\n👉 [Read the paper](https://arxiv.org/abs/2406.11801)\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Experiments](#experiments)\n- [FileStructure](#filestructure)\n- [Citation](#citation)\n\n## Installation\n\n```\npip install -r requirement.txt\n```\n\n## Experiments \n\n\u003col\u003e\n  \u003cli\u003eSafety Arithmetic\u003c/li\u003e\n  \u003cli\u003eHarm Direction Removal (HDR): TIES, Task Vector\u003c/li\u003e\n  \u003cli\u003eICV\u003c/li\u003e\n\u003c/ol\u003e\n\n## FileStructure\n\n### Safety Arithmetic\n```\nRun Safety_Arithmetic_Base_and_SFT.ipynb file for BASE and SFT models.\nRun Safety_Arithmetic_Edited.ipynb file for EDITED models.\n```\n### Harm Direction Removal (HDR) (w/ TIES)\n```\nRun HDR/HDR_TIES_BASE_AND_SFT.ipynb for SFT models and BASE models\nRun HDR/HDR_TIES_EDITED.ipynb for EDITED model.\n```\n### Harm Direction Removal (HDR) (w/ Task Vector)\n```\nRun 
HDR/HDR_Task_Vector_BASE.ipynb for BASE models\nRun HDR/HDR_Task_Vector_SFT.ipynb for SFT models\nRun HDR/HDR_Task_Vector_EDITED.ipynb for EDITED models.\n```\n### Only ICV\n```\nRun Safety_Arithmetic_Base_and_SFT.ipynb file by passing direct base/sft (without HDR).\nRun Safety_Arithmetic_Edited.ipynb file by passing direct edited (without HDR).\n```\n\n## Citation\nIf you find this useful in your research, please consider citing:\n\n```\n@misc{hazra2024safety,\n      title={Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations}, \n      author={Rima Hazra and Sayan Layek and Somnath Banerjee and Soujanya Poria},\n      year={2024},\n      eprint={2406.11801},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeclare-lab%2Fsafety-arithmetic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeclare-lab%2Fsafety-arithmetic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeclare-lab%2Fsafety-arithmetic/lists"}