{"id":28676522,"url":"https://github.com/zjunlp/biasedit","last_synced_at":"2025-06-13T23:04:59.709Z","repository":{"id":245714494,"uuid":"819029281","full_name":"zjunlp/BiasEdit","owner":"zjunlp","description":"[TrustNLP@NAACL 2025] BiasEdit: Debiasing Stereotyped Language Models via Model Editing","archived":false,"fork":false,"pushed_at":"2025-04-19T02:52:48.000Z","size":3300,"stargazers_count":11,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-19T11:09:39.456Z","etag":null,"topics":["artificial-intelligence","bias","biasedit","debiasing","editing","knowledge-editing","large-language-models","llm","model-editing","natural-language-processing","stereotypes"],"latest_commit_sha":null,"homepage":"https://zjunlp.github.io/project/BiasEdit","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-23T15:18:36.000Z","updated_at":"2025-04-19T03:36:58.000Z","dependencies_parsed_at":"2025-03-03T19:21:02.016Z","dependency_job_id":"87988b21-1070-49ab-ab68-ddd9b0380840","html_url":"https://github.com/zjunlp/BiasEdit","commit_stats":null,"previous_names":["zjunlp/biasedit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/BiasEdit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FBiasEdit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FBiasEdit/tags","releases_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/zjunlp%2FBiasEdit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FBiasEdit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/BiasEdit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FBiasEdit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259732771,"owners_count":22903087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","bias","biasedit","debiasing","editing","knowledge-editing","large-language-models","llm","model-editing","natural-language-processing","stereotypes"],"created_at":"2025-06-13T23:04:58.744Z","updated_at":"2025-06-13T23:04:59.671Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"# BiasEdit: Debiasing Stereotyped Language Models via Model Editing\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2503.08588\"\u003e📃 Paper\u003c/a\u003e \n  \u003ca href=\"https://github.com/zjunlp/BiasEdit\"\u003e💻 Code\u003c/a\u003e \n  \u003ca href=\"https://zjunlp.github.io/project/BiasEdit\"\u003e🌏 Web\u003c/a\u003e \n\u003c/p\u003e\n\n\n\n\u003cdiv align=center\u003e\u003cimg src=\"fig/BiasEdit_fig1.gif\" width=\"70%\"/\u003e\u003c/div\u003e\n\n\n**BiasEdit** is an efficient *model editing* method to eliminate stereotyped bias from language models with small editor networks, including a *debiasing loss* to guide edits on partial parameters 
and a *remaining loss* to maintain the language modeling abilities during editing. Experimental results show BiasEdit's excellent performance on debiasing, language modeling ability preservation, and robustness to gender reversal and semantic generality.\n\n## 📌 Table of Contents\n\n- [🛠️ Setup](#1)\n- [💻 BiasEdit](#2)\n    - [⌚️ Training Editor Networks](#2.1)\n    - [🚀 Debiasing with Editor Networks](#2.2)\n- [👀 Bias Tracing](#3)\n- [📝 Citation](#4)\n- [✨ Acknowledgements](#5)\n\n\u003ch2 id=\"1\"\u003e🛠️ Setup\u003c/h2\u003e\n\nThis codebase uses Python 3.9.18. Other versions may work as well.\n\nCreate an environment and install the dependencies:\n\n    $ conda create -n biasedit python=3.9\n    $ conda activate biasedit\n    (biasedit) $ pip install -r requirements.txt\n\n\n\u003ch2 id=\"2\"\u003e💻 BiasEdit\u003c/h2\u003e\n\u003cdiv align=center\u003e\u003cimg src=\"fig/BiasEdit_fig2.png\" width=\"80%\"/\u003e\u003c/div\u003e\n\nWith [StereoSet](https://aclanthology.org/2021.acl-long.416/), editor networks are first trained to generate parameter shifts for debiasing. The trained editor networks are then used to conduct edits on language models and produce an unbiased model.\n\n\u003ch3 id=\"2.1\"\u003e⌚️ Training Editor Networks\u003c/h3\u003e\n\n- Formatted datasets with [train](./data/stereoset/train.json)/[dev](./data/stereoset/dev.json)/test (`gender_test.json`, `race_test.json`, `religion_test.json`) splits are in [data/stereoset](./data/stereoset).\n- Configurations are in [config](./config). The partial parameters to be edited are specified in [editor](./config/editor). Model configurations, such as the weights to be edited, are in [model](config/model).\n- Experimental scripts are in [scripts](./scripts). All hyper-parameters are in the scripts. 
Since hyper-parameters have a great effect on hyper-network training, we highly recommend conducting hyper-parameter tuning.\n- For the ablation study on the remaining loss, set `editor.loc_coef=0`.\n- Metrics can be found in the training log.\n\n\n\u003ch3 id=\"2.2\"\u003e🚀 Debiasing with Editor Networks\u003c/h3\u003e\n\n- Set `eval_only=True`.\n- Set `data.valid_path` to the path of the test set.\n- Metrics can be found at the end of the debiasing log, like \"Test ------- XXX\".\n- Experimental scripts are in [scripts](./scripts).\n\n\n\u003ch2 id=\"3\"\u003e👀 Bias Tracing\u003c/h2\u003e\n\nEnter [bias_tracing](./bias_tracing).\n\n\n\u003ch2 id=\"4\"\u003e📝 Citation\u003c/h2\u003e\n\nIf this code or paper is useful, please consider using the following citation:\n\n    @article{xin25BiasEdit,\n        title={BiasEdit: Debiasing Stereotyped Language Models via Model Editing},\n        author={Xin Xu and Wei Xu and Ningyu Zhang and Julian McAuley},\n        year={2025},\n        url={https://arxiv.org/pdf/2503.08588}\n    }\n\n\u003ch2 id=\"5\"\u003e✨ Acknowledgements\u003c/h2\u003e\n\n- Thanks to [MALMEN](https://github.com/ChenmienTan/malmen) for the original code.\n- Thanks to [bias-bench](https://github.com/McGill-NLP/bias-bench) for StereoSet and all the baselines.\n- For more model editing methods, please try [EasyEdit](https://github.com/zjunlp/EasyEdit).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fbiasedit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fbiasedit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fbiasedit/lists"}