https://github.com/zjunlp/biasedit
[TrustNLP@NAACL 2025] BiasEdit: Debiasing Stereotyped Language Models via Model Editing
# BiasEdit: Debiasing Stereotyped Language Models via Model Editing
**BiasEdit** is an efficient *model editing* method that removes stereotypical bias from language models with small editor networks. It combines a *debiasing loss*, which guides edits on a small subset of parameters, with a *remaining loss*, which preserves the model's language modeling abilities during editing. Experiments show BiasEdit's strong performance on debiasing, language modeling preservation, and robustness to gender reversal and semantic generalization.
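As a rough illustration of how the two objectives fit together (the function names and exact loss forms below are assumptions for this sketch, not the paper's definitions), the combined editing objective might look like:

```python
import math

def debiasing_loss(p_stereo, p_anti):
    """Toy debiasing loss: drive the log-probabilities of the stereotyped
    and anti-stereotyped continuations toward each other, so the edited
    model prefers neither."""
    return sum(abs(math.log(s) - math.log(a))
               for s, a in zip(p_stereo, p_anti)) / len(p_stereo)

def remaining_loss(p_orig, p_edited):
    """Toy remaining loss: KL divergence keeping the edited model's
    distribution on unrelated text close to the original model's,
    preserving language modeling ability."""
    return sum(po * math.log(po / pe) for po, pe in zip(p_orig, p_edited))

def editing_objective(p_stereo, p_anti, p_orig, p_edited, loc_coef=1.0):
    """Combined objective; setting loc_coef=0 drops the remaining loss,
    mirroring the `editor.loc_coef=0` ablation in the training section."""
    return debiasing_loss(p_stereo, p_anti) + loc_coef * remaining_loss(p_orig, p_edited)
```

Here `p_stereo`/`p_anti` stand for sentence probabilities of a stereotyped and anti-stereotyped continuation under the edited model, and `p_orig`/`p_edited` for output distributions on unrelated text before and after the edit.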
## 📌 Table of Contents
- [🛠️ Setup](#1)
- [💻 BiasEdit](#2)
- [⌚️ Training Editor Networks](#2.1)
- [🚀 Debiasing with Editor Networks](#2.2)
- [👀 Bias Tracing](#3)
- [📝 Citation](#4)
- [✨ Acknowledgements](#5)
## 🛠️ Setup
This codebase uses Python 3.9.18. Other versions may work as well. Create an environment and install the dependencies:

```shell
$ conda create -n biasedit python=3.9
$ conda activate biasedit
(biasedit) $ pip install -r requirements.txt
```
## 💻 BiasEdit
With [StereoSet](https://aclanthology.org/2021.acl-long.416/), editor networks are first trained to generate parameter shifts for debiasing. The trained editor networks are then used to edit a language model and produce an unbiased model.
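Conceptually, applying an edit just adds the generated shifts to the selected weight matrices while leaving every other parameter untouched. A toy sketch (the function name and list-of-lists representation are illustrative, not the repo's API):

```python
def apply_parameter_shift(weight, shift, scale=1.0):
    """Add an editor-generated shift to one selected weight matrix.
    `weight` and `shift` are same-shaped 2-D lists; all other model
    parameters stay untouched."""
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, shift)]

# Example: a 1x2 weight matrix nudged by a shift of the same shape.
apply_parameter_shift([[1.0, 2.0]], [[0.5, -0.5]])  # -> [[1.5, 1.5]]
```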
### ⌚️ Training Editor Networks
- Formatted datasets with [train](./data/stereoset/train.json)/[dev](./data/stereoset/dev.json)/test (`gender_test.json`, `race_test.json`, `religion_test.json`) splits are in [data/stereoset](./data/stereoset).
- Configurations are in [config](./config). The partial parameters to be edited are specified in [editor](./config/editor); model-specific settings, such as which weights to edit, are in [model](./config/model).
- Experimental scripts are in [scripts](./scripts), where all hyper-parameters are set. Since hyper-parameters strongly affect hyper-network tuning, we highly recommend tuning them for your setup.
- For the ablation study on the remaining loss, set `editor.loc_coef=0`.
- Metrics can be found in the training log.
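For reference, the StereoSet paper's metrics (stereotype score, language modeling score, and ICAT) can be computed from sentence probabilities roughly as follows; this is a hedged sketch for illustration, not the repo's evaluation code:

```python
def stereoset_scores(triples):
    """StereoSet-style metrics from (stereo, anti, unrelated) sentence
    probabilities, one triple per example.

    ss   - stereotype score: % of examples preferring the stereotype
           (50 is ideal).
    lms  - language modeling score: % preferring a meaningful sentence
           over the unrelated one (100 is ideal).
    icat - combines both; 100 only when ss == 50 and lms == 100."""
    n = len(triples)
    ss = 100.0 * sum(s > a for s, a, _ in triples) / n
    lms = 100.0 * sum(max(s, a) > u for s, a, u in triples) / n
    icat = lms * min(ss, 100.0 - ss) / 50.0
    return ss, lms, icat
```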
### 🚀 Debiasing with Editor Networks
- Set `eval_only=True`
- Set `data.valid_path` as the path of the test set
- Metrics can be found at the end of the debiasing log, like "Test ------- XXX".
- Experimental scripts are in [scripts](./scripts).
## 👀 Bias Tracing
Enter [bias_tracing](./bias_tracing).
## 📝 Citation
If you find this code or paper useful, please cite:
```bibtex
@article{xin25BiasEdit,
  title={BiasEdit: Debiasing Stereotyped Language Models via Model Editing},
  author={Xin Xu and Wei Xu and Ningyu Zhang and Julian McAuley},
  year={2025},
  url={https://arxiv.org/pdf/2503.08588}
}
```
## ✨ Acknowledgements
- Thanks to [MALMEN](https://github.com/ChenmienTan/malmen) for the original code.
- Thanks to [bias-bench](https://github.com/McGill-NLP/bias-bench) for StereoSet and all the baselines.
- For more model editing methods, please try [EasyEdit](https://github.com/zjunlp/EasyEdit).