https://github.com/zjunlp/biasedit
[TrustNLP@NAACL 2025] BiasEdit: Debiasing Stereotyped Language Models via Model Editing
# BiasEdit: Debiasing Stereotyped Language Models via Model Editing
**BiasEdit** is an efficient *model editing* method that removes stereotypical bias from language models with small editor networks. It combines a *debiasing loss*, which guides edits on a small subset of parameters, with a *remaining loss*, which preserves the model's language modeling abilities during editing. Experiments show BiasEdit's strong performance on debiasing, language modeling preservation, and robustness to gender reversal and semantic generalization.
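As a rough illustration of how the two objectives fit together (the function names and exact loss forms below are assumptions for this sketch, not the paper's definitions), the combined editing objective might look like:

```python
import math

def debiasing_loss(p_stereo, p_anti):
    """Toy debiasing loss: drive the log-probabilities of the stereotyped
    and anti-stereotyped continuations toward each other, so the edited
    model prefers neither."""
    return sum(abs(math.log(s) - math.log(a))
               for s, a in zip(p_stereo, p_anti)) / len(p_stereo)

def remaining_loss(p_orig, p_edited):
    """Toy remaining loss: KL divergence keeping the edited model's
    distribution on unrelated text close to the original model's,
    preserving language modeling ability."""
    return sum(po * math.log(po / pe) for po, pe in zip(p_orig, p_edited))

def editing_objective(p_stereo, p_anti, p_orig, p_edited, loc_coef=1.0):
    """Combined objective; setting loc_coef=0 drops the remaining loss,
    mirroring the `editor.loc_coef=0` ablation in the training section."""
    return debiasing_loss(p_stereo, p_anti) + loc_coef * remaining_loss(p_orig, p_edited)
```

Here `p_stereo`/`p_anti` stand for sentence probabilities of a stereotyped and anti-stereotyped continuation under the edited model, and `p_orig`/`p_edited` for output distributions on unrelated text before and after the edit.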
## 📌 Table of Contents
- [🛠️ Setup](#1)
- [💻 BiasEdit](#2)
- [⌚️ Training Editor Networks](#2.1)
- [🚀 Debiasing with Editor Networks](#2.2)
- [👀 Bias Tracing](#3)
- [📝 Citation](#4)
- [✨ Acknowledgements](#5)
## 🛠️ Setup
This codebase uses Python 3.9.18. Other versions may work as well. Create an environment and install the dependencies:

```shell
$ conda create -n biasedit python=3.9
$ conda activate biasedit
(biasedit) $ pip install -r requirements.txt
```
## 💻 BiasEdit
With [StereoSet](https://aclanthology.org/2021.acl-long.416/), editor networks are first trained to generate parameter shifts for debiasing. The trained editor networks are then used to edit a language model and produce an unbiased model.
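Conceptually, applying an edit just adds the generated shifts to the selected weight matrices while leaving every other parameter untouched. A toy sketch (the function name and list-of-lists representation are illustrative, not the repo's API):

```python
def apply_parameter_shift(weight, shift, scale=1.0):
    """Add an editor-generated shift to one selected weight matrix.
    `weight` and `shift` are same-shaped 2-D lists; all other model
    parameters stay untouched."""
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, shift)]

# Example: a 1x2 weight matrix nudged by a shift of the same shape.
apply_parameter_shift([[1.0, 2.0]], [[0.5, -0.5]])  # -> [[1.5, 1.5]]
```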
### ⌚️ Training Editor Networks
- Formatted datasets with [train](./data/stereoset/train.json)/[dev](./data/stereoset/dev.json)/test (`gender_test.json`, `race_test.json`, `religion_test.json`) splits are in [data/stereoset](./data/stereoset).
- Configurations are in [config](./config). The partial parameters to be edited are specified in [editor](./config/editor); model-specific settings, such as which weights to edit, are in [model](./config/model).
- Experimental scripts are in [scripts](./scripts), where all hyper-parameters are set. Since hyper-parameters strongly affect hyper-network tuning, we highly recommend tuning them for your setup.
- For the ablation study on the remaining loss, set `editor.loc_coef=0`.
- Metrics can be found in the training log.
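For reference, the StereoSet paper's metrics (stereotype score, language modeling score, and ICAT) can be computed from sentence probabilities roughly as follows; this is a hedged sketch for illustration, not the repo's evaluation code:

```python
def stereoset_scores(triples):
    """StereoSet-style metrics from (stereo, anti, unrelated) sentence
    probabilities, one triple per example.

    ss   - stereotype score: % of examples preferring the stereotype
           (50 is ideal).
    lms  - language modeling score: % preferring a meaningful sentence
           over the unrelated one (100 is ideal).
    icat - combines both; 100 only when ss == 50 and lms == 100."""
    n = len(triples)
    ss = 100.0 * sum(s > a for s, a, _ in triples) / n
    lms = 100.0 * sum(max(s, a) > u for s, a, u in triples) / n
    icat = lms * min(ss, 100.0 - ss) / 50.0
    return ss, lms, icat
```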
### 🚀 Debiasing with Editor Networks
- Set `eval_only=True`
- Set `data.valid_path` as the path of the test set
- Metrics can be found at the end of the debiasing log, like "Test ------- XXX".
- Experimental scripts are in [scripts](./scripts).
## 👀 Bias Tracing
Enter [bias_tracing](./bias_tracing).
## 📝 Citation
If you find this code or paper useful, please cite:
```bibtex
@article{xin25BiasEdit,
  title={BiasEdit: Debiasing Stereotyped Language Models via Model Editing},
  author={Xin Xu and Wei Xu and Ningyu Zhang and Julian McAuley},
  year={2025},
  url={https://arxiv.org/pdf/2503.08588}
}
```
## ✨ Acknowledgements
- Thanks to [MALMEN](https://github.com/ChenmienTan/malmen) for the original code.
- Thanks to [bias-bench](https://github.com/McGill-NLP/bias-bench) for StereoSet and all the baselines.
- For more model editing methods, please try [EasyEdit](https://github.com/zjunlp/EasyEdit).