Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/seonglae/emgsd-hermes
Steering GPT2-EMGSD less biased & Generating stereotyped text with vanilla GPT2 without fine tuning or prompt engineering
bias-correction bias-mitigation emgsd gpt2 sparse-autoencoder steering-vector stereotype
- Host: GitHub
- URL: https://github.com/seonglae/emgsd-hermes
- Owner: seonglae
- Created: 2024-11-23T10:10:36.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-01T02:35:20.000Z (about 1 month ago)
- Last Synced: 2024-12-01T03:24:34.497Z (about 1 month ago)
- Topics: bias-correction, bias-mitigation, emgsd, gpt2, sparse-autoencoder, steering-vector, stereotype
- Language: Jupyter Notebook
- Homepage: https://www.canva.com/design/DAGXTdRh__E/MfKS_-4Px4iXNyCT89Sh5g/edit
- Size: 425 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# EMGSD Hermes
This project explores bias mitigation in GPT2-EMGSD. It uses correlation analysis to find stereotype-related features and activation manipulation to steer the model, highlighting this approach as a potential alternative to traditional fine-tuning. It also demonstrates that bias can be induced in vanilla GPT2 through activation engineering.

## Fast Demo
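The activation manipulation described above amounts to shifting a hidden-state vector along a feature direction. The sketch below shows plain activation addition in pure Python. It assumes the direction would come from an SAE decoder vector for a stereotype-correlated feature; `add_steering` and `unit` are hypothetical names, not functions from `compare.py`. In a real GPT-2 run, the same arithmetic would be applied to a layer's hidden states inside a forward hook.

```python
import math
from typing import List

Vector = List[float]


def unit(v: Vector) -> Vector:
    """Return v rescaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def add_steering(resid: Vector, direction: Vector, coeff: float) -> Vector:
    """Activation addition: shift a hidden state by coeff * unit(direction).

    A negative coeff pushes the activation away from the feature (bias mitigation);
    a positive coeff pushes it towards the feature (bias induction).
    The direction source (e.g. an SAE decoder vector) is an assumption here.
    """
    d = unit(direction)
    return [r + coeff * x for r, x in zip(resid, d)]
```

For example, `add_steering([1.0, 2.0, 3.0], [0.0, 0.0, 2.0], -3.0)` removes three units along the third axis and returns `[1.0, 2.0, 0.0]`. Normalizing the direction first means `coeff` is a step size in activation space, independent of the raw feature vector's scale.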
```bash
# Install Python 3.10 first; it is required by SAE-Lens
git clone https://github.com/seonglae/emgsd-hermes && cd emgsd-hermes
pip install torch colorama sae-lens transformers
python compare.py
```

## Main Pipeline
TBA
### 1. Fine-tuning the SAE on the EMGSD dataset
```bash
python empsd.py
```
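Step 1 fine-tunes a sparse autoencoder (SAE) on model activations collected from EMGSD text. For reference, a common SAE formulation is shown below. Here $\mathbf{x}$ is a hidden-state activation and $\lambda$ weights the sparsity penalty. The exact architecture and loss used by SAE-Lens and `empsd.py` may differ, for example in activation normalization or in the form of the sparsity term.

$$
\begin{aligned}
\mathbf{f}(\mathbf{x}) &= \operatorname{ReLU}\big(W_{\text{enc}}(\mathbf{x} - \mathbf{b}_{\text{dec}}) + \mathbf{b}_{\text{enc}}\big) \\
\hat{\mathbf{x}} &= W_{\text{dec}}\,\mathbf{f}(\mathbf{x}) + \mathbf{b}_{\text{dec}} \\
\mathcal{L}(\mathbf{x}) &= \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert_2^2 + \lambda \lVert \mathbf{f}(\mathbf{x}) \rVert_1
\end{aligned}
$$

Each latent $f_i(\mathbf{x})$ is a candidate feature, and the $i$-th column of $W_{\text{dec}}$ is its direction in activation space.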
### 2. Extract features using correlation
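This step scores SAE features by how strongly their activations track stereotype labels. The sketch below is a minimal pure-Python version, not the implementation in `search_stereo.py` or `mi_stereo.py`. It assumes you already have per-sample feature activations and binary stereotype labels from EMGSD, and all names are hypothetical. For the mutual-information variant, it binarizes each feature as "fired" (activation > 0). That binarization is also an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint: Dict[Tuple[int, int], int] = {}
    px: Dict[int, int] = {}
    py: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    # Sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))), written with counts
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_features(
    acts: List[List[float]], labels: List[int], k: int, use_mi: bool = False
) -> List[Tuple[int, float]]:
    """Return the k features whose activations best track the stereotype labels.

    acts[i][j] is feature j's activation on sample i; labels[i] is 1 if sample i
    is stereotyped, else 0. Scores are the signed Pearson r (ranked by |r|) or,
    with use_mi=True, the MI between "feature fired" and the label.
    """
    scores: List[Tuple[int, float]] = []
    for j in range(len(acts[0])):
        col = [row[j] for row in acts]
        if use_mi:
            score = mutual_information([1 if a > 0.0 else 0 for a in col], labels)
        else:
            score = pearson(col, [float(y) for y in labels])
        scores.append((j, score))
    scores.sort(key=lambda t: abs(t[1]), reverse=True)
    return scores[:k]
```

The correlation score keeps its sign, because a negatively correlated feature is still informative and the sign indicates which way to steer. Mutual information is always non-negative but also captures non-linear dependence.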
```bash
python search_category.py
python search_stereo.py
# replace emgsd/*.json files
python draw_corr.py
```

Alternatively, to compute mutual information instead of correlation:

```bash
python mi_stereo.py
```

### 3. Compute the ratio of stereotyped text in generations
```bash
python compare_all.py
```
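This step measures how often generated text is stereotyped. Below is a minimal sketch of that metric. It assumes a stereotype classifier that maps a text to a score in [0, 1] and a 0.5 decision threshold. Both are assumptions, and `compare_all.py` may use a different classifier, threshold, or aggregation. `stereotype_ratio` is a hypothetical name.

```python
from typing import Callable, Iterable


def stereotype_ratio(
    texts: Iterable[str], classify: Callable[[str], float], threshold: float = 0.5
) -> float:
    """Fraction of texts whose stereotype score from `classify` is >= threshold."""
    items = list(texts)
    if not items:
        raise ValueError("need at least one text")
    hits = sum(1 for t in items if classify(t) >= threshold)
    return hits / len(items)
```

For example, with a toy lookup table standing in for a classifier, `stereotype_ratio(["a", "b", "c", "d"], {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.1}.get)` returns `0.5`. Computing this ratio on generations with and without steering gives a single number for comparing the two settings.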
## Loss Graph of SAE Fine-tuning