https://github.com/seonglae/emgsd-hermes

Steering GPT2-EMGSD to be less biased and generating stereotyped text with vanilla GPT2, without fine-tuning or prompt engineering

bias-correction bias-mitigation emgsd gpt2 sparse-autoencoder steering-vector stereotype


# EMGSD Hermes
This project explores bias mitigation in GPT2-EMGSD. It uses correlation analysis to find stereotype-related features and then manipulates the corresponding activations, which points to a possible alternative to traditional fine-tuning. It also demonstrates that bias can be induced in vanilla GPT2 through activation engineering.

image
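
As a rough illustration of the activation-engineering idea, the sketch below adds a scaled direction to a single hidden-state vector. It is not the repository's implementation: the names `steer`, `projection`, and `alpha` are hypothetical, and the assumption that the direction comes from a stereotype-related feature is only for illustration. A negative `alpha` pushes the activation away from the direction, a positive one pushes it towards it.

```python
from typing import List


def _norm(vec: List[float]) -> float:
    """Euclidean length of a vector."""
    return sum(v * v for v in vec) ** 0.5


def projection(hidden: List[float], direction: List[float]) -> float:
    """Component of `hidden` along the unit-normalised `direction`."""
    n = _norm(direction)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, direction)) / n


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Return hidden + alpha * unit(direction) (hypothetical helper)."""
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    n = _norm(direction)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return [h + alpha * d / n for h, d in zip(hidden, direction)]


# Toy values, chosen only for illustration.
hidden = [0.5, -1.0, 2.0]
direction = [0.0, 3.0, 4.0]  # assumed "stereotype" direction
steered = steer(hidden, direction, alpha=-2.0)

print(projection(hidden, direction), projection(steered, direction))
```

Because the direction is normalised, the projection onto it changes by exactly `alpha`, which makes the steering strength easy to reason about.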

## Fast Demo
```bash
# Install Python 3.10, which is required by SAE-Lens
git clone https://github.com/seonglae/emgsd-hermes && cd emgsd-hermes
pip install torch colorama sae-lens transformers
python compare.py
```

## Main Pipeline
TBA
### 1. Fine-tuning SAE with EMGSD dataset
```bash
python empsd.py
```
### 2. Extract features using correlation
```bash
python search_category.py
python search_stereo.py
# replace emgsd/*.json files
python draw_corr.py
```
image
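
The scripts above are not reproduced here. As a minimal sketch of what correlation-based feature selection can look like, the example below computes the Pearson correlation between each feature's activations and a binary stereotype label, then ranks features by absolute correlation. The function names, feature ids, and toy data are hypothetical and are not taken from the repository.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(
    activations: Dict[int, Sequence[float]],
    labels: Sequence[int],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Rank feature ids by |correlation| with the labels, strongest first."""
    scores = [(fid, pearson(acts, labels)) for fid, acts in activations.items()]
    scores.sort(key=lambda s: abs(s[1]), reverse=True)
    return scores[:top_k]


# Toy data: 1 = stereotyped sample, 0 = not stereotyped.
labels = [1, 1, 0, 0, 1, 0]
activations = {
    101: [0.9, 0.8, 0.1, 0.0, 0.7, 0.2],  # fires mostly on stereotyped samples
    202: [0.3, 0.1, 0.3, 0.1, 0.2, 0.2],  # unrelated to the label
    303: [0.0, 0.1, 0.9, 0.8, 0.2, 0.7],  # fires mostly on other samples
}
ranked = rank_features(activations, labels, top_k=2)
for fid, score in ranked:
    print(fid, round(score, 3))
```

Ranking by absolute value keeps both positively and negatively correlated features, since either kind could serve as a steering target.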

Or, if you want to calculate mutual information instead:
```bash
python mi_stereo.py
```
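
For reference, mutual information between a binarised feature activation and the stereotype label can be estimated from co-occurrence counts. The sketch below is a generic plug-in estimator in bits. It is not the contents of `mi_stereo.py`, and the `binarize` threshold is an assumed, illustrative choice.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def binarize(acts: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map activations to 1 (active) or 0 (inactive); threshold is an assumption."""
    return [int(a > threshold) for a in acts]


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty sequences of equal length")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy data: 1 = stereotyped sample, 0 = not stereotyped.
labels = [1, 1, 0, 0]
informative = binarize([0.9, 0.8, 0.1, 0.2])  # active exactly on label 1
uninformative = binarize([0.9, 0.1, 0.8, 0.2])  # independent of the label

print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

Unlike Pearson correlation, mutual information also captures non-linear dependence, at the cost of needing a discretisation step.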

### 3. Compute ratio of stereotyped text in generation
```bash
python compare_all.py
```
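
The metric in this step is a simple proportion: the number of generations flagged as stereotyped divided by the total number of generations. The sketch below shows that computation with a hypothetical `is_stereotyped` callable standing in for whatever classifier is actually used. The keyword-based `toy_classifier` and the sample strings are placeholders, not the project's classifier or data.

```python
from typing import Callable, Iterable


def stereotype_ratio(
    texts: Iterable[str],
    is_stereotyped: Callable[[str], bool],
) -> float:
    """Fraction of texts that the given classifier flags as stereotyped."""
    items = list(texts)
    if not items:
        raise ValueError("need at least one generated text")
    flagged = sum(1 for t in items if is_stereotyped(t))
    return flagged / len(items)


def toy_classifier(text: str) -> bool:
    """Placeholder classifier: flags texts containing the marker word 'always'."""
    return "always" in text.lower()


# Placeholder generations, for illustration only.
baseline = [
    "Group A is always late.",
    "Group B always complains.",
    "Group A met on Tuesday.",
    "Group B ordered lunch.",
]
steered = [
    "Group A arrived at noon.",
    "Group B is always busy.",
    "Group A met on Tuesday.",
    "Group B ordered lunch.",
]

print(stereotype_ratio(baseline, toy_classifier))
print(stereotype_ratio(steered, toy_classifier))
```

Comparing the ratio before and after steering on the same prompts gives a single number for how much the intervention changed the output.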

image
image

## Loss Graph of fine-tuning SAE
image