https://github.com/danieldacosta/prompt-learning-bias

Apply the newly emerging field of prompt engineering to identify and measure social bias in language models
https://github.com/danieldacosta/prompt-learning-bias

Last synced: over 1 year ago
JSON representation

Apply the newly emerging field of prompt engineering to identify and measure social bias in language models

Host: GitHub
URL: https://github.com/danieldacosta/prompt-learning-bias
Owner: DanielDaCosta
Created: 2024-03-24T02:56:07.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-04-19T00:02:50.000Z (over 2 years ago)
Last Synced: 2025-01-11T01:10:37.816Z (over 1 year ago)
Language: Jupyter Notebook
Size: 188 KB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# prompt-learning-bias
Apply the newly emerging field of prompt engineering to identify and measure social bias in language models

# Custom Dataset
Created custom prompts for detecting bias on BERT, ALBERT and ROBERTA. The dataset follows the same format used in the CrowS-Pairs dataset (https://github.com/nyu-mll/crows-pairs/blob/master/data/crows_pairs_anonymized.csv).

Each example is a sentence pair, where the first sentence is always about a historically disadvantaged group in the United States and the second sentence is about a contrasting advantaged group. The first sentence can _demonstrate_ or _violate_ a stereotype. The other sentence is a minimal edit of the first sentence: The only words that change between them are those that identify the group. Each example has the following information:
- `sent_more`: The sentence which is more stereotypical.
- `sent_less`: The sentence which is less stereotypical.
- `stereo_antistereo`: The stereotypical direction of the pair. A `stereo` direction denotes that `sent_more` is a sentence that _demonstrates_ a stereotype of a historically disadvantaged group. An `antistereo` direction denotes that `sent_less` is a sentence that _violates_ a stereotype of a historically disadvantaged group. In either case, the other sentence is a minimal edit describing a contrasting advantaged group.
- `bias_type`: The type of biases present in the example.
- `annotations`: The annotations of bias types from crowdworkers.
- `anon_writer`: The _anonymized_ id of the writer.
- `anon_annotators`: The _anonymized_ ids of the annotators.

# Evaluation Metric
For the evaluation metric with use use pseudo-log-likehood MLM scoring. Original source code: https://github.com/nyu-mll/crows-pairs/blob/master/metric.py

# Next Steps
1. Expand custom dataset to 100 samples
2. Re-evaluate MLM scoring metric in all of them
3. Expand it the metric to Auto-Regressive models: GPT-2 => We'll need to modify the original code

# References
https://github.com/nyu-mll/crows-pairs/tree/master

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/danieldacosta/prompt-learning-bias

Awesome Lists containing this project

README