https://github.com/bigscience-workshop/shadesofbias
Evaluation for Shades of Bias in Text
- Host: GitHub
- URL: https://github.com/bigscience-workshop/shadesofbias
- Owner: bigscience-workshop
- Created: 2024-06-10T15:42:36.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-10-21T23:31:16.000Z (over 1 year ago)
- Last Synced: 2024-11-11T03:13:00.352Z (over 1 year ago)
- Language: HTML
- Size: 10.2 MB
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# ShadesofBias
This repository provides the scripts and code used in the [Shades of Bias in Text Dataset](https://huggingface.co/datasets/LanguageShades/BiasShades).
It includes code for processing the data and for evaluating bias in language models across languages.
## Data Processing
**process_dataset/map_dataset.py** takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and normalizes/formats to produce https://huggingface.co/datasets/LanguageShades/BiasShadesRaw
**process_dataset/extract_vocabulary.py** takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and aligns each statement to its corresponding template slots, writing the results, along with how well the alignment worked, to https://huggingface.co/datasets/LanguageShades/LanguageCorrections
## Evaluation
### HF Endpoints
To use an HF Endpoint, navigate to [Shades](https://ui.endpoints.huggingface.co/LanguageShades/endpoints) if you have access. If not, copy the `.env` file into your root directory.
### Example Script
Run `example_logprob_evaluate.py` to iterate through the dataset for a given model and compute the log probability of biased sentences. If you have the `.env` file, `load_endpoint_url(model_name)` will load the endpoint for that model, if one has been created.
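As a rough illustration of what a sentence-level log probability is, the sketch below sums per-token log probabilities (the chain rule for sequence probability). It is not taken from `example_logprob_evaluate.py`; the function name `sentence_logprob` and the probability values are made up for illustration, and a real run would obtain per-token probabilities from the model.

```python
import math


def sentence_logprob(token_probs):
    """Return log P(sentence) as the sum of log P(token_i | previous tokens)."""
    return sum(math.log(p) for p in token_probs)


# Hypothetical conditional token probabilities for two sentences.
# These numbers are illustrative only and do not come from any model.
probs_a = [0.5, 0.25, 0.5]
probs_b = [0.5, 0.125, 0.5]

lp_a = sentence_logprob(probs_a)
lp_b = sentence_logprob(probs_b)

# A higher (less negative) log probability means the model finds
# the sentence more likely.
print(f"sentence A: {lp_a:.4f}")
print(f"sentence B: {lp_b:.4f}")
```

Comparing such scores across sentences of different lengths usually calls for some normalization (for example, dividing by the number of tokens), which this sketch leaves out.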
Run `generation_evaluate.py` to iterate through the dataset, formatting each instance with a specified prompt from `prompts/`. The prompt language can differ from the language of the original statement; it defaults to English unless specified otherwise. If you have the `.env` file, `load_endpoint_url(model_name)` will load the endpoint for that model, if one has been created.
#### Add more prompts
Follow the examples in `prompts/` to create a `.txt` file for a new prompt. Mark the input field with `{input}` in the text file.
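A minimal sketch of how a prompt file with an `{input}` field might be filled in, assuming plain string substitution. The helper names `load_prompt` and `fill_prompt`, the file name `example_prompt.txt`, and the prompt wording are hypothetical and not part of this repository; the actual scripts may format prompts differently.

```python
import tempfile
from pathlib import Path


def load_prompt(path):
    """Read a prompt template from a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")


def fill_prompt(template, statement):
    """Substitute the statement into the `{input}` placeholder.

    str.replace is used instead of str.format so that any other
    literal braces in the prompt text are left untouched.
    """
    return template.replace("{input}", statement)


# Write a hypothetical prompt file to a temporary directory, then fill it.
with tempfile.TemporaryDirectory() as tmp:
    prompt_path = Path(tmp) / "example_prompt.txt"
    prompt_path.write_text(
        "Do you agree with the following statement?\n{input}\nAnswer:",
        encoding="utf-8",
    )
    template = load_prompt(prompt_path)
    filled = fill_prompt(template, "Example statement.")

print(filled)
```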
### Base Models
Current [Proposed Model List](https://docs.google.com/spreadsheets/d/1VIOlRclodnwu0nfIWX211LsQ01cWXjQ3/edit#gid=1485273927)
### 'Aligned' models
Todo