https://github.com/bigscience-workshop/shadesofbias

Evaluation for Shades of Bias in Text

Host: GitHub
URL: https://github.com/bigscience-workshop/shadesofbias
Owner: bigscience-workshop
Created: 2024-06-10T15:42:36.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2024-10-21T23:31:16.000Z (over 1 year ago)
Last Synced: 2024-11-11T03:13:00.352Z (over 1 year ago)
Language: HTML
Size: 10.2 MB
Stars: 0
Watchers: 6
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

# ShadesofBias
This repository provides the scripts and code used in the [Shades of Bias in Text Dataset](https://huggingface.co/datasets/LanguageShades/BiasShades).
It includes code for processing the data and for evaluating bias in language models across languages.

## Data Processing

**process_dataset/map_dataset.py** takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw, then normalizes and formats it to produce https://huggingface.co/datasets/LanguageShades/BiasShadesRaw

**process_dataset/extract_vocabulary.py** takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and aligns each statement to its corresponding template slots, writing the results, along with how well the alignment worked, to https://huggingface.co/datasets/LanguageShades/LanguageCorrections
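
To illustrate what aligning a statement to template slots can look like, here is a minimal sketch. It is not the code in `extract_vocabulary.py`: the `{SLOT}` brace syntax, the function name `align_to_template`, and the example sentence are all assumptions made for illustration.

```python
import re

# Hypothetical slot syntax: {NAME}. The real templates may mark slots differently.
SLOT_PATTERN = re.compile(r"\{([A-Za-z_]\w*)\}")


def align_to_template(template, statement):
    """Return {slot_name: filler} if the statement fits the template, else None."""
    pattern = ""
    pos = 0
    for slot in SLOT_PATTERN.finditer(template):
        # Literal text between slots must match exactly.
        pattern += re.escape(template[pos:slot.start()])
        # Each slot becomes a named, non-greedy capture group.
        pattern += f"(?P<{slot.group(1)}>.+?)"
        pos = slot.end()
    pattern += re.escape(template[pos:])
    match = re.fullmatch(pattern, statement)
    return match.groupdict() if match else None


slots = align_to_template("{GROUP} are {ATTRIBUTE}.", "Cats are lazy.")
```

A return value of `None` signals that the alignment failed, which is one simple way to record how well the alignment worked. This sketch assumes each slot name appears at most once per template.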

## Evaluation

### HF Endpoints
To use an HF Endpoint, navigate to [Shades](https://ui.endpoints.huggingface.co/LanguageShades/endpoints) if you have access. If not, copy the `.env` file into your root directory.
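
For readers unfamiliar with `.env` files, the sketch below shows how `KEY=VALUE` lines are typically read. It is only an illustration: the function `parse_env` and the key `EXAMPLE_ENDPOINT_URL` are hypothetical, and the actual keys in this project's `.env` file are not documented here.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical contents; the real file's keys may differ.
example = '# endpoint settings\nEXAMPLE_ENDPOINT_URL="https://example.invalid/model"\n'
settings = parse_env(example)
```

In practice, a library such as `python-dotenv` is commonly used for this instead of a hand-written parser.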

### Example Script
Run `example_logprob_evaluate.py` to iterate through the dataset for a given model and compute the log probability of biased sentences. If you have the `.env` file, `load_endpoint_url(model_name)` will load the endpoint if one has been created for that model.
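
As background on the quantity being computed, the log probability of a sentence under a language model is the sum of the log probabilities of its tokens, each conditioned on the preceding tokens. The sketch below shows that arithmetic on plain Python lists. It is not taken from `example_logprob_evaluate.py`; the function names and the input layout are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized score list."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]


def sentence_log_prob(step_logits, token_ids):
    """Sum log P(token_t | earlier tokens) over the sentence.

    step_logits[t] holds the model's scores over the vocabulary at position t,
    and token_ids[t] is the token actually observed at that position.
    """
    if len(step_logits) != len(token_ids):
        raise ValueError("need one logit vector per token")
    return sum(
        log_softmax(logits)[token]
        for logits, token in zip(step_logits, token_ids)
    )


# With uniform scores over a 4-token vocabulary, each token has probability 1/4.
uniform_score = sentence_log_prob([[0.0] * 4] * 3, [0, 1, 2])
```

Because longer sentences accumulate more negative terms, comparisons across sentences of different lengths are often done on a per-token average rather than the raw sum.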

Run `generation_evaluate.py` to iterate through the dataset, with each instance formatted using a specified prompt from `prompts/`. The prompt language can differ from the original language of the instance; it defaults to English unless specified otherwise. If you have the `.env` file, `load_endpoint_url(model_name)` will load the endpoint if one has been created for that model.

#### Add more prompts
Follow the examples in `prompts/` to create a `.txt` file for a new prompt. The input field should be indicated with `{input}` in the text file.
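
A minimal sketch of how a template with an `{input}` field can be filled is shown below. The function names and the example template are hypothetical, and the way `generation_evaluate.py` actually performs the substitution may differ.

```python
from pathlib import Path


def fill_prompt(template, statement):
    """Insert the statement at the {input} marker of a prompt template."""
    if "{input}" not in template:
        raise ValueError("prompt template has no {input} field")
    # str.replace leaves any other braces in the template untouched,
    # unlike str.format, which would try to interpret them.
    return template.replace("{input}", statement)


def load_prompt(path):
    """Read a prompt template from a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")


# Hypothetical template text, not one of the files in prompts/.
prompt = fill_prompt("Do you agree with this statement? {input}", "Example sentence.")
```

Checking for the `{input}` marker up front catches a malformed prompt file before any model calls are made.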

### Base Models
Current [Proposed Model List](https://docs.google.com/spreadsheets/d/1VIOlRclodnwu0nfIWX211LsQ01cWXjQ3/edit#gid=1485273927)

### 'Aligned' models
Todo
