https://github.com/openai/safety-rbr-code-and-data

Code and example data for the paper: Rule Based Rewards for Language Model Safety

Host: GitHub
URL: https://github.com/openai/safety-rbr-code-and-data
Owner: openai
License: mit
Created: 2024-07-19T23:15:09.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-07-19T23:24:59.000Z (10 months ago)
Last Synced: 2025-01-26T04:07:28.631Z (3 months ago)
Language: Jupyter Notebook
Size: 4.12 MB
Stars: 176
Watchers: 1
Forks: 16
Open Issues: 2
Metadata Files:
- Readme: readme.md
- License: LICENSE

# Safety RBR Gold Dataset and Weight Fitting Code

**Warning: Content may include language related to racism, erotic themes, self-harm, or other offensive material.**

This directory contains supplementary code and data for the paper *Rule Based Rewards for Language Model Safety*.

It contains:

- Our Safety RBR gold dataset, the small set of human-labelled data we used in this experiment. This dataset was used for prompt tuning and for calculating the accuracy of the prompt+LLM grader (e.g., Table 13 in the paper). The data lives in `data/rbr_gold_data/`, and the notebook `analyze_RBR_gold_data.ipynb` gives further examples of loading the data.
- Our code for fitting the RBR weights (`rbr_weight_fitter.py`), along with an example notebook (`weight_fitting_example.ipynb`) that shows how to use it and visualizes the results.
- Some example synthetic data and reward model scores (`data/weight_fitting_data/`) that demonstrate how to use the weight fitting code.
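
As a rough illustration of what fitting RBR weights involves, the following is a minimal sketch of one way such weights could be fit. It is not the code in `rbr_weight_fitter.py`. It assumes that the total reward is the reward model score plus a weighted sum of RBR feature values, and that the weights are chosen with a pairwise hinge loss so that, for each prompt, a better-ranked completion receives a higher total reward than a worse-ranked one. The function names (`total_reward`, `hinge_loss_and_grad`, `fit_rbr_weights`), the array layout, the margin of 1.0, the plain subgradient-descent optimizer, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def total_reward(rm_scores, features, weights):
    """Reward model score plus a weighted sum of RBR feature values.

    Assumed shapes: rm_scores (n,), features (n, k), weights (k,).
    """
    return rm_scores + features @ weights


def hinge_loss_and_grad(weights, rm_better, feat_better, rm_worse, feat_worse, margin=1.0):
    """Mean pairwise hinge loss and its subgradient with respect to `weights`.

    Row i of the *_better arrays is assumed to be a completion ranked above
    row i of the *_worse arrays for the same prompt.
    """
    gap = total_reward(rm_worse, feat_worse, weights) - total_reward(rm_better, feat_better, weights)
    slack = margin + gap
    active = slack > 0  # pairs that are mis-ordered or inside the margin
    loss = np.maximum(0.0, slack).mean()
    # Only active pairs contribute; d(gap)/dw = feat_worse - feat_better.
    grad = ((feat_worse - feat_better) * active[:, None]).mean(axis=0)
    return loss, grad


def fit_rbr_weights(rm_better, feat_better, rm_worse, feat_worse, lr=0.1, steps=500):
    """Fit weights by plain subgradient descent, starting from zero."""
    weights = np.zeros(feat_better.shape[1])
    for _ in range(steps):
        _, grad = hinge_loss_and_grad(weights, rm_better, feat_better, rm_worse, feat_worse)
        weights -= lr * grad
    return weights


# Hypothetical synthetic pairs: the reward model alone prefers the worse
# completion by 0.5, but feature 0 is 1.0 higher for the better completion.
rng = np.random.default_rng(0)
n, k = 200, 3
rm_better = rng.normal(size=n)
rm_worse = rm_better + 0.5
feat_better = rng.uniform(size=(n, k))
feat_worse = feat_better.copy()
feat_worse[:, 0] -= 1.0

weights = fit_rbr_weights(rm_better, feat_better, rm_worse, feat_worse)
final_loss, _ = hinge_loss_and_grad(weights, rm_better, feat_better, rm_worse, feat_worse)
print(weights, final_loss)
```

In this toy setup only feature 0 receives a non-zero weight, because it is the only feature that differs between the paired completions. Its weight grows until the better completion outscores the worse one by at least the margin, which overrides the reward model's preference for the worse completion.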

The two notebooks we provide are a good starting place:

## Notebooks

1. Weight Fitting Example (`weight_fitting_example.ipynb`): This notebook shows how to use the RBR weight fitting code (`rbr_weight_fitter.py`) on the example synthetic data we provide. It demonstrates how to load data, fit weights, and visualize the results.
2. RBR Gold Data (`analyze_RBR_gold_data.ipynb`): This notebook covers the RBR gold dataset, a small set of human-labelled data used for prompt tuning and for calculating the accuracy of the prompt+LLM grader. It includes example code for loading the data and some basic statistical analysis.
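
Calculating the grader accuracy mentioned above comes down to checking, for each labelled example, whether the prompt+LLM grader agrees with the human label. The sketch below is a minimal illustration only: the `GoldExample` record, its field names, the `accuracy_by_proposition` helper, and the proposition names are hypothetical and do not reflect the actual file format in `data/rbr_gold_data/`. The gold data notebook shows how the real data is loaded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GoldExample:
    """Hypothetical labelled example: does `proposition` hold for a completion?"""
    proposition: str
    gold_label: bool    # human (gold) judgement
    grader_label: bool  # prompt+LLM grader judgement


def accuracy_by_proposition(examples):
    """Return {proposition: fraction of examples where the grader agrees with gold}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex.proposition] += 1
        correct[ex.proposition] += int(ex.gold_label == ex.grader_label)
    return {prop: correct[prop] / total[prop] for prop in total}


examples = [
    GoldExample("refuses", True, True),
    GoldExample("refuses", False, True),
    GoldExample("apologizes", True, True),
    GoldExample("apologizes", False, False),
]
print(accuracy_by_proposition(examples))  # {'refuses': 0.5, 'apologizes': 1.0}
```

Reporting accuracy per proposition, rather than a single overall number, makes it easier to see which grader prompts need further tuning.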

## License

We are releasing this code and data under the MIT License.
