Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/openai/safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
- Host: GitHub
- URL: https://github.com/openai/safety-rbr-code-and-data
- Owner: openai
- License: mit
- Created: 2024-07-19T23:15:09.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-19T23:24:59.000Z (6 months ago)
- Last Synced: 2025-01-04T21:12:00.769Z (10 days ago)
- Language: Jupyter Notebook
- Size: 4.12 MB
- Stars: 172
- Watchers: 1
- Forks: 16
- Open Issues: 2
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - openai/safety-rbr-code-and-data
README
# Safety RBR Gold Dataset and Weight Fitting Code
**Warning: Content may include language related to racism, erotic themes, self-harm, or other offensive material.**
This directory contains complementary code and data for the paper: Rule Based Rewards for Language Model Safety.
It contains:
- Our Safety RBR gold dataset, the small set of human data we used in this experiment. This dataset was used for prompt tuning and for calculating the accuracy of the prompt+LLM grader (e.g., Table 13 in the paper). The data lives in `data/rbr_gold_data/`, and the notebook `analyze_RBR_gold_data.ipynb` gives further examples for loading the data.
- Our code for fitting the RBR weights (`rbr_weight_fitter.py`), along with an example notebook (`weight_fitting_example.ipynb`) showing usage and visualization.
- Some example synthetic data and reward model scores to demonstrate the usage of the weight fitting code (`data/weight_fitting_data/`).

A good starting place is the two notebooks we provide, described below.
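
As orientation for the weight fitting code listed above, here is a minimal sketch of the general idea: a total reward is formed by adding a weighted sum of rule-based feature values to a reward model score, and the weights are fit so that preferred completions outrank non-preferred ones. This sketch assumes a hinge-style ranking loss minimized by plain subgradient descent. It is not the API of `rbr_weight_fitter.py`; the function names, the feature meanings, and the toy numbers are all hypothetical. See the paper and `weight_fitting_example.ipynb` for the actual procedure.

```python
# Illustrative sketch only: NOT the API of rbr_weight_fitter.py.
# All names and numbers here are hypothetical.


def total_reward(rm_score, features, weights):
    """Reward model score plus a weighted sum of rule-based feature values."""
    return rm_score + sum(w * f for w, f in zip(weights, features))


def fit_weights(pairs, n_features, lr=0.1, epochs=200, margin=1.0):
    """Fit feature weights by subgradient descent on a hinge ranking loss.

    Each pair is ((rm_better, feats_better), (rm_worse, feats_worse)),
    where the first completion should receive the higher total reward.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for (rm_a, phi_a), (rm_b, phi_b) in pairs:
            gap = total_reward(rm_a, phi_a, weights) - total_reward(rm_b, phi_b, weights)
            if gap < margin:  # margin violated, so this pair contributes to the loss
                for i in range(n_features):
                    grad[i] -= (phi_a[i] - phi_b[i]) / len(pairs)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Toy data with two made-up binary features: [desired_behavior, undesired_behavior].
# In the first and third pairs the reward model score alone ranks the pair wrongly.
toy_pairs = [
    ((0.2, [1, 0]), (0.9, [0, 1])),
    ((0.5, [1, 0]), (0.4, [0, 0])),
    ((0.1, [0, 0]), (0.3, [0, 1])),
]

fitted = fit_weights(toy_pairs, n_features=2)
print("fitted weights:", fitted)
```

On this toy data the fitted weight for the desired feature comes out positive and the weight for the undesired feature comes out negative, so the combined reward ranks every pair in the intended order even where the reward model score alone does not.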
## Notebooks
1. Weight Fitting Example (`weight_fitting_example.ipynb`): This notebook provides an example of using the RBR weight fitting code (`rbr_weight_fitter.py`) with the example synthetic data we provide. It demonstrates how to load data, fit weights, and visualize the results.
2. RBR Gold Data (`rbr_gold_data.ipynb`): This notebook covers the RBR Gold dataset, a small set of human-labelled data used for prompt tuning and prompt+LLM grader accuracy calculations. It includes example code for loading the data and some very basic statistical analysis.
## License
We are releasing this code and data under the MIT License.