{"id":31173279,"url":"https://github.com/ramyalab/pluralistic-alignment","last_synced_at":"2025-09-19T12:47:48.510Z","repository":{"id":280073998,"uuid":"869843114","full_name":"RamyaLab/pluralistic-alignment","owner":"RamyaLab","description":"The open-source repository for PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment.","archived":false,"fork":false,"pushed_at":"2025-08-28T04:46:05.000Z","size":8418,"stargazers_count":5,"open_issues_count":2,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-08-28T11:30:53.978Z","etag":null,"topics":["ai-alignment","pluralistic-alignment","rlhf"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RamyaLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-09T01:43:02.000Z","updated_at":"2025-08-28T04:46:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"4270fbef-64ad-45b0-b06f-a0cd4982db82","html_url":"https://github.com/RamyaLab/pluralistic-alignment","commit_stats":null,"previous_names":["ramyalab/pluralistic-alignment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/RamyaLab/pluralistic-alignment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamyaLab%2Fpluralistic-alignment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamyaLab%2Fpluralistic-alignment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamyaLab%2Fpluralistic-a
lignment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamyaLab%2Fpluralistic-alignment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RamyaLab","download_url":"https://codeload.github.com/RamyaLab/pluralistic-alignment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamyaLab%2Fpluralistic-alignment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275941511,"owners_count":25556975,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-19T02:00:09.700Z","response_time":108,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-alignment","pluralistic-alignment","rlhf"],"created_at":"2025-09-19T12:47:44.762Z","updated_at":"2025-09-19T12:47:48.480Z","avatar_url":"https://github.com/RamyaLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PAL: \u003cspan style=\"font-size: 1.5em; color: #1E90FF;\"\u003eP\u003c/span\u003eluralistic \u003cspan style=\"font-size: 1.5em; color: #1E90FF;\"\u003eAL\u003c/span\u003eignment Framework\n[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)\n\n## 📝 [PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic 
Alignment](https://openreview.net/pdf?id=1kFDrYCuSu)\n\n[Daiwei Chen](https://chendaiwei-99.github.io), [Yi Chen](https://www.deepneural.network/), [Aniket Rege](https://aniketrege.github.io/), [Zhi Wang](https://zwang.org/), [Ramya Korlakai Vinayak](https://ramyakv.github.io/)\n\n[ 🌐 [PAL Project Page](https://pal-alignment.github.io/) ] [ 📜 [arXiv](https://arxiv.org/abs/2406.08469) ]\n\n[ 📊 [Persona Dataset](https://huggingface.co/datasets/kitkatdafu/persona_in_pal); [Pick-a-Pic Dataset (embeddings)](https://huggingface.co/datasets/ramya-ml/pickapic-embeds);  [Pick-a-Filter Dataset (embeddings)](https://huggingface.co/datasets/ramya-ml/pick-a-filter-embeds) ] \n\n# 📑 Citation\n\nIf you find **\u003cu\u003e*PAL*\u003c/u\u003e** useful for your research and applications, please consider citing:\n\n```\n@inproceedings{chen2025pal,\n      title={{PAL}: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment},\n      author={Chen, Daiwei and Chen, Yi and Rege, Aniket and Wang, Zhi and Vinayak, Ramya Korlakai},\n      booktitle={The Thirteenth International Conference on Learning Representations},\n      year={2025},\n      url={https://openreview.net/forum?id=1kFDrYCuSu}\n}\n\n@misc{chen2024palpluralisticalignmentframework,\n      title={PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences}, \n      author={Daiwei Chen and Yi Chen and Aniket Rege and Ramya Korlakai Vinayak},\n      year={2024},\n      eprint={2406.08469},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2406.08469}, \n}\n```\n\n# 📰 News\n\n- NEW 🔥 01/21/2025: **PAL** has been accepted at **ICLR 2025**.\n-  10/09/2024: **PAL** has been accepted at **NeurIPS 2024** workshops: [AFM](https://adaptive-foundation-models.org/), [Behavioral ML](https://sites.google.com/view/behavioralml/), [FITML](https://sites.google.com/view/neurips2024-ftw/home), [Pluralistic-Alignment](https://pluralistic-alignment.github.io/), 
[SoLaR](https://solar-neurips.github.io/).\n- 06/18/2024: **PAL** has been accepted at **ICML 2024** workshops: [TF2M](https://sites.google.com/view/tf2m) and [MFHAIA](https://sites.google.com/view/mhf-icml2024).\n\n# 📍 Overview\n\nThis repository contains the code for training and evaluating reward models with ***\u003cu\u003ePAL\u003c/u\u003e***, a sample-efficient framework for **P**luralistic **AL**ignment of foundation models. PAL enables efficient reward modeling that caters to \u003cu\u003e***diverse human preferences***\u003c/u\u003e, allowing for **personalized adaptation** in both text and image generation tasks. The model balances *\u003cu\u003e**commonalities across users**\u003c/u\u003e* with individual-specific customizations, achieving *\u003cu\u003e**few-shot localization**\u003c/u\u003e* for new users and reducing the *\u003cu\u003e**sample requirements** to adapt to new users\u003c/u\u003e*.\n\n![Ideal Point Model Explained](img.png)\n\n# 💬 Contents\n\n- [Overview](#overview📍)\n- [Key Features](#🎯-Key-Features)\n- [Installation](#💻-installation)\n- [Usage](#🧰-usage)\n  - [Data Preparation](##data-preparation)\n  - [Configurations](##configurations)\n  - [Training](##training)\n  - [Integration](##integration)\n- [Citation](#📑-citation)\n\n# 🎯 Key Features\n\n💠 \u003cspan style=\"color:lightblue; font-weight:bold;\"\u003eDiverse Preference Alignment\u003c/span\u003e: PAL can handle diverse human preferences rather than assuming a single, universal preference, addressing the variability in individual values.\n\n💠 \u003cspan style=\"color:lightblue; font-weight:bold;\"\u003eAccuracy-Compute Optimality\u003c/span\u003e: e.g. 
on Reddit TL;DR (Text Summarization), PAL is 1.7% more accurate for seen users and 36% more accurate for unseen users with 20× fewer parameters compared to strong baselines.\n\n💠 \u003cspan style=\"color:lightblue; font-weight:bold;\"\u003eModular Design\u003c/span\u003e: PAL's architecture is modular, allowing it to leverage shared, common preferences while adapting to specific individual preferences.\n\n💠 \u003cspan style=\"color:lightblue; font-weight:bold;\"\u003eFew-shot Generalization\u003c/span\u003e: PAL enables sample-efficient adaptation to new users' preferences with few labeled examples.\n\n# 💻 Installation\n\n\u003e All code has been tested on Linux with `CUDA=12.4`; functionality on other systems is not guaranteed.\n\n1. Clone this repository and navigate into the directory\n\n   ```shell\n   git clone https://github.com/RamyaLab/pluralistic-alignment.git\n   cd pluralistic-alignment\n   ```\n\n2. Install required packages with [conda](https://docs.anaconda.com/anaconda/install/)\n\n   ``` sh\n   conda env create --file environment.yml\n   ```\n\n\u003e Note: Ensure your environment supports **PyTorch** and **CUDA** (if you are using GPU acceleration). The `environment.yml` contains detailed package versions and setup instructions.\n\n # 🧰 Usage\n\n\n\n ```mermaid\ngraph LR\n    A[Prepare the Preference Dataset with user IDs] --\u003e B[Design Configurations]\n    B --\u003e C[Train the PAL Model]\n    C --\u003e D[Convert PAL Model into Standard Reward Model]\n    D --\u003e E[Ready for Further Applications]\n ```\n\n## Data Preparation\n\nWhen preparing a dataset of preferences to train and evaluate PAL reward models, each sample should also contain a `user_id` identifying the user who provided it, so that PAL can learn each user's preferences. 
The format of each sample should be `(user_id, prompt, (response_1), (response_2)),y`, where `user_id` is a unique string identifier for a specific user; `prompt`, `response_1`, and `response_2` are the prompt and the two corresponding generative model completions; and $y\\in \\\\{-1, +1\\\\}$ represents the user's preference between the two responses. **\u003cu\u003e*(Note: modify `dataset_factory.py` to add your own custom dataset)*\u003c/u\u003e**.\n\n🎯 *\u003cu\u003eFor more details about dataset preparation, please refer to `dataset_factory.py`.\u003c/u\u003e*\n\n\u003e This repository currently provides only \u003cu\u003e*the variant of Reddit TL;DR Summary Dataset*\u003c/u\u003e used in the PAL paper.\n\n## Training Configurations\n\nThe [config](config/) folder in this repository contains various configurations that allow for easy training customization. The configuration subfolders are `ds_config`, `loss_config`, `optim_config`, and `prefLearner_config`.\n\n🎯 \u003cu\u003e*For more details, please review each file individually.*\u003c/u\u003e\n\n```\nconfig\n├── ds_config\n│   ├── summary.yaml\n│   └── ...\n├── loss_config\n│   ├── b-cumulative.yaml\n│   └── ...\n├── optim_config\n│   ├── vanilla-e20.yaml\n│   └── ...\n└── prefLearner_config\n    ├── b-dim512-k2-opt350m-mlp2.yaml\n    ├── b-dim768-k2-distillbert65m-mlp2.yaml\n    ├── b-dim1024-k2-bgem3-mlp2.yaml\n    ├── b-dim1536-k2-qwen1-5b-mlp2.yaml\n    ├── b-dim1536-k2-stella1-5b-mlp2.yaml\n    └── ...\n```\n\n## Training Demos\n\nThe following scripts outline different training demos for PAL-B models, targeting various levels of model adaptation and user generalization.\n\n### 1. 
Train PAL-B-Large\n\nThis script trains the PAL-B-Large model by finetuning the foundation model, projector, and user weights.\n\n```sh\n# Train PAL-B-Large (Large: finetune the foundation + projectors + user weights)\nCUDA_VISIBLE_DEVICES=0 python -u main_pal_b.py \\\n  --prefLearner_config ./config/prefLearner_config/b-dim512-k2-opt350m-mlp2.yaml \\\n  --run_name summary-pal-b-large-k2-mlp2 \\\n  2\u003e\u00261 \u003e| ./logs/summary-pal-b-large-k2-mlp.log \n```\n\n### 2. Train PAL-B-Tiny\n\nThis script trains the PAL-B-Tiny model by fixing the foundation model and only learning the projector and user weights.\n\n```sh\n# Train PAL-B-Tiny (Tiny: fix the foundation model and only learn the projectors + user weights)\nCUDA_VISIBLE_DEVICES=1 python -u main_pal_b_fix_llm.py \\\n  --prefLearner_config ./config/prefLearner_config/b-dim512-k2-opt350m-mlp2.yaml \\\n  --run_name summary-pal-b-tiny-k2-mlp2 \\\n  2\u003e\u00261 \u003e| ./logs/summary-pal-b-tiny-k2-mlp2.log \n```\n\n### 3. Train PAL-B (Few-Shot) on New Users\n\nThis script performs new user generalization by adapting the model to unseen users. 
It only learns the weights of new users based on a few samples per user.\n\n```sh\n# New User Generalization with n samples per unseen user (Only learn the weights of new users)\nCUDA_VISIBLE_DEVICES=2 python -u main_pal_b_unseen.py \\\n  --ds_config ./config/ds_config/summary_unseen_{num_of_samples_per_unseen_user}samples.yaml \\\n  --prefLearner_config ./config/prefLearner_config/b-dim512-k2-opt350m-mlp2.yaml \\\n  --optim_config ./config/optim_config/vanilla-e20.yaml \\\n  --loss_config ./config/loss_config/b-cumulative.yaml \\\n  --state_dict_path /path/to/the/well-trained/pal/model.ckpt \\\n  --run_name summary-unseen-pal-b-cumulative-k2-mlp2-e20-{num_of_samples_per_unseen_user}sample \\\n  2\u003e\u00261 \u003e| ./logs/summary-unseen-pal-b-k2-mlp2-{num_of_samples_per_unseen_user}sample.log\n```\n\n\n## Experiment Reproduction\nNote: by default, we use five runs $i$ of the below experiments to calculate error bars.\n### 1: On Reddit TL;DR Summary dataset, increasing # groups (i.e. the plurality) in our PAL model leads to a significant boost in preference prediction accuracy.\n```sh\n  # set # user preference groups to 1\n  for i in {1..5}; do\n      CUDA_VISIBLE_DEVICES=0 python -u main_pal_b.py \\\n          --prefLearner_config ./config/prefLearner_config/b-dim512-k1-opt350m-mlp2.yaml \\\n          --run_name summary-b-cumulative-k1-mlp2-run${i} \\\n          --device 0 \\\n          2\u003e\u00261 \u003e| ./logs/summary-b-cumulative-k1-mlp2-${i}.log\n  done\n\n  # set # user preference groups to 2\n  for i in {1..5}; do\n      CUDA_VISIBLE_DEVICES=1 python -u main_pal_b.py \\\n          --prefLearner_config ./config/prefLearner_config/b-dim512-k2-opt350m-mlp2.yaml \\\n          --run_name summary-b-cumulative-k2-mlp2-run${i} \\\n          --device 0 \\\n          2\u003e\u00261 \u003e| ./logs/summary-b-cumulative-k2-mlp2-${i}.log\n  done\n```\n### 2. 
With the freshly trained PAL model above, we can generalize to new, unseen users with very few preference pairs $j$\n```sh\nfor j in 2 5 10 20 50 100; do\n  for i in {1..5}; do\n      CUDA_VISIBLE_DEVICES=1 python -u main_pal_b_unseen.py \\\n          --ds_config ./config/ds_config/summary_unseen_${j}samples.yaml \\\n          --prefLearner_config ./config/prefLearner_config/b-dim512-k2-opt350m-mlp2.yaml \\\n          --optim_config ./config/optim_config/vanilla-e20.yaml \\\n          --loss_config ./config/loss_config/b-cumulative.yaml \\\n          --state_dict_path /path/to/the/model(k=2)/trained/in/stage/1.ckpt \\\n          --run_name summary-unseen-b-cumulative-k2-mlp2-e20-sample${j}-run${i}\n  done\ndone\n```\n\n\n\n## Integration\n\nWe provide code in the [integration](integration/) subfolder to **convert trained PAL models into standard reward models** for downstream use, e.g., RLHF training of generative models.\n\n\u003e In the standard reward model setup, the reward model takes a prompt and response as input and generates a scalar reward value. \n\u003e In contrast, our PAL model takes a prompt, two responses, and a user ID as input to predict the user’s preference over the two responses given the prompt.\n\nTo convert the PAL model to the standard scalar reward model, use the following functions:\n\n- `load_pal_rm_a()`\n- `load_pal_rm_b()`\n\n\n## 🤗 Model Downloads\n\n[PAL-B-Large-OPT-350m](https://huggingface.co/daiweichen/pal-b-large-opt-350m) trained on the variant of Reddit TL;DR Summary Dataset \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Framyalab%2Fpluralistic-alignment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Framyalab%2Fpluralistic-alignment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Framyalab%2Fpluralistic-alignment/lists"}