{"id":27948316,"url":"https://github.com/pku-alignment/aligner","last_synced_at":"2025-05-07T14:57:33.773Z","repository":{"id":221116788,"uuid":"753222398","full_name":"PKU-Alignment/aligner","owner":"PKU-Alignment","description":"[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct","archived":false,"fork":false,"pushed_at":"2025-01-16T19:05:01.000Z","size":17061,"stargazers_count":170,"open_issues_count":1,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-07T14:57:27.017Z","etag":null,"topics":["aisafety","aligner","alignment","interpretability","llm","mecinterp","rlhf","weak-to-strong"],"latest_commit_sha":null,"homepage":"https://pku-aligner.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PKU-Alignment.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-05T17:48:43.000Z","updated_at":"2025-05-01T04:30:59.000Z","dependencies_parsed_at":"2024-03-12T12:46:15.983Z","dependency_job_id":"d65d88a6-b883-4c00-b918-4021edc281a9","html_url":"https://github.com/PKU-Alignment/aligner","commit_stats":null,"previous_names":["aligner2024/aligner","pku-aligner/aligner","cby-pku/aligner"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2Faligner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2Faligner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2Faligner/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKU-Alignment%2Faligner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PKU-Alignment","download_url":"https://codeload.github.com/PKU-Alignment/aligner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252902623,"owners_count":21822257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aisafety","aligner","alignment","interpretability","llm","mecinterp","rlhf","weak-to-strong"],"created_at":"2025-05-07T14:57:33.262Z","updated_at":"2025-05-07T14:57:33.759Z","avatar_url":"https://github.com/PKU-Alignment.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003e(NeurIPS 2024 Oral) Aligner: Efficient Alignment by \u003cbr\u003e Learning to Correct   \u003c/h1\u003e\n\nThis repository contains the source code for our NeurIPS 2024 paper [Aligner: Efficient Alignment by Learning to Correct](https://arxiv.org/abs/2402.02416).\n\n\n[Jiaming Ji*](https://jijiaming.com/), [Boyuan Chen*](https://cby-pku.github.io/), [Hantao Lou](https://htlou.github.io/), [Donghai Hong](https://scholar.google.com/citations?user=JQx-_5gAAAAJ), [Borong Zhang](https://github.com/muchvo), [Xuehai Pan](https://github.com/XuehaiPan), [Juntao Dai](https://scholar.google.com/citations?user=eRmX5AsAAAAJ\u0026hl=zh-CN), [Tianyi Qiu](https://tianyiqiu.net/) and [Yaodong Yang](https://www.yangyaodong.com/)\n\nWork done by [PKU-Alignment Team](https://github.com/PKU-Alignment)\n\n## Abstract\nWith the rapid development of large language models 
(LLMs) and ever-evolving practical requirements, finding an efficient and effective alignment method has never been more critical. However, the tension between the complexity of current alignment methods and the need for rapid iteration in deployment scenarios necessitates the development of a model-agnostic alignment approach that can operate under these constraints. In this paper, we introduce *Aligner*, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model.\nDesigned as a model-agnostic, plug-and-play module, *Aligner* can be directly applied to various open-source and API-based models with only one-off training, making it suitable for rapid iteration.\nNotably, *Aligner* can be applied to any powerful, large-scale upstream model.\nMoreover, it can even iteratively bootstrap the upstream models using corrected responses as synthetic human preference data, breaking through the model's performance ceiling.\nOur experiments demonstrate performance improvements by deploying the same *Aligner* model across 11 different LLMs, evaluated on the 3H dimensions (helpfulness, harmlessness, and honesty).\nSpecifically, *Aligner*-7B has achieved an average improvement of 68.9\\% in helpfulness and 22.8\\% in harmlessness across the tested LLMs while also effectively reducing hallucination.\nOn the Alpaca-Eval leaderboard, stacking *Aligner*-2B on GPT-4 Turbo improved its LC Win Rate from 55.0\\% to 58.3\\%, surpassing GPT-4 Omni's 57.5\\% Win Rate (community report).\n\nSee our website for more details: https://pku-aligner.github.io/\n\n## Citation\n\nPlease cite our work if you find it useful and meaningful.\n\n```bibtex\n@inproceedings{ji2024aligner,\n  title={Aligner: Efficient Alignment by Learning to Correct},\n  author={Jiaming Ji and Boyuan Chen and Hantao Lou and Donghai Hong and Borong Zhang and Xuehai Pan and Tianyi Qiu and Juntao Dai and Yaodong Yang},\n  booktitle={The 
Thirty-eighth Annual Conference on Neural Information Processing Systems},\n  year={2024},\n  url={https://openreview.net/forum?id=kq166jACVP}\n}\n```\n\n### Table of Contents  \u003c!-- omit in toc --\u003e\n\n- [\u003cem\u003eAligner\u003c/em\u003e: Efficient Alignment by Learning to Correct](#Aligner)\n- [Installation](#installation)\n- [Training](#training)\n- [Dataset \u0026 Models](#dataset-models)\n- [Acknowledgment](#acknowledgment)\n\n\n## \u003cem\u003eAligner\u003c/em\u003e: Efficient Alignment by Learning to Correct \n\n### Architecture of the *Aligner* module.\nAs a plug-and-play module, *Aligner* stacks upon an upstream LLM. It redistributes initial answers from the upstream model into more helpful and harmless answers, thus aligning the composed LLM's responses with human intentions.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/main-paradigm.jpg\" width=\"70%\"/\u003e\n\u003c/div\u003e\n\n### Illustration of its behavior in architecture and semantic space.\nLike a residual block that adds modifications via a shortcut without altering the base structure, the *Aligner* employs a *copy and correct* method to improve the original answer.\nThis analogy highlights the *Aligner*'s dual role in preserving the parameters of the upstream model while enhancing it to align with desired outcomes.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/semantic_space.png\" width=\"90%\"/\u003e\n\u003c/div\u003e\n\n### Performance of *Aligner* Models\n*Aligner* achieves significant performance gains in all settings.
All assessments in this table compare various models integrated with *Aligner*s against the original models, quantifying the percentage improvement on the *3H* standard.\nWhen integrated and assessed in conjunction with various upstream models, the *Aligner* requires only a single training session (*i.e.*, the *Aligner* can operate in a zero-shot manner and enhance the performance of all upstream models).\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"images/performance.png\" width=\"90%\"/\u003e\n\u003c/div\u003e\n\n### More Details\nFor more details, please refer to our [website](https://pku-aligner.github.io/)\n\n## Installation\nClone the source code from GitHub:\n\n```bash\ngit clone https://github.com/cby-pku/aligner.git\ncd aligner\n```\n\n**Native Runner:** Set up a conda environment using [`conda`](https://github.com/conda/conda) / [`mamba`](https://github.com/mamba-org/mamba):\n\n```bash\nconda env create --file conda-recipe.yaml  # or `mamba env create --file conda-recipe.yaml`\n```\n\n## Training\n\n`aligner` supports a complete pipeline for Aligner \u003cem\u003eresidual correction\u003c/em\u003e training.\n\n0. Follow the instructions in section [Installation](#installation) to set up the training environment properly.\n\n```bash\nconda activate aligner\nexport WANDB_API_KEY=\"...\"  # your W\u0026B API key here\n```\n\n1. Supervised Fine-Tuning (SFT)\n\n```bash\nbash scripts/sft-correction.sh \\\n    --train_datasets \u003cyour-correction-dataset\u003e \\\n    --model_name_or_path \u003cyour-model-name-or-checkpoint-path\u003e \\\n    --output_dir output/sft\n```\n\nNOTE:\n1. You may need to update some of the parameters in the script according to your machine setup, such as the number of GPUs for training, the training batch size, etc.\n2. Your dataset format should be consistent with `aligner/template-dataset.json`.\n3. 
To reproduce other alignment training methods such as DPO or RLHF, please refer to the [Align-Anything](https://github.com/PKU-Alignment/align-anything) or [Safe-RLHF](https://github.com/PKU-Alignment/safe-rlhf) repository.\n\n## Register a new dataset\n\nYou can register a new dataset by following the instructions in the `aligner/training/datasets/raw/correction.py` file.\n\nYou can also design your own user prompt to develop more specific *Aligner*s, such as Instruct-*Aligner*.\n\nNote that the whole system prompt starts with `BEGINNING OF CONVERSATION: `; refer to `aligner/training/configs/constants.py` for details.\n\n\n## Dataset \u0026 Models\n- [2025/01] We have open-sourced an extended dataset [*AlignerTails*](https://huggingface.co/datasets/aligner/alignertails) from our NeurIPS 2024 paper. Incorporating prompts, answers, and corrections generated by state-of-the-art models like GPT-4o and refined by human annotators, the dataset encompasses tasks spanning a wide range of topics, including mathematics, empathy, safety, summarization, planning, and more. 
Further models will come soon.\n- [2024/01] We have open-sourced a 20K [training dataset](https://huggingface.co/datasets/aligner/aligner-20K) and a [7B Aligner model](https://huggingface.co/aligner/aligner-7b-v1.0).\n\n\n## Acknowledgment\n\nThis repository benefits from [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai), [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), [DeepSpeed](https://github.com/microsoft/DeepSpeed), [DeepSpeed-Chat](https://github.com/microsoft/DeepSpeedExamples/tree/HEAD/applications/DeepSpeed-Chat) and [Safe-RLHF](https://github.com/PKU-Alignment/safe-rlhf).\n\nThanks for their wonderful work and their efforts to further promote LLM research.\nAligner and its related assets are built and open-sourced with love and respect ❤️.\n\nThis work is supported and funded by Peking University.\n\n\u003ctable width=\"50%\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n  \u003ctr align=\"center\" valign=\"middle\"\u003e\n    \u003ctd width=\"40%\"\u003e\n      \u003ca href=\"https://www.ai.pku.edu.cn/\"\u003e\n        \u003cimg src=\"logo/pku-ai.png\" width=\"100%\"/\u003e\n      \u003c/a\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n## License\n\nAligner is released under the Apache License 2.0.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpku-alignment%2Faligner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpku-alignment%2Faligner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpku-alignment%2Faligner/lists"}