{"id":13568579,"url":"https://github.com/agi-templar/Stable-Alignment","last_synced_at":"2025-04-04T04:31:38.037Z","repository":{"id":170187876,"uuid":"628284849","full_name":"agi-templar/Stable-Alignment","owner":"agi-templar","description":"Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper \"Training Socially Aligned Language Models in Simulated Human Society\".","archived":false,"fork":false,"pushed_at":"2023-06-18T16:43:12.000Z","size":16159,"stargazers_count":330,"open_issues_count":4,"forks_count":17,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-05-21T20:10:27.411Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2305.16960.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agi-templar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-04-15T13:26:19.000Z","updated_at":"2024-05-05T13:13:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"558736dc-dc22-4864-8b49-87d150fa25a2","html_url":"https://github.com/agi-templar/Stable-Alignment","commit_stats":null,"previous_names":["agi-templar/stable-alignment"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agi-templar%2FStable-Alignment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agi-templar%2FStable-Alignment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agi-templar%2FStable-Alignment/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/agi-templar%2FStable-Alignment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agi-templar","download_url":"https://codeload.github.com/agi-templar/Stable-Alignment/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247123073,"owners_count":20887259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T14:00:28.542Z","updated_at":"2025-04-04T04:31:33.028Z","avatar_url":"https://github.com/agi-templar.png","language":"Python","funding_links":[],"categories":["CodeBase","RLHFdataset","A01_文本生成_文本对话","Dataset"],"sub_categories":["Efficiency","大语言对话模型及数据","2020 and before"],"readme":"\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"assets/images/logo.gif\" alt=\"Stable Alignment\" style=\"width: 100%; min-width: 400px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\n# Stable Alignment - Alignment Learning in Social Games\n\n[![lint](https://github.com/DapangLiu/SandBox/actions/workflows/code_quality.yml/badge.svg)](https://github.com/DapangLiu/SandBox/blob/main/.github/workflows/code_quality.yml)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n\nThis is the official repo for the Stable Alignment project. We aim to provide an RLHF alternative that is superior in alignment performance, highly efficient in learning from data, and easy to deploy in scaled-up settings. 
Instead of training an extra reward model that can be gamed during optimization, we directly train on the recorded interaction data in simulated social games. We find that high-quality data combined with a reliable algorithm is the recipe for stable alignment learning.\n\nThe repo contains:\n\n- The code for [running social simulation in Sandbox](#sandbox-simulation).\n- The [169K interaction data](#alignment-data-release) used for alignment training.\n- The code for [training with stable alignment](#training-with-stable-alignment).\n- The download for [So(cially)-Good Language Model](#socially-good-language-model).\n\n**Life is a game. Play by your rules!**\n\n\u003cp\u003e\n\u003cimg src=\"assets/images/overview.png\" alt=\"Overview of Stable Alignment\" style=\"width: 100%; min-width: 200px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\n## Sandbox Simulation\n\n### Installation\n```bash\n# install development environment\npip install -r requirements.txt\n# install dependencies for package re-distribution\npip install -e .\n```\n### Simulation Setup\n- Initial data is already stored at `assets/hh-rlhf/labeled_prior.jsonl` (with Git LFS).\n- After a round of simulation, the simulated interaction data and metrics will be saved at `data/cache/world_\u003cworld_id\u003e/`.\n- Place your OpenAI API key in `.env` inside the project root folder.\n\n### Run Simulation\nNavigate to the project root folder and run the simulation with customized settings:\n\n```bash\npython stable_alignment/simulation.py \\\n    -model_type 'text-davinci-002' \\\n    -obs_model_type 'gpt-3.5-turbo' \\\n    -world_id 1 \\\n    -init_setting 'all_bad' \\\n    -n_round '2' \\\n    -size '4' \\\n    -dataset_name 'hh-rlhf'\n```\n\nWe present an example simulation result in `assets/sample_world`. It is simulated with 100 text-davinci-003-based social agents and ChatGPT-based observer agents. 
The simulation is run for 50 rounds of interactions.\n\n## Alignment Data Release\n\n\u003cp\u003e\n\u003cimg src=\"assets/images/back_scatter.png\" alt=\"Back Scatter in SandBox\" style=\"width: 100%; min-width: 200px; display: block; margin: auto;\"\u003e\n\u003c/p\u003e\n\nThe alignment data used for training is already included at `assets/sandbox_v1.json` and `assets/sandbox_v2.json`. Note that they are sampled from the full set of interaction data at a ratio of 5:1:1 for Alignment Imitation, Self-Critic, and Realignment data, respectively. The full set of interaction data is available upon request.\n\n\u003cdetails\u003e\n\u003csummary\u003e \u003cstrong\u003e The Statistics of Alignment Data (Full Set) \u003c/strong\u003e \u003c/summary\u003e\n\n- `sandbox_v1.json`\n\n| Data / Social Agent Type | text-davinci-002 | text-davinci-003 | ChatGPT | Total |\n|--------------------------|------------------|------------------|---------|-------|\n| Alignment Imitation      | 9.8k             | 10k              | 10k     | 29.8k |\n| Self-Critic              | 17k              | 20k              | 20k     | 57k   |\n| Realignment              | 3.3k             | 3k               | 0.7k    | 7k    |\n| Total                    | 30.1k            | 33k              | 30.7k   | 93.8k |\n\n- `sandbox_v2.json`\n\n| Data / Social Agent Type | text-davinci-002 | text-davinci-003 | GPT-4 | Total |\n|--------------------------|------------------|------------------|-------|-------|\n| Alignment Imitation      | 18.2k            | 10.4k            | 20.2k | 48.8k |\n| Self-Critic              | 36.3k            | 18.3k            | 40k   | 94.6k |\n| Realignment              | 18.2k            | 3.4k             | 4.0k  | 25.6k |\n| Total                    | 72.7k            | 32.1k            | 64.2k | 169k  |\n\n\u003c/details\u003e\n\n## Training with Stable Alignment\n\n```bash\ntorchrun --nproc_per_node=4 --master_port=36646 train_alignment.py \\\n      
--model_name_or_path \"/workspace/hhh_sft\" \\\n      --data_path \"./assets/sandbox_v1.json\" \\\n      --bf16 True \\\n      --output_dir \"/workspace/\u003cyour_output_lm_name\u003e\" \\\n      --num_train_epochs 7 \\\n      --per_device_train_batch_size 1 \\\n      --per_device_eval_batch_size 1 \\\n      --gradient_accumulation_steps 8 \\\n      --evaluation_strategy \"no\" \\\n      --save_strategy \"steps\" \\\n      --save_steps 200 \\\n      --save_total_limit 1 \\\n      --learning_rate 2e-5 \\\n      --weight_decay 0. \\\n      --warmup_ratio 0.03 \\\n      --lr_scheduler_type \"cosine\" \\\n      --logging_steps 1 \\\n      --fsdp \"shard_grad_op auto_wrap\" \\\n      --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \\\n      --tf32 True \\\n      --model_max_length 360 \\\n      --rating_scale 7 \\\n      --margin 10 \\\n      --max_flow False \\\n      --ratio 0.2 \\\n      --num_comp 3\n```\n\nNotes on the arguments (a comment placed after a trailing `\\` breaks the line continuation in bash, so they are listed here instead):\n\n- `--model_name_or_path`: path to your SFT model.\n- `--data_path`: path to the alignment data.\n- `--per_device_train_batch_size`: has to be 1 for alignment training.\n- `--fsdp`: change to \"full_shard auto_wrap\" if you run out of memory (OOM).\n- `--model_max_length`: change to a shorter length if OOM.\n- `--rating_scale`: the scale of the ratings (7 for 1-7, 10 for 1-10, etc.).\n- `--margin`: constant; see the paper.\n- `--max_flow`: mean or max for the penalty.\n- `--ratio`: controls the ratio of the penalty.\n\n## So(cially)-Good Language Model\n\n![Model Release](assets/images/model_select_light.png#gh-light-mode-only)\n![Model Release](assets/images/model_select_dark.png#gh-dark-mode-only)\n\nWe have released our models on Hugging Face! 🤗\n\nReleased models include:\n\n1. [`better-base`](https://huggingface.co/agi-css/better-base), a base model trained on LLaMA with [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned), which is the fixed Alpaca instruction tuning dataset, and [codealpaca](https://github.com/sahil280114/codealpaca), which is the code pretraining dataset.\n\n2. 
[`hh-rlhf-sft`](https://huggingface.co/agi-css/hh-rlhf-sft), a supervised fine-tuned model built on `better-base` with the socially aligned demonstrations in the [Anthropic HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf) (the `chosen` samples in the dataset).\n\n3. [`socially-good-lm`](https://huggingface.co/agi-css/socially-good-lm), a socially aligned language model trained on `hh-rlhf-sft` with the stable alignment method.\n\nAfter you download the model, you can run inference with the following command:\n\n```bash\npython stable_alignment/run_inference.py \\\n    --model_path './models/socially-good-lm' \\\n    --device 'cuda:0'\n```\n\n## Citation\n\nPlease cite our paper if you use the data or code in this repo:\n\n```bibtex\n@misc{liu2023sociallyaligned,\n      title={Training Socially Aligned Language Models in Simulated Human Society},\n      author={Ruibo Liu and Ruixin Yang and Chenyan Jia and Ge Zhang and Denny Zhou and Andrew M. Dai and Diyi Yang and Soroush Vosoughi},\n      year={2023},\n      eprint={2305.16960},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagi-templar%2FStable-Alignment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagi-templar%2FStable-Alignment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagi-templar%2FStable-Alignment/lists"}