{"id":13754038,"url":"https://github.com/thu-coai/BPO","last_synced_at":"2025-05-09T22:30:52.093Z","repository":{"id":206066512,"uuid":"715133519","full_name":"thu-coai/BPO","owner":"thu-coai","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-24T09:41:35.000Z","size":30027,"stargazers_count":274,"open_issues_count":3,"forks_count":14,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-08-04T09:06:20.637Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thu-coai.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-06T14:44:07.000Z","updated_at":"2024-07-28T15:31:17.000Z","dependencies_parsed_at":"2024-06-21T21:43:25.264Z","dependency_job_id":null,"html_url":"https://github.com/thu-coai/BPO","commit_stats":null,"previous_names":["thu-coai/bpo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FBPO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FBPO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FBPO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-coai%2FBPO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thu-coai","download_url":"https://codeload.github.com/thu-coai/BPO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","re
positories_count":224884615,"owners_count":17386121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:37.770Z","updated_at":"2025-05-09T22:30:52.050Z","avatar_url":"https://github.com/thu-coai.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"\n\u003c!-- \u003cdiv align=\"center\"\u003e --\u003e\n\u003c!-- \u003cimg src=\"assets/cover.png\" alt=\"BPO\" width=\"90%\" /\u003e --\u003e\n\u003c!-- \u003c/div\u003e --\u003e\n# Black-Box Prompt Optimization (BPO)\n### Aligning Large Language Models without Model Training (ACL 2024)\n\n\u003cp align=\"center\"\u003e\n   🤗 \u003ca href=\"#model\" target=\"_blank\"\u003eModel\u003c/a\u003e • 📚 \u003ca href=\"#data\" target=\"_blank\"\u003eData\u003c/a\u003e • 📃 \u003ca href=\"https://arxiv.org/abs/2311.04155\" target=\"_blank\"\u003ePaper\u003c/a\u003e • 🌐 \u003ca href=\"https://huggingface.co/spaces/CCCCCC/BPO_demo\" target=\"_blank\"\u003eDemo\u003c/a\u003e\n\u003c/p\u003e\n\n(Upper) Black-box Prompt Optimization (BPO) offers a conceptually new perspective to bridge the gap between humans and LLMs. (Lower) On Vicuna Eval’s pairwise evaluation, we show that BPO further aligns gpt-3.5-turbo and claude-2 without training. 
It also outperforms both PPO \u0026 DPO and presents orthogonal improvements.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"assets/intro.png\" alt=\"BPO\" width=\"50%\" /\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Update\nWe have released our [model](https://huggingface.co/THUDM/BPO) and [data](https://huggingface.co/datasets/THUDM/BPO) on Hugging Face.\n\nWe have built a [demo](https://huggingface.co/spaces/CCCCCC/BPO_demo) for BPO on Hugging Face.\n\u003cbr\u003e\n\n## Table of Contents\n- [Model](#model)\n- [Data](#data)\n- [Quick Start](#quick-start)\n    - [Data Construction](#data-construction)\n    - [Model Training](#model-training)\n    - [Inference](#inference)\n    - [Evaluation](#evaluation)\n- [Citation](#citation)\n\n\n## Model\nThe prompt preference optimization model can be downloaded from [Hugging Face](https://huggingface.co/THUDM/BPO).\n\nInference code (please refer to [src/infer_example.py](src/infer_example.py) for more instructions on how to optimize your prompts):\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_path = 'THUDM/BPO'\n\nprompt_template = \"[INST] You are an expert prompt engineer. 
Please help me improve this prompt to get a more helpful and harmless response:\\n{} [/INST]\"\n\ndevice = 'cuda:0'\nmodel = AutoModelForCausalLM.from_pretrained(model_path).half().eval().to(device)\n# for 8bit\n# model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, load_in_8bit=True)\ntokenizer = AutoTokenizer.from_pretrained(model_path)\n\ntext = 'Tell me about Harry Potter'\n\nprompt = prompt_template.format(text)\nmodel_inputs = tokenizer(prompt, return_tensors=\"pt\").to(device)\noutput = model.generate(**model_inputs, max_new_tokens=1024, do_sample=True, top_p=0.9, temperature=0.6, num_beams=1)\nresp = tokenizer.decode(output[0], skip_special_tokens=True).split('[/INST]')[1].strip()\n\nprint(resp)\n```\n\n## Data\n\n### BPO dataset\nThe BPO dataset can be found on [Hugging Face](https://huggingface.co/datasets/THUDM/BPO).\n\n### BPO for SFT Data Construction\nThe alpaca_reproduce directory contains the BPO-reproduced Alpaca dataset. The data format is:\n```json\n{\n    \"instruction\": {instruction},\n    \"input\": {input},\n    \"output\": {output},\n    \"optimized_prompt\": {optimized_prompt},\n    \"res\": {res}\n}\n```\n- {instruction}, {input}, and {output} are elements from the original dataset.\n- {optimized_prompt} is the BPO-optimized instruction.\n- {res} is the response from text-davinci-003 using the {optimized_prompt}.\n\n\n### Testset\nThe testset directory contains all the test datasets we used, including:\n- 200 prompts sampled from the BPO dataset\n- 200 examples from the Dolly dataset\n- 252 human evaluation instructions from Self-Instruct\n- 80 user-oriented prompts from the Vicuna Eval dataset\n\n\n## Quick Start\nThroughout the code, we have added `#TODO` comments to mark places that need modification before running. 
Please update the relevant parts as noted before executing each file.\n\n### Setup\n```bash\npip install -r requirements.txt\n```\n\n### Data Construction\nTo construct the data yourself, run the following commands:\n```bash\ncd src/data_construction\n\n# using pairwise feedback data to generate optimized prompts\npython chatgpt_infer.py\n\n# process generated optimized prompts\npython process_optimized_prompts.py\n```\n\n### Model Training\nIf you want to train your own prompt preference optimizer,\nplease run the following commands:\n```bash\ncd src/training\n\n# pre-process fine-tuning data\npython ../data_construction/process_en.py\npython data_utils.py\n\n# fine-tuning\npython train.py\n\n# inference\npython infer_finetuning.py\n```\n\n### Inference\nWe provide [example code](src/inference/llama2_infer.py) for generation with llama2-chat on BPO-optimized prompts.\n\n### Evaluation\nIf you wish to compare the BPO-aligned model with the original model, please refer to the following code:\n```bash\ncd src/evaluation\n\n# take gpt4 evaluation on dolly_eval as an example\n# change --task_name to \"self_instruct\", \"test_set\", or \"vicuna_eval\" for other testsets\npython gpt4_score.py --input_file_a \"Path to generation results of BPO-aligned model\" \\\n--input_file_b \"Path to generation results of original model\" \\\n--task_name \"dolly_eval\" \\\n--output_file \"Output path\"\n\n# calculate win rates\npython cal_gpt4_score.py --input_file \"Output path\"\n```\n\n\n## Acknowledgement\n- Fine-tuning code: [llm_finetuning](https://github.com/ssbuild/llm_finetuning)\n- PPO code: [DeepSpeed-Chat](https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/README.md)\n- DPO code: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)\n- Evaluation Prompts: [llm_judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) and [alpaca_eval](https://github.com/tatsu-lab/alpaca_eval)\n\n## Citation\n```\n@article{cheng2023black,\n  
title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},\n  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},\n  journal={arXiv preprint arXiv:2311.04155},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-coai%2FBPO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthu-coai%2FBPO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-coai%2FBPO/lists"}