{"id":21164215,"url":"https://github.com/THUDM/LongReward","last_synced_at":"2025-07-09T16:33:10.841Z","repository":{"id":260008596,"uuid":"873516074","full_name":"THUDM/LongReward","owner":"THUDM","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-29T02:29:04.000Z","size":2380,"stargazers_count":39,"open_issues_count":0,"forks_count":2,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-11-14T06:30:33.113Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-16T09:50:17.000Z","updated_at":"2024-11-12T09:48:28.000Z","dependencies_parsed_at":"2024-10-29T03:23:47.774Z","dependency_job_id":"e8d40cc8-f948-4686-be77-978e8c2713f9","html_url":"https://github.com/THUDM/LongReward","commit_stats":null,"previous_names":["thudm/longreward"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FLongReward","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FLongReward/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FLongReward/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FLongReward/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THUDM/LongReward/tar.gz/refs/heads/main","host":{"nam
e":"GitHub","url":"https://github.com","kind":"github","repositories_count":225570420,"owners_count":17489885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T14:01:17.791Z","updated_at":"2024-11-20T14:01:24.452Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# LongReward: Improving Long-context Large Language Models with AI Feedback\n\n\u003cp align=\"center\"\u003e\n    🤗 \u003ca href=\"https://huggingface.co/datasets/THUDM/LongReward-10k\" target=\"_blank\"\u003eHuggingFace\u003c/a\u003e • 📃 \u003ca href=\"https://arxiv.org/abs/2410.21252\" target=\"_blank\"\u003ePaper\u003c/a\u003e\n\u003c/p\u003e\n\n## 🔍 Table of Contents\n\n- [🤖️ LongReward](#longreward)\n- [⚙️ Datasets \u0026 Models](#released-datasets--models)\n- [🌟 Quick Start](#quickstart)\n- [📊 Evaluation](#evaluation)\n- [📝 Citation](#citation)\n- [📍 License](#license)\n\n## LongReward\n\n![cof](https://github.com/user-attachments/assets/a9b06ba1-23ca-44b4-be98-dc2b59b5b84c)\n\nWe open-source **LongReward** under `long_reward/auto_scorer.py`, a novel method that utilizes an off-the-shelf LLM to\nautomatically provide rewards for model responses in long-context scenarios, considering four human-valued dimensions:\nhelpfulness, logicality, faithfulness, and completeness. 
Given a long-context-based model response, LongReward assigns a\nscore ranging from 0 to 10 for each dimension, and takes their average as the final reward.\n\nPlease configure your API key in `utils/llm_api.py` before running the code. You can also run the following scripts\nunder `batch_inference/` for large-scale inference: `1_get_chunk_info.py`, `2_get_score.py`, and `3_get_pairs.py`. The\nresults will be stored in `./data`.\n\n\u003ca name=\"model\"\u003e\u003c/a\u003e\n\n## Released Datasets \u0026 Models\n\n### SFT Datasets \u0026 SFT Models\n\nOur [SFT dataset](https://huggingface.co/datasets/THUDM/LongReward-10k) contains 10k long-context QA instances, whose\ncontext lengths range from 8k to 64k tokens. The QA pairs are generated\nby [GLM-4-0520](https://bigmodel.cn/dev/api/normal-model/glm-4), following the self-instruct method\nin [LongAlign](https://github.com/THUDM/LongAlign).\nUsing this dataset, we train two\nmodels with supervised fine-tuning: [LongReward-glm4-9b-SFT](https://huggingface.co/NeoZ123/LongReward-glm4-9b-SFT)\nand [LongReward-llama3.1-8b-SFT](https://huggingface.co/NeoZ123/LongReward-llama3.1-8b-SFT), which are based\non [GLM-4-9B](https://huggingface.co/THUDM/glm-4-9b)\nand [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B), respectively, and support up to 64k\ncontext.\n\n### Preference Datasets \u0026 DPO Models\n\nWe utilize LongReward and prompts in the SFT dataset to construct\nthe [preference datasets](https://huggingface.co/datasets/THUDM/LongReward-10k) for each SFT model, and train their DPO\nversions: [LongReward-glm4-9b-DPO](https://huggingface.co/THUDM/LongReward-glm4-9b-DPO)\nand [LongReward-llama3.1-8b-DPO](https://huggingface.co/THUDM/LongReward-llama3.1-8b-DPO). 
More details can be found in\nour paper.\n\n### All Available Datasets and Models\n\nHere is the full list of datasets and models we released:\n\n| Name                                | Download Path                                                                                                                                                                                                                               |\n|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| LongReward-10k (SFT \u0026 DPO Datasets) | [🤗 HuggingFace](https://huggingface.co/datasets/THUDM/LongReward-10k)                                                                                                                                                                      |\n| LongReward-glm4-9b-SFT              | [🤗 HuggingFace](https://huggingface.co/NeoZ123/LongReward-glm4-9b-SFT)                                                                                                                                                                     |\n| LongReward-glm4-9b-DPO              | [🤗 HuggingFace](https://huggingface.co/THUDM/LongReward-glm4-9b-DPO), [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/LongReward-glm4-9b-DPO), [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/LongReward-glm4-9b-dpo)             |\n| LongReward-llama3.1-8b-SFT          | [🤗 HuggingFace](https://huggingface.co/NeoZ123/LongReward-llama3.1-8b-SFT)                                                                                                                                                                 |\n| LongReward-llama3.1-8b-DPO          | [🤗 HuggingFace](https://huggingface.co/THUDM/LongReward-llama3.1-8b-DPO), [🤖 
ModelScope](https://modelscope.cn/models/ZhipuAI/LongReward-llama3.1-8b-dpo), [🟣 WiseModel](https://wisemodel.cn/models/ZhipuAI/LongReward-llama3.1-8b-dpo) |\n\n## QuickStart\n\nTry our model with the following steps:\n\n1. Install the requirements\n\n```shell\npip install -r requirement.txt\n```\n\n2. Run the model\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nMODEL_PATH = 'THUDM/LongReward-glm4-9b-DPO'\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)\nmodel = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map=\"auto\")\n\nmessage = [\n    {\n        \"role\": \"user\",\n        \"content\": \"W. Russell Todd, 94, United States Army general (b. 1928). February 13. Tim Aymar, 59, heavy metal singer (Pharaoh) (b. 1963). Marshall \\\"Eddie\\\" Conway, 76, Black Panther Party leader (b. 1946). Roger Bonk, 78, football player (North Dakota Fighting Sioux, Winnipeg Blue Bombers) (b. 1944). Conrad Dobler, 72, football player (St. Louis Cardinals, New Orleans Saints, Buffalo Bills) (b. 1950). Brian DuBois, 55, baseball player (Detroit Tigers) (b. 1967). Robert Geddes, 99, architect, dean of the Princeton University School of Architecture (1965–1982) (b. 1923). Tom Luddy, 79, film producer (Barfly, The Secret Garden), co-founder of the Telluride Film Festival (b. 1943). David Singmaster, 84, mathematician (b. 1938). 
\\n\\n What was Robert Geddes' profession?\"\n    }\n]\n\ninputs = tokenizer.apply_chat_template(\n    message,\n    return_tensors='pt',\n    add_generation_prompt=True,\n    return_dict=True,\n).to(model.device)\n\ninput_len = inputs['input_ids'].shape[1]\ngenerate_kwargs = {\n    \"input_ids\": inputs['input_ids'],\n    \"attention_mask\": inputs['attention_mask'],\n    \"max_new_tokens\": 128,\n    \"do_sample\": False,\n}\nout = model.generate(**generate_kwargs)\nprint(tokenizer.decode(out[0][input_len:], skip_special_tokens=True))\n```\n\n## Evaluation\n\nWe provide our evaluation code for [LongBench](https://github.com/THUDM/LongBench)\nand [LongBench-Chat](https://github.com/THUDM/LongAlign) under `evaluation/`. Details can be found in\n`evaluation/README.md` and `evaluation/LongBench_Chat/README.md`. Remember to configure your OpenAI API key in\n`utils/llm_api.py` since we adopt GPT-4o as the judge.\n\nTo reproduce our results on other benchmarks, we refer to the code in [FastChat](https://github.com/lm-sys/FastChat)\nand [alpaca_eval](https://github.com/tatsu-lab/alpaca_eval) for evaluating on MT-Bench and AlpacaEval2.\n\nHere are our evaluation results:\n\n![eval](https://github.com/user-attachments/assets/c8fc4503-42a1-4081-95b7-7d560f2ec366)\n\n\n## Citation\n\nIf you find our work helpful, please consider citing the following paper:\n\n```\n@article{zhang2024longreward,\n  title={LongReward: Improving Long-context Large Language Models with AI Feedback},\n  author={Jiajie Zhang and Zhongni Hou and Xin Lv and Shulin Cao and Zhenyu Hou and Yilin Niu and Lei Hou and Yuxiao Dong and Ling Feng and Juanzi Li},\n  journal={arXiv preprint arXiv:2410.21252},\n  year={2024}\n}\n```\n\n## License\n\n+ The use of GLM-4 model weights must follow\n  the [Model License](https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE).\n\n+ The code in this open source repository follows the [Apache 2.0](LICENSE) license.\n\nPlease strictly follow the open source 
license.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTHUDM%2FLongReward","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTHUDM%2FLongReward","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTHUDM%2FLongReward/lists"}