{"id":19054714,"url":"https://github.com/ssbuild/llm_reward","last_synced_at":"2026-05-11T10:30:22.215Z","repository":{"id":199316378,"uuid":"702404438","full_name":"ssbuild/llm_reward","owner":"ssbuild","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-23T15:33:09.000Z","size":76,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"dev","last_synced_at":"2025-01-02T11:11:43.317Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssbuild.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-09T08:57:00.000Z","updated_at":"2024-04-23T15:33:13.000Z","dependencies_parsed_at":"2025-01-02T11:10:44.020Z","dependency_job_id":"a921fab8-2797-44d9-82ff-7883eff4cf8e","html_url":"https://github.com/ssbuild/llm_reward","commit_stats":null,"previous_names":["ssbuild/reward_finetuning"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Fllm_reward","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Fllm_reward/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Fllm_reward/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssbuild%2Fllm_reward/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssbuild","download_url":"https://codeload.github.com/ssbuild/llm_reward/tar.gz/re
fs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240110309,"owners_count":19749276,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T23:39:29.451Z","updated_at":"2026-05-11T10:30:20.152Z","avatar_url":"https://github.com/ssbuild.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\r\n```text\r\n    2024-04-22 simplified\r\n    2023-10-09 initial reward\r\n```\r\n\r\n## Update information\r\n   - [deep_training](https://github.com/ssbuild/deep_training)\r\n\r\n## Supported training modes\r\n\r\n| Model     | 32-bit | Mixed precision | 16-bit | lora int8 | lora int4 | ptv2   |\r\n|-----------|--------|-----------------|--------|-----------|-----------|--------|\r\n| llama     | √    | √    | √    | √         | √         | ×      |\r\n| llama2    | √    | √    | √    | √         | √         | ×      |\r\n| chatglm   | √    | √    | √    | √         | √         | ×      |\r\n| chatglm2  | √    | √    | √    | √         | √         | ×      |\r\n| baichuan  | √    | √    | √    | √         | √         | ×      |\r\n| baichuan2 | √    | √    | √    | √         | √         | ×      |\r\n| opt       | √    | √    | √    | √         | √         | ×      |\r\n| moss      | √    | √    | √    | √         | √         | ×      |\r\n| rwkv      | √    | √    | √    | √         | √         | ×      |\r\n| tiger     | √    | √    | √    | √         | √         | ×      |\r\n| qwen      | √    | √    | √    | √         | √         | ×      |\r\n| internlm  | √    | √    | √    | √         | √         | ×      |\r\n| gpt2      | 
√    | √    | √    | √         | √         | ×      |\r\n| xverse    | √    | √    | √    | √         | √         | ×      |\r\n| openbuddy | √    | √    | √    | √         | √         | ×      |\r\n\r\n\r\n## Data example\r\n```json\r\n{\r\n\t\"prompt\": \"\\n\\nHuman: Can you describe the steps for cleaning fingerprints and smudges off a laptop screen?\\n\\nAssistant: Yes, maybe. To clean your screen, you first need to use a microfiber cloth or a soft damp cloth and gently wipe the surface of the screen. Next, you'll want to grab a soft, lint-free microfiber cleaning cloth and gently wipe the screen again to remove fingerprints and smudges.\\n\\nHuman: Can I spray isopropyl alcohol onto the cloth and clean it that way?\\n\\nAssistant:\",\r\n\t\"response\": \" Yes, you can do that to help the cloth pick up more dirt from the screen. Be sure to always use a clean, soft cloth rather than a rough, coarse, or textured material, and make sure it is lint-free.\",\r\n\t\"chosen\": \" Yes, you can do that to help the cloth pick up more dirt from the screen. Be sure to always use a clean, soft cloth rather than a rough, coarse, or textured material, and make sure it is lint-free.\",\r\n\t\"rejected\": \" Yes, you can spray it directly onto the cloth.\"\r\n}\r\n```\r\n\r\n## Generate training records\r\n\r\n- cd data \u0026\u0026 python make_data_example.py\r\n- python data_utils.py\r\n\r\n    Note:\r\n    num_process_worker sets the number of worker processes used to build the dataset; for large datasets, increase it (up to the number of CPU cores).\r\n    dataHelper.make_dataset_with_args(data_args.train_file,mixed_data=False, shuffle=True,mode='train',num_process_worker=0)\r\n\r\n## Training\r\n```text\r\n    # Build the dataset\r\n    cd scripts\r\n    bash train_full.sh -m dataset\r\n    or\r\n    bash train_lora.sh -m dataset\r\n    or\r\n    bash train_ptv2.sh -m dataset\r\n\r\n    Note: num_process_worker sets the number of worker processes used to build the dataset; for large datasets, increase it (up to the number of CPU cores).\r\n    dataHelper.make_dataset_with_args(data_args.train_file,mixed_data=False, shuffle=True,mode='train',num_process_worker=0)\r\n\r\n    # Full-parameter training\r\n        bash train_full.sh -m train\r\n\r\n    # lora, adalora, ia3\r\n        bash train_lora.sh -m train\r\n\r\n    # ptv2\r\n        bash train_ptv2.sh -m train\r\n```\r\n\r\n## Training arguments\r\n[Training arguments](args.MD)\r\n\r\n## Related projects\r\n\r\n- [pytorch-task-example](https://github.com/ssbuild/pytorch-task-example)\r\n- [chatmoss_finetuning](https://github.com/ssbuild/chatmoss_finetuning)\r\n- [chatglm_finetuning](https://github.com/ssbuild/chatglm_finetuning)\r\n- 
[chatglm2_finetuning](https://github.com/ssbuild/chatglm2_finetuning)\r\n- [t5_finetuning](https://github.com/ssbuild/t5_finetuning)\r\n- [llm_finetuning](https://github.com/ssbuild/llm_finetuning)\r\n- [llm_rlhf](https://github.com/ssbuild/llm_rlhf)\r\n- [chatglm_rlhf](https://github.com/ssbuild/chatglm_rlhf)\r\n- [t5_rlhf](https://github.com/ssbuild/t5_rlhf)\r\n- [rwkv_finetuning](https://github.com/ssbuild/rwkv_finetuning)\r\n- [baichuan_finetuning](https://github.com/ssbuild/baichuan_finetuning)\r\n- [internlm_finetuning](https://github.com/ssbuild/internlm_finetuning)\r\n- [qwen_finetuning](https://github.com/ssbuild/qwen_finetuning)\r\n- [xverse_finetuning](https://github.com/ssbuild/xverse_finetuning)\r\n- [auto_finetuning](https://github.com/ssbuild/auto_finetuning)\r\n- [aigc_serving](https://github.com/ssbuild/aigc_serving)\r\n\r\n## \r\n    Pure and clean code.\r\n\r\n\r\n## References\r\n- https://github.com/CarperAI/trlx","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssbuild%2Fllm_reward","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssbuild%2Fllm_reward","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssbuild%2Fllm_reward/lists"}