{"id":19589711,"url":"https://github.com/jackaduma/chatglm-lora-rlhf-pytorch","last_synced_at":"2025-04-27T12:32:54.619Z","repository":{"id":157104025,"uuid":"629344357","full_name":"jackaduma/ChatGLM-LoRA-RLHF-PyTorch","owner":"jackaduma","description":"A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM","archived":false,"fork":false,"pushed_at":"2023-04-28T22:00:45.000Z","size":26550,"stargazers_count":134,"open_issues_count":2,"forks_count":10,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-19T11:17:34.649Z","etag":null,"topics":["chatglm","chatglm-6b","chatgpt","deepspeed","finetune","gpt","llama","llm","lora","peft","ppo","pytorch","reward-models","rlhf"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jackaduma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-18T06:03:53.000Z","updated_at":"2025-02-09T09:14:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"2bedd552-5b19-4a7b-b45f-365c1a4a2ffc","html_url":"https://github.com/jackaduma/ChatGLM-LoRA-RLHF-PyTorch","commit_stats":{"total_commits":17,"total_committers":1,"mean_commits":17.0,"dds":0.0,"last_synced_commit":"3c23a27ef524956665e2946f4284a6ac9bd9e6ff"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j
ackaduma%2FChatGLM-LoRA-RLHF-PyTorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FChatGLM-LoRA-RLHF-PyTorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FChatGLM-LoRA-RLHF-PyTorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jackaduma%2FChatGLM-LoRA-RLHF-PyTorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jackaduma","download_url":"https://codeload.github.com/jackaduma/ChatGLM-LoRA-RLHF-PyTorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251138933,"owners_count":21541976,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatglm","chatglm-6b","chatgpt","deepspeed","finetune","gpt","llama","llm","lora","peft","ppo","pytorch","reward-models","rlhf"],"created_at":"2024-11-11T08:20:21.411Z","updated_at":"2025-04-27T12:32:49.602Z","avatar_url":"https://github.com/jackaduma.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **ChatGLM-LoRA-RLHF-PyTorch**\na full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware\n\n---\n## **Table of Contents**\n- [**ChatGLM-LoRA-RLHF-PyTorch**](#chatglm-lora-rlhf-pytorch)\n  - [**Table of Contents**](#table-of-contents)\n  - [**Environment Setup**](#environment-setup)\n  - [**Todo List**](#todo-list)\n  - [**Run**](#run)\n    - [**Data Process**](#data-process)\n    - [**Supervised Finetune**](#supervised-finetune)\n      - [**Merge PEFT adapter into 
Model**](#merge-peft-adapter-into-model)\n    - [**Reward Modeling**](#reward-modeling)\n      - [**merge reward model into Model**](#merge-reward-model-into-model)\n  - [**Notes**](#notes)\n  - [**Reference**](#reference)\n  - [**Star-History**](#star-history)\n  - [Donation](#donation)\n  - [**License**](#license)\n---\n\n## **Environment Setup**\n```\nBudget GPU: 2080Ti 12G\ntorch==2.0.0\ncuda==11.8\n```\n\n---\n## **Todo List**\n\n- [x] SFT: Supervised Finetune\n- [x] Merge Adapter into Model\n- [ ] RLHF\n  - [x] train reward model\n  - [ ] tuning with RL\n\n## **Run**\n---\n\n### **Data Process**\n\nConvert the Alpaca dataset to JSONL:\n\n```bash\npython cover_alpaca2jsonl.py --data_path data/alpaca_data.json --save_path data/alpaca_data.jsonl\n```\n\nTokenize the dataset:\n\n```bash\npython tokenize_dataset_rows.py --jsonl_path data/alpaca_data.jsonl --save_path data/alpaca --max_seq_length 200 --skip_overlength True\n```\n\n### **Supervised Finetune**\n\nThis step requires the latest peft version:\n```\npip uninstall peft -y\npip install git+https://github.com/huggingface/peft.git  # latest version, \u003e=0.3.0.dev0\n```\n\n```bash\npython supervised_finetune.py --dataset_path data/alpaca --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 32 --save_steps 200 --save_total_limit 3  --learning_rate 1e-4 --fp16 --remove_unused_columns false --logging_steps 10 --output_dir output\n```\n\n#### **Merge PEFT adapter into Model**\n\n```bash\npip uninstall peft -y\npip install peft==0.2.0  # 0.3.0.dev0 raises many errors\npython merge_peft_adapter.py --model_name ./output\n```\n\n### **Reward Modeling**\n\n```bash\npython train_reward_model.py --model_name 'THUDM/chatglm-6b' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False\n```\n\n#### **merge reward model into Model**\n\n```bash\npython merge_peft_adapter.py --model_name ./reward_model_chatglm-6b\n```\n\n---\n\n## **Notes**\n1. 
PEFT version: the version currently installed from git is 0.3.0.dev0, which fails when running merge_peft_adapter, so switch to peft==0.2.0 (0.3.0.dev0 does not have the _get_submodules() function).\n2. Hugging Face transformers does not yet provide a wrapper interface for ChatGLM, so the model code must be downloaded from the ChatGLM hub into the local directory models for later use.\n3. Likewise, the ChatGLM model code is maintained separately and has not been merged into Hugging Face, so remember to pass trust_remote_code=True whenever the model is loaded.\n4. Training the reward model requires a sequence-classification (SeqCLS) task. Hugging Face transformers provides the \"AutoModelForSequenceClassification\" class for this, but ChatGLM only has the \"ChatGLMForConditionalGeneration\" class.\n5. The reward model is therefore implemented in this repository, in [reward_model.py](reward_model.py), which carries out the reward model training.\n\n## **Reference**\nData preprocessing: [cover_alpaca2jsonl.py](./cover_alpaca2jsonl.py) and [tokenize_dataset_rows.py](./tokenize_dataset_rows.py) come from the [ChatGLM-Tuning](https://github.com/mymusise/ChatGLM-Tuning) project.\n\nThe requirements mainly follow the environment setup of [alpaca-lora](https://github.com/tloen/alpaca-lora).\n\n* [https://github.com/tloen/alpaca-lora](https://github.com/tloen/alpaca-lora)\n* [https://github.com/mymusise/ChatGLM-Tuning](https://github.com/mymusise/ChatGLM-Tuning)\n* [https://github.com/lvwerra/trl](https://github.com/lvwerra/trl)\n* [https://github.com/jasonvanf/llama-trl](https://github.com/jasonvanf/llama-trl)\n\n\n\n------\n## **Star-History**\n\n![star-history](https://api.star-history.com/svg?repos=jackaduma/ChatGLM-LoRA-RLHF-PyTorch\u0026type=Date \"star-history\")\n\n------\n\n## Donation\nIf this project helps you save development time, you can buy me a cup of coffee :)\n\nAliPay(支付宝)\n\u003cdiv align=\"center\"\u003e\n\t\u003cimg src=\"./misc/ali_pay.png\" alt=\"ali_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\nWechatPay(微信)\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"./misc/wechat_pay.png\" alt=\"wechat_pay\" width=\"400\" /\u003e\n\u003c/div\u003e\n\n------\n\n## **License**\n\n[MIT](LICENSE) © 
Kun","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Fchatglm-lora-rlhf-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjackaduma%2Fchatglm-lora-rlhf-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjackaduma%2Fchatglm-lora-rlhf-pytorch/lists"}