<div align="center">
 <img src="docs/assets/images/banner.png">
</div>

# CoLLiE

CoLLiE (Collaborative Training of Large Language Models in an Efficient Way) is a complete toolbox that helps you train large language models from scratch.


[![Github Repo Stars](https://img.shields.io/github/stars/openlmlab/collie?style=social)](https://github.com/openlmlab/collie/stargazers)
[![GitHub](https://img.shields.io/github/license/OpenLMLab/collie)]()
[![Doc](https://img.shields.io/badge/Website-Doc-blue)](https://openlmlab-collie.readthedocs.io/zh_CN/latest/)
[![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/openlmlab)
[![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/OpenLMLab/collie/python-publish.yml)](https://pypi.org/project/collie-lm/)
[![GitHub commit activity (branch)](https://img.shields.io/github/commit-activity/w/OpenLMLab/collie)](https://github.com/OpenLMLab/collie/commits/main)
[![GitHub issues](https://img.shields.io/github/issues/OpenLMLab/collie)](https://github.com/OpenLMLab/collie/issues)

<h4 align="center">
  <p>
     [ <a href="https://github.com/OpenLMLab/collie/blob/dev/README.md">简体中文</a> ] |
     [ <a href="https://github.com/OpenLMLab/collie/blob/dev/README_EN.md">English</a> ]
  </p>
</h4>


## News
- [2023/12] 🎉 CoLLiE was accepted by EMNLP System Demonstrations: [CoLLiE: Collaborative Training of Large Language Models in an Efficient Way](https://arxiv.org/abs/2312.00407)
- [2023/08] Added benchmark results on [memory usage versus model size](#memory-usage) and [throughput](#throughput).
- [2023/07] Released the Python package `collie-lm`. See [PyPI](https://pypi.org/project/collie-lm/#history) for details!

## Table of Contents
<ul>
    <li><a href="#why-collie">Why CoLLiE</a></li>
    <li><a href="#features">Features</a></li>
    <li><a href="#models-supported-by-collie">Models Supported by CoLLiE</a></li>
    <li><a href="#benchmark">Benchmark</a></li>
    <li><a href="#installation">Installation</a></li>
    <li><a href="#docker-installation">Docker Installation</a></li>
    <li><a href="#usage">Usage</a>
        <ul>
            <li><a href="#quick-start">Quick Start</a></li>
            <li><a href="#fun-plugins">Fun Plugins</a></li>
            <li><a href="#more-examples-and-complete-tutorials">More Examples and Complete Tutorials</a></li>
        </ul>
    </li>
    <li><a href="#community">Community</a></li>
    <li><a href="#contributors">Contributors</a></li>
    <li><a href="#citation">Citation</a></li>
</ul>

## Why CoLLiE
CoLLiE is a complete toolbox for training large models from scratch. It provides data preprocessing, model fine-tuning, model checkpointing, and monitoring of training metrics. CoLLiE integrates existing parallelism strategies, parameter-efficient fine-tuning methods, and efficient optimizers to speed up training, improve training quality, and reduce training cost. CoLLiE supports a wide range of mainstream models (such as MOSS, InternLM, LLaMA, and ChatGLM), so you can switch between models with ease. It also ships rich documentation that lets beginners get started quickly, along with highly customizable features and flexible configuration options that let experienced users tailor it to their needs. Whether you are a beginner or a seasoned professional, CoLLiE offers a solution that fits.

## Features

CoLLiE provides collaborative and efficient tuning methods for large language models, built on *DeepSpeed* and *PyTorch*.
It mainly offers four groups of features:

<div align="center">
    <img src="docs/assets/images/feature_list.png" width="800px">
</div>

- Parallelism strategies
  - Data parallelism (DP)
  - [Pipeline parallelism (PP)](https://arxiv.org/pdf/1811.06965.pdf)
  - [Tensor parallelism (TP)](https://arxiv.org/pdf/2104.04473.pdf)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/pdf/1910.02054.pdf)
- Efficient fine-tuning
  - [LOMO](https://arxiv.org/pdf/2306.09782.pdf)
  - [LoRA](https://arxiv.org/pdf/2106.09685.pdf)
  - [Flash Attention](https://arxiv.org/pdf/2205.14135.pdf)
- Elegant design
- User-friendly

<details>
  <summary>Full feature list</summary>
  <div align="center">
      <img src="docs/assets/images/features.svg" width="800px">
  </div>
</details>

## Models Supported by CoLLiE
- MOSS series: [MOSS-MOON](https://github.com/OpenMOSS/MOSS)
- InternLM series: [InternLM2](https://github.com/InternLM/InternLM)
- LLaMA series: [LLaMA](https://github.com/meta-llama/llama), [LLaMA-2](https://github.com/meta-llama/llama)
- ChatGLM series: [ChatGLM](https://github.com/THUDM/ChatGLM2-6B), [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B)


## Benchmark

### Memory Usage
Memory usage measured with tensor parallelism, batch size 1, sequence length 2048, and 2 gradient accumulation steps:

<img src="docs/assets/images/mem_req.png" width="400px">

### Throughput
Throughput measured with the Adam optimizer at various batch sizes on A100 and RTX-3090:

<img src="docs/assets/images/throughput.png" width="800px">

## Installation
Before installing, make sure you have:
* PyTorch >= 1.13
* CUDA >= 11.6
* Linux OS
### Install from PyPI
You can simply install from PyPI:
```bash
pip install collie-lm
```
### Install from source
```bash
git clone https://github.com/OpenLMLab/collie
python setup.py install
```

## Docker Installation

## Usage

### Quick Start

The example below trains MOSS with CoLLiE, using the LOMO optimizer together with ZeRO-3 to reduce memory consumption.

Follow the steps below to start your large-model training journey~
<img src="docs/assets/images/mario-running.gif" height="50px"/>

#### Step 1: Import the required packages
```python
from transformers import AutoTokenizer
from collie.config import CollieConfig
from collie.data import CollieDatasetForTraining
from collie.data import CollieDataLoader
from collie.optim.lomo import Lomo
from collie.controller.trainer import Trainer
from collie.controller.evaluator import EvaluatorForPerplexity, EvaluatorForGeneration
from collie.models.moss_moon import Moss003MoonForCausalLM
from collie.utils.monitor import StepTimeMonitor, TGSMonitor, MemoryMonitor, LossMonitor, EvalMonitor
from collie.metrics import DecodeMetric, PPLMetric
from collie.module import GPTLMLoss
from collie.utils.data_provider import GradioProvider
```

#### Step 2: Set the model path
Use MOSS as the pretrained model:
```python
pretrained_model = "fnlp/moss-moon-003-sft"
```

#### Step 3: Set the CoLLiE configuration
```python
config = CollieConfig.from_pretrained(pretrained_model, trust_remote_code=True)
# Tensor parallelism
config.tp_size = 2
# Data parallelism
config.dp_size = 1
# Pipeline parallelism
config.pp_size = 1
# Number of training epochs
config.train_epochs = 1
# Evaluate every {100} steps
config.eval_per_n_steps = 100
# Evaluate every {1} epoch
config.eval_per_n_epochs = 1
# batch_size of {16} on each GPU
config.train_micro_batch_size = 16
# batch_size of {1} for each evaluation
config.eval_batch_size = 1
# DeepSpeed configuration
config.ds_config = {
        # Enable FP16
        "fp16": {
            "enabled": True
        },
        "zero_allow_untested_optimizer": True,
        "zero_force_ds_cpu_optimizer": False,
        # Enable ZeRO-3
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {
                "device": "cpu",
                "pin_memory": False
            }
        },
        "monitor_config": {
            "enabled": True,
            "tag": "adan",
            "csv_monitor": {
                "enabled": True,
                "output_path": "./ds_logs/"
            }
        }
}
```

#### Step 4: Set up the tokenizer
```python
tokenizer = AutoTokenizer.from_pretrained("fnlp/moss-moon-003-sft", trust_remote_code=True)
```

#### Step 5: Load the dataset
Here we define a custom dataset. The data can be provided in either of two formats; see the documentation for details.
```python
train_dataset = [
    {
        'input': 'Collie is a python package for ',
        'output': 'finetuning large language models.'
    } for _ in range(10000)
]
train_dataset = CollieDatasetForTraining(train_dataset, tokenizer)
eval_dataset = train_dataset[:32]
```

#### Step 6: Load the pretrained model
```python
model = Moss003MoonForCausalLM.from_pretrained(pretrained_model, config=config)
```

#### Step 7: Set up the optimizer
```python
optimizer = Lomo(
    model,
    lr = 0.001,
    clip_grad_norm = 5.0
)
```

#### Step 8: Add monitors
```python
monitors = [
    # Time spent per step
    StepTimeMonitor(config),
    # TGS (tokens generated per second)
    TGSMonitor(config),
    # GPU memory usage
    MemoryMonitor(config),
    # Loss
    LossMonitor(config),
    # Evaluation results
    EvalMonitor(config)
]
```

#### Step 9: Add evaluators
Two evaluators are added here: one computes PPL (perplexity) and one saves decoded results.
```python
evaluator_ppl = EvaluatorForPerplexity(
    model = model,
    config = config,
    dataset = eval_dataset,
    monitors = [
        EvalMonitor(config)
    ],
    metrics = {
        'ppl': PPLMetric()
    }
)
evaluator_decode = EvaluatorForGeneration(
    model = model,
    config = config,
    tokenizer = tokenizer,
    dataset = eval_dataset,
    monitors = [
        EvalMonitor(config)
    ],
    metrics = {
        'decode': DecodeMetric()
    }
)
```

#### Step 10: Instantiate the Trainer
```python
trainer = Trainer(
    model = model,
    config = config,
    loss_fn = GPTLMLoss(-100),
    optimizer = optimizer,
    train_dataset = train_dataset,
    monitors = monitors,
    evaluators = [evaluator_ppl, evaluator_decode],
)
# Start training/evaluation
trainer.train()
```

#### Final step: launch from the command line and start training! 👍
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:29402 --nnodes=1 --nproc_per_node=4 finetune_moss_for_training.py
```
If a progress bar like the one below appears in your terminal, congratulations: you have successfully started training your large model!
<div align="center">
 <img src="docs/assets/images/progress.png">
</div>

See <a href="https://github.com/OpenLMLab/collie/blob/dev/examples/finetune_moss_for_training.py">examples/finetune_moss_for_training.py</a> for the complete code.

### Fun Plugins

CoLLiE ships many plug-and-play plugins. Below we introduce the Monitor and the asynchronous DataProvider; more plugins are waiting to be explored and developed...

#### Monitor
Add a monitor configuration to `CollieConfig.ds_config` and enable it in the Trainer to turn on monitoring during training.
```python
    "monitor_config": {
        # Enable the monitor
        "enabled": True,
        # Prefix of the saved files
        "tag": "adan",
        # Output file format: csv
        "csv_monitor": {
            "enabled": True,
            # Output directory
            "output_path": "./ds_logs/"
        }
    }
```
With the monitor enabled, you will find the corresponding files in the `ds_logs` directory, for example:
<div align="center">
 <img src="docs/assets/images/monitor.png">
</div>

#### Asynchronous DataProvider
Simply pass `data_provider` to the Trainer to open an asynchronous DataProvider during training, which is handy for timely human evaluation!
```python
trainer = Trainer(
    model = model,
    config = config,
    loss_fn = GPTLMLoss(-100),
    optimizer = optimizer,
    train_dataset = train_dataset,
    monitors = monitors,
    evaluators = [evaluator_ppl, evaluator_decode],
    # Added
    data_provider = GradioProvider(tokenizer)
)
```
<div align="center">
 <img src="docs/assets/images/data_provider.png">
</div>

### More Examples and Complete Tutorials
CoLLiE provides complete [tutorials](https://openlmlab-collie.readthedocs.io/zh_CN/latest/). More examples are available under [examples](examples).

## Community

## Contributors
<a href="https://github.com/Openlmlab/collie/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=Openlmlab/collie" />
</a>

## Citation
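One detail of the quick start worth unpacking is the `-100` passed to `GPTLMLoss`. The sketch below is a schematic illustration in plain Python, not CoLLiE's actual implementation: for `{'input': ..., 'output': ...}` samples, the prompt tokens are typically assigned the ignore index `-100` in the label sequence, so that only the `'output'` tokens contribute to the language-modeling loss. The `build_labels` helper and the toy token ids are hypothetical, for illustration only.

```python
# Schematic sketch (not CoLLiE's actual code) of prompt masking with the
# ignore index -100, matching the GPTLMLoss(-100) used by the Trainer.

IGNORE_INDEX = -100  # the value passed to GPTLMLoss(-100)

def build_labels(prompt_ids, target_ids):
    """Concatenate prompt and target ids; mask prompt positions in the labels.

    Positions labeled IGNORE_INDEX are skipped by the cross-entropy loss,
    so the model is only trained to predict the target ('output') tokens.
    """
    input_ids = list(prompt_ids) + list(target_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(target_ids)
    return input_ids, labels

# Toy ids standing in for real tokenizer output (hypothetical values).
input_ids, labels = build_labels([11, 22, 33], [44, 55])
print(input_ids)  # [11, 22, 33, 44, 55]
print(labels)     # [-100, -100, -100, 44, 55]
```

This mirrors the convention used by PyTorch's `CrossEntropyLoss(ignore_index=-100)`: masked positions produce no gradient, so the prompt is conditioned on but never penalized.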