{"id":13793590,"url":"https://github.com/THUDM/Chinese-Transformer-XL","last_synced_at":"2025-05-12T20:31:10.373Z","repository":{"id":37407010,"uuid":"345935787","full_name":"THUDM/Chinese-Transformer-XL","owner":"THUDM","description":null,"archived":false,"fork":false,"pushed_at":"2022-12-08T13:34:21.000Z","size":1172,"stargazers_count":218,"open_issues_count":8,"forks_count":37,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-05-08T23:34:09.147Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README.md","changelog":"change_mp.py","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-09T08:31:36.000Z","updated_at":"2024-09-05T11:05:06.000Z","dependencies_parsed_at":"2023-01-25T13:16:07.111Z","dependency_job_id":null,"html_url":"https://github.com/THUDM/Chinese-Transformer-XL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FChinese-Transformer-XL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FChinese-Transformer-XL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FChinese-Transformer-XL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FChinese-Transformer-XL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM","download_url":"https://codeload.github.com/THUDM/Chinese-Transformer-XL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"
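github","repositories_count":253816710,"owners_count":21968870,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T23:00:25.837Z","updated_at":"2025-05-12T20:31:08.503Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":["Large Scale Pre-training for Language Generation"],"sub_categories":[],"readme":"# Chinese-Transformer-XL\n\nUnder construction\n\nThis repository provides the pretraining and text generation code for Chinese-Transformer-XL, the \"文汇\" (WenHui) pretrained model from the Beijing Academy of Artificial Intelligence (BAAI). [[Application homepage]](https://gpt-3.aminer.cn/) [[Model download]](http://dorc-model-team.ks3-cn-beijing.ksyun.com/ren-zhi/my-model/mp_rank_00_model_states.pt)\n\n## Data\n\nThe model is trained on [WuDaoCorpus](https://data.baai.ac.cn/data-set-details/0c8dc71dd06ae75a10ca422fb49b0751), a Chinese pretraining corpus released by BAAI. Specifically, we use the portions of WuDaoCorpus drawn from Baidu Baike + Sogou Baike (133G), Zhihu (131G), and Baidu Zhidao (38G), 303GB of data in total.\n\n## Model\n\nThe model uses the training objective of [GPT-3](https://arxiv.org/abs/2005.14165) and replaces the Transformer in GPT with [Transformer-XL](https://arxiv.org/abs/1901.02860), which is better at modeling long sequences. The architecture is largely the same as GPT-3 2.7B (32 layers, hidden size 2560, 32 attention heads per layer); because of the structural changes introduced by Transformer-XL, the parameter count rises to 2.9 billion.\n\n## Results\n\nTo evaluate the model's generation ability, we tested it on Chinese open-domain long-form question answering. We randomly selected 100 questions from [Zhihu](https://www.zhihu.com) that cover different domains and do not appear in the training corpus. For each question, human annotators scored one highly upvoted answer, 3 answers generated by our model, and 3 answers generated by [CPM](https://github.com/TsinghuaAI/CPM-Generate) on four dimensions: fluency, informativeness, relevance, and overall quality. The results are as follows:\n\n|Model|Fluency (1-5)|Informativeness (1-5)|Relevance (1-5)|Overall (1-10)|\n|---|---|---|---|---|\n|CPM|2.66|2.47|2.36|4.32|\n|WenHui|3.44|3.25|3.21|5.97|\n|Human answers|3.80|3.61|3.67|6.85|\n\nCompared with CPM, \"WenHui\" scores closer to the highly upvoted human-written answers.\n\n## Installation\n\nInstall PyTorch and the other basic dependencies listed in `requirements.txt`:\n\n```shell\npip install -r 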
requirements.txt\n```\n\nTo finetune the model parameters, you also need to install [DeepSpeed](https://github.com/microsoft/DeepSpeed):\n\n```shell\nDS_BUILD_OPS=1 pip install deepspeed\n```\nAlternatively, you can use the [Docker image](https://github.com/THUDM/GLM#docker-image) we provide.\n\n## Inference\n\nFirst download the model [checkpoint](http://dorc-model-team.ks3-cn-beijing.ksyun.com/ren-zhi/my-model/mp_rank_00_model_states.pt) and arrange it in the following directory structure:\n\n```\n.\n└─ txl-2.9B\n       └─ mp_rank_00_model_states.pt\n```\n\nThen run the interactive generation script:\n\n```shell\nbash scripts/generate_text.sh ./txl-2.9B\n```\n\n## Finetune\n\nFinetuning is built on DeepSpeed. First, in `scripts/ds_finetune_gpt_2.9B.sh`, set `NUM_WORKERS` and `NUM_GPUS_PER_WORKER` to the number of nodes and the number of GPUs per node. For multi-node training, also set `HOST_FILE_PATH` to the path of your hostfile (DeepSpeed uses an [OpenMPI-style hostfile](https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node)).\n\nThen run the finetune script:\n\n```shell\nbash scripts/ds_finetune_gpt_2.9B.sh ./txl-2.9B ./data.json\n```\n\nHere `./txl-2.9B` is the checkpoint directory and `./data.json` is the finetuning data, a [jsonl file](https://jsonlines.org/) in which each record has the form `{\"prompt\": ..,  \"text\": ...}`. `prompt` is the context for generation and `text` is the content to be generated.\n\nIf you hit an OOM error during finetuning (usually caused by too few GPUs or insufficient GPU memory), try adding `\"cpu_offload\": true` to the `zero_optimization` section of [scripts/ds_config_2.9B_finetune.json](scripts/ds_config_2.9B_finetune.json) to enable [ZeRO-Offload](https://www.deepspeed.ai/tutorials/zero-offload/) and reduce GPU memory usage.\n\n## Model parallelism\nIf your GPU memory is limited, you can use model parallelism to reduce memory usage. The checkpoint we provide is meant to run on a single GPU. First use [change_mp.py](change_mp.py) to split the checkpoint:\n```shell\npython change_mp.py ./txl-2.9B 2\n```\nHere 2 means 2-way model parallelism. For inference and finetuning, change `MP_SIZE` in the scripts to 2 and use `./txl-2.9B_MP2` as the checkpoint path when running them.\n## Citation","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTHUDM%2FChinese-Transformer-XL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTHUDM%2FChinese-Transformer-XL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTHUDM%2FChinese-Transformer-XL/lists"}