{"id":13595272,"url":"https://github.com/bojone/bert4keras","last_synced_at":"2025-05-08T22:19:14.945Z","repository":{"id":37334230,"uuid":"204388414","full_name":"bojone/bert4keras","owner":"bojone","description":"keras implement of transformers for humans","archived":false,"fork":false,"pushed_at":"2024-11-11T15:41:47.000Z","size":10121,"stargazers_count":5405,"open_issues_count":166,"forks_count":928,"subscribers_count":70,"default_branch":"master","last_synced_at":"2025-05-08T22:19:08.846Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://kexue.fm/archives/6915","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bojone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-26T03:27:19.000Z","updated_at":"2025-05-08T06:35:51.000Z","dependencies_parsed_at":"2024-12-17T11:00:22.666Z","dependency_job_id":"717ca0fd-767b-4502-a0a7-fe729de039ae","html_url":"https://github.com/bojone/bert4keras","commit_stats":{"total_commits":1908,"total_committers":11,"mean_commits":"173.45454545454547","dds":0.009433962264150941,"last_synced_commit":"c1ae0dc5eff66d329ab3c5c7ee369124ddbb6f87"},"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bojone%2Fbert4keras","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bojone%2Fbert4keras/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bojone%2Fbert4keras/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/bojone%2Fbert4keras/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bojone","download_url":"https://codeload.github.com/bojone/bert4keras/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253155005,"owners_count":21862625,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:46.797Z","updated_at":"2025-05-08T22:19:14.917Z","avatar_url":"https://github.com/bojone.png","language":"Python","readme":"# bert4keras\n- Our light reimplementation of BERT for Keras\n- A clearer, more lightweight Keras version of BERT\n- Personal blog: https://kexue.fm/\n- Online documentation: http://bert4keras.spaces.ac.cn/ (still under construction)\n\n## Description\nThis is the author's reimplementation of a Transformer model library for Keras, which aims to combine Transformer and Keras in code that is as clean as possible.\n\nThe project was created to make modification and customization easy, so it may be updated frequently.\n\nStars are therefore welcome, but forking is not recommended, because your forked copy may go out of date quickly.\n\n## Features\nCurrently implemented:\n- Loading pretrained bert/roberta/albert weights for fine-tuning;\n- The attention masks required by language models and seq2seq;\n- A rich set of \u003ca href=\"https://github.com/bojone/bert4keras/tree/master/examples\"\u003eexamples\u003c/a\u003e;\n- Code for pretraining from scratch (supports TPU and multiple GPUs; see \u003ca href=\"https://github.com/bojone/bert4keras/tree/master/pretraining\"\u003epretraining\u003c/a\u003e);\n- Compatibility with both keras and tf.keras\n\n## Usage\nInstall the stable version:\n```shell\npip install bert4keras\n```\nInstall the latest version:\n```shell\npip install git+https://www.github.com/bojone/bert4keras.git\n```\n\nFor usage examples, see the \u003ca href=\"https://github.com/bojone/bert4keras/blob/master/examples\"\u003eexamples\u003c/a\u003e directory.\n\nThe \u003ca 
href=\"https://github.com/bojone/bert_in_keras\"\u003eexamples\u003c/a\u003e previously written for keras-bert still apply to this project; you only need to replace the way `bert_model` is loaded with this project's method.\n\nIn principle, the project is compatible with both Python 2 and Python 3, and with tensorflow 1.14+ and tensorflow 2.x. The development environment is Python 2.7, Tensorflow 1.14+, and Keras 2.3.1 (tested with 2.2.4, 2.3.0, 2.3.1, and tf.keras).\n\n**For the best experience, the combination Tensorflow 1.14 + Keras 2.3.1 is recommended.**\n\n\u003cblockquote\u003e\u003cstrong\u003eAbout environment combinations\u003c/strong\u003e\n  \n- Both tf+keras and tf+tf.keras are supported; the latter requires setting the environment variable TF_KERAS=1 in advance.\n\n- When using tf+keras, 2.2.4 \u003c= keras \u003c= 2.3.1 and 1.14 \u003c= tf \u003c= 2.2 are recommended; tf 2.3+ cannot be used.\n\n- keras 2.4+ works, but keras 2.4.x is in fact essentially equivalent to tf.keras, so if you want keras 2.4+, you may as well use tf.keras directly.\n\u003c/blockquote\u003e\n\nOf course, contributors who find bugs are welcome to report them, fix them, or even open Pull Requests.\n\n## Weights\n\nWeights that can currently be loaded:\n- \u003cstrong\u003eGoogle's original bert\u003c/strong\u003e: https://github.com/google-research/bert\n- \u003cstrong\u003ebrightmart's roberta\u003c/strong\u003e: https://github.com/brightmart/roberta_zh\n- \u003cstrong\u003eHIT (Harbin Institute of Technology) roberta\u003c/strong\u003e: https://github.com/ymcui/Chinese-BERT-wwm\n- \u003cstrong\u003eGoogle's original albert\u003c/strong\u003e\u003csup\u003e\u003ca href=\"https://github.com/bojone/bert4keras/issues/29#issuecomment-552188981\"\u003e[example]\u003c/a\u003e\u003c/sup\u003e: https://github.com/google-research/ALBERT\n- \u003cstrong\u003ebrightmart's albert\u003c/strong\u003e: https://github.com/brightmart/albert_zh\n- \u003cstrong\u003eConverted albert\u003c/strong\u003e: https://github.com/bojone/albert_zh\n- \u003cstrong\u003eHuawei's NEZHA\u003c/strong\u003e: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/NEZHA-TensorFlow\n- \u003cstrong\u003eHuawei's NEZHA-GEN\u003c/strong\u003e: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/NEZHA-Gen-TensorFlow\n- \u003cstrong\u003eIn-house language models\u003c/strong\u003e: https://github.com/ZhuiyiTechnology/pretrained-models\n- \u003cstrong\u003eT5 model\u003c/strong\u003e: https://github.com/google-research/text-to-text-transfer-transformer\n- \u003cstrong\u003eGPT_OpenAI\u003c/strong\u003e: 
https://github.com/bojone/CDial-GPT-tf\n- \u003cstrong\u003eGPT2_ML\u003c/strong\u003e: https://github.com/imcaspar/gpt2-ml\n- \u003cstrong\u003eGoogle's original ELECTRA\u003c/strong\u003e: https://github.com/google-research/electra\n- \u003cstrong\u003eHIT (Harbin Institute of Technology) ELECTRA\u003c/strong\u003e: https://github.com/ymcui/Chinese-ELECTRA\n- \u003cstrong\u003eCLUE's ELECTRA\u003c/strong\u003e: https://github.com/CLUEbenchmark/ELECTRA\n- \u003cstrong\u003eLaBSE (multilingual BERT)\u003c/strong\u003e: https://github.com/bojone/labse\n- \u003cstrong\u003eModels from the Chinese-GEN project\u003c/strong\u003e: https://github.com/bojone/chinese-gen\n- \u003cstrong\u003eT5.1.1\u003c/strong\u003e: https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md#t511\n- \u003cstrong\u003eMultilingual T5\u003c/strong\u003e: https://github.com/google-research/multilingual-t5/\n\n\u003cstrong\u003eNotes\u003c/strong\u003e\n- Note 1: brightmart's albert was open-sourced before Google's albert, so the early brightmart albert weights are not fully consistent with Google's; in other words, the two cannot be swapped directly. To reduce code redundancy, bert4keras 0.2.4 and later only support loading the \u003cu\u003eGoogle version\u003c/u\u003e and the brightmart weights \u003cu\u003ewhose names contain the word Google\u003c/u\u003e. To load the earlier weights, use \u003ca href=\"https://github.com/bojone/bert4keras/releases/tag/v0.2.3\"\u003eversion 0.2.3\u003c/a\u003e, or consider the author's converted \u003ca href=\"https://github.com/bojone/albert_zh\"\u003ealbert_zh\u003c/a\u003e.\n- Note 2: if the downloaded ELECTRA weights come without a json config file, write one yourself following \u003ca href=\"https://github.com/ymcui/Chinese-ELECTRA/issues/3\"\u003ethis issue\u003c/a\u003e (the `type_vocab_size` field must be added).\n\n## Updates\n- \u003cstrong\u003e2023.03.06\u003c/strong\u003e: [Infinity changed to np.inf; GPU memory usage optimized](https://github.com/bojone/bert4keras/commit/20a46946156b4bc15ceaa00671fcd00c8b702640). Infinity is now represented by np.inf, which makes computation more accurate and less error-prone in low-precision arithmetic; several mask operators were also merged, reducing GPU memory usage. In tests training base- and large-scale models on an A100, speed improved noticeably and memory usage dropped.\n- \u003cstrong\u003e2022.03.20\u003c/strong\u003e: Added [RoFormerV2](https://kexue.fm/archives/8998).\n- \u003cstrong\u003e2022.02.28\u003c/strong\u003e: Added [GatedAttentionUnit](https://kexue.fm/archives/8934).\n- 
\u003cstrong\u003e2021.04.23\u003c/strong\u003e: Added [GlobalPointer](https://kexue.fm/archives/8373).\n- \u003cstrong\u003e2021.03.23\u003c/strong\u003e: Added [RoFormer](https://kexue.fm/archives/8265).\n- \u003cstrong\u003e2021.01.30\u003c/strong\u003e: Released version 0.9.9, improved multi-GPU support, and added a multi-GPU example: [task_seq2seq_autotitle_multigpu.py](https://github.com/bojone/bert4keras/blob/master/examples/task_seq2seq_autotitle_multigpu.py).\n- \u003cstrong\u003e2020.12.29\u003c/strong\u003e: Added the `residual_attention_scores` parameter to implement RealFormer; enable it by passing `residual_attention_scores=True` to `build_transformer_model`.\n- \u003cstrong\u003e2020.12.04\u003c/strong\u003e: `PositionEmbedding` now supports hierarchical decomposition, which lets BERT process very long texts directly; enable it by passing `hierarchical_position=True` to `build_transformer_model`.\n- \u003cstrong\u003e2020.11.19\u003c/strong\u003e: Added GPT2 model support; see the [CPM_LM_bert4keras](https://github.com/bojone/CPM_LM_bert4keras) project.\n- \u003cstrong\u003e2020.11.14\u003c/strong\u003e: Added parameter-wise learning rates via `extend_with_parameter_wise_lr`, which can be used to set a different learning rate for each layer.\n- \u003cstrong\u003e2020.10.27\u003c/strong\u003e: Added support for \u003ca href=\"https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md#t511\"\u003eT5.1.1\u003c/a\u003e and \u003ca href=\"https://github.com/google-research/multilingual-t5/\"\u003eMultilingual T5\u003c/a\u003e.\n- \u003cstrong\u003e2020.08.28\u003c/strong\u003e: Added support for \u003ca href=\"https://github.com/bojone/CDial-GPT-tf\"\u003eGPT_OpenAI\u003c/a\u003e.\n- \u003cstrong\u003e2020.08.22\u003c/strong\u003e: Added the `WebServing` class, which makes it easy to expose a model as a Web interface; for details, see the class's \u003ca href=\"https://github.com/bojone/bert4keras/blob/8d55512a12e4677262363ac189ebf504fc451716/bert4keras/snippets.py#L580\"\u003edocumentation\u003c/a\u003e.\n- \u003cstrong\u003e2020.07.14\u003c/strong\u003e: Added the `prefix` parameter to the `Transformer` class; added the `to_array` function to `snippets.py`; fixed a hidden bug in `AutoRegressiveDecoder` when `rtype='logits'`.\n- \u003cstrong\u003e2020.06.06\u003c/strong\u003e: Out of perfectionism, renamed the `max_length` parameter of `Tokenizer` to `maxlen`, keeping backward compatibility; the new name is recommended.\n- 
\u003cstrong\u003e2020.04.29\u003c/strong\u003e: Added recomputation (see \u003ca href=\"https://github.com/bojone/keras_recompute\"\u003ekeras_recompute\u003c/a\u003e), which trades time for memory; enable it by setting the environment variable `RECOMPUTE=1`.\n- \u003cstrong\u003e2020.04.25\u003c/strong\u003e: Improved performance under tf2.\n- \u003cstrong\u003e2020.04.16\u003c/strong\u003e: All examples now work with tensorflow 2.0.\n- \u003cstrong\u003e2020.04.06\u003c/strong\u003e: Added a UniLM pretraining mode (in testing).\n- \u003cstrong\u003e2020.04.06\u003c/strong\u003e: Improved the `rematch` method.\n- \u003cstrong\u003e2020.04.01\u003c/strong\u003e: Added the `rematch` method to `Tokenizer`, which gives the mapping between the tokenization result and the original sequence.\n- \u003cstrong\u003e2020.03.30\u003c/strong\u003e: Unified the coding style of the py files as far as possible.\n- \u003cstrong\u003e2020.03.25\u003c/strong\u003e: Added ELECTRA support.\n- \u003cstrong\u003e2020.03.24\u003c/strong\u003e: Further enhanced `DataGenerator` to allow local shuffling when an iterator is passed in.\n- \u003cstrong\u003e2020.03.23\u003c/strong\u003e: Added an option to adjust the `key_size` of Attention.\n- \u003cstrong\u003e2020.03.17\u003c/strong\u003e: Enhanced `DataGenerator`; streamlined the model code.\n- \u003cstrong\u003e2020.03.15\u003c/strong\u003e: Added support for \u003ca href=\"https://github.com/imcaspar/gpt2-ml\"\u003eGPT2_ML\u003c/a\u003e.\n- \u003cstrong\u003e2020.03.10\u003c/strong\u003e: Added support for Google's \u003ca href=\"https://github.com/google-research/text-to-text-transfer-transformer\"\u003eT5\u003c/a\u003e model.\n- \u003cstrong\u003e2020.03.05\u003c/strong\u003e: Renamed `tokenizer.py` to `tokenizers.py`.\n- \u003cstrong\u003e2020.03.05\u003c/strong\u003e: Renamed `application='seq2seq'` to `application='unilm'`.\n- \u003cstrong\u003e2020.03.05\u003c/strong\u003e: Renamed `build_bert_model` to `build_transformer_model`.\n- \u003cstrong\u003e2020.03.05\u003c/strong\u003e: Rewrote the structure of `models.py`.\n- \u003cstrong\u003e2020.03.04\u003c/strong\u003e: Renamed `bert.py` to `models.py`.\n- \u003cstrong\u003e2020.03.02\u003c/strong\u003e: Refactored the mask mechanism (switched back to Keras's built-in masking) to make more complex applications easier to write.\n- \u003cstrong\u003e2020.02.22\u003c/strong\u003e: Added the `AutoRegressiveDecoder` class to handle Seq2Seq decoding in a unified way.\n- \u003cstrong\u003e2020.02.19\u003c/strong\u003e: Changed the prefix of transformer blocks to Transformer (previously Encoder), so that the name is less restrictive.\n- 
\u003cstrong\u003e2020.02.13\u003c/strong\u003e: Optimized the `load_vocab` function; renamed the `keep_words` parameter of `build_bert_model` to `keep_tokens`. This change may affect some scripts.\n- \u003cstrong\u003e2020.01.18\u003c/strong\u003e: Adjusted text handling and removed the use of codecs.\n- \u003cstrong\u003e2020.01.17\u003c/strong\u003e: With the APIs becoming stable, the project was packaged on \u003ca href=\"https://pypi.org/project/bert4keras/\"\u003epypi\u003c/a\u003e for convenience; the first packaged version is 0.4.6.\n- \u003cstrong\u003e2020.01.10\u003c/strong\u003e: Rewrote the model's masking scheme, which makes the code somewhat more concise and clear; backend optimizations.\n- \u003cstrong\u003e2019.12.27\u003c/strong\u003e: Refactored the pretraining code to reduce redundancy; RoBERTa-style and GPT-style pretraining are currently supported. See \u003ca href=\"https://github.com/bojone/bert4keras/tree/master/pretraining/\"\u003epretraining\u003c/a\u003e.\n- \u003cstrong\u003e2019.12.17\u003c/strong\u003e: Added support for Huawei's \u003ca href=\"https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/NEZHA\"\u003enezha\u003c/a\u003e weights; just pass `model='nezha'` to the `build_bert_model` function. In addition, the former way of loading albert, `albert=True`, is now `model='albert'`.\n- \u003cstrong\u003e2019.12.16\u003c/strong\u003e: Introduced layer-within-layer (nested layer) functionality for older versions, using an approach similar to keras 2.3+, thereby restoring support for keras versions below 2.3.0.\n- \u003cstrong\u003e2019.12.14\u003c/strong\u003e: Added Conditional Layer Normalization and a related demo.\n- \u003cstrong\u003e2019.12.09\u003c/strong\u003e: Standardized the data_generator in each example; fixed a bug when application='lm'.\n- \u003cstrong\u003e2019.12.05\u003c/strong\u003e: Improved the tokenizer's do_lower_case handling and made minor adjustments to the examples.\n- \u003cstrong\u003e2019.11.23\u003c/strong\u003e: Renamed train.py to optimizers.py and updated many optimizer implementations; fully compatible with keras and tf.keras.\n- \u003cstrong\u003e2019.11.19\u003c/strong\u003e: Renamed utils.py to tokenizer.py.\n- \u003cstrong\u003e2019.11.19\u003c/strong\u003e: After some deliberation, decided to move the snippets into \u003ca href=\"https://github.com/bojone/bert4keras/blob/master/bert4keras/snippets.py\"\u003ebert4keras.snippets\u003c/a\u003e.\n- \u003cstrong\u003e2019.11.18\u003c/strong\u003e: Improved the logic for loading pretrained weights; added a method for saving model weights in Bert's checkpoint format.\n- \u003cstrong\u003e2019.11.17\u003c/strong\u003e: \u003cdel\u003eMoved some commonly used code snippets not directly related to Bert itself into \u003ca 
href=\"https://github.com/bojone/python-snippets\"\u003epython_snippets\u003c/a\u003e so that other projects can share them.\u003c/del\u003e\n- \u003cstrong\u003e2019.11.11\u003c/strong\u003e: Added the NSP part.\n- \u003cstrong\u003e2019.11.05\u003c/strong\u003e: Added support for the \u003ca href=\"https://github.com/google-research/google-research/tree/master/albert\"\u003eGoogle version of albert\u003c/a\u003e; the \u003ca href=\"https://github.com/brightmart/albert_zh\"\u003enon-Google albert_zh\u003c/a\u003e is no longer supported.\n- \u003cstrong\u003e2019.11.05\u003c/strong\u003e: Finished the pretraining code, using RoBERTa as the example, with support for TPU and multi-GPU training; see \u003ca href=\"https://github.com/bojone/bert4keras/tree/master/pretraining/roberta/\"\u003eroberta\u003c/a\u003e. You are welcome to build more pretraining code on top of it.\n- \u003cstrong\u003e2019.11.01\u003c/strong\u003e: Gradually adding pretraining-related code; see \u003ca href=\"https://github.com/bojone/bert4keras/tree/master/pretraining\"\u003epretraining\u003c/a\u003e.\n- \u003cstrong\u003e2019.10.28\u003c/strong\u003e: Added support for tokenizers based on \u003ca href=\"https://github.com/google/sentencepiece\"\u003esentencepiece\u003c/a\u003e.\n- \u003cstrong\u003e2019.10.25\u003c/strong\u003e: Introduced a native tokenizer.\n- \u003cstrong\u003e2019.10.22\u003c/strong\u003e: Introduced a gradient accumulation optimizer.\n- \u003cstrong\u003e2019.10.21\u003c/strong\u003e: To simplify the code structure, support for keras versions before 2.3.0 was dropped; only keras 2.3.0+ and tf.keras are currently supported.\n- \u003cstrong\u003e2019.10.20\u003c/strong\u003e: At users' request, the model structure can now be saved directly with `model.save` and the whole model loaded with `load_model` (just run `from bert4keras.layers import *` before `load_model`; there is no need to write `custom_objects`).\n- \u003cstrong\u003e2019.10.09\u003c/strong\u003e: Now compatible with tf.keras; tested with tf.keras under both tf 1.13 and tf 2.0. Set the environment variable `TF_KERAS=1` to switch to tf.keras.\n- \u003cstrong\u003e2019.10.09\u003c/strong\u003e: Now compatible with Keras 2.3.x, but only as a temporary solution; support for versions before 2.3 may be removed later.\n- \u003cstrong\u003e2019.10.02\u003c/strong\u003e: Added albert support; the weights of \u003ca href=\"https://github.com/brightmart/albert_zh\"\u003ealbert_zh\u003c/a\u003e load successfully, just pass `albert=True` to the `load_pretrained_model` function.\n\n## Background\nI had long been using CyberZHG's \u003ca href=\"https://github.com/CyberZHG/keras-bert\"\u003ekeras-bert\u003c/a\u003e. If the goal is purely to call and fine-tune BERT in Keras, 
keras-bert is already quite satisfactory.\n\nHowever, if you want to modify BERT's internal structure while still loading the official pretrained weights, keras-bert has a hard time meeting that need. For the sake of code reuse, keras-bert packages almost every small module as a separate library: keras-bert depends on keras-transformer, keras-transformer depends on keras-multi-head, and keras-multi-head depends on keras-self-attention. With dependencies layered like this, making changes becomes quite a headache.\n\nSo I decided to write a new Keras version of BERT, aiming to implement it completely within a few files, reduce these dependencies, and keep the ability to load the official pretrained weights.\n\n## Acknowledgements\nThanks to CyberZHG for implementing \u003ca href=\"https://github.com/CyberZHG/keras-bert\"\u003ekeras-bert\u003c/a\u003e. This implementation draws on the keras-bert source code in many places, and I sincerely thank CyberZHG for this selfless contribution.\n\n## Related\n\n[bert4torch](https://github.com/Tongjilibo/bert4torch): a PyTorch-based transformer library in a style very similar to bert4keras; readers who use PyTorch may want to try it.\n\n## Citation\n\n```\n@misc{bert4keras,\n  title={bert4keras},\n  author={Jianlin Su},\n  year={2020},\n  howpublished={\\url{https://bert4keras.spaces.ac.cn}},\n}\n```\n","funding_links":[],"categories":["Pretrained Language Model","Python","Transformer Implementations By Communities","BERT优化"],"sub_categories":["Repository","Keras","大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbojone%2Fbert4keras","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbojone%2Fbert4keras","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbojone%2Fbert4keras/lists"}