{"id":13435596,"url":"https://github.com/kpe/bert-for-tf2","last_synced_at":"2025-05-15T18:06:11.018Z","repository":{"id":34898293,"uuid":"187989988","full_name":"kpe/bert-for-tf2","owner":"kpe","description":"A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.","archived":false,"fork":false,"pushed_at":"2023-01-13T10:12:34.000Z","size":290,"stargazers_count":808,"open_issues_count":33,"forks_count":196,"subscribers_count":33,"default_branch":"master","last_synced_at":"2025-05-09T02:47:36.639Z","etag":null,"topics":["bert","keras","tensorflow","transformer"],"latest_commit_sha":null,"homepage":"https://github.com/kpe/bert-for-tf2","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kpe.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-22T07:51:33.000Z","updated_at":"2025-05-08T06:55:01.000Z","dependencies_parsed_at":"2023-01-15T10:04:41.754Z","dependency_job_id":null,"html_url":"https://github.com/kpe/bert-for-tf2","commit_stats":null,"previous_names":[],"tags_count":68,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpe%2Fbert-for-tf2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpe%2Fbert-for-tf2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpe%2Fbert-for-tf2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpe%2Fbert-for-tf2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kpe","download_url":"https://codeload.github.com/kpe/bert-for-tf2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254394719,"owners_count":22063984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","keras","tensorflow","transformer"],"created_at":"2024-07-31T03:00:37.250Z","updated_at":"2025-05-15T18:06:10.994Z","avatar_url":"https://github.com/kpe.png","language":"Python","readme":"BERT for TensorFlow v2\n======================\n\n|Build Status| |Coverage Status| |Version Status| |Python Versions| |Downloads|\n\nThis repo contains a `TensorFlow 2.0`_ `Keras`_ implementation of `google-research/bert`_\nwith support for loading of the original `pre-trained weights`_,\nand producing activations **numerically identical** to the one calculated by the original model.\n\n`ALBERT`_ and `adapter-BERT`_ are also supported by setting the corresponding\nconfiguration parameters (``shared_layer=True``, ``embedding_size`` for `ALBERT`_\nand ``adapter_size`` for `adapter-BERT`_). Setting both will result in an adapter-ALBERT\nby sharing the BERT parameters across all layers while adapting every layer with layer specific adapter.\n\nThe implementation is build from scratch using only basic tensorflow operations,\nfollowing the code in `google-research/bert/modeling.py`_\n(but skipping dead code and applying some simplifications). It also utilizes `kpe/params-flow`_ to reduce\ncommon Keras boilerplate code (related to passing model and layer configuration arguments).\n\n`bert-for-tf2`_ should work with both `TensorFlow 2.0`_ and `TensorFlow 1.14`_ or newer.\n\nNEWS\n----\n - **30.Jul.2020** - `VERBOSE=0` env variable for suppressing stdout output.\n - **06.Apr.2020** - using latest ``py-params`` introducing ``WithParams`` base for ``Layer``\n   and ``Model``. See news in `kpe/py-params`_ for how to update (``_construct()`` signature has change and\n   requires calling ``super().__construct()``).\n - **06.Jan.2020** - support for loading the tar format weights from `google-research/ALBERT`.\n - **18.Nov.2019** - ALBERT tokenization added (make sure to import as ``from bert import albert_tokenization`` or ``from bert import bert_tokenization``).\n\n - **08.Nov.2019** - using v2 per default when loading the `TFHub/albert`_ weights of `google-research/ALBERT`_.\n\n - **05.Nov.2019** - minor ALBERT word embeddings refactoring (``word_embeddings_2`` -\u003e ``word_embeddings_projector``) and related parameter freezing fixes.\n\n - **04.Nov.2019** - support for extra (task specific) token embeddings using negative token ids.\n\n - **29.Oct.2019** - support for loading of the pre-trained ALBERT weights released by `google-research/ALBERT`_  at `TFHub/albert`_.\n\n - **11.Oct.2019** - support for loading of the pre-trained ALBERT weights released by `brightmart/albert_zh ALBERT for Chinese`_.\n\n - **10.Oct.2019** - support for `ALBERT`_ through the ``shared_layer=True``\n   and ``embedding_size=128`` params.\n\n - **03.Sep.2019** - walkthrough on fine tuning with adapter-BERT and storing the\n   fine tuned fraction of the weights in a separate checkpoint (see ``tests/test_adapter_finetune.py``).\n\n - **02.Sep.2019** - support for extending the token type embeddings of a pre-trained model\n   by returning the mismatched weights in ``load_stock_weights()`` (see ``tests/test_extend_segments.py``).\n\n - **25.Jul.2019** - there are now two colab notebooks under ``examples/`` showing how to\n   fine-tune an IMDB Movie Reviews sentiment classifier from pre-trained BERT weights\n   using an `adapter-BERT`_ model architecture on a GPU or TPU in Google Colab.\n\n - **28.Jun.2019** - v.0.3.0 supports `adapter-BERT`_ (`google-research/adapter-bert`_)\n   for \"Parameter-Efficient Transfer Learning for NLP\", i.e. fine-tuning small overlay adapter\n   layers over BERT's transformer encoders without changing the frozen BERT weights.\n\n\n\nLICENSE\n-------\n\nMIT. See `License File \u003chttps://github.com/kpe/bert-for-tf2/blob/master/LICENSE.txt\u003e`_.\n\nInstall\n-------\n\n``bert-for-tf2`` is on the Python Package Index (PyPI):\n\n::\n\n    pip install bert-for-tf2\n\n\nUsage\n-----\n\nBERT in `bert-for-tf2` is implemented as a Keras layer. You could instantiate it like this:\n\n.. code:: python\n\n  from bert import BertModelLayer\n\n  l_bert = BertModelLayer(**BertModelLayer.Params(\n    vocab_size               = 16000,        # embedding params\n    use_token_type           = True,\n    use_position_embeddings  = True,\n    token_type_vocab_size    = 2,\n\n    num_layers               = 12,           # transformer encoder params\n    hidden_size              = 768,\n    hidden_dropout           = 0.1,\n    intermediate_size        = 4*768,\n    intermediate_activation  = \"gelu\",\n\n    adapter_size             = None,         # see arXiv:1902.00751 (adapter-BERT)\n\n    shared_layer             = False,        # True for ALBERT (arXiv:1909.11942)\n    embedding_size           = None,         # None for BERT, wordpiece embedding size for ALBERT\n\n    name                     = \"bert\"        # any other Keras layer params\n  ))\n\nor by using the ``bert_config.json`` from a `pre-trained google model`_:\n\n.. code:: python\n\n  import bert\n\n  model_dir = \".models/uncased_L-12_H-768_A-12\"\n\n  bert_params = bert.params_from_pretrained_ckpt(model_dir)\n  l_bert = bert.BertModelLayer.from_params(bert_params, name=\"bert\")\n\n\nnow you can use the BERT layer in your Keras model like this:\n\n.. code:: python\n\n  from tensorflow import keras\n\n  max_seq_len = 128\n  l_input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32')\n  l_token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')\n\n  # using the default token_type/segment id 0\n  output = l_bert(l_input_ids)                              # output: [batch_size, max_seq_len, hidden_size]\n  model = keras.Model(inputs=l_input_ids, outputs=output)\n  model.build(input_shape=(None, max_seq_len))\n\n  # provide a custom token_type/segment id as a layer input\n  output = l_bert([l_input_ids, l_token_type_ids])          # [batch_size, max_seq_len, hidden_size]\n  model = keras.Model(inputs=[l_input_ids, l_token_type_ids], outputs=output)\n  model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])\n\nif you choose to use `adapter-BERT`_ by setting the `adapter_size` parameter,\nyou would also like to freeze all the original BERT layers by calling:\n\n.. code:: python\n\n  l_bert.apply_adapter_freeze()\n\nand once the model has been build or compiled, the original pre-trained weights\ncan be loaded in the BERT layer:\n\n.. code:: python\n\n  import bert\n\n  bert_ckpt_file   = os.path.join(model_dir, \"bert_model.ckpt\")\n  bert.load_stock_weights(l_bert, bert_ckpt_file)\n\n**N.B.** see `tests/test_bert_activations.py`_ for a complete example.\n\nFAQ\n---\n0. In all the examlpes bellow, **please note** the line:\n\n.. code:: python\n\n  # use in a Keras Model here, and call model.build()\n\nfor a quick test, you can replace it with something like:\n\n.. code:: python\n\n  model = keras.models.Sequential([\n    keras.layers.InputLayer(input_shape=(128,)),\n    l_bert,\n    keras.layers.Lambda(lambda x: x[:, 0, :]),\n    keras.layers.Dense(2)\n  ])\n  model.build(input_shape=(None, 128))\n\n\n1. How to use BERT with the `google-research/bert`_ pre-trained weights?\n\n.. code:: python\n\n  model_name = \"uncased_L-12_H-768_A-12\"\n  model_dir = bert.fetch_google_bert_model(model_name, \".models\")\n  model_ckpt = os.path.join(model_dir, \"bert_model.ckpt\")\n\n  bert_params = bert.params_from_pretrained_ckpt(model_dir)\n  l_bert = bert.BertModelLayer.from_params(bert_params, name=\"bert\")\n\n  # use in a Keras Model here, and call model.build()\n\n  bert.load_bert_weights(l_bert, model_ckpt)      # should be called after model.build()\n\n2. How to use ALBERT with the `google-research/ALBERT`_ pre-trained weights (fetching from TFHub)?\n\nsee `tests/nonci/test_load_pretrained_weights.py \u003chttps://github.com/kpe/bert-for-tf2/blob/master/tests/nonci/test_load_pretrained_weights.py\u003e`_:\n\n.. code:: python\n\n  model_name = \"albert_base\"\n  model_dir    = bert.fetch_tfhub_albert_model(model_name, \".models\")\n  model_params = bert.albert_params(model_name)\n  l_bert = bert.BertModelLayer.from_params(model_params, name=\"albert\")\n\n  # use in a Keras Model here, and call model.build()\n\n  bert.load_albert_weights(l_bert, albert_dir)      # should be called after model.build()\n\n3. How to use ALBERT with the `google-research/ALBERT`_ pre-trained weights (non TFHub)?\n\nsee `tests/nonci/test_load_pretrained_weights.py \u003chttps://github.com/kpe/bert-for-tf2/blob/master/tests/nonci/test_load_pretrained_weights.py\u003e`_:\n\n.. code:: python\n\n  model_name = \"albert_base_v2\"\n  model_dir    = bert.fetch_google_albert_model(model_name, \".models\")\n  model_ckpt   = os.path.join(albert_dir, \"model.ckpt-best\")\n\n  model_params = bert.albert_params(model_dir)\n  l_bert = bert.BertModelLayer.from_params(model_params, name=\"albert\")\n\n  # use in a Keras Model here, and call model.build()\n\n  bert.load_albert_weights(l_bert, model_ckpt)      # should be called after model.build()\n\n4. How to use ALBERT with the `brightmart/albert_zh`_ pre-trained weights?\n\nsee `tests/nonci/test_albert.py \u003chttps://github.com/kpe/bert-for-tf2/blob/master/tests/nonci/test_albert.py\u003e`_:\n\n.. code:: python\n\n  model_name = \"albert_base\"\n  model_dir = bert.fetch_brightmart_albert_model(model_name, \".models\")\n  model_ckpt = os.path.join(model_dir, \"albert_model.ckpt\")\n\n  bert_params = bert.params_from_pretrained_ckpt(model_dir)\n  l_bert = bert.BertModelLayer.from_params(bert_params, name=\"bert\")\n\n  # use in a Keras Model here, and call model.build()\n\n  bert.load_albert_weights(l_bert, model_ckpt)      # should be called after model.build()\n\n5. How to tokenize the input for the `google-research/bert`_ models?\n\n.. code:: python\n\n  do_lower_case = not (model_name.find(\"cased\") == 0 or model_name.find(\"multi_cased\") == 0)\n  bert.bert_tokenization.validate_case_matches_checkpoint(do_lower_case, model_ckpt)\n  vocab_file = os.path.join(model_dir, \"vocab.txt\")\n  tokenizer = bert.bert_tokenization.FullTokenizer(vocab_file, do_lower_case)\n  tokens = tokenizer.tokenize(\"Hello, BERT-World!\")\n  token_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n6. How to tokenize the input for `brightmart/albert_zh`?\n\n.. code:: python\n\n  import params_flow pf\n\n  # fetch the vocab file\n  albert_zh_vocab_url = \"https://raw.githubusercontent.com/brightmart/albert_zh/master/albert_config/vocab.txt\"\n  vocab_file = pf.utils.fetch_url(albert_zh_vocab_url, model_dir)\n\n  tokenizer = bert.albert_tokenization.FullTokenizer(vocab_file)\n  tokens = tokenizer.tokenize(\"你好世界\")\n  token_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n7. How to tokenize the input for the `google-research/ALBERT`_ models?\n\n.. code:: python\n\n  import sentencepiece as spm\n\n  spm_model = os.path.join(model_dir, \"assets\", \"30k-clean.model\")\n  sp = spm.SentencePieceProcessor()\n  sp.load(spm_model)\n  do_lower_case = True\n\n  processed_text = bert.albert_tokenization.preprocess_text(\"Hello, World!\", lower=do_lower_case)\n  token_ids = bert.albert_tokenization.encode_ids(sp, processed_text)\n\n8. How to tokenize the input for the Chinese `google-research/ALBERT`_ models?\n\n.. code:: python\n\n  import bert\n\n  vocab_file = os.path.join(model_dir, \"vocab.txt\")\n  tokenizer = bert.albert_tokenization.FullTokenizer(vocab_file=vocab_file)\n  tokens = tokenizer.tokenize(u\"你好世界\")\n  token_ids = tokenizer.convert_tokens_to_ids(tokens)\n\nResources\n---------\n\n- `BERT`_ - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\n- `adapter-BERT`_ - adapter-BERT: Parameter-Efficient Transfer Learning for NLP\n- `ALBERT`_ - ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations\n- `google-research/bert`_ - the original `BERT`_ implementation\n- `google-research/ALBERT`_ - the original `ALBERT`_ implementation by Google\n- `google-research/albert(old)`_ - the old location of the original `ALBERT`_ implementation by Google\n- `brightmart/albert_zh`_ - pre-trained `ALBERT`_ weights for Chinese\n- `kpe/params-flow`_ - A Keras coding style for reducing `Keras`_ boilerplate code in custom layers by utilizing `kpe/py-params`_\n\n.. _`kpe/params-flow`: https://github.com/kpe/params-flow\n.. _`kpe/py-params`: https://github.com/kpe/py-params\n.. _`bert-for-tf2`: https://github.com/kpe/bert-for-tf2\n\n.. _`Keras`: https://keras.io\n.. _`pre-trained weights`: https://github.com/google-research/bert#pre-trained-models\n.. _`google-research/bert`: https://github.com/google-research/bert\n.. _`google-research/bert/modeling.py`: https://github.com/google-research/bert/blob/master/modeling.py\n.. _`BERT`: https://arxiv.org/abs/1810.04805\n.. _`pre-trained google model`: https://github.com/google-research/bert\n.. _`tests/test_bert_activations.py`: https://github.com/kpe/bert-for-tf2/blob/master/tests/test_compare_activations.py\n.. _`TensorFlow 2.0`: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf\n.. _`TensorFlow 1.14`: https://www.tensorflow.org/versions/r1.14/api_docs/python/tf\n\n.. _`google-research/adapter-bert`: https://github.com/google-research/adapter-bert/\n.. _`adapter-BERT`: https://arxiv.org/abs/1902.00751\n.. _`ALBERT`: https://arxiv.org/abs/1909.11942\n.. _`brightmart/albert_zh ALBERT for Chinese`: https://github.com/brightmart/albert_zh\n.. _`brightmart/albert_zh`: https://github.com/brightmart/albert_zh\n.. _`google ALBERT weights`: https://github.com/google-research/google-research/tree/master/albert\n.. _`google-research/albert(old)`: https://github.com/google-research/google-research/tree/master/albert\n.. _`google-research/ALBERT`: https://github.com/google-research/ALBERT\n.. _`TFHub/albert`: https://tfhub.dev/google/albert_base/2\n\n.. |Build Status| image:: https://travis-ci.com/kpe/bert-for-tf2.svg?branch=master\n   :target: https://travis-ci.com/kpe/bert-for-tf2\n.. |Coverage Status| image:: https://coveralls.io/repos/kpe/bert-for-tf2/badge.svg?branch=master\n   :target: https://coveralls.io/r/kpe/bert-for-tf2?branch=master\n.. |Version Status| image:: https://badge.fury.io/py/bert-for-tf2.svg\n   :target: https://badge.fury.io/py/bert-for-tf2\n.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/bert-for-tf2.svg\n.. |Downloads| image:: https://img.shields.io/pypi/dm/bert-for-tf2.svg\n.. |Twitter| image:: https://img.shields.io/twitter/follow/siddhadev?logo=twitter\u0026label=\u0026style=\n   :target: https://twitter.com/intent/user?screen_name=siddhadev\n","funding_links":[],"categories":["improvement over BERT:","Model Implementations","Sample Codes / Projects \u003ca name=\"sample\" /\u003e ⛏️📐📁","Model 💛💛💛💛💛\u003ca name=\"Model\" /\u003e"],"sub_categories":["Reinforcement Learning \u003ca name=\"RL\" /\u003e🔮","NLP Model 自然语言处理模型"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkpe%2Fbert-for-tf2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkpe%2Fbert-for-tf2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkpe%2Fbert-for-tf2/lists"}