{"id":13534928,"url":"https://github.com/CyberZHG/keras-bert","last_synced_at":"2025-04-02T00:31:21.673Z","repository":{"id":44470626,"uuid":"153859755","full_name":"CyberZHG/keras-bert","owner":"CyberZHG","description":"Implementation of BERT that could load official pre-trained models for feature extraction and prediction","archived":true,"fork":false,"pushed_at":"2022-01-22T10:33:11.000Z","size":14467,"stargazers_count":2428,"open_issues_count":2,"forks_count":513,"subscribers_count":59,"default_branch":"master","last_synced_at":"2024-10-28T16:59:20.752Z","etag":null,"topics":["bert","keras","language-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CyberZHG.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-20T01:47:20.000Z","updated_at":"2024-10-25T03:01:14.000Z","dependencies_parsed_at":"2022-07-15T20:30:50.975Z","dependency_job_id":null,"html_url":"https://github.com/CyberZHG/keras-bert","commit_stats":null,"previous_names":["lwislvislie/keras-bert","cyberzhg/keras-bert"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberZHG%2Fkeras-bert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberZHG%2Fkeras-bert/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberZHG%2Fkeras-bert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberZHG%2Fkeras-bert/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CyberZHG","download_url":"https://co
deload.github.com/CyberZHG/keras-bert/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222784453,"owners_count":17037192,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","keras","language-model"],"created_at":"2024-08-01T08:00:46.905Z","updated_at":"2025-04-02T00:31:21.622Z","avatar_url":"https://github.com/CyberZHG.png","language":"Python","funding_links":[],"categories":["Pretrained Language Model","implement of BERT besides tensorflow:","Transformer Implementations By Communities","Implementations","Python"],"sub_categories":["Repository","Keras"],"readme":"# Keras BERT\n\n[![Version](https://img.shields.io/pypi/v/keras-bert.svg)](https://pypi.org/project/keras-bert/)\n![License](https://img.shields.io/pypi/l/keras-bert.svg)\n\n\\[[中文](https://github.com/CyberZHG/keras-bert/blob/master/README.zh-CN.md)|[English](https://github.com/CyberZHG/keras-bert/blob/master/README.md)\\]\n\nImplementation of the [BERT](https://arxiv.org/pdf/1810.04805.pdf). 
Official pre-trained models could be loaded for feature extraction and prediction.\n\n## Install\n\n```bash\npip install keras-bert\n```\n\n## Usage\n\n* [Load Official Pre-trained Models](#Load-Official-Pre-trained-Models)\n* [Tokenizer](#Tokenizer)\n* [Train \u0026 Use](#Train-\u0026-Use)\n* [Use Warmup](#Use-Warmup)\n* [Download Pretrained Checkpoints](#Download-Pretrained-Checkpoints)\n* [Extract Features](#Extract-Features)\n\n### External Links\n\n* [Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification](https://github.com/BrikerMan/Kashgari)\n* [Keras ALBERT](https://github.com/TinkerMob/keras_albert_model)\n\n### Load Official Pre-trained Models\n\nIn [feature extraction demo](./demo/load_model/load_and_extract.py), you should be able to get the same extraction results as the official model `chinese_L-12_H-768_A-12`. And in [prediction demo](./demo/load_model/load_and_predict.py), the missing word in the sentence could be predicted.\n\n\n### Run on TPU\n\nThe [extraction demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/load_model/keras_bert_load_and_extract_tpu.ipynb) shows how to convert to a model that runs on TPU.\n\nThe [classification demo](https://colab.research.google.com/github/CyberZHG/keras-bert/blob/master/demo/tune/keras_bert_classification_tpu.ipynb) shows how to apply the model to simple classification tasks.\n\n### Tokenizer\n\nThe `Tokenizer` class is used for splitting texts and generating indices:\n\n```python\nfrom keras_bert import Tokenizer\n\ntoken_dict = {\n    '[CLS]': 0,\n    '[SEP]': 1,\n    'un': 2,\n    '##aff': 3,\n    '##able': 4,\n    '[UNK]': 5,\n}\ntokenizer = Tokenizer(token_dict)\nprint(tokenizer.tokenize('unaffable'))  # The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]']`\nindices, segments = tokenizer.encode('unaffable')\nprint(indices)  # Should be `[0, 2, 3, 4, 1]`\nprint(segments)  # Should be `[0, 0, 0, 0, 
0]`\n\nprint(tokenizer.tokenize(first='unaffable', second='钢'))\n# The result should be `['[CLS]', 'un', '##aff', '##able', '[SEP]', '钢', '[SEP]']`\nindices, segments = tokenizer.encode(first='unaffable', second='钢', max_len=10)\nprint(indices)  # Should be `[0, 2, 3, 4, 1, 5, 1, 0, 0, 0]`\nprint(segments)  # Should be `[0, 0, 0, 0, 0, 1, 1, 0, 0, 0]`\n```\n\n### Train \u0026 Use\n\n```python\nfrom tensorflow import keras\nfrom keras_bert import get_base_dict, get_model, compile_model, gen_batch_inputs\n\n\n# A toy input example\nsentence_pairs = [\n    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],\n    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],\n    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],\n]\n\n\n# Build token dictionary\ntoken_dict = get_base_dict()  # A dict that contains some special tokens\nfor pairs in sentence_pairs:\n    for token in pairs[0] + pairs[1]:\n        if token not in token_dict:\n            token_dict[token] = len(token_dict)\ntoken_list = list(token_dict.keys())  # Used for selecting a random word\n\n\n# Build \u0026 train the model\nmodel = get_model(\n    token_num=len(token_dict),\n    head_num=5,\n    transformer_num=12,\n    embed_dim=25,\n    feed_forward_dim=100,\n    seq_len=20,\n    pos_num=20,\n    dropout_rate=0.05,\n)\ncompile_model(model)\nmodel.summary()\n\ndef _generator():\n    while True:\n        yield gen_batch_inputs(\n            sentence_pairs,\n            token_dict,\n            token_list,\n            seq_len=20,\n            mask_rate=0.3,\n            swap_sentence_rate=1.0,\n        )\n\nmodel.fit_generator(\n    generator=_generator(),\n    steps_per_epoch=1000,\n    epochs=100,\n    validation_data=_generator(),\n    validation_steps=100,\n    callbacks=[\n        keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)\n    ],\n)\n\n\n# Use the trained model\ninputs, output_layer = get_model(\n    
token_num=len(token_dict),\n    head_num=5,\n    transformer_num=12,\n    embed_dim=25,\n    feed_forward_dim=100,\n    seq_len=20,\n    pos_num=20,\n    dropout_rate=0.05,\n    training=False,      # The input layers and output layer will be returned if `training` is `False`\n    trainable=False,     # Whether the model is trainable. The default value is the same as `training`\n    output_layer_num=4,  # The number of layers whose outputs will be concatenated as a single output.\n                         # Only available when `training` is `False`.\n)\n```\n\n### Use Warmup\n\nThe `AdamWarmup` optimizer is provided for learning-rate warmup and decay. The learning rate will reach `lr` in `warmup_steps` steps, and decay to `min_lr` in `decay_steps` steps. There is a helper function `calc_train_steps` for calculating the two step counts:\n\n```python\nimport numpy as np\nfrom keras_bert import AdamWarmup, calc_train_steps\n\ntrain_x = np.random.standard_normal((1024, 100))\n\ntotal_steps, warmup_steps = calc_train_steps(\n    num_example=train_x.shape[0],\n    batch_size=32,\n    epochs=10,\n    warmup_proportion=0.1,\n)\n\noptimizer = AdamWarmup(total_steps, warmup_steps, lr=1e-3, min_lr=1e-5)\n```\n\n### Download Pretrained Checkpoints\n\nSeveral download URLs have been added. You can get the downloaded and uncompressed path of a checkpoint by:\n\n```python\nfrom keras_bert import get_pretrained, PretrainedList, get_checkpoint_paths\n\nmodel_path = get_pretrained(PretrainedList.multi_cased_base)\npaths = get_checkpoint_paths(model_path)\nprint(paths.config, paths.checkpoint, paths.vocab)\n```\n\n### Extract Features\n\nYou can use the helper function `extract_embeddings` if the features of tokens or sentences (without further tuning) are what you need. 
To extract the features of all tokens:\n\n```python\nfrom keras_bert import extract_embeddings\n\nmodel_path = 'xxx/yyy/uncased_L-12_H-768_A-12'\ntexts = ['all work and no play', 'makes jack a dull boy~']\n\nembeddings = extract_embeddings(model_path, texts)\n```\n\nThe returned result is a list with the same length as `texts`. Each item in the list is a numpy array truncated to the length of the input. The shapes of the outputs in this example are `(7, 768)` and `(8, 768)`.\n\nWhen the inputs are sentence pairs, and you need the outputs of `NSP` and max-pooling of the last 4 layers:\n\n```python\nfrom keras_bert import extract_embeddings, POOL_NSP, POOL_MAX\n\nmodel_path = 'xxx/yyy/uncased_L-12_H-768_A-12'\ntexts = [\n    ('all work and no play', 'makes jack a dull boy'),\n    ('makes jack a dull boy', 'all work and no play'),\n]\n\nembeddings = extract_embeddings(model_path, texts, output_layer_num=4, poolings=[POOL_NSP, POOL_MAX])\n```\n\nThere are no token features in the results. The outputs of `NSP` and max-pooling will be concatenated with the final shape `(768 x 4 x 2,)`.\n\nThe second argument of the helper function can be a generator. To extract features from a file:\n\n```python\nimport codecs\nfrom keras_bert import extract_embeddings\n\nmodel_path = 'xxx/yyy/uncased_L-12_H-768_A-12'\n\nwith codecs.open('xxx.txt', 'r', 'utf8') as reader:\n    texts = map(lambda x: x.strip(), reader)\n    embeddings = extract_embeddings(model_path, texts)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCyberZHG%2Fkeras-bert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCyberZHG%2Fkeras-bert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCyberZHG%2Fkeras-bert/lists"}