{"id":27070301,"url":"https://github.com/robrua/easy-bert","last_synced_at":"2025-04-05T22:20:51.143Z","repository":{"id":34768713,"uuid":"183556043","full_name":"robrua/easy-bert","owner":"robrua","description":"A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)","archived":false,"fork":false,"pushed_at":"2022-11-21T21:54:24.000Z","size":46,"stargazers_count":171,"open_issues_count":14,"forks_count":44,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-06T14:53:26.533Z","etag":null,"topics":["bert","language-model","machine-learning","natural-language-processing","natural-language-understanding","nlp","sentence-embeddings","tensorflow","word-embeddings"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robrua.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-26T04:20:25.000Z","updated_at":"2025-01-21T20:35:42.000Z","dependencies_parsed_at":"2023-01-15T09:04:00.417Z","dependency_job_id":null,"html_url":"https://github.com/robrua/easy-bert","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robrua%2Feasy-bert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robrua%2Feasy-bert/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robrua%2Feasy-bert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robrua%2Feasy-bert/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/robrua
","download_url":"https://codeload.github.com/robrua/easy-bert/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247407386,"owners_count":20934027,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","language-model","machine-learning","natural-language-processing","natural-language-understanding","nlp","sentence-embeddings","tensorflow","word-embeddings"],"created_at":"2025-04-05T22:20:50.597Z","updated_at":"2025-04-05T22:20:51.137Z","avatar_url":"https://github.com/robrua.png","language":"Java","funding_links":[],"categories":["🔹 **WordPiece Tokenizer Implementations**","人工智能"],"sub_categories":["Spring Cloud框架"],"readme":"[![MIT Licensed](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/robrua/easy-bert/blob/master/LICENSE.txt)\n[![PyPI](https://img.shields.io/pypi/v/easybert.svg)](https://pypi.org/project/easybert/)\n[![Maven Central](https://img.shields.io/maven-central/v/com.robrua.nlp/easy-bert.svg)](https://search.maven.org/search?q=g:com.robrua.nlp%20a:easy-bert)\n[![JavaDocs](https://javadoc.io/badge/com.robrua.nlp/easy-bert.svg)](https://javadoc.io/doc/com.robrua.nlp/easy-bert)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.2651822.svg)](https://doi.org/10.5281/zenodo.2651822)\n\n# easy-bert\neasy-bert is a dead simple API for using Google's high quality [BERT](https://github.com/google-research/bert) language model in Python and Java.\n\nCurrently, easy-bert is focused on getting embeddings from pre-trained BERT models in both Python and Java. 
Support for fine-tuning and pre-training in Python will be added in the future, as well as support for using easy-bert for other tasks besides getting embeddings.\n\n## Python\n\n### How To Get It\neasy-bert is available on [PyPI](https://pypi.org/project/easybert/). You can install it with `pip install easybert`, or with `pip install git+https://github.com/robrua/easy-bert.git` if you want the very latest.\n\n### Usage\nYou can use easy-bert with pre-trained BERT models from TensorFlow Hub or from local models in the TensorFlow saved model format.\n\nTo create a BERT embedder from a TensorFlow Hub model, simply instantiate a Bert object with the target tf-hub URL:\n\n```python\nfrom easybert import Bert\nbert = Bert(\"https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1\")\n```\n\nYou can also load a local model in TensorFlow's saved model format using `Bert.load`:\n\n```python\nfrom easybert import Bert\nbert = Bert.load(\"/path/to/your/model/\")\n```\n\nOnce you have a BERT model loaded, you can get sequence embeddings using `bert.embed`:\n\n```python\nx = bert.embed(\"A sequence\")\ny = bert.embed([\"Multiple\", \"Sequences\"])\n```\n\nIf you want per-token embeddings, you can set `per_token=True`:\n\n```python\nx = bert.embed(\"A sequence\", per_token=True)\ny = bert.embed([\"Multiple\", \"Sequences\"], per_token=True)\n```\n\neasy-bert returns BERT embeddings as NumPy arrays.\n\nEvery time you call `bert.embed`, a new TensorFlow session is created and used for the computation. 
If you're calling `bert.embed` many times in a row, you can speed up your code by sharing a TensorFlow session among those calls using a `with` statement:\n\n```python\nwith bert:\n    x = bert.embed(\"A sequence\", per_token=True)\n    y = bert.embed([\"Multiple\", \"Sequences\"], per_token=True)\n```\n\nYou can save a BERT model using `bert.save`, then reload it later using `Bert.load`:\n\n```python\nbert.save(\"/path/to/your/model/\")\nbert = Bert.load(\"/path/to/your/model/\")\n```\n\n### CLI\neasy-bert also provides a CLI tool to conveniently do one-off embeddings of sequences with BERT. It can also convert a TensorFlow Hub model to a saved model.\n\nRun `bert --help`, `bert embed --help` or `bert download --help` to get details about the CLI tool.\n\n### Docker\neasy-bert comes with a [Docker build](https://hub.docker.com/r/robrua/easy-bert) that can be used as a base image for applications that rely on BERT embeddings, or to run the CLI tool without setting up a local environment.\n\n## Java\n\n### How To Get It\neasy-bert is available on [Maven Central](https://search.maven.org/search?q=g:com.robrua.nlp%20a:easy-bert). 
It is also distributed through the [releases page](https://github.com/robrua/easy-bert/releases).\n\nTo add the latest easy-bert release version to your Maven project, add the dependency to your `pom.xml` dependencies section:\n```xml\n\u003cdependencies\u003e\n  \u003cdependency\u003e\n    \u003cgroupId\u003ecom.robrua.nlp\u003c/groupId\u003e\n    \u003cartifactId\u003eeasy-bert\u003c/artifactId\u003e\n    \u003cversion\u003e1.0.3\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\nOr, if you want to get the latest development version, add the [Sonatype Snapshot Repository](https://oss.sonatype.org/content/repositories/snapshots/) to your `pom.xml` as well:\n```xml\n\u003cdependencies\u003e\n  \u003cdependency\u003e\n    \u003cgroupId\u003ecom.robrua.nlp\u003c/groupId\u003e\n    \u003cartifactId\u003eeasy-bert\u003c/artifactId\u003e\n    \u003cversion\u003e1.0.4-SNAPSHOT\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n\n\u003crepositories\u003e\n  \u003crepository\u003e\n    \u003cid\u003esnapshots-repo\u003c/id\u003e\n    \u003curl\u003ehttps://oss.sonatype.org/content/repositories/snapshots\u003c/url\u003e\n    \u003creleases\u003e\n      \u003cenabled\u003efalse\u003c/enabled\u003e\n    \u003c/releases\u003e\n    \u003csnapshots\u003e\n      \u003cenabled\u003etrue\u003c/enabled\u003e\n    \u003c/snapshots\u003e\n  \u003c/repository\u003e\n\u003c/repositories\u003e\n```\n\n### Usage\nYou can use easy-bert with pre-trained BERT models generated with easy-bert's Python tools. You can also use pre-generated models on Maven Central.\n\nTo load a model from your local filesystem, you can use:\n\n```java\ntry(Bert bert = Bert.load(new File(\"/path/to/your/model/\"))) {\n    // Embed some sequences\n}\n```\n\nIf the model is in your classpath (e.g. 
if you're pulling it in via Maven), you can use:\n\n```java\ntry(Bert bert = Bert.load(\"/resource/path/to/your/model\")) {\n    // Embed some sequences\n}\n```\n\nOnce you have a BERT model loaded, you can get sequence embeddings using `bert.embedSequence` or `bert.embedSequences`:\n\n```java\nfloat[] embedding = bert.embedSequence(\"A sequence\");\nfloat[][] embeddings = bert.embedSequences(\"Multiple\", \"Sequences\");\n```\n\nIf you want per-token embeddings, you can use `bert.embedTokens`:\n\n```java\nfloat[][] embedding = bert.embedTokens(\"A sequence\");\nfloat[][][] embeddings = bert.embedTokens(\"Multiple\", \"Sequences\");\n```\n\n### Pre-Generated Maven Central Models\nVarious TensorFlow Hub BERT models are available in easy-bert format on [Maven Central](https://search.maven.org/search?q=g:com.robrua.nlp.models). To use one in your project, add the following to your `pom.xml`, substituting one of the Artifact IDs listed below in place of `ARTIFACT-ID` in the `artifactId`:\n\n```xml\n\u003cdependencies\u003e\n  \u003cdependency\u003e\n    \u003cgroupId\u003ecom.robrua.nlp.models\u003c/groupId\u003e\n    \u003cartifactId\u003eARTIFACT-ID\u003c/artifactId\u003e\n    \u003cversion\u003e1.0.0\u003c/version\u003e\n  \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\n\nOnce you've pulled in the dependency, you can load the model using this code. 
Substitute the appropriate Resource Path from the list below in place of `RESOURCE-PATH` based on the model you added as a dependency:\n\n```java\ntry(Bert bert = Bert.load(\"RESOURCE-PATH\")) {\n    // Embed some sequences\n}\n```\n\n#### Available Models\n| Model | Languages | Layers | Embedding Size | Heads | Parameters | Artifact ID | Resource Path |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| [BERT-Base, Uncased](https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1) | English | 12 | 768 | 12 | 110M | easy-bert-uncased-L-12-H-768-A-12 [![Maven Central](https://img.shields.io/maven-central/v/com.robrua.nlp.models/easy-bert-uncased-L-12-H-768-A-12.svg)](https://search.maven.org/search?q=g:com.robrua.nlp.models%20a:easy-bert-uncased-L-12-H-768-A-12) | com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12 |\n| [BERT-Base, Cased](https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1) | English | 12 | 768 | 12 | 110M | easy-bert-cased-L-12-H-768-A-12 [![Maven Central](https://img.shields.io/maven-central/v/com.robrua.nlp.models/easy-bert-cased-L-12-H-768-A-12.svg)](https://search.maven.org/search?q=g:com.robrua.nlp.models%20a:easy-bert-cased-L-12-H-768-A-12) | com/robrua/nlp/easy-bert/bert-cased-L-12-H-768-A-12 |\n| [BERT-Base, Multilingual Cased](https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1) | 104 Languages | 12 | 768 | 12 | 110M | easy-bert-multi-cased-L-12-H-768-A-12 [![Maven Central](https://img.shields.io/maven-central/v/com.robrua.nlp.models/easy-bert-multi-cased-L-12-H-768-A-12.svg)](https://search.maven.org/search?q=g:com.robrua.nlp.models%20a:easy-bert-multi-cased-L-12-H-768-A-12) | com/robrua/nlp/easy-bert/bert-multi-cased-L-12-H-768-A-12 |\n| [BERT-Base, Chinese](https://tfhub.dev/google/bert_chinese_L-12_H-768_A-12/1) | Chinese Simplified and Traditional | 12 | 768 | 12 | 110M | easy-bert-chinese-L-12-H-768-A-12 [![Maven 
Central](https://img.shields.io/maven-central/v/com.robrua.nlp.models/easy-bert-chinese-L-12-H-768-A-12.svg)](https://search.maven.org/search?q=g:com.robrua.nlp.models%20a:easy-bert-chinese-L-12-H-768-A-12) | com/robrua/nlp/easy-bert/bert-chinese-L-12-H-768-A-12 |\n\n### Creating Your Own Models\nFor now, easy-bert can only use pre-trained TensorFlow Hub BERT models that have been converted using the Python tools. We will be adding support for fine-tuning and pre-training new models easily, but there are no plans to support these on the Java side. You'll need to train in Python, save the model, then load it in Java.\n\n## Bugs\nIf you find bugs please let us know via a pull request or issue.\n\n## Citing easy-bert\nIf you used easy-bert for your research, please [cite the project](https://doi.org/10.5281/zenodo.2651822).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobrua%2Feasy-bert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobrua%2Feasy-bert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobrua%2Feasy-bert/lists"}