Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jonathanbratt/RBERT
Implementation of BERT in R
bert natural-language-processing nlp reticulate rstats rstudio tensorflow
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/jonathanbratt/RBERT
- Owner: jonathanbratt
- License: apache-2.0
- Created: 2019-08-26T20:58:56.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-25T17:05:50.000Z (almost 2 years ago)
- Last Synced: 2024-05-21T02:11:45.561Z (6 months ago)
- Topics: bert, natural-language-processing, nlp, reticulate, rstats, rstudio, tensorflow
- Language: R
- Homepage:
- Size: 2.79 MB
- Stars: 156
- Watchers: 13
- Forks: 19
- Open Issues: 21
Metadata Files:
- Readme: README.Rmd
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - jonathanbratt/RBERT - Implementation of BERT in R (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# RBERT

[![Lifecycle: superseded](https://img.shields.io/badge/lifecycle-superseded-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#superseded)
[![Travis build status](https://travis-ci.org/jonathanbratt/RBERT.svg?branch=master)](https://travis-ci.org/jonathanbratt/RBERT)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/jonathanbratt/RBERT?branch=master&svg=true)](https://ci.appveyor.com/project/jonathanbratt/RBERT)
[![Codecov test coverage](https://codecov.io/gh/jonathanbratt/RBERT/branch/master/graph/badge.svg)](https://codecov.io/gh/jonathanbratt/RBERT?branch=master)

We are re-implementing BERT for R in [{torchtransformers}](https://github.com/macmillancontentscience/torchtransformers). We find {torch} much easier to work with in R than {tensorflow}, and strongly recommend starting there!
---
RBERT is an R implementation of the Python package [BERT](https://github.com/google-research/bert) developed at Google for Natural Language Processing.
## Installation
You can install RBERT from [GitHub](https://github.com/) with:
```{r installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github(
"jonathanbratt/RBERT",
build_vignettes = TRUE
)
```

### TensorFlow Installation
RBERT requires TensorFlow. Currently, the installed version must be 1.13.1 or earlier. You can install it using the tensorflow package (installed as a dependency of RBERT; see the note below about Windows).
```{r tensorflow, eval = FALSE}
tensorflow::install_tensorflow(version = "1.13.1")
```

### Windows
The current CRAN version of reticulate (1.13) causes some issues with the TensorFlow installation. Rebooting your machine after installing Anaconda seems to fix the problem; alternatively, upgrade to the development version of reticulate.
```{r install-dev-reticulate, eval = FALSE}
devtools::install_github("rstudio/reticulate")
```

## Basic usage
RBERT is a work in progress. While fine-tuning a BERT model using RBERT may be possible, it is not currently recommended.
RBERT is best suited for exploring pre-trained BERT models, and obtaining contextual representations of input text for use as features in downstream tasks.
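As a quick sketch of that feature-extraction workflow (the function names and arguments below follow the pattern shown in the package's introductory vignette; treat the exact signatures as an assumption and check the vignette for the authoritative version), obtaining contextual embeddings might look like:

```{r features, eval = FALSE}
library(RBERT)

# Download a pre-trained BERT checkpoint (cached after the first call).
ckpt_dir <- download_BERT_checkpoint(model = "bert_base_uncased")

# Extract contextual features for some input text.
feats <- extract_features(
  examples = "The quick brown fox jumps over the lazy dog.",
  ckpt_dir = ckpt_dir,
  layer_indexes = 1:12
)

# `feats` holds token-level embeddings for each requested layer,
# ready to use as features in a downstream model.
```

Note that the first call downloads a large checkpoint file, so this chunk is not evaluated here.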
* See the "Introduction to RBERT" vignette included with the package for more specific examples.
* For a quick explanation of what BERT is, see the "BERT Basics" vignette.
* The package [RBERTviz](https://github.com/jonathanbratt/RBERTviz) provides tools for making fun and easy visualizations of BERT data.

## Running Tests
The first time you run the test suite, the 388.8 MB bert_base_uncased.zip file will be downloaded to your `tests/testthat/test_checkpoints` directory. Subsequent test runs will reuse that download. This was our best compromise: it allows relatively rapid testing without bloating the repository.
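To run the suite locally, the standard devtools workflow applies (assuming your working directory is a checkout of the repository):

```{r run-tests, eval = FALSE}
# From the package root; the first run triggers the checkpoint download.
devtools::test()
```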
## Disclaimer
This is not an officially supported Macmillan Learning product.
## Contact information
Questions or comments should be directed to Jonathan Bratt ([email protected]) and Jon Harmon ([email protected]).