
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# huggingfaceR

The goal of `huggingfaceR` is to bring state-of-the-art NLP models to R. `huggingfaceR` is built on top of Hugging Face's [transformers](https://huggingface.co/docs/transformers/index) library and supports navigating the [Hugging Face Hub](https://huggingface.co/models).

## Installation

Before installing `huggingfaceR`, make sure your Python environment is set up correctly.

```{r eval = FALSE}
install.packages("reticulate")
library(reticulate)

install_miniconda()
```

If you run into issues, more detailed instructions on installing and configuring Python can be found [here](https://support.rstudio.com/hc/en-us/articles/360023654474-Installing-and-Configuring-Python-with-RStudio).

After that, you can install the development version of `huggingfaceR` from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("farach/huggingfaceR")
```

## Example

`huggingfaceR` uses the `transformers` `pipeline()` abstraction to quickly make pre-trained language models available in R. In this example we load the `distilbert-base-uncased-finetuned-sst-2-english` model and its tokenizer into a pipeline object to obtain sentiment scores.

```{r example}
library(huggingfaceR)

distilBERT <- hf_load_pipeline(
  model_id = "distilbert-base-uncased-finetuned-sst-2-english",
  task = "text-classification"
)

distilBERT
```

With the pipeline now loaded, we can begin using the model.

```{r}
distilBERT("I like you. I love you")
```
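Assuming the pipeline returns one named list per input, each holding a `label` and a `score` (the shape that `unnest_wider()` relies on further down), the result can also be flattened into a data frame with base R. This is only a sketch: `mock_result` is a hypothetical stand-in for the value returned by `distilBERT()`, and its labels and scores are made-up placeholders rather than real model output.

```r
# Hypothetical stand-in for the value returned by distilBERT(c("...", "...")).
# Labels and scores are illustrative placeholders, not real predictions.
mock_result <- list(
  list(label = "POSITIVE", score = 0.98),
  list(label = "NEGATIVE", score = 0.75)
)

# Turn each per-input list into a one-row data frame, then stack the rows.
scores <- do.call(
  rbind,
  lapply(mock_result, function(x) {
    data.frame(label = x$label, score = x$score, stringsAsFactors = FALSE)
  })
)

scores
```

The resulting data frame has one row per input text, which makes it straightforward to bind back onto the original texts.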

We can use this pipeline in a typical tidyverse processing chunk. First we load the `tidyverse`.

```{r}
library(tidyverse)
```

We can use the `huggingfaceR` `hf_load_dataset()` function to pull in the [emotion](https://huggingface.co/datasets/emotion) Hugging Face dataset. This dataset contains English Twitter messages labelled with six basic emotions: anger, fear, joy, love, sadness, and surprise. We are interested in how the DistilBERT model classifies messages carrying each of these emotions as either positive or negative sentiment.

```{r}
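# Load the training split of the emotion dataset as a tibble,
# with the integer labels also mapped to their string names.
emo <- hf_load_dataset(
  dataset = "emotion",
  split = "train",
  as_tibble = TRUE,
  label_name = "int2str"
)

# Score a random sample of 100 messages and spread the pipeline
# output (label and score) into separate columns.
emo_model <- emo %>%
  sample_n(100) %>%
  transmute(
    text,
    emotion_id = label,
    emotion_name = label_name,
    distilBERT_sent = distilBERT(text)
  ) %>%
  unnest_wider(distilBERT_sent)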

glimpse(emo_model)
```
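One way to summarise the comparison is to cross-tabulate the original emotion against the predicted sentiment class. The sketch below uses a small hand-written data frame, `mock_model`, as a hypothetical stand-in for `emo_model`; its rows are placeholders chosen for illustration, not results from the model.

```r
# Hypothetical stand-in for emo_model: `emotion_name` is the original label,
# `label` is the sentiment class returned by the pipeline. Placeholder rows only.
mock_model <- data.frame(
  emotion_name = c("joy", "joy", "anger", "sadness", "love", "fear"),
  label = c("POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"),
  stringsAsFactors = FALSE
)

# Count how often each emotion falls into each predicted sentiment class.
counts <- table(mock_model$emotion_name, mock_model$label)

# Row-wise proportions: the share of each emotion assigned to each class.
shares <- prop.table(counts, margin = 1)

counts
```

With the real `emo_model`, the same two calls would show at a glance which emotions the sentiment model tends to map to each class.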

We can use `ggplot2` to visualize the results.

```{r}
emo_model |>
  mutate(
    label = paste0("Distilbert class:\n", label),
    emotion_name = str_to_title(emotion_name)
  ) |>
  ggplot(aes(x = emotion_name, y = score, color = label)) +
  geom_boxplot(show.legend = FALSE, outlier.alpha = 0.4) +
  scale_color_manual(values = c("#D55E00", "#6699CC")) +
  facet_wrap(~ label) +
  labs(
    title = "Reviewing Distilbert classification predictions",
    x = "Original label",
    y = "Model score",
    caption = "source:\nhttps://huggingface.co/datasets/emotion"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0))
  )
```