https://github.com/farach/huggingfaceR

Hugging Face state-of-the-art models in R
huggingface nlp r rstats


Host: GitHub
URL: https://github.com/farach/huggingfaceR
Owner: farach
License: other
Created: 2022-05-24T23:41:21.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-01-28T12:23:18.000Z (over 2 years ago)
Last Synced: 2024-11-11T00:37:08.575Z (8 months ago)
Topics: huggingface, nlp, r, rstats
Language: R
Homepage:
Size: 2.34 MB
Stars: 141
Watchers: 9
Forks: 17
Open Issues: 8
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Awesome Lists containing this project

jimsghstars - farach/huggingfaceR - Hugging Face state-of-the-art models in R (R)

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# huggingfaceR

The goal of `huggingfaceR` is to bring state-of-the-art NLP models to R. `huggingfaceR` is built on top of Hugging Face's [transformers](https://huggingface.co/docs/transformers/index) library and supports navigating the [Hugging Face Hub](https://huggingface.co/models).

## Installation

Before installing `huggingfaceR`, make sure your Python environment is set up correctly.

```{r eval = FALSE}
install.packages("reticulate")
library(reticulate)
install_miniconda()
```

If you run into issues, more detailed instructions on installing and configuring Python can be found [here](https://support.rstudio.com/hc/en-us/articles/360023654474-Installing-and-Configuring-Python-with-RStudio).

After that you can install the development version of huggingfaceR from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("farach/huggingfaceR")
```
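Before loading a model, it can help to confirm that `reticulate` can actually find Python and the modules a pipeline needs. The sketch below is one way to do that. `check_python_setup()` is a hypothetical helper, not part of `huggingfaceR`, and the default module list (`transformers`, `torch`) is an assumption about what your setup requires. The result reflects whichever Python `reticulate` binds to in the current session.

```r
# Hypothetical helper (not part of huggingfaceR): reports whether the pieces
# a pipeline is assumed to need are visible from the current R session.
check_python_setup <- function(modules = c("transformers", "torch")) {
  has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
  if (!has_reticulate) {
    # Without reticulate nothing else can be checked, so report NA for the rest.
    status <- c(FALSE, NA, rep(NA, length(modules)))
  } else {
    # py_available(initialize = TRUE) tries to start Python and returns FALSE
    # rather than raising an error when no usable interpreter is found.
    has_python <- reticulate::py_available(initialize = TRUE)
    has_modules <- vapply(
      modules,
      function(m) has_python && reticulate::py_module_available(m),
      logical(1)
    )
    status <- c(TRUE, has_python, has_modules)
  }
  names(status) <- c("reticulate", "python", modules)
  status
}

check_python_setup()
```

A `FALSE` or `NA` entry points to the step that likely needs attention before a pipeline can be loaded.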

## Example

`huggingfaceR` uses the `transformers` `pipeline()` abstraction to make pre-trained language models quickly available in R. In this example, we load the `distilbert-base-uncased-finetuned-sst-2-english` model and its tokenizer into a pipeline object to obtain sentiment scores.

```{r example}
library(huggingfaceR)

distilBERT <- hf_load_pipeline(
  model_id = "distilbert-base-uncased-finetuned-sst-2-english",
  task = "text-classification"
)

distilBERT
```

With the pipeline now loaded, we can begin using the model.

```{r}
distilBERT("I like you. I love you")
```
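A `text-classification` pipeline returns one result per input text, each with a `label` and a `score`. The sketch below assumes that, once converted to R, the result is a list of named lists. It uses a mocked result with invented scores so it runs without Python, and `result_to_df()` is a hypothetical helper, not a `huggingfaceR` function.

```r
# Mocked pipeline output: a list with one element per input text, each holding
# a `label` and a `score`. The scores below are invented for illustration.
mock_result <- list(
  list(label = "POSITIVE", score = 0.99),
  list(label = "NEGATIVE", score = 0.87)
)

# Hypothetical helper: bind the per-text results into one data frame,
# one row per input text.
result_to_df <- function(result) {
  data.frame(
    label = vapply(result, function(x) x[["label"]], character(1)),
    score = vapply(result, function(x) x[["score"]], numeric(1)),
    stringsAsFactors = FALSE
  )
}

scores <- result_to_df(mock_result)
scores
```

The `vapply()` calls fail loudly if an element lacks a `label` or `score`, which makes an unexpected output shape easy to spot.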

We can use this pipeline in a typical tidyverse workflow. First, we load the `tidyverse`.

```{r}
library(tidyverse)
```

We can use the `hf_load_dataset()` function from `huggingfaceR` to pull in the [emotion](https://huggingface.co/datasets/emotion) Hugging Face dataset. This dataset contains English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. We are interested in how well the DistilBERT model classifies these emotions as either a positive or a negative sentiment.

```{r}
emo <- hf_load_dataset(
  dataset = "emo",
  split = "train",
  as_tibble = TRUE,
  label_name = "int2str"
)

emo_model <- emo %>%
  sample_n(100) %>%
  transmute(
    text,
    emotion_id = label,
    emotion_name = label_name,
    distilBERT_sent = distilBERT(text)
  ) %>%
  unnest_wider(distilBERT_sent)

glimpse(emo_model)
```
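A quick numeric check of how the original labels line up with the predicted sentiment is a cross-tabulation. The sketch below uses base R on a small mock data frame that mimics the `emotion_name`, `label`, and `score` columns of `emo_model`. Every value in it is invented for illustration and is not model output.

```r
# Mock stand-in for `emo_model`; every value here is invented for illustration.
mock_emo_model <- data.frame(
  emotion_name = c("joy", "joy", "love", "anger", "anger", "sadness"),
  label = c("POSITIVE", "POSITIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"),
  score = c(0.99, 0.95, 0.98, 0.97, 0.61, 0.99),
  stringsAsFactors = FALSE
)

# Count how many texts of each emotion fall into each sentiment class.
counts <- table(mock_emo_model$emotion_name, mock_emo_model$label)

# Convert counts to row-wise shares, so each emotion's row sums to 1.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

On the real data, the same two calls can be applied to `emo_model$emotion_name` and `emo_model$label`.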

We can use `ggplot2` to visualize the results.

```{r}
emo_model |>
  mutate(
    label = paste0("Distilbert class:\n", label),
    emotion_name = str_to_title(emotion_name)
  ) |>
  ggplot(aes(x = emotion_name, y = score, color = label)) +
  geom_boxplot(show.legend = FALSE, outlier.alpha = 0.4) +
  scale_color_manual(values = c("#D55E00", "#6699CC")) +
  facet_wrap(~ label) +
  labs(
    title = "Reviewing Distilbert classification predictions",
    x = "Original label",
    y = "Model score",
    caption = "source:\nhttps://huggingface.co/datasets/emo"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0))
  )
```
