https://github.com/farach/huggingfaceR
Hugging Face state-of-the-art models in R
- Host: GitHub
- URL: https://github.com/farach/huggingfaceR
- Owner: farach
- License: other
- Created: 2022-05-24T23:41:21.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-28T12:23:18.000Z (almost 2 years ago)
- Last Synced: 2024-07-28T09:33:22.781Z (4 months ago)
- Topics: huggingface, nlp, r, rstats
- Language: R
- Homepage:
- Size: 2.34 MB
- Stars: 130
- Watchers: 8
- Forks: 16
- Open Issues: 8
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# huggingfaceR
The goal of `huggingfaceR` is to bring state-of-the-art NLP models to R. `huggingfaceR` is built on top of Hugging Face's [transformers](https://huggingface.co/docs/transformers/index) library and supports navigating the [Hugging Face Hub](https://huggingface.co/models).
## Installation
Before installing `huggingfaceR`, make sure your Python environment is set up correctly.
```{r eval = FALSE}
install.packages("reticulate")
library(reticulate)
install_miniconda()
```

If you are having issues, more detailed instructions on how to install and configure Python can be found [here](https://support.rstudio.com/hc/en-us/articles/360023654474-Installing-and-Configuring-Python-with-RStudio).
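If it is unclear whether the Python side is ready, a quick check from R can help. The following is a minimal sketch that uses only `reticulate` functions. It assumes that `transformers` is the Python module needed at run time, which follows from the package description but is not spelled out in this README.

```r
library(reticulate)

# TRUE if reticulate can find and initialize a Python installation.
python_ready <- py_available(initialize = TRUE)

# TRUE if the `transformers` Python module can be imported, FALSE otherwise.
# (Assumption: this is the module the package relies on.)
has_transformers <- py_module_available("transformers")

if (!python_ready) {
  message("No usable Python found; see the instructions linked above.")
} else if (!has_transformers) {
  message("Python found, but the `transformers` module is not installed.")
}
```

If the module turns out to be missing, `reticulate::py_install("transformers")` is one possible way to add it, though the package may take care of its own dependencies.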
After that you can install the development version of huggingfaceR from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("farach/huggingfaceR")
```

## Example
`huggingfaceR` makes use of the `transformers` `pipeline()` abstraction to quickly make pre-trained language models available for use in R. In this example we will load the `distilbert-base-uncased-finetuned-sst-2-english` model and its tokenizer into a pipeline object to obtain sentiment scores.
```{r example}
library(huggingfaceR)

distilBERT <- hf_load_pipeline(
  model_id = "distilbert-base-uncased-finetuned-sst-2-english",
  task = "text-classification"
)

distilBERT
```

With the pipeline now loaded, we can begin using the model.
```{r}
distilBERT("I like you. I love you")
```

We can use this pipeline in a typical tidyverse processing chunk. First we load the `tidyverse`.
```{r}
library(tidyverse)
```

We can use the `huggingfaceR` `hf_load_dataset()` function to pull in the [emotion](https://huggingface.co/datasets/emotion) Hugging Face dataset. This dataset contains English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. We are interested in how well the DistilBERT model classifies these emotions as either a positive or a negative sentiment.
```{r}
emo <- hf_load_dataset(
  dataset = "emo",
  split = "train",
  as_tibble = TRUE,
  label_name = "int2str"
)

emo_model <- emo %>%
  sample_n(100) %>%
  transmute(
    text,
    emotion_id = label,
    emotion_name = label_name,
    distilBERT_sent = distilBERT(text)
  ) %>%
  unnest_wider(distilBERT_sent)

glimpse(emo_model)
```

We can use `ggplot2` to visualize the results.
```{r}
emo_model |>
  mutate(
    label = paste0("Distilbert class:\n", label),
    emotion_name = str_to_title(emotion_name)
  ) |>
  ggplot(aes(x = emotion_name, y = score, color = label)) +
  geom_boxplot(show.legend = FALSE, outlier.alpha = 0.4) +
  scale_color_manual(values = c("#D55E00", "#6699CC")) +
  facet_wrap(~ label) +
  labs(
    title = "Reviewing Distilbert classification predictions",
    x = "Original label",
    y = "Model score",
    caption = "source:\nhttps://huggingface.co/datasets/emo"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0))
  )
```
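
A numeric summary can complement the plot. The sketch below counts how often each original emotion falls into each predicted sentiment class and averages the model score. So that it runs without a Python backend, it uses a small hand-made tibble, `toy_model`, as a stand-in for `emo_model`. The values are invented for illustration only and are not model output. The column names `emotion_name`, `label`, and `score` mirror those used in the plotting code above.

```r
library(dplyr)

# Toy stand-in for `emo_model`; these rows are invented, not real predictions.
toy_model <- tibble::tibble(
  emotion_name = c("joy", "joy", "sadness", "sadness", "anger", "love"),
  label = c("POSITIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"),
  score = c(0.99, 0.95, 0.98, 0.61, 0.97, 0.99)
)

# One row per (original emotion, predicted sentiment) pair,
# with the number of messages and their mean model score.
sentiment_summary <- toy_model %>%
  group_by(emotion_name, label) %>%
  summarise(n = n(), mean_score = mean(score), .groups = "drop")

sentiment_summary
```

Replacing `toy_model` with `emo_model` should give the same kind of table for the sampled messages.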