
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
pkgload::load_all(export_all = FALSE)
```

# vibrrt

[![vibrrt status badge](https://paithiov909.r-universe.dev/badges/vibrrt)](https://paithiov909.r-universe.dev)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/paithiov909/vibrrt/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/paithiov909/vibrrt/actions/workflows/R-CMD-check.yaml)

An R wrapper for [vibrato](https://github.com/daac-tools/vibrato): Viterbi-based accelerated tokenizer.

## Installation

Installing from the source package requires the Rust toolchain.

```r
install.packages("vibrrt", repos = c("https://paithiov909.r-universe.dev", "https://cloud.r-project.org"))
```
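
Before building from source, it can help to confirm that the Rust toolchain is visible to R. The following is a minimal sketch, assuming that a `cargo` executable on the `PATH` is a reasonable proxy for an installed toolchain; it is not part of vibrrt itself.

```r
# Look up `cargo` on the PATH; Sys.which() returns "" when it is not found.
cargo <- Sys.which("cargo")

if (nzchar(cargo)) {
  message("Found cargo at: ", cargo)
} else {
  message("cargo was not found; install the Rust toolchain before building from source.")
}
```

If `cargo` is missing, installing a binary build (where r-universe provides one for your platform) avoids the need to compile.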

## Usage

You can download the model files from [ryan-minato/vibrato-models](https://huggingface.co/ryan-minato/vibrato-models)
using the [hfhub](https://github.com/mlverse/hfhub) package.

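```{r}
# Load the sample text (the `ginga` dataset served from the gibasa r-universe page).
sample_text <- jsonlite::read_json(
  "https://paithiov909.r-universe.dev/gibasa/data/ginga/json",
  simplifyVector = TRUE
)

# Download the IPADIC system dictionary from the Hugging Face Hub.
# Uncomment the `withr` wrapper to use a temporary directory as the cache.
# withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
ipadic <- hfhub::hub_download("ryan-minato/vibrato-models", "ipadic-mecab-2_7_0/system.dic")
# })

# Tokenize elements 5 to 8 with a tagger built from the dictionary.
vibrrt::tokenize(
  sample_text[5:8],
  tagger = vibrrt::create_tagger(ipadic)
)
```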

## Versioning

The version number of this package mirrors that of [vibrato](https://github.com/daac-tools/vibrato):
the first three components give the vibrato version,
and the fourth component (if any) is the patch release of this package.
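
The scheme above can be sketched in a few lines of base R. The helper `split_version()` and the version string `"0.5.1.1"` are hypothetical and only illustrate the layout; neither is exported by vibrrt nor necessarily an actual release.

```r
# Hypothetical helper: split a version string into the vibrato version
# (first three components) and this package's patch release (fourth, if any).
split_version <- function(version) {
  parts <- as.integer(strsplit(as.character(version), ".", fixed = TRUE)[[1]])
  list(
    vibrato = paste(parts[1:3], collapse = "."),
    patch = if (length(parts) >= 4) parts[4] else NA_integer_
  )
}

# Illustrative version string, not necessarily a real release.
split_version("0.5.1.1")
```

With the package installed, the same helper could be applied to `utils::packageVersion("vibrrt")`.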