https://github.com/paithiov909/sudachir2
(Unofficial) R wrapper for 'sudachi.rs' 🦀
- Host: GitHub
- URL: https://github.com/paithiov909/sudachir2
- Owner: paithiov909
- License: apache-2.0
- Created: 2025-02-24T02:44:15.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-04-18T23:46:44.000Z (6 months ago)
- Last Synced: 2025-04-19T09:52:51.845Z (6 months ago)
- Topics: pos-tagging, r, r-package, rust
- Language: R
- Homepage: https://paithiov909.r-universe.dev/sudachir2
- Size: 471 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
pkgload::load_all(export_all = FALSE)
```

# sudachir2
[r-universe](https://paithiov909.r-universe.dev/sudachir2)
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[R-CMD-check](https://github.com/paithiov909/sudachir2/actions/workflows/R-CMD-check.yaml)

sudachir2 is an R wrapper for 'Sudachi', a Japanese morphological analyzer. It is a modern reimagining of
[uribo/sudachir](https://github.com/uribo/sudachir) and [yutannihilation/fledgingr](https://github.com/yutannihilation/fledgingr)
that wraps [sudachi.rs](https://github.com/WorksApplications/sudachi.rs) directly using [savvy](https://github.com/yutannihilation/savvy).

## Installation
Installing from the source package requires the Rust toolchain.
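Before a source install, it may help to confirm that R can see the Rust toolchain. The following is a minimal sketch in base R; it assumes the toolchain is exposed through the usual `cargo` and `rustc` executables on the `PATH`, which is an assumption about your setup rather than something this package documents.

```r
# Sys.which() returns "" for executables that cannot be found on the PATH.
rust_tools <- Sys.which(c("cargo", "rustc"))
has_toolchain <- all(nzchar(rust_tools))

if (has_toolchain) {
  message("Rust toolchain found: ", paste(rust_tools, collapse = ", "))
} else {
  message("cargo and/or rustc not found on the PATH; a source install will likely fail.")
}
```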
```r
install.packages("sudachir2", repos = c("https://paithiov909.r-universe.dev", "https://cloud.r-project.org"))
```

## Usage
To use the package, you first need to download a dictionary.
`sudachir2::fetch_dict()` downloads [SudachiDict](https://github.com/WorksApplications/SudachiDict) for you.

```{r}
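library(sudachir2)

# Expected location of the small system dictionary after download
small_dict <-
  file.path(
    tempdir(),
    "sudachi-dictionary-20250129",
    "system_small.dic"
  )

# Download the dictionary only if it is not already present
if (!file.exists(small_dict)) {
  fetch_dict(tempdir(), dict_version = "20250129", dict_type = "small")
}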
```

After downloading the dictionary, you can create a tagger.
`sudachir2::create_tagger()` returns a function that tokenizes text.

```{r}
my_tagger <- create_tagger(small_dict, mode = "C")
my_tagger("新しい朝が来た")
```

For convenience, `sudachir2::tokenize()` applies a tagger function to either a character vector or a data.frame,
and `sudachir2::prettify()` parses the comma-delimited features of the result.

```{r}
tokenize("新しい朝が来た", tagger = my_tagger)

dat <-
  dplyr::tibble(
    text = c("新しい朝が来た", "希望の朝だ", "喜びに胸を開け", "大空あおげ"),
    doc_id = seq_along(text)
  )

toks <-
  tokenize(
    dat, text, doc_id,
    tagger = my_tagger
  )

str(toks)
prettify(toks) |> str()
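
# Illustration only (base R, independent of sudachir2): what "parsing
# comma-delimited features" means in general. The feature strings and column
# names below are hypothetical placeholders, not real output of the package,
# and this is not how prettify() is implemented.
feature <- c("noun,common,general", "verb,general,*")
parsed <- do.call(rbind, strsplit(feature, ",", fixed = TRUE))
colnames(parsed) <- paste0("feature_", seq_len(ncol(parsed)))
parsed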
```

## Versioning
The version number of this package mirrors that of [sudachi.rs](https://github.com/WorksApplications/sudachi.rs):
the first three components are the sudachi.rs version,
and the fourth component (if any) is the patch release of this package.
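
As a rough illustration of this scheme, the sketch below splits a version string into its upstream part and its package patch part. The helper `split_version()` and the version string `"0.6.10.1"` are hypothetical, made up for this example; neither is part of sudachir2.

```r
# Hypothetical helper: separate the sudachi.rs version (first three
# components) from the package-level patch release (fourth component).
split_version <- function(version) {
  parts <- as.integer(strsplit(as.character(version), ".", fixed = TRUE)[[1]])
  list(
    upstream = paste(parts[1:3], collapse = "."),
    patch = if (length(parts) >= 4) parts[[4]] else NA_integer_
  )
}

# "0.6.10.1" is a made-up version string used only for illustration.
v <- split_version("0.6.10.1")
v$upstream
v$patch
```

With the package installed, the same helper could be applied to `utils::packageVersion("sudachir2")`, since `as.character()` turns that value into a dotted version string.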