
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
pkgload::load_all()
```

# rjavacmecab

[![GitHub last commit](https://img.shields.io/github/last-commit/paithiov909/rjavacmecab)](#)
[![Lifecycle: superseded](https://img.shields.io/badge/lifecycle-superseded-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#superseded)
[![R-CMD-check](https://github.com/paithiov909/rjavacmecab/actions/workflows/check.yml/badge.svg)](https://github.com/paithiov909/rjavacmecab/actions/workflows/check.yml)
[![Codecov test coverage](https://codecov.io/gh/paithiov909/rjavacmecab/branch/main/graph/badge.svg)](https://app.codecov.io/gh/paithiov909/rjavacmecab?branch=main)

> rJava Interface to CMeCab

rjavacmecab is an rJava interface to [takscape/cmecab-java](https://github.com/takscape/cmecab-java), a Java binding for MeCab.

The goal of this package is to provide a simpler way to use 'MeCab' from R than the alternatives ([RMeCab](https://github.com/IshidaMotohiro/RMeCab) and [RcppMeCab](https://github.com/junhewk/RcppMeCab)).

rjavacmecab is slower than those packages, but it should be easier to use because:

1. There is no need to build from C/C++ source.
2. It returns all features of each node that are accessible via cmecab-java.

## System Requirements

rjavacmecab requires 'MeCab' (mecab, libmecab-dev and mecab-ipadic-utf8) and a JDK. Make sure they are installed and available before you use rjavacmecab.

On Windows, the libmecab build must match the bitness of base R and the JDK: a 32-bit R and JDK need a 32-bit build of libmecab, and a 64-bit R and JDK need a 64-bit build.
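
Before loading the package, it can help to confirm these prerequisites from R. The following is a minimal sketch that uses only base R. The helper name `check_mecab_requirements` is hypothetical and not part of rjavacmecab, and the sketch assumes that the `mecab` and `java` executables are expected on `PATH`; an installation that provides only the library files would not be detected this way.

```r
# Hypothetical helper (not part of rjavacmecab): report whether the
# external tools named above can be found, and the bitness of this R build.
check_mecab_requirements <- function() {
  # Sys.which() returns "" for executables that are not on PATH
  found <- Sys.which(c("mecab", "java"))
  list(
    mecab_on_path = nzchar(found[["mecab"]]),
    java_on_path  = nzchar(found[["java"]]),
    # pointer size in bytes times 8 gives 32 or 64
    r_bits        = .Machine$sizeof.pointer * 8L
  )
}

str(check_mecab_requirements())
```

If `r_bits` is 64, the JDK and libmecab should be 64-bit builds as well.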

## Usage

### Installation

``` r
remotes::install_github("paithiov909/rjavacmecab")
```

### Call Tagger

To make the cmecab tagger available, call `rebuild_tagger()` first.

```{r cmecab_1}
rjavacmecab::rebuild_tagger()

res <- rjavacmecab::cmecab(c("長期的自己実現で福楽は得られない", "幸せは刹那の中にあり"))
str(res)
```

### Prettify Output

```{r cmecab_2}
res <- rjavacmecab::prettify(res)
str(res)
```

If you use an IPA-style dictionary, the output has these columns:

- doc_id: sentence number
- token: surface form (表層形)
- POS1~POS4: part of speech (品詞) and its subcategories 1 to 3 (品詞細分類1, 品詞細分類2, 品詞細分類3)
- X5StageUse1: conjugation type (活用型, e.g. 五段, 下二段...)
- X5StageUse2: conjugation form (活用形, e.g. 連用形, 基本形...)
- Original: lemmatised form (原形)
- Yomi1: reading (読み)
- Yomi2: pronunciation (発音)
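
Assuming the prettified result is a data frame with one row per token, ordinary data frame operations apply to it. As a rough sketch, the block below builds a small hand-written stand-in with a few of the columns listed above and keeps only the nouns. The object name `toy` is hypothetical, and its values are illustrative only; they were not produced by MeCab.

```r
# Hand-written stand-in for a prettified result; values are illustrative
# only and were not produced by MeCab.
toy <- data.frame(
  doc_id   = c(1L, 1L, 1L),
  token    = c("幸せ", "は", "刹那"),
  POS1     = c("名詞", "助詞", "名詞"),
  Original = c("幸せ", "は", "刹那"),
  stringsAsFactors = FALSE
)

# Keep noun tokens (POS1 equal to "名詞", i.e. noun)
nouns <- toy[toy$POS1 == "名詞", ]

# Count the remaining tokens per document
noun_counts <- table(nouns$doc_id)

print(nouns$token)
print(noun_counts)
```

The same subsetting pattern should work on any of the other columns, for example filtering on `X5StageUse2` to select a particular conjugation form.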

### Pack Output

```{r cmecab_3}
res <- rjavacmecab::pack(res)
print(res)
```

### Use Igo

[Igo](http://igo.osdn.jp/) is a pure Java port of MeCab. rjavacmecab also provides a wrapper function for it.

```{r igo}
res <- rjavacmecab::igo("お前がそう思うんならそうなんだろう、お前ん中ではな")
str(res)
```

## License

BSD 3-clause License.

This software includes works distributed in the public domain and under the New BSD License.
See https://github.com/takscape/cmecab-java/blob/master/README.txt for more details.

Icons made by [Vectors Market](https://www.flaticon.com/authors/vectors-market) from [Flaticon](https://www.flaticon.com/).