https://github.com/mrchypark/elbird

R binding package for Kiwi (Korean Intelligent Word Identifier)

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(crayon.enabled = NULL)
```

# [elbird](https://mrchypark.github.io/elbird/index.html)

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/mrchypark/elbird/workflows/R-CMD-check/badge.svg)](https://github.com/mrchypark/elbird/actions)
[![CRAN status](https://www.r-pkg.org/badges/version/elbird)](https://cran.r-project.org/package=elbird)
[![runiverse-name](https://mrchypark.r-universe.dev/badges/:name)](https://mrchypark.r-universe.dev/)
[![runiverse-package](https://mrchypark.r-universe.dev/badges/elbird)](https://mrchypark.r-universe.dev/ui#packages)
[![metacran downloads](https://cranlogs.r-pkg.org/badges/elbird)](https://cran.r-project.org/package=elbird)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/elbird)](https://cran.r-project.org/package=elbird)
[![Codecov test coverage](https://codecov.io/gh/mrchypark/elbird/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mrchypark/elbird?branch=main)

* [Korean version README](https://mrchypark.github.io/elbird/articles/README_kr.html)

The `elbird` package is a Korean morpheme analyzer that wraps [Kiwi](https://github.com/bab2min/Kiwi).
`Kiwi` is a C++ library that offers faster performance than other tokenizers, easy addition of user dictionary entries, extraction of unregistered nouns, and more.

### logo

* Wings icons created by Good Ware - Flaticon
* Africa icons created by Eucalyp - Flaticon

## Installation

You can install `elbird` with:

```r
# CRAN
install.packages("elbird")

# Dev version
install.packages('elbird', repos = c('https://mrchypark.r-universe.dev', 'https://cloud.r-project.org'))
```
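For scripts that may run on a machine where `elbird` is not installed yet, a conditional install can be convenient. The following is a minimal sketch in base R; `ensure_package` is a hypothetical helper name, not part of `elbird`, and the default repository URL is an assumption.

```r
# Hypothetical helper (not part of elbird): install a package only when it
# is not already available, then report whether it can be loaded.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call installs nothing.
ensure_package("stats")
```

Calling `ensure_package("elbird")` at the top of a script would then install the CRAN version only when needed.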

## Example

The examples below introduce the behavior of `elbird`'s functions.

### tokenize

The `tokenize` function returns its result as a list, and `tokenize_tbl` returns it organized as a tibble. For compatibility with `tidytext` grammar, the package also provides a `tokenize_tidy` function.

```{r}
library(elbird)
tokenize("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
tokenize_tidy("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
```

To analyze multiple sentences, pass them as a vector or a list; the result is returned as a list.

```{r}
tokenize(c("새롭게 작성된 패키지 입니다.", "tidytext와의 호환을 염두하고 작성하였습니다."))
tokenize_tidy(c("새롭게 작성된 패키지 입니다.", "tidytext와의 호환을 염두하고 작성하였습니다."))
```

### With tidytext

The `tokenize_tidy` function is also available under the names `tokenize_tt` and `tokenize_tidytext`.
Below is an example of using it with the `tidytext` package.
The `tar` object created below holds the target text for morpheme analysis.

```{r}
suppressMessages(library(dplyr))
library(stringr)
library(tidytext)
# install.packages("komment", repos = "https://forkonlp.r-universe.dev/")
library(komment)

# Select the inauguration speech (취임사) of President 이명박 and fetch its text.
speech_list %>%
  filter(president == "이명박") %>%
  filter(str_detect(title, "취임사")) %>%
  pull(link) %>%
  get_speech(paragraph = TRUE) %>%
  select(paragraph, content) -> tar
tar
```

The example below passes `elbird`'s `tokenize_tidy` as the tokenizer to `unnest_tokens`, a function from the `tidytext` package, with `tar` as the input.

```{r}
tar %>%
  unnest_tokens(
    input = content,
    output = word,
    token = tokenize_tidy
  )
```
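When `token` is a function, `unnest_tokens` expects it to take a character vector and return a list of character vectors of the same length, one element per input row. The sketch below illustrates that contract with a stand-in tokenizer so it runs without `elbird`; `whitespace_tokenizer` and `docs` are hypothetical names, and plain whitespace splitting is only an assumption for illustration, not morpheme analysis.

```r
library(tidytext)

# Stand-in tokenizer (assumption: whitespace splitting, not morpheme analysis).
# It takes a character vector and returns a list of character vectors with
# one element per input string, which is what `token` requires.
whitespace_tokenizer <- function(x) {
  strsplit(trimws(x), "\\s+")
}

# Small example input, one sentence per row.
docs <- data.frame(
  id = 1:2,
  content = c(
    "새롭게 작성된 패키지 입니다.",
    "tidytext와의 호환을 염두하고 작성하였습니다."
  ),
  stringsAsFactors = FALSE
)

# Each token becomes its own row in the `word` column.
tokens <- unnest_tokens(
  docs,
  input = content,
  output = word,
  token = whitespace_tokenizer
)
tokens
```

Replacing `whitespace_tokenizer` with `tokenize_tidy` gives the same one-token-per-row shape, with morphemes instead of whitespace-separated words.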

```{r}
library(ggplot2)

# Count each token and plot the top 10 by frequency.
tar %>%
  unnest_tokens(
    input = content,
    output = word,
    token = tokenize_tidy
  ) %>%
  count(word) %>%
  top_n(10, wt = n) %>%
  ggplot(aes(n, word)) +
  geom_col(show.legend = FALSE)
```

### analyze

In addition, the `analyze` function returns multiple candidate analyses together with their scores. The `top_n` argument limits how many candidates are returned.

```{r}
library(elbird)
analyze("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
analyze(c("안녕하세요. kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다."), top_n = 1)
```

## tag set

The [tag list](https://github.com/bab2min/kiwipiepy#%ED%92%88%EC%82%AC-%ED%83%9C%EA%B7%B8) is the one used in the [kiwipiepy](https://github.com/bab2min/kiwipiepy) package.

```{r echo=FALSE, results='asis'}
cat(paste0("* The table below was fetched at ", Sys.time(), " ", Sys.timezone(), "."))
```

```{r echo=FALSE}
httr::GET("https://github.com/bab2min/kiwipiepy/blob/master/README.md") %>%
  httr::content() %>%
  rvest::html_table() %>%
  knitr::kable(format = "markdown")
```

## Special Thanks to

### kiwi package
[bab2min](https://github.com/bab2min), author of the [kiwi package](https://github.com/bab2min/Kiwi).

### logo
[jhk0530](https://github.com/jhk0530), for the [suggestion](https://github.com/mrchypark/elbird/issues/6).

### cpp backend
[kkweon](https://github.com/kkweon), for the [kiwigo package](https://github.com/codingpot/kiwigo).