https://github.com/mrchypark/elbird
R binding package Kiwi(Korean Intelligent Word Identifier)
- Host: GitHub
- URL: https://github.com/mrchypark/elbird
- Owner: mrchypark
- License: other
- Created: 2020-06-27T15:22:41.000Z (almost 6 years ago)
- Default Branch: main
- Last Pushed: 2023-06-20T03:38:30.000Z (about 3 years ago)
- Last Synced: 2024-04-14T07:11:03.551Z (about 2 years ago)
- Topics: analyzer, hacktoberfest, hacktoberfest2021, morphological, r, r-package, rstats
- Language: R
- Homepage: https://mrchypark.github.io/elbird/index.html
- Size: 3.28 MB
- Stars: 32
- Watchers: 4
- Forks: 3
- Open Issues: 20
Metadata Files:
- Readme: README.Rmd
- Funding: .github/FUNDING.yml
- License: LICENSE.md
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(crayon.enabled = NULL)
```
# elbird

[Documentation](https://mrchypark.github.io/elbird/index.html) | [Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental) | [GitHub Actions](https://github.com/mrchypark/elbird/actions) | [CRAN](https://cran.r-project.org/package=elbird) | [r-universe](https://mrchypark.r-universe.dev/) | [Codecov](https://app.codecov.io/gh/mrchypark/elbird?branch=main)
* [Korean version README](https://mrchypark.github.io/elbird/articles/README_kr.html)
The `elbird` package is a Korean morpheme analyzer that wraps [Kiwi](https://github.com/bab2min/Kiwi).
It is built on the C++ library `Kiwi`, which offers faster performance than other tokenizers, easy user dictionary additions, extraction of unregistered nouns, and more.
### logo
Wings icons created by Good Ware - Flaticon
Africa icons created by Eucalyp - Flaticon
## Installation
You can install `elbird` with:
```r
# CRAN
install.packages("elbird")
# Dev version
install.packages('elbird', repos = c('https://mrchypark.r-universe.dev', 'https://cloud.r-project.org'))
```
## Example
The examples below introduce the behavior of `elbird`'s functions.
### tokenize
The `tokenize` function returns its result as a list, `tokenize_tbl` returns it as a tibble, and `tokenize_tidy` returns output that is compatible with the tidytext grammar.
```{r}
library(elbird)
tokenize("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
tokenize_tidy("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
```
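`tokenize_tbl` is mentioned above but not demonstrated. The following is a minimal sketch, assuming `tokenize_tbl` accepts the same character input as `tokenize`. It inspects the result instead of assuming a particular column layout, since the exact columns may differ between package versions.

```r
library(elbird)

text <- "안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다."

# tokenize_tbl() is described above as returning a tibble.
# Check the class and column names rather than relying on a fixed layout.
res <- tokenize_tbl(text)
class(res)
names(res)
```

A tibble result can be passed straight to `dplyr` verbs such as `filter()` or `count()`.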
Multiple sentences can be passed as a `vector` or `list`; the result is returned as a `list`.
```{r}
tokenize(c("새롭게 작성된 패키지 입니다.", "tidytext와의 호환을 염두하고 작성하였습니다."))
tokenize_tidy(c("새롭게 작성된 패키지 입니다.", "tidytext와의 호환을 염두하고 작성하였습니다."))
```
### With tidytext
The `tokenize_tidy` function is also available under the aliases `tokenize_tt` and `tokenize_tidytext`.
Below is an example of using it with the `tidytext` package.
The `tar` below is the target text for morpheme analysis.
```{r}
# install.packages("komment", repos = "https://forkonlp.r-universe.dev/")
suppressMessages(library(dplyr))
library(stringr)
library(tidytext)
library(komment)

# Fetch the inaugural address ("취임사") of president 이명박, split into paragraphs.
tar <- speech_list %>%
  filter(president == "이명박") %>%
  filter(str_detect(title, "취임사")) %>%
  pull(link) %>%
  get_speech(paragraph = TRUE) %>%
  select(paragraph, content)
tar
```
The example below passes `tar` to `unnest_tokens()` from the `tidytext` package, using `elbird`'s `tokenize_tidy` as the tokenizer.
```{r}
tar %>%
  unnest_tokens(
    input = content,
    output = word,
    token = tokenize_tidy
  )
```
```{r}
library(ggplot2)
tar %>%
  unnest_tokens(
    input = content,
    output = word,
    token = tokenize_tidy
  ) %>%
  count(word) %>%
  top_n(10) %>%
  ggplot(aes(n, word)) +
  geom_col(show.legend = FALSE)
```
### analyze
In addition, the `analyze` function returns multiple candidate analyses together with their scores.
```{r}
library(elbird)
analyze("안녕하세요 kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다.")
analyze(c("안녕하세요. kiwi 형태소 분석기의 R wrapper인 elbird를 소개합니다."), top_n = 1)
```
## tag set
The [tag list](https://github.com/bab2min/kiwipiepy#%ED%92%88%EC%82%AC-%ED%83%9C%EA%B7%B8) is the one used in the [kiwipiepy](https://github.com/bab2min/kiwipiepy) package.
```{r echo=FALSE, results='asis'}
cat(paste0("* The table below was fetched at ", Sys.time(), " ", Sys.timezone(), "."))
```
```{r echo=FALSE}
httr::GET("https://github.com/bab2min/kiwipiepy/blob/master/README.md") %>%
httr::content() %>%
rvest::html_table() %>%
knitr::kable(format = "markdown")
```
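As an illustration of how the tags can be used, here is a minimal base R sketch that keeps only nouns. It assumes tokens arrive as `form/tag` strings; the `tokens` vector is hypothetical sample data written by hand, not actual `elbird` output. Tags are upper-cased before matching because `unnest_tokens()` lower-cases its output by default.

```r
# Hypothetical sample tokens in "form/tag" form (not real elbird output).
tokens <- c("안녕/NNG", "하/XSA", "세요/EF", "kiwi/SL", "형태소/NNG", "분석기/NNG")

# Split each token at the last "/" into its surface form and its tag.
form <- sub("/[^/]*$", "", tokens)
tag <- toupper(sub("^.*/", "", tokens))

# Keep general nouns (NNG) and proper nouns (NNP) only.
nouns <- form[tag %in% c("NNG", "NNP")]
nouns
```

The same pattern works for any other group of tags from the table by changing the vector passed to `%in%`.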
## Special Thanks to
### kiwi package
[bab2min](https://github.com/bab2min), author of the [kiwi package](https://github.com/bab2min/Kiwi).
### logo
[jhk0530](https://github.com/jhk0530), for the logo [suggestion](https://github.com/mrchypark/elbird/issues/6).
### cpp backend
[kkweon](https://github.com/kkweon), for the [kiwigo package](https://github.com/codingpot/kiwigo).