An open API service indexing awesome lists of open source software.

https://github.com/paithiov909/commatatest


https://github.com/paithiov909/commatatest

Last synced: 6 months ago
JSON representation

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# commatatest

This R package is proof of concept for rewriting `prettify` function using [commata](https://github.com/furfurylic/commata).

Currently, `prettify` uses `readr::read_delim` function of which parsing in-memory data is limited (https://github.com/tidyverse/vroom/issues/460).

The following simple benchmark shows that `prettify2` (wrapping commata) is faster than `prettify`, even though it is not multithreaded.

Since `prettify` provides no way to guess the column type, using the readr package may be overkill.

```{r example}
library(commatatest)

vec <- sample(gibasa::ginga, size = 5 * 1e3, replace = TRUE)

df <- gibasa::tokenize(
data.frame(
doc_id = seq_along(vec),
text = vec
)
)
nrow(df)

microbenchmark::microbenchmark(
current = commatatest::prettify(df, col_select = c("POS1", "Yomi1")),
current_lim_threads = withr::with_options(list(readr.num_threads = 1), commatatest::prettify(df, col_select = c("POS1", "Yomi1"))),
commata = commatatest::prettify2(df, col_select = c("POS1", "Yomi1")),
check = "equal",
times = 10
)
```