https://github.com/paithiov909/commatatest
https://github.com/paithiov909/commatatest
Last synced: 6 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/paithiov909/commatatest
- Owner: paithiov909
- License: apache-2.0
- Created: 2024-03-28T11:39:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-25T12:36:58.000Z (over 1 year ago)
- Last Synced: 2024-04-25T13:46:37.797Z (over 1 year ago)
- Language: C++
- Homepage:
- Size: 78.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# commatatest
This R package is proof of concept for rewriting `prettify` function using [commata](https://github.com/furfurylic/commata).
Currently, `prettify` uses `readr::read_delim` function of which parsing in-memory data is limited (https://github.com/tidyverse/vroom/issues/460).
The following simple benchmark shows that `prettify2` (wrapping commata) is faster than `prettify`, even though it is not multithreaded.
Since `prettify` provides no way to guess the column type, using the readr package may be overkill.
```{r example}
library(commatatest)vec <- sample(gibasa::ginga, size = 5 * 1e3, replace = TRUE)
df <- gibasa::tokenize(
data.frame(
doc_id = seq_along(vec),
text = vec
)
)
nrow(df)microbenchmark::microbenchmark(
current = commatatest::prettify(df, col_select = c("POS1", "Yomi1")),
current_lim_threads = withr::with_options(list(readr.num_threads = 1), commatatest::prettify(df, col_select = c("POS1", "Yomi1"))),
commata = commatatest::prettify2(df, col_select = c("POS1", "Yomi1")),
check = "equal",
times = 10
)
```