https://github.com/paithiov909/commatatest


Host: GitHub
URL: https://github.com/paithiov909/commatatest
Owner: paithiov909
License: apache-2.0
Created: 2024-03-28T11:39:35.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-04-25T12:36:58.000Z (over 1 year ago)
Last Synced: 2024-04-25T13:46:37.797Z (over 1 year ago)
Language: C++
Homepage:
Size: 78.1 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# commatatest

This R package is a proof of concept for rewriting the `prettify` function using [commata](https://github.com/furfurylic/commata).

Currently, `prettify` uses the `readr::read_delim` function, whose support for parsing in-memory data is limited (see https://github.com/tidyverse/vroom/issues/460).
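
To make "in-memory data" concrete, here is a minimal sketch of handing `readr::read_delim` a string instead of a file path, assuming readr 2.0 or later, where literal data is wrapped in `I()`. The feature strings and column names below are made up for illustration and are not taken from gibasa output, and this is not necessarily how `prettify` calls readr internally.

```{r literal-data-sketch}
library(readr)

# Hypothetical comma-separated feature strings (illustrative only).
features <- c("noun,common,*,*", "verb,independent,*,*")

# Join them into a single in-memory payload, one record per line.
payload <- paste0(paste(features, collapse = "\n"), "\n")

# I() marks the string as literal data rather than a file path.
# Every column is read as character, so no type guessing takes place.
parsed <- read_delim(
  I(payload),
  delim = ",",
  col_names = c("POS1", "POS2", "POS3", "POS4"),
  col_types = cols(.default = col_character()),
  progress = FALSE
)

parsed
```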

The following simple benchmark shows that `prettify2`, which wraps commata, is faster than `prettify`, even though `prettify2` is not multithreaded.

Since `prettify` provides no way to guess column types, using the readr package may be overkill.

```{r example}
library(commatatest)

vec <- sample(gibasa::ginga, size = 5 * 1e3, replace = TRUE)
df <- gibasa::tokenize(
  data.frame(
    doc_id = seq_along(vec),
    text = vec
  )
)
nrow(df)

microbenchmark::microbenchmark(
  current = commatatest::prettify(df, col_select = c("POS1", "Yomi1")),
  current_lim_threads = withr::with_options(
    list(readr.num_threads = 1),
    commatatest::prettify(df, col_select = c("POS1", "Yomi1"))
  ),
  commata = commatatest::prettify2(df, col_select = c("POS1", "Yomi1")),
  check = "equal",
  times = 10
)
```
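
As a rough illustration of the remark above that no column type guessing is needed, the sketch below splits comma-separated feature strings into character columns with base R alone. The input strings and column names are hypothetical, and this is neither what `prettify` nor what `prettify2` does internally. In particular, plain `strsplit` does not handle quoted fields that contain commas, which is one reason to use a real CSV parser such as commata.

```{r split-sketch}
# Hypothetical comma-separated feature strings (illustrative only).
features <- c("noun,common,*,*", "verb,independent,*,*")

# Split each string on literal commas; every field stays a character value.
parts <- strsplit(features, ",", fixed = TRUE)

# Bind the equal-length pieces row-wise and name the columns.
mat <- do.call(rbind, parts)
colnames(mat) <- c("POS1", "POS2", "POS3", "POS4")
out <- as.data.frame(mat, stringsAsFactors = FALSE)

# Keep only the columns of interest, similar in spirit to `col_select`.
out[c("POS1", "POS2")]
```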
