https://github.com/vgherard/gsample

Efficient weighted sampling without replacement in R

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# gsample

`gsample` is a drop-in replacement for R's `base::sample()` and
`base::sample.int()`, with considerably better speed and memory use for
weighted sampling without replacement.
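This README does not say which algorithm `gsample` uses. As a rough illustration of how weighted sampling without replacement can be done in a single pass over the weights, here is a minimal base R sketch of the exponential-race technique (equivalent to the Gumbel top-k trick). `race_sample_int()` is a hypothetical helper introduced for this example; it is not part of the `gsample` API and it assumes strictly positive weights.

```r
# Hypothetical helper for illustration only; not part of gsample.
race_sample_int <- function(n, size, prob) {
  stopifnot(length(prob) == n, size <= n, all(prob > 0))
  # Item i gets an arrival time distributed as Exp(rate = prob[i]).
  arrival <- rexp(n) / prob
  # The `size` earliest arrivals form a weighted sample without replacement,
  # listed in draw order.
  order(arrival)[seq_len(size)]
}

set.seed(1)
race_sample_int(10, 3, prob = 10:1)
```

The sketch sorts all `n` arrival times for simplicity; a partial sort that only finds the `size` smallest values would be enough.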

## Installation

You can install `gsample` from [GitHub](https://github.com/vgherard/gsample)
with:

``` r
# install.packages("devtools")
devtools::install_github("vgherard/gsample")
```

## Example

The `gsample` API is identical to that of `base::sample()`:

```{r}
library(gsample)
n <- 1e3
size <- 1e2
prob <- exp(rnorm(n, sd = 3))
gsample.int(n, size, prob = prob)
```

```{r}
x <- letters
size <- 10
prob <- exp(rnorm(length(letters), sd = 3))
gsample(x, size, prob = prob)
```
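Because the call signature mirrors base R, a script could fall back to `sample.int()` when `gsample` is not installed. The following is a sketch under that assumption: `sample_int` is a name introduced here for illustration, and only the `n`, `size`, and `prob` arguments shown in the examples above are used.

```r
# Pick gsample's sampler when available, otherwise base R's.
sample_int <- if (requireNamespace("gsample", quietly = TRUE)) {
  gsample::gsample.int
} else {
  base::sample.int
}

n <- 1e3
size <- 1e2
prob <- exp(rnorm(n, sd = 3))
idx <- sample_int(n, size, prob = prob)
length(idx)
```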

Here are some simple benchmark comparisons with `base::sample()`:

```{r, message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2)
set.seed(840)
n <- 1e6
prob <- rexp(n)

# Time both samplers for sample sizes 10, 100, ..., 1e5.
bm <- lapply(10 ^ (1:5), function(size) {
  bench::mark(
    gsample.int(n, size, prob = prob),
    sample.int(n, size, prob = prob),
    check = FALSE # outputs are random, so results are not compared
  ) %>%
    select(expression, median) %>%
    mutate(expression = as.character(expression)) %>%
    mutate(size = size)
})

# Plot median run time against sample size on log-log axes.
bind_rows(bm) %>%
  ggplot(aes(x = size, y = 1e3 * median, colour = expression)) +
  geom_line() +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous("median (ms)", trans = "log10")
```
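The benchmark passes `check = FALSE`, presumably because the two samplers return different random draws that cannot be compared element by element. One way to sanity-check a sampler instead is to compare the empirical distribution of its first draw with the normalized weights. The sketch below does this for `base::sample.int()`. `first_draw_freq()` is a hypothetical helper, and applying it to `gsample.int()` would assume that function returns items in draw order, which this README does not state.

```r
# Hypothetical helper: frequency of each item appearing as the first draw.
first_draw_freq <- function(sampler, n, size, prob, reps = 2e4) {
  first <- replicate(reps, sampler(n, size, prob = prob)[1])
  tabulate(first, nbins = n) / reps
}

set.seed(840)
prob <- c(5, 4, 3, 2, 1)
observed <- first_draw_freq(sample.int, n = 5, size = 2, prob = prob)
expected <- prob / sum(prob)
# The two rows should agree up to Monte Carlo noise.
round(rbind(expected, observed), 3)
```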