https://github.com/vgherard/gsample
Efficient weighted sampling without replacement in R
- Host: GitHub
- URL: https://github.com/vgherard/gsample
- Owner: vgherard
- Created: 2021-01-15T21:07:42.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-01-20T20:40:30.000Z (about 5 years ago)
- Last Synced: 2025-02-06T11:53:24.136Z (about 1 year ago)
- Topics: random-sampling
- Language: R
- Homepage:
- Size: 99.6 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# gsample
`gsample` offers drop-in replacements for R's `base::sample()` and
`base::sample.int()` random sampling functions, with considerably better
performance, in both speed and memory use, for weighted sampling without
replacement.
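To make "weighted sampling without replacement" concrete, here is a minimal base R sketch of one well-known technique for it: give each item an exponential key with rate equal to its weight and keep the items with the smallest keys. This is only an illustration; it is not taken from the `gsample` sources and may differ from how the package is implemented. The helper name `weighted_sample_sketch()` is hypothetical.

```r
# Hypothetical helper for illustration only; not part of gsample or base R.
# Draws `size` distinct indices from 1..n, favouring indices with larger `prob`.
weighted_sample_sketch <- function(n, size, prob) {
  stopifnot(length(prob) == n, size <= n, all(prob > 0))
  # rexp(n) / prob gives one Exponential(rate = prob[i]) key per item.
  keys <- rexp(n) / prob
  # The items with the smallest keys form the sample, in draw order.
  head(order(keys), size)
}

set.seed(1)
n <- 1e3
prob <- exp(rnorm(n, sd = 3))
idx <- weighted_sample_sketch(n, size = 1e2, prob = prob)
length(idx)
```

The weights do not need to sum to one, since only their ratios matter. The sketch sorts all `n` keys, so it is meant to show the idea rather than to be fast.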
## Installation
You can install `gsample` from [GitHub](https://github.com/vgherard/gsample)
with:
``` r
# install.packages("devtools")
devtools::install_github("vgherard/gsample")
```
## Example
The `gsample` API is identical to that of `base::sample()`:
```{r}
library(gsample)
n <- 1e3
size <- 1e2
prob <- exp(rnorm(n, sd = 3))
gsample.int(n, size, prob = prob)
```
```{r}
x <- letters
size <- 10
prob <- exp(rnorm(length(x), sd = 3))
gsample(x, size, prob = prob)
```
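For comparison, the same call written against base R is sketched below. Assuming the drop-in claim above, switching between the two should only require changing the function name. The checks list properties that any weighted draw without replacement has to satisfy; they do not compare the actual values returned by the two packages.

```r
set.seed(840)
x <- letters
size <- 10
prob <- exp(rnorm(length(x), sd = 3))

# Base R call; `replace` defaults to FALSE, so the draws are distinct.
base_draw <- sample(x, size, prob = prob)

# Properties of a valid draw without replacement.
stopifnot(
  length(base_draw) == size,       # exactly `size` elements
  anyDuplicated(base_draw) == 0,   # no element drawn twice
  all(base_draw %in% x)            # every element comes from `x`
)
```

As in the examples above, `prob` is not normalised here; `base::sample()` accepts unnormalised weights.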
Here is a simple benchmark against `base::sample.int()`, drawing samples of
increasing size (from 10 to 100,000) from a population of one million items:
```{r, message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2)
set.seed(840)
n <- 1e6
prob <- rexp(n)
# Time both samplers for sample sizes 10, 100, ..., 1e5.
bm <- lapply(10 ^ (1:5), function(size) {
  bench::mark(
    gsample.int(n, size, prob = prob),
    sample.int(n, size, prob = prob),
    check = FALSE  # outputs are random, so the results cannot be compared
  ) %>%
    select(expression, median) %>%
    mutate(expression = as.character(expression)) %>%
    mutate(size = size)
})
# Plot median run time against sample size on log-log axes.
bind_rows(bm) %>%
  ggplot(aes(x = size, y = 1e3 * median, colour = expression)) +
  geom_line() +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous("median (ms)", trans = "log10")
```
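Because the benchmark runs with `check = FALSE`, it measures timing only and says nothing about whether two samplers follow the same distribution. A rough sanity check, sketched here in base R, compares the empirical frequency of a single weighted draw with the normalised weights. The helper `first_draw_freq()` is hypothetical; it is shown with `sample.int()`, and it assumes that any sampler passed in accepts the same `(n, size, prob = )` arguments.

```r
# Hypothetical helper for illustration only.
# Estimates how often each index is returned by a single weighted draw.
first_draw_freq <- function(sampler, prob, reps = 20000) {
  n <- length(prob)
  draws <- vapply(
    seq_len(reps),
    function(i) sampler(n, 1L, prob = prob),
    integer(1)
  )
  tabulate(draws, nbins = n) / reps
}

set.seed(840)
prob <- c(0.5, 0.3, 0.15, 0.05)
freq <- first_draw_freq(sample.int, prob)

# Largest deviation from the target probabilities; should be close to zero.
max_dev <- max(abs(freq - prob / sum(prob)))
max_dev
```

A small deviation is consistent with correct weighting of the first draw, but it does not test later draws or the joint distribution of the sample.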