An open API service indexing awesome lists of open source software.

https://github.com/mightymetrika/crtb

Complementary Resampling of Tags in Blocks
https://github.com/mightymetrika/crtb

data-science machine-learning sampling statistics

Last synced: about 1 year ago
JSON representation

Complementary Resampling of Tags in Blocks

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# Complementary Resampling of Tags in Blocks (CRTB)

The crtb resampling method is inspired by complementary pairs subsampling (Shah & Samworth, 2013). The method creates pairs of resampled datasets with complementary properties.

## Installation

You can install the development version of crtb from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("mightymetrika/crtb")
```

## Implementation Details

When working with multiple groups and pooled resampling, CRTB follows these steps:

1. **Tag Assignment**
- Each observation receives a unique integer tag
- For multiple groups, tagging can be done row-wise or column-wise

2. **Initial Resampling**
- Tags are resampled using one of three methods
- With replacement (default)
- Without replacement
- Custom resampling function
- This creates the "original resample"
- Process halts if the proportion of resamples falls below the tie threshold

3. **Block Creation**
- Block length is set to half the length of initial tags
- First block:
- Form initial block stem from unique tags in original sample
- If block stem is undersized:
1. Take the set difference between all tags and block stem
2. Sample without replacement to fill block to target size
- Subsequent blocks:
- Form new block stem from unique remaining tags
- If block stem is undersized:
1. Take the set difference between all tags and block stem
2. Sample without replacement to fill block to target size
- Continue until all tags from original sample are assigned to blocks

4. **Complementary Sampling**
- For each block:
- Find complement (all tags not in block)
- Sample from complement to match block stem size
- Combined complementary samples form the "complementary resample"

5. **Output Generation**
- Map tags back to original observations
- Return two datasets:
- Original resample
- Complementary resample

```{r example}
library(crtb)

# Create sample data
data <- data.frame(
group1 = stats::rnorm(10),
group2 = stats::rnorm(10)
)

# Basic usage with default settings
result <- crtb(data)

# Access results
result$ordat
result$crdat
```

Shah, R. D., & Samworth, R. J. (2013). Variable Selection with Error Control: Another Look at Stability Selection. Journal of the Royal Statistical Society: Series B, 75(1), 55-80.