https://github.com/mightymetrika/crtb
Complementary Resampling of Tags in Blocks
- Host: GitHub
- URL: https://github.com/mightymetrika/crtb
- Owner: mightymetrika
- License: other
- Created: 2024-09-29T15:02:36.000Z
- Default Branch: master
- Last Pushed: 2024-12-05T21:05:41.000Z
- Last Synced: 2025-02-02T00:39:01.552Z
- Topics: data-science, machine-learning, sampling, statistics
- Language: R
- Homepage:
- Size: 178 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# Complementary Resampling of Tags in Blocks (CRTB)
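The crtb resampling method is inspired by complementary pairs subsampling (Shah & Samworth, 2013). It creates a pair of resampled datasets: an "original resample" of tagged observations and a "complementary resample" drawn, block by block, from the tags that fall outside each block of the original resample.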
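For context, complementary pairs subsampling as described by Shah & Samworth (2013) repeatedly draws two disjoint subsets, each of size floor(n / 2), from the n observations. The base-R sketch below illustrates only that reference idea, not crtb itself; the function name `complementary_pairs` is hypothetical.

```r
# Hypothetical helper (not part of crtb): complementary pairs subsampling.
# Each pair holds two disjoint index sets of size floor(n / 2); when n is odd,
# one index is left out of that pair.
complementary_pairs <- function(n, B) {
  half <- n %/% 2
  lapply(seq_len(B), function(b) {
    perm <- sample.int(n)                   # random permutation of 1..n
    list(
      first  = perm[seq_len(half)],         # first half of the permutation
      second = perm[half + seq_len(half)]   # disjoint second half
    )
  })
}

set.seed(1)
pairs <- complementary_pairs(n = 10, B = 3)
str(pairs[[1]])
```

crtb departs from this scheme by resampling tags (with replacement by default) and building the complement block by block rather than splitting the data into two halves once.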
## Installation
You can install the development version of crtb from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("mightymetrika/crtb")
```
## Implementation Details
When working with multiple groups and pooled resampling, CRTB follows these steps:
1. **Tag Assignment**
   - Each observation receives a unique integer tag.
   - For multiple groups, tagging can be done row-wise or column-wise.
2. **Initial Resampling**
   - Tags are resampled using one of three methods:
     - With replacement (default)
     - Without replacement
     - A custom resampling function
   - This creates the "original resample".
   - The process halts if the proportion of resamples falls below the tie threshold.
3. **Block Creation**
   - The block length is set to half the number of initial tags.
   - First block:
     - Form the initial block stem from the unique tags in the original resample.
     - If the block stem is shorter than the block length:
       1. Take the set difference between all tags and the block stem.
       2. Sample from that difference without replacement to fill the block to the target size.
   - Subsequent blocks:
     - Form a new block stem from the unique remaining tags.
     - If the block stem is shorter than the block length:
       1. Take the set difference between all tags and the block stem.
       2. Sample from that difference without replacement to fill the block to the target size.
   - Continue until all tags from the original resample are assigned to blocks.
4. **Complementary Sampling**
   - For each block:
     - Find its complement (all tags not in the block).
     - Sample from the complement to match the block stem size.
   - The combined complementary samples form the "complementary resample".
5. **Output Generation**
   - Map tags back to the original observations.
   - Return two datasets:
     - Original resample
     - Complementary resample
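As an illustration of tag assignment (step 1) and mapping tags back to observations (step 5), here is a minimal sketch. It assumes that "row-wise" and "column-wise" refer to the order in which cells of a data frame with one column per group are numbered; `tag_cells` and `lookup_tags` are hypothetical helpers, not crtb functions.

```r
# Hypothetical helper (not part of crtb): give every cell of a data frame
# whose columns are groups a unique integer tag.
tag_cells <- function(data, by = c("column", "row")) {
  by <- match.arg(by)
  tags <- seq_len(nrow(data) * ncol(data))
  # byrow = TRUE numbers cells across each row before moving down;
  # byrow = FALSE numbers down each column (group) before moving right.
  matrix(tags, nrow = nrow(data), ncol = ncol(data), byrow = (by == "row"),
         dimnames = list(NULL, names(data)))
}

# Hypothetical helper: map drawn tags back to the observed values.
# match() returns column-major positions in the tag matrix, which line up
# with column-major positions in as.matrix(data).
lookup_tags <- function(data, tag_mat, tags) {
  as.matrix(data)[match(tags, tag_mat)]
}

data <- data.frame(group1 = c(1.2, 0.4, -0.7), group2 = c(2.1, 1.8, 0.3))
col_tags <- tag_cells(data, by = "column")
row_tags <- tag_cells(data, by = "row")
lookup_tags(data, row_tags, c(2L, 5L))
```

Because the lookup goes through `match()`, the same function works for either tagging order, and repeated tags in a resample simply return the same observation more than once.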
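Block creation and complementary sampling (steps 3 and 4) can be sketched in base R as follows. This is not crtb's internal code, and several details are assumptions: the block length is half the number of tags rounded down, each block stem comes from a consecutive chunk of the original resample, and complement tags are drawn without replacement. The names `draw`, `make_blocks`, and `complement_draws` are hypothetical.

```r
# Draw `size` elements of x without replacement; indexing via sample.int()
# avoids sample()'s special case when x has length one.
draw <- function(x, size) x[sample.int(length(x), size)]

make_blocks <- function(resample, all_tags) {
  block_len <- length(all_tags) %/% 2  # assumed: half, rounded down
  # Assumed: consecutive chunks of the original resample seed the blocks.
  chunks <- split(resample, ceiling(seq_along(resample) / block_len))
  lapply(chunks, function(chunk) {
    stem <- unique(chunk)
    # Pad an undersized stem with tags drawn from outside it.
    fill <- draw(setdiff(all_tags, stem), block_len - length(stem))
    list(stem = stem, block = c(stem, fill))
  })
}

complement_draws <- function(blocks, all_tags) {
  lapply(blocks, function(b) {
    complement <- setdiff(all_tags, b$block)  # all tags not in the block
    draw(complement, length(b$stem))          # assumed: without replacement
  })
}

set.seed(42)
all_tags <- 1:10
# Step 2: original resample of tags, drawn with replacement.
original <- all_tags[sample.int(length(all_tags), replace = TRUE)]
blocks <- make_blocks(original, all_tags)
draws <- complement_draws(blocks, all_tags)
complementary <- unlist(draws, use.names = FALSE)
```

In this sketch the complementary resample contributes one tag per stem tag, so it is shorter than the original resample whenever a chunk contains repeated tags.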
```{r example}
library(crtb)
# Create sample data (seed set for reproducibility)
set.seed(123)
data <- data.frame(
  group1 = stats::rnorm(10),
  group2 = stats::rnorm(10)
)
# Basic usage with default settings
result <- crtb(data)
# Access results
result$ordat  # original resample
result$crdat  # complementary resample
```
## References
Shah, R. D., & Samworth, R. J. (2013). Variable Selection with Error Control: Another Look at Stability Selection. *Journal of the Royal Statistical Society: Series B*, 75(1), 55-80.