https://github.com/mightymetrika/crtb

Complementary Resampling of Tags in Blocks
data-science machine-learning sampling statistics

Host: GitHub
URL: https://github.com/mightymetrika/crtb
Owner: mightymetrika
License: other
Created: 2024-09-29T15:02:36.000Z (almost 2 years ago)
Default Branch: master
Last Pushed: 2024-12-05T21:05:41.000Z (over 1 year ago)
Last Synced: 2025-02-02T00:39:01.552Z (over 1 year ago)
Topics: data-science, machine-learning, sampling, statistics
Language: R
Homepage:
Size: 178 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Complementary Resampling of Tags in Blocks (CRTB)

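Complementary Resampling of Tags in Blocks (CRTB) is a resampling method inspired by complementary pairs subsampling (Shah & Samworth, 2013). It produces a pair of resampled datasets: an original resample and a complementary resample built by sampling, for each block of tags, from the tags outside that block.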
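
As a point of reference, the complementary pairs idea itself can be sketched in a few lines of base R. This illustrates the subsampling scheme described by Shah & Samworth, not code from the crtb package; the names `n`, `half`, `first`, `rest`, and `second` are made up for the example.

``` r
set.seed(42)

n <- 10                 # number of observations (illustrative)
half <- floor(n / 2)    # each subsample holds floor(n / 2) indices

# Draw the first subsample without replacement
first <- sample.int(n, half)

# Draw its partner from the indices the first subsample did not use,
# so the two subsamples never share an observation
rest <- setdiff(seq_len(n), first)
second <- rest[sample.int(length(rest), half)]
```

The two index sets are disjoint by construction, which is the sense in which the subsamples are complementary.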

## Installation

You can install the development version of crtb from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("mightymetrika/crtb")
```

## Implementation Details

When working with multiple groups and pooled resampling, CRTB follows these steps:

1. **Tag Assignment**
   - Each observation receives a unique integer tag
   - For multiple groups, tagging can be done row-wise or column-wise

2. **Initial Resampling**
   - Tags are resampled using one of three methods:
     - With replacement (default)
     - Without replacement
     - Custom resampling function
   - This creates the "original resample"
   - Process halts if the proportion of resamples falls below the tie threshold

3. **Block Creation**
   - Block length is set to half the number of initial tags
   - First block:
     - Form initial block stem from unique tags in the original resample
     - If block stem is undersized:
       1. Take the set difference between all tags and block stem
       2. Sample without replacement to fill block to target size
   - Subsequent blocks:
     - Form new block stem from unique remaining tags
     - If block stem is undersized:
       1. Take the set difference between all tags and block stem
       2. Sample without replacement to fill block to target size
   - Continue until all tags from the original resample are assigned to blocks

4. **Complementary Sampling**
   - For each block:
     - Find complement (all tags not in block)
     - Sample from complement to match block stem size
   - Combined complementary samples form the "complementary resample"

5. **Output Generation**
   - Map tags back to original observations
   - Return two datasets:
     - Original resample
     - Complementary resample
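
One possible reading of the steps above can be sketched in base R. This is not the package's source code: every variable name is hypothetical, the tie-threshold check is omitted, tags are assigned column-wise, and the complement is assumed to be sampled without replacement. Under this reading the complementary resample contains one draw per unique tag in the original resample, which may differ from what `crtb()` actually returns.

``` r
set.seed(1)

# Toy data: two groups of five observations, pooled for resampling
dat <- data.frame(group1 = rnorm(5), group2 = rnorm(5))
n <- nrow(dat) * ncol(dat)

# Step 1: one unique integer tag per observation (column-wise layout)
tags <- matrix(seq_len(n), nrow = nrow(dat), ncol = ncol(dat))
all_tags <- as.vector(tags)

# Step 2: the "original resample" of tags, drawn with replacement
or_tags <- sample(all_tags, size = n, replace = TRUE)

# Step 3: split the unique resampled tags into stems of at most
# floor(n / 2) tags, then pad undersized stems with tags outside the stem
block_len <- floor(n / 2)
remaining <- unique(or_tags)
stems <- split(remaining, ceiling(seq_along(remaining) / block_len))
blocks <- lapply(stems, function(stem) {
  pool <- setdiff(all_tags, stem)
  filler <- pool[sample.int(length(pool), block_len - length(stem))]
  c(stem, filler)
})

# Step 4: for each block, draw as many tags from its complement
# as the block's stem contains (assumed: without replacement)
cr_by_block <- Map(function(stem, block) {
  complement <- setdiff(all_tags, block)
  complement[sample.int(length(complement), length(stem))]
}, stems, blocks)
cr_tags <- unlist(cr_by_block, use.names = FALSE)

# Step 5: map tags back to observations
# (as.matrix() flattens column-major, matching the tag layout above)
values <- as.vector(as.matrix(dat))
or_values <- values[or_tags]
cr_values <- values[cr_tags]
```

Indexing with `sample.int()` rather than calling `sample()` on the tag vector avoids base R's special case in which a length-one numeric `x` is treated as `1:x`.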

```{r example}
library(crtb)

# Create sample data: two groups of 10 observations each
data <- data.frame(
  group1 = stats::rnorm(10),
  group2 = stats::rnorm(10)
)

# Basic usage with default settings
result <- crtb(data)

# Access results
result$ordat  # original resample
result$crdat  # complementary resample
```

Shah, R. D., & Samworth, R. J. (2013). Variable Selection with Error Control: Another Look at Stability Selection. Journal of the Royal Statistical Society: Series B, 75(1), 55-80.
