{"id":24906199,"url":"https://github.com/mightymetrika/crtb","last_synced_at":"2025-03-27T22:22:24.365Z","repository":{"id":257887802,"uuid":"864934622","full_name":"mightymetrika/crtb","owner":"mightymetrika","description":"Complementary Resampling of Tags in Blocks","archived":false,"fork":false,"pushed_at":"2024-12-05T21:05:41.000Z","size":182,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-02T00:39:01.552Z","etag":null,"topics":["data-science","machine-learning","sampling","statistics"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mightymetrika.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-29T15:02:36.000Z","updated_at":"2024-12-05T21:05:45.000Z","dependencies_parsed_at":"2024-11-30T21:22:55.471Z","dependency_job_id":"e958a664-b87f-44a9-9252-88d8eb722425","html_url":"https://github.com/mightymetrika/crtb","commit_stats":null,"previous_names":["mightymetrika/crtb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mightymetrika%2Fcrtb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mightymetrika%2Fcrtb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mightymetrika%2Fcrtb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mightymetrika%2Fcrtb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/mightymetrika","download_url":"https://codeload.github.com/mightymetrika/crtb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245932325,"owners_count":20696037,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","machine-learning","sampling","statistics"],"created_at":"2025-02-02T00:39:08.153Z","updated_at":"2025-03-27T22:22:24.310Z","avatar_url":"https://github.com/mightymetrika.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# Complementary Resampling of Tags in Blocks (CRTB)\n\n\u003c!-- badges: start --\u003e\n\u003c!-- badges: end --\u003e\n\nThe crtb resampling method is inspired by complementary pairs subsampling (Shah \u0026 Samworth, 2013). The method creates pairs of resampled datasets with complementary properties.\n\n## Installation\n\nYou can install the development version of crtb from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"pak\")\npak::pak(\"mightymetrika/crtb\")\n```\n\n## Implementation Details\n\nWhen working with multiple groups and pooled resampling, CRTB follows these steps:\n\n1. 
**Tag Assignment**\n  - Each observation receives a unique integer tag\n  - For multiple groups, tagging can be done row-wise or column-wise\n \n2. **Initial Resampling**\n  - Tags are resampled using one of three methods\n    - With replacement (default)\n    - Without replacement\n    - Custom resampling function\n  - This creates the \"original resample\"\n  - Process halts if the proportion of resamples falls below the tie threshold\n\n3. **Block Creation**\n  - Block length is set to half the length of initial tags\n  - First block:\n    - Form initial block stem from unique tags in the original resample\n    - If block stem is undersized:\n      1. Take the set difference between all tags and block stem\n      2. Sample without replacement to fill block to target size\n  - Subsequent blocks:\n    - Form new block stem from unique remaining tags\n    - If block stem is undersized:\n      1. Take the set difference between all tags and block stem\n      2. Sample without replacement to fill block to target size\n  - Continue until all tags from the original resample are assigned to blocks\n \n4. **Complementary Sampling**\n  - For each block:\n    - Find complement (all tags not in block)\n    - Sample from complement to match block stem size\n  - Combined complementary samples form the \"complementary resample\"\n \n5. **Output Generation**\n  - Map tags back to original observations\n  - Return two datasets:\n    - Original resample\n    - Complementary resample\n\n```{r example}\nlibrary(crtb)\n\n# Create sample data\ndata \u003c- data.frame(\n  group1 = stats::rnorm(10),\n  group2 = stats::rnorm(10)\n)\n\n# Basic usage with default settings\nresult \u003c- crtb(data)\n\n# Access results\nresult$ordat\nresult$crdat\n```\n\n\nShah, R. D., \u0026 Samworth, R. J. (2013). Variable Selection with Error Control: Another Look at Stability Selection. 
Journal of the Royal Statistical Society: Series B, 75(1), 55-80.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmightymetrika%2Fcrtb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmightymetrika%2Fcrtb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmightymetrika%2Fcrtb/lists"}