{"id":32200820,"url":"https://github.com/alec-stashevsky/blocklength","last_synced_at":"2025-10-22T03:56:04.620Z","repository":{"id":143716301,"uuid":"320965119","full_name":"Alec-Stashevsky/blocklength","owner":"Alec-Stashevsky","description":"R package with a set of functions to select the optimal block-length for a dependent bootstrap (block-bootstrap). Includes the Hall, Horowitz, and Jing (1995) cross-validation method and the Politis and White (2004) Spectral Density Plug-in method.","archived":false,"fork":false,"pushed_at":"2025-03-08T22:41:48.000Z","size":2940,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-22T03:55:57.745Z","etag":null,"topics":["block-bootstrap","block-resampling","blocklength","boot","bootstrap","cran","depedent-bootstrap","dependent","horowitz","inference","meboot","politis","r","resample","stats","time","time-series","time-series-analysis","tseries"],"latest_commit_sha":null,"homepage":"https://alecstashevsky.com/r/blocklength/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Alec-Stashevsky.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-12-13T02:07:13.000Z","updated_at":"2025-06-12T21:26:59.000Z","dependencies_parsed_at":"2025-04-12T16:30:45.363Z","dependency_job_id":"092c6cf1-536a-4157-814a-bc517e7804d9","html_url":"https://github.com/Alec-Stashevsky/blocklength","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,
"purl":"pkg:github/Alec-Stashevsky/blocklength","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alec-Stashevsky%2Fblocklength","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alec-Stashevsky%2Fblocklength/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alec-Stashevsky%2Fblocklength/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alec-Stashevsky%2Fblocklength/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Alec-Stashevsky","download_url":"https://codeload.github.com/Alec-Stashevsky/blocklength/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alec-Stashevsky%2Fblocklength/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280376550,"owners_count":26320276,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["block-bootstrap","block-resampling","blocklength","boot","bootstrap","cran","depedent-bootstrap","dependent","horowitz","inference","meboot","politis","r","resample","stats","time","time-series","time-series-analysis","tseries"],"created_at":"2025-10-22T03:56:03.077Z","updated_at":"2025-10-22T03:56:04.613Z","avatar_url":"https://github.com/Alec-Stashevsky.png","l
anguage":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\neditor_options: \n  markdown: \n    wrap: 72\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\",\n  fig.align = \"center\",\n  fig.ext = \"svg\",\n  dev = \"svg\"\n)\n\nset.seed(32)\n```\n\n\n# blocklength \u003cimg src=\"man/figures/logo.svg\" style=\"padding-left: 20px\" align=\"right\"/\u003e\n\n\u003c!-- badges: start --\u003e\n\n[![R-CMD-check](https://github.com/Alec-Stashevsky/blocklength/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Alec-Stashevsky/blocklength/actions/workflows/R-CMD-check.yaml)\n[![CRAN status](https://www.r-pkg.org/badges/version/blocklength)](https://CRAN.R-project.org/package=blocklength)\n[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/blocklength?color=brightgreen)](https://CRAN.R-project.org/package=blocklength)\n[![CRAN/METACRAN](https://img.shields.io/cran/l/blocklength)](https://CRAN.R-project.org/package=blocklength)\n[![Codecov test coverage](https://codecov.io/gh/Alec-Stashevsky/blocklength/graph/badge.svg)](https://app.codecov.io/gh/Alec-Stashevsky/blocklength)\n\n\u003c!-- badges: end --\u003e\n\n`blocklength` is an R package used to automatically select the\nblock-length parameter for a block-bootstrap. It is meant for use with\ndependent data such as stationary time series.\n\n## The Story\n\nRegular bootstrap methods rely on the assumption that observations are\nindependent and identically distributed (*i.i.d.*), but this assumption\nfails for many types of time series because we would expect the\nobservation in the previous period to have some explanatory power over\nthe current observation. 
This kind of dependence is common in series\nsuch as unemployment rates, stock prices, and biological data. 
A time series\nthat is *i.i.d.* would look like white noise, since each observation\nwould be completely independent of the one before it.\n\nTo get around this problem, we can retain some of this *time-dependence*\nby breaking up a time series into blocks of length *l*. Instead of\nresampling individual observations at random (with replacement) like a\nregular bootstrap, we resample these *blocks* at random, so the\ntime-dependence within each block is preserved.\n\nThe problem with the block bootstrap is its high sensitivity to the\nchoice of block-length *l*, which also determines how many blocks the\ntime series is broken into.\n\nThe goal of `blocklength` is to simplify and automate the process of\nselecting a block-length to perform a bootstrap on dependent data.\n`blocklength` has several functions, each named after the authors who\nproposed the method it implements. Currently, there are three methods\navailable:\n\n1.  `hhj()` takes its name from the [Hall, Horowitz, and Jing\n    (1995)](https://doi.org/10.1093/biomet/82.3.561)\n    \"HHJ\" method to select the optimal block-length using a\n    cross-validation algorithm that minimizes the mean squared error\n    *(MSE)* incurred by the bootstrap at various block-lengths.\n\n2.  `pwsd()` takes its name from the [Politis and White\n    (2004)](https://doi.org/10.1081/ETC-120028836) Spectral Density\n    \"PWSD\" Plug-in method to automatically select the optimal\n    block-length using spectral density estimation via \"flat-top\" lag\n    windows of [Politis and Romano\n    (1995)](https://doi.org/10.1111/j.1467-9892.1995.tb00223.x).\n\n3.  `nppi()` takes its name from the [Lahiri, Furukawa, and Lee\n    (2007)](https://doi.org/10.1016/j.stamet.2006.08.002) Nonparametric\n    Plug-In \"NPPI\" method to select the optimal block-length for block\n    bootstrap procedures. 
The NPPI method estimates the leading term in\n    the first-order expansion of the theoretically optimal block length\n    by using resampling methods to construct consistent bias and\n    variance estimators for the block-bootstrap. Specifically, this\n    package implements the Moving Block Bootstrap (MBB) method of\n    [Künsch (1989)](https://doi.org/10.1214/aos/1176347265) and\n    the Moving Blocks Jackknife (MBJ) of [Liu and Singh\n    (1992)](https://doi.org/10.1214/aos/1176348653) as the bias and\n    variance estimators, respectively.\n\nUnder the hood, `hhj()` uses the moving block bootstrap (MBB) procedure\nof [Künsch\n(1989)](https://projecteuclid.org/euclid.aos/1176347265), which resamples\noverlapping blocks of a fixed length from the series.\nHowever, the results of `hhj()` may be generalized to other block\nbootstrap procedures, such as the *stationary bootstrap* of [Politis and\nRomano\n(1994)](https://doi.org/10.1080/01621459.1994.10476870).\n\nCompared to `pwsd()`, `hhj()` is more computationally intensive because it\nrelies on iterative sub-sampling to minimize the estimated MSE over every\ncandidate block-length (or over a chosen grid of block-lengths),\nwhile `pwsd()` is a simpler \"plug-in\" rule that uses the\nautocorrelations, autocovariances, and spectral density of the\nseries to choose the block-length. 
Similarly, `nppi()` is\nanother \"plug-in\" rule; however, because it relies heavily on\nresampling, it can also be more computationally intensive than `pwsd()`.\n\nFor a detailed comparison, see the table below:\n\n```{r table, echo=FALSE}\n# Comparison table\ntable \u003c- data.frame(\n  rows = c(\n    \"**Method Type**\",\n    \"**Computational Cost**\",\n    \"**Primary Goal**\", \n    \"**Variance Estimation**\",\n    \"**Bias Estimation**\",\n    \"**Best for**\", \n    \"**Estimation Capacity**\",\n    \"**User-Defined Parameters\\\\***\"\n    ),\n  NPPI = c(\n    \"Nonparametric resampling\",\n    \"Medium (bootstrap resampling \u0026 jackknife)\",\n    \"Minimize MSE of bootstrap estimator\",\n    \"Moving Blocks Jackknife-After-Bootstrap (JAB)\", \n    \"Directly estimates bias from bootstrap\",\n    \"General-purpose estimators, small sample sizes, and quantile estimation\", \n    \"Bootstrap bias, variance, distribution function, and quantile estimation\",\n    \"User-defined parameters for initial block-length `l` and number of deletion blocks `m`\"\n    ),\n  PWSD = c(\n    \"Spectral density estimation\",\n    \"Low (direct ACF computation)\", \n    \"Estimate block length using spectral density\",\n    \"Implicitly estimated via spectral density\", \n    \"Indirectly accounts for bias via ACF decay\",\n    \"Block-length selection for circular and stationary bootstraps; time series with strong autocorrelation\", \n    \"Bootstrap sample mean only\",\n    \"User-defined parameters for autocorrelation lag and implied hypothesis tests (4 total)\"\n    ),\n  HHJ = c(\n    \"Subsampling-based cross-validation\",\n    \"High (subsampling \u0026 cross-validation)\", \n    \"Minimize MSE via cross-validation\",\n    \"Uses subsample-based variance estimation\", \n    \"Uses subsample-based bias estimation\",\n    \"Estimating functionals with strong dependencies\", \n    \"Bootstrap variance and distribution function estimation\", \n    \"Requires user-defined 
parameters for `pilot_block_length` (*l\\\\**) and `sub_sample` size (*m*)\"\n    )\n)\n\n# Render table\nknitr::kable(\n  table,\n  format = \"markdown\",\n  col.names = c(\n    \"\",\n    \"NPPI (Lahiri et al., 2007)\",\n    \"PWSD (Politis \u0026 White, 2004)\",\n    \"HHJ (Hall, Horowitz \u0026 Jing, 1995)\"\n    ),\n  caption = \"* All algorithms provide default values for their user-defined parameters, as recommended by the respective authors.\"\n  )\n```\n\n\n## Installation\n\nYou can install the released version from\n[CRAN](https://cran.r-project.org/package=blocklength) with:\n\n``` r\ninstall.packages(\"blocklength\")\n```\n\nYou can install the development version from\n[GitHub](https://github.com/Alec-Stashevsky/blocklength) with:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"Alec-Stashevsky/blocklength\")\n```\n\n## Use Case\n\nWe want to select the optimal block-length to perform a block bootstrap\non a simulated autoregressive *AR(1)* time series.\n\nFirst, we simulate the time series:\n\n```{r series}\nlibrary(blocklength)\n\n# Simulate 500 observations of an AR(1) series with coefficient 0.5\nseries \u003c- stats::arima.sim(model = list(order = c(1, 0, 0), ar = 0.5),\n                           n = 500, rand.gen = rnorm)\n\n# Coerce the time series to a data.frame (optional)\ndata \u003c- data.frame(\"AR1\" = series)\n```\n\nNow we can find the optimal block-length for a block-bootstrap using\neach of the three available methods.\n\n\n### 1. The Hall, Horowitz, and Jing (1995) \"HHJ\" Method\n\n```{r hhj}\n# Using the HHJ algorithm with overlapping sub-samples of size 10\nhhj(series, sub_sample = 10, k = \"bias/variance\")\n```\n\n\n### 2. 
The Politis and White (2004) Spectral Density Estimation \"PWSD\" Method\n\n```{r pwsd}\n# Using Politis and White (2004) Spectral Density Estimation\npwsd(data)\n```\n\nBoth methods produce similar results, suggesting a block-length of\n9 or 11 depending on the type of bootstrap used.\n\n\n### 3. 
The Lahiri, Furukawa, and Lee (2007) Nonparametric Plug-In \"NPPI\" Method\n\n```{r nppi}\n# Using the Lahiri, Furukawa, and Lee (2007) Nonparametric Plug-In method\n# with m = 8 deletion blocks\nnppi(data, m = 8)\n```\n\n\n## Acknowledgements\n\nA big shoutout to Malina Cheeneebash for designing the `blocklength` hex\nsticker! Thanks also to Sergio Armella and [Simon P.\nCouch](https://www.simonpcouch.com) for their help and feedback!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falec-stashevsky%2Fblocklength","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falec-stashevsky%2Fblocklength","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falec-stashevsky%2Fblocklength/lists"}