{"id":23825987,"url":"https://github.com/coolbutuseless/oomph","last_synced_at":"2025-09-07T08:31:25.350Z","repository":{"id":268990985,"uuid":"905577647","full_name":"coolbutuseless/oomph","owner":"coolbutuseless","description":"Faster subsetting of vectors and lists by name","archived":false,"fork":false,"pushed_at":"2025-07-10T06:42:44.000Z","size":265,"stargazers_count":22,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-10T10:10:09.318Z","etag":null,"topics":["package","r","rpackage"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coolbutuseless.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-19T05:40:26.000Z","updated_at":"2025-07-10T06:42:47.000Z","dependencies_parsed_at":"2024-12-20T07:27:41.111Z","dependency_job_id":"f19be205-af44-47ed-a934-9af6d1e6dee4","html_url":"https://github.com/coolbutuseless/oomph","commit_stats":null,"previous_names":["coolbutuseless/oomph"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/coolbutuseless/oomph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Foomph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Foomph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Foomph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Foomph/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coolbutuseless","download_url":"https://codeload.github.com/coolbutuseless/oomph/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Foomph/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274011457,"owners_count":25207072,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["package","r","rpackage"],"created_at":"2025-01-02T12:14:00.895Z","updated_at":"2025-09-07T08:31:24.975Z","avatar_url":"https://github.com/coolbutuseless.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = FALSE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n\nlibrary(oomph)\n```\n\n# oomph\n\n\u003c!-- badges: start --\u003e\n![](https://img.shields.io/badge/cool-useless-green.svg)\n[![R-CMD-check](https://github.com/coolbutuseless/oomph/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/coolbutuseless/oomph/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\n`oomph` is a package for fast string matching within a static set of strings.\nThis is useful for fast named look-up in fixed lists and vectors.\n\nInternally this uses a hash-map in C to map strings to integers.  In R, \nthis appears as a minimal perfect hash where each string maps to its index, \nand unknown strings return `NA`.\n\nThe hashed look-up can be more than **1000x** faster than R's standard look-up method (depending on the \nnumber of elements in the original object and the number of elements to extract).\n\n\n## What's in the box\n\n* `mph \u003c- mph_init(s, size_factor)` initialise a hash with the given set of strings\n    * Using a larger `size_factor` (than the default of `1`) decreases the number \n      of hash collisions, and can make other operations faster at the cost of\n      more memory being allocated.\n* `mph_match(s, mph)` find the indices of the strings `s` (equivalent to R's `match()`)\n\n\n\n## Installation\n\nYou can install from [GitHub](https://github.com/coolbutuseless/oomph) with:\n\n``` r\n# install.packages('remotes')\nremotes::install_github('coolbutuseless/oomph')\n```\n\n## Setup test data\n\n\n```{r}\nlibrary(oomph)\nN \u003c- 500000\nset.seed(1)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# 500k random names\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nnms \u003c- vapply(seq(N), \\(i) paste(sample(c(letters, LETTERS, 0:9), 10, T), 
collapse = \"\"), character(1))\nhead(nms)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# A big named vector and named list (each with 500k elements)\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nbig_vector \u003c- seq(N)\nbig_list   \u003c- as.list(seq(N))\n\nnames(big_vector) \u003c- nms\nnames(big_list  ) \u003c- nms\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# Probe sets to use for testing\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nt0  \u003c- sample(nms,    1, replace = TRUE)\nt1  \u003c- sample(nms,   10, replace = TRUE)\nt2  \u003c- sample(nms,  100, replace = TRUE)\nt3  \u003c- sample(nms, 1000, replace = TRUE)\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# By default, the number of hash buckets is the same as the number of \n# strings.  To reduce the possibility of hash collisions (and possibly make look-ups\n# faster), the number of hash buckets can be changed using the 'size_factor'\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nmph \u003c- mph_init(nms) # Allocate exactly length(nms) buckets\n```\n\n\n## Compare `match()` with `mph_match()`\n\n\n```{r match}\nbench::mark(\n  match(t0, nms),\n  mph_match(t0, mph)\n)[, 1:5] |\u003e knitr::kable()\n\nbench::mark(\n  match(t1, nms),\n  mph_match(t1, mph)\n)[, 1:5] |\u003e knitr::kable()\n\nbench::mark(\n  match(t2, nms),\n  mph_match(t2, mph)\n)[, 1:5] |\u003e knitr::kable()\n\nbench::mark(\n  match(t3, nms),\n  mph_match(t3, mph)\n)[, 1:5] |\u003e knitr::kable()\n```\n\n\n\n\n## Vector subsetting - Extract 100 elements of a `vector` by name\n\n```{r vector-subset}\nbench::mark(\n  big_vector[t2],\n  big_vector[mph_match(t2, mph)]\n)[, 1:5] |\u003e knitr::kable()\n```\n\n\n## List subsetting - Extract 100 elements of a `list` by name\n\nAlso compare to using hashed named lookup in a standard R 
environment\n\n```{r list-subset}\nee \u003c- as.environment(big_list)\n\nbench::mark(\n  `Standard R`           = big_list[t2],\n  `R hashed environment` = mget(t2, ee),\n  `[] and mph indexing`  = big_list[mph_match(t2, mph)]\n)[, 1:5] |\u003e knitr::kable()\n```\n\n\n\n## Time taken to build the hash\n\n```{r}\nset.seed(1)\nchrs \u003c- c(letters, LETTERS, 0:9)\nN \u003c- 1000\nnms1k \u003c- vapply(seq(N), \\(i) paste(sample(chrs, 10, T), collapse = \"\"), character(1))\n\nN \u003c- 10000\nnms10k \u003c- vapply(seq(N), \\(i) paste(sample(chrs, 10, T), collapse = \"\"), character(1))\n\nN \u003c- 100000\nnms100k \u003c- vapply(seq(N), \\(i) paste(sample(chrs, 10, T), collapse = \"\"), character(1))\n\nbench::mark(\n  mph_init(nms1k),\n  mph_init(nms10k),\n  mph_init(nms100k),\n  check = FALSE\n)[, 1:5] |\u003e knitr::kable()\n\n```\n\n\n## Billion Row Challenge indexing\n\nThe following example is a part of the [billion row challenge](https://github.com/jrosell/1br).\n\nIn this example, we are attempting to keep a streaming tally of the 3-letter codes\nwhich are seen.\n\n```{r warning=FALSE}\nlibrary(oomph)\nlibrary(insitu)\n\nnms \u003c- expand.grid(LETTERS, LETTERS, LETTERS) |\u003e \n  apply(1, paste0, collapse = \"\")\n\ncounts \u003c- numeric(length(nms))\nnames(counts) \u003c- nms\nmph \u003c- mph_init(nms)\n\nset.seed(1)\nrandom_nms \u003c- sample(nms, 1000)\n\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# updating in bulk\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nbench::mark(\n  baseR            = {i \u003c- match(random_nms, nms); counts[i] \u003c- counts[i] + 1},\n  oomph            = {i \u003c- mph_match(random_nms, mph); counts[i] \u003c- counts[i] + 1},\n  `oomph + insitu` = {br_add(counts, 1, idx =  mph_match(random_nms, mph))},\n  check = FALSE\n)[, 1:5] |\u003e knitr::kable()\n\n\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n# Updating 
within a for loop\n#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nbench::mark(\n  baseR = {\n    for (nm in random_nms) {\n      i \u003c- match(nm, nms)\n      counts[i] \u003c- counts[i] + 1\n    }\n  },\n  oomph = {\n    for (nm in random_nms) {\n      i \u003c- mph_match(nm, mph)\n      counts[i] \u003c- counts[i] + 1\n    }\n  },\n  `oomph + insitu` = {\n    for (nm in random_nms) {\n      br_add(counts, 1, idx = mph_match(nm, mph))\n    }\n  },\n  check = FALSE\n)[, 1:5] |\u003e knitr::kable()\n```\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoolbutuseless%2Foomph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoolbutuseless%2Foomph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoolbutuseless%2Foomph/lists"}