{"id":13706691,"url":"https://github.com/nathan-russell/hashmap","last_synced_at":"2025-06-16T14:02:51.454Z","repository":{"id":60721649,"uuid":"55267445","full_name":"nathan-russell/hashmap","owner":"nathan-russell","description":"Faster hash maps in R","archived":false,"fork":false,"pushed_at":"2023-07-24T19:02:12.000Z","size":511,"stargazers_count":81,"open_issues_count":13,"forks_count":9,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-30T22:24:37.670Z","etag":null,"topics":["cplusplus","hashmap","r","rcpp"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nathan-russell.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-04-01T23:06:57.000Z","updated_at":"2025-01-02T13:07:11.000Z","dependencies_parsed_at":"2024-01-14T20:18:56.121Z","dependency_job_id":"40743803-53a7-4797-93d1-37c31955047f","html_url":"https://github.com/nathan-russell/hashmap","commit_stats":{"total_commits":99,"total_committers":1,"mean_commits":99.0,"dds":0.0,"last_synced_commit":"39d547d5d58028314dead0cf85423d90e77f2f60"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathan-russell%2Fhashmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathan-russell%2Fhashmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathan-russell%2Fhashmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nathan-russell%2Fhashmap/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nathan-russell","download_url":"https://codeload.github.com/nathan-russell/hashmap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252533513,"owners_count":21763607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cplusplus","hashmap","r","rcpp"],"created_at":"2024-08-02T22:01:05.479Z","updated_at":"2025-05-05T16:29:27.109Z","avatar_url":"https://github.com/nathan-russell.png","language":"C++","funding_links":[],"categories":["Table of Contents","C++"],"sub_categories":["Data manipulation"],"readme":"---\noutput:\n  md_document:\n    variant: markdown_github\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"README-\"\n)\n```\n\nhashmap \n=======\n\n[![Travis-CI Build Status](https://travis-ci.org/nathan-russell/hashmap.svg?branch=master)](https://travis-ci.org/nathan-russell/hashmap) \n[![MIT licensed](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE) \n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/hashmap)](https://cran.r-project.org/package=hashmap) \n\n\n### Motivation \n\nUnlike many programming languages, R does not implement a native hash table \nclass. 
The typical workaround is to use `environment`s, taking advantage of \nthe fact that these objects are, by default, internally hashed: \n\n```R\nEE \u003c- new.env(hash = TRUE)  # equivalent to new.env()\n\nset.seed(123)\nlist2env(\n    setNames(\n        as.list(rnorm(26)), \n        LETTERS\n    ),\n    envir = EE\n)\n\nEE[[\"A\"]]\n# [1] -0.5604756\n\nEE[[\"D\"]]\n# [1] 0.07050839\n\nEE[[\"Z\"]]\n# [1] -1.686693\n```\n\nIn many situations, this is a fine solution - lookups are reasonably \nfast, and `environment`s are highly flexible, allowing one to store \nvirtually any type of R object (functions, lists, other environments, etc.).\nHowever, one of the major downsides to using `environment`s as hash tables \nis the inability to work with vector arguments: \n\n```R\nEE[[c(\"A\", \"B\")]]\n# Error in EE[[c(\"A\", \"B\")]] : \n#   wrong arguments for subsetting an environment\n\nEE[c(\"A\", \"B\")]\n# Error in EE[c(\"A\", \"B\")] : \n#   object of type 'environment' is not subsettable\n```\n\nThis is unfortunate, and somewhat surprising, considering most \noperations in R have vectorized semantics. \n\n------------\n\n### Solution \n\n```R\nlibrary(hashmap)\n\nset.seed(123)\n(HH \u003c- hashmap(LETTERS, rnorm(26)))\n## (character) =\u003e (numeric)  \n##         [Z] =\u003e [-1.686693]\n##         [Y] =\u003e [-0.625039]\n##         [R] =\u003e [-1.966617]\n##         [X] =\u003e [-0.728891]\n##         [Q] =\u003e [+0.497850]\n##         [P] =\u003e [+1.786913]\n##       [...] =\u003e [...] \n\nHH[[c(\"A\", \"B\")]]\n# [1] -0.5604756 -0.2301775\n```\n\nIt is important to note that unlike the `environment`-based solution, \n`hashmap` does *NOT* offer the flexibility to store arbitrary types of \nobjects. 
Any combination of the following *atomic* vector types is \ncurrently permitted: \n\n+ keys\n    + `integer`\n    + `numeric`\n    + `character`\n    + `Date` \n    + `POSIXct`\n+ values\n    + `logical`\n    + `integer`\n    + `numeric`\n    + `character`\n    + `complex`\n    + `Date`\n    + `POSIXct`\n\n------------\n\n### Features \n\nWhat `hashmap` may lack in terms of flexibility it makes up for in \ntwo important areas: performance and ease-of-use. Let's begin with the \nlatter by looking at some basic examples. \n\n#### Usage\n\n+ A `Hashmap` is created by passing a vector of keys and a vector of \nvalues to `hashmap`: \n\n    ```R\n    set.seed(123)\n    H \u003c- hashmap(letters[1:10], rnorm(10))\n    H\n    ## (character) =\u003e (numeric)  \n    ##         [j] =\u003e [-0.445662]\n    ##         [i] =\u003e [-0.686853]\n    ##         [h] =\u003e [-1.265061]\n    ##         [g] =\u003e [+0.460916]\n    ##         [e] =\u003e [+0.129288]\n    ##         [d] =\u003e [+0.070508]\n    ##       [...] =\u003e [...] \n    ```\n\n+ If the lengths of the two vectors are not equal, the longer object is \ntruncated to the length of its counterpart, and a warning is issued: \n\n    ```R\n    hashmap(letters[1:5], 1:3)\n    ## (character) =\u003e (integer)\n    ##         [c] =\u003e [3]      \n    ##         [b] =\u003e [2]      \n    ##         [a] =\u003e [1]      \n    # Warning message:\n    # In new_CppObject_xp(fields$.module, fields$.pointer, ...) :\n    #   length(keys) != length(values)!\n      \n    hashmap(letters[1:3], 1:5)\n    ## (character) =\u003e (integer)\n    ##         [c] =\u003e [3]      \n    ##         [b] =\u003e [2]      \n    ##         [a] =\u003e [1]      \n    # Warning message:\n    # In new_CppObject_xp(fields$.module, fields$.pointer, ...) 
:\n    #   length(keys) != length(values)!\n    ```\n    \n+ Value lookup can be performed by passing a vector of lookup keys to \neither of `[[` or `$find`: \n\n    ```R\n    H[[\"a\"]]\n    # [1] -0.5604756\n    \n    H$find(\"b\")\n    # [1] -0.2301775\n    \n    H[[c(\"a\", \"c\")]]\n    # [1] -0.5604756  1.5587083\n    \n    H$find(c(\"b\", \"d\"))\n    # [1] -0.23017749  0.07050839\n    ```\n\n+ For nonexistent lookup keys, `NA` is returned: \n\n    ```R\n    H[[c(\"a\", \"A\", \"b\")]]\n    # [1] -0.5604756         NA -0.2301775\n    ```\n    \n+ Use `$has_key` to check for the existence of individual keys, or `$has_keys` \nfor a vector of keys: \n\n    ```R\n    H$has_key(\"a\")\n    # [1] TRUE\n    \n    H$has_key(\"A\")\n    # [1] FALSE\n    \n    H$has_keys(c(\"a\", \"A\", \"b\", \"B\"))\n    # [1]  TRUE FALSE  TRUE FALSE\n    ```\n\n+ Modification of key-value pairs is done using either of `[[\u003c-` or \n`$insert`. For non-existing keys, a new key-value pair will be \ninserted. 
For existing keys, the previous value will be overwritten: \n\n    ```R\n    H[[c(\"a\", \"x\")]]\n    # [1] -0.5604756         NA\n    \n    H[[c(\"a\", \"x\")]] \u003c- c(1.5, 26.5)\n    H[[c(\"a\", \"x\")]]\n    # [1]  1.5 26.5\n    \n    H$insert(c(\"a\", \"y\", \"z\"), c(100, 200, 300))\n    H[[c(\"a\", \"y\", \"z\")]]\n    # [1] 100 200 300\n    ``` \n\n+ To remove elements from the hash table, pass a vector of keys to `$erase`, \nwhich will delete entries for matched elements, and do nothing otherwise: \n\n    ```R\n    H$has_keys(c(\"y\", \"Y\", \"z\", \"Z\"))\n    # [1]  TRUE FALSE  TRUE FALSE\n    \n    H$erase(c(\"y\", \"Y\", \"z\", \"Z\"))\n    \n    H$has_keys(c(\"y\", \"Y\", \"z\", \"Z\"))\n    # [1] FALSE FALSE FALSE FALSE\n    ```\n\n+ Use `$size` to check the number of key-value pairs, `$empty` to check \nif the hash table is empty, and `$clear` to delete all existing entries: \n\n    ```R\n    H$size()\n    # [1] 11\n    \n    H$empty()\n    # [1] FALSE\n    \n    H$clear()\n    \n    H$empty()\n    # [1] TRUE\n    \n    H$size()\n    # [1] 0\n    \n    H\n    ## [empty Hashmap]\n    ``` \n\n+ `$keys` and `$values` return every key and value, respectively, and \n`$data` returns a named vector of values, using the keys as names: \n\n    ```R\n    H[[c(\"A\", \"B\", \"C\")]] \u003c- 1:3\n    \n    H$keys()\n    # [1] \"C\" \"B\" \"A\"\n    \n    H$values()\n    # [1] 3 2 1\n    \n    H$data()\n    # C B A \n    # 3 2 1 \n    ```\n    \n+ By default, only the first 6 key-value pairs of a `Hashmap` are printed, \nwhere `[...] =\u003e [...]` indicates that additional entries exist but are not \ndisplayed. 
This can be adjusted via `options()`: \n\n    ```R\n    getOption(\"hashmap.max.print\")\n    # [1] 6\n    \n    H\n    ## (character) =\u003e (numeric)  \n    ##         [C] =\u003e [+3.000000]\n    ##         [B] =\u003e [+2.000000]\n    ##         [A] =\u003e [+1.000000]\n    \n    H[[letters[1:10]]] \u003c- rnorm(10)\n    H\n    ## (character) =\u003e (numeric)  \n    ##         [j] =\u003e [-0.472791]\n    ##         [i] =\u003e [+0.701356]\n    ##         [h] =\u003e [-1.966617]\n    ##         [g] =\u003e [+0.497850]\n    ##         [e] =\u003e [-0.555841]\n    ##         [d] =\u003e [+0.110683]\n    ##       [...] =\u003e [...]\n    \n    options(hashmap.max.print = 15)\n    H\n    ## (character) =\u003e (numeric)  \n    ##         [j] =\u003e [-0.472791]\n    ##         [i] =\u003e [+0.701356]\n    ##         [h] =\u003e [-1.966617]\n    ##         [g] =\u003e [+0.497850]\n    ##         [e] =\u003e [-0.555841]\n    ##         [d] =\u003e [+0.110683]\n    ##         [c] =\u003e [+0.400772]\n    ##         [f] =\u003e [+1.786913]\n    ##         [b] =\u003e [+0.359814]\n    ##         [a] =\u003e [+1.224082]\n    ##         [C] =\u003e [+3.000000]\n    ##         [B] =\u003e [+2.000000]\n    ##         [A] =\u003e [+1.000000]\n    ``` \n\n----------\n\n#### Benchmark\n\nThe following is a simple test comparing the performance of an \n`environment` object against `hashmap` for \n\n1. Construction of the hash table \n2. Vectorized key lookup \n\nAn overview of results is presented here, but the \nfull code to reproduce the test is in \n[assets/benchmark.R](https://github.com/nathan-russell/hashmap/blob/master/assets/benchmark.R). 
\nAll of the examples use a one million element character vector for \nkeys, and a one million element numeric vector for values.\n\nHash table construction was rather slow for the environment, \ndespite my ~~best~~ moderate efforts to devise a fast solution, so\nexpressions were only evaluated 25 times: \n\n```r\nmicrobenchmark::microbenchmark(\n    \"Hash\" = hashmap(Keys, Values),\n    \"Env\" = env_hash(Keys, Values),\n    times = 25L\n)\n# Unit: milliseconds\n#  expr        min        lq      mean    median       uq       max neval cld\n#  Hash   946.3524  1287.771  1784.404  1639.788  2243.93  3315.194    25   a \n#   Env 11724.2705 13218.521 14071.874 13685.929 15178.27 16516.216    25   b\n```\nNext, a lookup of all 1000 keys: \n\n```r\nE \u003c- env_hash(Keys, Values)\nH \u003c- hashmap(Keys, Values)\n\nall.equal(env_find(Lookup, E), H[[Lookup]])\n# [1] TRUE\n\nmicrobenchmark::microbenchmark(\n    \"Hash\" = H[[Lookup]],\n    \"Env\" = env_find(Lookup, E), \n    times = 500L\n)\n# Unit: microseconds\n#  expr       min       lq       mean     median         uq       max neval cld\n#  Hash   314.182   738.98   804.5154   799.7065   858.3895  3013.285   500   a \n#   Env 12291.671 12651.12 13020.3816 12740.1735 12919.7355 67220.784   500   b\n```\n\nAnd finally, a comparison of key-lookups for vectors of various sizes, \nplotted below on the linear and logarithmic scales, where data points \nrepresent median evaluation time of 200 runs for the given expression: \n\n\n![](tools/linear-plot.png)\n\n\n\n![](tools/log-plot.png)\n\n-----------\n\nThe benchmark was conducted on a laptop running Ubuntu \n14.04, with the following specs, \n\n```shell\n$ lscpu \u0026\u0026 printf \"\\n\\n\" \u0026\u0026 free -h\nArchitecture:          x86_64\nCPU op-mode(s):        32-bit, 64-bit\nByte Order:            Little Endian\nCPU(s):                4\nOn-line CPU(s) list:   0-3\nThread(s) per core:    2\nCore(s) per socket:    2\nSocket(s):             1\nNUMA node(s):        
  1\nVendor ID:             GenuineIntel\nCPU family:            6\nModel:                 69\nStepping:              1\nCPU MHz:               759.000\nBogoMIPS:              4589.34\nVirtualization:        VT-x\nL1d cache:             32K\nL1i cache:             32K\nL2 cache:              256K\nL3 cache:              3072K\nNUMA node0 CPU(s):     0-3\n\n\n             total       used       free     shared    buffers     cached\nMem:          7.7G       5.6G       2.1G       333M       499M       2.5G\n-/+ buffers/cache:       2.6G       5.1G\nSwap:           0B         0B         0B\n```\n\nin the following R session: \n\n```r\nR version 3.2.4 Revised (2016-03-16 r70336)\nPlatform: x86_64-pc-linux-gnu (64-bit)\nRunning under: Ubuntu 14.04.4 LTS\n\nlocale:\n [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       \n [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   \n [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              \n[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       \n\nattached base packages:\n[1] stats     graphics  grDevices utils     datasets  methods   base     \n\nother attached packages:\n[1] data.table_1.9.6   hashmap_0.0.0.9000 ggvis_0.4.2       \n\nloaded via a namespace (and not attached):\n [1] Rcpp_0.12.4.1          rstudioapi_0.3.1       knitr_1.11             magrittr_1.5          \n [5] munsell_0.4.2          colorspace_1.2-6       xtable_1.8-2           R6_2.1.1              \n [9] plyr_1.8.3             dplyr_0.4.3            tools_3.2.4            parallel_3.2.4        \n[13] grid_3.2.4             gtable_0.1.2           DBI_0.3.1              htmltools_0.3.5       \n[17] yaml_2.1.13            lazyeval_0.1.10        assertthat_0.1         digest_0.6.8          \n[21] shiny_0.13.2           ggplot2_2.0.0          microbenchmark_1.4-2.1 codetools_0.2-14      \n[25] mime_0.4               rmarkdown_0.8.1        
scales_0.3.0           jsonlite_0.9.17       \n[29] httpuv_1.3.3           chron_2.3-47 \n```\n\n----------\n\n### Installation\n\nThe stable release of `hashmap` can be installed from CRAN: \n\n```r\ninstall.packages(\"hashmap\")\n```\n\nThe current development version can be installed from GitHub with `devtools`: \n\n```r\nif (!\"devtools\" %in% installed.packages()[,1]) {\n    install.packages(\"devtools\")\n}\ndevtools::install_github(\"nathan-russell/hashmap\")\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnathan-russell%2Fhashmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnathan-russell%2Fhashmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnathan-russell%2Fhashmap/lists"}