{"id":13858009,"url":"https://github.com/mkearney/funique","last_synced_at":"2026-03-04T23:31:57.439Z","repository":{"id":56937162,"uuid":"133566034","full_name":"mkearney/funique","owner":"mkearney","description":"⌚️ A faster unique() function","archived":false,"fork":false,"pushed_at":"2018-12-28T18:37:34.000Z","size":7494,"stargazers_count":19,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-12-09T21:53:35.275Z","etag":null,"topics":["data-frame","data-wrangling","date-time","duplicates","mkearney-r-package","posix","posixct","r","r-package","rstats","unique"],"latest_commit_sha":null,"homepage":"https://funique.mikewk.com","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkearney.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-15T19:52:13.000Z","updated_at":"2025-03-22T11:06:59.000Z","dependencies_parsed_at":"2022-08-21T06:50:09.278Z","dependency_job_id":null,"html_url":"https://github.com/mkearney/funique","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/mkearney/funique","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkearney%2Ffunique","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkearney%2Ffunique/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkearney%2Ffunique/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkearney%2Ffunique/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkearney","download_url":"https://co
deload.github.com/mkearney/funique/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkearney%2Ffunique/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30099380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T23:31:22.529Z","status":"ssl_error","status_checked_at":"2026-03-04T23:31:22.112Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-frame","data-wrangling","date-time","duplicates","mkearney-r-package","posix","posixct","r","r-package","rstats","unique"],"created_at":"2024-08-05T03:01:53.825Z","updated_at":"2026-03-04T23:31:57.398Z","avatar_url":"https://github.com/mkearney.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\"\n)\ndrop_hl \u003c- function(x, n = 1) {\n  x \u003c- tibble::as_tibble(x, validate = FALSE)\n  x \u003c- dplyr::arrange(x, expr, time)\n  tot \u003c- sum(x$expr == x$expr[1])\n  g \u003c- length(unique(x$expr))\n  s \u003c- (1 + n):(tot - n)\n  s \u003c- unlist(Map(\"+\", list(s), c(0, cumsum(rep(tot, g - 1)))))\n  structure(x[s, ], class = c(\"hlmb\", \"tbl_df\", \"tbl\", \"data.frame\"))\n}\n\nplot.hlmb \u003c- function(x) {\n  x$time \u003c- x$time / 1000\n  min \u003c- 0\n  max \u003c- max(x$time, na.rm = TRUE) * 1.05\n  x$expr \u003c- as.character(x$expr)\n  ggplot2::ggplot(x, ggplot2::aes(x = expr, y = time, fill = expr)) +\n    ggplot2::geom_boxplot(outlier.shape = NA, alpha = .6) +\n    ggplot2::geom_jitter(shape = 21, size = ggplot2::rel(3), alpha = .6) +\n    ggplot2::theme_minimal(base_size = 11, base_family = \"Roboto Condensed\") +\n    ggplot2::theme(legend.position = \"none\",\n      text = ggplot2::element_text(colour = \"#444444\"),\n      axis.title = ggplot2::element_text(size = ggplot2::rel(1.0),\n        hjust = 0.95, face = \"italic\", colour = \"black\"),\n      axis.text.x = ggplot2::element_text(size = ggplot2::rel(0.9),\n        colour = \"black\"),\n      axis.text.y = ggplot2::element_text(size = ggplot2::rel(1.1),\n        colour = \"black\", angle = 90, hjust = .5),\n      plot.title = ggplot2::element_text(size = ggplot2::rel(1.4),\n        colour = \"black\", face = \"bold\"),\n      plot.subtitle = ggplot2::element_text(size = ggplot2::rel(1.1),\n        colour = \"black\"),\n      plot.caption = ggplot2::element_text(hjust = 0, size = ggplot2::rel(.95)),\n      panel.grid.minor.x = ggplot2::element_blank(),\n      panel.grid.major.x = ggplot2::element_line(linetype = \"dashed\"),\n      panel.grid.major.y = ggplot2::element_line(linetype = \"dashed\"),\n      
axis.line.x = ggplot2::element_line(colour = \"#44444422\")) +\n    ggplot2::labs(y = \"Time (microseconds)\", x = \"Expression\",\n      title = \"Benchmarking expression evaluation times\",\n      subtitle = \"Boxplots overlaid with jittered replication times\",\n      caption = \"Estimates from the {microbenchmark} pkg\") +\n    ggplot2::scale_y_continuous(limits = c(min, max)) +\n    ggplot2::coord_flip() + \n    ggplot2::scale_fill_manual(values = c(\"greenyellow\", \"gray\"))\n}\nlibrary(funique)\n```\n# funique \u003cimg src=\"man/figures/logo.png\" width=\"160px\" align=\"right\" /\u003e\n\n[![Travis build status](https://travis-ci.org/mkearney/funique.svg?branch=master)](https://travis-ci.org/mkearney/funique)\n[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)\n\n\u003e ⌚️ A faster `unique()` function\n\n## Installation\n\nYou can install the released version of funique from GitHub with:\n\n```{r, eval = FALSE}\n## install the remotes pkg if not already installed\nif (!requireNamespace(\"remotes\", quietly = TRUE)) {\n  install.packages(\"remotes\")\n}\n\n## install funique from GitHub\nremotes::install_github(\"mkearney/funique\")\n```\n\n## Usage\n\nThere's one function, `funique()`, which behaves the same as `base::unique()` but is optimized to be faster when the data contain date-time variables.\n\n## Speed test: `funique()` vs. 
`base::unique()`\n\nThe code below creates a data frame with several duplicate rows and then compares the run time of `funique()` versus `base::unique()`.\n\n\n```{r ex1, fig.keep = \"none\", eval = FALSE}\n## set seed\nset.seed(20180812)\n\n## generate data\nd \u003c- data.frame(\n  x = rnorm(1000),\n  y = seq.POSIXt(as.POSIXct(\"2018-01-01\"),\n    as.POSIXct(\"2018-12-31\"), length.out = 10))\n\n## create data frame with duplicate rows\nd \u003c- d[c(1:1000, sample(1:1000, 500, replace = TRUE)), ]\nrow.names(d) \u003c- NULL\n\n## check the output against base::unique\nidentical(unique(d), funique(d))\n\n## benchmark\n(m \u003c- microbenchmark::microbenchmark(unique(d), funique(d), \n  times = 200, unit = \"relative\"))\n\n## plot and save (pass the plot to ggsave explicitly)\np \u003c- plot(drop_hl(m, n = 4))\nggplot2::ggsave(\"man/figures/r1.png\", plot = p, width = 8, height = 4.5, units = \"in\")\n```\n\n\u003cp align=\"center\"\u003e \u003cimg src=\"man/figures/r1.png\"\u003e\n\nHere's another test, this time using duplicate-infested Twitter data.\n\n```{r ex2, fig.keep = \"none\", eval = FALSE}\n## search for data on 100 tweets\nrt \u003c- rtweet::search_tweets(\"lang:en\", verbose = FALSE)\n\n## create duplicates\nrt2 \u003c- rt[sample(1:nrow(rt), 1000, replace = TRUE), ]\n\n## benchmarks\n(mb \u003c- microbenchmark::microbenchmark(\n  unique(rt2), funique(rt2), unit = \"relative\"))\n\n## make sure the output is the same\nidentical(unique(rt2), funique(rt2))\n\n## plot and save (pass the plot to ggsave explicitly)\np2 \u003c- plot(drop_hl(mb, n = 4))\nggplot2::ggsave(\"man/figures/r2.png\", plot = p2, width = 8, height = 4.5, units = \"in\")\n```\n\n\u003cp align=\"center\"\u003e \u003cimg src=\"man/figures/r2.png\"\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkearney%2Ffunique","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkearney%2Ffunique","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkearney%2Ffunique/lists"}