{"id":14067471,"url":"https://github.com/TimTeaFan/dplyover","last_synced_at":"2025-07-30T01:31:13.802Z","repository":{"id":38785998,"uuid":"276982297","full_name":"TimTeaFan/dplyover","owner":"TimTeaFan","description":"Create columns by applying functions to vectors and/or columns in 'dplyr'.","archived":false,"fork":false,"pushed_at":"2021-10-03T13:26:19.000Z","size":2007,"stargazers_count":59,"open_issues_count":16,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-08-13T07:13:30.661Z","etag":null,"topics":["dplyr","r"],"latest_commit_sha":null,"homepage":"https://timteafan.github.io/dplyover/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TimTeaFan.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-03T20:29:43.000Z","updated_at":"2024-05-30T00:33:40.000Z","dependencies_parsed_at":"2022-09-18T03:32:20.979Z","dependency_job_id":null,"html_url":"https://github.com/TimTeaFan/dplyover","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimTeaFan%2Fdplyover","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimTeaFan%2Fdplyover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimTeaFan%2Fdplyover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimTeaFan%2Fdplyover/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TimTeaFan","download_url":"https://codeload.github.com/TimTeaFan/dplyover/tar.gz/refs/heads/main","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":228065333,"owners_count":17863979,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dplyr","r"],"created_at":"2024-08-13T07:05:36.885Z","updated_at":"2024-12-04T07:31:10.866Z","avatar_url":"https://github.com/TimTeaFan.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, child = \"man/rmd/setup.Rmd\"}\n```\n\n# dplyover\n\n\u003c!-- badges: start --\u003e\n![Release status](https://img.shields.io/badge/status-first%20release-yellow)\n[![Lifecycle](man/figures/lifecycle-experimental.svg)](man/figures/lifecycle-experimental.svg)\n[![R-CMD-check](https://github.com/TimTeaFan/dplyover/workflows/R-CMD-check/badge.svg)](https://github.com/TimTeaFan/dplyover/actions)\n[![Codecov test coverage](https://codecov.io/gh/TimTeaFan/dplyover/branch/main/graph/badge.svg)](https://codecov.io/gh/TimTeaFan/dplyover?branch=main)\n[![CodeFactor](https://www.codefactor.io/repository/github/timteafan/dplyover/badge)](https://www.codefactor.io/repository/github/timteafan/dplyover)\n[![CRAN status](https://www.r-pkg.org/badges/version/dplyover)](https://cran.r-project.org/package=dplyover)\n\u003c!-- badges: end --\u003e\n\n## Overview\n\n\u003ca href=\"https://raw.githubusercontent.com/TimTeaFan/dplyover/main/man/figures/logo_big.png\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/TimTeaFan/dplyover/main/man/figures/logo.png\" alt=\"dplyover logo\" 
align=\"right\"\u003e\u003c/a\u003e\n\n{dplyover} extends {dplyr}'s functionality by building a function family\naround `dplyr::across()`.\n\nThe goal of this *over-across function family* is to provide a concise and\nuniform syntax which can be used to create columns by applying functions to\nvectors and/or sets of columns in {dplyr}. Ideally, this will:\n\n- **reduce the amount of code** to create variables derived from existing columns, \nwhich is especially helpful when doing exploratory data analysis (e.g. lagging, \ncollapsing, recoding etc. many variables in a similar way). \n- **provide a clean {dplyr} approach** to create many variables which are\ncalculated based on two or more variables. \n- **improve our mental model** so that it is easier to tackle problems where the\nsolution is based on creating new columns.\n\nThe functions in the *over-across function family* create columns by applying\none or several functions to:\n\n - `dplyr::across()` a set of columns (not part of dplyover)\n - `over()` a vector (list or atomic vector)\n - `over2()` two vectors of the same length (sequentially^#^)\n - `over2x()` two vectors (nested^+^)\n - `across2()` two sets of columns (sequentially^#^)\n - `across2x()` two sets of columns (nested^+^)\n - `crossover()` a set of columns and a vector (nested^+^)\n\n\u003csmall\u003e# \"sequentially\" means that the function is applied to the\nfirst pair of elements, `x[[1]]` and `y[[1]]`, then to the second pair of elements\nand so on.\u003c/small\u003e\u003cbr\u003e\n\u003csmall\u003e+ \"nested\" means that the function is applied to all combinations\nbetween elements in `x` and `y` similar to a nested loop.\u003c/small\u003e\n\n\n## Installation\n\n{dplyover} is not on CRAN. 
You can install the latest version from \n[GitHub](https://github.com/) with:\n\n```{r, eval = FALSE}\n# install.packages(\"remotes\")\nremotes::install_github(\"TimTeaFan/dplyover\")\n```\n\n## Getting started\n\nBelow are a few examples of {dplyover}'s *over-across function family*. More\nfunctions and workarounds for tackling the problems below without {dplyover}\ncan be found in the vignette \u003ca href=\"https://timteafan.github.io/dplyover/articles/why_dplyover.html\"\u003e\"Why dplyover?\"\u003c/a\u003e.\n\n```{r, setup, warning = FALSE, message = FALSE}\n# dplyover is an extension of dplyr and won't work without it\nlibrary(dplyr)\nlibrary(dplyover)\n\n# For better printing:\niris \u003c- as_tibble(iris)\n```\n\n#### Apply functions to a vector\n\n`over()` applies one or several functions to a vector. We can use it inside\n`dplyr::mutate()` to create several similar variables that we derive from an\nexisting column. This is helpful in cases where we want to create a batch of\nsimilar variables with only slight changes in the argument values of the\ncalling function. Good examples are `lag` and `lead` variables. Below we use\ncolumn 'a' to create lag and lead variables by `1`, `2` and `3` positions.\n`over()`'s `.names` argument lets us put nice names on the output columns.\n\n```{r} \ntibble(a = 1:25) %\u003e%\n  mutate(over(c(1:3),\n              list(lag  = ~ lag(a, .x),\n                   lead = ~ lead(a, .x)),\n              .names = \"a_{fn}{x}\"))\n```\n\n#### Apply functions to a set of columns and a vector simultaneously\n\n`crossover()` applies the functions in `.fns` to every combination of columns in\n`.xcols` with elements in `.y`. This is similar to the example above, but this time,\nwe use a set of columns. Below we create five lagged variables for each of\n'Sepal.Length' and 'Sepal.Width'. 
Again, we use a named list as argument in `.fns`\nto create nice names by specifying the glue syntax in `.names`.\n\n```{r}\niris %\u003e%\n   transmute(\n     crossover(starts_with(\"sepal\"),\n                1:5,\n                list(lag = ~ lag(.x, .y)),\n                .names = \"{xcol}_{fn}{y}\")) %\u003e%\n   glimpse\n```\n\n\n#### Apply functions to a set of variable pairs\n\n`across2()` can be used to transform pairs of variables with one or more functions.\nIn the example below we want to calculate the product and the sum of all pairs\nof 'Length' and 'Width' variables in the `iris` data set. We can use `{pre}` in\nthe glue specification in `.names` to extract the common prefix of each pair of\nvariables. We can further transform the names, here converting them to lower case with\n`tolower`, by specifying the `.names_fn` argument:\n\n```{r}\niris %\u003e%\n  transmute(across2(ends_with(\"Length\"),\n                    ends_with(\"Width\"),\n                    .fns = list(product = ~ .x * .y,\n                                sum = ~ .x + .y),\n                   .names = \"{pre}_{fn}\",\n                   .names_fn = tolower))\n```\n\n\n## Performance and Compatibility\n\nThis is an experimental package which I started developing with my own use cases\nin mind. I tried to keep the effort low, which is why this package *does not* \ninternalize (read: copy) internal {dplyr} functions (especially the 'context\ninternals'). This made it relatively easy to develop the package without:\n\n1. copying tons of {dplyr} code,\n1. having to figure out which dplyr-functions use the copied internals and\n1. 
finally overwriting these functions (like `mutate` and other one-table verbs),\n  which would eventually lead to conflicts with other add-on packages, for\n  example {tidylog}.\n\nHowever, the downside is that not relying on {dplyr} internals has some negative\neffects in terms of performance and compatibility.\n\nIn a nutshell this means:\n\n- The *over-across function family* in {dplyover} is slower than the\noriginal `dplyr::across`. Up until {dplyr} 1.0.3 the overhead was not too big,\nbut `dplyr::across` got much faster with {dplyr} 1.0.4 which is why the gap has\nwidened a lot.\n- Although {dplyover} is designed to work in {dplyr}, some features and\nedge cases will not work correctly.\n  \nThe good news is that even without relying on {dplyr} internals most of the\noriginal functionality can be replicated and, although less performant,\nthe current setup is optimized and does not fall too far behind in terms of speed -\nat least when compared to the pre v1.0.4 `dplyr::across`.\n\nRegarding compatibility, I have spent quite some time testing the package and\nI was able to replicate most of the tests for `dplyr::across` successfully. \n\nFor more information on the performance and compatibility of {dplyover} see the\nvignette \u003ca href=\"https://timteafan.github.io/dplyover/articles/performance.html\"\u003e\"Performance and Compatibility\"\u003c/a\u003e.\n\n\n## History\n\nI originally opened a\n[feature request on GitHub](https://github.com/tidyverse/dplyr/issues/4834) to\ninclude a very special case version of `over` (or at that time `mutate_over`)\ninto {dplyr}. The advice then was to make this kind of functionality available\nin a separate package. 
While I was working on this very special case version of\n`over`, I realized that the more general use case resembles a `purrr::map`\nfunction for use inside {dplyr} verbs with different variants, which led me to the\n*over-across function family*.\n\n\n## Acknowledgements and Disclaimer\n\nThis package is not only an extension of {dplyr}. The main functions in\n{dplyover} are directly derived from and based on `dplyr::across()` (dplyr's license\nand copyrights apply!). So if this package is working correctly, all the credit\nshould go to the dplyr team. \n\nMy own \"contribution\" (if you want to call it that) merely consists of: \n\n 1. removing the dependencies on {dplyr}'s internal functions, and\n 2. slightly changing `across`' logic to make it work for vectors and a\ncombination of two vectors and/or sets of columns.\n\nIn doing so I most definitely introduced some bugs and edge cases which won't work, \nfor which I am the only one to blame.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTimTeaFan%2Fdplyover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTimTeaFan%2Fdplyover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTimTeaFan%2Fdplyover/lists"}