{"id":19405954,"url":"https://github.com/randrescastaneda/joyn","last_synced_at":"2025-04-24T09:31:03.094Z","repository":{"id":45535522,"uuid":"350871345","full_name":"randrescastaneda/joyn","owner":"randrescastaneda","description":"joyn provides a set of tools to analyze the quality of merging (i.e., joining) data frames. It is a JOY to join with joyn","archived":false,"fork":false,"pushed_at":"2025-04-01T21:21:22.000Z","size":12654,"stargazers_count":9,"open_issues_count":3,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-20T06:47:47.422Z","etag":null,"topics":["join","merge"],"latest_commit_sha":null,"homepage":"https://randrescastaneda.github.io/joyn/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/randrescastaneda.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-23T22:10:33.000Z","updated_at":"2024-12-18T21:02:17.000Z","dependencies_parsed_at":"2023-12-11T17:28:18.633Z","dependency_job_id":"aac9b85c-7ec3-473d-9e9b-41a895da4cb9","html_url":"https://github.com/randrescastaneda/joyn","commit_stats":{"total_commits":179,"total_committers":1,"mean_commits":179.0,"dds":0.0,"last_synced_commit":"674ebacb955cd48198ae428a52788c9ece304d79"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randrescastaneda%2Fjoyn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randrescastaneda%2Fjoyn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/randrescastaneda%2Fjoyn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/randrescastaneda%2Fjoyn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/randrescastaneda","download_url":"https://codeload.github.com/randrescastaneda/joyn/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250600617,"owners_count":21456996,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["join","merge"],"created_at":"2024-11-10T11:40:28.950Z","updated_at":"2025-04-24T09:31:02.689Z","avatar_url":"https://github.com/randrescastaneda.png","language":"R","readme":"---\noutput: github_document\neditor_options: \n  markdown: \n    wrap: 72\n---\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# joyn\n\n\u003c!-- badges: start --\u003e\n\n`r badger::badge_cran_checks(\"joyn\")` `r badger::badge_cran_release(\"joyn\", \"orange\")` `r badger::badge_devel(\"randrescastaneda/joyn\", \"blue\")` `r badger::badge_codecov(\"randrescastaneda/joyn\")` `r badger::badge_lifecycle(\"maturing\", \"green\")`\n\n\n\u003c!-- badges: end --\u003e\n\n`joyn` empowers you to assess the results of joining data frames, making it easier and more efficient to combine your tables. 
Similar in philosophy to the `merge` command in `Stata`, `joyn` offers matching key variables and detailed join reports to ensure accurate and insightful results.\n\n## Motivation\n\nMerging tables in R can be tricky. Ensuring accuracy and understanding the joined data fully can be tedious tasks. That's where `joyn` comes in. Inspired by Stata's informative approach to merging, `joyn` makes the process smoother and more insightful.\n\nWhile standard R merge functions are powerful, they often lack features like assessing join accuracy, detecting potential issues, and providing detailed reports. `joyn` fills this gap by offering:\n\n* **Intuitive join handling:** Whether you're dealing with one-to-one, one-to-many, or many-to-many relationships, `joyn` helps you navigate them confidently.\n* **Informative reports:** Get clear insights into the join process with helpful reports that identify duplicate observations, missing values, and potential inconsistencies.\n\n## What makes `joyn` special?\n\nWhile standard R merge functions offer basic functionality, `joyn` goes above and beyond by providing comprehensive tools and features tailored to your data joining needs:\n\n**1. Flexibility in join types:** Choose your ideal join type (\"left\", \"right\", or \"inner\") with the `keep` argument. Unlike R's default, `joyn` performs a full join by default, ensuring all observations are included, but you have full control to tailor the results.\n\n**2. Seamless variable handling:** No more wrestling with duplicate variable names! 
`joyn` offers multiple options:\n\n* **Update values:** Use `update_values` or `update_NAs` to automatically update conflicting variables in the left table with values from the right table.\n\n* **Keep both (with different names):** Enable `keep_common_vars = TRUE` to retain both variables, each with a unique suffix.\n\n* **Selective inclusion:** Choose specific variables from the right table with `y_vars_to_keep`, ensuring you get only the data you need.\n\n**3. Relationship awareness:** `joyn` recognizes one-to-one, one-to-many, many-to-one, and many-to-many relationships between tables. While it defaults to many-to-many for compatibility, **remember this is often not ideal**. **Always specify the correct relationship using the `match_type` argument** for accurate and meaningful results.\n\n**4. Join success at a glance:** Get instant feedback on your join with the automatically generated reporting variable. Identify potential issues like unmatched observations or missing values to ensure data integrity and informed decision-making.\n\nBy addressing these common pain points and offering enhanced flexibility, `joyn` empowers you to confidently and effectively join your data frames, paving the way for deeper insights and data-driven success.\n\n\n## Performance and flexibility\n\n### The cost of Reliability\n\nWhile raw speed is essential, understanding your joins every step of the way is equally crucial. `joyn` prioritizes providing **insightful information** and preventing errors over solely focusing on speed. 
Unlike other functions, it adds:\n\n* **Meticulous checks:** `joyn` performs comprehensive checks to ensure your join is accurate and avoids potential missteps, like unmatched observations or missing values.\n* **Detailed reporting:** Get a clear picture of your join with a dedicated report, highlighting any issues you should be aware of.\n* **User-friendly summary:** Quickly grasp the join's outcome with a concise overview presented in a clear table.\n\nThese valuable features make `joyn` slightly slower than functions like `data.table::merge.data.table()` or `collapse::join()`. However, the benefits of **preventing errors and gaining invaluable insights** far outweigh the minor speed difference.\n\n### Know your needs, choose your tool\n\n* **Speed is your top priority for massive datasets?** Consider using `data.table` or `collapse` directly.\n* **Seek clear understanding and error prevention for your joins?** `joyn` is your trusted guide.\n\n### Protective by design\n\n`joyn` intentionally restricts certain actions and provides clear messages when encountering unexpected data configurations. This might seem **opinionated**, but it's designed to **protect you from accidentally creating inaccurate or misleading joins**. This \"safety net\" empowers you to confidently merge your data, knowing `joyn` has your back.\n\n### Flexibility\n\nCurrently, `joyn` focuses on the most common and valuable join types. Future development might explore expanding its flexibility based on user needs and feedback.\n\n## `joyn` as wrapper: Familiar Syntax, Familiar Power\n\nWhile `joyn::joyn()` offers the core functionality and Stata-inspired arguments, you might prefer a syntax more aligned with your existing workflow. 
`joyn` has you covered!\n\n**Embrace base R and `data.table`:**\n\n* `joyn::merge()`: Leverage familiar base R and `data.table` syntax for seamless integration with your existing code.\n\n**Join with flair using `dplyr`:**\n\n* `joyn::{dplyr verbs}()`: Enjoy the intuitive [verb-based](https://dplyr.tidyverse.org/reference/mutate-joins.html) syntax of `dplyr` for a powerful and expressive way to perform joins.\n\n**Dive deeper:** Explore the corresponding vignettes to unlock the full potential of these alternative interfaces and find the perfect fit for your data manipulation style.\n\n\n## Installation\n\nYou can install the stable version of `joyn` from\n[CRAN](https://CRAN.R-project.org) with:\n\n``` r\ninstall.packages(\"joyn\")\n```\n\nYou can install the development version from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"randrescastaneda/joyn\")\n```\n\n## Examples\n\n```{r example}\n\nlibrary(joyn)\nlibrary(data.table)\n\nx1 = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),\n                t  = c(1L, 2L, 1L, 2L, NA_integer_),\n                x  = 11:15)\n\ny1 = data.table(id = c(1,2, 4),\n                y  = c(11L, 15L, 16))\n\n\nx2 = data.table(id = c(1, 4, 2, 3, NA),\n                t  = c(1L, 2L, 1L, 2L, NA_integer_),\n                x  = c(16, 12, NA, NA, 15))\n\n\ny2 = data.table(id = c(1, 2, 5, 6, 3),\n                yd = c(1, 2, 5, 6, 3),\n                y  = c(11L, 15L, 20L, 13L, 10L),\n                x  = c(16:20))\n\n# using common variable `id` as key.\njoyn(x = x1, \n     y = y1,\n     match_type = \"m:1\")\n\n# keep just those observations that match\njoyn(x = x1, \n     y = y1, \n     match_type = \"m:1\",\n     keep = \"inner\")\n\n# Bad merge for not specifying by argument\njoyn(x = x2, \n     y = y2,\n     match_type = \"1:1\")\n\n# good merge, ignoring variable x from y\njoyn(x = x2, \n     y = y2,\n     by = \"id\",\n     match_type = \"1:1\")\n\n# update NAs in var x in table x from 
var x in y\njoyn(x = x2, \n     y = y2, \n     by = \"id\", \n     update_NAs = TRUE)\n\n# update values in var x in table x from var x in y\njoyn(x = x2, \n     y = y2, \n     by = \"id\", \n     update_values = TRUE)\n\n\n# do not bring any variable from y into x, just the report\njoyn(x = x2, \n     y = y2, \n     by = \"id\", \n     y_vars_to_keep = NULL)\n\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frandrescastaneda%2Fjoyn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frandrescastaneda%2Fjoyn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frandrescastaneda%2Fjoyn/lists"}