{"id":16274196,"url":"https://github.com/trinker/dplyr_in_a_nutshell","last_synced_at":"2025-04-08T16:27:58.049Z","repository":{"id":146663398,"uuid":"16193608","full_name":"trinker/dplyr_in_a_nutshell","owner":"trinker","description":"This is a minimal guide, mostly for myself, to remind me of the most import dplyr functions and how they relate to base R functions I'm that familiar with.","archived":false,"fork":false,"pushed_at":"2017-09-01T02:00:15.000Z","size":36,"stargazers_count":35,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-14T13:15:38.201Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trinker.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-01-24T03:12:38.000Z","updated_at":"2019-08-13T15:34:12.000Z","dependencies_parsed_at":"2023-04-14T16:18:31.515Z","dependency_job_id":null,"html_url":"https://github.com/trinker/dplyr_in_a_nutshell","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Fdplyr_in_a_nutshell","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Fdplyr_in_a_nutshell/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Fdplyr_in_a_nutshell/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Fdplyr_in_a_nutshell/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trinker","download_url":"https://codeload.github.com/trinker/dplyr_in_a_nutshell/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247880881,"owners_count":21011751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T18:27:38.728Z","updated_at":"2025-04-08T16:27:58.029Z","avatar_url":"https://github.com/trinker.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"dplyr In a Nutshell\n===\n\nThis is a minimal guide, mostly for myself, to remind me of the most important dplyr functions and how they relate to base R functions that I'm familiar with. Also check out [tidyr In a Nutshell](https://github.com/trinker/tidyr_in_a_nutshell).\n\n```{r setup, include=FALSE, echo=FALSE}\nknitr::opts_chunk$set(comment=NA, tidy=FALSE)\n```\n\n# 8 dplyr Functions to Rule the World\n\n### Speedy Table\n\n`tbl_df`\n\n\n### The 5 Guys + 1\n\n1. `filter`\n2. `select`\n3. `mutate`\n4. `group_by`\n5. `summarise`\n6. 
`arrange`\n\n### Chaining (pronounced \"then\")\n\n`%\u003e%`\n\n# Relating the Functions\n\n### Speedy Table\n\n`tbl_df` works similarly to `data.table` in that it prints sensibly: instead of dumping every row to the console, it shows only the first few rows and the columns that fit the screen.\n\n### Relating the 5 Guys + 1 to base R\n\nList of dplyr functions and the base functions they're related to:\n\nBase Function    | dplyr Function(s) | Special Powers\n-----------------|-------------------|-----------------------------\nsubset           |  filter \u0026 select  | filter rows \u0026 select columns\ntransform        |  mutate           | use columns created earlier in the same call\nsplit            |  group_by         | splits without cutting\nlapply + do.call |  summarise        | apply and bind in a single bound\norder + with     |  arrange          | \"I only have to specify the data frame once?\"\n\n### Chaining\n\n`%\u003e%`... Do you know ggplot2's `+`?  Same idea.\n\n![](chain.png)\n\n*Basically, the result on the left side of the chain is supplied as the first argument to the function on the right side, so `x %\u003e% f(y)` is the same as `f(x, y)`.*\n\n# Demos\n\n### Speedy Table\n```{r, message=FALSE}\nlibrary(dplyr)\nmtcars2 \u003c- tbl_df(mtcars)\n```\n\n### The 5 Guys\n```{r, message=FALSE}\nfilter(mtcars2[1:10, ], cyl == 8)\nselect(mtcars2[1:10, ], mpg, cyl, hp:vs)\narrange(mtcars2[1:10, ], cyl, disp)\nmutate(mtcars2[1:10, ], displ_l = disp / 61.0237, displ_l_add1 = displ_l + 1)\nsummarise(mtcars2, mean(disp))\n```\n\n### Chaining\n\n```{r}\nmtcars2 %\u003e%\n    group_by(cyl) %\u003e%\n    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp))\nmtcars2 %\u003e%\n    group_by(cyl, gear) %\u003e%\n    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %\u003e%\n    arrange(-cyl, -gear)\n## Use `%\u003e%` with base functions too!!!\nmtcars2 %\u003e%\n    group_by(cyl, gear) %\u003e%\n    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %\u003e%\n    arrange(-cyl, -gear) %\u003e%\n    head()\nmtcars2 %\u003e%\n    group_by(cyl) %\u003e%\n    summarise(max(disp), hp[1])\nmtcars2 %\u003e%\n    group_by(cyl) %\u003e%\n    summarise(n = 
n()) \ntable(mtcars$cyl) \n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinker%2Fdplyr_in_a_nutshell","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrinker%2Fdplyr_in_a_nutshell","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinker%2Fdplyr_in_a_nutshell/lists"}