{"id":13857541,"url":"https://github.com/coolbutuseless/dplyr-cli","last_synced_at":"2025-04-13T10:44:50.065Z","repository":{"id":44534958,"uuid":"257263990","full_name":"coolbutuseless/dplyr-cli","owner":"coolbutuseless","description":"Manipulate CSV files on the command line using dplyr","archived":false,"fork":false,"pushed_at":"2022-02-09T13:15:11.000Z","size":37,"stargazers_count":269,"open_issues_count":1,"forks_count":20,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-27T02:10:03.965Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coolbutuseless.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-MIT.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-20T11:47:04.000Z","updated_at":"2025-03-22T10:51:35.000Z","dependencies_parsed_at":"2022-09-01T16:11:43.868Z","dependency_job_id":null,"html_url":"https://github.com/coolbutuseless/dplyr-cli","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fdplyr-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fdplyr-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fdplyr-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coolbutuseless%2Fdplyr-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coolbutuseless","download_url":"https://codeload.github.com/coolbutuseless/dplyr-cli/tar.gz/refs/heads/master","host":{"name":"
GitHub","url":"https://github.com","kind":"github","repositories_count":248702218,"owners_count":21148114,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:40.078Z","updated_at":"2025-04-13T10:44:50.046Z","avatar_url":"https://github.com/coolbutuseless.png","language":"R","funding_links":[],"categories":["R","Browse CSV"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = FALSE,\n  comment = \"# \"\n)\n```\n\n\n# dplyr-cli\n\n\u003c!-- badges: start --\u003e\n![](https://img.shields.io/badge/cool-useless-green.svg)\n\u003c!-- badges: end --\u003e\n\n`dplyr-cli` uses the `Rscript` executable to \nrun dplyr commands on CSV files in the terminal.\n\n`dplyr-cli` makes use of the terminal pipe `|` instead of the magrittr pipe (`%\u003e%`)\nto run sequences of commands.\n\n```\ncat mtcars.csv | group_by cyl | summarise \"mpg = mean(mpg)\" | kable\n#\u003e | cyl|      mpg|\n#\u003e |---:|--------:|\n#\u003e |   4| 26.66364|\n#\u003e |   6| 19.74286|\n#\u003e |   8| 15.10000|\n```\n\n## Motivation\n\nI wanted to be able to do quick hacks on CSV files on the command line using\ndplyr syntax, but without actually starting a proper R session.\n\n\n## What dplyr commands are supported?\n\nAny command of the form:\n\n* `dplyr::verb(.data, code)`\n* `dplyr::*_join(.data, .rhs)`\n\nCurrently two extra commands are supported which are not part of `dplyr`.\n\n* `csv` performs no dplyr command, but 
only outputs the input data as CSV to stdout\n* `kable` performs no dplyr command, but only outputs the input data as a\n  `knitr::kable()` formatted string to stdout\n\n\n## Limitations\n\n* Only tested under 'bash' on OSX. YMMV.\n* Every command runs in a separate R session.\n* When using special shell characters such as `()`, you'll have to quote \n  your code arguments.  Some shells will require more quoting than others.\n* \"joins\" (such as `left_join`) do not currently let you specify the `by` argument, \n  so there must be columns in common to both datasets\n\n## Usage\n\n```{sh}\ndplyr --help\n```\n\n## History\n\n\n#### v0.1.0 2020-04-20\n\n* Initial release\n\n#### v0.1.1 2020-04-21\n\n* Switch to 'Rscript' for easier install for users\n* Rename 'dplyr.sh' to just 'dplyr'\n\n#### v0.1.2 2020-04-21\n\n* Support for joins e.g. `left_join`\n\n#### v0.1.3 2020-04-22\n\n* More robust tmpdir handling\n\n#### v0.1.4 2022-01-23\n\n* Fix handling for latest `read_csv()`.  Fixes #9\n\n\n## Contributors\n\n* [aborruso](https://github.com/aborruso) - documentation\n\n\n## Installation\n\nBecause this script straddles a great divide between R and the shell, you need \nto ensure both are set up correctly for this to work.\n\n1. Install R packages\n2. 
Clone this repo and put `dplyr` in your path\n\n\n#### Install R packages - within R\n`dplyr-cli` is run from the shell, but every invocation starts a new \nR session in which the following packages are expected to be installed:\n\n\n```{r eval=FALSE}\ninstall.packages('readr')    # read in CSV data\ninstall.packages('dplyr')    # data manipulation\ninstall.packages('docopt')   # CLI description language\n```\n\n\u003cdetails\u003e\n\u003csummary\u003e Click to reveal instructions for installing packages on the command line\u003c/summary\u003e\n\nTo do it from the CLI on a linux-ish system, install `r-base` (`sudo apt -y install r-base`) and then run:\n\n```bash\nsudo su - -c \"R -e \\\"install.packages('readr', repos='http://cran.rstudio.com/')\\\"\"\nsudo su - -c \"R -e \\\"install.packages('dplyr', repos='http://cran.rstudio.com/')\\\"\"\nsudo su - -c \"R -e \\\"install.packages('docopt', repos='http://cran.rstudio.com/')\\\"\"\n```\n\n\u003c/details\u003e\n\n\n#### Clone this repo and put `dplyr` in your path\n\n\nYou'll then need to download the shell script from this repository and put `dplyr`\nsomewhere in your path.\n\n```\ngit clone https://github.com/coolbutuseless/dplyr-cli\ncp dplyr-cli/dplyr ./somewhere/in/your/search/path\n```\n\n\n# Example data\n\nPut an example CSV file on the filesystem. Note: This CSV file is now included as \n`mtcars.csv` as part of this git repository, as is a second CSV file for \ndemonstrating joins - `cyl.csv`.\n\n```{r}\nwrite.csv(mtcars, \"mtcars.csv\", row.names = FALSE)\n```\n\n# Example 1 - Basic Usage\n\n\n```{sh}\n# cat contents of input CSV into dplyr-cli.  
\n# Use '-c' to output CSV if this is the final step\ncat mtcars.csv | dplyr filter -c \"mpg == 21\"\n```\n\n\n```{sh}\n# Put quotes around any commands which contain special characters like \u003c\u003e()\ncat mtcars.csv | dplyr filter -c \"mpg \u003c 11\"\n```\n\n\n```{sh}\n# Combine dplyr commands with shell 'head' command\ndplyr select --file mtcars.csv -c cyl | head -n 6\n```\n\n\n# Example 2 - Simple piping of commands (with shell pipe, not magrittr pipe)\n\n```{sh}\ncat mtcars.csv | \\\n   dplyr mutate \"cyl2 = 2 * cyl\"  | \\\n   dplyr filter \"cyl == 8\" | \\\n   dplyr kable\n```\n\n\n# Example 3 - set up some aliases for convenience\n\n\n```{sh}\nalias mutate=\"dplyr mutate\"\nalias filter=\"dplyr filter\"\nalias select=\"dplyr select\"\nalias summarise=\"dplyr summarise\"\nalias group_by=\"dplyr group_by\"\nalias ungroup=\"dplyr ungroup\"\nalias count=\"dplyr count\"\nalias arrange=\"dplyr arrange\"\nalias kable=\"dplyr kable\"\n\n\ncat mtcars.csv | group_by cyl | summarise \"mpg = mean(mpg)\" | kable\n```\n\n\n# Example 4 - joins\n\nLimitations:\n\n* first argument after a join command must be an existing file (either CSV or RDS)\n* You can't yet specify a `by` argument for a join, so there must be a column in \n  common to join by\n  \n  \n```{sh}\ncat cyl.csv\n```\n\n\n```{sh}\ncat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable\n```\n\n\n\n## Security warning\n\n`dplyr-cli` uses `eval(parse(text = ...))` on user input.  
Do not expose this \nprogram to the internet or random users under any circumstances.\n\n\n## Inspirations\n\n* [xsv](https://github.com/BurntSushi/xsv) - a fast CSV command line toolkit \n  written in Rust\n* [jq](https://stedolan.github.io/jq/) - a command line JSON processor.\n* [miller](http://johnkerl.org/miller/doc/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoolbutuseless%2Fdplyr-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoolbutuseless%2Fdplyr-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoolbutuseless%2Fdplyr-cli/lists"}