https://github.com/coolbutuseless/dplyr-cli
Manipulate CSV files on the command line using dplyr
- Host: GitHub
- URL: https://github.com/coolbutuseless/dplyr-cli
- Owner: coolbutuseless
- License: mit
- Created: 2020-04-20T11:47:04.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-09T13:15:11.000Z (over 3 years ago)
- Last Synced: 2025-03-27T02:10:03.965Z (7 months ago)
- Language: R
- Size: 36.1 KB
- Stars: 269
- Watchers: 7
- Forks: 20
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE-MIT.txt
Awesome Lists containing this project
- jimsghstars - coolbutuseless/dplyr-cli - Manipulate CSV files on the command line using dplyr (R)
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "# "
)
```
# dplyr-cli

`dplyr-cli` uses the `Rscript` executable to
run dplyr commands on CSV files in the terminal.
`dplyr-cli` makes use of the terminal pipe `|` instead of the magrittr pipe (`%>%`)
to run sequences of commands.
```
# Uses the aliases set up in Example 3 (e.g. group_by="dplyr group_by")
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
#> | cyl|      mpg|
#> |---:|--------:|
#> |   4| 26.66364|
#> |   6| 19.74286|
#> |   8| 15.10000|
```
## Motivation
I wanted to be able to do quick hacks on CSV files on the command line using
dplyr syntax, but without actually starting a proper R session.
## What dplyr commands are supported?
Any command of the form:
* `dplyr::verb(.data, code)`
* `dplyr::*_join(.data, .rhs)`
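
As an illustration of the first form, the sketch below prints the R call that a verb and its code argument stand for. `r_call` is a hypothetical helper, not part of `dplyr-cli`; it only formats text and says nothing about how the script builds or evaluates the call internally.

```bash
# Hypothetical helper: print the dplyr call that a verb plus code string stands for
r_call() {
  verb=$1
  shift
  printf 'dplyr::%s(.data, %s)\n' "$verb" "$*"
}

r_call filter "mpg < 11"            # prints: dplyr::filter(.data, mpg < 11)
r_call summarise "mpg = mean(mpg)"  # prints: dplyr::summarise(.data, mpg = mean(mpg))
```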
Currently two extra commands are supported which are not part of `dplyr`.
* `csv` performs no dplyr command, but only outputs the input data as CSV to stdout
* `kable` performs no dplyr command, but only outputs the input data as a
`knitr::kable()` formatted string to stdout
## Limitations
* Only tested under 'bash' on OSX. YMMV.
* Every command runs in a separate R session.
* When using special shell characters such as `()`, you'll have to quote
your code arguments. Some shells will require more quoting than others.
* "joins" (such as `left_join`) do not currently let you specify the `by` argument,
so there must be columns in common to both datasets.
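
The quoting point can be checked without R at all. The sketch below uses a hypothetical helper, `show_args` (not part of `dplyr-cli`), to print each argument the shell hands to a command, one per line. With quotes, the whole expression arrives as a single argument. Without them, bash would reject the parentheses as a syntax error and would treat `<` as input redirection.

```bash
# Hypothetical helper: print every argument on its own line, wrapped in brackets
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# The quoted expression is passed through as one argument
show_args summarise "mpg = mean(mpg)"

# Quoting also keeps '<' from being read as a redirection
show_args filter "mpg < 11"
```

Single quotes work the same way and additionally stop the shell from expanding `$` inside the expression.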
## Usage
```{sh}
dplyr --help
```
## History
#### v0.1.0 2020-04-20
* Initial release
#### v0.1.1 2020-04-21
* Switch to 'Rscript' for easier install for users
* rename 'dplyr.sh' to just 'dplyr'
#### v0.1.2 2020-04-21
* Support for joins e.g. `left_join`
#### v0.1.3 2020-04-22
* More robust tmpdir handling
#### v0.1.4 2022-01-23
* Fix handling for latest `read_csv()`. Fixes #9
## Contributors
* [aborruso](https://github.com/aborruso) - documentation
## Installation
Because this script straddles a great divide between R and the shell, you need
to ensure both are set up correctly for this to work.
1. Install R packages
2. Clone this repo and put `dplyr` in your path
#### Install R packages - within R
`dplyr-cli` is run from the shell, but every invocation starts a new
R session, which expects the following packages to be installed:
```{r eval=FALSE}
install.packages('readr') # read in CSV data
install.packages('dplyr') # data manipulation
install.packages('docopt') # CLI description language
```
To install the packages from the command line on a Linux-like system, first install `r-base` (`sudo apt -y install r-base`) and then run:
```bash
sudo su - -c "R -e \"install.packages('readr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('dplyr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('docopt', repos='http://cran.rstudio.com/')\""
```
#### Clone this repo and put `dplyr` in your path
You'll then need to download the shell script from this repository and put `dplyr`
somewhere in your path.
```
git clone https://github.com/coolbutuseless/dplyr-cli
cp dplyr-cli/dplyr ./somewhere/in/your/search/path
```
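A quick way to confirm both halves of the setup is to check that `Rscript` and `dplyr` can be found on the search path. This is a minimal sketch; `check_cmd` is a hypothetical helper, and it does not verify that the R packages themselves are installed.

```bash
# Hypothetical helper: report whether a command can be found on the search path
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

check_cmd Rscript || echo "install R first"
check_cmd dplyr || echo "copy the 'dplyr' script to a directory on your PATH"
```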
# Example data
Put an example CSV file on the filesystem. Note: This CSV file is now included as
`mtcars.csv` as part of this git repository, as is a second CSV file for
demonstrating joins - `cyl.csv`
```{r}
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
```
# Example 1 - Basic Usage
```{sh}
# cat contents of input CSV into dplyr-cli.
# Use '-c' to output CSV if this is the final step
cat mtcars.csv | dplyr filter -c "mpg == 21"
```
```{sh}
# Put quotes around any commands which contain special characters like <>()
cat mtcars.csv | dplyr filter -c "mpg < 11"
```
```{sh}
# Combine dplyr commands with shell 'head' command
dplyr select --file mtcars.csv -c cyl | head -n 6
```
# Example 2 - Simple piping of commands (with shell pipe, not magrittr pipe)
```{sh}
cat mtcars.csv | \
dplyr mutate "cyl2 = 2 * cyl" | \
dplyr filter "cyl == 8" | \
dplyr kable
```
# Example 3 - set up some aliases for convenience
```{sh}
alias mutate="dplyr mutate"
alias filter="dplyr filter"
alias select="dplyr select"
alias summarise="dplyr summarise"
alias group_by="dplyr group_by"
alias ungroup="dplyr ungroup"
alias count="dplyr count"
alias arrange="dplyr arrange"
alias kable="dplyr kable"
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
```
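Aliases are mainly an interactive convenience: bash does not expand them in non-interactive scripts unless `shopt -s expand_aliases` is set. If you want the same shortcuts inside a script, one option (a sketch, not something `dplyr-cli` ships) is to define shell functions that forward their arguments to `dplyr`.

```bash
# Define one wrapper function per verb; each forwards its arguments to 'dplyr'.
# 'select' is left out because it is a reserved word in bash and cannot be
# used as a function name this way.
for verb in mutate filter summarise group_by ungroup count arrange kable; do
  eval "$verb() { dplyr $verb \"\$@\"; }"
done

# Usage, once 'dplyr' is on your path:
#   cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
```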
# Example 4 - joins
Limitations:
* The first argument after a join command must be an existing file (either CSV or RDS)
* You can't yet specify a `by` argument for a join, so there must be a column in
common to join by
```{sh}
cat cyl.csv
```
```{sh}
cat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable
```
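Because the join relies on shared column names, it can help to compare the header rows of the two files before joining. The sketch below is one way to do that in plain shell. `common_columns` is a hypothetical helper, the sample files are made up for the demonstration, and it assumes simple headers with no embedded commas.

```bash
# Hypothetical helper: print column names found in the header row of both CSV files.
# Assumes simple headers: comma-separated names with no embedded commas or newlines.
common_columns() {
  cols_b=$(head -n 1 "$2" | tr -d '"\r' | tr ',' '\n')
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' |
    while IFS= read -r col; do
      # Keep the column only if the second header contains the exact same name
      if printf '%s\n' "$cols_b" | grep -Fxq -- "$col"; then
        printf '%s\n' "$col"
      fi
    done
}

# Made-up sample files, used only for this demonstration
tmp=$(mktemp -d)
printf 'mpg,cyl,disp\n21,6,160\n' > "$tmp/a.csv"
printf 'cyl,description\n6,six\n' > "$tmp/b.csv"

common_columns "$tmp/a.csv" "$tmp/b.csv"   # prints: cyl

rm -r "$tmp"
```

If the helper prints nothing, the two files share no column name and the join has nothing to join by.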
## Security warning
`dplyr-cli` uses `eval(parse(text = ...))` on user input. Do not expose this
program to the internet or random users under any circumstances.
## Inspirations
* [xsv](https://github.com/BurntSushi/xsv) - a fast CSV command line toolkit
written in Rust
* [jq](https://stedolan.github.io/jq/) - a command line JSON processor.
* [miller](http://johnkerl.org/miller/doc/)