Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/coolbutuseless/dplyr-cli

Manipulate CSV files on the command line using dplyr
https://github.com/coolbutuseless/dplyr-cli

Last synced: 4 days ago
JSON representation

Manipulate CSV files on the command line using dplyr

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "# "
)
```

# dplyr-cli

![](https://img.shields.io/badge/cool-useless-green.svg)

`dplyr-cli` uses the `Rscript` executable to
run dplyr commands on CSV files in the terminal.

`dplyr-cli` makes use of the terminal pipe `|` instead of the magrittr pipe (`%>%`)
to run sequences of commands.

```
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
#> | cyl| mpg|
#> |---:|--------:|
#> | 4| 26.66364|
#> | 6| 19.74286|
#> | 8| 15.10000|
```

## Motivation

I wanted to be able to do quick hacks on CSV files on the command line using
dplyr syntax, but without actually starting a proper R session.

## What dplyr commands are supported?

Any command of the form:

* `dplyr::verb(.data, code)`
* `dplyr::*_join(.data, .rhs)`

Currently two extra commands are supported which are not part of `dplyr`.

* `csv` performs no dplyr command, but only outputs the input data as CSV to stdout
* `kable` performs no dplyr command, but only outputs the input data as a
`knitr::kable()` formatted string to stdout

## Limitations

* Only tested under 'bash' on OSX. YMMV.
* Every command runs in a separate R session.
* When using special shell characters such as `()`, you'll have to quote
your code arguments. Some shells will require more quoting than others.
* "joins" (such as `left_join`) do not currently let you specify the `by` argument,
so there must be columns in common to both dataset

## Usage

```{sh}
dplyr --help
```

## History

#### v0.1.0 2020-04-20

* Initial release

#### v0.1.1 2020-04-21

* Switch to 'Rscript' for easier install for users
* rename 'dplyr.sh' to just 'dplyr'

#### v0.1.2 2020-04-21

* Support for joins e.g. `left_join`

#### v0.1.3 2020-04-22

* More robust tmpdir handling

#### v0.1.4 2022-01-23

* Fix handling for latest `read_csv()`. Fixes #9

## Contributors

* [aborusso](https://github.com/aborruso) - documentation

## Installation

Because this script straddles a great divide between R and the shell, you need
to ensure both are set up correctly for this to work.

1. Install R packages
2. Clone this repo and put `dplyr` in your path

#### Install R packages - within R
`dplyr-cli` is run from the shell but at every invocation is starting a new
rsession where the following packages are expected to be installed:

```{r eval=FALSE}
install.packages('readr') # read in CSV data
install.packages('dplyr') # data manipulation
install.packages('docopt') # CLI description language
```

Click to reveal instructions for installing packages on the command line

To do it from the cli on a linux-ish system, install `r-base` (`sudo apt -y install r-base`) and then run

```bash
sudo su - -c "R -e \"install.packages('readr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('dplyr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('docopt', repos='http://cran.rstudio.com/')\""
```

#### Clone this repo and put `dplyr` in your path

You'll then need to download the shell script from this repository and put `dplyr`
somewhere in your path.

```
git clone https://github.com/coolbutuseless/dplyr-cli
cp dplyr-cli/dplyr ./somewhere/in/your/search/path
```

# Example data

Put an example CSV file on the filesystem. Note: This CSV file is now included as
`mtcars.csv` as part of this git repository, as is a second CSV file for
demonstrating joins - `cyl.csv`

```{r}
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
```

# Example 1 - Basic Usage

```{sh}
# cat contents of input CSV into dplyr-cli.
# Use '-c' to output CSV if this is the final step
cat mtcars.csv | dplyr filter -c "mpg == 21"
```

```{sh}
# Put quotes around any commands which contain special characters like <>()
cat mtcars.csv | dplyr filter -c "mpg < 11"
```

```{sh}
# Combine dplyr commands with shell 'head' command
dplyr select --file mtcars.csv -c cyl | head -n 6
```

# Example 2 - Simple piping of commands (with shell pipe, not magrittr pipe)

```{sh}
cat mtcars.csv | \
dplyr mutate "cyl2 = 2 * cyl" | \
dplyr filter "cyl == 8" | \
dplyr kable
```

# Example 3 - set up some aliases for convenience

```{sh}
alias mutate="dplyr mutate"
alias filter="dplyr filter"
alias select="dplyr select"
alias summarise="dplyr summarise"
alias group_by="dplyr group_by"
alias ungroup="dplyr ungroup"
alias count="dplyr count"
alias arrange="dplyr arrange"
alias kable="dplyr kable"

cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
```

# Example 4 - joins

Limitations:

* first argument after a join command must be an existing file (either CSV or RDS)
* You can't yet specify a `by` argument for a join, so there must be a column in
common to join by


```{sh}
cat cyl.csv
```

```{sh}
cat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable
```

## Security warning

`dplyr-cli` uses `eval(parse(text = ...))` on user input. Do not expose this
program to the internet or random users under any circumstances.

## Inspirations

* [xsv](https://github.com/BurntSushi/xsv) - a fast CSV command line toolkit
written in Rust
* [jq](https://stedolan.github.io/jq/) - a command line JSON processor.
* [miller](http://johnkerl.org/miller/doc/)