Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/coolbutuseless/dplyr-cli
Manipulate CSV files on the command line using dplyr
https://github.com/coolbutuseless/dplyr-cli
Last synced: 4 days ago
JSON representation
Manipulate CSV files on the command line using dplyr
- Host: GitHub
- URL: https://github.com/coolbutuseless/dplyr-cli
- Owner: coolbutuseless
- License: mit
- Created: 2020-04-20T11:47:04.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-02-09T13:15:11.000Z (almost 3 years ago)
- Last Synced: 2024-10-26T20:35:42.201Z (18 days ago)
- Language: R
- Size: 36.1 KB
- Stars: 269
- Watchers: 8
- Forks: 20
- Open Issues: 1
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE-MIT.txt
Awesome Lists containing this project
- jimsghstars - coolbutuseless/dplyr-cli - Manipulate CSV files on the command line using dplyr (R)
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "# "
)
```# dplyr-cli
![](https://img.shields.io/badge/cool-useless-green.svg)
`dplyr-cli` uses the `Rscript` executable to
run dplyr commands on CSV files in the terminal.`dplyr-cli` makes use of the terminal pipe `|` instead of the magrittr pipe (`%>%`)
to run sequences of commands.```
cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
#> | cyl| mpg|
#> |---:|--------:|
#> | 4| 26.66364|
#> | 6| 19.74286|
#> | 8| 15.10000|
```## Motivation
I wanted to be able to do quick hacks on CSV files on the command line using
dplyr syntax, but without actually starting a proper R session.## What dplyr commands are supported?
Any command of the form:
* `dplyr::verb(.data, code)`
* `dplyr::*_join(.data, .rhs)`Currently two extra commands are supported which are not part of `dplyr`.
* `csv` performs no dplyr command, but only outputs the input data as CSV to stdout
* `kable` performs no dplyr command, but only outputs the input data as a
`knitr::kable()` formatted string to stdout## Limitations
* Only tested under 'bash' on OSX. YMMV.
* Every command runs in a separate R session.
* When using special shell characters such as `()`, you'll have to quote
your code arguments. Some shells will require more quoting than others.
* "joins" (such as `left_join`) do not currently let you specify the `by` argument,
so there must be columns in common to both dataset## Usage
```{sh}
dplyr --help
```## History
#### v0.1.0 2020-04-20
* Initial release
#### v0.1.1 2020-04-21
* Switch to 'Rscript' for easier install for users
* rename 'dplyr.sh' to just 'dplyr'#### v0.1.2 2020-04-21
* Support for joins e.g. `left_join`
#### v0.1.3 2020-04-22
* More robust tmpdir handling
#### v0.1.4 2022-01-23
* Fix handling for latest `read_csv()`. Fixes #9
## Contributors
* [aborusso](https://github.com/aborruso) - documentation
## Installation
Because this script straddles a great divide between R and the shell, you need
to ensure both are set up correctly for this to work.1. Install R packages
2. Clone this repo and put `dplyr` in your path#### Install R packages - within R
`dplyr-cli` is run from the shell but at every invocation is starting a new
rsession where the following packages are expected to be installed:```{r eval=FALSE}
install.packages('readr') # read in CSV data
install.packages('dplyr') # data manipulation
install.packages('docopt') # CLI description language
```Click to reveal instructions for installing packages on the command line
To do it from the cli on a linux-ish system, install `r-base` (`sudo apt -y install r-base`) and then run
```bash
sudo su - -c "R -e \"install.packages('readr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('dplyr', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('docopt', repos='http://cran.rstudio.com/')\""
```#### Clone this repo and put `dplyr` in your path
You'll then need to download the shell script from this repository and put `dplyr`
somewhere in your path.```
git clone https://github.com/coolbutuseless/dplyr-cli
cp dplyr-cli/dplyr ./somewhere/in/your/search/path
```# Example data
Put an example CSV file on the filesystem. Note: This CSV file is now included as
`mtcars.csv` as part of this git repository, as is a second CSV file for
demonstrating joins - `cyl.csv````{r}
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
```# Example 1 - Basic Usage
```{sh}
# cat contents of input CSV into dplyr-cli.
# Use '-c' to output CSV if this is the final step
cat mtcars.csv | dplyr filter -c "mpg == 21"
``````{sh}
# Put quotes around any commands which contain special characters like <>()
cat mtcars.csv | dplyr filter -c "mpg < 11"
``````{sh}
# Combine dplyr commands with shell 'head' command
dplyr select --file mtcars.csv -c cyl | head -n 6
```# Example 2 - Simple piping of commands (with shell pipe, not magrittr pipe)
```{sh}
cat mtcars.csv | \
dplyr mutate "cyl2 = 2 * cyl" | \
dplyr filter "cyl == 8" | \
dplyr kable
```# Example 3 - set up some aliases for convenience
```{sh}
alias mutate="dplyr mutate"
alias filter="dplyr filter"
alias select="dplyr select"
alias summarise="dplyr summarise"
alias group_by="dplyr group_by"
alias ungroup="dplyr ungroup"
alias count="dplyr count"
alias arrange="dplyr arrange"
alias kable="dplyr kable"cat mtcars.csv | group_by cyl | summarise "mpg = mean(mpg)" | kable
```# Example 4 - joins
Limitations:
* first argument after a join command must be an existing file (either CSV or RDS)
* You can't yet specify a `by` argument for a join, so there must be a column in
common to join by
```{sh}
cat cyl.csv
``````{sh}
cat mtcars.csv | dplyr inner_join cyl.csv | dplyr kable
```## Security warning
`dplyr-cli` uses `eval(parse(text = ...))` on user input. Do not expose this
program to the internet or random users under any circumstances.## Inspirations
* [xsv](https://github.com/BurntSushi/xsv) - a fast CSV command line toolkit
written in Rust
* [jq](https://stedolan.github.io/jq/) - a command line JSON processor.
* [miller](http://johnkerl.org/miller/doc/)