https://github.com/wahani/dat
Tools for Data Manipulation in R
- Host: GitHub
- URL: https://github.com/wahani/dat
- Owner: wahani
- License: other
- Created: 2015-11-24T10:47:35.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2020-12-04T09:35:13.000Z (almost 4 years ago)
- Last Synced: 2024-07-10T14:47:46.435Z (4 months ago)
- Language: R
- Homepage:
- Size: 190 KB
- Stars: 15
- Watchers: 6
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - wahani/dat - Tools for Data Manipulation in R (R)
README
[![Travis-CI Build Status](https://travis-ci.org/wahani/dat.svg?branch=master)](https://travis-ci.org/wahani/dat)
[![codecov.io](https://codecov.io/github/wahani/dat/coverage.svg?branch=master)](https://codecov.io/github/wahani/dat?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/dat)](http://cran.r-project.org/package=dat)
[![Downloads](http://cranlogs.r-pkg.org/badges/dat?color=brightgreen)](http://www.r-pkg.org/pkg/dat)

An implementation of common higher order functions with syntactic
sugar for anonymous functions. It also provides a link to 'dplyr' and
'data.table' for common transformations on data frames, working around
non-standard evaluation by default.

## Installation

### From GitHub
```r
remotes::install_github("wahani/dat")
```

### From CRAN
```r
install.packages("dat")
```

## Why should you care?

- You probably have to rewrite all your dplyr / data.table code once you put it
  inside a package, i.e. work around non-standard evaluation or find another
  way to satisfy `R CMD check`. And you don't like that.
- `dplyr` does not respect the class of the object it operates on; the class
  attribute changes on the fly.
- Neither `dplyr` nor `data.table` plays nicely with S4, but you really,
  really want an S4 *data.table* or *tbl_df*.
- You like currying as in `rlist` and `purrr`.

## Tools for data manipulation

The examples are from the introductory vignette of `dplyr`. You still work with
data frames, so you can simply mix in dplyr features whenever you need them.

```r
library("nycflights13")
library("dat")
```

### Select rows

We can use `mutar` to select rows. When you reference a variable in the data
frame, you indicate this with a one-sided formula.

```r
mutar(flights, ~ month == 1 & day == 1)
mutar(flights, ~ 1:10)
```

And for sorting:

```r
mutar(flights, ~ order(year, month, day))
```

```
## # A tibble: 336,776 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

### Select cols

You can use characters, logicals, regular expressions and functions to select
columns. Regular expressions are indicated by a leading "^".

```r
flights %>%
  extract(c("year", "month", "day")) %>%
  extract("^day$") %>%
  extract(is.numeric)
```

### Operations on columns

The main difference between `dplyr::mutate` and `mutar` is that you use a `~`
instead of `=`.
```r
mutar(
  flights,
  gain ~ arr_delay - dep_delay,
  speed ~ distance / air_time * 60
)
```

Grouping data is handled within `mutar`:

```r
mutar(flights, n ~ .N, by = "month")
```

```r
mutar(flights, delay ~ mean(dep_delay, na.rm = TRUE), by = "month")
```

You can also provide additional arguments to a formula. This is especially
helpful when you want to pass arguments from a function to such expressions. The
additional argument, given after `|`, can be anything that you can use to select
columns (character, regular expression, function) or a named list where each
element is a character.

```r
mutar(
  flights,
  .n ~ mean(.n, na.rm = TRUE) | "^.*delay$",
  .x ~ mean(.x, na.rm = TRUE) | list(.x = "arr_time"),
  by = "month"
)
```

```
## # A tibble: 336,776 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <dbl>          <int>
##  1  2013     1     1      517            515      10.0    1523.            819
##  2  2013     1     1      533            529      10.0    1523.            830
##  3  2013     1     1      542            540      10.0    1523.            850
##  4  2013     1     1      544            545      10.0    1523.           1022
##  5  2013     1     1      554            600      10.0    1523.            837
##  6  2013     1     1      554            558      10.0    1523.            728
##  7  2013     1     1      555            600      10.0    1523.            854
##  8  2013     1     1      557            600      10.0    1523.            723
##  9  2013     1     1      557            600      10.0    1523.            846
## 10  2013     1     1      558            600      10.0    1523.            745
## # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
## #   carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## #   air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
```

## A link to S4

Using this package you can create S4 classes that contain a data frame (or a
data.table) and use the interface to `dplyr`. Neither `dplyr` nor `data.table`
supports integration with S4. The main function here is `mutar`, which is
generic enough to cover subsetting of rows and columns as well as mutate and
summarise. In the background, `dplyr`'s ability to work on a `data.table` is
used.

```r
library("data.table")

# S4 class that contains a data.table
setClass("DataTable", "data.table")

DataTable <- function(...) {
  new("DataTable", data.table::data.table(...))
}

# Route `[` through mutar
setMethod("[", "DataTable", mutar)

dtflights <- do.call(DataTable, nycflights13::flights)
dtflights[1:10, c("year", "month", "day")]
dtflights[n ~ .N, by = "month"]
dtflights[n ~ .N, sby = "month"]

dtflights %>%
  filtar(~month > 6) %>%
  mutar(n ~ .N, by = "month") %>%
  sumar(n ~ data.table::first(n), by = "month")
```

## Working with vectors

Inspired by `rlist` and `purrr`, some low-level operations on vectors are
supported. The aim here is to integrate syntactic sugar for anonymous functions.
Furthermore, the functions should support the use of pipes.

- `map` and `flatmap` as replacements for the apply functions
- `extract` for subsetting
- `replace` for replacing elements in a vector

What we can do with map:

```r
map(1:3, ~ .^2)
flatmap(1:3, ~ .^2)
map(1:3 ~ 11:13, c) # zip
dat <- data.frame(x = 1, y = "")
map(dat, x ~ x + 1, is.numeric)
```

What we can do with extract:

```r
extract(1:10, ~ . %% 2 == 0) %>% sum
extract(1:15, ~ 15 %% . == 0)
l <- list(aList = list(x = 1), aAtomic = "hi")
extract(l, "^aL")
extract(l, is.atomic)
```

What we can do with replace:

```r
replace(c(1, 2, NA), is.na, 0)
replace(c(1, 2, NA), rep(TRUE, 3), 0)
replace(c(1, 2, NA), 3, 0)
replace(list(x = 1, y = 2), "x", 0)
replace(list(x = 1, y = 2), "^x$", 0)
replace(list(x = 1, y = "a"), is.character, NULL)
```
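
As a rough picture of how the formula shorthand and the different selector types in this section might fit together, here is a minimal base R sketch. It is an assumption for illustration only and not the package's implementation: `as_function_sketch` and `extract_sketch` are hypothetical names, and the real `extract` may treat formulas and edge cases differently.

```r
# Hypothetical helper (not part of dat): turn a one-sided formula such as
# ~ . %% 2 == 0 into a function of one argument that is bound to `.`.
as_function_sketch <- function(f) {
  if (is.function(f)) return(f)
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  body_expr <- f[[2]]       # right-hand side of the one-sided formula
  env <- environment(f)     # where free variables in the formula are looked up
  function(.) eval(body_expr, list(. = .), env)
}

# Hypothetical selector dispatch (not part of dat): formulas and functions
# act as element-wise predicates, a single string starting with "^" is
# matched against the names, and anything else is used as a plain index.
extract_sketch <- function(x, ind) {
  if (is.function(ind) || inherits(ind, "formula")) {
    keep <- vapply(x, as_function_sketch(ind), logical(1))
    return(x[keep])
  }
  if (is.character(ind) && length(ind) == 1L && startsWith(ind, "^")) {
    return(x[grepl(ind, names(x))])
  }
  x[ind]
}

sum(extract_sketch(1:10, ~ . %% 2 == 0))  # 30
extract_sketch(1:15, ~ 15 %% . == 0)      # 1 3 5 15

l <- list(aList = list(x = 1), aAtomic = "hi")
names(extract_sketch(l, "^aL"))           # "aList"
names(extract_sketch(l, is.atomic))       # "aAtomic"
```

The sketch evaluates the predicate once per element to keep the logic easy to follow; a vectorised evaluation would be a reasonable alternative design.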