https://github.com/hope-data-science/tidyft

Tidy Verbs for Fast Data Operations by Reference

Host: GitHub
URL: https://github.com/hope-data-science/tidyft
Owner: hope-data-science
License: other
Created: 2020-04-10T10:47:27.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2023-01-11T06:37:12.000Z (over 2 years ago)
Last Synced: 2024-04-26T06:03:10.338Z (about 1 year ago)
Language: R
Homepage: https://hope-data-science.github.io/tidyft/
Size: 358 KB
Stars: 33
Watchers: 3
Forks: 3
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

jimsghstars - hope-data-science/tidyft - Tidy Verbs for Fast Data Operations by Reference (R)

README

# tidyft: Fast and Memory Efficient Data Operations in Tidy Syntax

 [![](https://www.r-pkg.org/badges/version/tidyft?color=black)](https://cran.r-project.org/package=tidyft) [![downloads](http://cranlogs.r-pkg.org/badges/grand-total/tidyft?color=D3D3D3)](https://r-pkg.org/pkg/tidyft) 

## Overview

*tidyft* is an extension of [data.table](https://github.com/Rdatatable/data.table). It modifies data by reference whenever possible, and is designed for big data analysis on high-performance desktop or laptop computers. Its syntax is similar or identical to that of [tidyverse](https://github.com/tidyverse/tidyverse). It aims to be user-friendly, memory-efficient and time-saving. For more information, see its ancestor package [tidyfst](https://github.com/hope-data-science/tidyfst).
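
The by-reference behaviour that tidyft builds on comes from data.table itself. A minimal sketch using plain data.table (not tidyft's own verbs): the `:=` operator adds a column without copying the table, which can be checked by comparing `data.table::address()` before and after. The column name `Sepal.Ratio` is illustrative only.

``` r
library(data.table)

DT <- as.data.table(iris)
before <- address(DT)   # memory address of the table before modification

# `:=` adds the column in place instead of building a modified copy
DT[, Sepal.Ratio := Sepal.Length / Sepal.Width]

after <- address(DT)
print(identical(before, after))  # TRUE if the table was modified in place
print(names(DT))
```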

This design works best for manipulating big data that does not fit in memory, using the facilities provided by [fst](https://hope-data-science.github.io/tidyft/reference/fst.html). Used this way, it lets you handle as much data as possible in the least time and space on your computer.
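
A minimal sketch of that out-of-memory workflow, assuming the fst package is installed. It uses fst's own `write_fst()` and `read_fst()` rather than tidyft's fst helpers (documented on the linked reference page): only the needed columns and row range are loaded from disk as a data.table, which a tidyft verb then sorts. The temporary file and the chosen columns and rows are illustrative.

``` r
library(data.table)
library(fst)
library(tidyft)

# Write a table to disk; in practice this would be a large file that does not fit in RAM
path <- tempfile(fileext = ".fst")
write_fst(iris, path)

# Load only two columns and the first 60 rows, directly as a data.table
DT <- read_fst(path, columns = c("Species", "Sepal.Length"),
               from = 1, to = 60, as.data.table = TRUE)

# Sort the loaded subset with a tidyft verb
DT <- tidyft::arrange(DT, Sepal.Length)

print(DT)
unlink(path)  # remove the temporary file
```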

## Installation

You can install the released version of tidyft via:

``` r
install.packages("tidyft")
```

## Example

This basic example shows the main thing to keep in mind: tidyft verbs modify a data.table in place.

``` r
library(tidyft)

# get first 5 rows of iris
as.data.table(iris)[1:5] -> a

# show
a
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1:          5.1         3.5          1.4         0.2  setosa
#> 2:          4.9         3.0          1.4         0.2  setosa
#> 3:          4.7         3.2          1.3         0.2  setosa
#> 4:          4.6         3.1          1.5         0.2  setosa
#> 5:          5.0         3.6          1.4         0.2  setosa

# if you select
a %>% select(1:3)
#>    Sepal.Length Sepal.Width Petal.Length
#> 1:          5.1         3.5          1.4
#> 2:          4.9         3.0          1.4
#> 3:          4.7         3.2          1.3
#> 4:          4.6         3.1          1.5
#> 5:          5.0         3.6          1.4

# you lose the unselected columns forever
a
#>    Sepal.Length Sepal.Width Petal.Length
#> 1:          5.1         3.5          1.4
#> 2:          4.9         3.0          1.4
#> 3:          4.7         3.2          1.3
#> 4:          4.6         3.1          1.5
#> 5:          5.0         3.6          1.4
```

If you still want to keep the original data, use `copy()` to make a copy beforehand.
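
A minimal sketch of that advice, reusing the table from the example: `select()` is applied to a copy made with `copy()`, so the original keeps all five columns. The name `b` is illustrative.

``` r
library(data.table)
library(tidyft)

a <- as.data.table(iris)[1:5]

# copy() makes an independent deep copy; by-reference changes to b leave a untouched
b <- copy(a)
tidyft::select(b, 1:3)

print(ncol(a))  # columns of the original
print(ncol(b))  # columns of the modified copy
```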

## Tutorial

See the [vignettes](https://hope-data-science.github.io/tidyft/).

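## Performance

The benchmark below enlarges `starwars` to 1e6 rows, then selects the name and color columns and sorts by the color columns with dplyr, data.table and tidyft, profiling all three with profvis.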

``` r
rm(list = ls())
library(profvis)
library(dplyr)
library(data.table)
library(tidyft)

# enlarge starwars to 1e6 rows by sampling rows with replacement
as.data.frame(starwars) -> starwars
starwars[sample.int(nrow(starwars), 1e6, replace = TRUE), ] -> starwars

# one independent copy per approach
copy(starwars) -> dat1
copy(starwars) -> dat2
copy(starwars) -> dat3

profvis({
  # dplyr
  dat1 %>%
    dplyr::as_tibble() %>%
    dplyr::select(name, dplyr::ends_with("color")) %>%
    dplyr::arrange(hair_color, skin_color, eye_color) -> a

  # data.table
  setorder(setDT(dat2)[, .SD, .SDcols = patterns("name|color$")],
           hair_color, skin_color, eye_color) -> b

  # tidyft
  dat3 %>%
    tidyft::setDT() %>%
    tidyft::select("name|color$") %>%
    tidyft::arrange(hair_color, skin_color, eye_color) -> c
})

all.equal(a, b)
#> [1] TRUE

all.equal(b, c)
#> [1] TRUE
```

![Profiling results for the dplyr, data.table and tidyft pipelines](performance.png)
