https://github.com/fmarotta/fplyr

Apply Functions to Blocks of Files

Host: GitHub
URL: https://github.com/fmarotta/fplyr
Owner: fmarotta
License: other
Created: 2019-11-07T18:27:39.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2023-08-24T12:45:18.000Z (over 2 years ago)
Last Synced: 2024-10-07T08:06:45.880Z (about 1 year ago)
Topics: bigdata, cran, r, rstats
Language: R
Homepage: https://cran.r-project.org/package=fplyr
Size: 2.17 MB
Stars: 3
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# fplyr

[![cran version-ago](https://www.r-pkg.org/badges/version-ago/fplyr)](https://www.r-pkg.org/badges/version-ago/fplyr)

[![cran logs](https://cranlogs.r-pkg.org/badges/fplyr)](https://cranlogs.r-pkg.org/badges/fplyr)

[![cran logs last-day](https://cranlogs.r-pkg.org/badges/last-day/fplyr)](https://cranlogs.r-pkg.org/badges/last-day/fplyr)

[![cran logs grand-total](https://cranlogs.r-pkg.org/badges/grand-total/fplyr)](https://cranlogs.r-pkg.org/badges/grand-total/fplyr)

This package combines the power of the `data.table` and `iotools` packages to efficiently read a large file block by block while applying a user-specified function to each block. The outputs can be collected into a list or written to an output file.

A 'block' is a set of contiguous lines that have the same value in the first field. The package is therefore not meant for every large file: it is useful only for files in which all the lines sharing a first-field value are adjacent, for example files sorted by their first field.
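The block-by-block idea can be illustrated with a minimal base-R sketch. This is not how `fplyr` is implemented (the package relies on `data.table` and `iotools`); the function `apply_blocks()` and its arguments are hypothetical, and the sketch assumes a tab-separated file whose first line is a header. It reads a fixed number of lines at a time and holds back the last run of equal keys, because that block may continue in the next chunk.

```r
# Hypothetical helper, NOT part of fplyr: apply FUN to each block of a
# tab-separated file with a header line, reading `chunk_lines` lines at a
# time. A block is a run of contiguous lines sharing the same first field.
apply_blocks <- function(path, FUN, chunk_lines = 1000L, sep = "\t") {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1L), sep, fixed = TRUE)[[1]]

  # First field of each line, used as the block key
  first_field <- function(lines) {
    vapply(strsplit(lines, sep, fixed = TRUE), `[`, character(1), 1L)
  }
  # Parse every run of equal keys in `lines` and apply FUN to it
  run_blocks <- function(lines) {
    runs <- rle(first_field(lines))
    run_id <- rep(seq_along(runs$lengths), runs$lengths)
    lapply(unname(split(lines, run_id)), function(block) {
      d <- read.table(text = block, sep = sep, col.names = header,
                      stringsAsFactors = FALSE)
      FUN(d)
    })
  }

  results <- list()
  pending <- character(0)  # last block seen so far; may continue in next chunk
  repeat {
    chunk <- readLines(con, n = chunk_lines)
    if (length(chunk) == 0L) break
    lines <- c(pending, chunk)
    runs <- rle(first_field(lines))
    n_last <- runs$lengths[length(runs$lengths)]
    n_done <- length(lines) - n_last
    # Every block before the last run is guaranteed to be complete
    if (n_done > 0L) results <- c(results, run_blocks(lines[seq_len(n_done)]))
    pending <- lines[(n_done + 1L):length(lines)]
  }
  # The final pending block is complete once the file is exhausted
  if (length(pending) > 0L) results <- c(results, run_blocks(pending))
  results
}
```

With a tab-separated version of the example file below, `apply_blocks(f, function(d) mean(d$V4))` would return one mean per block. Only the current chunk and the block in progress are kept in memory, so a single block must fit in memory even though the whole file need not.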

## Examples

A typical file that can be processed with `fplyr` looks like this:

```
V1   V2  V3     V4
ID01 ABC Berlin 0.1
ID01 DEF London 0.5
ID01 GHI Rome   0.3
ID02 ABC Lisbon 0.2
ID02 DEF Berlin 0.6
ID02 LMN Prague 0.8
ID02 OPQ Dublin 0.7
ID03 DEF Lisbon -0.1
ID03 LMN Berlin 0.01
ID03 XYZ Prague 0.2
```

The first block consists of the first three data lines (after the header), the second block of the next four lines, and the third and last block of the last three lines.
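The block boundaries above can be checked with base R's `rle()`, which reports runs of equal, contiguous values. The vector below simply copies the first field of the ten data lines; the second call, on a made-up vector, shows that a key whose lines are not adjacent produces more than one block.

```r
# First field (V1) of the ten data lines in the example file
ids <- c("ID01", "ID01", "ID01",
         "ID02", "ID02", "ID02", "ID02",
         "ID03", "ID03", "ID03")
blocks <- rle(ids)
blocks$values   # "ID01" "ID02" "ID03"
blocks$lengths  # 3 4 3

# Made-up input: ID01 is split by ID02, so it forms two separate blocks
rle(c("ID01", "ID02", "ID01"))$lengths  # 1 1 1
```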

Suppose you want to compute the mean of the fourth column for each ID in the first field. If the file is small, you can use `by()`. If the file is so big that it does not fit into the available memory, you can use one of the functions of this package instead. If the path to the above file is stored in the variable `f`, the following command returns a list in which each element is the mean of the fourth column for a single block.

```
library(fplyr)

# `f` holds the path to the file shown above
l <- flply(f, function(d) mean(d$V4))
```
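For comparison, the in-memory `by()` approach mentioned above can be sketched on the example data, keeping only the two columns it needs. `by()` groups rows by value rather than by contiguous run, so its groups coincide with `fplyr` blocks only when all the lines of each ID are adjacent, as they are here; the result is a `by` object, not the list that `flply()` returns.

```r
# Example data held in memory (only the columns needed here)
d <- data.frame(
  V1 = c("ID01", "ID01", "ID01", "ID02", "ID02", "ID02", "ID02",
         "ID03", "ID03", "ID03"),
  V4 = c(0.1, 0.5, 0.3, 0.2, 0.6, 0.8, 0.7, -0.1, 0.01, 0.2)
)
# by() splits the rows by the value of V1 and applies the function to each group
res <- by(d, d$V1, function(x) mean(x$V4))
# means: 0.3 (ID01), 0.575 (ID02), 0.0367 (ID03, rounded)
```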

## Installation

1. The package is on CRAN and it can be installed from there. Start R and enter:

```
install.packages("fplyr")
```

2. The development version can be installed from GitHub with `devtools` (but do it at your own risk). Start R and enter:

```
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("fmarotta/fplyr", ref = "devel")
```

3. To install an old release, browse to [the releases page](https://github.com/fmarotta/fplyr/releases) and download the `.tar.gz` archive of the required version; then, from the command line, enter:

```
# replace x.y.z with the version you downloaded
R CMD INSTALL fplyr-x.y.z.tar.gz
```
