https://github.com/fmarotta/fplyr

Apply Functions to Blocks of Files

# fplyr

[![cran version-ago](https://www.r-pkg.org/badges/version-ago/fplyr)](https://www.r-pkg.org/badges/version-ago/fplyr)
[![cran logs](https://cranlogs.r-pkg.org/badges/fplyr)](https://cranlogs.r-pkg.org/badges/fplyr)
[![cran logs last-day](https://cranlogs.r-pkg.org/badges/last-day/fplyr)](https://cranlogs.r-pkg.org/badges/last-day/fplyr)
[![cran logs grand-total](https://cranlogs.r-pkg.org/badges/grand-total/fplyr)](https://cranlogs.r-pkg.org/badges/grand-total/fplyr)

This package combines the `data.table` and `iotools` packages to efficiently
read a large file block by block and apply a user-specified function to each
block as it is read. The outputs can be collected into a list or written to an
output file.

A 'block' is defined as a set of contiguous lines that have the same value
in the first field. This package is therefore not intended for all large files;
it is useful only for files that are organized in this way.
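
To make the definition concrete, here is a minimal base-R sketch that finds the blocks in a vector of first-field values. The `ids` vector is made up for illustration, and `rle()` is used only to demonstrate the definition above; it is not a description of how `fplyr` detects blocks internally.

```r
# Hypothetical first-field values, one per line of a small file.
ids <- c("ID01", "ID01", "ID01", "ID02", "ID02", "ID03")

# rle() collapses runs of equal adjacent values; each run is one block.
blocks <- rle(ids)
block_ids <- blocks$values     # key shared by the lines of each block
block_sizes <- blocks$lengths  # number of contiguous lines in each block

# Because blocks are contiguous, a key that reappears later starts a new
# block under this definition.
scattered <- rle(c("A", "A", "B", "A"))
n_scattered_blocks <- length(scattered$values)
```

Under this definition, a file whose first field is not sorted or grouped would be split into many small blocks, which is why the layout of the input file matters.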

## Examples

A typical file that can be processed with `fplyr` is as follows.

```
V1 V2 V3 V4
ID01 ABC Berlin 0.1
ID01 DEF London 0.5
ID01 GHI Rome 0.3
ID02 ABC Lisbon 0.2
ID02 DEF Berlin 0.6
ID02 LMN Prague 0.8
ID02 OPQ Dublin 0.7
ID03 DEF Lisbon -0.1
ID03 LMN Berlin 0.01
ID03 XYZ Prague 0.2
```

The first block consists of the first three lines, the second block of the next
four lines, and the third and last block is made up of the last three lines.
Suppose you want to compute the mean of the fourth column for each ID in the
first field. If the file is small, you can use `by()`. But if the file is so
big that it does not fit into the available memory, you can use one of the
functions of this package. If the path to the above file is stored in the
variable `f`, the following command returns a list where each element is the
mean of the fourth column for a single block.

```
library(fplyr)
l <- flply(f, function(d) mean(d$V4))
```
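
For comparison, the small-file case can be sketched in base R alone. This is an illustration under stated assumptions, not part of `fplyr`: the example file is embedded as a string (the hypothetical variable `txt`), the whole table is read into memory, and `split()` with `lapply()` is used instead of `by()` simply to obtain a plain list with one mean per block.

```r
# The example file from above, embedded as text for illustration.
txt <- "V1 V2 V3 V4
ID01 ABC Berlin 0.1
ID01 DEF London 0.5
ID01 GHI Rome 0.3
ID02 ABC Lisbon 0.2
ID02 DEF Berlin 0.6
ID02 LMN Prague 0.8
ID02 OPQ Dublin 0.7
ID03 DEF Lisbon -0.1
ID03 LMN Berlin 0.01
ID03 XYZ Prague 0.2"

# Read the whole table into memory; feasible only for small files.
d <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Split the rows by the first field and compute the mean of V4 per group.
small_means <- lapply(split(d, d$V1), function(block) mean(block$V4))
```

This approach loads every line at once, so it stops being an option when the file does not fit into memory; that is the case the block-by-block functions of this package are meant for.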

## Installation

1. The package is on CRAN and it can be installed from there. Start R and enter:

```
install.packages("fplyr")
```

2. The development version can be installed from GitHub with `devtools`, at your own risk. Start R and enter:

```
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("fmarotta/fplyr", ref = "devel")
```

3. To install an old release, browse to the [releases page](https://github.com/fmarotta/fplyr/releases) and download the "tar.gz" of the required version; then, from the command line, enter:

```
R CMD INSTALL fplyr-x.y.z.tar.gz
```