Rearrrange data by a set of methods

# rearrr
**Rearrrange Data**

**Authors:** Ludvig R. Olsen ( [email protected] )

**License:** MIT

**Started:** April 2020

## Overview

R package for rearranging data by a set of methods.

We distinguish between **rearrangers** and **mutators**, where the first *reorders* the data points and the second *changes the values* of the data points.

When performing an operation relative to a point in an n-dimensional vector space, we refer to the point as the **origin**. If we, for instance, wish to rotate our data points around the point at `x = 3` and `y = 7`, those are the coordinates of our origin.
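
To make the role of the origin concrete, the following base R sketch rotates points around the origin at `x = 3`, `y = 7` using the standard 2D rotation matrix. The helper `rotate_around()` is a hypothetical name used only for illustration; it is not part of `rearrr` and does not claim to mirror how `rotate_2d()` is implemented.

```r
# Hypothetical helper (not part of rearrr): rotate 2D points
# counterclockwise around an origin with the standard rotation matrix.
rotate_around <- function(x, y, degrees, origin = c(0, 0)) {
  theta <- degrees * pi / 180
  # Shift the points so the origin sits at (0, 0)
  dx <- x - origin[1]
  dy <- y - origin[2]
  # Rotate, then shift back to the original coordinate system
  data.frame(
    x = origin[1] + dx * cos(theta) - dy * sin(theta),
    y = origin[2] + dx * sin(theta) + dy * cos(theta)
  )
}

# Rotate two points 90 degrees around the origin (3, 7)
rotated <- rotate_around(
  x = c(4, 3),
  y = c(7, 9),
  degrees = 90,
  origin = c(3, 7)
)
print(rotated)
```

Subtracting the origin before rotating and adding it back afterwards is what makes the rotation happen around that point rather than around `(0, 0)`.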

### Install

CRAN (when available):

> `install.packages("rearrr")`

Development version:

> `install.packages("devtools")`
> `devtools::install_github("LudvigOlsen/rearrr")`

### Rearrangers

| Function | Description |
|:---------|:------------|
|`center_max()` |Center the highest value with values decreasing around it. |
|`center_min()` |Center the lowest value with values increasing around it. |
|`position_max()` |Position the highest value with values decreasing around it. |
|`position_min()` |Position the lowest value with values increasing around it. |
|`pair_extremes()` |Arrange as lowest, highest, 2nd lowest, 2nd highest, etc. |
|`triplet_extremes()` |Arrange as lowest, most middle, highest, 2nd lowest, 2nd most middle, 2nd highest, etc. |
|`closest_to()` |Order values by shortest distance to an origin. |
|`furthest_from()` |Order values by longest distance to an origin. |
|`rev_windows()` |Reverse order window-wise. |
|`roll_elements()` |Rolls/shifts positions of elements. |
|`shuffle_hierarchy()` |Shuffle multi-column hierarchy of groups. |

### Mutators

| Function | Description | Dimensions |
|:---------|:------------|:-----------|
|`rotate_2d()`, `rotate_3d()` |Rotate values around an origin in 2 or 3 dimensions. |2 or 3 |
|`swirl_2d()`, `swirl_3d()` |Swirl values around an origin in 2 or 3 dimensions. |2 or 3 |
|`shear_2d()`, `shear_3d()` |Shear values around an origin in 2 or 3 dimensions. |2 or 3 |
|`expand_distances()` |Expand distances to an origin. |n |
|`expand_distances_each()`|Expand distances to an origin separately for each dimension. |n |
|`cluster_groups()` |Move data points into clusters around group centroids. |n |
|`dim_values()` |Dim values of a dimension by the distance to an n-dimensional origin. |n (alters 1)|
|`flip_values()` |Flip the values around an origin. |n |
|`roll_values()` |Shifts values and wraps to a range. |n |
|`wrap_to_range()` |Wraps values to a range. |n |
|`transfer_centroids()` |Transfer centroids from one `data.frame` to another. |n |
|`apply_transformation_matrix()` |Apply transformation `matrix` to `data.frame` columns. |n |

### Formers

| Function | Description |
|:---------|:------------|
|`circularize()` |Create x-coordinates for y-coordinates so they form a circle. |
|`hexagonalize()` |Create x-coordinates for y-coordinates so they form a hexagon. |
|`square()` |Create x-coordinates for y-coordinates so they form a square. |
|`triangularize()` |Create x-coordinates for y-coordinates so they form a triangle. |
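
As a rough illustration of what "creating x-coordinates for y-coordinates" can mean, here is a base R sketch for the circle case. The function `circle_x_sketch()` is hypothetical, and the choice to alternate points between the left and right side by index is an assumption made for the example; how `circularize()` actually assigns sides is not described here.

```r
# Hypothetical sketch (not rearrr's implementation): given y-coordinates,
# compute x-coordinates so that each (x, y) pair lies on a circle whose
# diameter spans the range of y.
circle_x_sketch <- function(y) {
  center <- (min(y) + max(y)) / 2
  radius <- (max(y) - min(y)) / 2
  # Horizontal distance from the vertical center line at each height
  half_width <- sqrt(pmax(radius^2 - (y - center)^2, 0))
  # Assumption: alternate points between the left and right side
  side <- ifelse(seq_along(y) %% 2 == 0, 1, -1)
  side * half_width
}

y <- c(0, 1, 2, 1)
x <- circle_x_sketch(y)

# Every point should be exactly one radius away from the center (0, 1)
distances <- sqrt(x^2 + (y - 1)^2)
print(data.frame(x = x, y = y, distance = distances))
```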

### Pipelines

| Class | Description |
|:------|:------------|
|`Pipeline` |Chain multiple transformations. |
|`GeneratedPipeline` |Chain multiple transformations and generate argument values per group. |
|`FixedGroupsPipeline` |Chain multiple transformations with different argument values per group. |

### Generators

| Function | Description |
|:---------|:------------|
|`generate_clusters()` |Generate n-dimensional clusters. |

Additionally, some functions have `*_vec()` versions that take and return a `vector`.

**Note**: The available utility functions (like scalers, converters and measuring functions) are
listed at the bottom of the readme.

## Attach packages

Let's see some **examples**. We start by attaching the necessary packages:

```{r warning=FALSE, message=FALSE}
library(rearrr)
library(dplyr) # %>%

# tidyr is optional and only used for reshaping below
has_tidyr <- requireNamespace("tidyr", quietly = TRUE)

vec <- 1:10
random_sample <- runif(10)
orderings <- data.frame(
  "Position" = as.integer(vec),
  "center_max" = center_max(vec),
  "center_min" = center_min(vec),
  "position_max" = position_max(vec, position = 3),
  "position_min" = position_min(vec, position = 3),
  "pair_extremes" = pair_extremes_vec(vec),
  "rev_windows" = rev_windows_vec(vec, window_size = 3),
  "closest_to" = closest_to_vec(vec, origin_fn = create_origin_fn(median)),
  "furthest_from" = furthest_from_vec(vec, origin = 5),
  "random_sample" = random_sample,
  "flipped_median" = flip_values_vec(random_sample, origin_fn = create_origin_fn(median)),
  stringsAsFactors = FALSE
)

# Convert to long format for plotting
if (has_tidyr){
  orderings <- orderings %>%
    tidyr::gather(key = "Method", value = "Value", 2:(ncol(orderings)))
}

# Line settings for the plots
gg_line_alpha <- .4
gg_base_line_size <- .3
```


While we can use the functions with `data.frames`, we showcase many of them with a `vector` for simplicity.
At times, we use the `*_vec()` version of a function in order to get the output as a `vector` instead of a `data.frame`.

The functions work with grouped `data.frames` and in `magrittr` pipelines (`%>%`).

## Rearranger examples

Rearrangers change the order of the data points.

### Center min/max

```{r}
center_max(data = 1:10)

center_min(data = 1:10)
```

### Position min/max

```{r}
position_max(data = 1:10, position = 3)

position_min(data = 1:10, position = 3)
```

### Pair extremes

```{r}
pair_extremes(data = 1:10)
```
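
The "lowest, highest, 2nd lowest, 2nd highest" ordering can be sketched in base R as follows. `pair_extremes_sketch()` is a hypothetical helper for illustration only. For odd-length input it places the leftover middle value last, which is an assumption and may differ from how `pair_extremes()` handles that case.

```r
# Hypothetical sketch (not rearrr's implementation): interleave the
# sorted values from both ends, i.e. lowest, highest, 2nd lowest, ...
pair_extremes_sketch <- function(x) {
  s <- sort(x)
  n <- length(s)
  half <- ceiling(n / 2)
  low_idx <- seq_len(half)
  high_idx <- n + 1 - low_idx
  # rbind() + as.vector() interleaves the two index vectors column by column
  idx <- as.vector(rbind(low_idx, high_idx))
  # For odd n the middle index appears twice; keep only the first n
  s[idx[seq_len(n)]]
}

paired <- pair_extremes_sketch(1:10)
print(paired)
```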

### Closest to / furthest from

We use the `_vec()` versions to get the reordered vectors. For `data.frames`, use `closest_to()`/`furthest_from()` instead.

The origin can be passed as either a specific coordinate (here, a value in `data`) or a function.

```{r}
closest_to_vec(data = 1:10, origin_fn = create_origin_fn(median))

furthest_from_vec(data = 1:10, origin = 5)
```
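
Ordering by distance to an origin boils down to sorting on the absolute difference. The base R sketch below uses a hypothetical helper, `order_by_distance()`, and leaves tied distances in their original order (the behavior of `order()`); how the package breaks ties is not stated here and may differ.

```r
# Hypothetical sketch (not rearrr's implementation): reorder a vector
# by its distance to an origin.
order_by_distance <- function(x, origin, decreasing = FALSE) {
  d <- abs(x - origin)
  # Negating the distances sorts the longest distance first
  if (decreasing) d <- -d
  x[order(d)]
}

# Origin computed by a function (the median) ...
closest <- order_by_distance(1:10, origin = median(1:10))
# ... or passed as a specific coordinate
furthest <- order_by_distance(1:10, origin = 5, decreasing = TRUE)

print(closest)
print(furthest)
```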

### Reverse windows

We use the `_vec()` version to get the reordered vector. For `data.frames`, use `rev_windows()` instead.

```{r}
rev_windows_vec(data = 1:10, window_size = 3)
```
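
Window-wise reversal can be pictured as cutting the vector into consecutive chunks of `window_size` elements and reversing each chunk on its own. A minimal base R sketch, with the hypothetical helper `rev_windows_sketch()` standing in for the idea rather than the package's code:

```r
# Hypothetical sketch (not rearrr's implementation): reverse the order
# of elements within consecutive windows of a fixed size.
rev_windows_sketch <- function(x, window_size) {
  # Assign each element to a window: 1, 1, 1, 2, 2, 2, ...
  window_id <- ceiling(seq_along(x) / window_size)
  # Reverse each window and glue the windows back together
  unlist(lapply(split(x, window_id), rev), use.names = FALSE)
}

reversed <- rev_windows_sketch(1:10, window_size = 3)
print(reversed)
```

The last window holds whatever elements remain, so with 10 elements and a window size of 3 it contains a single value.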

### Shuffle Hierarchy

When a `data.frame` has multiple grouping columns, we can shuffle them one column (hierarchical level) at a time:

```{r eval=FALSE}
# Shuffle a given data frame 'df'
shuffle_hierarchy(df, group_cols = c("a", "b", "c"))
```

The columns are shuffled one at a time, as so:

## Mutator examples

Mutators change the values of the data points.

### Rotate values

2-dimensional rotation:

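```{r}
# Set seed for reproducibility
set.seed(1) # any fixed seed works

# Draw random numbers
random_sample <- round(runif(10), digits = 4)

# Rotate the values 60 degrees around the centroid
rotate_2d(
  data = random_sample,
  degrees = 60,
  origin_fn = centroid
)
```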

3-dimensional rotation:

```{r}
# Set seed
set.seed(1) # any fixed seed works

# Create a data frame
df <- data.frame(
  "x" = 1:12,
  "y" = c(1, 2, 3, 4, 9, 10, 11,
          12, 15, 16, 17, 18),
  "z" = runif(12)
)

# Perform rotation
rotate_3d(
  data = df,
  x_col = "x",
  y_col = "y",
  z_col = "z",
  x_deg = 45,
  y_deg = 90,
  z_deg = 135,
  origin_fn = centroid
)
```

### Swirl values

2-dimensional swirling:

```{r}
# Swirl values
swirl_2d(data = rep(1, 50), radius = 95, origin = c(0, 0))
```

3-dimensional swirling:

```{r}
# Set seed
set.seed(1) # any fixed seed works

# Create a data frame
df <- data.frame(
  "x" = 1:50,
  "y" = 1:50,
  "z" = 1:50,
  "r1" = runif(50),
  "r2" = runif(50) * 35,
  "o" = 1,
  "g" = rep(1:5, each = 10)
)

# They see me swiiirling
swirl_3d(
  data = df,
  x_radius = 45,
  x_col = "x",
  y_col = "y",
  z_col = "z",
  origin = c(0, 0, 0),
  keep_original = FALSE
)
```

### Expand distances

```{r}
# 1d expansion
expand_distances(
  data = random_sample,
  multiplier = 3,
  origin_fn = centroid,
  exponentiate = TRUE
)
```

2d expansion:

```{r}
xpectr::set_test_seed(36) # for next section
```

Expand differently in each axis:

```{r}
# Expand x-axis and contract y-axis
expand_distances_each(
  data = data.frame("x" = runif(10),
                    "y" = runif(10)),
  cols = c("x", "y"),
  multipliers = c(7, 0.5),
  origin_fn = centroid
)
```
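
The per-dimension expansion can be sketched in base R as scaling each column's offset from the origin by its own multiplier. `expand_each_sketch()` is a hypothetical helper that assumes the centroid (column means) as the origin and plain multiplication of the offsets; it does not cover options such as `exponentiate`, and it is not the package's implementation.

```r
# Hypothetical sketch (not rearrr's implementation): scale the distance
# to the centroid separately for each column.
expand_each_sketch <- function(df, cols, multipliers) {
  # Assumption: the origin is the centroid, i.e. the mean of each column
  origin <- colMeans(df[, cols, drop = FALSE])
  for (i in seq_along(cols)) {
    col <- cols[i]
    # A multiplier above 1 expands, a multiplier below 1 contracts
    df[[col]] <- origin[[col]] + (df[[col]] - origin[[col]]) * multipliers[i]
  }
  df
}

points <- data.frame("x" = c(0, 2), "y" = c(0, 4))
expanded <- expand_each_sketch(points, cols = c("x", "y"), multipliers = c(7, 0.5))
print(expanded)
```

Because the offsets are measured from the centroid, the centroid itself stays where it was: the x-values spread out around it while the y-values move closer to it.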

### Cluster groups

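```{r}
# Set seed for reproducibility
set.seed(1) # any fixed seed works

# Create data frame with random data and a grouping variable
df <- data.frame(
  "x" = runif(50),
  "y" = runif(50),
  "g" = rep(c(1, 2, 3, 4, 5), each = 10)
)

# Move the data points into clusters around the group centroids
df_clustered <- cluster_groups(
  data = df,
  cols = c("x", "y"),
  group_col = "g"
)

# Keep the clustered columns and the grouping column
df_clustered <- df_clustered %>%
  dplyr::select(x_clustered, y_clustered, g)
```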

### Dim values

```{r}
# Add a column with 1s
df_clustered$o <- 1

# Dim the "o" column based on the data point's distance
# to the most central point in the cluster
df_clustered %>%
  dplyr::group_by(g) %>%
  dim_values(
    cols = c("x_clustered", "y_clustered"),
    dim_col = "o",
    origin_fn = most_centered
  )
```

### Flip values

```{r}
# The median value to flip around
median(random_sample)

# Flip the random numbers around the median
flip_values(
  data = random_sample,
  origin_fn = create_origin_fn(median)
)
```
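
Flipping around an origin mirrors each value to the opposite side of that origin at the same distance. A minimal base R sketch of the arithmetic, using the hypothetical helper `flip_around()` (not part of `rearrr`):

```r
# Hypothetical sketch (not rearrr's implementation): mirror values
# around an origin, so a value d above the origin ends up d below it.
flip_around <- function(x, origin) {
  2 * origin - x
}

values <- c(1, 2, 6)
flipped <- flip_around(values, origin = median(values))
print(flipped)
```

Flipping twice around the same origin returns the original values, which makes for a quick sanity check.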

## Forming examples

### Circularize points


### Hexagonalize points


### Square points


### Triangularize points


## Generators

### Generate clusters

```{r}
generate_clusters(
  num_rows = 50,
  num_cols = 5,
  num_clusters = 5,
  compactness = 1.6
)
```

## Utilities

### Converters

| Function | Description |
|:---------|:------------|
|`radians_to_degrees()` |Converts radians to degrees. |
|`degrees_to_radians()` |Converts degrees to radians. |

### Scalers

| Function | Description |
|:---------|:------------|
|`min_max_scale()` |Scale values to a range. |
|`to_unit_length()` |Scale vectors to unit length *row-wise* or *column-wise*. |

### Measuring functions

| Function | Description |
|:---------|:------------|
|`distance()` |Calculates distance to an origin. |
|`angle()` |Calculates angle between points and an origin. |
|`vector_length()` |Calculates vector length/magnitude *row-wise* or *column-wise*. |

### Helper functions

| Function | Description |
|:---------|:------------|
|`create_origin_fn()` |Creates function for finding origin coordinates (like `centroid()`). |
|`centroid()` |Calculates the mean of each supplied vector/column. |
|`most_centered()` |Finds coordinates of data point closest to the centroid. |
|`is_most_centered()` |Indicates whether a data point is the most centered. |
|`midrange()` |Calculates the midrange of each supplied vector/column. |
|`create_n_fn()` |Creates function for finding the number of positions to move. |
|`median_index()` |Calculates median index of each supplied vector. |
|`quantile_index()` |Calculates quantile of indices for each supplied vector. |