https://github.com/tidymodels/corrr

Explore correlations in R
https://github.com/tidymodels/corrr

Last synced: about 1 month ago
JSON representation

Explore correlations in R

Host: GitHub
URL: https://github.com/tidymodels/corrr
Owner: tidymodels
License: other
Created: 2016-06-24T06:09:09.000Z (almost 9 years ago)
Default Branch: main
Last Pushed: 2024-01-05T18:09:50.000Z (over 1 year ago)
Last Synced: 2025-03-31T20:11:20.178Z (3 months ago)
Language: R
Homepage: https://corrr.tidymodels.org
Size: 77.5 MB
Stars: 593
Watchers: 18
Forks: 56
Open Issues: 19
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md

Awesome Lists containing this project

jimsghstars - tidymodels/corrr - Explore correlations in R (R)

README

        ---

output: github_document

---

```{r, echo = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-"

)

```

# corrr 

[![R-CMD-check](https://github.com/tidymodels/corrr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/corrr/actions/workflows/R-CMD-check.yaml)

[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/corrr)](https://cran.r-project.org/package=corrr)

[![Codecov test coverage](https://codecov.io/gh/tidymodels/corrr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/corrr?branch=main)

corrr is a package for exploring **corr**elations in **R**. It focuses on creating and working with **data frames** of correlations (instead of matrices) that can be easily explored via corrr functions or by leveraging tools like those in the [tidyverse](https://www.tidyverse.org/). This, along with the primary corrr functions, is represented below:



You can install:

- the latest released version from CRAN with

```{r install_cran, eval = FALSE}

install.packages("corrr")

```

- the latest development version from GitHub with

```{r install_git, eval = FALSE}

# install.packages("remotes") 

remotes::install_github("tidymodels/corrr")

```

## Using corrr

Using `corrr` typically starts with `correlate()`, which acts like the base correlation function `cor()`. It differs by defaulting to pairwise deletion, and returning a correlation data frame (`cor_df`) of the following structure:

- A `tbl` with an additional class, `cor_df`

- An extra "term" column

- Standardized variances (the matrix diagonal) set to missing values (`NA`) so they can be ignored.

### API

The corrr API is designed with data pipelines in mind (e.g., to use `%>%` from the magrittr package). After `correlate()`, the primary corrr functions take a `cor_df` as their first argument, and return a `cor_df` or `tbl` (or output like a plot). These functions serve one of three purposes:

Internal changes (`cor_df` out):

- `shave()` the upper or lower triangle (set to `r NA`).

- `rearrange()` the columns and rows based on correlation strengths.

Reshape structure (`tbl` or `cor_df` out):

- `focus()` on select columns and rows.

- `stretch()` into a long format.

Output/visualizations (console/plot out):

- `fashion()` the correlations for pretty printing.

- `rplot()` the correlations with shapes in place of the values.

- `network_plot()` the correlations in a network.

## Databases and Spark

The `correlate()` function also works with database tables.  The function will automatically push the calculations of the correlations to the database, collect the results in R, and return the `cor_df` object.  This allows for those results integrate with the rest of the `corrr` API.

## Examples

```{r example, message = FALSE, warning = FALSE}

library(MASS)

library(corrr)

set.seed(1)

# Simulate three columns correlating about .7 with each other

mu <- rep(0, 3)

Sigma <- matrix(.7, nrow = 3, ncol = 3) + diag(3)*.3

seven <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)

# Simulate three columns correlating about .4 with each other

mu <- rep(0, 3)

Sigma <- matrix(.4, nrow = 3, ncol = 3) + diag(3)*.6

four <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)

# Bind together

d <- cbind(seven, four)

colnames(d) <- paste0("v", 1:ncol(d))

# Insert some missing values

d[sample(1:nrow(d), 100, replace = TRUE), 1] <- NA

d[sample(1:nrow(d), 200, replace = TRUE), 5] <- NA

# Correlate

x <- correlate(d)

class(x)

x

```

**NOTE: Previous to corrr 0.4.3, the first column of a `cor_df` dataframe was named "rowname". As of corrr 0.4.3, the name of this first column changed to "term".**

As a `tbl`, we can use functions from data frame packages like `dplyr`, `tidyr`, `ggplot2`:

```{r, message = FALSE, warning = FALSE}

library(dplyr)

# Filter rows by correlation size

x %>% filter(v1 > .6)

```

corrr functions work in pipelines (`cor_df` in; `cor_df` or `tbl` out):

```{r combination, warning = FALSE, fig.height = 4, fig.width = 5}

x <- datasets::mtcars %>%

       correlate() %>%    # Create correlation data frame (cor_df)

       focus(-cyl, -vs, mirror = TRUE) %>%  # Focus on cor_df without 'cyl' and 'vs'

       rearrange() %>%  # rearrange by correlations

       shave() # Shave off the upper triangle for a clean result

       

fashion(x)

rplot(x)

datasets::airquality %>% 

  correlate() %>% 

  network_plot(min_cor = .2)

```

## Contributing

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/corrr/issues).

- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.

- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tidymodels/corrr

Awesome Lists containing this project

README