https://github.com/tidymodels/corrr
Explore correlations in R
- Host: GitHub
- URL: https://github.com/tidymodels/corrr
- Owner: tidymodels
- License: other
- Created: 2016-06-24T06:09:09.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2024-01-05T18:09:50.000Z (10 months ago)
- Last Synced: 2024-10-31T22:03:14.691Z (11 days ago)
- Language: R
- Homepage: https://corrr.tidymodels.org
- Size: 77.5 MB
- Stars: 589
- Watchers: 19
- Forks: 56
- Open Issues: 18
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
Awesome Lists containing this project
- jimsghstars - tidymodels/corrr - Explore correlations in R (R)
README
---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

[![R-CMD-check](https://github.com/tidymodels/corrr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/corrr/actions/workflows/R-CMD-check.yaml)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/corrr)](https://cran.r-project.org/package=corrr)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/corrr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/corrr?branch=main)

corrr is a package for exploring **corr**elations in **R**. It focuses on creating and working with **data frames** of correlations (instead of matrices) that can be easily explored via corrr functions or by leveraging tools like those in the [tidyverse](https://www.tidyverse.org/). This, along with the primary corrr functions, is represented below:
You can install:
- the latest released version from CRAN with
```{r install_cran, eval = FALSE}
install.packages("corrr")
```

- the latest development version from GitHub with

```{r install_git, eval = FALSE}
# install.packages("remotes")
remotes::install_github("tidymodels/corrr")
```

## Using corrr

Using `corrr` typically starts with `correlate()`, which acts like the base correlation function `cor()`. It differs by defaulting to pairwise deletion, and returning a correlation data frame (`cor_df`) of the following structure:
- A `tbl` with an additional class, `cor_df`
- An extra "term" column
- Standardized variances (the matrix diagonal) set to missing values (`NA`) so they can be ignored.

### API

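To make the `cor_df` structure concrete before looking at the individual functions, here is a minimal sketch that correlates the built-in `mtcars` data and inspects the result. The choice of `mtcars` and the object name `x` are only for illustration, and the `"term"` column name assumes corrr 0.4.3 or later.

```r
library(corrr)

# mtcars ships with R; any all-numeric data frame would do here.
# quiet = TRUE only suppresses the informational message.
x <- correlate(datasets::mtcars, quiet = TRUE)

# A tibble carrying the extra "cor_df" class
class(x)

# The first column holds the variable names (assumes corrr >= 0.4.3)
names(x)[1]

# The diagonal (each variable with itself) is NA by default
diag(as.matrix(as.data.frame(x)[, -1]))
```
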
The corrr API is designed with data pipelines in mind (e.g., to use `%>%` from the magrittr package). After `correlate()`, the primary corrr functions take a `cor_df` as their first argument, and return a `cor_df` or `tbl` (or output like a plot). These functions serve one of three purposes:
Internal changes (`cor_df` out):
- `shave()` the upper or lower triangle (set to `r NA`).
- `rearrange()` the columns and rows based on correlation strengths.

Reshape structure (`tbl` or `cor_df` out):

- `focus()` on select columns and rows.
- `stretch()` into a long format.

Output/visualizations (console/plot out):

- `fashion()` the correlations for pretty printing.
- `rplot()` the correlations with shapes in place of the values.
- `network_plot()` the correlations in a network.

## Databases and Spark

The `correlate()` function also works with database tables. The function will automatically push the calculations of the correlations to the database, collect the results in R, and return the `cor_df` object. This allows those results to integrate with the rest of the `corrr` API.
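
A minimal sketch of that workflow, assuming the DBI, RSQLite, and dbplyr packages are installed. None of them is named in this section, and an in-memory SQLite database stands in for a real backend. Passing `use = "complete.obs"` is an assumption based on the package's database article; it is worth checking against `?correlate` for your installed version.

```r
library(corrr)
library(dplyr)

# Assumption: an in-memory SQLite database stands in for a real backend
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a small built-in data set into the database (needs dbplyr)
db_mtcars <- copy_to(con, datasets::mtcars, name = "mtcars")

# The correlations are computed by the database and collected into R
db_cor <- correlate(db_mtcars, use = "complete.obs", quiet = TRUE)

# The collected result works with the rest of the corrr API
db_cor %>%
  focus(mpg)

DBI::dbDisconnect(con)
```
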
## Examples
```{r example, message = FALSE, warning = FALSE}
library(MASS)
library(corrr)
set.seed(1)

# Simulate three columns correlating about .7 with each other
mu <- rep(0, 3)
Sigma <- matrix(.7, nrow = 3, ncol = 3) + diag(3)*.3
seven <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)

# Simulate three columns correlating about .4 with each other
mu <- rep(0, 3)
Sigma <- matrix(.4, nrow = 3, ncol = 3) + diag(3)*.6
four <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)

# Bind together
d <- cbind(seven, four)
colnames(d) <- paste0("v", 1:ncol(d))

# Insert some missing values
d[sample(1:nrow(d), 100, replace = TRUE), 1] <- NA
d[sample(1:nrow(d), 200, replace = TRUE), 5] <- NA

# Correlate
x <- correlate(d)
class(x)
x
```

**NOTE: Previous to corrr 0.4.3, the first column of a `cor_df` dataframe was named "rowname". As of corrr 0.4.3, the name of this first column changed to "term".**

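If downstream code still expects the old `rowname` column, one option is to rename the column after correlating. This is a minimal sketch, assuming corrr 0.4.3 or later; the object names `x_new` and `x_legacy` are only illustrative, and the result is a plain data frame rather than a `cor_df`.

```r
library(corrr)
library(dplyr)

x_new <- correlate(datasets::mtcars, quiet = TRUE)

# Current name of the first column (corrr >= 0.4.3)
names(x_new)[1]

# Drop the cor_df/tibble classes, then restore the old column name for
# code that still expects "rowname". The result is no longer a cor_df,
# so it should not be passed back into corrr functions.
x_legacy <- x_new %>%
  as.data.frame() %>%
  rename(rowname = term)

names(x_legacy)[1]
```
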
As a `tbl`, we can use functions from data frame packages like `dplyr`, `tidyr`, `ggplot2`:
```{r, message = FALSE, warning = FALSE}
library(dplyr)

# Filter rows by correlation size
x %>% filter(v1 > .6)
```

corrr functions work in pipelines (`cor_df` in; `cor_df` or `tbl` out):

```{r combination, warning = FALSE, fig.height = 4, fig.width = 5}
x <- datasets::mtcars %>%
  correlate() %>%                       # Create correlation data frame (cor_df)
  focus(-cyl, -vs, mirror = TRUE) %>%   # Focus on cor_df without 'cyl' and 'vs'
  rearrange() %>%                       # rearrange by correlations
  shave()                               # Shave off the upper triangle for a clean result

fashion(x)
rplot(x)

datasets::airquality %>%
  correlate() %>%
  network_plot(min_cor = .2)
```

## Contributing

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/corrr/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).