
https://github.com/tidyverse/forcats

🐈🐈🐈🐈: tools for working with categorical variables (factors)

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# forcats

[![CRAN status](https://www.r-pkg.org/badges/version/forcats)](https://cran.r-project.org/package=forcats)
[![R-CMD-check](https://github.com/tidyverse/forcats/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/forcats/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/forcats/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/forcats?branch=main)

## Overview

R uses __factors__ to handle categorical variables: variables that have a fixed and known set of possible values. Factors are also helpful for reordering character vectors to improve display. The goal of the __forcats__ package is to provide a suite of tools that solve common problems with factors, such as changing the order of the levels or changing their values. Some examples include:

* `fct_reorder()`: Reordering a factor by another variable.
* `fct_infreq()`: Reordering a factor by the frequency of values.
* `fct_relevel()`: Changing the order of a factor by hand.
* `fct_lump()`: Collapsing the least/most frequent values of a factor into "other".
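
As a rough illustration of the four functions listed above, here is a minimal sketch on a small made-up factor. The `size` and `price` vectors are hypothetical toy data, not part of the package, and the comments describe what the calls are expected to do under default arguments.

```r
library(forcats)

# Hypothetical toy data, for illustration only
size  <- factor(c("small", "large", "medium", "small", "small", "large"))
price <- c(1, 9, 5, 2, 3, 7)

# By default, factor levels are sorted alphabetically
levels(size)

# fct_relevel(): put the levels in an order chosen by hand
by_hand <- fct_relevel(size, "small", "medium", "large")
levels(by_hand)

# fct_infreq(): order the levels by how often they occur, most common first
by_freq <- fct_infreq(size)
levels(by_freq)

# fct_reorder(): order the levels by a summary (the median by default)
# of another variable, here the made-up `price`
by_price <- fct_reorder(size, price)
levels(by_price)

# fct_lump(): keep the single most frequent level and collapse the rest
lumped <- fct_lump(size, n = 1)
levels(lumped)
```

None of these calls change the underlying values of `size` except `fct_lump()`, which replaces the collapsed values with a single "Other" level; the others only change the order in which the levels are stored.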

You can learn more about each of these in `vignette("forcats")`. If you're new to factors, the best place to start is the [chapter on factors](https://r4ds.hadley.nz/factors.html) in R for Data Science.

## Installation

```
# The easiest way to get forcats is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just forcats:
install.packages("forcats")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/forcats")
```

## Cheatsheet

## Getting started

forcats is part of the core tidyverse, so you can load it with `library(tidyverse)` or `library(forcats)`.

```{r setup, message = FALSE}
library(forcats)
library(dplyr)
library(ggplot2)
```

```{r}
starwars %>%
  filter(!is.na(species)) %>%
  count(species, sort = TRUE)
```

```{r}
starwars %>%
  filter(!is.na(species)) %>%
  mutate(species = fct_lump(species, n = 3)) %>%
  count(species)
```

```{r unordered-plot}
ggplot(starwars, aes(x = eye_color)) +
  geom_bar() +
  coord_flip()
```

```{r ordered-plot}
starwars %>%
  mutate(eye_color = fct_infreq(eye_color)) %>%
  ggplot(aes(x = eye_color)) +
  geom_bar() +
  coord_flip()
```
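
With `coord_flip()`, the first level is drawn at the bottom of the axis, so the ordered plot above puts the most frequent eye colour at the bottom. If you would rather have it at the top, one possible variation (a sketch, not part of the original README) is to reverse the level order with `fct_rev()` after `fct_infreq()`. The names `eye_top` and `p` are introduced here only for illustration.

```r
library(forcats)
library(dplyr)
library(ggplot2)

# Order eye colours by frequency, then reverse so the most common level
# comes last (and is therefore drawn at the top after coord_flip())
eye_top <- fct_rev(fct_infreq(starwars$eye_color))
levels(eye_top)

# The same transformation inside a pipeline, followed by the plot
p <- starwars %>%
  mutate(eye_color = fct_rev(fct_infreq(eye_color))) %>%
  ggplot(aes(x = eye_color)) +
  geom_bar() +
  coord_flip()
p
```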

## More resources

For a history of factors, I recommend [_stringsAsFactors: An unauthorized biography_](https://simplystats.github.io/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng and [_stringsAsFactors = \<sigh\>_](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley. If you want to learn more about other approaches to working with factors and categorical data, I recommend [_Wrangling categorical data in R_](https://peerj.com/preprints/3163/), by Amelia McNamara and Nicholas Horton.

## Getting help

If you encounter a clear bug, please file a minimal reproducible example on [GitHub](https://github.com/tidyverse/forcats/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/).