Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/elbersb/segregation

R package to calculate entropy-based segregation indices, with a focus on the Mutual Information Index (M) and Theil’s Information Index (H)
https://github.com/elbersb/segregation

entropy r r-package rstats segregation statistics

Last synced: 3 months ago
JSON representation

R package to calculate entropy-based segregation indices, with a focus on the Mutual Information Index (M) and Theil’s Information Index (H)

Awesome Lists containing this project

README

        

---
output:
md_document:
variant: gfm
editor_options:
markdown:
wrap: 72
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
options(scipen = 999)
options(digits = 3)
set.seed(69839)
```

# segregation

[![CRAN
Version](https://www.r-pkg.org/badges/version/segregation)](https://CRAN.R-project.org/package=segregation)
[![R build
status](https://github.com/elbersb/segregation/workflows/R-CMD-check/badge.svg)](https://github.com/elbersb/segregation/actions)
[![Coverage
status](https://codecov.io/gh/elbersb/segregation/branch/master/graph/badge.svg)](https://app.codecov.io/github/elbersb/segregation?branch=master)

An R package to calculate, visualize, and decompose various segregation indices.
The package currently supports

- the Mutual Information Index (M),
- Theil's Information Index (H),
- the index of Dissimilarity (D),
- the isolation and exposure index.

Find more information in `vignette("segregation")`
and the [documentation](https://elbersb.de/segregation).

The package also supports

- [standard error and confidence intervals estimation via bootstrapping](https://elbersb.com/public/posts/2021-11-24-segregation-bias/),
which also corrects for small sample bias
- decomposition of the M and H indices (within/between, local segregation)
- decomposing differences in total segregation over time (Elbers 2020)
- [segregation visualizations](https://elbersb.github.io/segregation/articles/plotting.html) (segregation curves and 'segplots')

Most methods return [tidy](https://vita.had.co.nz/papers/tidy-data.html)
[data.tables](https://rdatatable.gitlab.io/data.table/) for easy
post-processing and plotting. For speed, the package uses the [`data.table`](https://rdatatable.gitlab.io/data.table/)
package internally, and implements some functions in C++.

Most of the procedures implemented in this package are described in more
detail [in this SMR
paper](https://journals.sagepub.com/doi/full/10.1177/0049124121986204)
([Preprint](https://osf.io/preprints/socarxiv/ya7zs/)) and [in this
working paper](https://osf.io/preprints/socarxiv/ruw4g/).

## Usage

The package provides an easy way to calculate segregation measures,
based on the Mutual Information Index (M) and Theil's Entropy Index (H).

```{r}
library(segregation)

# example dataset with fake data provided by the package
mutual_total(schools00, "race", "school", weight = "n")
```

Standard errors in all functions can be estimated via boostrapping. This
will also apply bias-correction to the estimates:

```{r}
mutual_total(schools00, "race", "school",
weight = "n",
se = TRUE, CI = 0.90, n_bootstrap = 500
)
```

Decompose segregation into a between-state and a within-state term (the
sum of these equals total segregation):

```{r}
# between states
mutual_total(schools00, "race", "state", weight = "n")

# within states
mutual_total(schools00, "race", "school", within = "state", weight = "n")
```

Local segregation (`ls`) is a decomposition by units or groups (here
racial groups). This function also support standard error and CI
estimation. The sum of the proportion-weighted local segregation scores
equals M:

```{r}
local <- mutual_local(schools00,
group = "school", unit = "race", weight = "n",
se = TRUE, CI = 0.90, n_bootstrap = 500, wide = TRUE
)
local[, c("race", "ls", "p", "ls_CI")]
sum(local$p * local$ls)
```

Decompose the difference in M between 2000 and 2005, using iterative
proportional fitting (IPF) and the Shapley decomposition (see Elbers
2021 for details):

```{r}
mutual_difference(schools00, schools05,
group = "race", unit = "school",
weight = "n", method = "shapley"
)
```

Show a segplot:

```{r segplot}
segplot(schools00, group = "race", unit = "school", weight = "n")
```

Find more information in the
[documentation](https://elbersb.github.io/segregation/).

## How to install

To install the package from CRAN, use

```{r eval=FALSE}
install.packages("segregation")
```

To install the development version, use

```{r eval=FALSE}
devtools::install_github("elbersb/segregation")
```

## Citation

If you use this package for your research, please cite one of the following papers:

- Elbers, Benjamin (2021). A Method for Studying Differences in Segregation
Across Time and Space. Sociological Methods & Research.

- Elbers, Benjamin and Rob Gruijters (2023). Segplot: A New Method for Visualizing Patterns of Multi-Group Segregation.

## Some additional resources

- The book *Analyzing US Census Data: Methods, Maps, and Models in R*
by Kyle E. Walker contains [a discussion of this
package](https://walker-data.com/census-r/modeling-us-census-data.html#indices-of-segregation-and-diversity),
and is a great resource for anyone working with spatial data,
especially U.S. Census data.
- A paper that makes use of this package: [Did Residential Racial
Segregation in the U.S. Really Increase? An Analysis Accounting for
Changes in Racial
Diversity](https://elbersb.com/public/posts/2021-07-23-segregation-increase/)
([Code and Data](https://osf.io/mg9q4/))
- Some of the analyses [in this
article](https://multimedia.tijd.be/diversiteit/) by the Belgian
newspaper *De Tijd* used the package.
- The analyses of [this article in the Wall Street
Journal](https://www.wsj.com/articles/chicago-vs-dallas-why-the-north-lags-behind-the-south-and-west-in-racial-integration-11657936680)
were produced using this package.

## References on entropy-based segregation indices

Deutsch, J., Flückiger, Y. & Silber, J. (2009). Analyzing Changes in
Occupational Segregation: The Case of Switzerland (1970--2000), in: Yves
Flückiger, Sean F. Reardon, Jacques Silber (eds.) Occupational and
Residential Segregation (Research on Economic Inequality, Volume 17),
171--202.

DiPrete, T. A., Eller, C. C., Bol, T., & van de Werfhorst, H. G. (2017).
School-to-Work Linkages in the United States, Germany, and France.
American Journal of Sociology, 122(6), 1869-1938.

Elbers, B. (2021). A Method for Studying Differences in Segregation
Across Time and Space. Sociological Methods & Research.

Forster, A. G., & Bol, T. (2017). Vocational education and employment
over the life course using a new measure of occupational specificity.
Social Science Research, 70, 176-197.

Theil, H. (1971). Principles of Econometrics. New York: Wiley.

Frankel, D. M., & Volij, O. (2011). Measuring school segregation.
Journal of Economic Theory, 146(1), 1-38.

Mora, R., & Ruiz-Castillo, J. (2003). Additively decomposable
segregation indexes. The case of gender segregation by occupations and
human capital levels in Spain. The Journal of Economic Inequality, 1(2),
147-179.

Mora, R., & Ruiz-Castillo, J. (2009). The Invariance Properties of the
Mutual Information Index of Multigroup Segregation, in: Yves Flückiger,
Sean F. Reardon, Jacques Silber (eds.) Occupational and Residential
Segregation (Research on Economic Inequality, Volume 17), 33-53.

Mora, R., & Ruiz-Castillo, J. (2011). Entropy-based Segregation Indices.
Sociological Methodology, 41(1), 159--194.

Van Puyenbroeck, T., De Bruyne, K., & Sels, L. (2012). More than 'Mutual
Information': Educational and sectoral gender segregation and their
interaction on the Flemish labor market. Labour Economics, 19(1), 1-8.

Watts, M. The Use and Abuse of Entropy Based Segregation Indices.
Working Paper. URL: