https://github.com/friendly/nestedlogit

Nested Dichotomy Logistic Regression Models

---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
comment = "#>",
fig.path = "man/figures/README-",
fig.height = 5,
fig.width = 5
# out.width = "100%"
)

library(nestedLogit)
# get package versions
cran_version <- available.packages(repos = "https://cloud.r-project.org")["nestedLogit", "Version"]
dev_version <- getNamespaceVersion("nestedLogit")

```

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Last Commit](https://img.shields.io/github/last-commit/friendly/nestedLogit)](https://github.com/friendly/nestedLogit)
[![CRAN status](https://www.r-pkg.org/badges/version/nestedLogit)](https://cran.r-project.org/package=nestedLogit)
[![Downloads](https://cranlogs.r-pkg.org/badges/nestedLogit?color=brightgreen)](https://www.r-pkg.org:443/pkg/nestedLogit)
[![Docs](https://img.shields.io/badge/pkgdown%20site-blue)](https://friendly.github.io/nestedLogit)

# nestedLogit

**Version `r dev_version`**; documentation built for `pkgdown` `r Sys.Date()`

The `nestedLogit` package provides functions for fitting _nested dichotomy_ logistic regression models
for a **polytomous** response (with $m > 2$ categories), such as:

* support for political party in Canada (PC, Liberal, NDP, Green, BQ),
* preferred mode of transport (foot, bus, bike, train, plane),
* women's working status (not working, part-time, full-time).

The figure below shows two different ways that an $m = 4$-category polytomous response $Y = \{1, 2, 3, 4\}$ can be decomposed as
three ($m-1$) nested dichotomies among the levels.

* In the case shown at the left of the figure, the response categories
are divided first as $\{1, 2\}$ vs. $\{3, 4\}$. Then these compound categories are subdivided
as the dichotomies $\{1\}$ vs. $\{2\}$ and as $\{3\}$ vs. $\{4\}$.
* Alternatively, as shown at the right of the figure, the response categories
are divided progressively:
first as $\{1\}$ vs. $\{2, 3, 4\}$;
next as $\{2\}$ vs. $\{3, 4\}$;
and finally as $\{3\}$ vs. $\{4\}$.

```{r nested}
#| echo=FALSE,
#| out.width="80%",
#| fig.cap = "**Nested dichotomies**: The boxes show two different ways a four-category response can be represented as three nested dichotomies."
knitr::include_graphics("vignettes/fig/nested.jpg")
```
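As a minimal base-R sketch, the two trees in the figure can be written as nested lists in which every list node splits its categories into two subsets. The object names `left` and `right` and the helper `count_dichotomies()` are illustrative only, not part of the package. Counting the list nodes confirms that each tree yields $m - 1 = 3$ dichotomies.

```r
# Left panel: {1, 2} vs. {3, 4}; then {1} vs. {2} and {3} vs. {4}
left <- list(list("1", "2"), list("3", "4"))

# Right panel: {1} vs. {2, 3, 4}; then {2} vs. {3, 4}; then {3} vs. {4}
right <- list("1", list("2", list("3", "4")))

# Hypothetical helper: each list node is one binary split (a dichotomy),
# so count list nodes recursively; leaves (single categories) count 0
count_dichotomies <- function(node) {
  if (!is.list(node)) return(0L)
  1L + sum(vapply(node, count_dichotomies, integer(1)))
}

count_dichotomies(left)   # 3
count_dichotomies(right)  # 3
```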

The basic model for this situation is the standard **multinomial logistic model** (fit, e.g., by `nnet::multinom()`),
which compares each response category to a _reference level_.
When you can think of the differences among the response categories as a set of nested comparisons
among subsets of the categories, the approach of nested dichotomies is simpler, because:

* Nested dichotomies are statistically independent, and hence:
  * the likelihood chi-square statistics for the sub-models are additive;
  * they provide an additive decomposition of tests for the overall polytomous response.
* You can think of this as breaking up the overall question of "How do the response categories differ?" into $m-1$
sub-questions that together answer the global one.
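A small numeric sketch shows why the sub-models combine so simply. The three probabilities below are made-up values standing in for the fitted conditional probabilities of the three binary models in the left-hand tree of the figure. Multiplying along each branch gives the category probabilities, so each observation's log-likelihood is a sum of one term per dichotomy on its path. Summed over observations, the model's log-likelihood is therefore the sum of the sub-models' log-likelihoods, which is why their likelihood-ratio chi-squares add.

```r
# Made-up conditional probabilities (placeholders for fitted values)
p_34         <- 0.40  # P(Y in {3, 4})
p_2_given_12 <- 0.25  # P(Y = 2 | Y in {1, 2})
p_4_given_34 <- 0.70  # P(Y = 4 | Y in {3, 4})

# Multiply along the branches of the tree to get P(Y = k)
probs <- c(`1` = (1 - p_34) * (1 - p_2_given_12),
           `2` = (1 - p_34) * p_2_given_12,
           `3` = p_34 * (1 - p_4_given_34),
           `4` = p_34 * p_4_given_34)
probs       # 0.45 0.15 0.12 0.28
sum(probs)  # 1

# For an observation with Y = 4, the log-probability splits into one term
# from the {1, 2} vs. {3, 4} model and one from the {3} vs. {4} model
log(probs[["4"]]) - (log(p_34) + log(p_4_given_34))  # 0, up to rounding
```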

When the dichotomies make
sense substantively, this method can be a simpler alternative to the multinomial model.
This choice is similar to using **orthogonal contrasts** among factor categories in an ANOVA,
as opposed to using the default reference-level coding.
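To make the analogy concrete, this base-R sketch writes the dichotomies of the left-hand tree as contrast columns: +1 for one side of the split, -1 for the other, and 0 for categories not involved. This ±1/0 coding is an illustration chosen here, not how the package stores dichotomies. Like orthogonal contrasts, the columns sum to zero and are mutually orthogonal. R's default reference-level coding, `contr.treatment()`, does not sum to zero.

```r
# Columns: the three dichotomies of the left-hand tree; rows: categories 1-4
C <- cbind(AB.CD = c( 1,  1, -1, -1),
           A.B   = c( 1, -1,  0,  0),
           C.D   = c( 0,  0,  1, -1))
rownames(C) <- 1:4

colSums(C)     # all 0: each column is a contrast
crossprod(C)   # diagonal: the contrasts are mutually orthogonal

# Reference-level (treatment) coding, R's default for unordered factors
colSums(contr.treatment(4))  # 1 1 1: columns do not sum to zero
```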

### Ordered categories

Note that when the response categories are **ordered**, as in
education attained: "HS" < "College" < "BA" < "MA" < "PhD", another attractive model is the
**proportional odds** model (e.g., fit by `MASS::polr()`).
This is a simpler model, but achieves that simplicity by
making the additional assumption that the coefficients for the
predictors are the same for all the cumulative logits (the splits between lower and higher categories); only the intercepts differ.
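For comparison, a proportional-odds fit might look like the sketch below. It borrows the `Womenlf` data used in the example later in this README. For illustration only, it assumes that `partic` can be treated as ordered `not.work` < `parttime` < `fulltime`. `MASS::polr()` then estimates one slope per predictor (`coef()`) and $m - 1$ intercepts, the cutpoints (`zeta`).

```r
library(MASS)  # for polr()
data(Womenlf, package = "carData")

# Treat labor-force participation as ordered (an assumption for illustration)
Womenlf$partic_ord <- factor(Womenlf$partic,
                             levels = c("not.work", "parttime", "fulltime"),
                             ordered = TRUE)

fit_po <- polr(partic_ord ~ hincome + children, data = Womenlf, Hess = TRUE)
coef(fit_po)   # one slope per predictor, shared by all cumulative logits
fit_po$zeta    # m - 1 = 2 cutpoints (intercepts)
```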

## Installation

You can install the current published version (`r cran_version`) from [CRAN](https://cran.r-project.org/package=nestedLogit),
or the development version (`r dev_version`) from
either [R-universe](https://friendly.r-universe.dev/nestedLogit) or [GitHub](https://github.com/friendly/nestedLogit):

| Source       | Command                                                                      |
|--------------|------------------------------------------------------------------------------|
| CRAN version | `install.packages("nestedLogit")`                                            |
| R-universe   | `install.packages('nestedLogit', repos = 'https://friendly.r-universe.dev')` |
| GitHub       | `remotes::install_github("friendly/nestedLogit")`                            |

## Package overview

The package provides one main function, `nestedLogit()`, for fitting the set of $(m-1)$
binary logistic regression models for a polytomous response with $m$ levels.
The dichotomies can be specified using these helper functions:

* `dichotomy()`: constructs a _single_ dichotomy among the levels of a response factor;
* `logits()`: creates the set of dichotomies, typically using `dichotomy()` for each;
* `continuationLogits()`: provides a convenient way to generate the continuation-logit dichotomies for an ordered response.
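As a sketch of the third helper, the continuation dichotomies for the ordered education levels mentioned above could be generated as follows. This assumes `continuationLogits()` accepts a character vector of levels in their natural order (see `?continuationLogits` for the exact arguments). For $m = 5$ levels it should return $m - 1 = 4$ dichotomies, each comparing one level with all higher ones.

```r
library(nestedLogit)

edu <- c("HS", "College", "BA", "MA", "PhD")

# Each level vs. all higher levels: {HS} vs. {College, ..., PhD}, and so on
edu_dichot <- continuationLogits(edu)
edu_dichot
as.character(edu_dichot)  # compact text form of the same dichotomies
```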

For instance, for a 4-category response with levels `r LETTERS[1:4]`, the successive binary splits
{A, B} vs. {C, D}, then {A} vs. {B} and {C} vs. {D},
could be specified as:

```{r}
(ABCD <-
  logits(AB.CD = dichotomy(c("A", "B"), c("C", "D")),
         A.B   = dichotomy("A", "B"),
         C.D   = dichotomy("C", "D")
  )
)
```

These dichotomies are effectively a tree structure of lists, which can be displayed simply using
`lobstr::tree()`.

```{r tree}
lobstr::tree(ABCD)
```

Alternatively, the nested dichotomies can be specified more compactly as a nested (i.e., recursive) list
with optionally named elements. For example, where people might choose a method of transportation
among the categories `plane`, `train`, `bus`, `car`, a sensible set of three dichotomies could
be specified as:

```{r transport}
transport <- list(
  air = "plane",
  ground = list(
    public = list("train", "bus"),
    private = "car"
  ))

lobstr::tree(transport)
```
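The correspondence between a nested list and its dichotomies can be sketched with a short recursive function. `list_dichotomies()` below is a hypothetical helper written for illustration; the package performs its own conversion internally. At each list node it records which categories fall under each branch, then recurses into the branches, so the `transport` tree yields its $m - 1 = 3$ dichotomies.

```r
# All categories (leaves) below a node of the nested list
leaves <- function(node) {
  if (is.list(node)) unlist(lapply(node, leaves), use.names = FALSE) else node
}

# Hypothetical helper: one dichotomy per list node, found depth-first
list_dichotomies <- function(node) {
  if (!is.list(node)) return(list())
  split_here <- lapply(node, leaves)  # the two sides of this split
  below <- unlist(lapply(node, list_dichotomies), recursive = FALSE)
  c(list(split_here), below)
}

transport <- list(air = "plane",
                  ground = list(public = list("train", "bus"),
                                private = "car"))
str(list_dichotomies(transport))
```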

There are also methods, including `as.matrix.dichotomies()` and `as.character.dichotomies()`,
to facilitate working with `dichotomies` objects in other representations. The `ABCD` example
above corresponds to the matrix below, whose rows represent the dichotomies and whose columns
are the response levels:

```{r}
as.matrix(ABCD)

as.character(ABCD)
```

The result of `nestedLogit()` is an object of class `"nestedLogit"`. It contains
the set of $(m-1)$ `glm()` models fit to the dichotomies.

### Methods

```{r child="man/partials/methods.Rmd"}
```

## Examples

This example uses data on women's labor force participation to fit a nested logit model for
the response, `partic`, representing the categories
`not.work`, `parttime`, and `fulltime` for 263 women from a 1977
survey in Canada. This dataset is explored in more detail in the
package vignette, `vignette("nestedLogit", package = "nestedLogit")`.
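Before modeling, a quick look at the response confirms the sample size and the three categories. Note that the factor levels of `partic` are not necessarily stored in their substantive order. The cross-tabulation with `children`, one of the predictors used below, is included only as an illustration.

```r
data(Womenlf, package = "carData")

nrow(Womenlf)          # number of women in the survey
table(Womenlf$partic)  # counts in each response category

# Cross-classify the response with the `children` predictor used below
with(Womenlf, table(children, partic))
```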

A model for the complete polytomy can be specified as two nested
dichotomies, using helper functions `dichotomy()` and `logits()`, as shown in the example that follows:

* `work`: {not.work} vs. {parttime, fulltime}
* `full`: {parttime} vs. {fulltime}, but only for those working

`nestedLogit()` effectively fits each of these dichotomies
as a logistic regression model via `glm(..., family = binomial)`.

```{r wlf-model}
data(Womenlf, package = "carData")

# Use `logits()` and `dichotomy()` to specify the comparisons of interest
comparisons <- logits(work = dichotomy("not.work",
                                       working = c("parttime", "fulltime")),
                      full = dichotomy("parttime", "fulltime"))

m <- nestedLogit(partic ~ hincome + children,
                 dichotomies = comparisons,
                 data = Womenlf)
coef(m)
```
The `"nestedLogit"` object contains the components of the fitted model. The structure can be shown nicely
using `lobstr::tree()`:

```{r}
m |> lobstr::tree(max_depth=1)
```

The separate models for the `work` and `full` dichotomies can be extracted via `models()`. These
are the binomial `glm()` models.
```{r}
models(m) |> lobstr::tree(max_depth = 1)
```
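To make the equivalence explicit, the two submodels can be sketched directly with `glm()`. The 0/1 response variables `working` and `full` are created here for illustration. The sketch assumes that the second group of each dichotomy is the one coded 1. With that coding the estimates should correspond to those reported by `coef(m)`, although this sketch does not check that against the package.

```r
data(Womenlf, package = "carData")

# work: {not.work} vs. {parttime, fulltime}, using all women
Womenlf$working <- as.integer(Womenlf$partic != "not.work")
fit_work <- glm(working ~ hincome + children, family = binomial, data = Womenlf)

# full: {parttime} vs. {fulltime}, using only the women who work
working_women <- subset(Womenlf, partic != "not.work")
working_women$full <- as.integer(working_women$partic == "fulltime")
fit_full <- glm(full ~ hincome + children, family = binomial, data = working_women)

cbind(work = coef(fit_work), full = coef(fit_full))
```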

`Anova()` (from the **car** package) produces analysis-of-deviance (likelihood-ratio) tests for the terms in this model, for each of the submodels as well as for the combined response of the polytomy. The `LR Chisq` and `df` for terms in the combined model are the sums of those for
the submodels.

```{r wlf-anova}
car::Anova(m)
```
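The additivity can be checked by hand in base R. The sketch below refits the two binary models with `glm()`, using illustrative 0/1 responses (the names `working` and `full` are ours). It then uses `drop1(..., test = "LRT")` to get each term's likelihood-ratio statistic in each submodel. The model has only main effects, so dropping one term at a time should correspond to the Type II tests that `car::Anova()` reports by default. The `combined` row should therefore match the combined lines above, each with 1 + 1 = 2 df.

```r
data(Womenlf, package = "carData")

# Refit the two dichotomies as binary logistic regressions
Womenlf$working <- as.integer(Womenlf$partic != "not.work")
fit_work <- glm(working ~ hincome + children, family = binomial, data = Womenlf)
working_women <- subset(Womenlf, partic != "not.work")
working_women$full <- as.integer(working_women$partic == "fulltime")
fit_full <- glm(full ~ hincome + children, family = binomial, data = working_women)

# LR chi-square for dropping each term (row 1 of drop1() is "<none>")
lr_stats <- function(fit) {
  tab <- drop1(fit, test = "LRT")
  setNames(tab$LRT[-1], rownames(tab)[-1])
}

lr_work <- lr_stats(fit_work)
lr_full <- lr_stats(fit_full)
rbind(work = lr_work, full = lr_full, combined = lr_work + lr_full)
```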

### Plots
A basic plot of predicted probabilities can be produced using
the `plot()` method for `"nestedLogit"` objects.
It can be called several times to build multi-panel plots.
By default, a 95% pointwise confidence envelope is added to the plot.
Here, the envelopes are drawn with `conf.level = 0.68` to give approximately $\pm 1$ standard-error bounds.
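The ±1 standard-error reading follows from the normal quantile, assuming the envelope is a normal-theory interval. A two-sided 68% interval uses $z = \Phi^{-1}(0.84) \approx 0.994$, essentially one standard error, compared with $z \approx 1.96$ for the default 95% envelope. The helper name `z_for()` is ours:

```r
# Two-sided normal quantile for a given confidence level
z_for <- function(conf.level) qnorm(1 - (1 - conf.level) / 2)

z_for(0.68)  # about 0.994, i.e., roughly +/- 1 standard error
z_for(0.95)  # about 1.96, the default 95% envelope
```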

```{r wlf-plot}
#| out.width = "100%",
#| fig.asp = 0.55,
#| echo = 1:3
op <- par(mfcol=c(1, 2), mar=c(4, 4, 3, 1) + 0.1)
plot(m, "hincome", list(children="absent"),
     conf.level = 0.68,
     xlab="Husband's Income", legend=FALSE)
plot(m, "hincome", list(children="present"),
     conf.level = 0.68,
     xlab="Husband's Income")
par(op)
```

## Vignettes

* A more general discussion of nested-dichotomy logistic regression, with detailed examples, can be found
in `vignette("nestedLogit")`.

* A variety of other plots can be produced using `ggplot()`, as described in the vignette,
`vignette("plotting-ggplot")`.

* A new vignette, `vignette("standard-errors")`, describes the mathematics behind the calculation of
standard errors using the delta method.

## Authors
* John Fox
* Michael Friendly

## References

S. Fienberg (1980) _The Analysis of Cross-Classified Categorical Data_, 2nd Edition, MIT Press, Section 6.6.

J. Fox (2016) _Applied Regression Analysis and Generalized Linear Models_, 3rd Edition, Sage, Section 14.2.2.

M. Friendly and D. Meyer (2016) _Discrete Data Analysis with R_, CRC Press, Section 8.2.