https://github.com/friendly/nestedlogit

Nested Dichotomy Logistic Regression Models
Host: GitHub
URL: https://github.com/friendly/nestedlogit
Owner: friendly
Created: 2023-04-17T15:57:19.000Z (about 3 years ago)
Default Branch: master
Last Pushed: 2026-01-29T03:29:48.000Z (3 months ago)
Last Synced: 2026-01-29T19:54:30.148Z (3 months ago)
Topics: logistic-regression, multinomial-logistic-regression, polytomous-variables, r-package
Language: HTML
Homepage: https://friendly.github.io/nestedLogit/
Size: 7.22 MB
Stars: 10
Watchers: 2
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
README

---

output: github_document

---

```{r setup, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  warning = FALSE,

  comment = "#>",

  fig.path = "man/figures/README-",

  fig.height = 5,

  fig.width = 5

#  out.width = "100%"

)

library(nestedLogit)

# get package versions

cran_version <- available.packages(repos = "https://cloud.r-project.org")["nestedLogit", "Version"]

dev_version <- getNamespaceVersion("nestedLogit")

```

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)

[![Last Commit](https://img.shields.io/github/last-commit/friendly/nestedLogit)](https://github.com/friendly/nestedLogit)

[![CRAN status](https://www.r-pkg.org/badges/version/nestedLogit)](https://cran.r-project.org/package=nestedLogit)

[![Downloads](https://cranlogs.r-pkg.org/badges/nestedLogit?color=brightgreen)](https://www.r-pkg.org:443/pkg/nestedLogit)

[![Docs](https://img.shields.io/badge/pkgdown%20site-blue)](https://friendly.github.io/nestedLogit)

# nestedLogit 

**Version `r dev_version`**; documentation built for `pkgdown` `r Sys.Date()`

The `nestedLogit` package provides functions for fitting _nested dichotomy_ logistic regression models

for a **polytomous** response (with $m > 2$ categories), such as: 

* support for political party in Canada (PC, Liberal, NDP, Green, BQ), 

* preferred mode of transport (foot, bus, bike, train, plane), 

* women's working status (not working, part-time, full-time).

The figure below shows two different ways that an $m=4$-category polytomous response $Y = \{1, 2, 3, 4\}$ can be decomposed as

three ($m-1$) nested dichotomies among the levels.

 

* In the case shown at the left of the figure, the response categories

are divided first as $\{1, 2\}$ vs. $\{3, 4\}$. Then these compound categories are subdivided

 as the dichotomies $\{1\}$ vs. $\{2\}$ and as $\{3\}$ vs. $\{4\}$.

* Alternatively, as shown at the right of the figure, the response categories

are divided progressively:

first as $\{1\}$ vs. $\{2, 3, 4\}$; 

next as $\{2\}$ vs. $\{3, 4\}$; and

finally as $\{3\}$ vs. $\{4\}$.

```{r nested}

#| echo=FALSE,

#| out.width="80%",

#| fig.cap = "**Nested dichotomies**: The boxes show two different ways a four-category response can be represented as three nested dichotomies."

knitr::include_graphics("vignettes/fig/nested.jpg")

```
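
To make the two decompositions concrete, the category probabilities $\pi_j = \Pr(Y = j)$ can be written as products of the dichotomy probabilities along the path to each category. The symbols $\phi$ and $\theta$ below are notation assumed here for illustration; they are not used by the package. For the scheme at the left, let $\phi_{12} = \Pr(Y \in \{1, 2\})$, $\phi_{1|12} = \Pr(Y = 1 \mid Y \in \{1, 2\})$ and $\phi_{3|34} = \Pr(Y = 3 \mid Y \in \{3, 4\})$. Then

$$
\begin{aligned}
\pi_1 &= \phi_{12} \, \phi_{1|12}, &
\pi_2 &= \phi_{12} \, (1 - \phi_{1|12}), \\
\pi_3 &= (1 - \phi_{12}) \, \phi_{3|34}, &
\pi_4 &= (1 - \phi_{12}) \, (1 - \phi_{3|34}).
\end{aligned}
$$

For the progressive scheme at the right, let $\theta_1 = \Pr(Y = 1)$, $\theta_2 = \Pr(Y = 2 \mid Y \in \{2, 3, 4\})$ and $\theta_3 = \Pr(Y = 3 \mid Y \in \{3, 4\})$. Then

$$
\begin{aligned}
\pi_1 &= \theta_1, &
\pi_2 &= (1 - \theta_1) \, \theta_2, \\
\pi_3 &= (1 - \theta_1)(1 - \theta_2) \, \theta_3, &
\pi_4 &= (1 - \theta_1)(1 - \theta_2)(1 - \theta_3).
\end{aligned}
$$

In both cases the four probabilities sum to one, and each dichotomy probability is what one binary logistic regression models.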

The basic model for this situation is the standard **multinomial logistic model** (fit, e.g., by `nnet::multinom()`)

which compares response categories to a _reference level_.

When you can think of the differences among the response categories as a set of nested comparisons

among subsets of the categories, the approach of nested dichotomies is simpler, because:

* Nested dichotomies are statistically independent, and hence:

    * the likelihood chi-square statistics for the sub-models are additive;

    * they provide an additive decomposition of tests for the overall polytomous response.

* You can think of this as breaking up the overall question of "How do the response categories differ?" into $m-1$

  sub-questions that answer the global one.
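
The additivity described above follows from a factorization of the likelihood. As a sketch, in notation assumed here: let $L_j(\beta_j)$ be the binomial likelihood for dichotomy $j$, computed only from the observations that fall in the categories it compares, with its own coefficient vector $\beta_j$. Then the likelihood for the polytomous response is

$$
L(\beta_1, \dots, \beta_{m-1}) = \prod_{j=1}^{m-1} L_j(\beta_j)
\quad \Longrightarrow \quad
\log L = \sum_{j=1}^{m-1} \log L_j ,
$$

so that, for any term in the model, the likelihood-ratio statistics and their degrees of freedom add over the dichotomies:

$$
G^2 = \sum_{j=1}^{m-1} G^2_j ,
\qquad
\mathrm{df} = \sum_{j=1}^{m-1} \mathrm{df}_j .
$$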

When the dichotomies make

sense substantively, this method can be a simpler alternative to the standard **multinomial logistic model**

which compares response categories to a reference level.

This choice is similar to using **orthogonal contrasts** among factor categories in an ANOVA,

as opposed to using the default reference-level coding.

### Ordered categories

Note that when the response categories are **ordered**, as in 

education attained: "HS" < "College" < "BA" < "MA" < "PhD", another attractive model is the 

**proportional odds** model (e.g., fit by `MASS::polr()`).

This is a simpler model, but achieves that simplicity by

making the additional assumption that the coefficients for the

predictors are the same for all of the cumulative logits (only the intercepts differ).
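
For comparison, the proportional odds model can be written, in the parameterization documented for `MASS::polr()`, as

$$
\operatorname{logit} \Pr(Y \le j \mid x) = \alpha_j - x^\top \beta ,
\qquad j = 1, \dots, m-1 ,
$$

with one intercept $\alpha_j$ per cut-point but a single coefficient vector $\beta$. A nested dichotomies model instead has a separate coefficient vector for each of the $m-1$ dichotomies. The symbols $\alpha_j$ and $\beta$ are notation chosen here for illustration.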

## Installation

You can install the current published version (`r cran_version`) from [CRAN](https://cran.r-project.org/package=nestedLogit), 

or the development version (`r dev_version`) from

either [R-universe](https://friendly.r-universe.dev/nestedLogit) or [GitHub](https://github.com/friendly/nestedLogit):

+-------------------+------------------------------------------------------------------------------+
| CRAN version      | `install.packages("nestedLogit")`                                            |
+-------------------+------------------------------------------------------------------------------+
| R-universe        | `install.packages('nestedLogit', repos = 'https://friendly.r-universe.dev')` |
+-------------------+------------------------------------------------------------------------------+
| GitHub            | `remotes::install_github("friendly/nestedLogit")`                            |
+-------------------+------------------------------------------------------------------------------+

## Package overview

The package provides one main function, `nestedLogit()`, for fitting the set of $(m-1)$

binary logistic regression models for a polytomous response with $m$ levels.

These can be specified using helper functions:

* `dichotomy()`: constructs a _single_ dichotomy among the levels of a response factor;

* `logits()`: creates the set of dichotomies, typically using `dichotomy()` for each;

* `continuationLogits()`: provides a convenient way to generate all dichotomies for an ordered response.

For instance, a 4-category response, with levels `r LETTERS[1:4]`, and successive binary splits

for the dichotomies of interest

could be specified as:

```{r}

(ABCD <-

  logits(AB.CD = dichotomy(c("A", "B"), c("C", "D")),

           A.B = dichotomy("A", "B"),

           C.D = dichotomy("C", "D")

         )

)

```

These dichotomies are effectively a tree structure of lists, which can be displayed simply using

`lobstr::tree()`.

```{r tree}

lobstr::tree(ABCD)

```

Alternatively, the nested dichotomies can be specified more compactly as a nested (i.e., recursive) list 

with optionally named elements. For example, where people might choose a method of transportation

among the categories `plane`, `train`, `bus`, `car`, a sensible set of three dichotomies could

be specified as:

```{r transport}

transport <- list(

  air = "plane",

  ground = list(

    public = list("train", "bus"),

    private = "car"

  ))

lobstr::tree(transport)

```

There are also methods, including `as.matrix.dichotomies()` and `as.character.dichotomies()`,

to facilitate working with `dichotomies` objects in other representations. The `ABCD` example

above corresponds to the matrix below, whose rows represent the dichotomies and columns

are the response levels:

```{r}

as.matrix(ABCD)

as.character(ABCD)

```

The result of `nestedLogit()` is an object of class `"nestedLogit"`. It contains

the set of $(m-1)$ `glm()` models fit to the dichotomies.

### Methods

```{r child="man/partials/methods.Rmd"}

```

## Examples

This example uses data on women's labor force participation to fit a nested logit model for

the response, `partic`, representing categories

`not.work`, `parttime` and `fulltime` for 263 women from a 1977

survey in Canada. This dataset is explored in more detail in the

package vignette, `vignette("nestedLogit", package = "nestedLogit")`.

A model for the complete polytomy can be specified as two nested

dichotomies, using helper functions `dichotomy()` and `logits()`, as shown in the example that follows:

* `work`: {not.work} vs. {parttime, fulltime}

* `full`: {parttime} vs. {fulltime}, but only for those working

`nestedLogit()` effectively fits each of these dichotomies

as logistic regression models via `glm(..., family = binomial)`.

```{r wlf-model}

data(Womenlf, package = "carData")

# Use `logits()` and `dichotomy()` to specify the comparisons of interest

comparisons <- logits(work=dichotomy("not.work", 

                                     working=c("parttime", "fulltime")),

                      full=dichotomy("parttime", "fulltime"))

m <- nestedLogit(partic ~ hincome + children,

                 dichotomies = comparisons,

                 data=Womenlf)

coef(m)

```
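
The statement that each dichotomy is effectively fit with `glm(..., family = binomial)` can be sketched by hand in base R. The block below is an illustration under stated assumptions, not the package's implementation: it uses simulated data with made-up coefficients (not the `Womenlf` data) that borrow the same variable names, recodes the response into the `work` and `full` dichotomies, fits the two binomial models, and multiplies the conditional probabilities to recover the three category probabilities.

```r
set.seed(1)
n <- 300

# Simulated (hypothetical) data that only borrow the Womenlf variable names
dat <- data.frame(
  hincome  = round(runif(n, 1, 45)),
  children = factor(sample(c("absent", "present"), n, replace = TRUE))
)
present  <- dat$children == "present"
working  <- rbinom(n, 1, plogis(1.3 - 0.04 * dat$hincome - 1.6 * present))
fulltime <- rbinom(n, 1, plogis(3.5 - 0.10 * dat$hincome - 2.6 * present))
dat$partic <- factor(
  ifelse(working == 0, "not.work",
         ifelse(fulltime == 1, "fulltime", "parttime")),
  levels = c("not.work", "parttime", "fulltime")
)

# Dichotomy `work`: {not.work} vs. {parttime, fulltime}, using all cases
dat$work <- as.integer(dat$partic != "not.work")
# Dichotomy `full`: {parttime} vs. {fulltime}; NA for women who are not working
dat$full <- ifelse(dat$partic == "not.work", NA,
                   as.integer(dat$partic == "fulltime"))

# One binomial glm per dichotomy; rows with NA in `full` are dropped by glm()
fit_work <- glm(work ~ hincome + children, family = binomial, data = dat)
fit_full <- glm(full ~ hincome + children, family = binomial, data = dat)

# Category probabilities are products of the dichotomy probabilities
newdata <- expand.grid(hincome = c(10, 20, 30),
                       children = c("absent", "present"))
pr_work <- predict(fit_work, newdata, type = "response")  # Pr(working)
pr_full <- predict(fit_full, newdata, type = "response")  # Pr(fulltime | working)
probs <- cbind(
  not.work = 1 - pr_work,
  parttime = pr_work * (1 - pr_full),
  fulltime = pr_work * pr_full
)
cbind(newdata, round(probs, 3))
```

Each row of `probs` sums to one by construction, because the second dichotomy only splits the probability of working between part-time and full-time.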

The `"nestedLogit"` object contains the components of the fitted model. The structure can be shown nicely

using `lobstr::tree()`:

```{r}

m |> lobstr::tree(max_depth=1)

```

The separate models for the `work` and `full` dichotomies can be extracted via `models()`. These

are the binomial `glm()` models.

```{r}

models(m) |> lobstr::tree(max_depth = 1)

```

`Anova()` produces analysis of deviance tests for the terms in this model, for each of the submodels as well as for the combined responses of the polytomy. The `LR Chisq` and `df` for terms in the combined model are the sums of those for

the submodels.

```{r wlf-anova}

car::Anova(m)

```
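
The additivity of the `LR Chisq` values can be checked by hand. The following is a minimal base-R sketch on simulated data, assuming a single hypothetical predictor `x` and a three-category response built from two nested dichotomies; it does not use the package. It compares the sum of the two submodel likelihood-ratio statistics with the statistic computed directly from the log-likelihood of the three-category response.

```r
set.seed(2)
n <- 500
x <- rnorm(n)  # hypothetical predictor

# Three-category response generated through two nested dichotomies
work <- rbinom(n, 1, plogis(0.3 - 0.8 * x))
full <- ifelse(work == 1, rbinom(n, 1, plogis(-0.2 + 1.1 * x)), NA)

fit_work <- glm(work ~ x, family = binomial)
fit_full <- glm(full ~ x, family = binomial)  # rows with NA are dropped

# LR chi-square for x in each submodel: null deviance minus residual deviance
lr_work <- fit_work$null.deviance - fit_work$deviance
lr_full <- fit_full$null.deviance - fit_full$deviance

# Log-likelihood of the polytomous response, given
# pw = Pr(working) and pf = Pr(fulltime | working) for every case
loglik_polytomous <- function(pw, pf) {
  p_obs <- ifelse(work == 0, 1 - pw,
                  ifelse(full == 1, pw * pf, pw * (1 - pf)))
  sum(log(p_obs))
}

new_x <- data.frame(x = x)
ll_model <- loglik_polytomous(
  predict(fit_work, new_x, type = "response"),
  predict(fit_full, new_x, type = "response")
)
# Null model: constant probabilities equal to the observed proportions
ll_null <- loglik_polytomous(
  rep(mean(work), n),
  rep(mean(full, na.rm = TRUE), n)
)

lr_combined <- 2 * (ll_model - ll_null)
all.equal(lr_combined, lr_work + lr_full)

# Combined test of x on 1 + 1 = 2 degrees of freedom
p_value <- pchisq(lr_combined, df = 2, lower.tail = FALSE)
```

The two quantities agree up to numerical tolerance because the log-likelihood of the polytomous response is the sum of the log-likelihoods of the two binomial models.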

### Plots

A basic plot of predicted probabilities can be produced using

the `plot()` method for `"nestedLogit"` objects.

It can be called several times to give multi-panel plots.

By default, a 95% pointwise confidence envelope is added to the plot.

Here, the envelopes are drawn with `conf.level = 0.68` to give approximately $\pm 1$ standard error bounds.

```{r wlf-plot}

#| out.width = "100%",

#| fig.asp = 0.55,

#| echo = 1:3

op <- par(mfcol=c(1, 2), mar=c(4, 4, 3, 1) + 0.1)

plot(m, "hincome", list(children="absent"),

     conf.level = 0.68,

     xlab="Husband's Income", legend=FALSE)

plot(m, "hincome", list(children="present"),

     conf.level = 0.68,

     xlab="Husband's Income")

par(op)

```

## Vignettes

* A more general discussion of nested dichotomies logistic regression and detailed examples can be found 

in `vignette("nestedLogit")`.

* A variety of other plots can be produced using `ggplot()`, as described in the vignette,

`vignette("plotting-ggplot")`.

* A new vignette, `vignette("standard-errors")`, describes the mathematics behind the calculation of

standard errors using the delta method.

## Authors

* John Fox

* Michael Friendly

## References

S. Fienberg (1980) _The Analysis of Cross-Classified Categorical Data_, 2nd Edition, MIT Press, Section 6.6.

J. Fox (2016) _Applied Regression Analysis and Generalized Linear Models_, 3rd Edition, Sage, Section 14.2.2.

M. Friendly and D. Meyer (2016) _Discrete Data Analysis with R_, CRC Press, Section 8.2.