An open API service indexing awesome lists of open source software.

https://github.com/jaytimm/mds-for-linguists

Using R & VoteView mutlidimensional scaling (MDS) methods for the analysis & visualization of complex patterns of crosslinguistic variation.
https://github.com/jaytimm/mds-for-linguists

linguistic-typology multidimensional-scaling nominate semantic-maps voteview

Last synced: 3 months ago
JSON representation

Using R & VoteView mutlidimensional scaling (MDS) methods for the analysis & visualization of complex patterns of crosslinguistic variation.

Awesome Lists containing this project

README

          

---
title: "MDS for Linguists"
output:
md_document:
variant: markdown_github
toc: TRUE
toc_depth: 2
---


## MDS for Linguists

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
source("/home/jtimm/pCloudDrive/GitHub/git-projects/render_toc.r")
```

**An R-based guide for linguistic typologists** interested in applying [NOMINATE](https://voteview.com/about) multidimensional scaling (MDS) techniques to linguistic data as presented in [Croft](http://www.unm.edu/~wcroft/) and [Poole](https://polisci.ucsd.edu/about-our-people/faculty/faculty-directory/emeriti-faculty/poole-profile.html), "Inferring universals from grammatical variation: multidimensional scaling for typological analysis" (*Theoretical Linguistics* 34.1-37, 2008)." [[Abstract]](https://www.degruyter.com/view/j/thli.2008.34.issue-1/thli.2008.001/thli.2008.001.xml)

This guide provides a brief summary of an R-based workflow for model implementation and the visualization of model results within the `ggplot` data visualization framework. A cross-linguistic data set of indefinite pronouns from Haspelmath (1997) is utilized (and made available) here for demonstration purposes. For more thoughtful discussions regarding theory, scaling procedures & model interpretation, see reference section.

```{r echo=FALSE}
render_toc("/home/jtimm/pCloudDrive/GitHub/git-projects/mds_for_linguists/README.Rmd",
toc_header_name = 'MDS for Linguists',
toc_depth = 2)
```

## Getting started

### Install and load required packages

```{r message=FALSE, warning=FALSE}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(# anominate, -- no longer maintained --
wnominate,
pscl,
ggplot2,
knitr,
devtools,
ggrepel,
data.table)
```

```{r eval=FALSE}
devtools::install_github("jaytimm/wnomadds")
library(wnomadds)
```

### Load data

Data set: A 9 x 140 data frame: Nine indefinite pronominal meanings, using data from 140 pronouns in 40 languages. Data are made available [here](https://github.com/jaytimm/mds_for_linguists_using_R/blob/master/resources/Indefprn13.txt).

```{r include=FALSE}
local_data <- '/home/jtimm/pCloudDrive/GitHub/git-projects/mds_for_linguists/resources'
```

```{r eval=FALSE}
## File paths will look differently for Windows/Mac
local_data <- '/home/jtimm/Desktop/data/'
```

Load data set:

```{r message=FALSE, warning=FALSE}
setwd(local_data)
raw_data <- read.csv("Indefprn13.txt",
sep="\t",
stringsAsFactors = FALSE)
```

A portion of the data frame is presented below. Rows contain functions/meanings, and are analagous to legislators in the NOMINATE model. Columns contain language-specific grammatical forms, and are analagous to roll calls (ie, votes) in the NOMINATE model.

A value of 1 in the table below means that a given form expresses a particular meaning; a value of 6 means that a given form does not express that particular meaning. Missing data are specified with the value 9.

```{r message=FALSE, warning=FALSE}
knitr::kable(raw_data[,1:9])
```

---

## Using the wnominate and pscl packages

### Building MDS models

#### Rollcall object

The first step is to transform the original data structure into a `rollcall` object using the `pscl` package.

```{r}
roll_obj <- pscl::rollcall(raw_data [,-1],
yea=1,
nay=6,
missing=9,
notInLegis=8,
vote.names = colnames(raw_data)[2:ncol(raw_data)],
legis.names = raw_data[,1])
```

#### Ideal points estimation

Then we fit three models using the `wnominate` function -- one-, two- & three-dimensional solutions.

```{r message=FALSE, warning=FALSE, results = 'hide'}
ideal_points_1D <- wnominate::wnominate (roll_obj, dims = 1, polarity=c(1))
ideal_points_2D <- wnominate::wnominate (roll_obj, dims = 2, polarity=c(1,2))
ideal_points_3D <- wnominate::wnominate (roll_obj, dims = 3, polarity=c(1,2,3))
```

The resulting data structures are each comprised of seven elements:

```{r}
names(ideal_points_1D)
```

#### Model comparison and fitness statistics

Correct classification and fitness statistics for each model are extracted from the `fits` element, and summarized below:

```{r message=FALSE, warning=FALSE}
list('1D' = ideal_points_1D$fits,
'2D' = ideal_points_2D$fits,
'3D' = ideal_points_3D$fits)
```

### Visualizing model results

#### A one-dimensional solution

Extract legislator coordinates (ie, ideal points) from one-dimensional model results.

```{r}
d1 <- cbind(label=rownames(ideal_points_1D$legislators),
ideal_points_1D$legislators)
d1 <- d1[order(d1$coord1D),]
```

Plot legislators (ie, grammatical functions) in one-dimensional space by rank.

```{r fig.height=6, fig.width=6}
ggplot() +
geom_text(data = d1,
aes(x=reorder(label, coord1D),
y=coord1D,
label=label),
size=4,
color = 'blue') +

# theme_classic() +
theme_minimal() +

labs(title="1D W-NOMINATE Plot") +
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
xlab('') + ylab('First Dimension')+
ylim(-1.1, 1.1)+
coord_flip()
```

#### A two-dimensional solution

We first build a simple "base" plot using legislator coordinates from two-dimensional model results. Per `wnominate` convention, we add a unit circle to specify model constraints. All subsequent plots are built on this simple base plot.

```{r fig.height=6, fig.width=6}
base_2D <- ggplot(data = ideal_points_2D$legislators,
aes(x=coord1D,
y=coord2D)) +

geom_point(size= 1.5,
color = 'blue') +

annotate("path",
x=cos(seq(0,2*pi,length.out=300)),
y=sin(seq(0,2*pi,length.out=300)),
color='gray',
size = .25) +

xlab('First Dimension') +
ylab('Second Dimension')

base_2D + ggtitle('Two-dimensional base plot')
```

**Add** labels, a title, and change the theme.

```{r fig.height=6, fig.width=6}
base_2D +

ggrepel::geom_text_repel(
data = ideal_points_2D$legislators,
aes(label = rownames(ideal_points_2D$legislators)),
direction = "y",
hjust = 0,
size = 4,
color = 'blue') +

theme_classic() +
# theme_minimal() +

ggtitle("W-NOMINATE Coordinates")
```

---

### Cutting lines and roll call polarity via the wnomadds package

I have developed a simple R package, `wnomadds`, that facilitates the plotting of roll call cutting lines and roll call polarities using `ggplot`. While `wnominate` provides functionality for plotting cutting lines, only plotting in base R is supported. The `wnm_get_cutlines` function extracts cutting line coordinates from `wnominate` model results, along with coordinates specifying the direction of majority Yea votes for a given roll call (ie, vote polarity). Addtional details about the package are available [here](https://github.com/jaytimm/wnomadds).

```{r message=FALSE, warning=FALSE}
with_cuts <- wnomadds::wnm_get_cutlines(ideal_points_2D,
rollcall_obj = roll_obj,
arrow_length = 0.05)
```

A sample of the resulting data frame:

```{r}
head(with_cuts)
```

#### Cutting lines & legislator coordinates

```{r fig.height=6, fig.width=6}
base_2D +

ggrepel::geom_text_repel(
data = ideal_points_2D$legislators,
aes(label = rownames(ideal_points_2D$legislators)),
direction = "y",
hjust = 0,
size = 4,
color = 'blue') +

geom_segment(data = with_cuts,
aes(x = x_1, y = y_1,
xend = x_2, yend = y_2),
size = .25) + #cutting start to end

theme_minimal() +
labs(title="Cutting lines & W-NOMINATE Coordinates")
```

#### Cutting lines, roll call polarity & legislator coordinates

```{r fig.height=6, fig.width=6}
base_2D +

geom_segment(data=with_cuts,
aes(x = x_1, y = y_1,
xend = x_2, yend = y_2),
size = .25) + #cutting start to end

##ARROWS --
geom_segment(data=with_cuts,
aes(x = x_2, y = y_2,
xend = x_2a, yend = y_2a),
#cutting end to opposite arrow
color = 'red',
arrow = arrow(length = unit(0.2,"cm"))) +

geom_segment(data=with_cuts,
aes(x = x_1, y = y_1,
xend = x_1a, yend = y_1a),
#cutting start to opposite arrow
color = 'red',
arrow = arrow(length = unit(0.2,"cm")))+
##END ARROWS.

geom_text(data=with_cuts,
aes(x = x_1a, y = y_1a,
label = Bill_Code),
size=2.5,
nudge_y = 0.03,
check_overlap = TRUE) +

theme_minimal() +
labs(title = "W-NOMINATE Coordinates, cutting lines & roll call polarity")
```

#### Selected cutting lines and legislator coordinates

```{r fig.height=6, fig.width=6}
selected <- c('X01e', 'X01j', 'X01jd', 'X01n')

subset_cuts <- subset(with_cuts, Bill_Code %in% selected)

base_2D +

ggrepel::geom_text_repel(
data = ideal_points_2D$legislators,
aes(label = rownames(ideal_points_2D$legislators)),
direction = "y",
hjust = 0,
size = 4,
color = 'blue') +

geom_segment(data=subset_cuts,
aes(x = x_1, y = y_1,
xend = x_2, yend = y_2),
size = .25) + #cutting start to end

##ARROWS --
geom_segment(data=subset_cuts,
aes(x = x_2, y = y_2,
xend = x_2a, yend = y_2a),
#cutting end to opposite arrow
color = 'red',
arrow = arrow(length = unit(0.2,"cm"))) +

geom_segment(data=subset_cuts,
aes(x = x_1, y = y_1,
xend = x_1a, yend = y_1a),
#cutting start to opposite arrow
color = 'red',
arrow = arrow(length = unit(0.2,"cm")))+
##END ARROWS.

geom_text(data=subset_cuts,
aes(x = x_1a, y = y_1a,
label = Bill_Code),
size=2.5,
nudge_y = 0.03,
check_overlap = TRUE) +

theme_minimal() +
labs(title = "W-NOMINATE Coordinates & selected cutting lines")
```

### Facet cutting lines by language

```{r fig.height=6}
#Extract language code from language-specific grammatical forms
with_cuts$lang <- gsub('[A-Za-z]', '', with_cuts$Bill_Code)

#Filter cutting line data set to first six language codes.
facet_cuts <- subset(with_cuts, lang %in% c('01', '02', '03', '04', '05', '06'))

base_2D +

geom_segment(data=facet_cuts,
aes(x = x_1, y = y_1,
xend = x_2, yend = y_2),
size = .25) + #cutting start to end

##ARROWS --
geom_segment(data=facet_cuts,
aes(x = x_2, y = y_2,
xend = x_2a, yend = y_2a),
#cutting end to opposite arrow
color = 'red',
arrow = arrow(length = unit(0.2,"cm"))) +

geom_segment(data=facet_cuts,
aes(x = x_1, y = y_1,
xend = x_1a, yend = y_1a),
#cutting start to opposite arrow
color = 'red',
arrow = arrow(length = unit(0.2,"cm")))+
##END ARROWS.

theme_minimal() +
facet_wrap(~lang) +
coord_fixed()+
labs(title = "W-NOMINATE Coordinates & language-specific cutting lines")
```

---

## References

Royce Carroll, Christopher Hare, Jeffrey B. Lewis, James Lo, Keith T. Poole and Howard Rosenthal (2017). Alpha-NOMINATE: Ideal Point Estimator. R package version 0.6. URL http://k7moa.c
om/alphanominate.htm

Croft, W., & Poole, K. T. (2008). Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. *Theoretical linguistics*, 34(1), 1-37.

Haspelmath, M. (1997). *Indefinite pronouns*. Oxford: Clarendon Press.

Poole, K. T. (2005). *Spatial models of parliamentary voting*. Cambridge University Press.

Keith Poole, Jeffrey Lewis, James Lo, Royce Carroll (2011). Scaling Roll Call Votes with wnominate in R. *Journal of Statistical Software*, 42(14), 1-21. URL http://www.jstatsoft.org/v42/i14/.