https://github.com/jaytimm/mds-for-linguists

Using R & VoteView mutlidimensional scaling (MDS) methods for the analysis & visualization of complex patterns of crosslinguistic variation.
https://github.com/jaytimm/mds-for-linguists
linguistic-typology multidimensional-scaling nominate semantic-maps voteview
Last synced: 3 months ago
JSON representation
Using R & VoteView mutlidimensional scaling (MDS) methods for the analysis & visualization of complex patterns of crosslinguistic variation.
Host: GitHub
URL: https://github.com/jaytimm/mds-for-linguists
Owner: jaytimm
Created: 2019-06-19T19:13:43.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2022-02-07T17:22:08.000Z (over 4 years ago)
Last Synced: 2025-10-29T16:58:11.839Z (7 months ago)
Topics: linguistic-typology, multidimensional-scaling, nominate, semantic-maps, voteview
Homepage:
Size: 1.59 MB
Stars: 4
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project

README

          ---

title: "MDS for Linguists"

output:

  md_document:

    variant: markdown_github

    toc: TRUE

    toc_depth: 2

---

  

## MDS for Linguists

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

source("/home/jtimm/pCloudDrive/GitHub/git-projects/render_toc.r")

```

**An R-based guide for linguistic typologists** interested in applying [NOMINATE](https://voteview.com/about) multidimensional scaling (MDS) techniques to linguistic data as presented in [Croft](http://www.unm.edu/~wcroft/) and [Poole](https://polisci.ucsd.edu/about-our-people/faculty/faculty-directory/emeriti-faculty/poole-profile.html), "Inferring universals from grammatical variation: multidimensional scaling for typological analysis" (*Theoretical Linguistics* 34.1-37, 2008)." [[Abstract]](https://www.degruyter.com/view/j/thli.2008.34.issue-1/thli.2008.001/thli.2008.001.xml)

This guide provides a brief summary of an R-based workflow for model implementation and the visualization of model results within the `ggplot` data visualization framework.  A cross-linguistic data set of indefinite pronouns from Haspelmath (1997) is utilized (and made available) here for demonstration purposes. For more thoughtful discussions regarding theory, scaling procedures & model interpretation, see reference section.

```{r echo=FALSE}

render_toc("/home/jtimm/pCloudDrive/GitHub/git-projects/mds_for_linguists/README.Rmd", 

           toc_header_name = 'MDS for Linguists',

           toc_depth = 2)

```

## Getting started

### Install and load required packages

```{r message=FALSE, warning=FALSE}

if (!require("pacman")) install.packages("pacman")

pacman::p_load(# anominate, -- no longer maintained -- 

               wnominate, 

               pscl, 

               ggplot2, 

               knitr, 

               devtools, 

               ggrepel, 

               data.table)

```

```{r eval=FALSE}

devtools::install_github("jaytimm/wnomadds")

library(wnomadds)

```

### Load data

Data set:  A 9 x 140 data frame: Nine indefinite pronominal meanings, using data from 140 pronouns in 40 languages.  Data are made available [here](https://github.com/jaytimm/mds_for_linguists_using_R/blob/master/resources/Indefprn13.txt).

```{r include=FALSE}

local_data <- '/home/jtimm/pCloudDrive/GitHub/git-projects/mds_for_linguists/resources'

```

```{r eval=FALSE}

## File paths will look differently for Windows/Mac

local_data <- '/home/jtimm/Desktop/data/'

```

Load data set:

```{r message=FALSE, warning=FALSE}

setwd(local_data)

raw_data <- read.csv("Indefprn13.txt",

            sep="\t", 

            stringsAsFactors = FALSE)

```

A portion of the data frame is presented below.  Rows contain functions/meanings, and are analagous to legislators in the NOMINATE model.  Columns contain language-specific grammatical forms, and are analagous to roll calls (ie, votes) in the NOMINATE model.  

A value of 1 in the table below means that a given form expresses a particular meaning; a value of 6 means that a given form does not express that particular meaning.  Missing data are specified with the value 9.

```{r message=FALSE, warning=FALSE}

knitr::kable(raw_data[,1:9]) 

```

---

## Using the wnominate and pscl packages

### Building MDS models

#### Rollcall object

The first step is to transform the original data structure into a `rollcall` object using the `pscl` package.

```{r}

roll_obj <- pscl::rollcall(raw_data [,-1], 

                           yea=1, 

                           nay=6, 

                           missing=9,

                           notInLegis=8,

                           vote.names = colnames(raw_data)[2:ncol(raw_data)], 

                           legis.names = raw_data[,1])

```

#### Ideal points estimation 

Then we fit three models using the `wnominate` function -- one-, two- & three-dimensional solutions.

```{r message=FALSE, warning=FALSE, results = 'hide'}

ideal_points_1D <- wnominate::wnominate (roll_obj, dims = 1, polarity=c(1)) 

ideal_points_2D <- wnominate::wnominate (roll_obj, dims = 2, polarity=c(1,2)) 

ideal_points_3D <- wnominate::wnominate (roll_obj, dims = 3, polarity=c(1,2,3)) 

```

The resulting data structures are each comprised of seven elements:

```{r}

names(ideal_points_1D)

```

#### Model comparison and fitness statistics

Correct classification and fitness statistics for each model are extracted from the `fits` element, and summarized below: 

```{r message=FALSE, warning=FALSE}

list('1D' = ideal_points_1D$fits, 

     '2D' = ideal_points_2D$fits, 

     '3D' = ideal_points_3D$fits)

```

### Visualizing model results

#### A one-dimensional solution 

Extract legislator coordinates (ie, ideal points) from one-dimensional model results.

```{r}

d1 <- cbind(label=rownames(ideal_points_1D$legislators), 

            ideal_points_1D$legislators)

d1 <- d1[order(d1$coord1D),]

```

Plot legislators (ie, grammatical functions) in one-dimensional space by rank.

```{r fig.height=6, fig.width=6}

ggplot()  +

  geom_text(data = d1,

            aes(x=reorder(label, coord1D), 

                y=coord1D, 

                label=label), 

            size=4, 

            color = 'blue') +

  

#  theme_classic() +

   theme_minimal() +

  

  labs(title="1D W-NOMINATE Plot") + 

  theme(axis.text.y=element_blank(),

        axis.ticks.y=element_blank())+ 

  xlab('') + ylab('First Dimension')+

  ylim(-1.1, 1.1)+

  coord_flip()

```

#### A two-dimensional solution 

We first build a simple "base" plot using legislator coordinates from two-dimensional model results.  Per `wnominate` convention, we add a unit circle to specify model constraints.  All subsequent plots are built on this simple base plot. 

```{r fig.height=6, fig.width=6}

base_2D <- ggplot(data = ideal_points_2D$legislators,

       aes(x=coord1D, 

           y=coord2D)) +

  

  geom_point(size= 1.5,

             color = 'blue') +

  

  annotate("path",

           x=cos(seq(0,2*pi,length.out=300)),

           y=sin(seq(0,2*pi,length.out=300)),

           color='gray',

           size = .25) +

  

  xlab('First Dimension') + 

  ylab('Second Dimension') 

base_2D + ggtitle('Two-dimensional base plot')

```

**Add** labels, a title, and change the theme.  

```{r fig.height=6, fig.width=6}

base_2D +

  ggrepel::geom_text_repel(

    data  = ideal_points_2D$legislators,

    aes(label = rownames(ideal_points_2D$legislators)),

    direction = "y",

    hjust = 0, 

    size = 4,

    color = 'blue') +

  

  theme_classic() +

# theme_minimal() +

  

  ggtitle("W-NOMINATE Coordinates") 

```

---

### Cutting lines and roll call polarity via the wnomadds package

I have developed a simple R package, `wnomadds`, that facilitates the plotting of roll call cutting lines and roll call polarities using `ggplot`.  While `wnominate` provides functionality for plotting cutting lines, only plotting in base R is supported.  The `wnm_get_cutlines` function extracts cutting line coordinates from `wnominate` model results, along with coordinates specifying the direction of majority Yea votes for a given roll call (ie, vote polarity). Addtional details about the package are available [here](https://github.com/jaytimm/wnomadds).

```{r message=FALSE, warning=FALSE}

with_cuts <- wnomadds::wnm_get_cutlines(ideal_points_2D, 

                                        rollcall_obj = roll_obj, 

                                        arrow_length = 0.05)

```

A sample of the resulting data frame:

```{r}

head(with_cuts)

```

#### Cutting lines & legislator coordinates

```{r fig.height=6, fig.width=6}

base_2D +

  ggrepel::geom_text_repel(

    data  = ideal_points_2D$legislators,

    aes(label = rownames(ideal_points_2D$legislators)),

    direction = "y",

    hjust = 0, 

    size = 4,

    color = 'blue') +

  

  geom_segment(data = with_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_2, yend = y_2),

               size = .25) + #cutting start to end

  

  theme_minimal() +

  labs(title="Cutting lines & W-NOMINATE Coordinates")

```

#### Cutting lines, roll call polarity & legislator coordinates

```{r fig.height=6, fig.width=6}

base_2D +

  

  geom_segment(data=with_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_2, yend = y_2),

               size = .25) + #cutting start to end

  

  ##ARROWS -- 

  geom_segment(data=with_cuts, 

               aes(x = x_2, y = y_2, 

                   xend = x_2a, yend = y_2a), 

               #cutting end to opposite arrow

               color = 'red',

               arrow = arrow(length = unit(0.2,"cm"))) +

  

  geom_segment(data=with_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_1a, yend = y_1a), 

               #cutting start to opposite arrow

               color = 'red',

               arrow = arrow(length = unit(0.2,"cm")))+ 

  ##END ARROWS.

  

  geom_text(data=with_cuts, 

               aes(x = x_1a, y = y_1a, 

                   label = Bill_Code), 

               size=2.5, 

               nudge_y = 0.03,

               check_overlap = TRUE) +

  theme_minimal() +

  labs(title = "W-NOMINATE Coordinates, cutting lines & roll call polarity")

```

#### Selected cutting lines and legislator coordinates

```{r fig.height=6, fig.width=6}

selected <- c('X01e', 'X01j', 'X01jd', 'X01n')

subset_cuts <- subset(with_cuts, Bill_Code %in% selected)

base_2D +

  

  ggrepel::geom_text_repel(

    data  = ideal_points_2D$legislators,

    aes(label = rownames(ideal_points_2D$legislators)),

    direction = "y",

    hjust = 0, 

    size = 4,

    color = 'blue') +

  

  geom_segment(data=subset_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_2, yend = y_2),

               size = .25) + #cutting start to end

  

  ##ARROWS -- 

  geom_segment(data=subset_cuts, 

               aes(x = x_2, y = y_2, 

                   xend = x_2a, yend = y_2a), 

               #cutting end to opposite arrow

               color = 'red',

               arrow = arrow(length = unit(0.2,"cm"))) +

  

  geom_segment(data=subset_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_1a, yend = y_1a), 

               #cutting start to opposite arrow

               color = 'red',

               arrow = arrow(length = unit(0.2,"cm")))+ 

  ##END ARROWS.

  

  geom_text(data=subset_cuts, 

               aes(x = x_1a, y = y_1a, 

                   label = Bill_Code), 

               size=2.5, 

               nudge_y = 0.03,

               check_overlap = TRUE) +

  theme_minimal() +

  labs(title = "W-NOMINATE Coordinates & selected cutting lines")

```

### Facet cutting lines by language

```{r fig.height=6}

#Extract language code from language-specific grammatical forms

with_cuts$lang <- gsub('[A-Za-z]', '', with_cuts$Bill_Code)

#Filter cutting line data set to first six language codes.

facet_cuts <- subset(with_cuts, lang %in% c('01', '02', '03', '04', '05', '06'))

base_2D +

  

  geom_segment(data=facet_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_2, yend = y_2),

               size = .25) + #cutting start to end

  

  ##ARROWS -- 

  geom_segment(data=facet_cuts, 

               aes(x = x_2, y = y_2, 

                   xend = x_2a, yend = y_2a), 

               #cutting end to opposite arrow

               color = 'red',

               arrow = arrow(length = unit(0.2,"cm"))) +

  

  geom_segment(data=facet_cuts, 

               aes(x = x_1, y = y_1, 

                   xend = x_1a, yend = y_1a), 

               #cutting start to opposite arrow

               color = 'red',

               arrow = arrow(length = unit(0.2,"cm")))+ 

  ##END ARROWS.

  theme_minimal() +

  facet_wrap(~lang) +

  coord_fixed()+

  labs(title = "W-NOMINATE Coordinates & language-specific cutting lines")

```

---

## References

Royce Carroll, Christopher Hare, Jeffrey B. Lewis, James Lo, Keith T. Poole and Howard Rosenthal (2017). Alpha-NOMINATE: Ideal Point Estimator. R package version 0.6. URL http://k7moa.c

om/alphanominate.htm

Croft, W., & Poole, K. T. (2008). Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. *Theoretical linguistics*, 34(1), 1-37.

Haspelmath, M. (1997). *Indefinite pronouns*. Oxford: Clarendon Press.

Poole, K. T. (2005). *Spatial models of parliamentary voting*. Cambridge University Press.

Keith Poole, Jeffrey Lewis, James Lo, Royce Carroll (2011). Scaling Roll Call Votes with wnominate in R. *Journal of Statistical Software*, 42(14), 1-21. URL http://www.jstatsoft.org/v42/i14/.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jaytimm/mds-for-linguists

Awesome Lists containing this project

README