https://github.com/ArthurSpirling/EmbeddingsPaper

Paper and related materials for Rodriguez & Spirling (JOP, 2022) word embeddings overview and assessment

Host: GitHub
URL: https://github.com/ArthurSpirling/EmbeddingsPaper
Owner: ArthurSpirling
Created: 2019-03-11T17:51:58.000Z
Default Branch: master
Last Pushed: 2022-02-14T20:46:13.000Z
Last Synced: 2024-05-14T15:36:11.449Z
Size: 3.29 MB
Stars: 44
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md


# Word Embeddings: What works, what doesn’t, and how to tell the difference for applied research

Paper and related materials for [Rodriguez](http://prodriguezsosa.com/) & [Spirling](http://www.arthurspirling.org) (2022). The abstract for the paper is as follows:

> Word embeddings are becoming popular for political science research, yet we know little about their properties and performance. To help scholars seeking to use these techniques, we explore the effects of key parameter choices---including context window length, embedding vector dimensions and pre-trained vs locally fit variants---on the efficiency and quality of inferences possible with these models. Reassuringly we show that results are generally robust to such choices for political corpora of various sizes and in various languages. Beyond reporting extensive technical findings, we provide a novel crowdsourced "Turing test"-style method for examining the relative performance of any two models that produce substantive, text-based outputs. Our results are encouraging: popular, easily available pre-trained embeddings perform at a level close to---or surpassing---both human coders and more complicated locally-fit models. For completeness, we provide best practice advice for cases where local fitting is required.
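
The abstract only names the crowdsourced "Turing test"-style comparison; the paper and FAQ describe the actual protocol. As a rough illustration, and assuming each crowdsourced trial boils down to a rater choosing between an output from model A and an output from model B, the sketch below computes the share of trials won by A and an exact two-sided binomial test against a 50/50 split. The function names and the `"A"`/`"B"` encoding are hypothetical, and the binomial test is a generic choice, not necessarily the estimator used in the paper.

```python
from math import comb


def preference_share(choices):
    """Share of trials in which model A's output was picked over model B's.

    `choices` is a list of "A"/"B" strings (a hypothetical encoding of rater picks).
    """
    if not choices:
        raise ValueError("need at least one rater choice")
    if any(c not in ("A", "B") for c in choices):
        raise ValueError('each choice must be "A" or "B"')
    wins = sum(1 for c in choices if c == "A")
    return wins / len(choices)


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k wins in n trials.

    Sums the probabilities of every outcome that is no more likely than
    the observed one under the null success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied in probability are counted
    # even if floating-point rounding makes them differ slightly.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)
```

For example, with 14 of 20 hypothetical trials favoring A, `preference_share` returns 0.7 and `two_sided_binomial_p(14, 20)` is about 0.115, so that many trials would not separate A from a coin flip at conventional thresholds. A share near 0.5 is what one would expect if raters cannot tell the two models' outputs apart.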

You can find the paper [here](https://github.com/ArthurSpirling/EmbeddingsPaper/blob/master/Paper/Embeddings_SpirlingRodriguez.pdf) and an FAQ for the project [here](https://github.com/ArthurSpirling/EmbeddingsPaper/blob/master/Project_FAQ/faq.md). Replication code is [here](https://github.com/prodriguezsosa/EmbeddingsPaperReplication).

The paper is now published at the [*Journal of Politics*](https://www.journals.uchicago.edu/doi/10.1086/715162). Full citation is:

```bibtex
@article{rodriguez2022word,
  title={Word embeddings: What works, what doesn’t, and how to tell the difference for applied research},
  author={Rodriguez, Pedro L and Spirling, Arthur},
  journal={The Journal of Politics},
  volume={84},
  number={1},
  pages={101--115},
  year={2022},
  publisher={The University of Chicago Press},
  address={Chicago, IL}
}
```

Comments are (still) very welcome!

Note that Spirling was supported in part by NSF grant number 1922658.