Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/koheiw/lsx

Semi-supervised algorithm for document scaling
https://github.com/koheiw/lsx

lsa quanteda sentiment-analysis text-analysis

Last synced: 2 days ago
JSON representation

Semi-supervised algorithm for document scaling

Awesome Lists containing this project

README

        

---
output:
rmarkdown::github_document
---

```{r, echo=FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "##",
fig.path = "images/",
dpi = 150,
fig.height = 5,
fig.width = 10
)
```

# LSS: Semi-supervised algorithm for document scaling

[![CRAN
Version](https://www.r-pkg.org/badges/version/LSX)](https://CRAN.R-project.org/package=LSX)
[![Downloads](https://cranlogs.r-pkg.org/badges/LSX)](https://CRAN.R-project.org/package=LSX)
[![Total
Downloads](https://cranlogs.r-pkg.org/badges/grand-total/LSX?color=orange)](https://CRAN.R-project.org/package=LSX)
[![R build
status](https://github.com/koheiw/LSX/workflows/R-CMD-check/badge.svg)](https://github.com/koheiw/LSX/actions)
[![codecov](https://codecov.io/gh/koheiw/LSX/branch/master/graph/badge.svg)](https://app.codecov.io/gh/koheiw/LSX)

In quantitative text analysis, the cost of training supervised machine learning models tend to be very high when the corpus is large. Latent Semantic Scaling (LSS) is a semi-supervised document scaling technique that I developed to perform large scale analysis at low cost. Taking user-provided *seed words* as weak supervision, it estimates polarity of words in the corpus by latent semantic analysis and locates documents on a unidimensional scale (e.g. sentiment).

## Installation

From CRAN:

```{r, eval=FALSE}
install.packages("LSX")
```

From Github:

```{r, eval=FALSE}
devtools::install_github("koheiw/LSX")
```

## Examples

Please visit the package website to understand the usage of the functions:

- [Introduction to LSX](https://koheiw.github.io/LSX/articles/pkgdown/basic.html)
- [Application in research](https://koheiw.github.io/LSX/articles/pkgdown/research.html)
- [Selection of seed words](https://koheiw.github.io/LSX/articles/pkgdown/seedwords.html)

Please read the following papers for the algorithm and methodology, and its application to non-English texts (Japanese and Hebrew):

- Watanabe, Kohei. 2020. ["Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages"](https://www.tandfonline.com/doi/full/10.1080/19312458.2020.1832976), *Communication Methods and Measures*.
- Watanabe, Kohei, Segev, Elad, & Tago, Atsushi. (2022). ["Discursive diversion: Manipulation of nuclear threats by the conservative leaders in Japan and Israel"](https://journals.sagepub.com/doi/full/10.1177/17480485221097967), *International Communication Gazette*.

## Other publications

LSS has been used for research in various fields of social science.

- Nakamura, Kentaro. 2022 [Balancing Opportunities and Incentives: How Rising China’s Mediated Public Diplomacy Changes Under Crisis](https://ijoc.org/index.php/ijoc/article/view/18676/3968), *International Journal of Communication*.
- Zollinger, Delia. 2022 [Cleavage Identities in Voters’ Own Words: Harnessing Open-Ended Survey Responses](https://onlinelibrary.wiley.com/doi/10.1111/ajps.12743), *American Journal of Political Science*.
- Brändle, Verena K., and Olga Eisele. 2022. ["A Thin Line: Governmental Border Communication in Times of European Crises"](https://onlinelibrary.wiley.com/doi/full/10.1111/jcms.13398) *Journal of Common Market Studies*.
- Umansky, Natalia. 2022. ["Who gets a say in this? Speaking security on social media"](https://journals.sagepub.com/doi/10.1177/14614448221111009). *New Media & Society*.
- Rauh, Christian, 2022. ["Supranational emergency politics? What executives’ public crisis communication may tell us"](https://www.tandfonline.com/doi/full/10.1080/13501763.2021.1916058), *Journal of European Public Policy*.
- Trubowitz, Peter and Watanabe, Kohei. 2021. ["The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats"](https://academic.oup.com/isq/advance-article/doi/10.1093/isq/sqab029/6278490), *International Studies Quarterly*.
- Vydra, Simon and Kantorowicz, Jaroslaw. 2020. ["Tracing Policy-relevant Information in Social Media: The Case of Twitter before and during the COVID-19 Crisis"](https://www.degruyter.com/document/doi/10.1515/spp-2020-0013/html). *Statistics, Politics and Policy*.
- Watanabe, Kohei. 2017. ["Measuring News Bias: Russia's Official News Agency ITAR-TASS’s Coverage of the Ukraine Crisis"](http://journals.sagepub.com/eprint/TBc9miIc89njZvY3gyAt/full), *European Journal Communication*.

More publications are available on [Google Scholar](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5312969973901591795).