https://github.com/miserman/lingmatch

An all-in-one R package for the assessment of linguistic similarity

nlp r rcpp text-analysis


Host: GitHub
URL: https://github.com/miserman/lingmatch
Owner: miserman
Created: 2017-09-25T03:55:10.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2024-11-08T20:51:07.000Z (12 days ago)
Last Synced: 2024-11-08T21:33:58.568Z (12 days ago)
Topics: nlp, r, rcpp, text-analysis
Language: R
Homepage: https://miserman.github.io/lingmatch
Size: 29.6 MB
Stars: 11
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# lingmatch

An all-in-one R package for the assessment of linguistic matching and/or accommodation.

## features

* Input raw text, a document-term matrix (DTM), or LIWC output.

* Apply various weighting functions to a DTM.

* Measure similarity and/or accommodation with various metrics.

* Calculate standard forms of Language Style Matching (LSM) and Latent Semantic Similarity (LSS).
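
As a rough illustration of the first two features, a document-term matrix and a frequency weighting can be sketched in base R. This is only a sketch: the tokenizer below (lowercasing, then splitting on anything other than letters and apostrophes) is an assumption made for illustration, and the package's own `lma_dtm` and `lma_weight` functions may tokenize and weight differently.

```R
texts <- c(
  "Why, hello there! How are you this evening?",
  "I am well, thank you for your inquiry!"
)

# illustrative tokenizer (an assumption, not the package's own)
tokens <- lapply(
  strsplit(tolower(texts), "[^a-z']+"),
  function(x) x[x != ""]
)

# one row per text, one column per unique term
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(x) as.numeric(table(factor(x, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

# frequency weighting: divide each row by its word count
dtm_frequency <- dtm / rowSums(dtm)
```

Each row of `dtm_frequency` sums to 1, so texts of different lengths become comparable.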

## resources

* Documentation and guides: [miserman.github.io/lingmatch](https://miserman.github.io/lingmatch/)

  * [Quick Start](https://miserman.github.io/lingmatch/articles/quickstart.html)

  * [Comparison Specification](https://miserman.github.io/lingmatch/articles/groups.html)

  * [Introduction to Text Analysis](https://miserman.github.io/lingmatch/articles/introduction.html)

  * [Word Vectors](https://miserman.github.io/lingmatch/articles/word_vectors.html)

  * [Text Classification](https://miserman.github.io/lingmatch/articles/text_classification.html)

  * [Dictionary Creation](https://miserman.github.io/lingmatch/articles/dictionary_creation.html)

* Dictionary repository: [osf.io/y6g5b](https://osf.io/y6g5b/wiki/home/)

* Latent semantic space repository: [osf.io/489he](https://osf.io/489he/wiki/home/)

* Dictionary builder: [miserman.github.io/dictionary_builder](https://miserman.github.io/dictionary_builder/)

## installation

Download R from [r-project.org](https://www.r-project.org/), then install the package from an R console:

Release ([version 1.0.7](https://CRAN.R-project.org/package=lingmatch))

```R
install.packages("lingmatch")
```

Development (version 1.0.8)

```R
# install.packages("remotes")
remotes::install_github("miserman/lingmatch")
```

And load the package:

```R
library(lingmatch)
```

## examples

You can make a quick comparison between two bits of text; by default, this gives the cosine similarity between raw word-count vectors:

```R
lingmatch("First text to look at.", "Text to compare that text with.")
```
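
To show what cosine similarity between raw word-count vectors means, here is a base R sketch of the same comparison. The tokenizer is a simplifying assumption (lowercase, split on non-letters), so the value `lingmatch` reports may differ if its tokenization differs.

```R
# illustrative tokenizer (an assumption, not the package's own)
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens[tokens != ""]
}

a <- tokenize("First text to look at.")
b <- tokenize("Text to compare that text with.")

# count each term of the shared vocabulary in each text
vocab <- sort(unique(c(a, b)))
counts_a <- as.numeric(table(factor(a, levels = vocab)))
counts_b <- as.numeric(table(factor(b, levels = vocab)))

# cosine similarity: dot product over the product of vector lengths
cosine <- sum(counts_a * counts_b) /
  sqrt(sum(counts_a^2) * sum(counts_b^2))
cosine
```

Under this tokenizer, only "text" and "to" are shared, so the similarity is well below 1.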

Or, given a vector of texts:

```R
text = c(
  "Why, hello there! How are you this evening?",
  "I am well, thank you for your inquiry!",
  "You are a most good at social interactions person!",
  "Why, thank you! You're not all bad yourself!"
)
```

Process the texts in one step:

```R
# with a dictionary
inquirer_cats = lma_process(text, dict = "inquirer", dir = "~/Dictionaries")

# with a latent semantic space
glove_vectors = lma_process(text, space = "glove", dir = "~/Latent Semantic Spaces")
```

Or process the texts step by step, then measure similarity between each:

```R
dtm = lma_dtm(text)
dtm_weighted = lma_weight(dtm)
dtm_categorized = lma_termcat(dtm_weighted, lma_dict(1:9))
similarity = lma_simets(dtm_categorized, metric = "canberra")
```

Or do that within a single function call:

```R
similarity = lingmatch(
  text, weight = "frequency", dict = lma_dict(1:9), metric = "canberra"
)$sim
```

Or, if you want a standard form (as in this example), specify it with `type`:

```R
similarity = lingmatch(text, type = "lsm")$sim
```
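
For intuition about what the `canberra` metric compares in this standard form, here is a base R sketch. It assumes the common similarity form of the Canberra metric (one minus the mean of `|a - b| / (|a| + |b|)` across categories). The category names and proportions are made up for illustration, and the package's implementation may differ in details such as how zero denominators are handled.

```R
# Canberra-based similarity between two vectors of category proportions
# (assumed form; categories where both values are 0 count as identical)
canberra_similarity <- function(a, b) {
  denom <- abs(a) + abs(b)
  ratio <- ifelse(denom == 0, 0, abs(a - b) / denom)
  1 - mean(ratio)
}

# hypothetical function-word category proportions for two texts
a <- c(ppron = 0.10, article = 0.05, prep = 0.12)
b <- c(ppron = 0.08, article = 0.05, prep = 0.20)

canberra_similarity(a, b)
```

Identical vectors score 1, and the score falls as the category proportions diverge.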