https://github.com/mlampros/glover
Global Vectors for Word Representation
- Host: GitHub
- URL: https://github.com/mlampros/glover
- Owner: mlampros
- Created: 2017-01-04T16:42:54.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2021-04-17T04:15:41.000Z (over 4 years ago)
- Last Synced: 2025-03-26T02:42:58.496Z (7 months ago)
- Topics: global-vectors, glove, r, word-representation
- Language: R
- Homepage: https://mlampros.github.io/GloveR/
- Size: 1.83 MB
- Stars: 7
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## GloveR
The GloveR package is an R wrapper for [*Global Vectors for Word Representation*](http://nlp.stanford.edu/projects/glove/) (GloVe). *GloVe* is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. For more information consult: *Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation*. The COPYRIGHTS and LICENSE files can be found in the *inst* folder of the R package.
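To make "word-word co-occurrence statistics" concrete, here is a minimal toy sketch in plain R. It is not the package's implementation: `toy_cooccurrence` is a hypothetical helper, and the 1/d distance weighting follows the description in the GloVe paper. The `symmetric_both` and `context_words` arguments used later in the example appear to play the roles of the symmetry and window size shown here, but that mapping is an assumption.

```R
# Toy sketch (not GloveR's implementation): distance-weighted, symmetric
# co-occurrence counts within a fixed context window.
toy_cooccurrence = function(tokens, window = 2) {
  vocab = sort(unique(tokens))
  counts = matrix(0, nrow = length(vocab), ncol = length(vocab),
                  dimnames = list(vocab, vocab))
  n = length(tokens)
  for (i in seq_len(n)) {
    for (d in seq_len(window)) {
      j = i + d
      if (j > n) break
      # a pair that is d words apart contributes 1/d, in both directions
      counts[tokens[i], tokens[j]] = counts[tokens[i], tokens[j]] + 1 / d
      counts[tokens[j], tokens[i]] = counts[tokens[j], tokens[i]] + 1 / d
    }
  }
  counts
}

# made-up one-sentence corpus, for illustration only
tokens = strsplit("the cat sat on the mat", " ")[[1]]
co_toy = toy_cooccurrence(tokens, window = 2)
print(co_toy)
```

Real corpora need far more text than this; the toy matrix only shows what kind of statistic is aggregated before training.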
This R package has some limitations:

* it works only on a Unix OS
* the data file should be big enough for the package function *Glove* to work properly

To install the package from GitHub use the *install_github* function of the devtools package,
```R
devtools::install_github('mlampros/GloveR')
```
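Because the package works only on a Unix OS, a script can check the platform before installing or loading it. The following is a minimal sketch; `is_unix` is a hypothetical helper and not part of GloveR.

```R
# Minimal sketch: detect a non-Unix system early.
# 'is_unix' is a hypothetical helper, not a GloveR function.
is_unix = function(os_type = .Platform$OS.type) {
  identical(os_type, "unix")
}

if (!is_unix()) {
  message("GloveR works only on a Unix OS; skipping installation.")
}
```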
Use the following link to report bugs/issues (for the R wrapper),
[https://github.com/mlampros/GloveR/issues](https://github.com/mlampros/GloveR/issues)
#### **Example usage**
```R
# example input data ---> 'dat.txt'
library(GloveR)

#-----------------------------
# vocabulary count computation
#-----------------------------

res = vocabulary_counts(train_data = '/data_GloveR/dat.txt', MAX_vocab = 0,
                        MIN_count = 5, output_vocabulary = '/data_GloveR/VOCAB.txt',
                        trace = TRUE)

#-------------------------
# cooccurrence statistics
#-------------------------

co_mat = cooccurrence_statistics(train_data = '/data_GloveR/dat.txt', vocab_input = '/data_GloveR/VOCAB.txt',
                                 output_cooccurences = '/data_GloveR/COOCUR.bin', symmetric_both = TRUE,
                                 context_words = 15, memory_gb = 4.0, MAX_product = 0, overflowLength = 0,
                                 trace = TRUE)

#---------------------------
# shuffling of cooccurrences
#---------------------------

shfl = shuffle_cooccurrences(input_cooccurences = '/data_GloveR/COOCUR.bin',
                             output_cooccurences = '/data_GloveR/COOCUR_output.bin',
                             memory_gb = 4.0, arraySize = 0, trace = TRUE)

#---------------------------------------
# Global Vectors for Word Representation
#---------------------------------------

gl = Glove(input_cooccurences = '/data_GloveR/COOCUR_output.bin',
           output_vectors = '/data_GloveR/vectors',
           vocab_input = '/data_GloveR/VOCAB.txt',
           model_output = 2, iter_num = 5, learn_rate = 0.05,
           save_squared_grads_file = NULL, alpha_weight = 0.75,
           cutoff = 10, binary_output = 0, vectorSize = 50, threads = 6,
           trace = TRUE)
```
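The GloVe paper weights each co-occurrence count with a function that grows as a power of the count and is capped at 1. The sketch below shows that function in plain R, assuming that `alpha_weight` and `cutoff` in the call above correspond to the paper's exponent and cutoff; `glove_weight` is a hypothetical helper, not a GloveR function.

```R
# Weighting function from the GloVe paper:
#   f(x) = (x / x_max)^alpha  if x < x_max,  otherwise 1
# Assumption: 'cutoff' plays the role of x_max and 'alpha_weight' of alpha.
glove_weight = function(x, x_max = 10, alpha = 0.75) {
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}

# weights for a few example co-occurrence counts
print(glove_weight(c(1, 5, 10, 100)))
```

Under this reading, a lower cutoff means that more word pairs receive the maximum weight of 1.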
More information about the parameters of each function can be found in the package documentation.
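Once vectors have been written in text form, they can be compared with cosine similarity. The sketch below assumes the standard GloVe text layout (one word per line, followed by its space-separated vector components) and uses made-up toy lines rather than real output; `parse_glove_lines` and `cosine_similarity` are hypothetical helpers, not GloveR functions.

```R
# Sketch, assuming the standard GloVe text layout: one word per line,
# followed by its space-separated vector components.
parse_glove_lines = function(lines) {
  parts = strsplit(lines, " ", fixed = TRUE)
  words = vapply(parts, function(p) p[1], character(1))
  vecs = do.call(rbind, lapply(parts, function(p) as.numeric(p[-1])))
  rownames(vecs) = words
  vecs
}

cosine_similarity = function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# made-up toy lines for illustration (not real GloveR output)
toy_lines = c("king 1 0 1", "queen 1 0 1", "apple 0 1 0")
vecs = parse_glove_lines(toy_lines)
print(cosine_similarity(vecs["king", ], vecs["queen", ]))
```

With real output, the same helper could be applied to `readLines()` on whichever text file the *Glove* function wrote for `output_vectors`.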