Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/teramonagi/scdv
SCDV (Sparse Composite Document Vectors) implementation in R
- Host: GitHub
- URL: https://github.com/teramonagi/scdv
- Owner: teramonagi
- License: mit
- Created: 2019-02-11T09:53:35.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-04-01T23:05:36.000Z (over 5 years ago)
- Last Synced: 2024-05-21T02:11:59.468Z (6 months ago)
- Topics: r
- Language: R
- Size: 205 KB
- Stars: 4
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---

```{r setting, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
message = FALSE,
comment = "#>",
fig.path = "man/figures/README-"
)
```

# scdv

An R package for the SCDV (Sparse Composite Document Vectors) algorithm.

[![Travis-CI Build Status](https://api.travis-ci.com/teramonagi/scdv.svg?branch=master)](https://travis-ci.com/teramonagi/scdv)

## Installation
```r
# The CRAN release is not available yet:
# install.packages("scdv")

# The development version from GitHub:
# install.packages("devtools")
devtools::install_github("teramonagi/scdv")
```

## Example

### Get sample data and pre-process it
```{r data, cache=TRUE}
library(scdv)
# Download two example documents from Project Gutenberg (http://www.gutenberg.org/wiki/Main_Page)
urls <- c(
"http://www.gutenberg.org/files/98/98-0.txt",
"http://www.gutenberg.org/files/1342/1342-0.txt"
)
x <- purrr::map(urls, ~ httr::content(httr::GET(.x)))
# Pre-process each document: split it into lowercase word tokens and drop English stop words
doc <- purrr::map(x, ~ tokenizers::tokenize_words(.x, stopwords = stopwords::stopwords("en"))[[1]])
# First 10 tokens of the first document
doc[[1]][1:10]
```

### Calculate SCDV (Sparse Composite Document Vectors)
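At a high level, SCDV soft-clusters the word2vec word vectors into `k` clusters, builds a "word-topic" vector for each word by concatenating its word vector weighted by each cluster probability (scaled by the word's idf), averages these over the words of each document, and finally zeroes out near-zero entries to make the result sparse. The base-R function below is a minimal sketch of that recipe under stated assumptions, not the code behind `scdv::scdv()`. Assumptions: `wv` is a numeric matrix with one row per word and the words as row names; k-means plus a softmax over squared distances (with a hypothetical temperature `beta`) stands in for the Gaussian mixture posteriors used by SCDV; `p` is the sparsity threshold in percent; `scdv_sketch` itself is a hypothetical name.

```r
# Minimal base-R sketch of the SCDV recipe. Illustrative only: NOT the code
# behind scdv::scdv(). Assumption: k-means plus a softmax over squared
# distances stands in for the Gaussian mixture posteriors P(cluster | word).
scdv_sketch <- function(docs, wv, k, p = 4, beta = 1) {
  words <- rownames(wv)

  # 1. Soft-cluster the word vectors -> n_words x k matrix of probabilities
  km <- stats::kmeans(wv, centers = k, nstart = 5)
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(wv, 2, km$centers[j, ])^2))
  logit <- -beta * d2
  prob <- exp(logit - apply(logit, 1, max))  # subtract row max for stability
  prob <- prob / rowSums(prob)               # each row now sums to 1

  # 2. Inverse document frequency (pmax avoids dividing by zero)
  df <- rowSums(vapply(docs, function(d) words %in% d, logical(length(words))))
  idf <- log(length(docs) / pmax(df, 1))

  # 3. Word-topic vectors: [wv * P(c_1|w), ..., wv * P(c_k|w)], scaled by idf
  wtv <- do.call(cbind, lapply(seq_len(k), function(j) wv * prob[, j])) * idf
  rownames(wtv) <- words

  # 4. Document vectors: mean word-topic vector over each document's known tokens
  dv <- t(vapply(docs, function(d) {
    colMeans(wtv[d[d %in% words], , drop = FALSE])
  }, numeric(ncol(wtv))))

  # 5. Sparsify: zero entries whose magnitude is below p% of the average of
  #    |mean row minimum| and |mean row maximum|
  a_min <- mean(apply(dv, 1, min))
  a_max <- mean(apply(dv, 1, max))
  dv[abs(dv) < (p / 100) * (abs(a_min) + abs(a_max)) / 2] <- 0
  dv
}
```

The output has one row per document and `k * ncol(wv)` columns, which is why SCDV vectors grow with both the number of clusters and the word2vec dimension. Note that with only two documents, any word occurring in both gets an idf of zero and contributes nothing.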
```{r scdv, cache=TRUE}
# Set the number of clusters (k) and the word2vec embedding dimension (dimension)
k <- 10
dimension <- 30
# Compute a Sparse Composite Document Vector for each document
dv <- scdv::scdv(doc, k, dimension, word2vec_args = list(show_by=25))
```

### Compute word embeddings with word2vec and visualize them
```{r w2v, cache=FALSE}
# Compute word embeddings with word2vec
wv <- scdv::word2vec(doc, dimension, args = list(show_by=25))
```

```{r w2v_visualize}
# Visualize 10 randomly sampled word vectors (rows of wv)
scdv::visualize(wv[sample(nrow(wv), size = 10), ])
# The document vectors can be visualized the same way:
# scdv::visualize(dv)
```
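The README does not document which projection `scdv::visualize()` uses. As a hedged, dependency-free alternative, the hypothetical helper `plot_pca_2d()` below (not part of scdv) takes any numeric matrix whose rows are vectors, projects it onto its first two principal components with `stats::prcomp()`, and labels each point with its row name (or row index if there are none).

```r
# Hypothetical helper (not part of scdv): draw the rows of a numeric matrix
# at their coordinates on the first two principal components.
plot_pca_2d <- function(m) {
  pc <- stats::prcomp(m, center = TRUE, scale. = FALSE)
  xy <- pc$x[, 1:2, drop = FALSE]  # coordinates on PC1 and PC2
  labels <- if (is.null(rownames(m))) seq_len(nrow(m)) else rownames(m)
  graphics::plot(xy, type = "n", xlab = "PC1", ylab = "PC2")
  graphics::text(xy, labels = labels)
  invisible(xy)  # return the 2-D coordinates for further use
}
```

Usage would mirror the chunk above, e.g. `plot_pca_2d(wv[sample(nrow(wv), size = 10), ])`, assuming `wv` is a numeric matrix. PCA is a linear projection, so clusters that are only separable non-linearly may overlap in this view.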