https://github.com/Mega-DatA-Lab/SpectralLDA

Spectral LDA

Host: GitHub
URL: https://github.com/Mega-DatA-Lab/SpectralLDA
Owner: Mega-DatA-Lab
License: apache-2.0
Created: 2017-01-29T15:42:28.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2018-06-22T14:17:54.000Z (about 7 years ago)
Last Synced: 2024-05-22T04:21:02.564Z (about 1 year ago)
Topics: lda-model
Language: Python
Homepage:
Size: 486 KB
Stars: 13
Watchers: 4
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

Awesome-MXNet - SpectralLDA

README

# SpectralLDA

**Note: This is the single-host version. For the up-to-date, distributed version, please refer to [SpectralLDA-Spark](https://github.com/Mega-DatA-Lab/SpectralLDA-Spark).**

This code implements a spectral learning method (third-order tensor decomposition) for the Latent Dirichlet Allocation model in Python.

The spectral learning method works with empirical counts of word pairs and word triplets that co-occur within each document of the dataset. We average these counts over the documents and store them in tensors, then perform tensor decomposition to learn the Latent Dirichlet Allocation model. For more details, please refer to `report.pdf` in the repository.
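As a rough illustration of the pair-count averaging described above, the sketch below computes an averaged word-pair co-occurrence matrix from a doc-term count matrix. It is not the repository's code: the function name `empirical_pair_moment` is hypothetical, the input is assumed to be a dense NumPy array with one row per document, and the normalisation by the number of ordered pairs of distinct word positions is an assumption. The moments actually used for Latent Dirichlet Allocation typically also carry correction terms that depend on `alpha0`, which are omitted here.

```python
import numpy as np


def empirical_pair_moment(docs):
    """Average within-document frequency of ordered word pairs.

    Hypothetical helper for illustration only.
    docs: (n_docs, vocab_size) array of word counts.
    Documents with fewer than two words carry no pair information
    and are skipped.
    """
    docs = np.asarray(docs, dtype=float)
    lengths = docs.sum(axis=1)
    keep = lengths >= 2
    docs, lengths = docs[keep], lengths[keep]
    if docs.shape[0] == 0:
        raise ValueError("need at least one document with two or more words")

    # Each document contributes (c c^T - diag(c)) / (n (n - 1)), where c is
    # its count vector and n its length: the fraction of ordered pairs of
    # distinct word positions that show each (word, word) combination.
    weights = 1.0 / (lengths * (lengths - 1.0))
    moment = (docs * weights[:, None]).T @ docs  # sum over docs of w * c c^T
    moment -= np.diag(weights @ docs)            # drop a token paired with itself
    return moment / docs.shape[0]                # average over documents
```

Each document's contribution sums to one, so the averaged matrix does too. The word-triplet tensor follows the same pattern with one more index.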

## Usage
Invoke `spectral_lda` with the doc-term count matrix. It returns `alpha`, the learnt Dirichlet prior parameter, and `beta`, the topic-word distribution matrix with one topic per column.

```python
from spectral_lda import spectral_lda

# docs is the doc-term count matrix
# alpha0 is the sum of the Dirichlet prior parameter
# k is the rank, i.e. the number of topics
alpha0 = 1.0  # example value only; choose one suited to your data
k = 10        # example value only; choose one suited to your data
alpha, beta = spectral_lda(docs, alpha0=alpha0, k=k, l1_simplex_proj=False)

# alpha is the learnt Dirichlet prior
# beta is the topic-word distribution matrix,
# with one column per topic
```
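The `docs` argument is a doc-term count matrix. As a minimal sketch of how such a matrix might be built from already-tokenized documents, the helper below counts terms into a dense NumPy array. The name `build_doc_term_matrix` is hypothetical, and both the one-row-per-document layout and the dense array type are assumptions; check the repository for the exact input type `spectral_lda` expects.

```python
from collections import Counter

import numpy as np


def build_doc_term_matrix(tokenized_docs):
    """Return (counts, vocab) for a list of token lists.

    Hypothetical helper for illustration only.
    counts[i, j] is how often vocab[j] occurs in document i.
    """
    # Sorted vocabulary gives a stable column order.
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}

    counts = np.zeros((len(tokenized_docs), len(vocab)), dtype=np.int64)
    for i, doc in enumerate(tokenized_docs):
        for tok, n in Counter(doc).items():
            counts[i, index[tok]] = n
    return counts, vocab
```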

By default, the columns of `beta` may not sum to one. Set `l1_simplex_proj=True` to apply a post-processing step that projects `beta` onto the l1-simplex.
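To make the projection step concrete, here is a minimal sketch of one common way to project a vector onto the probability simplex: the sort-based Euclidean projection. This is an assumption about what such a post-processing step could look like, not the repository's implementation, and the names `project_onto_simplex` and `project_columns` are hypothetical.

```python
import numpy as np


def project_onto_simplex(v):
    """Euclidean projection of a 1-D vector onto {x : x >= 0, sum(x) = 1}.

    Hypothetical helper for illustration only.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # entries in decreasing order
    cumsum = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    # Largest rho such that u[rho-1] stays positive after the shift.
    positive = u - (cumsum - 1.0) / idx > 0
    rho = idx[positive][-1]
    theta = (cumsum[rho - 1] - 1.0) / rho  # shift applied to every entry
    return np.maximum(v - theta, 0.0)


def project_columns(beta):
    """Project each column of a matrix onto the simplex independently."""
    beta = np.asarray(beta, dtype=float)
    return np.apply_along_axis(project_onto_simplex, 0, beta)
```

After this step every column is non-negative and sums to one, so each can be read as a probability distribution over words. A column that already lies on the simplex is returned unchanged.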

## References
Anandkumar, Animashree, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. "Tensor Decompositions for Learning Latent Variable Models." Journal of Machine Learning Research 15 (2014): 2773-2832.
