https://github.com/nzw0301/lightlda
A fast sampling algorithm for LDA based on collapsed Gibbs sampling (CGS).
- Host: GitHub
- URL: https://github.com/nzw0301/lightlda
- Owner: nzw0301
- Created: 2016-11-05T13:33:28.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-03-20T14:09:10.000Z (over 4 years ago)
- Last Synced: 2024-10-04T12:14:25.336Z (about 1 month ago)
- Topics: lda, machine-learning, nlp, python, topic-modeling
- Language: Python
- Size: 7.81 KB
- Stars: 50
- Watchers: 4
- Forks: 16
- Open Issues: 0
Metadata Files:
- Readme: README.md
# LightLDA.py
This repo is a Python reimplementation of [LightLDA](https://github.com/Microsoft/LightLDA).
LightLDA is a scalable latent Dirichlet allocation (LDA) algorithm for topic modeling, proposed in a [WWW 2015 paper](http://www.www2015.it/documents/proceedings/proceedings/p1351.pdf).
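
For orientation, the following is a minimal sketch of plain collapsed Gibbs sampling (CGS) for LDA, the baseline that LightLDA builds on. It is not the code in this repository and does not include LightLDA's faster Metropolis-Hastings sampling step. The function name `collapsed_gibbs_lda`, the hyperparameter defaults, and the toy corpus of integer word ids are all assumptions made for illustration.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.1, n_iters=200, seed=0):
    """Plain CGS for LDA (hypothetical helper, not this repository's API).

    docs is a list of documents, each a list of integer word ids.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # total tokens per topic
    z = []                                               # topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current token out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional over topics; this loop is
                # O(n_topics) per token, the cost that faster samplers avoid.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw, n_k


# Toy corpus (assumed): word ids 0-2 and 3-4 tend to co-occur.
docs = [[0, 0, 1, 2], [3, 4, 4, 3], [0, 1, 1, 2]]
z, n_dk, n_kw, n_k = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=5)
```

The returned count tables are what a script would typically turn into the word and topic distributions it reports.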
## Examples
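
As a rough guide to reading the sample output in this section, the sketch below recomputes a few of the printed probabilities from the per-word class assignments using the usual smoothed LDA estimates. The smoothing values `beta = 0.1` and `alpha = 0.01` are assumptions chosen because they reproduce the printed numbers; they are not taken from the repository. The helper names are hypothetical as well.

```python
from collections import Counter

# Tokens assigned to latent class 0 across documents 3-5 in the sample output.
class0_counts = Counter({
    "安原絵麻": 13,
    "SHIROBAKO": 10,
    "万策尽きた": 6,
    "佳村はるか": 4,
    "武蔵野": 4,
})
vocab_size = 9   # distinct words in the sample output
beta = 0.1       # assumed word smoothing, not read from the repository
alpha = 0.01     # assumed topic smoothing, not read from the repository


def smoothed_phi(counts, word, vocab_size, beta):
    """Smoothed probability of `word` within one latent class."""
    total = sum(counts.values())
    return (counts[word] + beta) / (total + vocab_size * beta)


def smoothed_theta(topic_counts, alpha):
    """Smoothed topic proportions for one document."""
    denom = sum(topic_counts) + len(topic_counts) * alpha
    return [(c + alpha) / denom for c in topic_counts]


phi_ema = smoothed_phi(class0_counts, "安原絵麻", vocab_size, beta)
phi_unseen = smoothed_phi(class0_counts, "デレマス", vocab_size, beta)
theta_doc3 = smoothed_theta([9, 0], alpha)  # document 3: nine tokens, all class 0

print(f"{phi_ema:.3f}")                     # 0.346
print(f"{phi_unseen:.3f}")                  # 0.003
print([f"{t:.3f}" for t in theta_doc3])     # ['0.999', '0.001']
```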
```bash
$ python lightlda

# Word distributions per latent class
## φ of latent class of 0
word: probability
安原絵麻: 0.346
SHIROBAKO: 0.266
万策尽きた: 0.161
佳村はるか: 0.108
武蔵野: 0.108
城ヶ崎美嘉: 0.003
デレマス: 0.003
城ヶ崎莉嘉: 0.003
カブトムシ: 0.003

## φ of latent class of 1
word: probability
城ヶ崎美嘉: 0.357
デレマス: 0.239
佳村はるか: 0.180
城ヶ崎莉嘉: 0.121
カブトムシ: 0.091
安原絵麻: 0.003
SHIROBAKO: 0.003
万策尽きた: 0.003
武蔵野: 0.003

# Topic distributions per document
## Topic information of document 0
Propotion of topics
topic: θ_{document_id, latent_class}
0: 0.001
1: 0.999

Assigned latent class per word
word: latent class
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
デレマス: 1
デレマス: 1
佳村はるか: 1
佳村はるか: 1
佳村はるか: 1
--------------

## Topic information of document 1
Propotion of topics
topic: θ_{document_id, latent_class}
0: 0.001
1: 0.999

Assigned latent class per word
word: latent class
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
佳村はるか: 1
デレマス: 1
デレマス: 1
城ヶ崎莉嘉: 1
城ヶ崎莉嘉: 1
カブトムシ: 1
--------------

## Topic information of document 2
Propotion of topics
topic: θ_{document_id, latent_class}
0: 0.001
1: 0.999

Assigned latent class per word
word: latent class
城ヶ崎美嘉: 1
城ヶ崎美嘉: 1
佳村はるか: 1
佳村はるか: 1
デレマス: 1
デレマス: 1
デレマス: 1
デレマス: 1
城ヶ崎莉嘉: 1
城ヶ崎莉嘉: 1
カブトムシ: 1
カブトムシ: 1
--------------

## Topic information of document 3
Propotion of topics
topic: θ_{document_id, latent_class}
0: 0.999
1: 0.001

Assigned latent class per word
word: latent class
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
佳村はるか: 0
佳村はるか: 0
SHIROBAKO: 0
SHIROBAKO: 0
万策尽きた: 0
万策尽きた: 0
--------------

## Topic information of document 4
Propotion of topics
topic: θ_{document_id, latent_class}
0: 0.999
1: 0.001

Assigned latent class per word
word: latent class
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
佳村はるか: 0
SHIROBAKO: 0
SHIROBAKO: 0
武蔵野: 0
武蔵野: 0
万策尽きた: 0
--------------

## Topic information of document 5
Propotion of topics
topic: θ_{document_id, latent_class}
0: 0.999
1: 0.001

Assigned latent class per word
word: latent class
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
安原絵麻: 0
佳村はるか: 0
SHIROBAKO: 0
SHIROBAKO: 0
SHIROBAKO: 0
SHIROBAKO: 0
SHIROBAKO: 0
SHIROBAKO: 0
万策尽きた: 0
万策尽きた: 0
万策尽きた: 0
武蔵野: 0
武蔵野: 0
--------------
```

## Reference
Jinhui Yuan, Fei Gao, Qirong Ho, Wei Dai, Jinliang Wei, Xun Zheng, Eric Po Xing, Tie-Yan Liu, and Wei-Ying Ma. LightLDA: Big Topic Models on Modest Computer Clusters. In _WWW_, 2015.