https://github.com/nzw0301/lightLDA

fast sampling algorithm based on CGS

lda machine-learning nlp python topic-modeling

Host: GitHub
URL: https://github.com/nzw0301/lightLDA
Owner: nzw0301
Created: 2016-11-05T13:33:28.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2020-03-20T14:09:10.000Z (over 5 years ago)
Last Synced: 2025-03-30T15:51:15.647Z (7 months ago)
Topics: lda, machine-learning, nlp, python, topic-modeling
Language: Python
Homepage:
Size: 7.81 KB
Stars: 50
Watchers: 3
Forks: 16
Open Issues: 0
Metadata Files:
- Readme: README.md

# LightLDA.py

This repo is a Python reimplementation of [LightLDA](https://github.com/Microsoft/LightLDA).

LightLDA is a latent Dirichlet allocation (LDA) algorithm, proposed in a [WWW 2015 paper](http://www.www2015.it/documents/proceedings/proceedings/p1351.pdf), that scales to very large numbers of topics: its per-token sampling cost does not grow with the number of topics.
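
In standard collapsed Gibbs sampling (CGS), drawing a new topic for one token costs O(K) for K topics. LightLDA replaces that draw with a short Metropolis-Hastings chain that alternates a *word proposal*, drawn in O(1) from a precomputed alias table, with a *document proposal*, drawn in O(1) by copying the topic of a random token in the same document. The block below is a minimal sketch of one token update written from that description, not code taken from this repo: `AliasTable` and `lightlda_token_update` are hypothetical names, the table uses Vose's variant of the alias method, and the acceptance ratios are the standard Metropolis-Hastings ones for these two proposals.

```python
import random


class AliasTable:
    """Vose's alias method: O(K) to build, O(1) per draw from a fixed distribution."""

    def __init__(self, weights):
        n = len(weights)
        total = float(sum(weights))
        self.weights = list(weights)  # unnormalized; reused in the MH acceptance ratio
        self.prob = [1.0] * n
        self.alias = list(range(n))
        scaled = [w * n / total for w in weights]
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            lo, hi = small.pop(), large.pop()
            self.prob[lo], self.alias[lo] = scaled[lo], hi
            scaled[hi] -= 1.0 - scaled[lo]  # hi donates the mass that fills column lo
            (small if scaled[hi] < 1.0 else large).append(hi)
        # Indices still queued keep prob 1.0: their columns are already full.

    def sample(self, rng):
        i = rng.randrange(len(self.prob))
        return i if rng.random() < self.prob[i] else self.alias[i]


def lightlda_token_update(i, doc_topics, doc_counts, word_counts, topic_counts,
                          word_table, alpha, beta, V, rng, mh_steps=2):
    """Hypothetical helper (not this repo's API): resample the topic of token i.

    doc_topics:   topic of every token in the document
    doc_counts:   n_dk for this document (length K)
    word_counts:  n_kw for this token's word (length K)
    topic_counts: n_k over the whole corpus (length K)
    word_table:   AliasTable over (n_kw + beta) / (n_k + V * beta), possibly stale
    All lists are updated in place; the new topic is returned.
    """
    K, n_d = len(topic_counts), len(doc_topics)
    s = doc_topics[i]
    # Remove the token so the counts below are the "-di" counts used by CGS.
    doc_counts[s] -= 1
    word_counts[s] -= 1
    topic_counts[s] -= 1

    def p(k):  # unnormalized CGS conditional for topic k
        return ((doc_counts[k] + alpha) * (word_counts[k] + beta)
                / (topic_counts[k] + V * beta))

    for _ in range(mh_steps):
        # Word proposal: q(k) ∝ word_table.weights[k], independent of s.
        t = word_table.sample(rng)
        if t != s:
            ratio = p(t) * word_table.weights[s] / (p(s) * word_table.weights[t])
            if rng.random() < ratio:
                s = doc_topics[i] = t
        # Doc proposal: q(k | s) ∝ n_dk(-di) + [k == s] + alpha, drawn in O(1) by
        # copying a random token's topic (this token included) or picking uniformly.
        if rng.random() < n_d / (n_d + K * alpha):
            t = doc_topics[rng.randrange(n_d)]
        else:
            t = rng.randrange(K)
        if t != s:
            ratio = p(t) * (doc_counts[s] + alpha) / (p(s) * (doc_counts[t] + alpha))
            if rng.random() < ratio:
                s = doc_topics[i] = t
    # Add the token back under its new topic.
    doc_counts[s] += 1
    word_counts[s] += 1
    topic_counts[s] += 1
    return s
```

Training would loop this update over every token of every document. Because each acceptance ratio uses the same weights the alias table was built from, a stale table only slows mixing; each step still leaves the exact CGS conditional invariant, so the O(K) table rebuild for a word can be amortized over that word's many tokens.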

## Examples

```bash

$ python lightlda

# Word distributions per latent class

## φ of latent class of 0

word: probability

安原絵麻: 0.346

SHIROBAKO: 0.266

万策尽きた: 0.161

佳村はるか: 0.108

武蔵野: 0.108

城ヶ崎美嘉: 0.003

デレマス: 0.003

城ヶ崎莉嘉: 0.003

カブトムシ: 0.003

## φ of latent class of 1

word: probability

城ヶ崎美嘉: 0.357

デレマス: 0.239

佳村はるか: 0.180

城ヶ崎莉嘉: 0.121

カブトムシ: 0.091

安原絵麻: 0.003

SHIROBAKO: 0.003

万策尽きた: 0.003

武蔵野: 0.003

# Topic distributions per document

## Topic information of document 0

Propotion of topics

topic: θ_{document_id, latent_class}

0: 0.001

1: 0.999

Assigned latent class per word

word: latent class

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

デレマス: 1

デレマス: 1

佳村はるか: 1

佳村はるか: 1

佳村はるか: 1

--------------

## Topic information of document 1

Propotion of topics

topic: θ_{document_id, latent_class}

0: 0.001

1: 0.999

Assigned latent class per word

word: latent class

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

佳村はるか: 1

デレマス: 1

デレマス: 1

城ヶ崎莉嘉: 1

城ヶ崎莉嘉: 1

カブトムシ: 1

--------------

## Topic information of document 2

Propotion of topics

topic: θ_{document_id, latent_class}

0: 0.001

1: 0.999

Assigned latent class per word

word: latent class

城ヶ崎美嘉: 1

城ヶ崎美嘉: 1

佳村はるか: 1

佳村はるか: 1

デレマス: 1

デレマス: 1

デレマス: 1

デレマス: 1

城ヶ崎莉嘉: 1

城ヶ崎莉嘉: 1

カブトムシ: 1

カブトムシ: 1

--------------

## Topic information of document 3

Propotion of topics

topic: θ_{document_id, latent_class}

0: 0.999

1: 0.001

Assigned latent class per word

word: latent class

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

佳村はるか: 0

佳村はるか: 0

SHIROBAKO: 0

SHIROBAKO: 0

万策尽きた: 0

万策尽きた: 0

--------------

## Topic information of document 4

Propotion of topics

topic: θ_{document_id, latent_class}

0: 0.999

1: 0.001

Assigned latent class per word

word: latent class

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

佳村はるか: 0

SHIROBAKO: 0

SHIROBAKO: 0

武蔵野: 0

武蔵野: 0

万策尽きた: 0

--------------

## Topic information of document 5

Propotion of topics

topic: θ_{document_id, latent_class}

0: 0.999

1: 0.001

Assigned latent class per word

word: latent class

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

安原絵麻: 0

佳村はるか: 0

SHIROBAKO: 0

SHIROBAKO: 0

SHIROBAKO: 0

SHIROBAKO: 0

SHIROBAKO: 0

SHIROBAKO: 0

万策尽きた: 0

万策尽きた: 0

万策尽きた: 0

武蔵野: 0

武蔵野: 0

--------------

```
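
The printed φ and θ values are consistent with the usual smoothed point estimates computed from the final topic assignments: φ_{k,w} = (n_kw + β) / (n_k + Vβ) and θ_{d,k} = (n_dk + α) / (n_d + Kα), where V is the vocabulary size and K the number of latent classes. The sketch below computes both; `estimate_phi_theta` and `expand` are hypothetical helpers, not this repo's API, and α = 0.01, β = 0.1 are assumed values chosen because they reproduce the three-decimal numbers above (the repo's actual settings may differ). The documents and assignments are transcribed from the example output.

```python
from collections import Counter


def estimate_phi_theta(docs, assignments, K, alpha, beta):
    """Hypothetical helper: smoothed phi (per latent class) and theta (per document).

    docs:        list of documents, each a list of word tokens
    assignments: parallel list giving the latent class of every token
    """
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_kw = [Counter() for _ in range(K)]  # word counts per latent class
    n_k = [0] * K                         # total tokens per latent class
    theta = []
    for doc, zs in zip(docs, assignments):
        n_dk = Counter(zs)
        theta.append([(n_dk[k] + alpha) / (len(doc) + K * alpha) for k in range(K)])
        for w, k in zip(doc, zs):
            n_kw[k][w] += 1
            n_k[k] += 1
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
           for k in range(K)]
    return phi, theta


def expand(counts):
    """Hypothetical helper: turn [(word, count), ...] into a flat token list."""
    return [w for w, c in counts for _ in range(c)]


# Documents 0-5 and their assigned latent classes, transcribed from the output above.
docs = [
    expand([("城ヶ崎美嘉", 4), ("デレマス", 2), ("佳村はるか", 3)]),
    expand([("城ヶ崎美嘉", 6), ("佳村はるか", 1), ("デレマス", 2), ("城ヶ崎莉嘉", 2), ("カブトムシ", 1)]),
    expand([("城ヶ崎美嘉", 2), ("佳村はるか", 2), ("デレマス", 4), ("城ヶ崎莉嘉", 2), ("カブトムシ", 2)]),
    expand([("安原絵麻", 3), ("佳村はるか", 2), ("SHIROBAKO", 2), ("万策尽きた", 2)]),
    expand([("安原絵麻", 3), ("佳村はるか", 1), ("SHIROBAKO", 2), ("武蔵野", 2), ("万策尽きた", 1)]),
    expand([("安原絵麻", 7), ("佳村はるか", 1), ("SHIROBAKO", 6), ("万策尽きた", 3), ("武蔵野", 2)]),
]
assignments = [[1] * len(d) for d in docs[:3]] + [[0] * len(d) for d in docs[3:]]
# alpha=0.01 and beta=0.1 are assumed values, not read from the repo.
phi, theta = estimate_phi_theta(docs, assignments, K=2, alpha=0.01, beta=0.1)
print(round(phi[1]["城ヶ崎美嘉"], 3))  # 0.357
print(round(theta[0][1], 3))          # 0.999
```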

## Reference

Yuan, Jinhui and Gao, Fei and Ho, Qirong and Dai, Wei and Wei, Jinliang and Zheng, Xun and Xing, Eric Po and Liu, Tie-Yan and Ma, Wei-Ying. LightLDA: Big Topic Models on Modest Computer Clusters. In _WWW_, 2015.
