https://github.com/da03/lightlda
Distributed LDA, takes raw text as input and outputs topic word table.
- Host: GitHub
- URL: https://github.com/da03/lightlda
- Owner: da03
- Created: 2016-03-22T01:08:40.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-04-16T18:41:23.000Z (about 10 years ago)
- Last Synced: 2025-03-17T19:11:39.713Z (over 1 year ago)
- Language: C++
- Homepage:
- Size: 19.7 MB
- Stars: 16
- Watchers: 4
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Light LDA
=========
A modified version of MSR's Light LDA, with preprocessing scripts added.
Usage (assuming you are in `lightlda/`):
```
make
```
```
cd datasets
```
```
tar zxf 20news-train.tgz
```
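```
# Return to lightlda/ first if you are still in datasets/ from the previous step,
# since the paths below are relative to the repository root.
cd ..
python scripts/pipeline.py etc/params.config
```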
Note: Parameters are defined in `etc/params.config`. Results are written to `output/model/${timestamp}/snapshot.word_topic_table.${iteration}${client_id}`. Running `python scripts/parse_word_topic_table.py` produces a visualization of the word-topic table. The `` is in `output/datablocks/${timestamp}/word_tf.txt`.
Note 2: The machine file defined in `etc/params.config` only works on cogito, and the whole pipeline assumes a shared filesystem.
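
Because each run writes into its own `${timestamp}` directory, it can be handy to locate the snapshot files of the most recent run before passing them to the parsing script. The following is a minimal sketch, not part of this repository: the helper name `find_snapshots` is hypothetical, and it assumes the timestamp directory names sort so that the newest run comes last.

```python
import glob
import os


def find_snapshots(output_root="output"):
    """Return (latest_run_dir, snapshot_paths) for the most recent run.

    Assumption: directories under <output_root>/model/ are named by
    timestamp in a format whose lexicographic order matches time order.
    """
    model_root = os.path.join(output_root, "model")
    # Keep only directories; each one is treated as a single run.
    run_dirs = sorted(
        d for d in glob.glob(os.path.join(model_root, "*")) if os.path.isdir(d)
    )
    if not run_dirs:
        raise FileNotFoundError("no run directories under " + model_root)
    latest = run_dirs[-1]
    # Match the snapshot.word_topic_table.* naming described in the note above.
    pattern = os.path.join(latest, "snapshot.word_topic_table.*")
    return latest, sorted(glob.glob(pattern))
```

Calling `find_snapshots()` from `lightlda/` returns the newest run directory and a sorted list of its snapshot files; pass a different `output_root` if the output directory lives elsewhere.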