An open API service indexing awesome lists of open source software.

https://github.com/da03/lightlda

Distributed LDA, takes raw text as input and outputs topic word table.
https://github.com/da03/lightlda

Last synced: over 1 year ago
JSON representation

Distributed LDA, takes raw text as input and outputs topic word table.

Awesome Lists containing this project

README

          

Light LDA
=========
Modified based on MSR's Light LDA, added preprocessing scripts.

Usage (Suppose you are in lightlda/):

```
make
```

```
cd datasets
```

```
tar zxf 20news-train.tgz
```

```
python scripts/pipeline.py etc/params.config
```

Note: parameters are defined in `etc/params.config`. The result is put in `output/model/${timestamp}/snapshot.word_topic_table.${iteration}${client_id}`. By using `python scripts/parse_word_topic_table.py` a visualization can be obtained. The `` is in `output/datablocks/${timestamp}/word_tf.txt`.

Note2: The machine file defined in `etc/params.config` only works on cogito. And the whole pipeline assumes a shared filesystem.