https://github.com/wellecks/online_lda_python
Online LDA using Hoffman's Python Implementation
- Host: GitHub
- URL: https://github.com/wellecks/online_lda_python
- Owner: wellecks
- License: gpl-3.0
- Created: 2014-10-27T01:00:00.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-11-14T23:25:27.000Z (about 10 years ago)
- Last Synced: 2024-04-28T04:30:25.917Z (8 months ago)
- Language: Python
- Size: 273 KB
- Stars: 16
- Watchers: 2
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
online_lda_python
=================

Online LDA using Hoffman's Python Implementation.

Usage
-----

```bash
$ python online_lda.py -h
usage: online_lda.py [-h] [-o OUTDIR] [-b BATCHSIZE] [-d NUM_DOCS]
                     [-k NUM_TOPICS] [-t TAU_0] [-l KAPPA] [-m MODEL_OUT_FREQ]
                     dataset vocab_file

positional arguments:
  dataset               Input dataset filename.
  vocab_file            Vocabulary filename.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        Directory to place output files. (default='')
  -b BATCHSIZE, --batchsize BATCHSIZE
                        Batch size. (default=256)
  -d NUM_DOCS, --num_docs NUM_DOCS
                        Total # docs in dataset. (default=7990787)
  -k NUM_TOPICS, --num_topics NUM_TOPICS
                        Number of topics. (default=100)
  -t TAU_0, --tau_0 TAU_0
                        Tau learning parameter to downweight early documents
                        (default=1024)
  -l KAPPA, --kappa KAPPA
                        Kappa learning parameter; decay factor for influence
                        of batches.(default=0.7)
  -m MODEL_OUT_FREQ, --model_out_freq MODEL_OUT_FREQ
                        Number of iterations interval for outputting a model
                        file. (default=10000)
```

Used in the blog post http://wellecks.wordpress.com/2014/10/26/ldaoverflow-with-online-lda/
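
The `-t` (tau_0) and `-l` (kappa) options match the two parameters of the step-size schedule used in online variational Bayes for LDA (Hoffman et al.), where minibatch `t` is given the weight `rho_t = (tau_0 + t) ** (-kappa)`. The sketch below is not taken from `online_lda.py`. The function name `step_size` is hypothetical, and it assumes the script follows this standard schedule. It only illustrates how the defaults from the help text shape the weights.

```python
def step_size(t, tau_0=1024.0, kappa=0.7):
    """Return rho_t = (tau_0 + t) ** (-kappa), the weight of minibatch t.

    Assumption: the defaults mirror the -t and -l defaults shown in the
    help text; the function itself is illustrative, not the repo's code.
    A larger tau_0 downweights early minibatches; a larger kappa makes
    the weight of later minibatches decay faster.
    """
    return (tau_0 + t) ** (-kappa)


if __name__ == "__main__":
    # Print the weight at a few minibatch indices to see the decay.
    for t in (0, 1000, 10000, 30000):
        print(t, step_size(t))
```

With the defaults, the first minibatch gets a weight of `1024 ** -0.7`, which is `2 ** -7`, or about 0.0078, and the weight shrinks from there. In Hoffman et al.'s analysis, kappa in (0.5, 1] is the range for which convergence is guaranteed.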