https://github.com/joewandy/hlda
Gibbs sampler for the Hierarchical Latent Dirichlet Allocation topic model
- Host: GitHub
- URL: https://github.com/joewandy/hlda
- Owner: joewandy
- License: gpl-3.0
- Created: 2016-09-29T23:13:42.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T08:01:46.000Z (about 2 years ago)
- Last Synced: 2024-09-23T04:39:07.339Z (3 months ago)
- Topics: gibbs-sampler, hierarchical-topic-models, lda, topic-hierarchies, topic-modeling
- Language: Jupyter Notebook
- Size: 5.74 MB
- Stars: 146
- Watchers: 6
- Forks: 38
- Open Issues: 21
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-topic-models - hlda - Python package based on *Mallet's* Gibbs sampler having a fixed depth on the nCRP tree (Models / Hierarchical LDA (hLDA) [:page_facing_up:](https://dl.acm.org/doi/10.5555/2981345.2981348))
README
Hierarchical Latent Dirichlet Allocation
----------------------------------------

**Note: this repository should only be used for educational purposes. For production use, I'd recommend https://github.com/bab2min/tomotopy, which is more production-ready.**
---
Hierarchical Latent Dirichlet Allocation (hLDA) addresses the problem of learning topic hierarchies from data. The model relies on a non-parametric prior called the nested Chinese restaurant process, which allows for arbitrarily large branching factors and readily accommodates growing data collections. The hLDA model combines this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation.

- [Hierarchical Topic Models and the Nested Chinese Restaurant Process](http://www.cs.columbia.edu/~blei/papers/BleiGriffithsJordanTenenbaum2003.pdf)
- [The Nested Chinese Restaurant Process and Bayesian Nonparametric Inference of Topic Hierarchies](http://cocosci.berkeley.edu/tom/papers/ncrp.pdf)

Implementation
--------------

- [hlda/sampler.py](hlda/sampler.py) is the Gibbs sampler for hLDA inference. It is based on the implementation from [Mallet](http://mallet.cs.umass.edu/topics.php), which uses a fixed depth for the nCRP tree.

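
To make the fixed-depth nCRP prior more concrete, here is a minimal sketch of how document paths could be drawn from it. This is an illustration only, not the code in `hlda/sampler.py`: the names `NCRPNode` and `sample_path`, the concentration parameter `gamma`, and the depth of 3 are all assumptions chosen for the example, and the sketch covers only the prior over paths, not the full Gibbs sampler.

```python
import random


class NCRPNode:
    """One node (restaurant table) in the nested CRP tree. Hypothetical helper."""

    def __init__(self):
        self.customers = 0  # documents whose path passes through this node
        self.children = []  # child nodes, created lazily


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path containing `depth` nodes from a nested CRP.

    At each level an existing child k is chosen with probability
    n_k / (n + gamma) and a brand-new child with probability
    gamma / (n + gamma), where n_k is the child's customer count and
    n is the total count over all children of the current node.
    """
    root.customers += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # Last weight corresponds to opening a new branch.
        weights = [child.customers for child in node.children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):
            node.children.append(NCRPNode())
        node = node.children[idx]
        node.customers += 1
        path.append(node)
    return path


# Assumed settings for illustration: 20 documents, a 3-level tree, gamma = 1.0.
rng = random.Random(0)
root = NCRPNode()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print("branches under the root:", len(root.children))
```

Because the depth is fixed, every document gets a path of the same length, while the number of branches at each level grows with the data. Larger values of `gamma` make new branches more likely.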
Installation
------------

- Simply use `pip install hlda` to install the package.
- An example notebook that infers the hierarchical topics on the BBC Insight corpus can be found in [notebooks/bbc_test.ipynb](notebooks/bbc_test.ipynb).