Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-topic-models
✨ Awesome - A curated list of amazing Topic Models (implementations, libraries, and resources)
https://github.com/jonaschn/awesome-topic-models
-
Models
-
Topic Models for short documents
- GPyM_TM - Python implementation of DMM and Poisson model
- jLDADMM - Java implementation using collapsed Gibbs sampling [:page_facing_up:](https://arxiv.org/pdf/1808.03835.pdf)
- BTM - Original C++ implementation using collapsed Gibbs sampling [:page_facing_up:](https://raw.githubusercontent.com/xiaohuiyan/xiaohuiyan.github.io/master/paper/BTM-WWW13.pdf)
- BurstyBTM - Original C++ implementation of the Bursty BTM (BBTM) [:page_facing_up:](https://raw.githubusercontent.com/xiaohuiyan/xiaohuiyan.github.io/master/paper/BBTM-AAAI15.pdf)
- R-BTM - R package wrapping the C++ code from BTM
- STTM - Java implementation and evaluation of DMM, WNTM, PTM, ETM, GPU-DMM, GPU-DPMM, LF-DMM [:page_facing_up:](https://arxiv.org/pdf/1904.07695.pdf)
- SATM - Java implementation of Self-Aggregation Topic Model [:page_facing_up:](https://dl.acm.org/doi/10.5555/2832415.2832564)
- shorttext - Python implementation of various algorithms for Short Text Mining
-
Truncated Singular Value Decomposition (SVD) / Latent Semantic Analysis (LSA) / Latent Semantic Indexing (LSI)
- SVDlibc - C implementation of SVD by Doug Rohde
- sparsesvd - Python wrapper for SVDlibc
- gensim - Python implementation using multi-pass [randomized SVD solver](https://arxiv.org/pdf/0909.4061.pdf) or a [one-pass merge algorithm](https://rdcu.be/cghAi)
- BIDMach - Scala implementation of a scalable approximate SVD using subspace iteration
- scikit-learn - Python implementation using fast [randomized SVD solver](https://arxiv.org/pdf/0909.4061.pdf) or a “naive” algorithm that uses [ARPACK](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html)
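For orientation, the idea behind the iterative solvers listed above can be sketched with plain power iteration, which is the rank-1 case of subspace iteration. This is a minimal illustration under stated assumptions and not the code of any package in this list: the function name `top_singular_triplet` and the toy term-document matrix are hypothetical, and the input matrix is assumed to be non-zero.

```python
import math
import random


def top_singular_triplet(a, n_iter=100, seed=0):
    """Approximate the largest singular value of `a` and its singular vectors.

    `a` is a dense matrix given as a list of rows (hypothetical toy format).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(a), len(a[0])
    # Random starting vector in the column space.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        # One power-iteration step on A^T A: v <- normalize(A^T (A v)).
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        v = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Recover the singular value and the left singular vector from A v.
    u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return u, sigma, v


# Toy term-document matrix (rows: terms, columns: documents).
term_doc = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
u, sigma, v = top_singular_triplet(term_doc)
```

Keeping the top k such triplets instead of one gives the truncated SVD that LSA/LSI builds on; the libraries above do this with randomized or ARPACK-based solvers rather than this naive loop.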
-
Latent Dirichlet Allocation (LDA) [:page_facing_up:](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)
- lda - Python implementation using collapsed Gibbs sampling which follows scikit-learn interface [:page_facing_up:](https://www.pnas.org/content/pnas/101/suppl_1/5228.full.pdf)
- PartiallyCollapsedLDA - Various fast parallelized samplers for LDA, including Partially Collapsed LDA, LightLDA, Partially Collapsed Light LDA and a very efficient Polya-Urn LDA
- topicmodel-lib - Cython library for online/streaming LDA (Online VB, Online CVB0, Online CGS, Online OPE, Online FW, Streaming VB, Streaming OPE, Streaming FW, ML-OPE, ML-CGS, ML-FW)
- jsLDA - JavaScript implementation of LDA topic modeling in the browser
- lda-nodejs - Node.js implementation of LDA topic modeling
- lda-purescript - PureScript, browser-based implementation of LDA topic modeling
- TopicModels.jl - Julia implementation of LDA
- turicreate - C++ [LDA](https://github.com/apple/turicreate/blob/master/userguide/text/README.md) and [aliasLDA](https://apple.github.io/turicreate/docs/api/generated/turicreate.topic_model.create.html) implementation with export to Apple's Core ML for use in iOS, macOS, watchOS, and tvOS apps
- MeTA - C++ implementation of (parallel) collapsed [Gibbs sampling, CVB0 and SCVB](https://meta-toolkit.org/topic-models-tutorial.html)
- Fugue - Java implementation of collapsed Gibbs sampling with slice sampling for hyper-parameter optimization
- GA-LDA - R scripts using Genetic Algorithms (GA) for hyper-parameter optimization, based on Panichella [:page_facing_up:](https://doi.org/10.1016/j.infsof.2020.106411)
- Search-Based-LDA - R scripts using Genetic Algorithms (GA) for hyper-parameter optimization by Panichella [:page_facing_up:](https://doi.org/10.1016/j.infsof.2020.106411)
- Dodge - Python tuning tool that ignores redundant tunings [:page_facing_up:](https://arxiv.org/pdf/1902.01838.pdf)
- LDADE - Python tuning tool using differential evolution [:page_facing_up:](https://arxiv.org/pdf/1608.08176.pdf)
- ldatuning - R package to find optimal number of topics for LDA [:page_facing_up:](https://rpubs.com/siri/ldatuning)
- topic_interpretability - Computation of the semantic interpretability of topics produced by topic models [:page_facing_up:](https://aclanthology.org/E14-1056.pdf)
- topic-coherence-sensitivity - Code to compute topic coherence for several topic cardinalities and aggregate scores across them [:page_facing_up:](https://aclanthology.org/N16-1057.pdf)
- topic-model-diversity - A collection of topic diversity measures for topic modeling [:page_facing_up:](https://dl.acm.org/doi/abs/10.1007/978-3-030-80599-9_4)
- FastLDA - C++ implementation of LDA [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/1401890.1401960)
- dmlc - Single- and multi-threaded C++ implementations of [lightLDA](https://arxiv.org/pdf/1412.1576.pdf), [F+LDA](https://arxiv.org/pdf/1412.4986v1.pdf), [AliasLDA](https://dl.acm.org/doi/pdf/10.1145/2623330.2623756), forestLDA and many more
- warpLDA - C++ cache efficient LDA implementation which samples each token in O(1) [:page_facing_up:](https://arxiv.org/pdf/1510.08628.pdf)
- lightLDA - C++ implementation using O(1) Metropolis-Hastings sampling [:page_facing_up:](https://arxiv.org/pdf/1412.1576.pdf)
- AliasLDA - C++ implementation using Metropolis-Hastings and *alias* method [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/2623330.2623756)
- Yahoo-LDA - Yahoo!'s topic modelling framework [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/2124295.2124312)
- PLDA+ - Google's C++ implementation using data placement and pipeline processing [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/1961189.1961198)
- Familia - A toolkit for industrial topic modeling (LDA, SentenceLDA and Topical Word Embedding) [:warning:](https://github.com/baidu/Familia/issues/111) [:page_facing_up:](https://arxiv.org/pdf/1707.09823.pdf)
- scikit-learn - Python implementation using online variational Bayes inference [:page_facing_up:](https://proceedings.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf)
- lda-gensim - Python implementation using online variational inference [:page_facing_up:](https://proceedings.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf)
- ldamulticore-gensim - Parallelized Python implementation using online variational inference [:page_facing_up:](https://proceedings.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf)
- CVBLDA-TopicModel4J - Java implementation using collapsed variational Bayesian (CVB) inference [:page_facing_up:](https://papers.nips.cc/paper/2006/file/532b7cbe070a3579f424988a040752f2-Paper.pdf)
- Vowpal Wabbit - C++ implementation using online variational Bayes inference [:page_facing_up:](https://proceedings.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf)
- Scalable - Scalable Hyperparameter Selection for LDA [:page_facing_up:](https://www.tandfonline.com/doi/full/10.1080/10618600.2020.1741378)
- LDA\* - Tencent's hybrid sampler that uses different samplers for different types of documents in combination with an asymmetric parameter server [:page_facing_up:](http://www.vldb.org/pvldb/vol10/p1406-yu.pdf)
- F+LDA - C++ implementation of F+LDA using an appropriately modified Fenwick tree [:page_facing_up:](https://arxiv.org/pdf/1412.4986v1.pdf)
- GS-LDA-BIDMach - CPU and GPU-accelerated Scala implementation using Gibbs sampling
- VB-LDA-BIDMach - CPU and GPU-accelerated Scala implementation using online variational Bayes inference
- SparseLDA - Java algorithm and data structure for evaluating Gibbs sampling distributions used in Mallet [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/1557019.1557121)
- SaberLDA - GPU-based system that implements a sparsity-aware algorithm to achieve sublinear time complexity
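Many of the entries above rely on collapsed Gibbs sampling. The following is a minimal, unoptimized sketch of that sampler for LDA, intended only to show the count bookkeeping; it is not taken from any listed package. The function name `lda_gibbs`, the default hyperparameters, and the toy corpus are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; documents are lists of word ids."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # token totals per topic
    z = []                                               # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in n_dk[d]]
             for d, doc in enumerate(docs)]
    phi = [[(c + beta) / (n_k[k] + vocab_size * beta) for c in n_kw[k]]
           for k in range(n_topics)]
    return theta, phi


# Toy corpus: word ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 2], [3, 5, 4, 4]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
```

The faster samplers in this list (SparseLDA, AliasLDA, lightLDA, warpLDA and others) target the same conditional distribution but avoid recomputing all topic weights for every token.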
-
Hierarchical Dirichlet Process (HDP) [:page_facing_up:](https://papers.nips.cc/paper/2004/file/fb4ab556bc42d6f0ee0f9e24ec4d1af0-Paper.pdf)
- hca - C implementation using Gibbs sampling with/without burstiness modelling
- bnp - Cython reimplementation based on *online-hdp* following scikit-learn's API.
- tomotopy - Python extension for C++ implementation using Gibbs sampling [:page_facing_up:](https://www.jmlr.org/papers/volume10/newman09a/newman09a.pdf)
- Mallet - Java-based package for topic modeling using Gibbs sampling
- TopicModel4J - Java implementation using Gibbs sampling based on Chinese restaurant franchise metaphor
- gensim - Python implementation using online variational inference [:page_facing_up:](http://proceedings.mlr.press/v15/wang11a/wang11a.pdf)
- Scalable HDP - interesting paper
-
Hierarchical LDA (hLDA) [:page_facing_up:](https://dl.acm.org/doi/10.5555/2981345.2981348)
-
Dynamic Topic Model (DTM) [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/1143844.1143859)
- FastDTM - Scalable C++ implementation using Gibbs sampling with Stochastic Gradient Langevin Dynamics (MCMC-based) [:page_facing_up:](https://arxiv.org/pdf/1602.06049.pdf)
- ldaseqmodel-gensim - Python implementation using online variational inference [:page_facing_up:](https://proceedings.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf)
- tca - C implementation using Gibbs sampling with/without burstiness modelling [:page_facing_up:](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.705.1649&rep=rep1&type=pdf)
-
Miscellaneous topic models
- BigTopicModel - C++ engine for running large-scale MedLDA models [:page_facing_up:](https://dl.acm.org/doi/10.1145/2487575.2487658)
- YWWTools - Java-based package for various topic models by Weiwei Yang
- trLDA - Python implementation of streaming LDA based on trust-regions [:page_facing_up:](http://proceedings.mlr.press/v37/theis15.pdf)
- Logistic LDA - TensorFlow implementation of Discriminative Topic Modeling with Logistic LDA [:page_facing_up:](https://proceedings.neurips.cc/paper/2019/file/54ebdfbbfe6c31c39aaba9a1ee83860a-Paper.pdf)
- EnsTop - Python implementation of *ENS*emble *TOP*ic modelling with pLSA
- discLDA - C++ implementation of discLDA based on GibbsLDA++ [:page_facing_up:](https://papers.nips.cc/paper/2008/file/7b13b2203029ed80337f27127a9f1d28-Paper.pdf)
- GuidedLDA - Python implementation that can be guided by setting some seed words per topic (using Gibbs sampling) [:page_facing_up:](https://www.aclweb.org/anthology/E12-1021.pdf)
- seededLDA - R package that implements seeded-LDA for semi-supervised topic modeling
- keyATM - R package for Keyword Assisted Topic Models.
- BayesPA - Python interface for streaming implementation of MedLDA, maximum entropy discrimination LDA (max-margin supervised topic model) [:page_facing_up:](http://proceedings.mlr.press/v32/shi14.pdf)
- DAPPER - Python implementation of Dynamic Author Persona (DAP) topic model [:page_facing_up:](https://arxiv.org/pdf/1811.01931.pdf)
- ToT - Python implementation of Topics Over Time (A Non-Markov Continuous-Time Model of Topical Trends) [:page_facing_up:](https://dl.acm.org/doi/10.1145/1150402.1150450)
- MLTM - C implementation of multilabel topic model (MLTM) [:page_facing_up:](https://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00939)
- Entropy-Based Topic Modeling - Java implementation of Entropy-Based Topic Modeling on Multiple Domain-Specific Text Collections
- Dual-Sparse Topic Model - implemented in TopicModel4J using collapsed variational Bayes inference [:page_facing_up:](https://dl.acm.org/doi/10.1145/2566486.2567980)
- sailing-pmls - Parallel LDA and medLDA implementation
- sequence-models - Java implementation of block HMM and the mixed membership Markov model (M4)
-
Embedding based Topic Models
- D-ETM - Dynamic Embedded Topic Model [:page_facing_up:](https://arxiv.org/pdf/1907.05545.pdf)
- BERTopic - BERTopic supports guided, (semi-) supervised, and dynamic topic modeling and visualization [:page_facing_up:](https://arxiv.org/pdf/2203.05794.pdf)
- CTM - CTMs combine contextualized embeddings (e.g., BERT) with topic models
- ETM - Embedded Topic Model [:page_facing_up:](https://arxiv.org/pdf/1907.04907.pdf)
- ProdLDA - Original TensorFlow implementation of Autoencoding Variational Inference (AEVI) for Topic Models [:page_facing_up:](https://arxiv.org/pdf/1703.01488.pdf)
- pytorch-ProdLDA - PyTorch implementation of ProdLDA [:page_facing_up:](https://arxiv.org/pdf/1703.01488.pdf)
- CatE - Discriminative Topic Mining via Category-Name Guided Text Embedding [:page_facing_up:](https://arxiv.org/pdf/1908.07162.pdf)
- Top2Vec - Python implementation that learns jointly embedded topic, document and word vectors [:page_facing_up:](https://arxiv.org/pdf/2008.09470.pdf)
- G-LDA - Java implementation of Gaussian LDA using word embeddings [:page_facing_up:](https://www.aclweb.org/anthology/P15-1077.pdf)
- MetaLDA - Java implementation using Gibbs sampling that leverages document metadata and word embeddings [:page_facing_up:](https://arxiv.org/pdf/1709.06365.pdf)
- LFTM - Java implementation of latent feature topic models (improving LDA and DMM with word embeddings) [:page_facing_up:](https://www.aclweb.org/anthology/Q15-1022.pdf)
- CorEx - Recover latent factors with Correlation Explanation (CorEx) [:page_facing_up:](https://arxiv.org/pdf/1406.1222.pdf)
- Anchored CorEx - Hierarchical Topic Modeling with Minimal Domain Knowledge [:page_facing_up:](https://arxiv.org/pdf/1611.10277.pdf)
- Linear CorEx - Latent Factor Models Based on Linear Total CorEx [:page_facing_up:](https://arxiv.org/pdf/1706.03353v3.pdf)
- lda2vec - Mixing Dirichlet topic models and word embeddings to make lda2vec [:page_facing_up:](https://arxiv.org/pdf/1605.02019.pdf)
- lda2vec-pytorch - PyTorch implementation of lda2vec
- MG-LDA - Python implementation of (Multi-lingual) Gaussian LDA [:page_facing_up:](https://raw.githubusercontent.com/EliasKB/Multilingual-Gaussian-Latent-Dirichlet-Allocation-MGLDA/master/MGLDA.pdf)
-
Labeled Latent Dirichlet Allocation (LLDA, Labeled-LDA, L-LDA) [:page_facing_up:](https://www.aclweb.org/anthology/D09-1026.pdf)
- topbox - Python wrapper for labeled LDA implementation of *Stanford TMT*
- Labeled-LDA-Python - Python implementation (easy to use, does not scale)
- JGibbLabeledLDA - Java implementation based on the popular [JGibbLDA](http://jgibblda.sourceforge.net) package
- Mallet - Java implementation using Gibbs sampling [:page_facing_up:](http://www.mimno.org/articles/labelsandpatterns)
- gensims_mallet_wrapper - Python wrapper for Mallet using gensim interface
-
Supervised LDA (sLDA) [:page_facing_up:](https://papers.nips.cc/paper/2007/file/d56b9fc4b0f1be8871f5e1c40c0067e7-Paper.pdf)
- slda - Cython implementation of Gibbs sampling for LDA and various sLDA variants
-
Exotic models
- PTM - Prescription Topic Model for Traditional Chinese Medicine Prescriptions [:page_facing_up:](https://ieeexplore.ieee.org/abstract/document/8242679) (interesting benchmark models)
- TEM - Topic Expertise Model [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/2505515.2505720)
- KGE-LDA - Knowledge Graph Embedding LDA [:page_facing_up:](https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/viewFile/14170/14086)
- LDA-SP - A Latent Dirichlet Allocation Method for Selectional Preferences [:page_facing_up:](https://www.aclweb.org/anthology/P10-1044.pdf)
- LDA+FFT - LDA and FFTs (Fast and Frugal Trees) for better comprehensibility [:page_facing_up:](https://arxiv.org/pdf/1804.10657.pdf)
-
Relational Topic Model (RTM)
- Constrained-RTM - Java implementation of Constrained RTM [:page_facing_up:](https://doi.org/10.1016/j.ins.2019.09.039)
-
Non-Negative Matrix Factorization (NMF or NNMF)
- scikit-learn - Python implementation using a [coordinate descent](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.6398&rep=rep1&type=pdf) or a [multiplicative update](https://arxiv.org/pdf/1010.1763.pdf) solver
- gensim - Python implementation of [online NMF](https://arxiv.org/pdf/1604.02634.pdf)
- BIDMach - CPU and GPU-accelerated Scala implementation with L2 loss
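The multiplicative update solver mentioned above can be sketched in a few lines for the squared (Frobenius) loss. This is a toy illustration under stated assumptions, not how scikit-learn or any other listed library implements it: the helper names, the small constant `eps` in the denominators, and the example matrix are all hypothetical.

```python
import random


def matmul(a, b):
    """Dense matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf_multiplicative(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V into W H with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Toy term-document matrix with two blocks of co-occurring terms.
v = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]
w, h, errors = nmf_multiplicative(v, rank=2)
```

In a topic modeling setting the rows of `h` (or columns of `w`, depending on orientation) are read as topics; because the updates only multiply by non-negative ratios, both factors stay non-negative throughout.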
-
Author-topic Model (ATM) [:page_facing_up:](https://arxiv.org/pdf/1207.4169.pdf)
- gensim - Python implementation with online training (constant in memory w.r.t. the number of documents)
-
-
Libraries & Toolkits
- scikit-learn - Python library for machine learning ![GitHub Repo stars](https://img.shields.io/github/stars/scikit-learn/scikit-learn?style=social)
- OCTIS - Python package to integrate, optimize and evaluate topic models ![GitHub Repo stars](https://img.shields.io/github/stars/MIND-Lab/OCTIS?style=social)
- tmtoolkit - Python topic modeling toolkit with parallel processing power ![GitHub Repo stars](https://img.shields.io/github/stars/WZBSocialScienceCenter/tmtoolkit?style=social)
- BIDMach - CPU and GPU-accelerated machine learning library ![GitHub Repo stars](https://img.shields.io/github/stars/BIDData/BIDMach?style=social)
- BigARTM - Fast topic modeling platform ![GitHub Repo stars](https://img.shields.io/github/stars/bigartm/bigartm?style=social)
- TopicNet - A high-level Python interface for BigARTM library ![GitHub Repo stars](https://img.shields.io/github/stars/machine-intelligence-laboratory/TopicNet?style=social)
- RMallet - R package to interface with the Java machine learning tool MALLET ![GitHub Repo stars](https://img.shields.io/github/stars/mimno/RMallet?style=social)
- R-lda - R package for topic modelling (LDA, sLDA, corrLDA, etc.) ![GitHub Repo stars](https://img.shields.io/github/stars/slycoder/R-lda?style=social)
- topicmodels - R package with interface to C code for LDA and CTM ![GitHub Repo stars](https://img.shields.io/github/stars/cran/topicmodels?style=social)
- lda++ - C++ library for LDA and (fast) supervised LDA (sLDA/fsLDA) using variational inference ![GitHub Repo stars](https://img.shields.io/github/stars/angeloskath/supervised-lda?style=social)
- gensim - Python library for topic modelling ![GitHub Repo stars](https://img.shields.io/github/stars/RaRe-Technologies/gensim?style=social)
- tomoto - Ruby extension for Gibbs sampling based *tomoto* which is written in C++ ![GitHub Repo stars](https://img.shields.io/github/stars/ankane/tomoto?style=social)
- stm - R package for the Structural Topic Model ![GitHub Repo stars](https://img.shields.io/github/stars/bstewart/stm?style=social)
-
Research Implementations
- hLDA - C implementation of hierarchical LDA by David Blei
- ctm-c - C implementation of the correlated topic model by David Blei
- sLDA - C++ implementation of supervised topic models with a categorical response.
- lda-c - C implementation using variational EM by David Blei
- onlineldavb - Python online variational Bayes implementation by Matthew Hoffman [:page_facing_up:](https://proceedings.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf)
- HDP - C++ implementation of hierarchical Dirichlet processes by Chong Wang
- online-hdp - Python implementation of online hierarchical Dirichlet processes by Chong Wang
- ctr - C++ implementation of collaborative topic models by Chong Wang
- dtm - C implementation of dynamic topic models by David Blei & Sean Gerrish
- diln - C implementation of Discrete Infinite Logistic Normal (with HDP option) by John Paisley
- turbotopics - Python implementation that finds significant multiword phrases in topics by David Blei
- LDAGibbs - Java implementation of LDA using Gibbs sampling by Liu Yang
- cvbLDA - Python C extension implementation of collapsed variational Bayesian inference for LDA
- fast - A Fast And Scalable Topic-Modeling Toolbox (Fast-LDA, CVB0) by Arthur Asuncion and colleagues [:page_facing_up:](https://arxiv.org/pdf/1205.2662.pdf)
-
-
Popular Implementations (but not maintained anymore)
- Matlab Topic Modeling Toolbox - Matlab implementations of LDA, ATM, HMM-LDA, LDA-COL (Collocation) models by Mark Steyvers and Tom Griffiths
- Mr.LDA - Scalable Topic Modeling using Variational Inference in MapReduce [:page_facing_up:](https://dl.acm.org/doi/10.1145/2187836.2187955)
- GibbsLDA++ - C++ implementation using Gibbs sampling [:page_facing_up:](https://dl.acm.org/doi/pdf/10.1145/1367497.1367510)
- JGibbLDA - Java implementation using Gibbs sampling
- Stanford Topic Modeling Toolbox - Scala implementation of LDA, labeledLDA, PLDA, PLDP by Daniel Ramage and Evan Rosen
-
-
Learning Implementations (hopefully easy to understand)
- Topic-Model - Python implementation of LDA, Labeled LDA, ATM, Temporal Author-Topic Model using Gibbs sampling
- topic_models - Python implementation of LSA, PLSA and LDA
-
-
Probabilistic Programming Languages (PPL) (a.k.a. Build your own Topic Model)
- Stan - Platform for statistical modeling and high-performance statistical computation, e.g., [LDA](https://mc-stan.org/docs/2_26/stan-users-guide/latent-dirichlet-allocation.html) [:page_facing_up:](https://files.eric.ed.gov/fulltext/ED590311.pdf)
- Turing.jl - Julia library for general-purpose probabilistic programming [:page_facing_up:](http://proceedings.mlr.press/v84/ge18b/ge18b.pdf)
- TFP - Probabilistic reasoning and statistical analysis in TensorFlow, e.g., [LDA](https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/latent_dirichlet_allocation_distributions.py) [:page_facing_up:](https://arxiv.org/pdf/2001.11819.pdf)
- edward2 - Simple PPL with core utilities in the NumPy and TensorFlow ecosystem [:page_facing_up:](https://arxiv.org/pdf/1811.02091.pdf)
- pyro - PPL built on PyTorch, e.g., [prodLDA](http://pyro.ai/examples/prodlda.html) [:page_facing_up:](https://www.jmlr.org/papers/volume20/18-403/18-403.pdf)
- edward - A PPL built on TensorFlow, e.g., [LDA](http://edwardlib.org/iclr2017?Figure%2011.%20Latent%20Dirichlet%20allocation) [:page_facing_up:](https://arxiv.org/pdf/1610.09787.pdf)
- ZhuSuan - A PPL for Bayesian deep learning and generative models, built on TensorFlow, e.g., [LDA](https://zhusuan.readthedocs.io/en/latest/tutorials/lntm.html) [:page_facing_up:](https://arxiv.org/pdf/1709.05870.pdf)
-
-
Visualizations
- LDAvis - R package for interactive topic model visualization
- pyLDAvis - Python library for interactive topic model visualization
- scalaLDAvis - Scala port of pyLDAvis
- dtmvisual - Python package for visualizing DTM (trained with gensim)
- TMVE online - Online Django variant of topic model visualization engine (*TMVE*)
- TMVE - Original topic model visualization engine (LDA trained with *lda-c*) [:page_facing_up:](https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/viewFile/4645/5021)
- wordcloud - Python package for visualizing topics via word_cloud
- Mallet-GUI - GUI for creating and analyzing topic models produced by MALLET
- TWiC - Topic Words in Context is a highly-interactive, browser-based visualization for MALLET topic models
- dfr-browser - Explore Mallet's topic models of texts in a web browser
- Termite - Explore topic models using term-topic matrix, group-in-a-box visualization or scatter plot.
- Topics - Python library for topic modeling and visualization
- TopicsExplorer - Explore your own text collection with a topic model – without prior knowledge [:page_facing_up:](https://dh2018.adho.org/a-graphical-user-interface-for-lda-topic-modeling)
- topicApp - A Simple Shiny App for Topic Modeling
- stminsights - A Shiny Application for Inspecting Structural Topic Models
- topicmodel-lib - Python wrapper for TMVE for visualizing LDA (trained with topicmodel-lib)
-
-
Dirichlet hyperparameter optimization techniques
- fastfit
- dirichlet
- lightspeed
- Slice sampling
- Minka
- lecture-notes
- Newton-Raphson Method
- fixed-point iteration - Wallach's PhD thesis, chapter 2.3
-
-
Resources
- David Blei - David Blei's Homepage with introductory materials
-
-
Related awesome lists