{"id":13661295,"url":"https://github.com/trinker/topicmodels_learning","last_synced_at":"2025-03-16T13:31:24.349Z","repository":{"id":146664317,"uuid":"48293781","full_name":"trinker/topicmodels_learning","owner":"trinker","description":"A repository of learning \u0026 R resources related to topic models ","archived":false,"fork":false,"pushed_at":"2016-01-30T22:10:20.000Z","size":39553,"stargazers_count":229,"open_issues_count":4,"forks_count":53,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-03-16T02:47:20.420Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trinker.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-12-19T19:01:38.000Z","updated_at":"2025-03-06T10:53:52.000Z","dependencies_parsed_at":"2023-04-14T16:19:21.143Z","dependency_job_id":null,"html_url":"https://github.com/trinker/topicmodels_learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftopicmodels_learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftopicmodels_learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftopicmodels_learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftopicmodels_learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trinker","download_url":"https://codeload.github.com/trinker/topicmodels_learning/tar.gz/refs/heads/master"
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243875126,"owners_count":20361952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T05:01:32.121Z","updated_at":"2025-03-16T13:31:19.339Z","avatar_url":"https://github.com/trinker.png","language":"R","funding_links":[],"categories":["R","Natural Language Processing"],"sub_categories":[],"readme":"---\ntitle: \"Topic Models Learning and R Resources\"\ndate: \"`r format(Sys.time(), '%d %B, %Y')`\"\noutput:\n  md_document:\n    toc: true \n    toc_depth: 2    \n---\n```{r, echo=FALSE, message=FALSE}\n# rmarkdown::render(\"README.Rmd\", \"all\"); md_toc()\nlibrary(knitr)\nknit_hooks$set(htmlcap = function(before, options, envir) {\n  if(!before) {\n    paste('\u003cp class=\"caption\"\u003e\u003cb\u003e\u003cem\u003e',options$htmlcap,\"\u003c/em\u003e\u003c/b\u003e\u003c/p\u003e\",sep=\"\")\n    }\n    })\nknitr::opts_knit$set(self.contained = TRUE, cache = FALSE)\nknitr::opts_chunk$set(fig.path = \"inst/figure/\")\n```\n\nThis is a collection documenting the resources I find related to topic models with an R flavored focus. A *topic model* is a type of [*generative*](http://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-discriminative-algorithm) model used to \"discover\" latent topics that compose a *corpus* or collection of documents. 
Typically, topic modeling is applied to a collection of text documents, but it can be used for other modes as well, including caption generation for images.\n\n![](inst/figure/topic-model.jpg)\n\n# Just the Essentials\n\nThis is my rundown of the minimal readings, websites, videos, \u0026 scripts the reader needs to become familiar with topic modeling.  The list is in an order I believe will be of greatest use and contains a nice mix of introduction, theory, application, and interpretation.  As you want to learn more about topic modeling, the other sections will become more useful.\n\n1. Boyd-Graber, J. (2013). [Computational Linguistics I: Topic Modeling](https://www.youtube.com/watch?v=4p9MSJy761Y)    \n2. Underwood, T. (2012). [Topic Modeling Made Just Simple Enough](http://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/)\n3. Weingart, S. (2012). [Topic Modeling for Humanists: A Guided Tour](http://www.scottbot.net/HIAL/?p=19113)\n4. Blei, D. M. (2012). [Probabilistic topic models](/articles/Blei2012.pdf). *Communications of the ACM, 55*(4), 77-84. doi:10.1145/2133806.2133826    \n5. inkhorn82 (2014). [A Delicious Analysis! (aka topic modelling using recipes)](http://rforwork.info/2014/02/17/a-delicious-analysis/) [(CODE)](https://gist.githubusercontent.com/inkhorn/9044779/raw/c7f0ba30d424aaeb75c5e221d12566f6732c4f29/recipe%20analysis.R)\n6. Gr\u0026uuml;n, B. \u0026 Hornik, K. (2011). [topicmodels: An R Package for Fitting Topic Models.](/articles/Gruen2011.pdf). *Journal of Statistical Software, 40*(13), 1-30. \n7. Marwick, B. (2014a). [The input parameters for using latent Dirichlet allocation](http://stats.stackexchange.com/a/25128/7482)\n8. Tang, J., Meng, Z., Nguyen, X., Mei, Q., \u0026 Zhang, M. (2014). [Understanding the limiting factors of topic modeling via posterior contraction analysis](/articles/Tang2014.pdf). In *31st International Conference on Machine Learning*, 190-198.\n9. Sievert, C. (2014). 
[LDAvis: A method for visualizing and interpreting topic models](https://www.youtube.com/watch?v=IksL96ls4o0)\n10. Rhody, L. M. (2012). [Some Assembly Required: Understanding and Interpreting Topics in LDA Models of Figurative Language](http://www.lisarhody.com/some-assembly-required)\n11. Rinker, T.W. (2015). [R Script: Example Topic Model Analysis](https://raw.githubusercontent.com/trinker/topicmodels_learning/master/scripts/Example_topic_model_analysis.R)\n\n# Key Players\n\nPapadimitriou, Raghavan, Tamaki, \u0026 Vempala (1997) first introduced the notion of topic modeling in their [\"Latent Semantic Indexing: A probabilistic analysis\"](/articles/Papadimitriou1997.pdf).  Thomas Hofmann (1999) developed \"Probabilistic latent semantic indexing\".  Blei, Ng, \u0026 Jordan (2003) proposed *latent Dirichlet allocation* (LDA) as a means of modeling documents with multiple topics, though it assumes the topics are uncorrelated.  Blei \u0026 Lafferty (2007) proposed the *correlated topic model* (CTM), extending LDA to allow for correlations between topics.  Roberts, Stewart, Tingley, \u0026 Airoldi (2013) proposed the [*Structural Topic Model*](/articles/Roberts2013.pdf) (STM), allowing the inclusion of meta-data in the modeling process.\n\n# Videos\n\n## Introductory\n\n- Boyd-Graber, J. (2013). [Computational Linguistics I: Topic Modeling](https://www.youtube.com/watch?v=4p9MSJy761Y)\n\n## Theory\n\n- Blei, D. (2007) [Modeling Science: Dynamic Topic Models of Scholarly Research](https://www.youtube.com/watch?v=7BMsuyBPx90)\n- Blei, D. (2009) [Topic Models: Parts I \u0026 II](http://videolectures.net/mlss09uk_blei_tm/#) ([Lecture Notes](/presentations/Blei2009.pdf))\n- Jordan, M. (2014) [A Short History of Topic Models](https://www.youtube.com/watch?v=fBNsHPtTAGs)\n\n\n## Visualization\n\n- Sievert, C. (2014) [LDAvis: A method for visualizing and interpreting topic models](https://www.youtube.com/watch?v=IksL96ls4o0)\n- Mabey, B. 
(2015) [SavvySherpa: Visualizing Topic Models](https://www.youtube.com/watch?v=tGxW2BzC_DU)\n\n# Articles\n\n## Applied\n\n- Marwick, B. (2013). [Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Network Analysis of Microblog Content](https://www.academia.edu/5508141/Discovery_of_Emergent_Issues_and_Controversies_in_Anthropology_Using_Text_Mining_Topic_Modeling_and_Social_Network_Analysis_of_Microblog_Content). In Yanchang Zhao \u0026 Yonghua Cen (eds.), *Data Mining Applications with R*. Elsevier. 63-93.\n\n- Newman, D.J. \u0026 Block, S. (2006). [Probabilistic topic decomposition of an eighteenth-century American newspaper](/articles/Newman2006.pdf). *Journal of the American Society for Information Science and Technology, 57*(6), 753-767. doi:10.1002/asi.v57:6\n\n\n## Theoretical\n\n- Blei, D. M. (2012). [Probabilistic topic models](/articles/Blei2012.pdf). *Communications of the ACM, 55*(4), 77-84. doi:10.1145/2133806.2133826\n- Blei, D. M. \u0026 Lafferty, J. D. (2007). [A correlated topic model of Science](/articles/Blei2007.pdf). *The Annals of Applied Statistics, 1*(1), 17-35. doi:10.1214/07-AOAS114\n- Blei, D. M. \u0026 Lafferty, J. D. (2009). [Topic models](/articles/Blei2009.pdf). In A. Srivastava, M. Sahami (eds.), [*Text mining: classification, clustering, and applications*](/articles/Srivastava2009.pdf). Chapman \u0026 Hall/CRC Press. 71-93.  \n- Blei, D. M. \u0026 McAuliffe, J. (2008). [Supervised topic models](/articles/Blei2008.pdf). In *Advances in Neural Information Processing Systems 20*, 1-8.\n- Blei, D. M., Ng, A.Y., \u0026 Jordan, M.I. (2003). [Latent Dirichlet Allocation](/articles/Blei2003.pdf). *Journal of Machine Learning Research, 3*, 993-1022.\n- Chang, J., Boyd-Graber, J., Wang, C., Gerrish, S., \u0026 Blei, D. (2009). [Reading tea leaves: How humans interpret topic models](/articles/Chang2009.pdf). In *Neural Information Processing Systems*.\n- Griffiths, T.L. \u0026 Steyvers, M. (2004). 
[Finding Scientific Topics](/articles/Griffiths2004.pdf). *Proceedings of the National Academy of Sciences of the United States of America, 101*, 5228-5235.\n- Griffiths, T.L., Steyvers, M., \u0026 Tenenbaum, J.B. (2007). [Topics in Semantic Representation](/articles/Griffiths2007.pdf). *Psychological Review, 114*(2), 211-244.\n- Gr\u0026uuml;n, B. \u0026 Hornik, K. (2011). [topicmodels: An R Package for Fitting Topic Models.](/articles/Gruen2011.pdf). *Journal of Statistical Software, 40*(13), 1-30. \n- Mimno, D. \u0026 McCallum, A. (2007). [Organizing the OCA: learning faceted subjects from a library of digital books](/articles/Mimno2007.pdf). In *Joint Conference on Digital Libraries*. ACM Press, New York, NY, 376–385.\n- Ponweiser, M. (2012). [Latent Dirichlet Allocation in R (Diploma Thesis)](/articles/Ponweiser2012.pdf). Vienna University of Economics and Business, Vienna.\n- Roberts, M.E., Stewart, B.M., Tingley, D., \u0026 Airoldi, E.M. (2013). [The Structural Topic Model and Applied Social Science](/articles/Roberts2013.pdf). *Advances in Neural Information Processing Systems Workshop on Topic Models: Computation, Application, and Evaluation*, 1-4.  \n- Roberts, M., Stewart, B., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S., Albertson, B., et al. (2014). [Structural topic models for open ended survey responses](/articles/Roberts2014.pdf). *American Journal of Political Science, 58*(4), 1064-1082.\n- Roberts, M., Stewart, B., Tingley, D. (n.d.). [stm: R Package for Structural Topic Models](/articles/Robertsnd.pdf), 1-49.\n- Sievert, C. \u0026 Shirley, K. E. (2014a). [LDAvis: A Method for Visualizing and Interpreting Topics.](/articles/Sievert2014a.pdf) In *Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces*, 63-70.\n- Steyvers, M. \u0026 Griffiths, T. (2007). [Probabilistic topic models](/articles/Steyvers2007.pdf). In T. Landauer, D. McNamara, S. Dennis, and W. 
Kintsch (eds), *Latent Semantic Analysis: A Road to Meaning*. Lawrence Erlbaum.\n- Taddy, M.A. (2012). [On Estimation and Selection for Topic Models](/articles/Taddy2012.pdf). In *Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS 2012)*, 1184-1193.\n- Tang, J., Meng, Z., Nguyen, X., Mei, Q., \u0026 Zhang, M. (2014). [Understanding the limiting factors of topic modeling via posterior contraction analysis](/articles/Tang2014.pdf). In *31st International Conference on Machine Learning*, 190-198.\n\n# Websites \u0026 Blogs\n\n- Blei, D. (n.d.). [Topic Modeling](https://www.cs.princeton.edu/~blei/topicmodeling.html)\n- Jockers, M.L. (2013). [\"Secret\" Recipe for Topic Modeling Themes](http://www.matthewjockers.net/2013/04/12/secret-recipe-for-topic-modeling-themes/)\n- Jones, T. (n.d.). [Topic Models Reading List](http://www.biasedestimates.com/p/topic-models-reading-list.html)\n- Marwick, B. (2014a). [The input parameters for using latent Dirichlet allocation](http://stats.stackexchange.com/a/25128/7482)\n- Marwick, B. (2014b). [Topic models: cross validation with loglikelihood or perplexity](http://stackoverflow.com/a/21394092/1000343)\n- Rhody, L. M. (2012). [Some Assembly Required: Understanding and Interpreting Topics in LDA Models of Figurative Language](http://www.lisarhody.com/some-assembly-required)\n- Schmidt, B.M. (2012). [Words Alone: Dismantling Topic Models in the Humanities](http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/)\n- Underwood, T. (2012a). [Topic Modeling Made Just Simple Enough](http://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/)\n- Underwood, T. (2012b). [What kinds of \"topics\" does topic modeling actually produce?](http://tedunderwood.com/2012/04/01/what-kinds-of-topics-does-topic-modeling-actually-produce/)\n- Weingart, S. (2012). [Topic Modeling for Humanists: A Guided Tour](http://www.scottbot.net/HIAL/?p=19113)\n- Weingart, S. 
(2011). [Topic Modeling and Network Analysis](http://www.scottbot.net/HIAL/?p=221)\n\n\n# R Resources\n\n## Package Comparisons\n\n| Package       | Functionality     | Pluses  |  Author  | R Language Interface  |\n|-------------- | -------------|---------|----------|---------------------|\n| lda*           | Collapsed Gibbs for LDA     | Graphing utilities   |  Chang   | R |\n| topicmodels   | LDA and CTM  | Follows Blei's implementation; great vignette; takes [DTM](https://en.wikipedia.org/wiki/Document-term_matrix) |  Gr\u0026uuml;n \u0026 Hornik | C |\n| stm           | Model w/ meta-data | Great documentation; nice visualization  |  Roberts, Stewart, \u0026 Tingley | C |\n| LDAvis        | Interactive visualization     | Aids in model interpretation  |  Sievert \u0026 Shirley  | R + Shiny |\n| mallet**        |  LDA                             | [MALLET](http://programminghistorian.org/lessons/topic-modeling-and-mallet) is well known                       | Mimno              |  Java |\n\n\\*[*StackExchange discussion of lda vs. topicmodels*](http://stats.stackexchange.com/questions/24441/two-r-packages-for-topic-modeling-lda-and-topicmodels)     \n\\*\\*[*Setting Up MALLET*](http://programminghistorian.org/lessons/topic-modeling-and-mallet)\n\n\n## R Specific References\n\n- Chang, J. (2010). lda: Collapsed Gibbs Sampling Methods for Topic Models. http://CRAN.R-project.org/package=lda.\n- Gr\u0026uuml;n, B. \u0026 Hornik, K. (2011). [topicmodels: An R Package for Fitting Topic Models.](/articles/Gruen2011.pdf). *Journal of Statistical Software, 40*(13), 1-30. \n- Mimno, D. (2013). [vignette-mallet: A wrapper around the Java machine learning tool MALLET](/articles/Mimno2013.Rmd). https://CRAN.R-project.org/package=mallet\n- Ponweiser, M. (2012). [Latent Dirichlet Allocation in R (Diploma Thesis)](/articles/Ponweiser2012.pdf). Vienna University of Economics and Business, Vienna.\n- Roberts, M., Stewart, B., Tingley, D. (n.d.). 
[stm: R Package for Structural Topic Models](/articles/Robertsnd.pdf), 1-49.\n- Sievert, C. \u0026 Shirley, K. E. (2014a). [LDAvis: A Method for Visualizing and Interpreting Topics.](/articles/Sievert2014a.pdf) *Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces*, 63-70.\n- Sievert, C. \u0026 Shirley, K. E. (2014b). [Vignette: LDAvis details.](/articles/Sievert2014b.pdf) 1-5.\n\n\n## Example Modeling\n\n- Awati, K. (2015). [A gentle introduction to topic modeling using R](https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/)\n- Dubins, M. (2013). [Topic Modeling in Python and R: A Rather Nosy Analysis of the Enron Email Corpus](https://dzone.com/articles/topic-modeling-python-and-r)\n- Goodrich, B. (2015). [Topic Modeling Twitter Using R](https://www.linkedin.com/pulse/topic-modeling-twitter-using-r-bryan-goodrich) [(CODE)](https://gist.githubusercontent.com/bryangoodrich/7b5ef683ce8db592669e/raw/3402e7390d10a0282dc0d6309ed4df9a4fb1cf5d/TwitterTopics.r)\n- inkhorn82 (2014). [A Delicious Analysis! (aka topic modelling using recipes)](http://rforwork.info/2014/02/17/a-delicious-analysis/) [(CODE)](https://gist.githubusercontent.com/inkhorn/9044779/raw/c7f0ba30d424aaeb75c5e221d12566f6732c4f29/recipe%20analysis.R)\n- Jockers, M.L. (2014). [Introduction to Text Analysis and Topic Modeling with R](http://www.matthewjockers.net/materials/dh-2014-introduction-to-text-analysis-and-topic-modeling-with-r/)\n- Medina, L. (2015). [Conspiracy Theories - Topic Modeling \u0026 Keyword Extraction](http://voidpatterns.org/2015/03/conspiracy-theories-topic-modeling-keyword-extraction/)\n- Sievert, C. (n.d.). [A topic model for movie reviews](http://cpsievert.github.io/LDAvis/reviews/reviews.html)     \n- Sievert, C. (2014). 
[Topic Modeling In R](https://ropensci.org/blog/2014/04/16/topic-modeling-in-R/)\n\n# Topic Modeling R Demo\n\n## topicmodels Package\n\nThe .R script for this demonstration can be downloaded from [scripts/Example_topic_model_analysis.R](https://raw.githubusercontent.com/trinker/topicmodels_learning/master/scripts/Example_topic_model_analysis.R)\n\n### Install/Load Tools \u0026 Data\n\n```{r}\nif (!require(\"pacman\")) install.packages(\"pacman\")\npacman::p_load_gh(\"trinker/gofastr\")\npacman::p_load(tm, topicmodels, dplyr, tidyr, igraph, devtools, LDAvis, ggplot2)\n\n## Source topicmodels2LDAvis \u0026 optimal_k functions\ninvisible(lapply(\n    file.path(\n        \"https://raw.githubusercontent.com/trinker/topicmodels_learning/master/functions\", \n        c(\"topicmodels2LDAvis.R\", \"optimal_k.R\")\n    ),\n    devtools::source_url\n))\n\ndata(presidential_debates_2012)\n```\n\n\n### Generate Stopwords \n```{r}\nstops \u003c- c(\n        tm::stopwords(\"english\"),\n        tm::stopwords(\"SMART\"),\n        \"governor\", \"president\", \"mister\", \"obama\",\"romney\"\n    ) %\u003e%\n    gofastr::prep_stopwords() \n```\n\n\n### Create the DocumentTermMatrix\n\n```{r}\ndoc_term_mat \u003c- presidential_debates_2012 %\u003e%\n    with(gofastr::q_dtm_stem(dialogue, paste(person, time, sep = \"_\"))) %\u003e%           \n    gofastr::remove_stopwords(stops, stem=TRUE) %\u003e%                                                    \n    gofastr::filter_tf_idf() %\u003e%\n    gofastr::filter_documents() \n```\n\n### Control List\n\n```{r}\ncontrol \u003c- list(burnin = 500, iter = 1000, keep = 100, seed = 2500)\n```\n\n\n### Determine Optimal Number of Topics\n\nThe plot below shows the harmonic mean of the log likelihoods against k (number of topics).  
\n\n```{r, eval=FALSE}\n(k \u003c- optimal_k(doc_term_mat, 40, control = control))\n```\n\n```{r, echo=FALSE}\n(k \u003c- optimal_k(doc_term_mat, 40, control = control, drop.seed = FALSE))\n```\n\nIt appears the optimal number of topics is ~k = `r as.numeric(k)`.\n\n### Run the Model\n\n```{r}\ncontrol[[\"seed\"]] \u003c- 100\nlda_model \u003c- topicmodels::LDA(doc_term_mat, k=as.numeric(k), method = \"Gibbs\", \n    control = control)\n```\n\n### Plot the Topics Per Person \u0026 Time\n\n```{r, fig.width=10, fig.height=12}\ntopics \u003c- topicmodels::posterior(lda_model, doc_term_mat)[[\"topics\"]]\ntopic_dat \u003c- dplyr::add_rownames(as.data.frame(topics), \"Person_Time\")\ncolnames(topic_dat)[-1] \u003c- apply(terms(lda_model, 10), 2, paste, collapse = \", \")\n\ntidyr::gather(topic_dat, Topic, Proportion, -c(Person_Time)) %\u003e%\n    tidyr::separate(Person_Time, c(\"Person\", \"Time\"), sep = \"_\") %\u003e%\n    dplyr::mutate(Person = factor(Person, \n        levels = c(\"OBAMA\", \"ROMNEY\", \"LEHRER\", \"SCHIEFFER\", \"CROWLEY\", \"QUESTION\" ))\n    ) %\u003e%\n    ggplot2::ggplot(ggplot2::aes(weight=Proportion, x=Topic, fill=Topic)) +\n        ggplot2::geom_bar() +\n        ggplot2::coord_flip() +\n        ggplot2::facet_grid(Person~Time) +\n        ggplot2::guides(fill=FALSE) +\n        ggplot2::xlab(\"Proportion\")\n```\n\n\n### Plot the Topics Matrix as a Heatmap \n\n```{r}\nheatmap(topics, scale = \"none\")\n```\n\n### Network of the Word Distributions Over Topics (Topic Relation)\n\n```{r}\npost \u003c- topicmodels::posterior(lda_model)\n\ncor_mat \u003c- cor(t(post[[\"terms\"]]))\ncor_mat[ cor_mat \u003c .05 ] \u003c- 0\ndiag(cor_mat) \u003c- 0\n\ngraph \u003c- graph.adjacency(cor_mat, weighted=TRUE, mode=\"lower\")\ngraph \u003c- delete.edges(graph, E(graph)[ weight \u003c 0.05])\n\nE(graph)$edge.width \u003c- E(graph)$weight*20\nV(graph)$label \u003c- paste(\"Topic\", V(graph))\nV(graph)$size \u003c- colSums(post[[\"topics\"]]) * 
15\n\npar(mar=c(0, 0, 3, 0))\nset.seed(110)\nplot.igraph(graph, edge.width = E(graph)$edge.width, \n    edge.color = \"orange\", vertex.color = \"orange\", \n    vertex.frame.color = NA, vertex.label.color = \"grey30\")\ntitle(\"Strength Between Topics Based On Word Probabilities\", cex.main=.8)\n```\n\n\n### Network of the Topics Over Documents (Topic Relation)\n\n```{r, fig.width=8, fig.height=8}\nminval \u003c- .1\ntopic_mat \u003c- topicmodels::posterior(lda_model)[[\"topics\"]]\n\ngraph \u003c- graph_from_incidence_matrix(topic_mat, weighted=TRUE)\ngraph \u003c- delete.edges(graph, E(graph)[ weight \u003c minval])\n\nE(graph)$edge.width \u003c- E(graph)$weight*17\nE(graph)$color \u003c- \"blue\"\nV(graph)$color \u003c- ifelse(grepl(\"^\\\\d+$\", V(graph)$name), \"grey75\", \"orange\")\nV(graph)$frame.color \u003c- NA\nV(graph)$label \u003c- ifelse(grepl(\"^\\\\d+$\", V(graph)$name), paste(\"topic\", V(graph)$name), gsub(\"_\", \"\\n\", V(graph)$name))\nV(graph)$size \u003c- c(rep(10, nrow(topic_mat)), colSums(topic_mat) * 20)\nV(graph)$label.color \u003c- ifelse(grepl(\"^\\\\d+$\", V(graph)$name), \"red\", \"grey30\")\n\npar(mar=c(0, 0, 3, 0))\nset.seed(369)\nplot.igraph(graph, edge.width = E(graph)$edge.width, \n    vertex.color = adjustcolor(V(graph)$color, alpha.f = .4))\ntitle(\"Topic \u0026 Document Relationships\", cex.main=.8)\n```\n\n\n### LDAvis of Model\n\nThe output from **LDAvis** is not easily embedded within an R Markdown document; however, the reader may [see the results here](http://trinker.github.io/LDAvis/example/).\n\n```{r, eval=FALSE}\nlda_model %\u003e%\n    topicmodels2LDAvis() %\u003e%\n    LDAvis::serVis()\n```\n\n```{r, echo=FALSE, message=FALSE, results=\"hide\"}\ntarg \u003c- \"C:/Users/Tyler/GitHub/trinker.github.com/LDAvis/example/lda.json\"\nunlink(targ,,TRUE)\ntemp \u003c- tempfile()\n\nlda_model %\u003e%\n    topicmodels2LDAvis() %\u003e%\n    LDAvis::serVis(temp, open.browser = FALSE) %\u003e% \n    
invisible()\n\nfile.copy(file.path(temp, \"lda.json\"), pathr::parse_path(targ) %\u003e% pathr::front())\npathr::open_path(\"C:/Users/Tyler/GitHub/trinker.github.com/trinker.github.com.Rproj\")\n```\n\n### Apply Model to New Data\n\n```{r, eval=FALSE}\n## Create the DocumentTermMatrix for New Data\ndoc_term_mat2 \u003c- partial_republican_debates_2015 %\u003e%\n    with(gofastr::q_dtm_stem(dialogue, paste(person, location, sep = \"_\"))) %\u003e%           \n    gofastr::remove_stopwords(stops, stem=TRUE) %\u003e%                                                    \n    gofastr::filter_tf_idf() %\u003e%\n    gofastr::filter_documents() \n\n\n## Update Control List\ncontrol2 \u003c- control\ncontrol2[[\"estimate.beta\"]] \u003c- FALSE\n\n\n## Run the Model for New Data\nlda_model2 \u003c- topicmodels::LDA(doc_term_mat2, k = k, model = lda_model, \n    control = list(seed = 100, estimate.beta = FALSE))\n\n\n## Plot the Topics Per Person \u0026 Location for New Data\ntopics2 \u003c- topicmodels::posterior(lda_model2, doc_term_mat2)[[\"topics\"]]\ntopic_dat2 \u003c- dplyr::add_rownames(as.data.frame(topics2), \"Person_Location\")\ncolnames(topic_dat2)[-1] \u003c- apply(terms(lda_model2, 10), 2, paste, collapse = \", \")\n\ntidyr::gather(topic_dat2, Topic, Proportion, -c(Person_Location)) %\u003e%\n    tidyr::separate(Person_Location, c(\"Person\", \"Location\"), sep = \"_\") %\u003e%\n    ggplot2::ggplot(ggplot2::aes(weight=Proportion, x=Topic, fill=Topic)) +\n        ggplot2::geom_bar() +\n        ggplot2::coord_flip() +\n        ggplot2::facet_grid(Person~Location) +\n        ggplot2::guides(fill=FALSE) +\n        ggplot2::xlab(\"Proportion\")\n\n\n## LDAvis of Model for New Data\nlda_model2 %\u003e%\n    topicmodels2LDAvis() %\u003e%\n    LDAvis::serVis()\n```\n\n# Contributing\n\nYou are welcome to:\n* submit suggestions and bug-reports at: \u003chttps://github.com/trinker/topicmodels_learning/issues\u003e\n* send a pull request on: 
\u003chttps://github.com/trinker/topicmodels_learning/\u003e\n* compose a friendly e-mail to: \u003ctyler.rinker@gmail.com\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinker%2Ftopicmodels_learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrinker%2Ftopicmodels_learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinker%2Ftopicmodels_learning/lists"}