{"id":13696663,"url":"https://github.com/ywwbill/YWWTools-v2","last_synced_at":"2025-05-03T17:32:03.396Z","repository":{"id":73884980,"uuid":"232361192","full_name":"ywwbill/YWWTools-v2","owner":"ywwbill","description":null,"archived":false,"fork":false,"pushed_at":"2020-09-05T08:08:11.000Z","size":18993,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-08-03T18:21:11.072Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ywwbill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-01-07T15:59:41.000Z","updated_at":"2021-12-10T02:32:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"979f4fad-03b3-4eb2-a242-8238d65cb6bf","html_url":"https://github.com/ywwbill/YWWTools-v2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywwbill%2FYWWTools-v2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywwbill%2FYWWTools-v2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywwbill%2FYWWTools-v2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ywwbill%2FYWWTools-v2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ywwbill","download_url":"https://codeload.github.com/ywwbill/YWWTools-v2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"
github","repositories_count":224369690,"owners_count":17299941,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T18:00:44.463Z","updated_at":"2025-05-03T17:32:03.385Z","avatar_url":"https://github.com/ywwbill.png","language":"Java","funding_links":[],"categories":["Models"],"sub_categories":["Miscellaneous topic models"],"readme":"# \u003ch1 id=\"top\"\u003eYWW Tools\u003c/h1\u003e\r\n\r\nA package of my ([Weiwei Yang](http://cs.umd.edu/~wwyang/)'s) various tools (mostly for NLP). Feel free to email me at \u003cwwyang@cs.umd.edu\u003e with any questions.\r\n\r\n* [Check Out](#check_out)\r\n* [Dependencies](#dependencies)\r\n* [Use YWW Tools in Command Line](#command)\r\n* [LDA (Latent Dirichlet Allocation) in Command Line](#lda_cmd)\r\n\t* [RTM: Relational Topic Model](#rtm_cmd)\r\n\t\t* [Lex-WSB-RTM: RTM with Lexical Weights and Weighted Stochastic Block Priors](#lex_wsb_rtm_cmd)\r\n\t\t* [Lex-WSB-Med-RTM: Lex-WSB-RTM with Hinge Loss](#lex_wsb_med_rtm_cmd)\r\n\t* [SLDA: Supervised LDA](#slda_cmd)\r\n\t\t* [BS-LDA: Binary SLDA](#bs_lda_cmd)\r\n\t\t* [Lex-WSB-BS-LDA: BS-LDA with Lexical Weights and Weighted Stochastic Block Priors](#lex_wsb_bs_lda_cmd)\r\n\t\t* [Lex-WSB-Med-LDA: Lex-WSB-BS-LDA with Hinge Loss](#lex_wsb_med_lda_cmd)\r\n\t* [BP-LDA: LDA with Block Priors](#bp_lda_cmd)\r\n\t* [ST-LDA: Single Topic LDA](#st_lda_cmd)\r\n\t* [WSB-TM: Weighted Stochastic Block Topic Model](#wsb_tm_cmd)\r\n* [tLDA in Command Line](#tlda_cmd)\r\n* [MTM in Command Line](#mtm_cmd)\r\n* [Other Tools in Command Line](#other_cmd)\r\n\t* [WSBM: Weighted Stochastic 
Block Model](#wsbm_cmd)\r\n\t* [SCC: Strongly Connected Components](#scc_cmd)\r\n\t* [Stoplist](#stoplist_cmd)\r\n\t* [Lemmatizer](#lemmatizer_cmd)\r\n\t* [POS Tagger](#pos_tagger_cmd)\r\n\t* [Stemmer](#stemmer_cmd)\r\n\t* [Tokenizer](#tokenizer_cmd)\r\n\t* [Corpus Converter](#corpus_converter_cmd)\r\n\t* [Tree Builder](#tree_builder_cmd)\r\n* [Use YWW Tools Source Code](#code_examples)\r\n* [LDA Code Examples](#lda_code)\r\n\t* [RTM](#rtm_code)\r\n\t\t* [Lex-WSB-RTM](#lex_wsb_rtm_code)\r\n\t\t* [Lex-WSB-Med-RTM](#lex_wsb_med_rtm_code)\r\n\t* [SLDA](#slda_code)\r\n\t\t* [BS-LDA](#bs_lda_code)\r\n\t\t* [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_code)\r\n\t\t* [Lex-WSB-Med-LDA](#lex_wsb_med_lda_code)\r\n\t* [BP-LDA](#bp_lda_code)\r\n\t* [ST-LDA](#st_lda_code)\r\n\t* [WSB-TM](#wsb_tm_code)\r\n* [tLDA Code Examples](#tlda_code)\r\n* [MTM Code Examples](#mtm_code)\r\n* [Other Code Examples](#other_code)\r\n\t* [WSBM](#wsbm_code)\r\n\t* [SCC](#scc_code)\r\n\t* [Tree Builder](#tree_builder_code)\r\n\t* [English Corpus Preprocessing](#preprocess)\r\n* [Citation](#citation)\r\n* [References](#ref)\r\n\r\n## \u003ch2 id=\"check_out\"\u003eCheck Out\u003c/h2\u003e\r\n\r\n```\r\ngit clone git@github.com:ywwbill/YWWTools-v2.git\r\n```\r\n\r\n## \u003ch2 id=\"dependencies\"\u003eDependencies\u003c/h2\u003e\r\n\r\n- Java 8.\r\n- Files in `lib/`.\r\n- Files in `dict/`.\r\n\r\n## \u003ch2 id=\"command\"\u003eUse YWW Tools in Command Line\u003c/h2\u003e\r\n\r\n```\r\njava -cp YWWTools-v2.jar:lib/* yang.weiwei.Tools \u003cconfig-file\u003e\r\n```\r\n\r\n- **\u003cfont size=4\u003eWindows users\u003c/font\u003e**\r\n\t- Please replace `YWWTools-v2.jar:lib/*` with `YWWTools-v2.jar;lib/*`.\r\n\t- If you encounter any encoding problems in command line (especially when processing Chinese), please add `-Dfile.encoding=utf8` in your command.\r\n- In `\u003cconfig-file\u003e`, specify the tool you want to use:\r\n\t```\r\n\ttool=\u003ctool-name\u003e\r\n\t```\r\n- Supported `\u003ctool-name\u003e` 
(case insensitive) include\r\n\t- [LDA](#lda_cmd): Latent Dirichlet allocation. Include a variety of extensions.\r\n\t- [TLDA](#tlda_cmd): Tree LDA.\r\n\t- [MTM](#mtm_cmd): Multilingual Topic Model.\r\n\t- [WSBM](#wsbm_cmd): Weighted stochastic block model. Find blocks in a network.\r\n\t- [SCC](#scc_cmd): Strongly connected components.\r\n\t- [Stoplist](#stoplist_cmd): Remove stop words. Support English only, but can support other languages given a dictionary.\r\n\t- [Lemmatizer](#lemmatizer_cmd): Lemmatize POS-tagged corpus. Support English only, but can support other languages given a dictionary.\r\n\t- [POS-Tagger](#pos_tagger_cmd): Tag words' POS. Support English only, but can support other languages given trained models.\r\n\t- [Stemmer](#stemmer_cmd): Stem words. Support English only.\r\n\t- [Tokenizer](#tokenizer_cmd): Tokenize corpus. Support English only, but can support other languages given trained models.\r\n\t- [Corpus-Converter](#corpus_converter_cmd): Convert word corpus into indexed corpus (for [LDA](#lda_cmd)) and vice versa.\r\n\t- [Tree Builder](#tree_builder_cmd): Build tree priors from word associations.\r\n- You can always set `help` to true to see help information of \r\n\t- supported tool names if you don't specify a tool name:\r\n\t\t```\r\n\t\thelp=true\r\n\t\t```\r\n\t- a specific tool if you specify it (take [LDA](#lda_cmd) as an example): \r\n\t\t```\r\n\t\thelp=true\r\n\t\ttool=lda\r\n\t\t```\r\n\r\n## \u003ch2 id=\"lda_cmd\"\u003eLDA (Latent Dirichlet Allocation) in Command Line\u003c/h2\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=lda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\n```\r\n\r\n- Implementation of [Blei et al. (2003)](#lda_ref).\r\n- Required arguments\r\n\t- `\u003cvocab-file\u003e`: Vocabulary file. Each line contains a unique word.\r\n\t- `\u003ccorpus-file\u003e`: Corpus file in which documents are represented by word indexes and frequencies. 
Each line contains a document in the following format\r\n\t\r\n\t\t```\r\n\t\t\u003cdoc-len\u003e \u003cword-type-1\u003e:\u003cfrequency-1\u003e \u003cword-type-2\u003e:\u003cfrequency-2\u003e ... \u003cword-type-n\u003e:\u003cfrequency-n\u003e\r\n\t\t```\r\n\t\r\n\t\t`\u003cdoc-len\u003e` is the total number of *tokens* in this document. `\u003cword-type-i\u003e` denotes the i-th word in `\u003cvocab-file\u003e`, starting from 0. Words with zero frequency can be omitted.\r\n\t- `\u003cmodel-file\u003e`: Trained model file in JSON format. Read and written by program.\r\n- Optional arguments\r\n\t- `model=\u003cmodel-name\u003e`: The topic model you want to use (default: [LDA](#lda_cmd)). Supported `\u003cmodel-name\u003e` (case insensitive) are\r\n\t\t- [LDA](#lda_cmd): Vanilla LDA.\r\n\t\t- [RTM](#rtm_cmd): Relational topic model.\r\n\t\t\t- [Lex-WSB-RTM](#lex_wsb_rtm_cmd): [RTM](#rtm_cmd) with WSB-computed block priors and lexical weights.\r\n\t\t\t- [Lex-WSB-Med-RTM](#lex_wsb_med_rtm_cmd): [Lex-WSB-RTM](#lex_wsb_rtm_cmd) with hinge loss.\r\n\t\t- [SLDA](#slda_cmd): Supervised [LDA](#lda_cmd). Support multi-class classification.\r\n\t\t\t- [BS-LDA](#bs_lda_cmd): Binary [SLDA](#slda_cmd).\r\n\t\t\t- [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_cmd): [BS-LDA](#bs_lda_cmd) with WSB-computed block priors and lexical weights.\r\n\t\t\t- [Lex-WSB-Med-LDA](#lex_wsb_med_lda_cmd): [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_cmd) with hinge loss.\r\n\t\t- [BP-LDA](#bp_lda_cmd): [LDA](#lda_cmd) with block priors. Blocks are pre-computed.\r\n\t\t- [ST-LDA](#st_lda_cmd): Single topic [LDA](#lda_cmd). Each document can only be assigned to one topic.\r\n\t\t- [WSB-TM](#wsb_tm_cmd): [LDA](#lda_cmd) with block priors. Blocks are computed by [WSBM](#wsbm_cmd).\r\n\t- `test=true`: Use the model for test (default: false).\r\n\t- `verbose=true`: Print log to console (default: true).\r\n\t- `alpha=\u003calpha-value\u003e`: Parameter of Dirichlet prior of document distribution over topics (default: 1.0). 
Must be a positive real number.\r\n\t- `beta=\u003cbeta-value\u003e`: Parameter of Dirichlet prior of topic distribution over words (default: 0.1). Must be a positive real number.\r\n\t- `topics=\u003cnum-topics\u003e`: Number of topics (default: 10). Must be a positive integer.\r\n\t- `iters=\u003cnum-iters\u003e`: Number of iterations (default: 100). Must be a positive integer.\r\n\t- `update=false`: Update alpha while sampling (default: false).\r\n\t- `update_interval=\u003cupdate-interval\u003e`: Interval of updating alpha (default: 10). Must be a positive integer.\r\n\t- `theta=\u003ctheta-file\u003e`: File for document distribution over topics. Each line contains a document's topic distribution. Topic weights are separated by space.\r\n\t- `output_topic=\u003ctopic-file\u003e`: File for showing topics.\r\n\t- `topic_count=\u003ctopic-count-file\u003e`: File for document-topic counts.\r\n\t- `top_word=\u003cnum-top-word\u003e`: Number of words to give when showing topics (default: 10). Must be a positive integer.\r\n\r\n### \u003ch3 id=\"rtm_cmd\"\u003eRTM: Relational Topic Model\u003c/h3\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=rtm\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nrtm_train_graph=\u003crtm-train-graph-file\u003e\r\n```\r\n\r\n- Implementation of [Chang and Blei (2010)](#rtm_ref).\r\n- Jointly models topics and document links.\r\n- Extends [LDA](#lda_cmd).\r\n- Semi-optional arguments\r\n\t- `rtm_train_graph=\u003crtm-train-graph-file\u003e` [optional in test]: Link file for RTM to train. Each line contains an edge in the format `node-1 \\t node-2 \\t weight`. Node number starts from 0. `weight` is optional and must be either 0 or 1. Its default value is 1 if not specified.\r\n\t- `rtm_test_graph=\u003crtm-test-graph-file\u003e` [optional in training]: Link file for RTM to evaluate. Can be the same as the RTM train graph. 
Format is the same as `\u003crtm-train-graph-file\u003e`.\r\n- Optional arguments\r\n\t- `nu=\u003cnu-value\u003e`: Variance of normal priors for weight vectors/matrices in RTM and its extensions (default: 1.0). Must be a positive real number.\r\n\t- `plr_interval=\u003ccompute-PLR-interval\u003e`: Interval of computing predictive link rank (default: 20). Must be a positive integer.\r\n\t- `neg=true`: Sample negative links (default: false).\r\n\t- `neg_ratio=\u003cneg-ratio\u003e`: The ratio of number of negative links to number of positive links (default: 1.0). Must be a positive real number.\r\n\t- `pred=\u003cpred-file\u003e`: Predicted document link probability matrix file.\r\n\t- `reg=\u003creg-file\u003e`: Doc-doc regression value file.\r\n\t- `directed=true`: Set all edges directed (default: false).\r\n\r\n#### \u003ch4 id=\"lex_wsb_rtm_cmd\"\u003eLex-WSB-RTM: [RTM](#rtm_ref) with Lexical Weights and Weighted Stochastic Block Priors\u003c/h4\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=lex-wsb-rtm\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nrtm_train_graph=\u003crtm-train-graph-file\u003e\r\n```\r\n\r\n- Extends [RTM](#rtm_cmd).\r\n- Optional arguments\r\n\t- `wsbm_graph=\u003cwsbm-graph-file\u003e`: Link file for [WSBM](#wsbm_cmd) to find blocks. See [WSBM](#wsbm_cmd) for details.\r\n\t- `alpha_prime=\u003calpha-prime-value\u003e`: Parameter of Dirichlet prior of block distribution over topics (default: 1.0). Must be a positive real number.\r\n\t- `a=\u003ca-value\u003e`: Parameter of Gamma prior for block link rates (default: 1.0). Must be a positive real number.\r\n\t- `b=\u003cb-value\u003e`: Parameter of Gamma prior for block link rates (default: 1.0). Must be a positive real number.\r\n\t- `gamma=\u003cgamma-value\u003e`: Parameter of Dirichlet prior for block distribution (default: 1.0). Must be a positive real number.\r\n\t- `blocks=\u003cnum-blocks\u003e`: Number of blocks (default: 10). 
Must be a positive integer.\r\n\t- `output_wsbm=\u003cwsbm-output-file\u003e`: File for [WSBM](#wsbm_cmd)-identified blocks. See [WSBM](#wsbm_cmd) for details.\r\n\t- `block_feature=true`: Include block features in link prediction (default: false).\r\n\r\n#### \u003ch4 id=\"lex_wsb_med_rtm_cmd\"\u003eLex-WSB-Med-RTM: [Lex-WSB-RTM](#lex_wsb_rtm_cmd) with Hinge Loss\u003c/h4\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=lex-wsb-med-rtm\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nrtm_train_graph=\u003crtm-train-graph-file\u003e\r\n```\r\n\r\n- Implementation of [Yang et al. (2016)](#lex_wsb_med_rtm_ref)\r\n- See [Zhu et al. (2012) and Zhu et al. (2014)](#med_lda_ref) for hinge loss.\r\n- Extends [Lex-WSB-RTM](#lex_wsb_rtm_cmd).\r\n- Link weight is either 1 or -1.\r\n- Optional arguments\r\n\t- `c=\u003cc-value\u003e`: Regularization parameter in hinge loss (default: 1.0). Must be a positive real number.\r\n\r\n### \u003ch3 id=\"slda_cmd\"\u003eSLDA: Supervised [LDA](#lda_cmd)\u003c/h3\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=slda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nlabel=\u003clabel-file\u003e\r\n```\r\n\r\n- Implementation of [McAuliffe and Blei (2008)](#slda_ref).\r\n- Jointly models topics and document labels. Support multi-class classification.\r\n- Extends [LDA](#lda_cmd).\r\n- Semi-optional arguments\r\n\t- `label=\u003clabel-file\u003e` [optional in test]: Label file. Each line contains corresponding document's numeric label. If a document label is not available, leave the corresponding line empty.\r\n- Optional arguments\r\n\t- `sigma=\u003csigma-value\u003e`: Variance for the Gaussian generation of response variable in SLDA (default: 1.0). Must be a positive real number.\r\n\t- `nu=\u003cnu-value\u003e`: Variance of normal priors for weight vectors in SLDA and its extensions (default: 1.0). 
Must be a positive real number.\r\n\t- `pred=\u003cpred-file\u003e`: Predicted label file.\r\n\t- `reg=\u003creg-file\u003e`: Regression value file.\r\n\r\n#### \u003ch4 id=\"bs_lda_cmd\"\u003eBS-LDA: Binary [SLDA](#slda_ref)\u003c/h4\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=bs-lda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nlabel=\u003clabel-file\u003e\r\n```\r\n\r\n- For binary classification only.\r\n- Extends [SLDA](#slda_cmd).\r\n- Label is either 1 or 0.\r\n\r\n#### \u003ch4 id=\"lex_wsb_bs_lda_cmd\"\u003eLex-WSB-BS-LDA: [BS-LDA](#bs_lda_cmd) with Lexical Weights and Weighted Stochastic Block Priors\u003c/h4\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=lex-wsb-bs-lda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nlabel=\u003clabel-file\u003e\r\n```\r\n\r\n- Extends [BS-LDA](#bs_lda_cmd).\r\n- Optional arguments\r\n\t- `wsbm_graph=\u003cwsbm-graph-file\u003e`: Link file for [WSBM](#wsbm_cmd) to find blocks. See [WSBM](#wsbm_cmd) for details.\r\n\t- `alpha_prime=\u003calpha-prime-value\u003e`: Parameter of Dirichlet prior of block distribution over topics (default: 1.0). Must be a positive real number.\r\n\t- `a=\u003ca-value\u003e`: Parameter of Gamma prior for block link rates (default: 1.0). Must be a positive real number.\r\n\t- `b=\u003cb-value\u003e`: Parameter of Gamma prior for block link rates (default: 1.0). Must be a positive real number.\r\n\t- `gamma=\u003cgamma-value\u003e`: Parameter of Dirichlet prior for block distribution (default: 1.0). Must be a positive real number.\r\n\t- `blocks=\u003cnum-blocks\u003e`: Number of blocks (default: 10). Must be a positive integer.\r\n\t- `directed=true`: Set all edges directed (default: false).\r\n\t- `output_wsbm=\u003cwsbm-output-file\u003e`: File for [WSBM](#wsbm_cmd)-identified blocks. 
See [WSBM](#wsbm_cmd) for details.\r\n\r\n#### \u003ch4 id=\"lex_wsb_med_lda_cmd\"\u003eLex-WSB-Med-LDA: [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_cmd) with Hinge Loss\u003c/h4\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=lex-wsb-med-lda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nlabel=\u003clabel-file\u003e\r\n```\r\n\r\n- See [Zhu et al. (2012) and Zhu et al. (2014)](#med_lda_ref) for hinge loss.\r\n- Extends [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_cmd).\r\n- Label is either 1 or -1.\r\n- Optional arguments\r\n\t- `c=\u003cc-value\u003e`: Regularization parameter in hinge loss (default: 1.0). Must be a positive real number.\r\n\r\n### \u003ch3 id=\"bp_lda_cmd\"\u003eBP-LDA: [LDA](#lda_cmd) with Block Priors\u003c/h3\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=bp-lda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nblock_graph=\u003cblock-graph-file\u003e\r\n```\r\n\r\n- Use priors from pre-computed blocks.\r\n- Extends [LDA](#lda_cmd).\r\n- Semi-optional arguments\r\n\t- `block_graph=\u003cblock-graph-file\u003e` [optional in test]: Pre-computed block file. Each line contains a block and consists of one or more documents denoted by document numbers. Document numbers are separated by space.\r\n- Optional arguments\r\n\t- `alpha_prime=\u003calpha-prime-value\u003e`: Parameter of Dirichlet prior of block distribution over topics (default: 1.0). Must be a positive real number.\r\n\r\n### \u003ch3 id=\"st_lda_cmd\"\u003eST-LDA: Single Topic [LDA](#lda_cmd)\u003c/h3\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=st-lda\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nshort_corpus=\u003cshort-corpus-file\u003e\r\n```\r\n\r\n- Implementation of [Hong et al. 
(2016)](#st_lda_ref).\r\n- Each document can only be assigned to one topic.\r\n- Extends [LDA](#lda_cmd).\r\n- Semi-optional arguments\r\n\t- `short_corpus=\u003cshort-corpus-file\u003e` [at least one of `short_corpus` and `corpus` should be specified]: Short corpus file.\r\n- Optional arguments\r\n\t- `short_theta=\u003cshort-theta-file\u003e`: Short documents' background topic distribution file.\r\n\t- `short_topic_assign=\u003cshort-topic-assign-file\u003e`: Short documents' topic assignment file.\r\n\r\n### \u003ch3 id=\"wsb_tm_cmd\"\u003eWSB-TM: Weighted Stochastic Block Topic Model\u003c/h3\u003e\r\n\r\n```\r\ntool=lda\r\nmodel=wsb-tm\r\nvocab=\u003cvocab-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\nwsbm_graph=\u003cwsbm-graph-file\u003e\r\n```\r\n\r\n- Use priors from [WSBM](#wsbm_cmd)-computed blocks.\r\n- Extends [LDA](#lda_cmd).\r\n- Semi-optional arguments\r\n\t- `wsbm_graph=\u003cwsbm-graph-file\u003e` [optional in test]: Link file for [WSBM](#wsbm_cmd) to find blocks. See [WSBM](#wsbm_cmd) for details.\r\n- Optional arguments\r\n\t- `alpha_prime=\u003calpha-prime-value\u003e`: Parameter of Dirichlet prior of block distribution over topics (default: 1.0). Must be a positive real number.\r\n\t- `a=\u003ca-value\u003e`: Parameter of Gamma prior for block link rates (default: 1.0). Must be a positive real number.\r\n\t- `b=\u003cb-value\u003e`: Parameter of Gamma prior for block link rates (default: 1.0). Must be a positive real number.\r\n\t- `gamma=\u003cgamma-value\u003e`: Parameter of Dirichlet prior for block distribution (default: 1.0). Must be a positive real number.\r\n\t- `blocks=\u003cnum-blocks\u003e`: Number of blocks (default: 10). Must be a positive integer.\r\n\t- `directed=true`: Set all edges directed (default: false).\r\n\t- `output_wsbm=\u003cwsbm-output-file\u003e`: File for [WSBM](#wsbm_cmd)-identified blocks. 
See [WSBM](#wsbm_cmd) for details.\r\n\r\n## \u003ch2 id=\"tlda_cmd\"\u003etLDA in Command Line\u003c/h2\u003e\r\n\r\n```\r\ntool=tlda\r\nvocab=\u003cvocab-file\u003e\r\ntree=\u003ctree-prior-file\u003e\r\ncorpus=\u003ccorpus-file\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\n```\r\n\r\n- Implementation of tree LDA [(Boyd-Graber et al., 2007)](#tlda_ref).\r\n- Required arguments\r\n\t- `\u003cvocab-file\u003e`: Vocabulary file. Each line contains a unique word.\r\n\t- `\u003ctree-prior-file\u003e`: Tree prior file. Generated by [Tree Builder](#tree_builder_cmd)\r\n\t- `\u003ccorpus-file\u003e`: Corpus file in which documents are represented by word indexes and frequencies. Each line contains a document in the following format\r\n\t\r\n\t\t```\r\n\t\t\u003cdoc-len\u003e \u003cword-type-1\u003e:\u003cfrequency-1\u003e \u003cword-type-2\u003e:\u003cfrequency-2\u003e ... \u003cword-type-n\u003e:\u003cfrequency-n\u003e\r\n\t\t```\r\n\t\r\n\t\t`\u003cdoc-len\u003e` is the total number of *tokens* in this document. `\u003cword-type-i\u003e` denotes the i-th word in `\u003cvocab-file\u003e`, starting from 0. Words with zero frequency can be omitted.\r\n\t- `\u003cmodel-file\u003e`: Trained model file. Read and written by program.\r\n- Optional arguments\r\n\t- `test=true`: Use the model for test (default: false).\r\n\t- `verbose=true`: Print log to console (default: true).\r\n\t- `alpha=\u003calpha-value\u003e`: Parameter of Dirichlet prior of document distribution over topics (default: 0.01). Must be a positive real number.\r\n\t- `beta=\u003cbeta-value\u003e`: Parameter of Dirichlet prior of topic distribution over words (default: 0.01). Must be a positive real number.\r\n\t- `topics=\u003cnum-topics\u003e`: Number of topics (default: 10). Must be a positive integer.\r\n\t- `iters=\u003cnum-iters\u003e`: Number of iterations (default: 100). 
Must be a positive integer.\r\n\t- `update=false`: Update alpha while sampling (default: false).\r\n\t- `update_interval=\u003cupdate-interval\u003e`: Interval of updating alpha (default: 10). Must be a positive integer.\r\n\t- `theta=\u003ctheta-file\u003e`: File for document distribution over topics. Each line contains a document's topic distribution. Topic weights are separated by space.\r\n\t- `output_topic=\u003ctopic-file\u003e`: File for showing topics.\r\n\t- `topic_count=\u003ctopic-count-file\u003e`: File for document-topic counts.\r\n\t- `top_word=\u003cnum-top-word\u003e`: Number of words to give when showing topics (default: 10). Must be a positive integer.\r\n\r\n## \u003ch2 id=\"mtm_cmd\"\u003eMTM in Command Line\u003c/h2\u003e\r\n\r\n```\r\ntool=mtm\r\nnum_langs=\u003cnum-languages\u003e\r\ndict=\u003cdict-file\u003e\r\nvocab=\u003cvocab-files\u003e\r\ncorpus=\u003ccorpus-files\u003e\r\ntrained_model=\u003cmodel-file\u003e\r\n```\r\n\r\n- Implementation of Multilingual Topic Model [(Yang et al., 2019)](#mtm_ref).\r\n- Required arguments\r\n\t- `\u003cnum-languages\u003e`: Number of languages. Must be a positive integer greater than 1.\r\n\t- `\u003cdict-file\u003e`: Dictionary file. Each line contains a word translation pair, represented by four elements separated by tab (\\t): language ID of the first word, first word, language ID of the second word, second word.\r\n\t- `\u003cvocab-files\u003e`: Vocabulary files. One file for each language. File names are separated by comma (,). Each line contains a unique word.\r\n\t- `\u003ccorpus-files\u003e`: Corpus files in which documents are represented by word indexes and frequencies. File names are separated by comma (,). One file for each language. Each line contains a document in the following format\r\n\t\r\n\t\t```\r\n\t\t\u003cdoc-len\u003e \u003cword-type-1\u003e:\u003cfrequency-1\u003e \u003cword-type-2\u003e:\u003cfrequency-2\u003e ... 
\u003cword-type-n\u003e:\u003cfrequency-n\u003e\r\n\t\t```\r\n\t\r\n\t\t`\u003cdoc-len\u003e` is the total number of *tokens* in this document. `\u003cword-type-i\u003e` denotes the i-th word in `\u003cvocab-file\u003e`, starting from 0. Words with zero frequency can be omitted.\r\n\t- `\u003cmodel-file\u003e`: Trained model file. Read and written by program.\r\n- Optional arguments\r\n\t- `test=true`: Use the model for test (default: false).\r\n\t- `verbose=true`: Print log to console (default: true).\r\n\t- `alpha=\u003calpha-values\u003e`: Parameter of Dirichlet prior of document distribution over topics (default: 0.01). One value for each language. Values separated by comma (,). Must be a positive real number.\r\n\t- `beta=\u003cbeta-values\u003e`: Parameter of Dirichlet prior of topic distribution over words (default: 0.01). One value for each language. Values separated by comma (,). Must be a positive real number.\r\n\t- `topics=\u003cnum-topics\u003e`: Number of topics (default: 10). One value for each language. Values separated by comma (,). Must be a positive integer.\r\n\t- `iters=\u003cnum-iters\u003e`: Number of iterations (default: 100). Must be a positive integer.\r\n\t- `update=false`: Update alpha while sampling (default: false).\r\n\t- `update_interval=\u003cupdate-interval\u003e`: Interval of updating alpha (default: 10). Must be a positive integer.\r\n\t- `theta=\u003ctheta-files\u003e`: Files for document distribution over topics. One file for each language. File names are separated by comma (,). Each line contains a document's topic distribution. Topic weights are separated by space.\r\n\t- `rho=\u003crho-file\u003e`: File for topic transformation matrices. Assuming there are $N$ languages, the file contains $N(N-1)$ matrices. Each matrix starts by a line of string `Rho[i][j]` where `i` and `j` indicate two languages. 
The following $K_i$ rows contain the topic transformation matrix from language `i` to language `j`, and each row has $K_j$ values separated by spaces, where $K_i$ and $K_j$ are the numbers of topics in languages `i` and `j` respectively.\r\n\t- `output_topic=\u003ctopic-file\u003e`: File for showing topics.\r\n\t- `topic_count=\u003ctopic-count-file\u003e`: Files for document-topic counts. One file for each language. File names are separated by comma (,).\r\n\t- `top_word=\u003cnum-top-word\u003e`: Number of words to give when showing topics (default: 10). Must be a positive integer.\r\n\t- `reg=\u003cregularization-option\u003e`: Regularization option (default: 0). 0 for no regularization, 1 for L1 norm, 2 for L2 norm, 3 for entropy, 4 for identity matrix.\r\n\t- `lambda=\u003clambda-value\u003e`: The regularization coefficient (default: 0.0). Only effective when `reg` is not 0.\r\n\t- `tfidf=true`: Use TF-IDF weights as word translation pairs' weights (default: false).\r\n\t- `word_tf_threshold=\u003cword-term-frequency-threshold\u003e`: Ignore a word translation pair if either word's term frequency is equal to or lower than the given threshold (default: 0). One value for each language. Values are separated by comma (,). Must be non-negative integers.\r\n\t\r\n## \u003ch2 id=\"other_cmd\"\u003eOther Tools in Command Line\u003c/h2\u003e\r\n\r\n### \u003ch3 id=\"wsbm_cmd\"\u003eWSBM: Weighted Stochastic Block Model\u003c/h3\u003e\r\n\r\n```\r\ntool=wsbm\r\nnodes=\u003cnum-nodes\u003e\r\nblocks=\u003cnum-blocks\u003e\r\ngraph=\u003cgraph-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- Implementation of [Aicher et al. (2014)](#wsbm_ref).\r\n- Find latent blocks in a network, such that nodes in the same block are densely connected and nodes in different blocks are sparsely connected.\r\n- Required arguments\r\n\t- `\u003cnum-nodes\u003e`: Number of nodes in the graph. Must be a positive integer.\r\n\t- `\u003cnum-blocks\u003e`: Number of blocks. 
Must be a positive integer.\r\n\t- `\u003cgraph-file\u003e`: Graph file. Each line contains an edge in the format `node-1 \\t node-2 \\t weight`. Node number starts from 0. `weight` must be a non-negative integer. `weight` is optional. Its default value is 1 if not specified.\r\n\t- `\u003coutput-file\u003e`: Result file. The i-th line contains the block assignment of the i-th node.\r\n- Optional arguments\r\n\t- `directed=true`: Set the edges as directed (default: false).\r\n\t- `a=\u003ca-value\u003e`: Parameter for edge rates' Gamma prior (default: 1.0). Must be a positive real number.\r\n\t- `b=\u003cb-value\u003e`: Parameter for edge rates' Gamma prior (default: 1.0). Must be a positive real number.\r\n\t- `gamma=\u003cgamma-value\u003e`: Parameter for block distribution's Dirichlet prior (default: 1.0). Must be a positive real number.\r\n\t- `iters=\u003cnum-iters\u003e`: Number of iterations (default: 100). Must be a positive integer.\r\n\t- `verbose=true`: Print log to console (default: true).\r\n\r\n### \u003ch3 id=\"scc_cmd\"\u003eSCC: Strongly Connected Components\u003c/h3\u003e\r\n\r\n```\r\ntool=scc\r\nnodes=\u003cnum-nodes\u003e\r\ngraph=\u003cgraph-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- New implementation.\r\n- Find [strongly connected components](https://en.wikipedia.org/wiki/Strongly_connected_component) in an undirected graph. In each component, every node is reachable from every other node in the same component.\r\n- Arguments\r\n\t- `\u003cnum-nodes\u003e`: Number of nodes in the graph. Must be a positive integer.\r\n\t- `\u003cgraph-file\u003e`: Graph file. Each line contains an edge in the format `node-1 \\t node-2`. Node number starts from 0.\r\n\t- `\u003coutput-file\u003e`: Result file. Each line contains a strongly connected component and consists of one or more nodes denoted by node numbers. 
Node numbers are separated by space.\r\n\r\n### \u003ch3 id=\"stoplist_cmd\"\u003eStoplist\u003c/h3\u003e\r\n\r\n```\r\ntool=stoplist\r\ncorpus=\u003ccorpus-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- New implementation.\r\n- Only supports English, but can support other languages if a dictionary is provided.\r\n- Required arguments\r\n\t- `\u003ccorpus-file\u003e`: Corpus file with stop words. Each line contains a document. Words are separated by space.\r\n\t- `\u003coutput-file\u003e`: Corpus file without stop words. Each line contains a document. Words are separated by space.\r\n- Optional arguments\r\n\t- `dict=\u003cdict-file\u003e`: Dictionary file name. Each line contains a stop word.\r\n\r\n### \u003ch3 id=\"lemmatizer_cmd\"\u003eLemmatizer\u003c/h3\u003e\r\n\r\n```\r\ntool=lemmatizer\r\ncorpus=\u003ccorpus-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- A re-packaging of `opennlp.tools.lemmatizer.SimpleLemmatizer`.\r\n- Only supports English, but can support other languages if a dictionary is provided.\r\n- Required arguments\r\n\t- `\u003ccorpus-file\u003e`: Unlemmatized corpus file. Each line contains an unlemmatized, *tokenized*, and *POS-tagged* document.\r\n\t- `\u003coutput-file\u003e`: Lemmatized corpus file. Each line contains a lemmatized document. Words are separated by space.\r\n- Optional arguments\r\n\t- `dict=\u003cdict-file\u003e`: Dictionary file name. 
Each line contains a rule in the format `unlemmatized-word \\t POS \\t lemmatized-word`.\r\n\r\n### \u003ch3 id=\"pos_tagger_cmd\"\u003ePOS Tagger\u003c/h3\u003e\r\n\r\n```\r\ntool=pos-tagger\r\ncorpus=\u003ccorpus-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- A re-packaging of `opennlp.tools.postag.POSTaggerME` (\u003chttps://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.postagger\u003e).\r\n- Only supports English, but can support other languages if a model is provided.\r\n- Required arguments\r\n\t- `\u003ccorpus-file\u003e`: Untagged corpus file. Each line contains a *tokenized* untagged document.\r\n\t- `\u003coutput-file\u003e`: Tagged corpus file. Each line contains a tagged document. Each word is annotated as `word_POS`.\r\n- Optional arguments\r\n\t- `model=\u003cmodel-file\u003e`: [Model](https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.postagger.training) file name.\r\n\r\n### \u003ch3 id=\"stemmer_cmd\"\u003eStemmer\u003c/h3\u003e\r\n\r\n```\r\ntool=stemmer\r\ncorpus=\u003ccorpus-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- A re-packaging of `PorterStemmer` (\u003chttp://tartarus.org/~martin/PorterStemmer/index.html\u003e).\r\n- Only supports English.\r\n- Arguments\r\n\t- `\u003ccorpus-file\u003e`: Unstemmed corpus file. Each line contains an unstemmed document. Words are separated by spaces.\r\n\t- `\u003coutput-file\u003e`: Stemmed corpus file. Each line contains a stemmed document. 
Words are separated by spaces.\r\n\r\n### \u003ch3 id=\"tokenizer_cmd\"\u003eTokenizer\u003c/h3\u003e\r\n\r\n```\r\ntool=tokenizer\r\ncorpus=\u003ccorpus-file\u003e\r\noutput=\u003coutput-file\u003e\r\n```\r\n\r\n- A re-packaging of `opennlp.tools.tokenize.TokenizerME` (\u003chttps://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.tokenizer\u003e).\r\n- Only supports English, but can support other languages if a model is provided.\r\n- Required arguments\r\n\t- `\u003ccorpus-file\u003e`: Untokenized corpus file. Each line contains an untokenized document.\r\n\t- `\u003coutput-file\u003e`: Tokenized corpus file. Each line contains a tokenized document.\r\n- Optional arguments\r\n\t- `model=\u003cmodel-file\u003e`: [Model](\u003chttps://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.tokenizer.training\u003e) file name.\r\n\r\n### \u003ch3 id=\"corpus_converter_cmd\"\u003eCorpus Converter\u003c/h3\u003e\r\n\r\n```\r\ntool=corpus-converter\r\nget_vocab|to_index|to_word=true\r\nword_corpus=\u003cword-corpus-file\u003e\r\nindex_corpus=\u003cindex-corpus-file\u003e\r\nvocab=\u003cvocab-file\u003e\r\n```\r\n\r\n- New implementation.\r\n- Arguments\r\n\t- `get_vocab`, `to_index`, `to_word`: Only one of them should be true.\r\n\t\t- `get_vocab`: Collect the vocabulary from `\u003cword-corpus-file\u003e` and write it to `\u003cvocab-file\u003e`.\r\n\t\t- `to_index`: Convert a word corpus file `\u003cword-corpus-file\u003e` into an indexed corpus file `\u003cindex-corpus-file\u003e` and write the vocabulary to `\u003cvocab-file\u003e`.\r\n\t\t- `to_word`: Convert an indexed corpus file `\u003cindex-corpus-file\u003e` into a word corpus file `\u003cword-corpus-file\u003e` given the vocabulary file `\u003cvocab-file\u003e`.\r\n\t- `\u003cword-corpus-file\u003e`: Corpus file in which documents are represented by words. Each line contains a document. 
Words are separated by spaces.\r\n\t- `\u003cindex-corpus-file\u003e`: Corpus file in which documents are represented by word indexes and frequencies. Not required when using `get_vocab`. Each line contains a document in the following format:\r\n\t\r\n\t\t```\r\n\t\t\u003cdoc-len\u003e \u003cword-type-1\u003e:\u003cfrequency-1\u003e \u003cword-type-2\u003e:\u003cfrequency-2\u003e ... \u003cword-type-n\u003e:\u003cfrequency-n\u003e\r\n\t\t```\r\n\t\r\n\t\t`\u003cdoc-len\u003e` is the total number of *tokens* in this document. `\u003cword-type-i\u003e` denotes the i-th word in `\u003cvocab-file\u003e`, starting from 0. Words with zero frequency can be omitted.\r\n\r\n\t- `\u003cvocab-file\u003e`: Vocabulary file. Each line contains a unique word.\r\n\r\n### \u003ch3 id=\"tree_builder_cmd\"\u003eTree Builder\u003c/h3\u003e\r\n\r\n```\r\ntool=tree-builder\r\nvocab=\u003cvocab-file\u003e\r\nscore=\u003cscore-file\u003e\r\ntree=\u003ctree-file\u003e\r\n```\r\n\r\n- Implementation of [Yang et al. (2017)](#tree_builder_ref).\r\n- Arguments\r\n\t- `\u003cvocab-file\u003e`: Vocabulary file. Each line contains a unique word.\r\n\t- `\u003cscore-file\u003e`: Word association file. If `\u003cvocab-file\u003e` contains V words, `\u003cscore-file\u003e` has V lines. Each line corresponds to a word in the vocabulary and contains V floating-point numbers, which denote that word's association scores with the words in the vocabulary.\r\n\t- `\u003ctree-file\u003e`: The tree prior file.\r\n- Optional arguments\r\n\t- `type=\u003ctree-type\u003e`: Tree prior type. 
1 for a two-level tree; 2 for a hierarchical agglomerative clustering (HAC) tree; 3 for a HAC tree with leaf duplication (default 1).\r\n\t- `child=\u003cnum-child\u003e`: Number of child nodes per internal node for a two-level tree (default 10).\r\n\t- `thresh=\u003cthreshold\u003e`: The confidence threshold for HAC (default 0.0).\r\n\r\n## \u003ch2 id=\"code_examples\"\u003eUse YWWTools Source Code\u003c/h2\u003e\r\n\r\nTo integrate my code into your project, please add `YWWTools-v2.jar` and everything in `lib/` to your project's dependencies.\r\n\r\nHere are examples for running some algorithms in this package. For more information, please see the JavaDoc in `doc/`.\r\n\r\n## \u003ch2 id=\"lda_code\"\u003eLDA Code Examples\u003c/h2\u003e\r\n\r\n- Classes: `yang.weiwei.lda.LDA` and `yang.weiwei.lda.LDAParam`.\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\"); // initialize a parameter object and set parameters as needed\r\n\t\tLDA ldaTrain = new LDA(param); // initialize an LDA object\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100); // set number of iterations as needed\r\n\t\tldaTrain.writeModel(\"model_file_name\"); // optional, see test code example\r\n\t\tldaTrain.writeDocTopicDist(\"theta_file_name\"); // optional, write document-topic distribution to file\r\n\t\tldaTrain.writeResult(\"topic_file_name\", 10); // optional, write top 10 words of each topic to file\r\n\t\tldaTrain.writeDocTopicCounts(\"topic_count_file_name\"); // optional, write document-topic counts to file\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tLDA ldaTest = new LDA(ldaTrain, param); // initialize with pre-trained LDA object\r\n\t\t// LDA ldaTest = new LDA(\"model_file_name\", param); // or initialize with an LDA model in a file\r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100); // set 
number of iterations as needed\r\n\t\tldaTest.writeDocTopicDist(\"theta_file_name\"); // optional, write document-topic distribution to file\r\n\t\tldaTest.writeDocTopicCounts(\"topic_count_file_name\"); // optional, write document-topic counts to file\r\n\r\n### \u003ch3 id=\"rtm_code\"\u003eRTM\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.lda.rtm.RTM`.\r\n- Extends [LDA](#lda_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tRTM ldaTrain = new RTM(param);\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.readGraph(\"train_graph_file_name\", RTM.TRAIN_GRAPH); // read train graph\r\n\t\tldaTrain.readGraph(\"test_graph_file_name\", RTM.TEST_GRAPH); // read test graph\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100); \r\n\t\tldaTrain.writePred(\"pred_file_name\"); // optional, write predicted document link probabilities to file\r\n\t\tldaTrain.writeRegValues(\"reg_value_file_name\"); // optional, write doc-doc regression values to file\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tRTM ldaTest = new RTM(ldaTrain, param);\r\n\t\t// RTM ldaTest = new RTM(\"model_file_name\", param); \r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.readGraph(\"train_graph_file_name\", RTM.TRAIN_GRAPH); // optional\r\n\t\tldaTest.readGraph(\"test_graph_file_name\", RTM.TEST_GRAPH);\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100); \r\n\t\tldaTest.writePred(\"pred_file_name\"); // optional, write predicted document link probabilities to file\r\n\t\tldaTest.writeRegValues(\"reg_value_file_name\"); // optional, write doc-doc regression values to file\r\n\r\n#### \u003ch4 id=\"lex_wsb_rtm_code\"\u003eLex-WSB-RTM\u003c/h4\u003e\r\n\r\n- Class: `yang.weiwei.lda.rtm.lex_wsb_rtm.LexWSBRTM`.\r\n- Extends [RTM](#rtm_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tLexWSBRTM ldaTrain = new 
LexWSBRTM(param);\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.readGraph(\"train_graph_file_name\", RTM.TRAIN_GRAPH);\r\n\t\tldaTrain.readGraph(\"test_graph_file_name\", RTM.TEST_GRAPH);\r\n\t\tldaTrain.readBlockGraph(\"wsbm_graph_file_name\"); // optional, read graph for WSBM\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100);\r\n\t\tldaTrain.writeBlocks(\"block_file_name\"); // optional, write WSBM results to file\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tLexWSBRTM ldaTest = new LexWSBRTM(ldaTrain, param);\r\n\t\t// LexWSBRTM ldaTest = new LexWSBRTM(\"model_file_name\", param);\r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.readGraph(\"train_graph_file_name\", RTM.TRAIN_GRAPH); // optional\r\n\t\tldaTest.readGraph(\"test_graph_file_name\", RTM.TEST_GRAPH);\r\n\t\tldaTest.readBlockGraph(\"wsbm_graph_file_name\"); // optional\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100);\r\n\t\tldaTest.writeBlocks(\"block_file_name\"); // optional\r\n\r\n#### \u003ch4 id=\"lex_wsb_med_rtm_code\"\u003eLex-WSB-Med-RTM\u003c/h4\u003e\r\n\r\n- Class: `yang.weiwei.lda.rtm.lex_wsb_med_rtm.LexWSBMedRTM`.\r\n- Extends [Lex-WSB-RTM](#lex_wsb_rtm_code).\r\n- Code examples are the same as for [Lex-WSB-RTM](#lex_wsb_rtm_code).\r\n\r\n### \u003ch3 id=\"slda_code\"\u003eSLDA\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.lda.slda.SLDA`.\r\n- Extends [LDA](#lda_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tSLDA ldaTrain = new SLDA(param);\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.readLabels(\"label_file_name\"); // read label file\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100);\r\n\t\tldaTrain.writePredLabels(\"pred_label_file_name\"); // optional, write predicted labels\r\n\t\tldaTrain.writeRegValues(\"reg_value_file_name\"); // optional, write regression values\r\n\r\n- Test code 
example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tSLDA ldaTest = new SLDA(ldaTrain, param);\r\n\t\t// SLDA ldaTest = new SLDA(\"model_file_name\", param);\r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.readLabels(\"label_file_name\"); // optional\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100);\r\n\t\tldaTest.writePredLabels(\"pred_label_file_name\"); // optional\r\n\t\tldaTest.writeRegValues(\"reg_value_file_name\"); // optional\r\n\r\n#### \u003ch4 id=\"bs_lda_code\"\u003eBS-LDA\u003c/h4\u003e\r\n\r\n- Class: `yang.weiwei.lda.slda.bs_lda.BSLDA`.\r\n- Extends [SLDA](#slda_code).\r\n- Code examples are the same as for [SLDA](#slda_code).\r\n\r\n#### \u003ch4 id=\"lex_wsb_bs_lda_code\"\u003eLex-WSB-BS-LDA\u003c/h4\u003e\r\n\r\n- Class: `yang.weiwei.lda.slda.lex_wsb_bs_lda.LexWSBBSLDA`.\r\n- Extends [BS-LDA](#bs_lda_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tLexWSBBSLDA ldaTrain = new LexWSBBSLDA(param);\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.readLabels(\"label_file_name\");\r\n\t\tldaTrain.readBlockGraph(\"wsbm_graph_file_name\"); // optional, read graph for WSBM\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100);\r\n\t\tldaTrain.writeBlocks(\"block_file_name\"); // optional, write WSBM results to file\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tLexWSBBSLDA ldaTest = new LexWSBBSLDA(ldaTrain, param);\r\n\t\t// LexWSBBSLDA ldaTest = new LexWSBBSLDA(\"model_file_name\", param);\r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.readLabels(\"label_file_name\"); // optional\r\n\t\tldaTest.readBlockGraph(\"wsbm_graph_file_name\"); // optional\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100);\r\n\t\tldaTest.writePredLabels(\"pred_label_file_name\"); // optional\r\n\t\tldaTest.writeBlocks(\"block_file_name\"); // optional\r\n\r\n#### \u003ch4 
id=\"lex_wsb_med_code\"\u003eLex-WSB-Med-LDA\u003c/h4\u003e\r\n\r\n- Class: `yang.weiwei.lda.slda.lex_wsb_med_lda.LexWSBMedLDA`.\r\n- Extends [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_code).\r\n- Code examples are the same as for [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_code).\r\n\r\n### \u003ch3 id=\"bp_lda_code\"\u003eBP-LDA\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.lda.bp_lda.BPLDA`.\r\n- Extends [LDA](#lda_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tBPLDA ldaTrain = new BPLDA(param);\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.readBlocks(\"block_file_name\"); // read block file\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100);\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tBPLDA ldaTest = new BPLDA(ldaTrain, param);\r\n\t\t// BPLDA ldaTest = new BPLDA(\"model_file_name\", param);\r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.readBlocks(\"block_file_name\"); // optional\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100);\r\n\r\n### \u003ch3 id=\"st_lda_code\"\u003eST-LDA\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.lda.st_lda.STLDA`.\r\n- Extends [LDA](#lda_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tSTLDA ldaTrain = new STLDA(param);\r\n\t\tldaTrain.readCorpus(\"long_corpus_file_name\");\r\n\t\tldaTrain.readShortCorpus(\"short_corpus_file_name\");\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100);\r\n\t\tldaTrain.writeShortDocTopicDist(\"short_theta_file_name\"); // optional, write short documents' topic distribution to file\r\n\t\tldaTrain.writeShortDocTopicAssign(\"short_topic_assign_file_name\"); // optional, write short documents' topic assignments to file\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tSTLDA ldaTest = new STLDA(ldaTrain, param);\r\n\t\t// STLDA ldaTest = new STLDA(\"model_file_name\", 
param);\r\n\t\tldaTest.readCorpus(\"long_corpus_file_name\");\r\n\t\tldaTest.readShortCorpus(\"short_corpus_file_name\");\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100);\r\n\t\tldaTest.writeShortDocTopicDist(\"short_theta_file_name\"); // optional\r\n\t\tldaTest.writeShortDocTopicAssign(\"short_topic_assign_file_name\"); // optional\r\n\r\n### \u003ch3 id=\"wsb_tm_code\"\u003eWSB-TM\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.lda.wsb_tm.WSBTM`.\r\n- Extends [LDA](#lda_code).\r\n- Training code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tWSBTM ldaTrain = new WSBTM(param);\r\n\t\tldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\tldaTrain.readGraph(\"wsbm_graph_file_name\"); // read graph file\r\n\t\tldaTrain.initialize();\r\n\t\tldaTrain.sample(100);\r\n\r\n- Test code example\r\n\r\n\t\tLDAParam param = new LDAParam(\"vocab_file_name\");\r\n\t\tWSBTM ldaTest = new WSBTM(ldaTrain, param);\r\n\t\t// WSBTM ldaTest = new WSBTM(\"model_file_name\", param);\r\n\t\tldaTest.readCorpus(\"corpus_file_name\");\r\n\t\tldaTest.readGraph(\"wsbm_graph_file_name\"); // optional\r\n\t\tldaTest.initialize();\r\n\t\tldaTest.sample(100);\r\n\r\n## \u003ch2 id=\"tlda_code\"\u003etLDA Code Examples\u003c/h2\u003e\r\n\r\n- Classes: `yang.weiwei.tlda.TLDA` and `yang.weiwei.tlda.TLDAParam`.\r\n- Training code example\r\n\r\n\t\tTLDAParam param = new TLDAParam(\"vocab_file_name\", \"tree_prior_file_name\"); // initialize a parameter object and set parameters as needed\r\n\t\tTLDA tldaTrain = new TLDA(param); // initialize a tLDA object\r\n\t\ttldaTrain.readCorpus(\"corpus_file_name\");\r\n\t\ttldaTrain.initialize();\r\n\t\ttldaTrain.sample(100); // set number of iterations as needed\r\n\t\ttldaTrain.writeModel(\"model_file_name\"); // optional, see test code example\r\n\t\ttldaTrain.writeDocTopicDist(\"theta_file_name\"); // optional, write document-topic distribution to file\r\n\t\ttldaTrain.writeWordResult(\"topic_file_name\", 10); // optional, 
write top 10 words of each topic to file\r\n\t\ttldaTrain.writeDocTopicCounts(\"topic_count_file_name\"); // optional, write document-topic counts to file\r\n\r\n- Test code example\r\n\r\n\t\tTLDAParam param = new TLDAParam(\"vocab_file_name\", \"tree_prior_file_name\");\r\n\t\tTLDA tldaTest = new TLDA(tldaTrain, param); // initialize with pre-trained tLDA object\r\n\t\t// TLDA tldaTest = new TLDA(\"model_file_name\", param); // or initialize with a tLDA model in a file\r\n\t\ttldaTest.readCorpus(\"corpus_file_name\");\r\n\t\ttldaTest.initialize();\r\n\t\ttldaTest.sample(100); // set number of iterations as needed\r\n\t\ttldaTest.writeDocTopicDist(\"theta_file_name\"); // optional, write document-topic distribution to file\r\n\t\ttldaTest.writeDocTopicCounts(\"topic_count_file_name\"); // optional, write document-topic counts to file\r\n\r\n## \u003ch2 id=\"mtm_code\"\u003eMultilingual Topic Model Code Examples\u003c/h2\u003e\r\n\r\n- Classes: `yang.weiwei.mtm.MTM` and `yang.weiwei.mtm.MTMParam`.\r\n- Training code example\r\n\r\n\t\tMTMParam param = new MTMParam(vocabFileNames[]); // initialize a parameter object and set parameters as needed\r\n\t\tMTM mtmTrain = new MTM(param); // initialize an MTM object\r\n\t\tmtmTrain.readCorpus(corpusFileNames[]);\r\n\t\tmtmTrain.readWordAssociations(\"dict_file_name\");\r\n\t\tmtmTrain.initialize();\r\n\t\tmtmTrain.sample(100); // set number of iterations as needed\r\n\t\tmtmTrain.writeModel(\"model_file_name\"); // optional, see test code example\r\n\t\tmtmTrain.writeDocTopicDist(thetaFileNames[]); // optional, write document-topic distributions to files\r\n\t\tmtmTrain.writeResult(\"topic_file_name\", 10); // optional, write top 10 words of each topic to file\r\n\t\tmtmTrain.writeDocTopicCounts(topicCountFileNames[]); // optional, write document-topic counts to files\r\n\t\tmtmTrain.writeTopicTransMatrices(\"rho_file_name\"); // optional, write topic transformation matrices to file\r\n\r\n- Test code 
example\r\n\r\n\t\tMTMParam param = new MTMParam(vocabFileNames[]);\r\n\t\tMTM mtmTest = new MTM(mtmTrain, param); // initialize with pre-trained MTM object\r\n\t\t// MTM mtmTest = new MTM(\"model_file_name\", param); // or initialize with an MTM model in a file\r\n\t\tmtmTest.readCorpus(corpusFileNames[]);\r\n\t\tmtmTest.initialize();\r\n\t\tmtmTest.sample(100); // set number of iterations as needed\r\n\t\tmtmTest.writeDocTopicDist(thetaFileNames[]); // optional, write document-topic distributions to files\r\n\t\tmtmTest.writeDocTopicCounts(topicCountFileNames[]); // optional, write document-topic counts to files\r\n\r\n## \u003ch2 id=\"other_code\"\u003eOther Code Examples\u003c/h2\u003e\r\n\r\n### \u003ch3 id=\"wsbm_code\"\u003eWSBM\u003c/h3\u003e\r\n\r\n- Classes: `yang.weiwei.wsbm.WSBM` and `yang.weiwei.wsbm.WSBMParam`.\r\n- Code example\r\n\r\n\t\tWSBMParam param = new WSBMParam(); // initialize a parameter object and set parameters as needed\r\n\t\tWSBM wsbm = new WSBM(param); // initialize a WSBM object with parameters\r\n\t\twsbm.readGraph(\"graph_file_name\");\r\n\t\twsbm.init();\r\n\t\twsbm.sample(100); // set number of iterations as needed\r\n\t\twsbm.printResults();\r\n\r\n### \u003ch3 id=\"scc_code\"\u003eSCC\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.scc.SCC`.\r\n- Code example\r\n\r\n\t\tSCC scc = new SCC(10); // initialize with number of nodes\r\n\t\tscc.readGraph(\"graph_file_name\");\r\n\t\tscc.cluster();\r\n\t\tscc.writeCluster(\"result_file_name\");\r\n\r\n### \u003ch3 id=\"tree_bulder_code\"\u003eTree Builder\u003c/h3\u003e\r\n\r\n- Class: `yang.weiwei.tlda.TreeBuilder`.\r\n- Code example\r\n\r\n\t\tTreeBuilder tb = new TreeBuilder();\r\n\t\ttb.build2LevelTree(\"score_file_name\", \"vocab_file_name\", \"tree_file_name\", numChild); // build a two-level tree\r\n\t\ttb.hac(\"score_file_name\", \"vocab_file_name\", \"tree_file_name\", threshold); // build a tree with hierarchical agglomerative clustering 
(HAC)\r\n\t\ttb.hacWithLeafDup(\"score_file_name\", \"vocab_file_name\", \"tree_file_name\", threshold); // build a tree with HAC and leaf duplication\r\n\r\n### \u003ch3 id=\"preprocess\"\u003eEnglish Corpus Preprocessing\u003c/h3\u003e\r\n\r\n- There are two common ways to preprocess an English corpus for topic models:\r\n\t- `tokenization` -\u003e `stop words removal` -\u003e `stemming`\r\n\t- `tokenization` -\u003e `POS tagging` -\u003e `lemmatization` -\u003e `stop words removal`\r\n- The first way is quick, but the stemmed words are harder to read. The second one takes more time but produces more readable words.\r\n- Finally, you may want to remove words with low (document) frequency to accelerate topic modeling without hurting performance.\r\n\r\n## \u003ch2 id=\"citation\"\u003eCitation\u003c/h2\u003e\r\n\r\n- If you use [Tree Builder](#tree_builder_cmd), please cite\r\n\r\n\t\t@InProceedings{Yang:Boyd-Graber:Resnik-2017,\r\n\t\t\tTitle = {Adapting Topic Models using Lexical Associations with Tree Priors},\r\n\t\t\tBooktitle = {Empirical Methods in Natural Language Processing},\r\n\t\t\tAuthor = {Weiwei Yang and Jordan Boyd-Graber and Philip Resnik},\r\n\t\t\tYear = {2017},\r\n\t\t\tLocation = {Copenhagen, Denmark},\r\n\t\t}\r\n\r\n- If you use [Lex-WSB-RTM](#lex_wsb_rtm_cmd) (aka LBS-RTM), [Lex-WSB-Med-RTM](#lex_wsb_med_rtm_cmd) (aka LBH-RTM), [Lex-WSB-BS-LDA](#lex_wsb_bs_lda_cmd), and/or [Lex-WSB-Med-LDA](#lex_wsb_med_lda_cmd), please cite\r\n\r\n\t\t@InProceedings{Yang:Boyd-Graber:Resnik-2016,\r\n\t\t\tTitle = {A Discriminative Topic Model using Document Network Structure},\r\n\t\t\tBooktitle = {Association for Computational Linguistics},\r\n\t\t\tAuthor = {Weiwei Yang and Jordan Boyd-Graber and Philip Resnik},\r\n\t\t\tYear = {2016},\r\n\t\t\tLocation = {Berlin, Germany},\r\n\t\t}\r\n\r\n- If you use [ST-LDA](#st_lda_cmd), please cite\r\n\r\n\t\t@InProceedings{Hong:Yang:Resnik:Frias-Martinez-2016,\r\n\t\t\tTitle = {Uncovering Topic Dynamics of 
Social Media and News: The Case of Ferguson},\r\n\t\t\tBooktitle = {International Conference on Social Informatics},\r\n\t\t\tAuthor = {Lingzi Hong and Weiwei Yang and Philip Resnik and Vanessa Frias-Martinez},\r\n\t\t\tYear = {2016},\r\n\t\t\tLocation = {Bellevue, WA, USA}\r\n\t\t}\r\n\r\n- If you use [MTM](#mtm_cmd), please cite\r\n\r\n\t\t@InProceedings{Yang:Boyd-Graber:Resnik-2019,\r\n\t\t\tTitle = {A Multilingual Topic Model for Learning Weighted Topic Links Across Corpora with Low Comparability},\r\n\t\t\tBooktitle = {Empirical Methods in Natural Language Processing},\r\n\t\t\tAuthor = {Weiwei Yang and Jordan Boyd-Graber and Philip Resnik},\r\n\t\t\tYear = {2019},\r\n\t\t\tLocation = {Hong Kong, China},\r\n\t\t}\r\n\r\n## \u003ch2 id=\"ref\"\u003eReferences\u003c/h2\u003e\r\n\r\n### \u003ch3 id=\"lda_ref\"\u003e[LDA](#lda_cmd): Latent Dirichlet Allocation\u003c/h3\u003e\r\n\r\nDavid M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research.\r\n\r\n### \u003ch3 id=\"slda_ref\"\u003e[SLDA](#slda_cmd): Supervised [LDA](#lda_cmd)\u003c/h3\u003e\r\n\r\nJon D. McAuliffe and David M. Blei. 2008. Supervised topic models. In Proceedings of Advances in Neural Information Processing Systems.\r\n\r\n### \u003ch3 id=\"med_lda_ref\"\u003eMed-LDA: Max-margin [LDA](#lda_cmd)\u003c/h3\u003e\r\n\r\nJun Zhu, Amr Ahmed, and Eric P. Xing. 2012. MedLDA: Maximum margin supervised topic models. Journal of Machine Learning Research.\r\n\r\nJun Zhu, Ning Chen, Hugh Perkins, and Bo Zhang. 2014. Gibbs max-margin topic models with data augmentation. Journal of Machine Learning Research.\r\n\r\n### \u003ch3 id=\"rtm_ref\"\u003e[RTM](#rtm_cmd): Relational Topic Model\u003c/h3\u003e\r\n\r\nJonathan Chang and David M. Blei. 2010. Hierarchical relational models for document networks. 
The Annals of Applied Statistics.\r\n\r\n### \u003ch3 id=\"lex_wsb_med_rtm_ref\"\u003e[Lex-WSB-Med-RTM](#lex_wsb_med_rtm_cmd): [RTM](#rtm_cmd) with WSB-computed Block Priors, Lexical Weights, and Hinge Loss\u003c/h3\u003e\r\n\r\nWeiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2016. A discriminative topic model using document network structure. In Proceedings of Association for Computational Linguistics.\r\n\r\n### \u003ch3 id=\"st_lda_ref\"\u003e[ST-LDA](#st_lda_cmd): Single Topic [LDA](#lda_cmd)\u003c/h3\u003e\r\n\r\nLingzi Hong, Weiwei Yang, Philip Resnik, and Vanessa Frias-Martinez. 2016. Uncovering topic dynamics of social media and news: The case of Ferguson. In Proceedings of International Conference on Social Informatics.\r\n\r\n### \u003ch3 id=\"wsbm_ref\"\u003e[WSBM](#wsbm_cmd): Weighted Stochastic Block Model\u003c/h3\u003e\r\n\r\nChristopher Aicher, Abigail Z. Jacobs, and Aaron Clauset. 2014. Learning latent block structure in weighted networks. Journal of Complex Networks.\r\n\r\n### \u003ch3 id=\"tlda_ref\"\u003e[tLDA](#tlda_cmd): Tree [LDA](#lda_cmd)\u003c/h3\u003e\r\n\r\nJordan Boyd-Graber, David M. Blei, and Xiaojin Zhu. 2007. A topic model for word sense disambiguation. Empirical Methods in Natural Language Processing.\r\n\r\n### \u003ch3 id=\"tree_builder_ref\"\u003e[Tree Builder](#tree_builder_cmd)\u003c/h3\u003e\r\n\r\nWeiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2017. Adapting topic models using lexical associations with tree priors. Empirical Methods in Natural Language Processing.\r\n\r\n### \u003ch3 id=\"mtm_ref\"\u003e[MTM](#mtm_cmd): Multilingual Topic Model\u003c/h3\u003e\r\n\r\nWeiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2019. A multilingual topic model for learning weighted topic links across corpora with low comparability. 
Empirical Methods in Natural Language Processing.\r\n\r\n[Back to Top](#top)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fywwbill%2FYWWTools-v2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fywwbill%2FYWWTools-v2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fywwbill%2FYWWTools-v2/lists"}