{"id":13696637,"url":"https://github.com/MIND-Lab/Constrained-RTM","last_synced_at":"2025-05-03T17:32:03.047Z","repository":{"id":39959767,"uuid":"182282449","full_name":"MIND-Lab/Constrained-RTM","owner":"MIND-Lab","description":"Constrained Relational Topic Models that use potential functions to incorporate label knowledge in the form of document constraints","archived":false,"fork":false,"pushed_at":"2022-07-01T20:39:19.000Z","size":305141,"stargazers_count":8,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-13T00:32:58.457Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MIND-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-19T15:05:33.000Z","updated_at":"2024-03-27T02:05:37.000Z","dependencies_parsed_at":"2022-08-24T20:20:45.777Z","dependency_job_id":null,"html_url":"https://github.com/MIND-Lab/Constrained-RTM","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIND-Lab%2FConstrained-RTM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIND-Lab%2FConstrained-RTM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIND-Lab%2FConstrained-RTM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MIND-Lab%2FConstrained-RTM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MIND-Lab","download_url":"https://codeload.github.com/MIND-Lab/C
onstrained-RTM/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252226805,"owners_count":21714871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T18:00:43.903Z","updated_at":"2025-05-03T17:32:00.016Z","avatar_url":"https://github.com/MIND-Lab.png","language":"Java","funding_links":[],"categories":["Models"],"sub_categories":["Relational Topic Model (RTM)"],"readme":"# \u003ch1 id=\"top\"\u003eConstrained Relational Topic Models\u003c/h1\u003e\r\nImplementation of Constrained Relational Topic Models (C-RTM), proposed in the paper \"Constrained Relational Topic Models\" \u003ca href=\"https://doi.org/10.1016/j.ins.2019.09.039\"\u003e[https://doi.org/10.1016/j.ins.2019.09.039]\u003c/a\u003e, accepted in Information Sciences, 2020.\r\nC-RTM is a family of topic models that extends the well-known [Relational Topic Models (Chang, 2009)](#rtm). It models the structure of a document network and incorporates other types of relational information obtained from prior domain knowledge. This implementation extends the code from [Weiwei Yang](http://cs.umd.edu/~wwyang/)'s package. 
\r\n\r\n\r\n## \u003ch2 id=\"clda\"\u003eExecution of the program in Command Line\u003c/h2\u003e\r\n```\r\njava -cp YWWTools.jar:deps.jar yang.weiwei.Tools --tool lda --model lda --constrained true --vocab \u003cvocab-file\u003e --corpus \u003ccorpus-file\u003e --trained-model \u003cmodel-file\u003e\r\n```\r\n- Required arguments\r\n\t- `--constrained true`: Must be set to true to allow the incorporation of prior knowledge constraints.\r\n\t- `\u003cvocab-file\u003e`: Vocabulary file. Each line contains a unique word.\r\n\t- `\u003ccorpus-file\u003e`: Corpus file in which documents are represented by word indexes and frequencies. Each line contains a document in the following format:\r\n\r\n\t\t```\r\n\t\t\u003cdoc-len\u003e \u003cword-type-1\u003e:\u003cfrequency-1\u003e \u003cword-type-2\u003e:\u003cfrequency-2\u003e ... \u003cword-type-n\u003e:\u003cfrequency-n\u003e\r\n\t\t```\r\n\t\r\n\t\t`\u003cdoc-len\u003e` is the total number of *tokens* in this document. `\u003cword-type-i\u003e` denotes the i-th word in `\u003cvocab-file\u003e`, starting from 0. Words with zero frequency can be omitted.\r\n\t- `\u003cmodel-file\u003e`: Trained model file in JSON format. Read and written by the program. \r\n  - `--train-c-file \u003cconstraint-file\u003e`: File containing the document constraints. Each line contains a constraint in the following format:\r\n  \r\n    ```\r\n    \u003cconstraint-type\u003e \u003cdocument-1\u003e \u003cdocument-2\u003e\r\n    ```\r\n    \r\n    `\u003cdocument-1\u003e` is the row id of document-1. `\u003cdocument-2\u003e` is the row id of document-2. `\u003cconstraint-type\u003e` must be set to `M` (if it is a must-constraint) or `C` (if it is a cannot-constraint).\r\n- Optional arguments\r\n\t- `--model \u003cmodel-name\u003e`: The topic model you want to use (default: [LDA](#lda_cmd)). 
Tested `\u003cmodel-name\u003e` values (case-insensitive) are\r\n\t\t- LDA: Constrained LDA\r\n\t\t- RTM: Constrained Relational Topic Model\r\n    - Other LDA extensions implemented by Weiwei Yang are already provided in the code and can also be used.\r\n    - `--newfun \u003cboolean\u003e`: Type of potential function of the constrained model. Default: `false`. If true, it is normalized. Otherwise it corresponds to the potential function described in [SC-LDA](#sclda).\r\n    - `--lambda \u003clambda\u003e`: Strength parameter for the potential function described in [SC-LDA](#sclda). It is valid only if `--newfun false`.\r\n\t- `--no-verbose`: Stop printing the log to the console.\r\n\t- `--alpha \u003calpha-value\u003e`: Parameter of Dirichlet prior of document distribution over topics (default: 1.0). Must be a positive real number.\r\n\t- `--beta \u003cbeta-value\u003e`: Parameter of Dirichlet prior of topic distribution over words (default: 0.1). Must be a positive real number.\r\n\t- `--topics \u003cnum-topics\u003e`: Number of topics (default: 10). Must be a positive integer.\r\n\t- `--iters \u003cnum-iters\u003e`: Number of iterations (default: 100). Must be a positive integer.\r\n\t- `--update`: Update alpha while sampling (default: false). It does not work well.\r\n\t- `--update-int \u003cupdate-interval\u003e`: Interval of updating alpha (default: 10). Must be a positive integer.\r\n\t- `--theta \u003ctheta-file\u003e`: File for document distribution over topics. Each line contains a document's topic distribution. Topic weights are separated by space.\r\n\t- `--output-topic \u003ctopic-file\u003e`: File for showing topics.\r\n\t- `--topic-count \u003ctopic-count-file\u003e`: File for document-topic counts.\r\n\t- `--top-word \u003cnum-top-word\u003e`: Number of words to give when showing topics (default: 10). Must be a positive integer.\r\n  - `--burn-in \u003cburnin\u003e`: Number of burn-in iterations. 
Default: 0.\r\n\r\n## \u003ch2 id=\"datasets\"\u003eDatasets\u003c/h2\u003e\r\nThree benchmark relational [datasets](http://www.cs.umd.edu/~sen/lbc-proj/LBC.html) are included in their respective folders. They are already preprocessed and ready to be used as input for the model. \r\nNote that the file `labels.txt` can be used to create the must- and cannot-constraints. Two random documents can be extracted and if their labels are the same, a must-constraint may be added to the `\u003cconstraint-file\u003e`, otherwise a cannot-constraint may be added.\r\n\r\n## \u003ch2 id=\"references\"\u003e[References](#references) \u003c/h2\u003e\r\n### \u003ch3 id=\"sclda\"\u003e[SC-LDA](#sclda): Sparse Constrained LDA \u003c/h3\u003e\r\n\r\nYang, Y., Downey, D., Boyd-Graber, J.: Efficient Methods for Incorporating Knowledge into Topic Models. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 308-317 (2015)\r\n\r\n### \u003ch3 id=\"rtm\"\u003e[RTM](#rtm): Relational Topic Models \u003c/h3\u003e\r\n\r\nJonathan Chang, David M. Blei: Relational Topic Models for Document Networks. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS) 2009: 81-88\r\n\r\n[Back to Top](#top)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMIND-Lab%2FConstrained-RTM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMIND-Lab%2FConstrained-RTM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMIND-Lab%2FConstrained-RTM/lists"}