{"id":13697213,"url":"https://github.com/blei-lab/diln","last_synced_at":"2025-04-23T20:25:44.831Z","repository":{"id":21681579,"uuid":"25002795","full_name":"blei-lab/diln","owner":"blei-lab","description":"This implements the discrete infinite logistic normal, a Bayesian nonparametric topic model that finds correlated topics.","archived":false,"fork":false,"pushed_at":"2014-10-09T18:27:42.000Z","size":132,"stargazers_count":6,"open_issues_count":0,"forks_count":3,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-03-30T03:11:44.170Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-2.1","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blei-lab.png","metadata":{"files":{"readme":"README.txt","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-10-09T18:26:31.000Z","updated_at":"2020-01-08T10:37:26.000Z","dependencies_parsed_at":"2022-08-17T23:15:31.188Z","dependency_job_id":null,"html_url":"https://github.com/blei-lab/diln","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Fdiln","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Fdiln/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Fdiln/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Fdiln/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blei-lab","download_url":"https://codeload.github.com/blei-lab/diln/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","ki
nd":"github","repositories_count":250507675,"owners_count":21442057,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T18:00:54.050Z","updated_at":"2025-04-23T20:25:44.808Z","avatar_url":"https://github.com/blei-lab.png","language":"C","funding_links":[],"categories":["Research Implementations"],"sub_categories":["Embedding based Topic Models"],"readme":"-----------------------------------------------------------------------\r\nThe Discrete Infinite Logistic Normal (with HDP option) in C\r\n-----------------------------------------------------------------------\r\n\r\n(C) Copyright 2010, John Paisley, Chong Wang and David Blei\r\n\r\nWritten by John Paisley, jpaisley@princeton.edu.\r\n\r\nThis file is part of DILN-C\r\n\r\nDILN-C is free software; you can redistribute it and/or modify it under \r\nthe terms of the GNU General Public License as published by the Free \r\nSoftware Foundation; either version 2 of the License, or \r\n(at your option) any later version.\r\n\r\nDILN-C is distributed in the hope that it will be useful, but WITHOUT \r\nANY WARRANTY; without even the implied warranty of MERCHANTABILITY \r\nor FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public \r\nLicense for more details.\r\n\r\nYou should have received a copy of the GNU General Public License \r\nalong with this program; if not, write to the Free Software Foundation,  \r\nInc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA\r\n\r\n-----------------------------------------------------------------------\r\n\r\nThis is a C implementation of the discrete infinite logistic normal (DILN) \r\nfor topic modeling. Variational Bayes is used for inference. \r\n\r\nThe hierarchical Dirichlet process (HDP) is also a model option.\r\n\r\nIn both model priors, the top-level is represented as a stick-breaking\r\nDirichlet process, and each second-level probability distribution is \r\nrepresented as the normalization of a sequence of gamma random variables.\r\n\r\nThis code requires the GSL, http://www.gnu.org/software/gsl/\r\n\r\n-----------------------------------------------------------------------\r\n\r\nTABLE OF CONTENTS\r\n\r\n\r\nA. COMPILING\r\n\r\nB. DATA FORMAT\r\n\r\nC. TRAINING ON A CORPUS\r\n\r\nD. OUTPUT\r\n\r\nE. FILES INCLUDED\r\n\r\n-----------------------------------------------------------------------\r\n\r\nA. COMPILING\r\n\r\nType \"make\" in a shell. You will need to change the Makefile to\r\npoint to the GSL on your machine.\r\n\r\n\r\nB. DATA FORMAT ********************************************************\r\n\r\nThis code uses the same data format as in CTM-C by David M. Blei.\r\nA data file contains an entire corpus for training. Each line of a\r\ndata file represents a document as follows:\r\n\r\n    [M] [term_1]:[count_1] [term_2]:[count_2] ...  [term_N]:[count_N]\r\n\r\n[M]: The number of unique terms in the document\r\n\r\n[term_i]: An integer associated with the i-th term in a vocabulary.\r\n\r\n[count_i]: The number of times that the i-th term appears in the document.\r\n\r\nNotes: [count_i] [term_i+1] are separated by a space. Only terms with \r\ncounts greater than zero should be included.\r\n\r\n\r\nC. 
TRAINING ON A CORPUS ************************************************\r\n\r\nBelow is a list of inputs to DILNtm.exe\r\n\r\nCommand Line: DILNtm.exe argv[1] argv[2] argv[3] argv[4] argv[5] (optional)\r\n\r\nargv[1] : corpus file\r\nargv[2] : number of topics (must be \u003e 2)\r\nargv[3] : method (1 = DILN, 2 = HDP)\r\nargv[4] : if argv[4] integer -\u003e number of iterations\r\n          if 0 \u003c argv[4] \u003c 1 -\u003e error threshold (fractional change in bound)\r\nargv[5] : Dirichlet base concentration parameter\r\n          default = 0.5*|Vocab| -\u003e Dir(0.5,...,0.5)\r\n\r\nWe currently do not provide the ability to do testing.\r\n\r\n\r\nD. OUTPUT **************************************************************\r\n\r\nThe code outputs parameter values into individual csv files. The list of output\r\nparameters is given below (output files are [name].txt). (*) indicates that \r\nthese parameters are not output for HDP.\r\n\r\n--- Below, each column is a document and each row is a topic ---\r\n\r\nA:    matrix of posterior gamma parameters (first parameter)\r\nB:    matrix of posterior gamma parameters (second parameter)\r\n*mu:  matrix of log-normal vector posterior means (doc specific)\r\n*sig: matrix of log-normal vector posterior variances (doc specific)\r\n\r\n    --------------------------------------------------------\r\n\r\n*u:     posterior mean of log-normal vectors\r\n*Kern:  posterior covariance matrix (kernel) for log-normal vectors\r\nV:      top-level stick-breaking proportions\r\nGam:    posterior of topics. each row is a topic. each col is a word\r\nLbound: lower bound as a function of iteration\r\nalpha:\ttop-level scaling parameter\r\nbeta: \tsecond-level scaling parameter\r\n\r\n\r\nE. 
FILES INCLUDED *******************************************************\r\n\r\nmain.c\r\nDILNfunctions.c (.h) : functions specific to DILN (HDP) inference\r\ngsl_wrapper.c (.h) : wrapper functions to interact with the gsl\r\nimportData.c (.h) : functions for importing (and exporting) data\r\n\r\nsettings.txt : Contains additional initializations and settings not input\r\nin the command line. The default values are:\r\n\r\n   alpha_init = 20        (top-level scaling parameter initialization)\r\n   beta_init = 5 \t  (second-level scaling parameter initialization)\r\n   bool_learn_alpha = 1   (a boolean indicating whether to learn alpha)\r\n   bool_learn_beta = 0    (a boolean indicating whether to learn beta)\r\n   Kmeans_iterations = 1  (number of Kmeans iterations for initialization)\r\n\r\nMakefile : should be changed to point to the GSL on your machine\r\nREADME.txt : this file\r\nlicense.txt : gnu license\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblei-lab%2Fdiln","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblei-lab%2Fdiln","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblei-lab%2Fdiln/lists"}