{"id":20049899,"url":"https://github.com/kymmt90/lda","last_synced_at":"2025-10-15T02:26:42.730Z","repository":{"id":27161459,"uuid":"30630840","full_name":"kymmt90/LDA","owner":"kymmt90","description":"Latent Dirichlet Allocation in Java 8","archived":false,"fork":false,"pushed_at":"2015-03-15T03:18:35.000Z","size":1324,"stargazers_count":0,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-24T03:58:17.285Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"Yalantis/PullToRefresh","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kymmt90.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-11T04:39:24.000Z","updated_at":"2015-03-15T03:17:48.000Z","dependencies_parsed_at":"2022-08-07T12:15:33.721Z","dependency_job_id":null,"html_url":"https://github.com/kymmt90/LDA","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kymmt90%2FLDA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kymmt90%2FLDA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kymmt90%2FLDA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kymmt90%2FLDA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kymmt90","download_url":"https://codeload.github.com/kymmt90/LDA/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241480799,"owners_count":19969731,"icon_url":"https://github.com
/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T11:53:11.072Z","updated_at":"2025-10-15T02:26:37.711Z","avatar_url":"https://github.com/kymmt90.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"LDA in Java 8\n=============\n\nLatent Dirichlet Allocation in Java 8.\n\nLatent Dirichlet Allocation (LDA) [Blei+ 2003] is a basic probabilistic topic model.\nPlease see the following for more details:\n\n- [Latent Dirichlet allocation - Wikipedia, the free encyclopedia](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)\n\nCurrently, this software supports [collapsed Gibbs sampling](http://psiexp.ss.uci.edu/research/papers/sciencetopics.pdf) [Griffiths and Steyvers 2004] for model inference.\n\nThis repository includes a dataset from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets) [Lichman 2013].\n\nRequirements\n------------\n\n- Java 8\n- Apache Commons\n  - Math\n  - Lang\n- Maven\n\nFor unit testing, these libraries are also needed.\n\n- JUnit\n- Mockito\n\nUsage\n-----\n\n### Dataset Form\n\nThe bag-of-words dataset follows the format of the [Bag of Words Data Set](https://archive.ics.uci.edu/ml/datasets/Bag+of+Words) in the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.html).\nThe doc-vocab-count dataset has the following format:\n\n    #Documents\n    #Vocabularies\n    #NonZeros\n    docID vocabID count\n    docID vocabID count\n    ...\n    docID vocabID count\n\nThe vocabulary dataset has the following format:\n\n    vocab1\n    vocab2\n    vocab3\n    ...\n    vocabN\n\nThe line number of each vocabulary is its 
`vocabID`.\n\n### Example\n\nThe `lda.BagOfWords` class reads a dataset from files.\nA `lda.BagOfWords` object and the other parameters are passed to initialize `lda.LDA`.\nFor example:\n\n    Dataset dataset = new Dataset(\"path/to/doc-vocab-counts\", \"path/to/vocabs\");\n    LDA lda = new LDA(0.1                    /* initial alpha */,\n                      0.1                    /* initial beta */,\n                      50                     /* the number of topics */,\n                      dataset                /* bag-of-words */,\n                      CGS                    /* use collapsed Gibbs sampler for inference */,\n                      \"path/to/properties\"   /* properties file path */);\n    lda.run();\n\nThe following properties are available:\n\n    numIteration=\u003cthe number of iterations of collapsed Gibbs sampling\u003e\n    seed=\u003cseed for the pseudo random number generator\u003e\n\nThe resulting topics can be accessed as follows:\n\n    List\u003cPair\u003cString, Double\u003e\u003e vocabs\n        = lda.getVocabsSortedByPhi(0 /* = topic ID */);\n    vocabs.get(0).getLeft();  // the highest-probability vocabulary in topic 0\n    vocabs.get(0).getRight(); // the probability value of the above vocabulary\n\nPlease see `example.Example#main` for more details.\nRun these commands in the `LDA` directory to build and run `example.Example#main`.\n\n    $ mvn clean package dependency:copy-dependencies -DincludeScope=runtime\n    $ java -jar target/LDA-\u003cversion\u003e.jar\n\nLicense\n-------\n\n- [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkymmt90%2Flda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkymmt90%2Flda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkymmt90%2Flda/lists"}