{"id":13696482,"url":"https://github.com/chyikwei/bnp","last_synced_at":"2025-05-03T17:31:16.527Z","repository":{"id":148964841,"uuid":"83863958","full_name":"chyikwei/bnp","owner":"chyikwei","description":"Bayesian nonparametric models for python","archived":false,"fork":false,"pushed_at":"2018-09-11T14:39:53.000Z","size":6751,"stargazers_count":17,"open_issues_count":1,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-13T00:32:52.714Z","etag":null,"topics":["bayesian","data-analysis","probabilistic-graphical-models","python","topic-modeling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chyikwei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-03-04T04:08:08.000Z","updated_at":"2023-02-14T02:01:29.000Z","dependencies_parsed_at":"2024-04-08T02:57:17.917Z","dependency_job_id":"f145d9bc-46b0-419b-b1ba-7769307d40d9","html_url":"https://github.com/chyikwei/bnp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chyikwei%2Fbnp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chyikwei%2Fbnp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chyikwei%2Fbnp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chyikwei%2Fbnp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chyikwei","download_url":"https://codeload.github.com/chyikwei/bnp/ta
r.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252226705,"owners_count":21714853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian","data-analysis","probabilistic-graphical-models","python","topic-modeling"],"created_at":"2024-08-02T18:00:40.880Z","updated_at":"2025-05-03T17:31:16.048Z","avatar_url":"https://github.com/chyikwei.png","language":"Python","funding_links":[],"categories":["Models"],"sub_categories":["Hierarchical Dirichlet Process (HDP) [:page_facing_up:](https://papers.nips.cc/paper/2004/file/fb4ab556bc42d6f0ee0f9e24ec4d1af0-Paper.pdf)"],"readme":"[![Build Status](https://travis-ci.org/chyikwei/bnp.svg?branch=master)](https://travis-ci.org/chyikwei/bnp)\n[![Build Status](https://circleci.com/gh/chyikwei/bnp.png?\u0026style=shield)](https://circleci.com/gh/chyikwei/bnp)\n[![Coverage Status](https://coveralls.io/repos/github/chyikwei/bnp/badge.svg?branch=master)](https://coveralls.io/github/chyikwei/bnp?branch=master)\n\n# Bayesian Nonparametric\nBayesian nonparametric models in Python.\n\nModels follow scikit-learn's API and can be used as an extension to it.\n\nCurrent model:\n--------------\n- **Hierarchical Dirichlet Process**\n\n   HDP is similar to LDA (Latent Dirichlet Allocation), but instead of fixing the number of topics in advance it assumes an \"infinite\" number of topics and lets the data determine how many are used. This implementation is based on Chong Wang's online-hdp code and optimized with Cython.\n\n\nReferences:\n-----------\n- \"Stochastic Variational Inference\", Matthew D. Hoffman, David M. 
Blei, Chong Wang, John Paisley, 2013\n- \"Online Variational Inference for the Hierarchical Dirichlet Process\", Chong Wang, John Paisley, David M. Blei, 2011\n- Chong Wang's [online-hdp code](https://github.com/blei-lab/online-hdp).\n\nInstall:\n--------\n```\n# clone the repository\ngit clone git@github.com:chyikwei/bnp.git\ncd bnp\n\n# install dependencies (cython, numpy, scipy, scikit-learn)\npip install -r requirements.txt\n\n# install the package\npip install .\n```\n\nGetting started:\n----------------\n`bnp.utils` provides a function, `make_doc_word_matrix`, that generates a synthetic document-word matrix with hidden topics. We will fit our HDP model on it.\n\nFirst, we generate a document-word matrix with 5 hidden topics. Each topic has 10 unique words and 100 documents, and each document contains 20 words, so the matrix has 5 x 100 = 500 rows (documents) and 5 x 10 = 50 columns (words).\n\n```python\n\u003e\u003e\u003e from __future__ import print_function\n\u003e\u003e\u003e from bnp.online_hdp import HierarchicalDirichletProcess\n\u003e\u003e\u003e from bnp.utils import make_doc_word_matrix\n\n\u003e\u003e\u003e tf = make_doc_word_matrix(n_topics=5,\n...                           words_per_topic=10,\n...                           docs_per_topic=100,\n...                           words_per_doc=20,\n...                           shuffle=True,\n...                           
random_state=0)\n\u003e\u003e\u003e tf.shape\n(500, 50)\n```\n\nEach row (document) of the matrix contains words from a single topic only (words 0 to 9 belong to topic 1, words 10 to 19 to topic 2, and so on). `tf` is a sparse matrix, so we call `toarray()` to display a row. For example, the first document only uses words from topic 2 and the second only uses words from topic 4:\n\n```python\n\u003e\u003e\u003e tf[0].toarray()\narray([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 1, 4, 1, 2, 3, 3, 0, 0,\n        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n        0, 0, 0, 0, 0, 0]])\n\u003e\u003e\u003e tf[1].toarray()\narray([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n        0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 3, 1, 3, 2, 1, 2, 0, 3, 0, 0, 0, 0,\n        0, 0, 0, 0, 0, 0]])\n```\n\nNext, we fit an HDP model on this matrix. `n_topic_truncate` is the truncation level for the number of corpus-level topics (so the fitted model keeps at most 10 topics), and `n_doc_truncate` is the truncation level for the number of topics within each document:\n\n```python\n\u003e\u003e\u003e hdp = HierarchicalDirichletProcess(n_topic_truncate=10,\n...                                    n_doc_truncate=3,\n...                                    max_iter=5,\n...                                    random_state=0)\n\u003e\u003e\u003e hdp.fit(tf)\n```\n\nThen we can print each topic's proportion and its top words:\n\n```python\n# helper that prints each topic's proportion and its `n_words` highest-weighted words\n\u003e\u003e\u003e def print_top_words(model, n_words):\n...     topic_distr = model.topic_distribution()\n...     for topic_idx in range(model.lambda_.shape[0]):\n...         topic = model.lambda_[topic_idx, :]\n...         message = \"Topic %d (proportion: %.2f): \" % (topic_idx, topic_distr[topic_idx])\n...         message += \" \".join([str(i) for i in topic.argsort()[:-n_words - 1:-1]])\n...         
print(message)\n\n\u003e\u003e\u003e print_top_words(hdp, 10)\nTopic 0 (proportion: 0.20): 3 1 7 5 8 4 0 2 9 6\nTopic 1 (proportion: 0.00): 49 12 22 21 20 19 18 17 16 15\nTopic 2 (proportion: 0.04): 43 49 44 45 47 40 46 48 41 42\nTopic 3 (proportion: 0.13): 14 18 10 15 16 12 17 19 11 13\nTopic 4 (proportion: 0.07): 19 16 10 15 11 17 12 13 18 14\nTopic 5 (proportion: 0.01): 23 29 28 20 21 25 26 24 27 22\nTopic 6 (proportion: 0.01): 31 38 35 39 30 33 34 37 32 36\nTopic 7 (proportion: 0.19): 35 31 39 30 33 38 32 34 36 37\nTopic 8 (proportion: 0.16): 48 42 46 49 45 47 41 44 40 43\nTopic 9 (proportion: 0.19): 21 29 28 23 20 24 26 27 25 22\n```\n\nHere HDP finds 7 large topics (proportion \u003e 0.01), and each of them maps to one of the 5 hidden topics generated earlier: topic 0 covers words 0 to 9, topics 3 and 4 cover words 10 to 19, topic 9 covers words 20 to 29, topic 7 covers words 30 to 39, and topics 2 and 8 cover words 40 to 49. The remaining topics have proportions of 0.01 or less.\n\n\nExamples\n--------\nMore examples are in the `bnp/examples` folder (an IPython notebook will be added soon).\n\n\nRunning Tests:\n--------------\n```\npython setup.py test\n```\n\nUninstall:\n----------\n```\npip uninstall bnp\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchyikwei%2Fbnp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchyikwei%2Fbnp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchyikwei%2Fbnp/lists"}