{"id":13696325,"url":"https://github.com/lda-project/lda","last_synced_at":"2025-05-14T16:12:20.270Z","repository":{"id":20529330,"uuid":"23808489","full_name":"lda-project/lda","owner":"lda-project","description":"Topic modeling with latent Dirichlet allocation using Gibbs sampling","archived":false,"fork":false,"pushed_at":"2024-07-29T19:05:40.000Z","size":521,"stargazers_count":1274,"open_issues_count":0,"forks_count":388,"subscribers_count":47,"default_branch":"develop","last_synced_at":"2025-05-11T10:17:41.358Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://lda.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lda-project.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-09-08T21:11:26.000Z","updated_at":"2025-04-24T07:42:11.000Z","dependencies_parsed_at":"2023-11-28T23:29:28.829Z","dependency_job_id":"b42559e2-3bc1-4e6d-ba42-98695993564e","html_url":"https://github.com/lda-project/lda","commit_stats":{"total_commits":157,"total_committers":7,"mean_commits":"22.428571428571427","dds":"0.32484076433121023","last_synced_commit":"b8afeda0bf740a86ce1f11f48f2a6a3735b978be"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lda-project%2Flda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lda-project%2Flda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lda-project%2Flda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lda-project%2Flda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lda-project","download_url":"https://codeload.github.com/lda-project/lda/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254179905,"owners_count":22027884,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T18:00:38.490Z","updated_at":"2025-05-14T16:12:20.252Z","avatar_url":"https://github.com/lda-project.png","language":"Python","funding_links":[],"categories":["Models"],"sub_categories":["Latent Dirichlet Allocation (LDA) [:page_facing_up:](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)"],"readme":"lda: Topic modeling with latent Dirichlet allocation\n====================================================\n\n|pypi| |actions| |zenodo|\n\n**NOTE: This package is in maintenance mode. Critical bugs will be fixed. No new features will be added.**\n\n\n``lda`` implements latent Dirichlet allocation (LDA) using collapsed Gibbs\nsampling. ``lda`` is fast and is tested on Linux, OS X, and Windows.\n\nYou can read more about lda in `the documentation \u003chttps://lda.readthedocs.io\u003e`_.\n\nInstallation\n------------\n\n``pip install lda``\n\nGetting started\n---------------\n\n``lda.LDA`` implements latent Dirichlet allocation (LDA). The interface follows\nconventions found in scikit-learn_.\n\nThe following demonstrates how to inspect a model of a subset of the Reuters\nnews dataset. The input below, ``X``, is a document-term matrix (sparse matrices\nare accepted).\n\n.. code-block:: python\n\n    \u003e\u003e\u003e import numpy as np\n    \u003e\u003e\u003e import lda\n    \u003e\u003e\u003e import lda.datasets\n    \u003e\u003e\u003e X = lda.datasets.load_reuters()\n    \u003e\u003e\u003e vocab = lda.datasets.load_reuters_vocab()\n    \u003e\u003e\u003e titles = lda.datasets.load_reuters_titles()\n    \u003e\u003e\u003e X.shape\n    (395, 4258)\n    \u003e\u003e\u003e X.sum()\n    84010\n    \u003e\u003e\u003e model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)\n    \u003e\u003e\u003e model.fit(X)  # model.fit_transform(X) is also available\n    \u003e\u003e\u003e topic_word = model.topic_word_  # model.components_ also works\n    \u003e\u003e\u003e n_top_words = 8\n    \u003e\u003e\u003e for i, topic_dist in enumerate(topic_word):\n    ...     topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]\n    ...     print('Topic {}: {}'.format(i, ' '.join(topic_words)))\n\n    Topic 0: british churchill sale million major letters west britain\n    Topic 1: church government political country state people party against\n    Topic 2: elvis king fans presley life concert young death\n    Topic 3: yeltsin russian russia president kremlin moscow michael operation\n    Topic 4: pope vatican paul john surgery hospital pontiff rome\n    Topic 5: family funeral police miami versace cunanan city service\n    Topic 6: simpson former years court president wife south church\n    Topic 7: order mother successor election nuns church nirmala head\n    Topic 8: charles prince diana royal king queen parker bowles\n    Topic 9: film french france against bardot paris poster animal\n    Topic 10: germany german war nazi letter christian book jews\n    Topic 11: east peace prize award timor quebec belo leader\n    Topic 12: n't life show told very love television father\n    Topic 13: years year time last church world people say\n    Topic 14: mother teresa heart calcutta charity nun hospital missionaries\n    Topic 15: city salonika capital buddhist cultural vietnam byzantine show\n    Topic 16: music tour opera singer israel people film israeli\n    Topic 17: church catholic bernardin cardinal bishop wright death cancer\n    Topic 18: harriman clinton u.s ambassador paris president churchill france\n    Topic 19: city museum art exhibition century million churches set\n\nThe document-topic distributions are available in ``model.doc_topic_``.\n\n.. code-block:: python\n\n    \u003e\u003e\u003e doc_topic = model.doc_topic_\n    \u003e\u003e\u003e for i in range(10):\n    ...     print(\"{} (top topic: {})\".format(titles[i], doc_topic[i].argmax()))\n    0 UK: Prince Charles spearheads British royal revolution. LONDON 1996-08-20 (top topic: 8)\n    1 GERMANY: Historic Dresden church rising from WW2 ashes. DRESDEN, Germany 1996-08-21 (top topic: 13)\n    2 INDIA: Mother Teresa's condition said still unstable. CALCUTTA 1996-08-23 (top topic: 14)\n    3 UK: Palace warns British weekly over Charles pictures. LONDON 1996-08-25 (top topic: 8)\n    4 INDIA: Mother Teresa, slightly stronger, blesses nuns. CALCUTTA 1996-08-25 (top topic: 14)\n    5 INDIA: Mother Teresa's condition unchanged, thousands pray. CALCUTTA 1996-08-25 (top topic: 14)\n    6 INDIA: Mother Teresa shows signs of strength, blesses nuns. CALCUTTA 1996-08-26 (top topic: 14)\n    7 INDIA: Mother Teresa's condition improves, many pray. CALCUTTA, India 1996-08-25 (top topic: 14)\n    8 INDIA: Mother Teresa improves, nuns pray for \"miracle\". CALCUTTA 1996-08-26 (top topic: 14)\n    9 UK: Charles under fire over prospect of Queen Camilla. LONDON 1996-08-26 (top topic: 8)\n\n\nRequirements\n------------\n\nPython ≥3.10 and NumPy.\n\nCaveat\n------\n\n``lda`` aims for simplicity. (It happens to be fast, as essential parts are\nwritten in C via Cython_.) If you are working with a very large corpus you may\nwish to use more sophisticated topic models such as those implemented in hca_\nand MALLET_.  hca_ is written entirely in C and MALLET_ is written in Java.\nUnlike ``lda``, hca_ can use more than one processor at a time. Both MALLET_ and\nhca_ implement topic models known to be more robust than standard latent\nDirichlet allocation.\n\nNotes\n-----\n\nLatent Dirichlet allocation is described in `Blei et al. (2003)`_ and `Pritchard\net al. (2000)`_. Inference using collapsed Gibbs sampling is described in\n`Griffiths and Steyvers (2004)`_.\n\nImportant links\n---------------\n\n- Documentation: http://lda.readthedocs.org\n- Source code: https://github.com/lda-project/lda/\n- Issue tracker: https://github.com/lda-project/lda/issues\n\nOther implementations\n---------------------\n- scikit-learn_'s `LatentDirichletAllocation \u003chttp://scikit-learn.org/dev/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html\u003e`_ (uses online variational inference)\n- `gensim \u003chttps://pypi.python.org/pypi/gensim\u003e`_ (uses online variational inference)\n\nLicense\n-------\n\nlda is licensed under Version 2.0 of the Mozilla Public License.\n\n.. _Python: http://www.python.org/\n.. _scikit-learn: http://scikit-learn.org\n.. _hca: https://www.mloss.org/software/view/527/\n.. _MALLET: http://mallet.cs.umass.edu/\n.. _numpy: http://www.numpy.org/\n.. _pbr: https://pypi.python.org/pypi/pbr\n.. _Cython: http://cython.org\n.. _Blei et al. (2003): http://jmlr.org/papers/v3/blei03a.html\n.. _Pritchard et al. (2000): http://www.genetics.org/content/155/2/945.full\n.. _Griffiths and Steyvers (2004): http://www.pnas.org/content/101/suppl_1/5228.abstract\n\n.. |pypi| image:: https://badge.fury.io/py/lda.png\n    :target: https://pypi.python.org/pypi/lda\n    :alt: pypi version\n\n.. |actions| image:: https://github.com/lda-project/lda/actions/workflows/release.yml/badge.svg\n    :target: https://github.com/lda-project/lda/actions\n    :alt: github actions build status\n\n.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1412135.svg\n    :target: https://doi.org/10.5281/zenodo.1412135\n    :alt: Zenodo citation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flda-project%2Flda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flda-project%2Flda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flda-project%2Flda/lists"}