{"id":15698657,"url":"https://github.com/x-tabdeveloping/turftopic","last_synced_at":"2025-05-08T22:15:05.465Z","repository":{"id":210585919,"uuid":"724528545","full_name":"x-tabdeveloping/turftopic","owner":"x-tabdeveloping","description":"Robust and fast topic models with sentence-transformers.","archived":false,"fork":false,"pushed_at":"2025-05-07T08:30:15.000Z","size":34046,"stargazers_count":48,"open_issues_count":9,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-08T22:14:53.502Z","etag":null,"topics":["contextual","llm","topic-modeling","transformers"],"latest_commit_sha":null,"homepage":"https://x-tabdeveloping.github.io/turftopic/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/x-tabdeveloping.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"citation.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-28T09:09:47.000Z","updated_at":"2025-05-06T11:46:31.000Z","dependencies_parsed_at":"2024-01-06T13:44:05.152Z","dependency_job_id":"cdb897a3-f026-4005-8b8a-68358334d248","html_url":"https://github.com/x-tabdeveloping/turftopic","commit_stats":null,"previous_names":["x-tabdeveloping/turftopic"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fturftopic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fturftopic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fturftopic/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fturftopic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/x-tabdeveloping","download_url":"https://codeload.github.com/x-tabdeveloping/turftopic/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253154977,"owners_count":21862623,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["contextual","llm","topic-modeling","transformers"],"created_at":"2024-10-03T19:31:52.183Z","updated_at":"2025-05-08T22:15:05.425Z","avatar_url":"https://github.com/x-tabdeveloping.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n\u003cimg align=\"center\" height=\"200\" src=\"assets/logo_w_text.svg\"\u003e\n\u003cbr\u003e\n \u003cb\u003eTopic modeling is your turf too.\u003c/b\u003e \u003cbr\u003e \u003ci\u003e Contextual topic models with representations from transformers. 
\u003c/i\u003e\u003c/p\u003e\n\n\n## Features\n| | |\n| - | - |\n| SOTA Transformer-based Topic Models | :compass: [S³](https://x-tabdeveloping.github.io/turftopic/s3/), :key: [KeyNMF](https://x-tabdeveloping.github.io/turftopic/KeyNMF/),  :gem: [GMM](https://x-tabdeveloping.github.io/turftopic/GMM/), [Clustering Models](https://x-tabdeveloping.github.io/turftopic/GMM/), [CTMs](https://x-tabdeveloping.github.io/turftopic/ctm/), [FASTopic](https://x-tabdeveloping.github.io/turftopic/FASTopic/) |\n| Models for all Scenarios | :chart_with_upwards_trend: [Dynamic](https://x-tabdeveloping.github.io/turftopic/dynamic/), :ocean: [Online](https://x-tabdeveloping.github.io/turftopic/online/), :herb: [Seeded](https://x-tabdeveloping.github.io/turftopic/seeded/), and :evergreen_tree: [Hierarchical](https://x-tabdeveloping.github.io/turftopic/hierarchical/) topic modeling |\n| [Easy Interpretation](https://x-tabdeveloping.github.io/turftopic/model_interpretation/) | :bookmark_tabs: Pretty Printing, :bar_chart: Interactive Figures, :art: [topicwizard](https://github.com/x-tabdeveloping/topicwizard) compatible |\n| [Topic Naming](https://x-tabdeveloping.github.io/turftopic/namers/) | :robot: LLM-based, N-gram Retrieval, :wave: Manual |\n| [Informative Topic Descriptions](https://x-tabdeveloping.github.io/turftopic/vectorizers/) | :key: Keyphrases, Noun-phrases, Lemmatization, Stemming |\n\n\n## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/)\n[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb)\n\n### Installation\n\nTurftopic can be installed from PyPI.\n\n```bash\npip install turftopic\n```\n\nIf you intend to use CTMs, make sure to install the package with Pyro as an optional dependency.\n\n```bash\npip install turftopic[pyro-ppl]\n```\n\n### Fitting a Model\n\nTurftopic's models follow the scikit-learn API 
conventions, and as such they are quite easy to use if you are familiar with\nscikit-learn workflows.\n\nHere's an example of how to use KeyNMF, one of our models, on the 20Newsgroups dataset from scikit-learn.\n\n```python\nfrom sklearn.datasets import fetch_20newsgroups\n\nnewsgroups = fetch_20newsgroups(\n    subset=\"all\",\n    remove=(\"headers\", \"footers\", \"quotes\"),\n)\ncorpus = newsgroups.data\n```\n\nWith the corpus loaded, you can fit a model with 20 topics on it.\n\n```python\nfrom turftopic import KeyNMF\n\nmodel = KeyNMF(20).fit(corpus)\n```\n\n### Interpreting Models\n\nTurftopic comes with a number of pretty printing utilities for interpreting the models.\n\nTo see the most important words for each topic, use the `print_topics()` method.\n\n```python\nmodel.print_topics()\n```\n\n\u003ccenter\u003e\n\n| Topic ID | Top 10 Words                                                                                    |\n| -------- | ----------------------------------------------------------------------------------------------- |\n|        0 | armenians, armenian, armenia, turks, turkish, genocide, azerbaijan, soviet, turkey, azerbaijani |\n|        1 | sale, price, shipping, offer, sell, prices, interested, 00, games, selling                      |\n|        2 | christians, christian, bible, christianity, church, god, scripture, faith, jesus, sin           |\n|        3 | encryption, chip, clipper, nsa, security, secure, privacy, encrypted, crypto, cryptography      |\n|         | ....                                
|\n\n\n\u003c/center\u003e\n\n```python\n# Compute document-topic importances with the fitted model\ndocument_topic_matrix = model.transform(corpus)\n\n# Print highest ranking documents for topic 0\nmodel.print_representative_documents(0, corpus, document_topic_matrix)\n```\n\n\u003ccenter\u003e\n\n| Document                                                                                             | Score |\n| -----------------------------------------------------------------------------------------------------| ----- |\n| Poor 'Poly'. I see you're preparing the groundwork for yet another retreat from your...              |  0.40 |\n| Then you must be living in an alternate universe. Where were they? An Appeal to Mankind During the... |  0.40 |\n| It is 'Serdar', 'kocaoglan'. Just love it. Well, it could be your head wasn't screwed on just right... |  0.39 |\n\n\u003c/center\u003e\n\n```python\nmodel.print_topic_distribution(\n    \"I think guns should definitely banned from all public institutions, such as schools.\"\n)\n```\n\n\u003ccenter\u003e\n\n| Topic name                                | Score |\n| ----------------------------------------- | ----- |\n| 7_gun_guns_firearms_weapons               |  0.05 |\n| 17_mail_address_email_send                |  0.00 |\n| 3_encryption_chip_clipper_nsa             |  0.00 |\n| 19_baseball_pitching_pitcher_hitter       |  0.00 |\n| 11_graphics_software_program_3d           |  0.00 |\n\n\u003c/center\u003e\n\n#### Automated Topic Naming\n\nTurftopic now allows you to automatically assign human readable names to topics using LLMs or n-gram retrieval!\n\n```python\nfrom turftopic import KeyNMF\nfrom turftopic.namers import OpenAITopicNamer\n\nmodel = KeyNMF(10).fit(corpus)\n\nnamer = OpenAITopicNamer(\"gpt-4o-mini\")\nmodel.rename_topics(namer)\nmodel.print_topics()\n```\n\n| Topic ID | Topic Name | Highest Ranking |\n| - | - | - |\n| 0 | Operating Systems and Software  | windows, dos, os, ms, microsoft, unix, nt, memory, program, apps |\n| 1 | Atheism and Belief Systems | atheism, atheist, atheists, belief, religion, religious, 
theists, beliefs, believe, faith |\n| 2 | Computer Architecture and Performance | motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance |\n| 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot |\n| | ... |\n\n### Vectorizers Module\n\nYou can use a set of custom vectorizers for topic modeling over **phrases**, as well as **lemmata** and **stems**.\n\n```python\nfrom turftopic import KeyNMF\nfrom turftopic.vectorizers.spacy import NounPhraseCountVectorizer\n\nmodel = KeyNMF(\n    n_components=10,\n    vectorizer=NounPhraseCountVectorizer(\"en_core_web_sm\"),\n)\nmodel.fit(corpus)\nmodel.print_topics()\n```\n\n| Topic ID | Highest Ranking |\n| - | - |\n| | ... |\n| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |\n| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |\n| | ... 
|\n\n### Visualization\n\nTurftopic does not come with built-in visualization utilities, but [topicwizard](https://github.com/x-tabdeveloping/topicwizard), an interactive topic model visualization library, is compatible with all models from Turftopic.\n\n```bash\npip install topic-wizard\n```\n\nBy far the easiest way to visualize your models for interpretation is to launch the topicwizard web app.\n\n```python\nimport topicwizard\n\ntopicwizard.visualize(corpus, model=model)\n```\n\n\u003cfigure\u003e\n  \u003cimg src=\"https://x-tabdeveloping.github.io/topicwizard/_images/screenshot_topics.png\" width=\"70%\" style=\"margin-left: auto;margin-right: auto;\"\u003e\n  \u003cfigcaption\u003eScreenshot of the topicwizard Web Application\u003c/figcaption\u003e\n\u003c/figure\u003e\n\nAlternatively, you can use the [Figures API](https://x-tabdeveloping.github.io/topicwizard/figures.html) in topicwizard for individual HTML figures.\n\n## References\n- Kardos, M., Kostkan, J., Vermillet, A., Nielbo, K., Enevoldsen, K., \u0026 Rocca, R. (2024, June 13). $S^3$ - Semantic Signal separation. arXiv.org. https://arxiv.org/abs/2406.09556\n- Wu, X., Nguyen, T., Zhang, D. C., Wang, W. Y., \u0026 Luu, A. T. (2024). FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm. ArXiv Preprint ArXiv:2405.17978.\n - Grootendorst, M. (2022, March 11). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv.org. https://arxiv.org/abs/2203.05794\n - Angelov, D. (2020, August 19). Top2VEC: Distributed representations of topics. arXiv.org. https://arxiv.org/abs/2008.09470\n - Bianchi, F., Terragni, S., \u0026 Hovy, D. (2020, April 8). Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence. arXiv.org. https://arxiv.org/abs/2004.03974\n - Bianchi, F., Terragni, S., Hovy, D., Nozza, D., \u0026 Fersini, E. (2021). Cross-lingual Contextualized Topic Models with Zero-shot Learning. 
In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 1676–1683). Association for Computational Linguistics.\n - Kristensen-McLachlan, R. D., Hicke, R. M. M., Kardos, M., \u0026 Thunø, M. (2024, October 16). Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media. arXiv.org. https://arxiv.org/abs/2410.12791\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fx-tabdeveloping%2Fturftopic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fx-tabdeveloping%2Fturftopic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fx-tabdeveloping%2Fturftopic/lists"}