{"id":14958330,"url":"https://github.com/x-tabdeveloping/topicwizard","last_synced_at":"2025-04-10T20:23:43.479Z","repository":{"id":156304960,"uuid":"538088575","full_name":"x-tabdeveloping/topicwizard","owner":"x-tabdeveloping","description":"Powerful topic model visualization in Python","archived":false,"fork":false,"pushed_at":"2024-08-22T14:59:32.000Z","size":110419,"stargazers_count":101,"open_issues_count":4,"forks_count":13,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-29T23:33:35.354Z","etag":null,"topics":["dash","machine","mantine","plotly","plotly-dash","scikit-learn","sklearn","tailwindcss","topic-modeling","visualization"],"latest_commit_sha":null,"homepage":"https://x-tabdeveloping.github.io/topicwizard/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/x-tabdeveloping.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"citation.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-18T11:33:15.000Z","updated_at":"2024-10-26T20:12:17.000Z","dependencies_parsed_at":"2024-11-13T13:21:26.746Z","dependency_job_id":null,"html_url":"https://github.com/x-tabdeveloping/topicwizard","commit_stats":{"total_commits":253,"total_committers":5,"mean_commits":50.6,"dds":"0.015810276679841917","last_synced_commit":"4066f5e636d8718c6eb54872ca9e85fb3e9b285d"},"previous_names":["x-tabdeveloping/topicwizard","x-tabdeveloping/topic-wizard"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Ftopicwizard","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/x-tabdeveloping%2Ftopicwizard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Ftopicwizard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Ftopicwizard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/x-tabdeveloping","download_url":"https://codeload.github.com/x-tabdeveloping/topicwizard/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248289997,"owners_count":21078923,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dash","machine","mantine","plotly","plotly-dash","scikit-learn","sklearn","tailwindcss","topic-modeling","visualization"],"created_at":"2024-09-24T13:16:47.160Z","updated_at":"2025-04-10T20:23:43.452Z","avatar_url":"https://github.com/x-tabdeveloping.png","language":"Python","readme":"\u003cimg align=\"left\" width=\"82\" height=\"82\" src=\"assets/logo.svg\"\u003e\n\n# topicwizard\n\n#### [Try in :hugs: Spaces](https://huggingface.co/spaces/kardosdrur/topicwizard_20newsgroups_KeyNMF)\n\n\n\u003cbr\u003e\n\nPretty and opinionated topic model visualization in Python.\n\n[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/x-tabdeveloping/topic-wizard/blob/main/examples/basic_usage.ipynb)\n[![PyPI version](https://badge.fury.io/py/topic-wizard.svg)](https://pypi.org/project/topic-wizard/)\n[![pip 
downloads](https://img.shields.io/pypi/dm/topic-wizard.svg)](https://pypi.org/project/topic-wizard/)\n[![python version](https://img.shields.io/badge/Python-%3E=3.8-blue)](https://github.com/centre-for-humanities-computing/tweetopic)\n[![Code style: black](https://img.shields.io/badge/Code%20Style-Black-black)](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html)\n\u003cbr\u003e\n\n\nhttps://github.com/x-tabdeveloping/topicwizard/assets/13087737/9736f33c-6865-4ed4-bc17-d8e6369bda80\n\n## New in version 1.1.3\n\nYou can now specify the font that should be used for wordclouds.\nThis makes topicwizard usable with Chinese and other non-Indo-European scripts.\n\n```python\ntopicwizard.visualize(topic_data=topic_data, wordcloud_font_path=\"NotoSansTC-Bold.ttf\")\n```\n\n## New in version 1.1.0 🌟\n\n### Easier Deployment and Faster Cold Starts\n\nIf you want to deploy topicwizard with a fitted topic model, you can now produce a Docker deployment folder with `easy_deploy()`.\n\n```python\nimport joblib\nimport topicwizard\n\n# Load previously produced topic_data object\ntopic_data = joblib.load(\"topic_data.joblib\")\n\ntopicwizard.easy_deploy(topic_data, dest_dir=\"deployment\", port=7860)\n```\n\nThis will put everything you need in the `deployment/` directory, and will work out of the box on cloud platforms or HuggingFace Spaces.\n\nCold starts are now faster, as UMAP projections can be precomputed.\n\n```python\ntopic_data_w_positions = topicwizard.precompute_positions(topic_data)\n```\n\nYou can try a deployment produced with `easy_deploy()` on [:hugs: Spaces](https://huggingface.co/spaces/kardosdrur/topicwizard_20newsgroups_KeyNMF)\n\n## Features\n\n-   Investigate complex relations between topics, words, documents and groups/genres/labels interactively\n-   Easy-to-use pipelines for classical topic models that can be utilized for downstream tasks\n-   Sklearn, Turftopic, Gensim and BERTopic compatible  
:nut_and_bolt:\n-   Interactive and composable Plotly figures\n-   Rename topics at will\n-   Share your results\n-   Easy deployment :earth_africa:\n\n## Installation\n\nInstall from PyPI:\n\n\u003e Notice that the package name on PyPI contains a dash: `topic-wizard` instead of `topicwizard`.\n\u003e I know it's a bit confusing, sorry about that.\n\n```bash\npip install topic-wizard\n```\n\n## [Classical Topic Models](https://x-tabdeveloping.github.io/topicwizard/usage.pipelines.html)\n\nThe main abstraction in topicwizard for classical/bag-of-words models is a topic pipeline,\nwhich consists of a vectorizer that turns texts into bag-of-words\nrepresentations, and a topic model that decomposes these representations into vectors of topic importance.\ntopicwizard allows you to use either scikit-learn pipelines or its own `TopicPipeline`.\n\n\u003cimg align=\"right\" width=\"300\" src=\"https://x-tabdeveloping.github.io/topicwizard/_images/pipeline.png\"\u003e\n\n\nLet's build a pipeline. We will use scikit-learn's `CountVectorizer` as our vectorizer component:\n```python\nfrom sklearn.feature_extraction.text import CountVectorizer\n\nvectorizer = CountVectorizer(min_df=5, max_df=0.8, stop_words=\"english\")\n```\nThe topic model I will use for this example is Non-negative Matrix Factorization, as it is fast and usually finds good topics.\n```python\nfrom sklearn.decomposition import NMF\n\nmodel = NMF(n_components=10)\n```\nThen let's put this all together in a pipeline. 
You can either use sklearn Pipelines...\n```python\nfrom sklearn.pipeline import make_pipeline\n\ntopic_pipeline = make_pipeline(vectorizer, model)\n```\n\nOr topicwizard's [TopicPipeline](https://x-tabdeveloping.github.io/topicwizard/usage.pipelines.html#topicpipeline)\n\n```python\nfrom topicwizard.pipeline import make_topic_pipeline\n\ntopic_pipeline = make_topic_pipeline(vectorizer, model)\n```\n\nYou can also turn an already existing pipeline into a `TopicPipeline`.\n\n```python\nfrom topicwizard.pipeline import TopicPipeline\n\n# `pipeline` stands for any scikit-learn Pipeline you already have\ntopic_pipeline = TopicPipeline.from_pipeline(pipeline)\n```\n\nLet's load a corpus that we would like to analyze. In this example I will use 20newsgroups from sklearn.\n\n```python\nfrom sklearn.datasets import fetch_20newsgroups\n\nnewsgroups = fetch_20newsgroups(subset=\"all\")\ncorpus = newsgroups.data\n\n# Sklearn gives the labels back as integers, so we have to map them back to\n# the actual textual labels.\ngroup_labels = [newsgroups.target_names[label] for label in newsgroups.target]\n```\n\nThen let's fit our pipeline to this data:\n```python\ntopic_pipeline.fit(corpus)\n```\n\n\u003e Models do not necessarily have to be fitted before visualizing; topicwizard fits the model automatically on the corpus if it isn't prefitted.\n\nThen launch the topicwizard web app to interpret the model.\n\n```python\nimport topicwizard\n\ntopicwizard.visualize(corpus, model=topic_pipeline)\n```\n\n### Gensim\n\nYou can also use your gensim topic models in topicwizard by wrapping them in a `TopicPipeline`.\n\n```python\nfrom gensim.corpora.dictionary import Dictionary\nfrom gensim.models import LdaModel\nfrom topicwizard.compatibility import gensim_pipeline\n\ntexts: list[list[str]] = [\n    ['computer', 'time', 'graph'],\n    ['survey', 'response', 'eps'],\n    ['human', 'system', 'computer'],\n    ...\n]\n\ndictionary = Dictionary(texts)\nbow_corpus = [dictionary.doc2bow(text) for text in texts]\nlda = LdaModel(bow_corpus, 
num_topics=10)\n\npipeline = gensim_pipeline(dictionary, model=lda)\n# Then you can use the pipeline as usual\ncorpus = [\" \".join(text) for text in texts]\ntopicwizard.visualize(corpus, model=pipeline)\n```\n\n## Contextually Sensitive Models *(New in 1.0.0)*\n\ntopicwizard can also help you interpret topic models that understand contextual nuances in text, by utilizing representations from [sentence transformers](https://www.sbert.net/).\nThe package is mainly designed to be compatible with [turftopic](https://github.com/x-tabdeveloping/turftopic),\nwhich to my knowledge contains the broadest range of contextually sensitive models,\nbut we also provide compatibility with [BERTopic](https://maartengr.github.io/BERTopic/index.html).\n\nHere's an example of interpreting a [Semantic Signal Separation](https://x-tabdeveloping.github.io/turftopic/s3/) model over the same corpus.\n\n```python\nimport topicwizard\nfrom turftopic import SemanticSignalSeparation\n\nmodel = SemanticSignalSeparation(n_components=10)\ntopicwizard.visualize(corpus, model=model)\n```\n\nYou can also use BERTopic models by wrapping them in a compatibility layer:\n\n```python\nfrom bertopic import BERTopic\nfrom topicwizard.compatibility import BERTopicWrapper\n\nmodel = BERTopicWrapper(BERTopic(language=\"english\"))\ntopicwizard.visualize(corpus, model=model)\n```\n\nThe documentation also includes examples of how you can construct Top2Vec and CTM models in turftopic,\nor you can write your own wrapper quite easily if needed.\n\n## [Web Application](https://x-tabdeveloping.github.io/topicwizard/application.html)\n\nYou can launch the topicwizard web application to interactively investigate your topic models. 
The app is also quite easy to [deploy](https://x-tabdeveloping.github.io/topicwizard/usage.deployment.html) in case you want to create a client-facing interface.\n\n```python\nimport topicwizard\n\ntopicwizard.visualize(corpus, model=topic_pipeline)\n```\n\nFrom version 0.3.0 you can also disable pages you do not wish to display, which can save you a lot of time:\n\n```python\n# Computing 2D projections for a large corpus takes a long time,\n# so you can speed up preprocessing by disabling the documents page altogether.\ntopicwizard.visualize(corpus, model=topic_pipeline, exclude_pages=[\"documents\"])\n```\n\n| [Topics](https://x-tabdeveloping.github.io/topicwizard/usage.topics.html) | [Words](https://x-tabdeveloping.github.io/topicwizard/usage.words.html) |\n| :----: | :----: |\n| ![topics screenshot](assets/screenshot_topics.png) | ![words screenshot](assets/screenshot_words.png)  |\n\n| [Documents](https://x-tabdeveloping.github.io/topicwizard/usage.documents.html) | [Groups](https://x-tabdeveloping.github.io/topicwizard/usage.groups.html) |\n| :----: | :----: |\n| ![documents screenshot](assets/screenshot_documents.png) | ![groups screenshot](docs/_static/screenshot_groups.png) |\n\n## TopicData\n\nAll compatible models in topicwizard have a `prepare_topic_data()` method, which produces a `TopicData` object containing information about topical inference and model fit on a given corpus.\n\n`TopicData` is in essence a typed dictionary containing all the information needed for interactive visualization in topicwizard. 
\n\nYou can produce this data with `TopicPipeline`:\n\n```python\npipeline = make_topic_pipeline(CountVectorizer(), NMF(n_components=10))\ntopic_data = pipeline.prepare_topic_data(corpus)\n```\n\nAnd with contextual models:\n```python\nmodel = SemanticSignalSeparation(n_components=10)\ntopic_data = model.prepare_topic_data(corpus)\n\n# or with BERTopic\nmodel = BERTopicWrapper(BERTopic())\ntopic_data = model.prepare_topic_data(corpus)\n```\n\n`TopicData` can then be used to spin up the web application.\n\n```python\nimport topicwizard\n\ntopicwizard.visualize(topic_data=topic_data)\n```\n\nThis data structure can be serialized, saved and shared.\ntopicwizard uses `joblib` for serializing the data.\n\n\u003e Beware that topicwizard 1.0.0 is no longer fully backwards compatible with the old topic data files.\n\u003e No need to panic: you can either construct `TopicData` manually from the old data structures, or try to run the app anyway.\n\u003e It will probably work just fine, but certain functionality might be missing.\n\n\n```python\nimport joblib\nfrom topicwizard.data import TopicData\n\n# Save the data\njoblib.dump(topic_data, \"topic_data.joblib\")\n\n# Load the data\n# (The type annotation is just for type checking, it doesn't do anything)\ntopic_data: TopicData = joblib.load(\"topic_data.joblib\")\n```\n\n\u003e When sharing across machines, make sure that everyone is on the same page with versions of the different packages.\n\u003e For example, if the inference machine has `scikit-learn==1.2.0`, it's advisable that you have a compatible version on the server, otherwise deserialization might fail.\n\u003e The same goes for BERTopic and turftopic, of course.\n\nIn fact, this serialization is exactly what happens in the background when you click the download button in the application.\n\nThis is useful because you might want a local copy of the results of an inference run performed on a server, or you might want to run inference on a different machine from the one that is used to deploy 
the application.\n\n## [Figures](https://x-tabdeveloping.github.io/topicwizard/api_reference.html#module-topicwizard.figures)\n\nIf you want customizable, faster, html-saveable interactive plots, you can use the figures API.\n\nAll figures are produced from a `TopicData` object so that you don't have to run inference twice on the same corpus for two different figures.\n\nHere are a couple of examples:\n\n```python\nfrom topicwizard.figures import word_map, document_topic_timeline, topic_wordclouds, word_association_barchart\n```\n\n| Word Map | Timeline of Topics in a Document | \n| :----: | :----: |\n| `word_map(topic_data)` | `document_topic_timeline(topic_data, \"Joe Biden takes over presidential office from Donald Trump.\")` |\n| ![word map screenshot](assets/word_map.png) | ![doc_timeline](https://github.com/x-tabdeveloping/topic-wizard/assets/13087737/cf1faceb-e8ef-411f-80cd-a2a58befcf99) |\n\n| Wordclouds of Topics | Topic for Word Importance |\n| :----: | :----: |\n| `topic_wordclouds(topic_data)` | `word_association_barchart(topic_data, [\"supreme\", \"court\"])` |\n| ![wordclouds](assets/topic_wordclouds.png) | ![topic_word_imp](https://github.com/x-tabdeveloping/topic-wizard/assets/13087737/0767b631-9e83-42cf-8796-8536abc486d0) |\n\nFigures in topicwizard are in essence just Plotly interactive figures and they can be modified at will.\nConsult [Plotly's documentation](https://plotly.com/python/) for more details about manipulating and building plots.\n\nFor more information consult our [Documentation](https://x-tabdeveloping.github.io/topicwizard/index.html)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fx-tabdeveloping%2Ftopicwizard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fx-tabdeveloping%2Ftopicwizard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fx-tabdeveloping%2Ftopicwizard/lists"}