{"id":13595070,"url":"https://github.com/MaartenGr/BERTopic","last_synced_at":"2025-04-09T10:32:45.916Z","repository":{"id":37100845,"uuid":"297672263","full_name":"MaartenGr/BERTopic","owner":"MaartenGr","description":"Leveraging BERT and c-TF-IDF to create easily interpretable topics. ","archived":false,"fork":false,"pushed_at":"2025-03-28T15:30:54.000Z","size":26363,"stargazers_count":6639,"open_issues_count":393,"forks_count":803,"subscribers_count":51,"default_branch":"master","last_synced_at":"2025-04-08T07:12:31.172Z","etag":null,"topics":["bert","ldavis","machine-learning","nlp","sentence-embeddings","topic","topic-modeling","topic-modelling","topic-models","transformers"],"latest_commit_sha":null,"homepage":"https://maartengr.github.io/BERTopic/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaartenGr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-22T14:19:29.000Z","updated_at":"2025-04-08T04:05:34.000Z","dependencies_parsed_at":"2022-07-14T06:40:32.690Z","dependency_job_id":"8c816c63-e7b4-4b70-bcf1-d728a4673d4d","html_url":"https://github.com/MaartenGr/BERTopic","commit_stats":{"total_commits":191,"total_committers":77,"mean_commits":"2.4805194805194803","dds":0.6910994764397906,"last_synced_commit":"510c15e0f3f3c6481f21087d9461e5fb5e8b89af"},"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FBERTopic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/MaartenGr%2FBERTopic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FBERTopic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FBERTopic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaartenGr","download_url":"https://codeload.github.com/MaartenGr/BERTopic/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248020593,"owners_count":21034459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","ldavis","machine-learning","nlp","sentence-embeddings","topic","topic-modeling","topic-modelling","topic-models","transformers"],"created_at":"2024-08-01T16:01:43.400Z","updated_at":"2025-04-09T10:32:45.910Z","avatar_url":"https://github.com/MaartenGr.png","language":"Python","funding_links":[],"categories":["Python","Models","文本摘要","Industry Strength Natural Language Processing","Tasks and Methods"],"sub_categories":["Embedding based Topic Models","Topic Modeling"],"readme":"[![PyPI Downloads](https://static.pepy.tech/badge/bertopic)](https://pepy.tech/projects/bertopic)\n[![PyPI - Python](https://img.shields.io/badge/python-v3.9+-blue.svg)](https://pypi.org/project/bertopic/)\n[![Build](https://img.shields.io/github/actions/workflow/status/MaartenGr/BERTopic/testing.yml?branch=master)](https://github.com/MaartenGr/BERTopic/actions)\n[![docs](https://img.shields.io/badge/docs-Passing-green.svg)](https://maartengr.github.io/BERTopic/)\n[![PyPI - 
PyPi](https://img.shields.io/pypi/v/BERTopic)](https://pypi.org/project/bertopic/)\n[![PyPI - License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/MaartenGr/BERTopic/blob/master/LICENSE)\n[![arXiv](https://img.shields.io/badge/arXiv-2203.05794-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2203.05794)\n\n\n# BERTopic\n\n\u003cimg src=\"images/logo.png\" width=\"35%\" height=\"35%\" align=\"right\" /\u003e\n\nBERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters,\nallowing for easily interpretable topics whilst keeping important words in the topic descriptions.\n\nBERTopic supports all kinds of topic modeling techniques:  \n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/guided/guided.html\"\u003eGuided\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/supervised/supervised.html\"\u003eSupervised\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/semisupervised/semisupervised.html\"\u003eSemi-supervised\u003c/a\u003e\u003c/td\u003e\n \u003c/tr\u003e\n   \u003ctr\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/manual/manual.html\"\u003eManual\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/distribution/distribution.html\"\u003eMulti-topic distributions\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/hierarchicaltopics/hierarchicaltopics.html\"\u003eHierarchical\u003c/a\u003e\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/topicsperclass/topicsperclass.html\"\u003eClass-based\u003c/a\u003e\u003c/td\u003e\n    
\u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/topicsovertime/topicsovertime.html\"\u003eDynamic\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/online/online.html\"\u003eOnline/Incremental\u003c/a\u003e\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/multimodal/multimodal.html\"\u003eMultimodal\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/multiaspect/multiaspect.html\"\u003eMulti-aspect\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/representation/llm.html\"\u003eText Generation/LLM\u003c/a\u003e\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html\"\u003eZero-shot \u003cb\u003e(new!)\u003c/b\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/merge/merge.html\"\u003eMerge Models \u003cb\u003e(new!)\u003c/b\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://maartengr.github.io/BERTopic/getting_started/seed_words/seed_words.html\"\u003eSeed Words \u003cb\u003e(new!)\u003c/b\u003e\u003c/a\u003e\u003c/td\u003e\n \u003c/tr\u003e\n\u003c/table\u003e\n\nCorresponding Medium posts can be found [here](https://medium.com/data-science/topic-modeling-with-bert-779f7db187e6?sk=0b5a470c006d1842ad4c8a3057063a99), [here](https://medium.com/data-science/using-whisper-and-bertopic-to-model-kurzgesagts-videos-7d8a63139bdf?sk=b1e0fd46f70cb15e8422b4794a81161d), and [here](https://medium.com/data-science/interactive-topic-modeling-with-bertopic-1ea55e7d73d8?sk=03c2168e9e74b6bda2a1f3ed953427e4). 
For a more detailed overview, you can read the [paper](https://arxiv.org/abs/2203.05794) or see a [brief overview](https://maartengr.github.io/BERTopic/algorithm/algorithm.html) of the algorithm. \n\n## Installation\n\nInstallation, including sentence-transformers, can be done via [PyPI](https://pypi.org/project/bertopic/):\n\n```bash\npip install bertopic\n```\n\nIf you want to install BERTopic with other embedding backends, you can choose one of the following:\n\n```bash\n# Choose an embedding backend\n# (quoting keeps shells such as zsh from treating the brackets as a glob pattern)\npip install \"bertopic[flair,gensim,spacy,use]\"\n\n# Topic modeling with images\npip install \"bertopic[vision]\"\n```\n\nFor a *lightweight installation* without transformers, UMAP, and/or HDBSCAN (for example, when training with Model2Vec or when only running inference), see [this tutorial](https://maartengr.github.io/BERTopic/getting_started/tips_and_tricks/tips_and_tricks.html#lightweight-installation).\n\n## Getting Started\nFor an in-depth overview of BERTopic's features, you can check the [**full documentation**](https://maartengr.github.io/BERTopic/) or follow along \nwith one of the examples below:\n\n| Name  | Link  |\n|---|---|\n| Start Here - **Best Practices in BERTopic**  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BoQ_vakEVtojsd2x_U6-_x52OOuqruj2?usp=sharing)  |\n| **🆕 New!** - Topic Modeling on Large Data (GPU Acceleration)  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1W7aEdDPxC29jP99GGZphUlqjMFFVKtBC?usp=sharing)  |\n| **🆕 New!** - Topic Modeling with Llama 2 🦙 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QCERSMUjqGetGGujdrvv_6_EeoIcd_9M?usp=sharing)  |\n| **🆕 New!** - Topic Modeling with Quantized LLMs | [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DdSHvVPJA3rmNfBWjCo2P1E9686xfxFx?usp=sharing)  |\n| Topic Modeling with BERTopic  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FieRA9fLdkQEGDIMYl0I3MCjSUKVF8C-?usp=sharing)  |\n| (Custom) Embedding Models in BERTopic  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/18arPPe50szvcCp_Y6xS56H2tY0m-RLqv?usp=sharing) |\n| Advanced Customization in BERTopic  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ClTYut039t-LDtlcd-oQAdXWgcsSGTw9?usp=sharing) |\n| (semi-)Supervised Topic Modeling with BERTopic  |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bxizKzv5vfxJEB29sntU__ZC7PBSIPaQ?usp=sharing)  |\n| Dynamic Topic Modeling with Trump's Tweets  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1un8ooI-7ZNlRoK0maVkYhmNRl0XGK88f?usp=sharing)  |\n| Topic Modeling arXiv Abstracts | [![Kaggle](https://img.shields.io/static/v1?style=for-the-badge\u0026message=Kaggle\u0026color=222222\u0026logo=Kaggle\u0026logoColor=20BEFF\u0026label=)](https://www.kaggle.com/maartengr/topic-modeling-arxiv-abstract-with-bertopic) |\n\n\n## Quick Start\nWe start by extracting topics from the well-known 20 newsgroups dataset containing English documents:\n\n```python\nfrom bertopic import BERTopic\nfrom sklearn.datasets import fetch_20newsgroups\n \ndocs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']\n\ntopic_model = BERTopic()\ntopics, probs = topic_model.fit_transform(docs)\n```\n\nAfter generating topics and their probabilities, we can access all of the topics together with their topic 
representations:\n\n```python\n\u003e\u003e\u003e topic_model.get_topic_info()\n\nTopic\tCount\tName\n-1\t4630\t-1_can_your_will_any\n0\t693\t0_windows_drive_dos_file\n1\t466\t1_jesus_bible_christian_faith\n2\t441\t2_space_launch_orbit_lunar\n3\t381\t3_key_encryption_keys_encrypted\n...\n```\n\nThe `-1` topic contains all outlier documents and is typically ignored. Together, the words in a topic describe its underlying theme and can be used \nto interpret that topic. Next, let's take a look at the most frequent topic that was generated:\n\n```python\n\u003e\u003e\u003e topic_model.get_topic(0)\n\n[('windows', 0.006152228076250982),\n ('drive', 0.004982897610645755),\n ('dos', 0.004845038866360651),\n ('file', 0.004140142872194834),\n ('disk', 0.004131678774810884),\n ('mac', 0.003624848635985097),\n ('memory', 0.0034840976976789903),\n ('software', 0.0034415334250699077),\n ('email', 0.0034239554442333257),\n ('pc', 0.003047105930670237)]\n```\n\nUsing `.get_document_info`, we can also extract information at the document level, such as each document's topic, its probability, whether it is a representative document for its topic, etc.:\n\n```python\n\u003e\u003e\u003e topic_model.get_document_info(docs)\n\nDocument                               Topic\tName\t                        Top_n_words                     Probability    ...\nI am sure some bashers of Pens...\t0\t0_game_team_games_season\tgame - team - games...\t        0.200010       ...\nMy brother is in the market for...      -1     -1_can_your_will_any\t        can - your - will...\t        0.420668       ...\nFinally you said what you dream...\t-1     -1_can_your_will_any\t        can - your - will...            0.807259       ...\nThink! 
It's the SCSI card doing...\t49     49_windows_drive_dos_file\twindows - drive - dos...\t0.071746       ...\n1) I have an old Jasmine drive...\t49     49_windows_drive_dos_file\twindows - drive - dos...\t0.038983       ...\n```\n\n**`🔥 Tip`**: Use `BERTopic(language=\"multilingual\")` to select a model that supports 50+ languages. \n\n## Fine-tune Topic Representations\n\nIn BERTopic, there are a number of different [topic representations](https://maartengr.github.io/BERTopic/getting_started/representation/representation.html) that we can choose from. They differ considerably from one another, and each offers its own perspective on the topics. A great start is `KeyBERTInspired`, which for many users increases coherence and reduces the number of stopwords in the resulting topic representations:\n\n```python\nfrom bertopic import BERTopic\nfrom bertopic.representation import KeyBERTInspired\n\n# Fine-tune your topic representations\nrepresentation_model = KeyBERTInspired()\ntopic_model = BERTopic(representation_model=representation_model)\n```\n\nHowever, you might want to use something more powerful to describe your clusters. You can even use ChatGPT or other models from OpenAI to generate labels, summaries, phrases, keywords, and more:\n\n```python\nimport openai\nfrom bertopic import BERTopic\nfrom bertopic.representation import OpenAI\n\n# Fine-tune topic representations with GPT\nclient = openai.OpenAI(api_key=\"sk-...\")\nrepresentation_model = OpenAI(client, model=\"gpt-4o-mini\", chat=True)\ntopic_model = BERTopic(representation_model=representation_model)\n```\n\n**`🔥 Tip`**: Instead of iterating over all of these different topic representations, you can model them simultaneously with [multi-aspect topic representations](https://maartengr.github.io/BERTopic/getting_started/multiaspect/multiaspect.html) in BERTopic. \n\n\n## Visualizations\nAfter training our BERTopic model, we could go through hundreds of topics one by one to get a good \nunderstanding of the topics that were extracted. 
However, that takes quite some time and lacks a global overview. Instead, we can use one of the [many visualization options](https://maartengr.github.io/BERTopic/getting_started/visualization/visualization.html) in BERTopic. \nFor example, we can visualize the generated topics in a way very similar to \n[LDAvis](https://github.com/cpsievert/LDAvis):\n\n```python\ntopic_model.visualize_topics()\n```\n\n\u003cimg src=\"images/topic_visualization.gif\" width=\"60%\" height=\"60%\" align=\"center\" /\u003e\n\n## Modularity\nBy default, the [main steps](https://maartengr.github.io/BERTopic/algorithm/algorithm.html) for topic modeling with BERTopic are sentence-transformers, UMAP, HDBSCAN, and c-TF-IDF, run in sequence. BERTopic assumes some independence between these steps, which makes it quite modular. In other words, BERTopic not only allows you to build your own topic model but also to explore several topic modeling techniques on top of your customized topic model:\n\nhttps://user-images.githubusercontent.com/25746895/218420473-4b2bb539-9dbe-407a-9674-a8317c7fb3bf.mp4\n\nYou can swap out any of these models or even remove them entirely. The following steps are completely modular:\n\n1. [Embedding](https://maartengr.github.io/BERTopic/getting_started/embeddings/embeddings.html) documents\n2. [Reducing dimensionality](https://maartengr.github.io/BERTopic/getting_started/dim_reduction/dim_reduction.html) of embeddings\n3. [Clustering](https://maartengr.github.io/BERTopic/getting_started/clustering/clustering.html) reduced embeddings into topics\n4. [Tokenization](https://maartengr.github.io/BERTopic/getting_started/vectorizers/vectorizers.html) of topics\n5. [Weighting](https://maartengr.github.io/BERTopic/getting_started/ctfidf/ctfidf.html) tokens\n6. 
[Representing topics](https://maartengr.github.io/BERTopic/getting_started/representation/representation.html) with one or [multiple](https://maartengr.github.io/BERTopic/getting_started/multiaspect/multiaspect.html) representations\n\n\n## Functionality\nBERTopic has many functions, which can quickly become overwhelming. To help, you will find below an overview \nof all methods, each with a short description of its purpose. \n\n### Common\nBelow, you will find an overview of common functions in BERTopic. \n\n| Method | Code  | \n|-----------------------|---|\n| Fit the model    |  `.fit(docs)` |\n| Fit the model and predict documents  |  `.fit_transform(docs)` |\n| Predict new documents    |  `.transform([new_doc])` |\n| Access single topic   | `.get_topic(topic=12)`  |\n| Access all topics     |  `.get_topics()` |\n| Get topic frequencies    |  `.get_topic_freq()` |\n| Get all topic information|  `.get_topic_info()` |\n| Get all document information|  `.get_document_info(docs)` |\n| Get representative docs per topic |  `.get_representative_docs()` |\n| Update topic representation | `.update_topics(docs, n_gram_range=(1, 3))` |\n| Generate topic labels | `.generate_topic_labels()` |\n| Set topic labels | `.set_topic_labels(my_custom_labels)` |\n| Merge topics | `.merge_topics(docs, topics_to_merge)` |\n| Reduce number of topics | `.reduce_topics(docs, nr_topics=30)` |\n| Reduce outliers | `.reduce_outliers(docs, topics)` |\n| Find topics | `.find_topics(\"vehicle\")` |\n| Save model    |  `.save(\"my_model\", serialization=\"safetensors\")` |\n| Load model    |  `BERTopic.load(\"my_model\")` |\n| Get parameters |  `.get_params()` |\n\n\n### Attributes\nAfter training your BERTopic model, several attributes are saved within it. In part, these attributes \nreflect how model information is stored on an estimator during fitting. The attributes below all end in `_` and are \npublic, so you can use them to access model information. 
\n\n| Attribute | Description |\n|------------------------|---------------------------------------------------------------------------------------------|\n| `.topics_`               | The topics that are generated for each document after training or updating the topic model. |\n| `.probabilities_` | The probabilities that are generated for each document if HDBSCAN is used. |\n| `.topic_sizes_`           | The size of each topic.                                                                     |\n| `.topic_mapper_`          | A class for tracking topics and their mappings whenever they are merged or reduced.         |\n| `.topic_representations_` | The top *n* terms per topic and their respective c-TF-IDF values.                           |\n| `.c_tf_idf_`              | The topic-term matrix as calculated through c-TF-IDF.                                       |\n| `.topic_aspects_`          | The different aspects, or representations, of each topic.                                  |\n| `.topic_labels_`          | The default labels for each topic.                                                          |\n| `.custom_labels_`         | Custom labels for each topic as generated through `.set_topic_labels`.                      |\n| `.topic_embeddings_`      | The embeddings for each topic if `embedding_model` was used.                                |\n| `.representative_docs_`   | The representative documents for each topic if HDBSCAN is used.                             |\n\n\n### Variations\nTopic modeling can be applied to many different use cases. 
As such, several variations of BERTopic have been developed so that a single package can be used across many use cases.\n\n| Method | Code  | \n|-----------------------|---|\n| [Topic Distribution Approximation](https://maartengr.github.io/BERTopic/getting_started/distribution/distribution.html) | `.approximate_distribution(docs)` |\n| [Online Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/online/online.html) | `.partial_fit(docs)` |\n| [Semi-supervised Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/semisupervised/semisupervised.html) | `.fit(docs, y=y)` |\n| [Supervised Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/supervised/supervised.html) | `.fit(docs, y=y)` |\n| [Manual Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/manual/manual.html) | `.fit(docs, y=y)` |\n| [Multimodal Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/multimodal/multimodal.html) | `.fit(docs, images=images)` |\n| [Topic Modeling per Class](https://maartengr.github.io/BERTopic/getting_started/topicsperclass/topicsperclass.html) | `.topics_per_class(docs, classes)` |\n| [Dynamic Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/topicsovertime/topicsovertime.html) | `.topics_over_time(docs, timestamps)` |\n| [Hierarchical Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/hierarchicaltopics/hierarchicaltopics.html) | `.hierarchical_topics(docs)` |\n| [Guided Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/guided/guided.html) | `BERTopic(seed_topic_list=seed_topic_list)` |\n| [Zero-shot Topic Modeling](https://maartengr.github.io/BERTopic/getting_started/zeroshot/zeroshot.html) | `BERTopic(zeroshot_topic_list=zeroshot_topic_list)` |\n| [Merge Multiple Models](https://maartengr.github.io/BERTopic/getting_started/merge/merge.html) | `BERTopic.merge_models([topic_model_1, topic_model_2])` |\n\n\n### Visualizations\nEvaluating topic 
models can be rather difficult due to the somewhat subjective nature of evaluation. \nVisualizing different aspects of the topic model helps you understand it and makes it easier \nto tweak the model to your liking. \n\n| Method | Code  | \n|-----------------------|---|\n| Visualize Topics    |  `.visualize_topics()` |\n| Visualize Documents    |  `.visualize_documents()` |\n| Visualize Document Hierarchy    |  `.visualize_hierarchical_documents()` |\n| Visualize Topic Hierarchy    |  `.visualize_hierarchy()` |\n| Visualize Topic Tree   |  `.get_topic_tree(hierarchical_topics)` |\n| Visualize Topic Terms    |  `.visualize_barchart()` |\n| Visualize Topic Similarity  |  `.visualize_heatmap()` |\n| Visualize Term Score Decline  |  `.visualize_term_rank()` |\n| Visualize Topic Probability Distribution    |  `.visualize_distribution(probs[0])` |\n| Visualize Topics over Time   |  `.visualize_topics_over_time(topics_over_time)` |\n| Visualize Topics per Class | `.visualize_topics_per_class(topics_per_class)` |\n\n\n## Citation\nTo cite the [BERTopic paper](https://arxiv.org/abs/2203.05794), please use the following BibTeX reference:\n\n```bibtex\n@article{grootendorst2022bertopic,\n  title={BERTopic: Neural topic modeling with a class-based TF-IDF procedure},\n  author={Grootendorst, Maarten},\n  journal={arXiv preprint arXiv:2203.05794},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaartenGr%2FBERTopic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMaartenGr%2FBERTopic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaartenGr%2FBERTopic/lists"}