https://github.com/bumble-tech/buzzwords
GPU-Powered Topic Modelling
bert bumble buzzwords clustering gpu topic-modelling
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/bumble-tech/buzzwords
- Owner: bumble-tech
- License: apache-2.0
- Created: 2022-08-09T09:22:05.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-12-20T11:00:21.000Z (over 3 years ago)
- Last Synced: 2025-04-06T06:34:25.224Z (about 1 year ago)
- Topics: bert, bumble, buzzwords, clustering, gpu, topic-modelling
- Language: Python
- Homepage: https://bumble-tech.github.io/buzzwords/
- Size: 86.9 KB
- Stars: 70
- Watchers: 5
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
README
# Buzzwords
Buzzwords is Bumble's GPU-powered topic modelling tool, used for gathering insights on topics in text or images at large scale. The algorithm is based on [BERTopic](https://maartengr.github.io/BERTopic/index.html) and [Top2Vec](https://arxiv.org/abs/2008.09470), but altered to be faster.
For more information, see [the website](https://bumble-tech.github.io/buzzwords/).
## Installation
Installing Buzzwords is somewhat complicated because it needs RAPIDS.ai (and, to a lesser extent, FAISS) on an NVIDIA GPU-powered machine. RAPIDS doesn't support installation through pip anymore, so we need to use conda environments.
For ease of installation, we've packaged the steps into a bash script, `install.sh`:
```bash
$ ./install.sh buzzwords
```
This creates a conda environment with Buzzwords installed in it. The environment is named after the argument you pass, or `buzzwords` by default.
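Since the install depends on conda and an NVIDIA GPU, it can help to confirm that the relevant command-line tools are on your `PATH` before running the script. The sketch below is not part of Buzzwords; `check_prerequisites` is a hypothetical helper, and treating `nvidia-smi` as a sign that the NVIDIA driver is installed is an assumption.

```python
import shutil


def check_prerequisites(tools=("conda", "nvidia-smi")):
    """Return a mapping of tool name -> whether it was found on PATH.

    Hypothetical helper, not part of Buzzwords. The presence of
    `nvidia-smi` is only a rough hint that an NVIDIA driver is installed.
    """
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

A missing tool here does not prove the install will fail, but it is a cheap first thing to rule out.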
## Basic Examples
Instantiating the model is simple:
```python
from buzzwords import Buzzwords
model = Buzzwords()
```
To train the model on a set of documents, call `fit_transform()`, which returns the topics:
```python
# `df` is assumed to be an existing pandas DataFrame with a text column
docs = df['text_column']
topics = model.fit_transform(docs)
```
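Once you have the topics, a natural next step is to see how many documents landed in each one. The sketch below assumes `fit_transform()` returns one integer topic id per document; `summarize_topics` is a hypothetical helper, not part of Buzzwords, and the ids used are stand-in values so the snippet runs without a GPU.

```python
from collections import Counter


def summarize_topics(topics, top_n=5):
    """Return the `top_n` most frequent topic ids with their document counts.

    Hypothetical helper; assumes `topics` holds one topic id per document.
    """
    return Counter(topics).most_common(top_n)


# Stand-in ids for illustration only; real values would come from fit_transform()
example_topics = [0, 2, 2, 1, 0, 2, 3]
print(summarize_topics(example_topics, top_n=2))  # [(2, 3), (0, 2)]
```

With real output, you would pass `topics` from the previous step in place of `example_topics`.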