https://github.com/x-tabdeveloping/topic-benchmark
Just Benchmarking Topic Models :)
- Host: GitHub
- URL: https://github.com/x-tabdeveloping/topic-benchmark
- Owner: x-tabdeveloping
- License: MIT
- Created: 2024-02-19T07:38:08.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-14T12:42:54.000Z (9 months ago)
- Last Synced: 2025-05-14T12:46:16.559Z (9 months ago)
- Language: Python
- Size: 31.2 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# topic-benchmark
Command Line Interface for benchmarking topic models.
The package contains `catalogue` registries for all models, datasets, and metrics used in model evaluation,
along with scripts for producing the tables and figures in the $S^3$ paper.
## Usage
### Installation
You can install the package from PyPI.
```bash
pip install topic-benchmark
```
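Alternatively, pip can install straight from the repository if you want the latest unreleased changes. This is a standard pip-over-git install (it assumes `git` is available on your system):

```bash
# Install the development version directly from the GitHub repository.
pip install "git+https://github.com/x-tabdeveloping/topic-benchmark.git"
```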
### Commands
#### `run`
Run the benchmark. Defaults to running all models with the benchmark used in Kardos et al. (2024).
```bash
python3 -m topic_benchmark run
```
| Argument               | Short Flag | Description                                       | Type                  | Default    |
|------------------------|------------|---------------------------------------------------|-----------------------|------------|
| `--out_dir OUT_DIR`    | `-o`       | Output directory for the results.                 | `str`                 | `results/` |
| `--encoders ENCODERS`  | `-e`       | Which encoders to conduct runs with.              | `Optional[list[str]]` | `None`     |
| `--models MODELS`      | `-m`       | Which subset of models to run the benchmark on.   | `Optional[list[str]]` | `None`     |
| `--datasets DATASETS`  | `-d`       | Which datasets to evaluate the models on.         | `Optional[list[str]]` | `None`     |
| `--metrics METRICS`    | `-t`       | Which metrics to evaluate the models with.        | `Optional[list[str]]` | `None`     |
| `--seeds SEEDS`        | `-s`       | Which random seeds to run the models with.        | `Optional[list[int]]` | `None`     |
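As a sketch of how these flags combine (the encoder name below is an illustrative placeholder, not necessarily an entry in the package's registries):

```bash
# Illustrative invocation: one encoder, one seed, custom output directory.
# "all-MiniLM-L6-v2" is a placeholder encoder name; substitute one that is
# actually registered in the package's catalogue registries.
python3 -m topic_benchmark run -o my_results/ -e "all-MiniLM-L6-v2" -s 42
```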
#### `push_to_hub`
Push results to a HuggingFace repository.
```bash
python3 -m topic_benchmark push_to_hub "your_user/your_repo"
```
| Argument | Description | Type | Default |
|-------------------|--------------------------------------------------------|-------|------------|
| `hf_repo` | HuggingFace repository to push results to. | `str` | N/A |
| `results_folder` | Folder containing results for all embedding models. | `str` | `results/` |
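For example, assuming `results_folder` is a positional argument (as the table above suggests) and that you are already authenticated with HuggingFace (e.g. via `huggingface-cli login`):

```bash
# "your_user/your_repo" is a placeholder HuggingFace repository ID.
python3 -m topic_benchmark push_to_hub "your_user/your_repo" results/
```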
## Reproducing $S^3$ paper results
Result files for all runs in the $S^3$ publication can be found in the `results/` folder of the repository.
To reproduce the results reported in our paper, please do the following.
First, install the package:
```bash
pip install topic-benchmark
```
Then run the full benchmark (note that the module is invoked with an underscore, `topic_benchmark`):
```bash
python3 -m topic_benchmark run -o results/
```
The results for each embedding model will be found in the `results` folder (unless a different value is passed to `--out_dir`).
To produce figures and tables in the paper, you can use the scripts in the `scripts/s3_paper/` folder.
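The exact script names are not listed here, so the invocation below is only a sketch; `make_figures.py` is a hypothetical file name standing in for whichever script you want to run from that folder:

```bash
# Hypothetical: run one of the figure/table scripts from the S³ folder.
# Replace make_figures.py with an actual script in scripts/s3_paper/.
python3 scripts/s3_paper/make_figures.py
```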