https://github.com/kirillbobyrev/visartm

Visualizer for Topic Models built using BigARTM

Host: GitHub
URL: https://github.com/kirillbobyrev/visartm
Owner: kirillbobyrev
License: apache-2.0
Created: 2016-03-18T01:00:22.000Z (over 9 years ago)
Default Branch: main
Last Pushed: 2021-04-25T14:22:10.000Z (about 4 years ago)
Last Synced: 2025-04-14T11:18:56.657Z (2 months ago)
Language: HTML
Homepage:
Size: 360 KB
Stars: 6
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# VisARTM

VisARTM is intended to become a successor to
[tm_navigator](https://github.com/omtcyf0/tm_navigator), a tool for visualizing
and assessing topic models built primarily with
[BigARTM](https://github.com/bigartm/bigartm), a fast and scalable library
for topic modelling.

## Installation and setup

VisARTM uses Python 3. It is likely to work with Python 2, but this is not
guaranteed.

Run `pip install -r requirements.txt` before using VisARTM. VisARTM requires
fairly recent versions of `flask` and `flask_sqlalchemy`.

## Data format in VisARTM

All files required by VisARTM should be provided in `.csv` format. The columns
and sample values for each input file are listed below.

### Files related to dataset

#### document.csv

id|title|abstract|content
---|---|---|---
0|document-0|abstract-0|document-0-content
1|document-1|abstract-1|Header document-1-content

#### term.csv

id|text
---|---
0|milk
1|Python

#### document_similarity.csv

document_l_id|document_r_id|similarity
---|---|---
0|1|0.5
0|2|0.2

#### term_similarity.csv

term_l_id|term_r_id|similarity
---|---|---
0|1|0.5
0|2|0.6

#### document_term.csv

document_id|term_id|count
---|---|---
0|0|100
0|1|0
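The `document_id` and `term_id` columns of `document_term.csv` presumably refer to `id` values in `document.csv` and `term.csv`. The following is a minimal sketch of checking that before loading, not part of VisARTM itself. It assumes comma-delimited files with a header row (the tables above only show column names, so the delimiter is an assumption), and `read_ids` and `find_dangling_ids` are hypothetical helper names.

```python
import csv
from pathlib import Path


def read_ids(path, column="id"):
    """Return the set of values found in `column` of a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def find_dangling_ids(folder):
    """Return (document_id, term_id) pairs that point at unknown ids.

    Assumption: files are comma-delimited, and `document_id` / `term_id`
    reference the `id` column of document.csv / term.csv.
    """
    folder = Path(folder)
    document_ids = read_ids(folder / "document.csv")
    term_ids = read_ids(folder / "term.csv")
    dangling = []
    with open(folder / "document_term.csv", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            known_document = row["document_id"] in document_ids
            known_term = row["term_id"] in term_ids
            if not (known_document and known_term):
                dangling.append((row["document_id"], row["term_id"]))
    return dangling
```

An empty list means every row of `document_term.csv` points at an existing document and term. Ids are compared as strings, exactly as they appear in the files.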

### Files related to topic model

#### topic.csv

id|title|probability|is_background
---|---|---|---
0|Topic 0|0.95|1
1|Topic 1|0.2|0

#### topic_similarity.csv

topic_l_id|topic_r_id|similarity
---|---|---
0|1|0.22
0|2|0.6

#### document_topic.csv

document_id|topic_id|prob_dt|prob_td
---|---|---|---
0|0|0.22|0.6
0|1|0.61|0.3

#### topic_term.csv

topic_id|term_id|prob_wt|prob_tw
---|---|---|---
0|0|0.22|0.6
0|1|0.4|0.2
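Before loading, it may help to confirm that a folder contains every file listed above with the expected columns. The sketch below is one way to do that and is not part of VisARTM. The column names are copied from the tables in this section; the comma delimiter and the helper name `check_headers` are assumptions.

```python
import csv
from pathlib import Path

# Column names copied from the tables above. Only these are required;
# extra columns in a file are tolerated.
REQUIRED_COLUMNS = {
    "document.csv": ["id", "abstract", "content"],
    "term.csv": ["id", "text"],
    "document_similarity.csv": ["document_l_id", "document_r_id", "similarity"],
    "term_similarity.csv": ["term_l_id", "term_r_id", "similarity"],
    "document_term.csv": ["document_id", "term_id", "count"],
    "topic.csv": ["id", "title", "probability", "is_background"],
    "topic_similarity.csv": ["topic_l_id", "topic_r_id", "similarity"],
    "document_topic.csv": ["document_id", "topic_id", "prob_dt", "prob_td"],
    "topic_term.csv": ["topic_id", "term_id", "prob_wt", "prob_tw"],
}


def check_headers(folder, delimiter=","):
    """Return a dict mapping file name -> list of problems found in `folder`."""
    problems = {}
    for name, required in REQUIRED_COLUMNS.items():
        path = Path(folder) / name
        if not path.is_file():
            problems[name] = ["file is missing"]
            continue
        with open(path, newline="", encoding="utf-8") as handle:
            # The first row is taken as the header; an empty file yields [].
            header = next(csv.reader(handle, delimiter=delimiter), [])
        missing = [column for column in required if column not in header]
        if missing:
            problems[name] = ["missing column: " + column for column in missing]
    return problems
```

An empty result means all nine files are present and carry the required columns. If the files use another separator, pass it through the `delimiter` argument.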

## Loading data into VisARTM

To generate random data and see its visualization, run `./setup_sample.py`.
The script writes the generated data to the `data` subfolder and adds it to the
VisARTM database.
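To give a rough idea of what generating random input involves, here is a minimal sketch that writes a random `term.csv` and `document_term.csv`. It is not the code of `setup_sample.py`. The function name `write_random_dataset`, its parameters, the `term-N` placeholder texts, the count range and the comma delimiter are all assumptions made for illustration.

```python
import csv
import random
from pathlib import Path


def write_random_dataset(folder, n_documents=3, n_terms=5, seed=0):
    """Write random term.csv and document_term.csv files into `folder`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    with open(folder / "term.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "text"])
        for term_id in range(n_terms):
            writer.writerow([term_id, f"term-{term_id}"])

    with open(folder / "document_term.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["document_id", "term_id", "count"])
        for document_id in range(n_documents):
            for term_id in range(n_terms):
                # Hypothetical range: counts between 0 and 100 inclusive.
                writer.writerow([document_id, term_id, rng.randint(0, 100)])
```

Calling `write_random_dataset("data")` would produce the two files in a `data` folder. The remaining files from the data format section could be generated along the same lines.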

Generating VisARTM-compatible models from BigARTM models will be supported in
the future.

To load your custom model into VisARTM, do the following:

0. Put the data files, in the format described above, into a folder.
1. Call `clear()` and `create()` to make sure the project database is empty.
2. Call the following Python functions from `manage.py`:
   * `add_dataset('Your Dataset Name', 'path_to_dataset')` creates the
     dataset-related entries in the database and loads the data.
   * `add_topic_model('Your Topic Model name', 'data', created_dataset_id)`,
     where `created_dataset_id` is the id of the added dataset returned by
     `add_dataset`.
3. Good job! Now you're all set. Run `python3 serve.py` to see the loaded model
   and begin assessment.
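The loading steps above can be sketched as a single helper. This is a hedged illustration rather than code from the repository. It assumes that `clear`, `create`, `add_dataset` and `add_topic_model` are all exposed by the same module, and that `add_dataset` returns the new dataset id as the steps describe. The name `load_custom_model` is hypothetical.

```python
def load_custom_model(manage, dataset_name, dataset_path, model_name, model_path):
    """Reset the database, then load a dataset and a topic model.

    `manage` is any object exposing clear, create, add_dataset and
    add_topic_model, for example the module obtained with `import manage`.
    """
    # Step 1: start from an empty project database.
    manage.clear()
    manage.create()
    # Step 2: load the dataset first.
    # Assumption: add_dataset returns the id of the newly created dataset.
    dataset_id = manage.add_dataset(dataset_name, dataset_path)
    # The topic model is attached to that dataset through its id.
    manage.add_topic_model(model_name, model_path, dataset_id)
    return dataset_id
```

With the repository on the import path, `import manage` followed by `load_custom_model(manage, 'Your Dataset Name', 'path_to_dataset', 'Your Topic Model name', 'data')` would run steps 1 and 2 in order. Passing the module in as an argument keeps the helper easy to try against a stand-in object.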
