https://github.com/fvalle1/nsbm
nSBM: multi branch topic modeling
- Host: GitHub
- URL: https://github.com/fvalle1/nsbm
- Owner: fvalle1
- License: gpl-3.0
- Created: 2021-01-28T17:21:02.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2025-09-02T08:36:11.000Z (6 months ago)
- Last Synced: 2026-01-26T22:23:01.069Z (about 2 months ago)
- Topics: hacktoberfest, hacktoberfest-accepted, hacktoberfest2022, hacktoberfest2024, natural-language-processing, python, stochastic-simulation-algorithm, topic-modeling
- Language: Jupyter Notebook
- Homepage:
- Size: 16.3 MB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
[DOI](https://zenodo.org/badge/latestdoi/333831359)
[Documentation](https://trisbm.readthedocs.io/en/latest/?badge=latest)
[Python package CI](https://github.com/fvalle1/trisbm/actions/workflows/python-package.yml)
[Miniconda CI](https://github.com/fvalle1/nsbm/actions/workflows/miniconda.yml)
[Docker CI](https://github.com/fvalle1/trisbm/actions/workflows/docker.yml)
[License](LICENSE)
# multipartite Stochastic Block Modeling
nSBM inherits hSBM from [https://github.com/martingerlach/hSBM_Topicmodel](https://github.com/martingerlach/hSBM_Topicmodel) and extends it to multipartite networks (a.k.a. supervised topic models). The idea is to run SBM-based topic modeling on networks of documents annotated with keywords and other metadata.
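To illustrate the data layout, a multipartite network can be viewed as stacked bipartite edge lists: documents link to words in one branch, to keywords in another, and so on. A minimal pandas sketch of this view (the matrices and helper below are illustrative, not the library's internal graph construction, which uses graph-tool):

```python
import pandas as pd

# toy count matrices (rows: features, columns: documents); all names illustrative
df_words = pd.DataFrame([[2, 0, 1], [0, 3, 0]],
                        index=["w0", "w1"],
                        columns=["doc0", "doc1", "doc2"])
df_keys = pd.DataFrame([[1, 0, 2]],
                       index=["keyword0"],
                       columns=["doc0", "doc1", "doc2"])

def to_edge_list(df, layer):
    """Flatten a feature x document count matrix into weighted edges."""
    edges = df.stack().rename("weight").reset_index()
    edges.columns = ["feature", "document", "weight"]
    edges["layer"] = layer  # which branch of the multipartite network
    return edges[edges["weight"] > 0]

# documents are shared across branches; each extra feature type adds a branch
edges = pd.concat([to_edge_list(df_words, "word"),
                   to_edge_list(df_keys, "keyword")],
                  ignore_index=True)
```

Each DataFrame passed to the model plays the role of one such branch, with documents as the shared node type.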

# Install
## With pip
```bash
python3 -m pip install . -vv
```
## With conda/mamba
```bash
conda install -c conda-forge nsbm
```
# Example
```python
from nsbm import nsbm
import pandas as pd
import numpy as np

# word-document count matrix (1000 words x 250 documents)
df = pd.DataFrame(
    index=["w{}".format(w) for w in range(1000)],
    columns=["doc{}".format(d) for d in range(250)],
    data=np.random.randint(1, 100, 250000).reshape((1000, 250)))

df_key_list = []

# keywords
df_key_list.append(
    pd.DataFrame(
        index=["keyword{}".format(w) for w in range(100)],
        columns=["doc{}".format(d) for d in range(250)],
        data=np.random.randint(1, 10, (100, 250))))

# authors
df_key_list.append(
    pd.DataFrame(
        index=["author{}".format(w) for w in range(10)],
        columns=["doc{}".format(d) for d in range(250)],
        data=np.random.randint(1, 5, (10, 250))))

# other features
df_key_list.append(
    pd.DataFrame(
        index=["feature{}".format(w) for w in range(25)],
        columns=["doc{}".format(d) for d in range(250)],
        data=np.random.randint(1, 5, (25, 250))))

model = nsbm()
model.make_graph_multiple_df(df, df_key_list)
model.fit(n_init=1, B_min=50, verbose=False)
model.save_data()
```
# Run with Docker
```bash
docker run -it -u jovyan -v $PWD:/home/jovyan/work -p 8899:8888 docker.pkg.github.com/fvalle1/trisbm/trisbm:latest
```
If a *graph.xml.gz* file is found in the current directory, the analysis will be performed on it.
# Tests
```bash
python3 tests/run_tests.py
```
# Caveats
Please check the following in your data:
- there should be no zero-degree nodes (every node must have at least one edge)
- there should be no duplicate nodes
- the `make_form_BoW_df` function discretises the data
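These checks can be scripted before building the graph. A minimal pandas sketch (the matrix and the rounding stand-in are illustrative assumptions, not the library's actual discretisation):

```python
import numpy as np
import pandas as pd

# toy word x document matrix with a fractional value; names illustrative
df = pd.DataFrame(np.array([[1.0, 0.0, 2.0],
                            [0.0, 3.0, 1.5],
                            [2.0, 1.0, 0.0]]),
                  index=["w0", "w1", "w2"],
                  columns=["doc0", "doc1", "doc2"])

# no zero-degree nodes: every word and every document needs at least one link
assert (df.sum(axis=1) > 0).all(), "found words with no links"
assert (df.sum(axis=0) > 0).all(), "found documents with no links"

# no duplicate nodes: row and column labels must be unique
assert df.index.is_unique and df.columns.is_unique, "duplicate node labels"

# discretisation: integer counts are expected; rounding is a simple stand-in
df_counts = df.round().astype(int)
```

Running these assertions before `make_graph_multiple_df` surfaces data problems early, instead of during the (much longer) fit.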
# Documentation
- [Docs](https://fvalle1.github.io/nsbm/)
- [Readthedocs](https://trisbm.readthedocs.io/en/latest/index.html)
# License
See [LICENSE](LICENSE).
This work is [in part based on](https://www.gnu.org/licenses/gpl-faq.en.html#WhyDoesTheGPLPermitUsersToPublishTheirModifiedVersions) [sbmtm](https://github.com/martingerlach/hSBM_Topicmodel).
## Third party libraries
This package depends on [graph-tool](https://graph-tool.skewed.de).