
https://github.com/Vokturz/tsnmf-sparse

Topic supervised non-negative matrix factorization with sparse matrices

Host: GitHub
URL: https://github.com/Vokturz/tsnmf-sparse
Owner: Vokturz
License: bsd-3-clause
Created: 2019-07-20T21:21:55.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-03-24T19:31:01.000Z (over 4 years ago)
Last Synced: 2024-07-10T17:04:27.215Z (4 months ago)
Language: Python
Size: 33.2 KB
Stars: 12
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# tsnmf-sparse

This repository contains a Python implementation of Topic-Supervised Non-Negative Matrix Factorization (TS-NMF) [1] with sparse matrices, exposed through a scikit-learn-compatible API.

## How it Works
From [1]: Suppose that one supervises *k ≪ n* documents and identifies *l ≪ t* topics that were contained in a subset of the documents. One can supervise the `NMF` method using this information, represented by an *n×t* topic supervision matrix *L*. The elements of *L* constrain the importance weights of matrix *W* and are of the following form:

$L_{ij}=\begin{cases} 1 & \text{if topic } j \text{ is permitted in document } i,\\ 0 & \text{if topic } j \text{ is } \textit{not} \text{ permitted in document } i. \end{cases}$

Then, for a term-document matrix *V* and supervision matrix *L*, TS-NMF seeks matrices *W* and *H* that minimize

$D_{TS}(W,H)=||V-(W \circ L) H||^2,\quad W \geq 0,\quad H \geq0.$

where $\circ$ denotes the Hadamard (element-wise) product.
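
The objective can be made concrete with a small NumPy sketch. The names `ts_objective` and `ts_nmf_mu` are hypothetical, and the solver uses Lee-Seung multiplicative updates as one textbook way to minimize $D_{TS}$; it is not necessarily how this package (which is based on scikit-learn's NMF code) optimizes it. The key observation is that if *W* starts at zero wherever *L* is zero, multiplicative updates keep those entries at zero, so *W* stays equal to *W* ∘ *L* throughout.

```python
import numpy as np


def ts_objective(V, W, H, L):
    """Squared Frobenius loss ||V - (W o L) H||^2 from the TS-NMF objective."""
    R = V - (W * L) @ H  # `*` is NumPy's element-wise (Hadamard) product
    return float(np.sum(R ** 2))


def ts_nmf_mu(V, L, n_iter=200, eps=1e-10, seed=0):
    """Minimal TS-NMF sketch using Lee-Seung multiplicative updates.

    Assumption: this is one standard way to minimize D_TS, not necessarily
    the solver used by the tsnmf package.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    n_topics = L.shape[1]
    # Start W at zero wherever L forbids a topic; multiplicative updates
    # keep those entries at zero, so W always equals W o L.
    W = rng.random((n_docs, n_topics)) * L
    H = rng.random((n_topics, n_terms))
    for _ in range(n_iter):
        # eps guards against division by zero in the denominators
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the forbidden entries of *W* are exactly zero, the reconstruction of each supervised document can only draw on its permitted topics.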

## Installation
You can install TS-NMF via pip:

```bash
pip install tsnmf
```

Or by cloning this repository and running `setup.py`:

```bash
python setup.py install
```

## Usage
TS-NMF is used in the same way as scikit-learn's `decomposition.NMF` class. The only extra input is a list of lists of labels, which is used to build the supervision matrix *L*.

Suppose you want to extract 3 topics from 5 documents. The 5 documents must first be represented as a matrix `V`; the most common way to do this is to apply a TF-IDF vectorizer, which weights each word by how important it is to a document.
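
In practice `V` would come from scikit-learn's `TfidfVectorizer`. To show what that weighting does, here is a NumPy sketch that assumes the `TfidfVectorizer` defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm="l2"`): each raw count is multiplied by $\ln\frac{1+n}{1+\mathrm{df}(t)}+1$, where $n$ is the number of documents and $\mathrm{df}(t)$ the number of documents containing term $t$, and each document row is then scaled to unit length. The function name `tfidf` is hypothetical.

```python
import numpy as np


def tfidf(counts):
    """TF-IDF weighting of a documents x terms count matrix.

    Assumption: mirrors scikit-learn's defaults
    (smooth_idf=True, sublinear_tf=False, norm="l2").
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)        # documents containing each term
    idf = np.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed inverse document frequency
    V = counts * idf                             # raw term frequency times idf
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(norms == 0, 1.0, norms)  # L2-normalize each document row
```

A term that appears in every document gets an idf of exactly 1, so rarer terms carry more weight within each row.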

Each element of the list of labels corresponds to a document and contains the list of topics that the document is constrained to. For example:

```python
labels = [[],      # document 0
          [0, 2],  # document 1
          [],      # document 2
          [],      # document 3
          [1]]     # document 4
```

means that document 1 is constrained to topics 0 or 2, and document 4 to topic 1 (documents and topics are indexed from 0). For the other documents (0, 2 and 3), all topics are permitted.
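
The encoding of `labels` into the supervision matrix *L* can be sketched as follows. `labels_to_supervision` is a hypothetical helper: the package builds *L* itself from the `labels` argument, and this dense NumPy version only illustrates the rule that an empty list permits every topic while a non-empty list permits only the listed ones.

```python
import numpy as np


def labels_to_supervision(labels, n_components):
    """Build a dense topic supervision matrix L from a list of label lists.

    Hypothetical helper for illustration; not part of the tsnmf API.
    """
    n_docs = len(labels)
    L = np.ones((n_docs, n_components))
    for i, topics in enumerate(labels):
        if topics:                    # an empty list leaves every topic permitted
            L[i] = 0.0                # forbid all topics for this document...
            L[i, list(topics)] = 1.0  # ...then permit only the listed ones
    return L
```

For the `labels` above and 3 topics, this yields the row `[1, 0, 1]` for document 1, `[0, 1, 0]` for document 4, and `[1, 1, 1]` for documents 0, 2 and 3.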

Finally, to run TS-NMF:

```python
from tsnmf import TSNMF

tsnmf = TSNMF(n_components=3, random_state=1)
W = tsnmf.fit_transform(V, labels=labels)
H = tsnmf.components_
```
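
To inspect the result, the sketch below assumes `W` (documents × topics) and `H` (topics × terms) are dense NumPy arrays, as with scikit-learn's `NMF` (call `.toarray()` first if you get a SciPy sparse matrix), and that `vocabulary` lists the terms in the column order of `V`; for a fitted `TfidfVectorizer`, `get_feature_names_out()` returns it. `dominant_topics` and `top_terms` are hypothetical helpers, not part of the `tsnmf` API.

```python
import numpy as np


def dominant_topics(W):
    """Index of the highest-weighted topic for each document (row of W)."""
    return np.argmax(W, axis=1)


def top_terms(H, vocabulary, n_terms=3):
    """The n_terms highest-weighted terms for each topic (row of H)."""
    # argsort is ascending, so reverse each row before truncating
    order = np.argsort(H, axis=1)[:, ::-1][:, :n_terms]
    return [[vocabulary[j] for j in row] for row in order]
```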

## Credits

- Developed mainly by Victor Navarro (@vokturz), under the guidance of Eduardo Graells-Garrido (@carnby), in the context of CONICYT Fondo de Fomento al Desarrollo Científico y Tecnológico (FONDECYT) Proyecto de Iniciación 11180913.
- Based on [scikit-learn's NMF code](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/_nmf.py) and the original [ws-nmf](https://github.com/kelsey-macmillan/ws-nmf).

## References

1. MacMillan, Kelsey, and James D. Wilson. ["Topic supervised non-negative matrix factorization."](https://arxiv.org/abs/1706.05084) _arXiv preprint arXiv:1706.05084_ (2017).