https://github.com/yoch/sparse-som
Efficient Self-Organizing Map for Sparse Data
algorithm neural-nets openmp python self-organizing-map som sparse-data
- Host: GitHub
- URL: https://github.com/yoch/sparse-som
- Owner: yoch
- License: gpl-3.0
- Created: 2017-07-26T11:36:41.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-11-27T13:17:20.000Z (almost 4 years ago)
- Last Synced: 2024-10-06T05:26:41.678Z (about 1 month ago)
- Topics: algorithm, neural-nets, openmp, python, self-organizing-map, som, sparse-data
- Language: C++
- Homepage:
- Size: 4.76 MB
- Stars: 18
- Watchers: 4
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# sparse-som
Efficient Implementation of Self-Organizing Map for Sparse Input Data.
This program uses an algorithm designed specifically for sparse data,
which is much faster than the classical one on very sparse datasets
(the time complexity depends only on the non-zero values).

#### Main features
- Highly optimized for sparse data (LIBSVM format).
- Support both online and batch SOM algorithms.
- Parallel batch implementation (OpenMP).
- OS independent.
- [Python](https://pypi.python.org/pypi?:action=display&name=sparse-som) support.

## Build
The simplest way to build the CLI tools is to run `cd src && make all` from the main directory.
After compilation finishes, the resulting executables can be found in the `build` directory.

GCC is recommended, but you can use another compiler if you want. C++11 support is required.
OpenMP support is required to take advantage of parallelism (sparse-bsom).

## Install
#### CLI
No installation is required.
#### Python
To install the Python version, simply run `pip install sparse-som`.
## Usage
### CLI
#### sparse-som
To use the *online* version:
```
Usage: sparse-som
-i infile input file at libsvm sparse format
-y nrows number of rows in the codebook
-x ncols number of columns in the codebook
[ -d dim ] force the dimension of codebook's vectors
[ -u ] one based column indices (default is zero based)
[ -N ] normalize the input vectors
[ -l cb ] load codebook from binary file
[ -o|O cb ] output codebook to filename (o:binary, O:text)
[ -c|C cl ] output classification (c:without counts, C:with counts)
[ -n neig ] neighborhood topology: 4=circ, 6=hexa, 8=rect (default 8)
[ -t n | -T e ] number of training iterations or epochs (epoch = nrows)
[ -r r0 -R rN ] radius at start and end (default r=(x+y)/2, R=0.5)
[ -a a0 -A aN ] learning rate at start and end (default a=0.5, A=1.e-37)
[ -H rCool ] radius cooling: 0=linear, 1=exponential (default 0)
[ -h aCool ] alpha cooling: 0=linear, 1=exponential (default 0)
[ -s stdCf ] sigma = radius * stdCf (default 0.3)
[ -v ] increase verbosity level (default 0, max 2)
```

#### sparse-bsom
To use the *batch* version:
```
Usage: sparse-bsom
-i infile input file at libsvm sparse format
-y nrows number of rows in the codebook
-x ncols number of columns in the codebook
[ -d dim ] force the dimension of codebook's vectors
[ -u ] one based column indices (default is zero based)
[ -N ] normalize the input vectors
[ -l cb ] load codebook from binary file
[ -o|O cb ] output codebook to filename (o:binary, O:text)
[ -c|C cl ] output classification (c:without counts, C:with counts)
[ -n neig ] neighborhood topology: 4=circ, 6=hexa, 8=rect (default 8)
[ -T epoc ] number of epochs (default 10)
[ -r r0 -R rN ] radius at start and end (default r=(x+y)/2, R=0.5)
[ -H rCool ] radius cooling: 0=linear, 1=exponential (default 0)
[ -s stdCf ] sigma = radius * stdCf (default 0.3)
[ -v ] increase verbosity level (default 0, max 2)
```

To control the number of threads used by OpenMP, set the `OMP_NUM_THREADS` environment variable to the desired value, for example:
```
OMP_NUM_THREADS=4 sparse-bsom ...
```

If `OMP_NUM_THREADS` is undefined, one thread per CPU is used.
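When the CLI is driven from a script, the same variable can be set in the environment of the child process. The following is a minimal sketch and not part of the project: the helper names `bsom_invocation` and `run_bsom` and the file name `data.svm` are hypothetical, while the flags (`-i`, `-y`, `-x`, `-T`, `-o`) come from the usage text above. Running it as written only builds and prints the command. Calling `run_bsom` assumes the `sparse-bsom` executable is on the `PATH`.

```python
import os
import subprocess


def bsom_invocation(infile, nrows, ncols, epochs=10, codebook=None, threads=None):
    """Build the argument list and environment for a sparse-bsom run.

    Hypothetical helper for illustration; it only uses flags that appear
    in the usage text.
    """
    cmd = ["sparse-bsom", "-i", str(infile), "-y", str(nrows), "-x", str(ncols),
           "-T", str(epochs)]
    if codebook is not None:
        cmd += ["-o", str(codebook)]  # -o writes the codebook as a binary file
    env = dict(os.environ)
    if threads is not None:
        # OpenMP reads this variable when the program starts.
        env["OMP_NUM_THREADS"] = str(threads)
    return cmd, env


def run_bsom(*args, **kwargs):
    """Launch sparse-bsom; requires the executable to be on the PATH."""
    cmd, env = bsom_invocation(*args, **kwargs)
    return subprocess.run(cmd, env=env, check=True)


cmd, env = bsom_invocation("data.svm", nrows=12, ncols=15, threads=4)
print(" ".join(cmd), "with OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```

Leaving `threads` as `None` keeps whatever value the parent environment already defines, so the one-thread-per-CPU default still applies when nothing is set.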
### Python
```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sparse_som import *

# Load some dataset
dataset = load_digits()

# convert to sparse CSR format
X = csr_matrix(dataset.data, dtype=np.float32)

# setup SOM dimensions
H, W = 12, 15   # Network height and width
_, N = X.shape  # Nb. features (vectors dimension)

################ Simple usage ################
# setup SOM network
som = Som(H, W, N, topology.HEXA) # , verbose=True
print(som.nrows, som.ncols, som.dim)

# reinit the codebook (not needed)
som.codebook = np.random.rand(H, W, N).\
                    astype(som.codebook.dtype, copy=False)

# train the SOM
som.train(X)

# get bmus for the data
bmus = som.bmus(X)

################ Use classifier ################
# setup SOM classifier (using batch SOM)
cls = SomClassifier(BSom, H, W, N)

# train SOM, do calibration and predict labels
y = cls.fit_predict(X, labels=dataset.target)

print('Quantization Error: %2.4f' % cls.quant_error)
print('Topographic Error: %2.4f' % cls.topog_error)
print('='*50)
print(classification_report(dataset.target, y))
```

Other examples are available in the `python/examples` directory.
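The README states that the cost depends only on the non-zero values, but it does not describe how. One standard way to get that property when searching for a best matching unit (BMU) is to expand the squared distance as ||x - w||^2 = ||x||^2 - 2<x, w> + ||w||^2. The dot product then touches only the non-zero entries of x, and ||w||^2 can be cached per unit. The sketch below is pure Python under that assumption and is not the project's C++ implementation. The names `squared_norms` and `sparse_bmu` are hypothetical.

```python
def squared_norms(codebook):
    """Precompute ||w||^2 for every unit; refresh when the weights change."""
    return [sum(v * v for v in w) for w in codebook]


def sparse_bmu(x, codebook, w_norms):
    """Return the index of the best matching unit for a sparse vector.

    `x` maps a feature index to its non-zero value. ||x||^2 is identical
    for all units, so it is dropped from the comparison.
    """
    best_unit, best_score = -1, float("inf")
    for unit, w in enumerate(codebook):
        # Only the non-zero features of x contribute to the dot product.
        dot = sum(val * w[idx] for idx, val in x.items())
        score = w_norms[unit] - 2.0 * dot
        if score < best_score:
            best_unit, best_score = unit, score
    return best_unit


# Toy codebook with 3 units and 5 features (made-up values).
codebook = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 2.0],
    [0.0, 3.0, 0.0, 0.0, 0.0],
]
x = {0: 1.0, 4: 2.0}  # sparse input: only two non-zero features
print(sparse_bmu(x, codebook, squared_norms(codebook)))
```

With this layout, the per-unit work scales with the number of non-zero features in the input rather than with the full dimension, provided the cached norms are kept up to date after each weight update.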
## Documentation
### CLI
#### Files Format
Input files must be in LIBSVM format.
```
<label> <index1>:<value1> <index2>:<value2> ...
.
.
.
```

Each line contains an instance and is terminated by a '\n' character. The pair `<index>:<value>` gives a feature (attribute) value: `<index>` is an integer starting from 0 and `<value>` is a real number. Indices must be in ASCENDING order. Labels in the file are only used for network calibration. If they are unknown, just fill the first column with any number.
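As an illustration of these rules, here is a small pure-Python sketch that writes and reads one line in this format. The helper names `to_libsvm_line` and `parse_libsvm_line` are hypothetical and not part of the project. The `one_based` switch is an assumption meant to match files that would be read with the `-u` option.

```python
def to_libsvm_line(label, features, one_based=False):
    """Format one instance as a LIBSVM line.

    `features` maps a zero-based column index to its value. Zeros are
    skipped and indices are written in ascending order.
    """
    offset = 1 if one_based else 0
    pairs = (f"{idx + offset}:{float(val)!r}"
             for idx, val in sorted(features.items()) if val != 0)
    return " ".join([str(label), *pairs])


def parse_libsvm_line(line, one_based=False):
    """Parse a LIBSVM line into (label, {zero_based_index: value})."""
    label, *pairs = line.split()
    offset = 1 if one_based else 0
    features = {}
    for pair in pairs:
        idx, val = pair.split(":")
        features[int(idx) - offset] = float(val)
    return label, features


line = to_libsvm_line(3, {7: 0.5, 0: 1.0, 4: 0.0, 2: 2.25})
print(line)  # 3 0:1.0 2:2.25 7:0.5
print(parse_libsvm_line(line))
```

When labels are unknown, any constant (for example `0`) can be passed as `label`, as noted above.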
### Python documentation
The Python documentation can be found at: http://sparse-som.readthedocs.io/en/latest/
### API
The C++ API is not public yet, because things still may change.
## How to cite this work
```
@InProceedings{melka-mariage:ijcci17,
author={Melka, Josu{\'e} and Mariage, Jean-Jacques},
title={Efficient Implementation of Self-Organizing Map for Sparse Input Data},
booktitle={Proceedings of the 9th International Joint Conference on Computational Intelligence: IJCCI},
volume={1},
month={November},
year={2017},
address={Funchal, Madeira, Portugal},
pages={54-63},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006499500540063},
isbn={978-989-758-274-5},
url={http://www.ai.univ-paris8.fr/~jmelka/IJCCI_2017_20.pdf}
}
```

```
@Inbook{Melka2019,
author = "Melka, Josu{\'e} and Mariage, Jean-Jacques",
editor = "Sabourin, Christophe and Merelo, Juan Julian and Madani, Kurosh and Warwick, Kevin",
title = "Adapting Self-Organizing Map Algorithm to Sparse Data",
bookTitle = "Computational Intelligence: 9th International Joint Conference, IJCCI 2017 Funchal-Madeira, Portugal, November 1-3, 2017 Revised Selected Papers",
year = "2019",
publisher = "Springer International Publishing",
address = "Cham",
pages = "139--161",
isbn = "978-3-030-16469-0",
doi = "10.1007/978-3-030-16469-0_8",
url = "https://doi.org/10.1007/978-3-030-16469-0_8"
}
```