Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/yoch/sparse-som

Efficient Self-Organizing Map for Sparse Data
https://github.com/yoch/sparse-som

algorithm neural-nets openmp python self-organizing-map som sparse-data

Last synced: 3 months ago
JSON representation

Efficient Self-Organizing Map for Sparse Data

Host: GitHub
URL: https://github.com/yoch/sparse-som
Owner: yoch
License: gpl-3.0
Created: 2017-07-26T11:36:41.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-11-27T13:17:20.000Z (about 4 years ago)
Last Synced: 2024-10-06T05:26:41.678Z (4 months ago)
Topics: algorithm, neural-nets, openmp, python, self-organizing-map, som, sparse-data
Language: C++
Homepage:
Size: 4.76 MB
Stars: 18
Watchers: 4
Forks: 6
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # sparse-som

Efficient Implementation of Self-Organizing Map for Sparse Input Data.

This program uses an algorithm especially intended for sparse data, 

which much faster than the classical one on very sparse datasets

(time-complexity depend to non-zero values only).

#### Main features

- Highly optimized for sparse data (LIBSVM format).

- Support both online and batch SOM algorithms.

- Parallel batch implementation (OpenMP).

- OS independent.

- [Python](https://pypi.python.org/pypi?:action=display&name=sparse-som) support.

## Build

The simplest way to build the cli tools from the main directory : `cd src && make all`.

After the compilation terminates, the resulting executables may be found in the `build` directory.

GCC is reccomended, but you can use another compiler if you want. C++11 support is required.

OpenMP support is required to take advantage of parallelism (sparse-bsom).

## Install

####

No install required.

#### Python

To install the python version, simply run `pip install sparse-som`.

## Usage

### CLI

#### sparse-som

To use the *online* version :

```

Usage: sparse-som

        -i infile        input file at libsvm sparse format

        -y nrows         number of rows in the codebook

        -x ncols         number of columns in the codebook

        [ -d dim ]       force the dimension of codebook's vectors

        [ -u ]           one based column indices (default is zero based)

        [ -N ]           normalize the input vectors

        [ -l cb ]        load codebook from binary file

        [ -o|O cb ]      output codebook to filename (o:binary, O:text)

        [ -c|C cl ]      output classification (c:without counts, C:with counts)

        [ -n neig ]      neighborhood topology: 4=circ, 6=hexa, 8=rect (default 8)

        [ -t n | -T e ]  number of training iterations or epochs (epoch = nrows)

        [ -r r0 -R rN ]  radius at start and end (default r=(x+y)/2, R=0.5)

        [ -a a0 -A aN ]  learning rate at start and end (default a=0.5, A=1.e-37)

        [ -H rCool ]     radius cooling: 0=linear, 1=exponential (default 0)

        [ -h aCool ]     alpha cooling: 0=linear, 1=exponential (default 0)

        [ -s stdCf ]     sigma = radius * stdCf (default 0.3)

        [ -v ]           increase verbosity level (default 0, max 2)

```

#### sparse-bsom

To use the *batch* version :

```

Usage: sparse-bsom

        -i infile        input file at libsvm sparse format

        -y nrows         number of rows in the codebook

        -x ncols         number of columns in the codebook

        [ -d dim ]       force the dimension of codebook's vectors

        [ -u ]           one based column indices (default is zero based)

        [ -N ]           normalize the input vectors

        [ -l cb ]        load codebook from binary file

        [ -o|O cb ]      output codebook to filename (o:binary, O:text)

        [ -c|C cl ]      output classification (c:without counts, C:with counts)

        [ -n neig ]      neighborhood topology: 4=circ, 6=hexa, 8=rect (default 8)

        [ -T epoc ]      number of epochs (default 10)

        [ -r r0 -R rN ]  radius at start and end (default r=(x+y)/2, R=0.5)

        [ -H rCool ]     radius cooling: 0=linear, 1=exponential (default 0)

        [ -s stdCf ]     sigma = radius * stdCf (default 0.3)

        [ -v ]           increase verbosity level (default 0, max 2)

```

To control the number of threads used by OpenMP, set to `OMP_NUM_THREADS` variable to the desired value, for example :

```

OMP_NUM_THREADS=4 sparse-bsom ...

```

If undefined one thread per CPU is used.

### Python

```python

import numpy as np

from scipy.sparse import csr_matrix

from sklearn.datasets import load_digits

from sklearn.metrics import classification_report

from sparse_som import *

# Load some dataset

dataset = load_digits()

# convert to sparse CSR format

X = csr_matrix(dataset.data, dtype=np.float32)

# setup SOM dimensions

H, W = 12, 15   # Network height and width

_, N = X.shape  # Nb. features (vectors dimension)

################ Simple usage ################

# setup SOM network

som = Som(H, W, N, topology.HEXA) # , verbose=True

print(som.nrows, som.ncols, som.dim)

# reinit the codebook (not needed)

som.codebook = np.random.rand(H, W, N).\

                    astype(som.codebook.dtype, copy=False)

# train the SOM

som.train(X)

# get bmus for the data

bmus = som.bmus(X)

################ Use classifier ################

# setup SOM classifier (using batch SOM)

cls = SomClassifier(BSom, H, W, N)

# train SOM, do calibration and predict labels

y = cls.fit_predict(X, labels=dataset.target)

print('Quantization Error: %2.4f' % cls.quant_error)

print('Topographic  Error: %2.4f' % cls.topog_error)

print('='*50)

print(classification_report(dataset.target, y))

```

Other examples are available in the `python/examples` directory.

## Documentation

### CLI

#### Files Format

Input files must be at LIBSVM format.

```

 : : ...

.

.

.

```

Each line contains an instance and is ended by a '\n' character. The pair `:` gives a feature (attribute) value: `` is an integer starting from 0 and `` is a real number. Indices must be in ASCENDING order. Labels in the file are only used for network calibration. If they are unknown, just fill the first column with any numbers.

### Python documentation

The python documentation can be found at: http://sparse-som.readthedocs.io/en/latest/

### API

The C++ API is not public yet, because things still may change.

## How to cite this work

```

@InProceedings{melka-mariage:ijcci17,

  author={Melka, Josu{\'e} and Mariage, Jean-Jacques},

  title={Efficient Implementation of Self-Organizing Map for Sparse Input Data},

  booktitle={Proceedings of the 9th International Joint Conference on Computational Intelligence: IJCCI},

  volume={1},

  month={November},

  year={2017},

  address={Funchal, Madeira, Portugal},

  pages={54-63},

  publisher={SciTePress},

  organization={INSTICC},

  doi={10.5220/0006499500540063},

  isbn={978-989-758-274-5},

  url={http://www.ai.univ-paris8.fr/~jmelka/IJCCI_2017_20.pdf}

}

```

```

@Inbook{Melka2019,

    author    = "Melka, Josu{\'e} and Mariage, Jean-Jacques",

    editor    = "Sabourin, Christophe and Merelo, Juan Julian and Madani, Kurosh and Warwick, Kevin",

    title     = "Adapting Self-Organizing Map Algorithm to Sparse Data",

    bookTitle = "Computational Intelligence: 9th International Joint Conference, IJCCI 2017 Funchal-Madeira, Portugal, November 1-3, 2017 Revised Selected Papers",

    year      = "2019",

    publisher = "Springer International Publishing",

    address   = "Cham",

    pages     = "139--161",

    isbn      = "978-3-030-16469-0",

    doi       = "10.1007/978-3-030-16469-0_8",

    url       = "https://doi.org/10.1007/978-3-030-16469-0_8"

}

```