A Generalized Multiclass Support Vector Machine available in Python, R, and C
https://github.com/gjjvdburg/gensvm


GenSVM
======

This is the repository for the C implementation of *GenSVM*, a generalized
multiclass support vector machine proposed in:

> [GenSVM: A Generalized Multiclass Support Vector
> Machine](http://jmlr.org/papers/v17/14-526.html)

> G.J.J. van den Burg and P.J.F. Groenen

> *Journal of Machine Learning Research*, 2016.

GenSVM is available in these languages:

Language | URL
:-------:|:-------:
Python | [https://github.com/GjjvdBurg/PyGenSVM](https://github.com/GjjvdBurg/PyGenSVM)
R | [https://github.com/GjjvdBurg/RGenSVM](https://github.com/GjjvdBurg/RGenSVM)
C | [https://github.com/GjjvdBurg/GenSVM](https://github.com/GjjvdBurg/GenSVM)

Introduction
------------

GenSVM is a general multiclass support vector machine, which you can use for
classification problems with multiple classes. Training GenSVM in
cross-validation or grid search setups can be done efficiently due to the
ability to use warm starts. See the
[paper](http://jmlr.org/papers/v17/14-526.html) for more information, and
Usage below for how to use GenSVM.
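
To give a feel for why warm starts help in a grid search, here is a minimal sketch in Python. It is not GenSVM's optimizer: it swaps in plain gradient descent on a one-dimensional ridge-style quadratic, and the function names and the constants `s_xx` and `s_xy` are made up for illustration. The only point is that starting each fit from the previous solution needs fewer iterations than starting from zero every time.

```python
def gradient_descent(slope, target, start, step, tol=1e-8, max_iter=100000):
    """Minimize 0.5 * slope * w**2 - target * w by gradient descent.

    Returns the final w and the number of update steps that were taken.
    """
    w = start
    for iteration in range(max_iter):
        grad = slope * w - target
        if abs(grad) < tol:
            return w, iteration
        w -= step * grad
    return w, max_iter


def grid_search_iterations(lambdas, warm_start):
    """Total update steps needed to fit the toy model for every lambda."""
    s_xx, s_xy = 10.0, 20.0                  # hypothetical data summaries
    step = 0.5 / (s_xx + max(lambdas))       # fixed step, safe for all lambdas
    w, total = 0.0, 0
    for lam in lambdas:
        # Warm start: reuse the previous solution; cold start: begin at zero.
        start = w if warm_start else 0.0
        w, iterations = gradient_descent(s_xx + lam, s_xy, start, step)
        total += iterations
    return total


if __name__ == "__main__":
    lambdas = [2.0 ** k for k in range(-5, 3)]
    print("cold starts:", grid_search_iterations(lambdas, warm_start=False))
    print("warm starts:", grid_search_iterations(lambdas, warm_start=True))
```

Neighbouring values in the grid have similar solutions, so the warm-started runs begin close to their optimum. The same idea applies to any iterative solver, although the actual savings depend on the model and the grid.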

The library has support for datasets in
[MSVMpack](https://members.loria.fr/FLauer/files/MSVMpack/MSVMpack.html) and
[LibSVM/SVMlight](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) format, and can
take advantage of sparse datasets. There is also preliminary support for
nonlinear GenSVM through kernels.
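
As an illustration of the sparse LibSVM/SVMlight text format, the following Python sketch converts dense rows into lines of the form `<label> <index>:<value> ...`, with 1-based feature indices and zero entries omitted. The helper name `to_libsvm_lines` and the sample values are hypothetical, and this README does not specify any GenSVM-specific requirements on labels or layout, so treat it as a sketch of the general format only.

```python
def to_libsvm_lines(rows, labels):
    """Format dense feature rows as LibSVM/SVMlight text lines.

    Each line is "<label> <index>:<value> ...". Feature indices start at 1
    and zero-valued features are left out, which is what makes it sparse.
    """
    lines = []
    for label, row in zip(labels, rows):
        parts = [str(label)]
        for index, value in enumerate(row, start=1):
            if value != 0:
                parts.append(f"{index}:{value:g}")
        lines.append(" ".join(parts))
    return lines


if __name__ == "__main__":
    # Hypothetical two-instance dataset with four features.
    rows = [[5.1, 0.0, 1.4, 0.2], [0.0, 3.0, 0.0, 0.0]]
    labels = [1, 2]
    for line in to_libsvm_lines(rows, labels):
        print(line)
```

For datasets where most entries are zero, this representation is much smaller than a dense table, since only the non-zero features are stored.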

For documentation on how the library is implemented, see the [Doxygen
documentation available here](https://gjjvdburg.github.io/GenSVM/). There are
also many unit tests, which you can use to further understand how the library
works. For the latest version of the library you can view the [test coverage
report](https://gjjvdburg.github.io/GenSVM/cover) online.

This is the C library for GenSVM that contains two executables for using the
method. A Python package for GenSVM is available
[here](https://github.com/GjjvdBurg/PyGenSVM). An R package for GenSVM is
planned. If you are interested in this, please express your interest for the
R package [here](https://github.com/GjjvdBurg/GenSVM/issues/2).

Usage
-----

First, download and compile the library. The minimum requirements for
compilation are working BLAS and LAPACK installations, which you can likely
obtain from your package manager. However, it is recommended to use the ATLAS
versions of these libraries, since this gives a significant increase in speed.
If you choose not to use ATLAS, remove ``-latlas`` from the ``LDFLAGS``
variable in the Makefile.

Then, compile the library with a simple:

```
$ make
```

If you would like to run the tests, use ``make test`` on the command line.

After successful compilation, you will have two executables, ``gensvm`` and
``gensvm_grid``. To get an overview of the command line options, run the
executable without arguments:

```
$ ./gensvm
```

The same works for ``gensvm_grid``.

The ``gensvm`` executable can be used to train a GenSVM model on a dataset
with a single hyperparameter configuration, whereas the ``gensvm_grid``
executable can be used to run a grid search on a dataset.

Here's an example of using the ``gensvm`` executable on a single dataset, with
some custom parameters:

```
$ ./gensvm -l 1e-5 -k 1.0 -p 1.5 data/iris.train
```

This fits the model with regularization parameter ``1e-5``, Huber hinge
parameter ``1.0``, and lp norm parameter ``1.5``, using default settings
otherwise. On my computer this yields a model with 18 support vectors in about
0.1 seconds. The ``gensvm`` executable can also produce predictions for a test
dataset if one is supplied as the final argument to the command. In this case,
predictions are printed to stdout unless an output file is specified with the
``-o`` option.
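
If you want to call ``gensvm`` from a script, a small helper that assembles the argument list can be convenient. The Python sketch below uses only the options mentioned above (``-l``, ``-k``, ``-p``, ``-o``). The helper name, the test file ``data/iris.test``, and the output file ``predictions.txt`` are hypothetical, and placing ``-o`` before the file arguments is an assumption based on the example command.

```python
import shlex


def build_gensvm_command(train_file, lam=None, kappa=None, p=None,
                         test_file=None, output_file=None,
                         executable="./gensvm"):
    """Assemble the argument list for a gensvm call (illustrative helper)."""
    cmd = [executable]
    if lam is not None:
        cmd += ["-l", str(lam)]        # regularization parameter
    if kappa is not None:
        cmd += ["-k", str(kappa)]      # Huber hinge parameter
    if p is not None:
        cmd += ["-p", str(p)]          # lp norm parameter
    if output_file is not None:
        cmd += ["-o", output_file]     # assumed to precede the file arguments
    cmd.append(train_file)
    if test_file is not None:
        cmd.append(test_file)          # test dataset goes last
    return cmd


if __name__ == "__main__":
    cmd = build_gensvm_command("data/iris.train", lam="1e-5", kappa="1.0",
                               p="1.5", test_file="data/iris.test",
                               output_file="predictions.txt")
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Passing the parameter values as strings keeps them exactly as typed on the command line. Check the output of ``./gensvm`` for the authoritative list of options before relying on this.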

The ``gensvm_grid`` executable takes as input a file (called a grid file)
that specifies the parameter values to search over. See the ``training``
directory for examples and the documentation
[here](https://gjjvdburg.github.io/GenSVM/) for more information on the file
format. Note that when the ``repeats`` field has a positive value, a so-called
"consistency check" is performed after the grid search has finished. This is a
robustness check on the best-performing configurations, used to find the
hyperparameter configuration with the best performance and the smallest
training time. Warm starts are not used in this check, so that the repeated
runs give independent measurements of training time.

Here's an example of running ``gensvm_grid`` without repeats on the iris
dataset:

```
$ ./gensvm_grid training/iris_norepeats.training
```

On my computer this runs in about 8 seconds with 342 hyperparameter
configurations. Alternatively, if consistency checks are desired we can run:

```
$ ./gensvm_grid training/iris.training
```

which runs the same grid search but also does 5 consistency repeats for each
of the configurations with the 5% best performance. Note that the performance
is measured by cross-validated accuracy scores. This example runs in about 13
seconds on my computer.
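
The selection logic behind such a consistency check can be sketched roughly as follows. This is not the library's implementation: the function names, the data layout, and the tie-breaking rule (highest mean accuracy first, then lowest mean training time) are assumptions made for illustration.

```python
import math
from statistics import mean


def top_fraction(results, fraction=0.05):
    """Keep the configurations with the best cross-validated accuracy.

    `results` is a list of dicts with "name" and "accuracy" keys. At least
    one configuration is always kept.
    """
    ranked = sorted(results, key=lambda r: r["accuracy"], reverse=True)
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep]


def pick_best(repeated):
    """Choose a configuration from repeated (accuracy, seconds) measurements.

    `repeated` maps a configuration name to a list of (accuracy, seconds)
    pairs, one pair per repeat. Higher mean accuracy wins; ties are broken
    by lower mean training time.
    """
    def score(item):
        _, runs = item
        return (-mean(a for a, _ in runs), mean(t for _, t in runs))

    return min(repeated.items(), key=score)[0]


if __name__ == "__main__":
    # Hypothetical grid search results for eight configurations.
    results = [{"name": f"c{i}", "accuracy": i / 10} for i in range(8)]
    candidates = top_fraction(results, fraction=0.25)
    print([c["name"] for c in candidates])

    # Hypothetical repeat measurements for the selected candidates.
    repeated = {"c7": [(0.9, 0.2), (0.9, 0.4)], "c6": [(0.9, 0.1), (0.9, 0.3)]}
    print(pick_best(repeated))
```

Because the repeats are run without warm starts, the timing values in a real run are comparable across configurations, which is what makes a time-based tie-break meaningful.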

Reference
---------

If you use GenSVM in any of your projects, please cite the GenSVM paper
available at
[http://jmlr.org/papers/v17/14-526.html](http://jmlr.org/papers/v17/14-526.html).
You can use the following BibTeX code:

```bib
@article{JMLR:v17:14-526,
author = {Gerrit J.J. van den Burg and Patrick J.F. Groenen},
title = {{GenSVM}: A Generalized Multiclass Support Vector Machine},
journal = {Journal of Machine Learning Research},
year = {2016},
volume = {17},
number = {225},
pages = {1-42},
url = {http://jmlr.org/papers/v17/14-526.html}
}
```

License
-------

Copyright 2016, G.J.J. van den Burg.

GenSVM is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

GenSVM is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GenSVM. If not, see <https://www.gnu.org/licenses/>.

For more information please contact:

G.J.J. van den Burg
email: gertjanvandenburg@gmail.com