https://github.com/skent259/mildsvm

Multiple Instance Learning with Distributions, SVM
https://github.com/skent259/mildsvm

distributional-data multiple-instance-learning ordinal r svm weakly-supervised-learning

Last synced: 7 months ago
JSON representation

Multiple Instance Learning with Distributions, SVM

Host: GitHub
URL: https://github.com/skent259/mildsvm
Owner: skent259
License: other
Created: 2020-07-23T15:55:14.000Z (almost 6 years ago)
Default Branch: main
Last Pushed: 2025-09-05T02:14:08.000Z (9 months ago)
Last Synced: 2025-09-05T04:11:40.234Z (9 months ago)
Topics: distributional-data, multiple-instance-learning, ordinal, r, svm, weakly-supervised-learning
Language: R
Homepage:
Size: 886 KB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 18
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

README

          ---

output: github_document

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%"

)

```

# mildsvm

[![CRAN status](https://www.r-pkg.org/badges/version/mildsvm)](https://CRAN.R-project.org/package=mildsvm)

[![R-CMD-check](https://github.com/skent259/mildsvm/workflows/R-CMD-check/badge.svg)](https://github.com/skent259/mildsvm/actions)

[![Codecov test coverage](https://codecov.io/gh/skent259/mildsvm/branch/master/graph/badge.svg)](https://app.codecov.io/gh/skent259/mildsvm?branch=master)

Weakly supervised (WS), multiple instance (MI) data lives in numerous interesting applications such as drug discovery, object detection, and tumor prediction on whole slide images. The `mildsvm` package provides an easy way to learn from this data by training Support Vector Machine (SVM)-based classifiers. It also contains helpful functions for building and printing multiple instance data frames. 

The `mildsvm` package implements methods that cover a variety of data types, including:

- ordinal and binary labels

- weakly supervised and traditional supervised structures 

- vector-based and distributional-instance rows of data 

A full table of functions with references is available [below](#methods-implemented). We highlight two methods based on recent research: 

- `omisvm()` runs a novel OMI-SVM approach for ordinal, multiple instance (weakly supervised) data using the work of Kent and Yu (2022+)

- `mismm()` run the MISMM approach for binary, weakly supervised data where the instances can be thought of as a matrix of draws from a distribution. This non-convex SVM approach is formalized and applied to breast cancer diagnosis based on morphological features of the tumor microenvironment in [Kent and Yu (2022)][p2].

## Usage

A typical MI data frame (a `mi_df`) with ordinal labels might look like this, with multiple rows of information for each of the `bag_name`s involved and a label that matches each bag: 

```{r ordmvnorm}

library(mildsvm)

data("ordmvnorm")

print(ordmvnorm)

# dplyr::distinct(ordmvnorm, bag_label, bag_name)

```

The `mildsvm` package uses the familiar formula and predict methods that R uses will be familiar with. To indicate that MI data is involved, we specify the unique bag label and bag name with `mi(bag_label, bag_name) ~ predictors`:  

```{r ord-example}

fit <- omisvm(mi(bag_label, bag_name) ~ V1 + V2 + V3,

              data = ordmvnorm, 

              weights = NULL)

print(fit)

predict(fit, new_data = ordmvnorm)

```

Or, if the data frame has the `mi_df` class, we can directly pass it to the function and all features will be included:

```{r ord-example-2}

fit2 <- omisvm(ordmvnorm)

print(fit2)

```

## Installation

You can install the released version of mildsvm from [CRAN](https://CRAN.R-project.org) with:

``` r

install.packages("mildsvm")

```

Alternatively, you can install the development version from [GitHub](https://github.com/) with:

``` r

# install.packages("devtools")

devtools::install_github("skent259/mildsvm")

```

## Additional Usage

`mildsvm` also works well MI data with distributional instances. There is a 3-level structure with *bags*, *instances*, and *samples*.  As in MIL, *instances* are contained within *bags* (where we only observe the bag label).  However, for MILD, each instance represents a distribution, and the *samples* are drawn from this distribution.  

You can generate MILD data with `generate_mild_df()`:

```{r generate_mild_df}

# Normal(mean=0, sd=1) vs Normal(mean=3, sd=1)

set.seed(4)

mild_df <- generate_mild_df(

  ncov = 1, nimp_pos = 1, nimp_neg = 1, 

  positive_dist = "mvnormal", positive_mean = 3,

  negative_dist = "mvnormal", negative_mean = 0, 

  nbag = 4,

  ninst = 2, 

  nsample = 2

)

print(mild_df)

```

You can train a MISVM classifier using `mismm()` on the MILD data with the `mild()` formula specification:

```{r message = FALSE}

fit3 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1, data = mild_df, cost = 100)

# summarize predictions at the bag layer

library(dplyr)

mild_df %>% 

  dplyr::bind_cols(predict(fit3, mild_df, type = "raw")) %>% 

  dplyr::bind_cols(predict(fit3, mild_df, type = "class")) %>% 

  dplyr::distinct(bag_label, bag_name, .pred, .pred_class)

```

If you summarize a MILD data set (for example, by taking the mean of each covariate), you can recover a MIL data set.  Use `summarize_samples()` for this:

```{r summarize_samples}

mil_df <- summarize_samples(mild_df, .fns = list(mean = mean)) 

print(mil_df)

```

You can train an MI-SVM classifier using `misvm()` on MIL data with the helper function `mi()`:

```{r, message = FALSE, warning=FALSE}

fit4 <- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df, cost = 100)

print(fit4)

```

### Methods implemented

| Function        | Method           | Outcome/label | Data type             | Extra libraries | Reference |

|-----------------|------------------|---------------|-----------------------|-----------------|-----------|

| `omisvm()`      | `"qp-heuristic"` | ordinal       | MI                    | gurobi          | [1]       |

| `mismm()`       | `"heuristic"`    | binary        | distributional MI     | ---             | [2]       |

| `mismm()`       | `"mip"`          | binary        | distributional MI     | gurobi          | [2]       |

| `mismm()`       | `"qp-heuristic"` | binary        | distributional MI     | gurobi          | [2]       |

| `misvm()`       | `"heuristic"`    | binary        | MI                    | ---             | [3]       |

| `misvm()`       | `"mip"`          | binary        | MI                    | gurobi          | [3], [2]  |

| `misvm()`       | `"qp-heuristic"` | binary        | MI                    | gurobi          | [3]       |

| `mior()`        | `"qp-heuristic"` | ordinal       | MI                    | gurobi          | [4]       |

| `misvm_orova()` | `"heuristic"`    | ordinal       | MI                    | ---             | [3], [1]  |

| `misvm_orova()` | `"mip"`          | ordinal       | MI                    | gurobi          | [3], [1]  |

| `misvm_orova()` | `"qp-heuristic"` | ordinal       | MI                    | gurobi          | [3], [1]  |

| `svor_exc()`    | `"smo"`          | ordinal       | vector                | ---             | [5]       |

| `smm()`         | ---              | binary        | distributional vector | ---             | [6]       |

#### Table acronyms

- MI: multiple instance

- SVM: support vector machine

- SMM: support measure machine

- OR: ordinal regression

- OVA: one-vs-all

- MIP: mixed integer programming

- QP: quadratic programming

- SVOR: support vector ordinal regression

- EXC: explicit constraints

- SMO: sequential minimal optimization

### References 

[1] Kent, S., & Yu, M. (2022+). Ordinal multiple instance support vector machines. *In prep.*

[2] [Kent, S., & Yu, M. (2022)][p2]. Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. *arXiv preprint arXiv:2206.14704.*

[3] Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. *Advances in neural information processing systems, 15.*

[4] Xiao, Y., Liu, B., & Hao, Z. (2017). Multiple-instance ordinal regression. *IEEE Transactions on Neural Networks and Learning Systems*, *29*(9), 4398-4413.

[5] Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. *Neural computation*, *19*(3), 792-815.

[6] Muandet, K., Fukumizu, K., Dinuzzo, F., & Schölkopf, B. (2012). Learning from distributions via support measure machines. *Advances in neural information processing systems*, *25*.

[p2]: https://arxiv.org/abs/2206.14704

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/skent259/mildsvm

Awesome Lists containing this project

README