{"id":24847369,"url":"https://github.com/skent259/mildsvm","last_synced_at":"2025-10-14T18:31:48.744Z","repository":{"id":38154282,"uuid":"281995013","full_name":"skent259/mildsvm","owner":"skent259","description":"Multiple Instance Learning with Distributions, SVM","archived":false,"fork":false,"pushed_at":"2025-09-05T02:14:08.000Z","size":907,"stargazers_count":3,"open_issues_count":18,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-05T04:11:40.234Z","etag":null,"topics":["distributional-data","multiple-instance-learning","ordinal","r","svm","weakly-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skent259.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-07-23T15:55:14.000Z","updated_at":"2025-09-05T02:14:11.000Z","dependencies_parsed_at":"2025-09-05T04:07:39.382Z","dependency_job_id":"78f462c0-6b7c-4cfe-a7f2-6447bd739275","html_url":"https://github.com/skent259/mildsvm","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/skent259/mildsvm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skent259%2Fmildsvm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skent259%2Fmildsvm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skent259%2Fmildsvm/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skent259%2Fmildsvm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skent259","download_url":"https://codeload.github.com/skent259/mildsvm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skent259%2Fmildsvm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279017356,"owners_count":26086054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributional-data","multiple-instance-learning","ordinal","r","svm","weakly-supervised-learning"],"created_at":"2025-01-31T11:20:04.332Z","updated_at":"2025-10-14T18:31:48.111Z","avatar_url":"https://github.com/skent259.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# mildsvm\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/mildsvm)](https://CRAN.R-project.org/package=mildsvm)\n[![R-CMD-check](https://github.com/skent259/mildsvm/workflows/R-CMD-check/badge.svg)](https://github.com/skent259/mildsvm/actions)\n[![Codecov test coverage](https://codecov.io/gh/skent259/mildsvm/branch/master/graph/badge.svg)](https://app.codecov.io/gh/skent259/mildsvm?branch=master)\n\u003c!-- badges: end --\u003e\n\nWeakly supervised (WS), multiple instance (MI) data arises in numerous applications such as drug discovery, object detection, and tumor prediction on whole slide images. The `mildsvm` package provides an easy way to learn from this data by training Support Vector Machine (SVM)-based classifiers. It also contains helpful functions for building and printing multiple instance data frames. \n\nThe `mildsvm` package implements methods that cover a variety of data types, including:\n\n- ordinal and binary labels\n- weakly supervised and traditional supervised structures \n- vector-based and distributional-instance rows of data \n\nA full table of functions with references is available [below](#methods-implemented). We highlight two methods based on recent research: \n\n- `omisvm()` runs a novel OMI-SVM approach for ordinal, multiple instance (weakly supervised) data using the work of Kent and Yu (2022+)\n- `mismm()` runs the MISMM approach for binary, weakly supervised data where the instances can be thought of as a matrix of draws from a distribution. 
This non-convex SVM approach is formalized and applied to breast cancer diagnosis based on morphological features of the tumor microenvironment in [Kent and Yu (2022)][p2].\n\n## Usage\n\nA typical MI data frame (a `mi_df`) with ordinal labels might look like this, with multiple rows of information for each of the `bag_name`s involved and a label that matches each bag: \n\n```{r ordmvnorm}\nlibrary(mildsvm)\ndata(\"ordmvnorm\")\n\nprint(ordmvnorm)\n# dplyr::distinct(ordmvnorm, bag_label, bag_name)\n```\n\n\nThe `mildsvm` package uses the formula and predict methods that R users will be familiar with. To indicate that MI data is involved, we specify the unique bag label and bag name with `mi(bag_label, bag_name) ~ predictors`:  \n\n```{r ord-example}\nfit \u003c- omisvm(mi(bag_label, bag_name) ~ V1 + V2 + V3,\n              data = ordmvnorm, \n              weights = NULL)\nprint(fit)\npredict(fit, new_data = ordmvnorm)\n```\n\nOr, if the data frame has the `mi_df` class, we can directly pass it to the function and all features will be included:\n\n```{r ord-example-2}\nfit2 \u003c- omisvm(ordmvnorm)\nprint(fit2)\n```\n\n\n## Installation\n\nYou can install the released version of mildsvm from [CRAN](https://CRAN.R-project.org) with:\n\n``` r\ninstall.packages(\"mildsvm\")\n```\n\nAlternatively, you can install the development version from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"skent259/mildsvm\")\n```\n\n## Additional Usage\n\n`mildsvm` also works well with MI data that has distributional instances. There is a 3-level structure with *bags*, *instances*, and *samples*.  As in multiple instance learning (MIL), *instances* are contained within *bags* (where we only observe the bag label).  However, for multiple instance learning with distributions (MILD), each instance represents a distribution, and the *samples* are drawn from this distribution.  
\n\nYou can generate MILD data with `generate_mild_df()`:\n\n```{r generate_mild_df}\n# Normal(mean=0, sd=1) vs Normal(mean=3, sd=1)\nset.seed(4)\nmild_df \u003c- generate_mild_df(\n  ncov = 1, nimp_pos = 1, nimp_neg = 1, \n  positive_dist = \"mvnormal\", positive_mean = 3,\n  negative_dist = \"mvnormal\", negative_mean = 0, \n  nbag = 4,\n  ninst = 2, \n  nsample = 2\n)\nprint(mild_df)\n```\n\nYou can train an MISMM classifier using `mismm()` on the MILD data with the `mild()` formula specification:\n\n```{r message = FALSE}\nfit3 \u003c- mismm(mild(bag_label, bag_name, instance_name) ~ X1, data = mild_df, cost = 100)\n\n# summarize predictions at the bag level\nlibrary(dplyr)\nmild_df %\u003e% \n  dplyr::bind_cols(predict(fit3, mild_df, type = \"raw\")) %\u003e% \n  dplyr::bind_cols(predict(fit3, mild_df, type = \"class\")) %\u003e% \n  dplyr::distinct(bag_label, bag_name, .pred, .pred_class)\n```\n\nIf you summarize a MILD data set (for example, by taking the mean of each covariate), you can recover a MIL data set.  
Use `summarize_samples()` for this:\n\n```{r summarize_samples}\nmil_df \u003c- summarize_samples(mild_df, .fns = list(mean = mean)) \nprint(mil_df)\n```\n\nYou can train an MI-SVM classifier using `misvm()` on MIL data with the helper function `mi()`:\n\n```{r, message = FALSE, warning=FALSE}\nfit4 \u003c- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df, cost = 100)\n\nprint(fit4)\n```\n\n\n\n\n### Methods implemented\n\n| Function        | Method           | Outcome/label | Data type             | Extra libraries | Reference |\n|-----------------|------------------|---------------|-----------------------|-----------------|-----------|\n| `omisvm()`      | `\"qp-heuristic\"` | ordinal       | MI                    | gurobi          | [1]       |\n| `mismm()`       | `\"heuristic\"`    | binary        | distributional MI     | ---             | [2]       |\n| `mismm()`       | `\"mip\"`          | binary        | distributional MI     | gurobi          | [2]       |\n| `mismm()`       | `\"qp-heuristic\"` | binary        | distributional MI     | gurobi          | [2]       |\n| `misvm()`       | `\"heuristic\"`    | binary        | MI                    | ---             | [3]       |\n| `misvm()`       | `\"mip\"`          | binary        | MI                    | gurobi          | [3], [2]  |\n| `misvm()`       | `\"qp-heuristic\"` | binary        | MI                    | gurobi          | [3]       |\n| `mior()`        | `\"qp-heuristic\"` | ordinal       | MI                    | gurobi          | [4]       |\n| `misvm_orova()` | `\"heuristic\"`    | ordinal       | MI                    | ---             | [3], [1]  |\n| `misvm_orova()` | `\"mip\"`          | ordinal       | MI                    | gurobi          | [3], [1]  |\n| `misvm_orova()` | `\"qp-heuristic\"` | ordinal       | MI                    | gurobi          | [3], [1]  |\n| `svor_exc()`    | `\"smo\"`          | ordinal       | vector                | ---             | [5]       |\n| 
`smm()`         | ---              | binary        | distributional vector | ---             | [6]       |\n\n#### Table acronyms\n\n- MI: multiple instance\n- SVM: support vector machine\n- SMM: support measure machine\n- OR: ordinal regression\n- OVA: one-vs-all\n- MIP: mixed integer programming\n- QP: quadratic programming\n- SVOR: support vector ordinal regression\n- EXC: explicit constraints\n- SMO: sequential minimal optimization\n\n### References \n\n[1] Kent, S., \u0026 Yu, M. (2022+). Ordinal multiple instance support vector machines. *In prep.*\n\n[2] [Kent, S., \u0026 Yu, M. (2022)][p2]. Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. *arXiv preprint arXiv:2206.14704.*\n\n[3] Andrews, S., Tsochantaridis, I., \u0026 Hofmann, T. (2002). Support vector machines for multiple-instance learning. *Advances in neural information processing systems, 15.*\n\n[4] Xiao, Y., Liu, B., \u0026 Hao, Z. (2017). Multiple-instance ordinal regression. *IEEE Transactions on Neural Networks and Learning Systems*, *29*(9), 4398-4413.\n\n[5] Chu, W., \u0026 Keerthi, S. S. (2007). Support vector ordinal regression. *Neural computation*, *19*(3), 792-815.\n\n[6] Muandet, K., Fukumizu, K., Dinuzzo, F., \u0026 Schölkopf, B. (2012). Learning from distributions via support measure machines. *Advances in neural information processing systems*, *25*.\n\n\u003c!-- Links that are re-used --\u003e\n[p2]: https://arxiv.org/abs/2206.14704\n\n\n\u003c!-- TODO: create a vignette and link --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskent259%2Fmildsvm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskent259%2Fmildsvm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskent259%2Fmildsvm/lists"}