An open API service indexing awesome lists of open source software.

https://github.com/skent259/omisvm-sims

Simulation code for "Ordinal Multiple Instance Support Vector Machines"
https://github.com/skent259/omisvm-sims

multiple-instance-learning ordinal r reproducible-research simulation svm weakly-supervised-learning

Last synced: about 1 year ago
JSON representation

Simulation code for "Ordinal Multiple Instance Support Vector Machines"

Awesome Lists containing this project

README

          

# Simulations for Ordinal Multiple Instance Support Vector Machines

Striving for reproducibility, this repo details the simulations for *Ordinal Multiple Instance Support Vector Machines* by Sean Kent and Menggang Yu. This allows for recreation of all figures and tables used in the paper, except in the case of proprietary data. Full details and descriptions are available in the paper.

As a quick reference for getting started:

* The experiments can be run via the `run.sh` file, however this will take considerable CPU time. We recommend that you run in a high-throughput environment in batches (see `run.sh` for inspiration)
* Figures from the paper can be recreated by running `analysis/create-all-figures.R`.
* Data (open-source) can be downloaded by running `data/build-data.sh`.
* Simulation source code can be found in the `sim/` directory.
* The `condor/` directory can be ignored. It was used to run simulations in the HTCondor (high-throughput) environment.

## Methods Compared

The main model code relies on the `mildsvm` package (v0.4.0) in R, which can be found here: https://github.com/skent259/mildsvm

| Method name | Function | parameters | Reference |
| ------------------ | ---------------------- | ----------------------------------------------- | --------- |
| OMI-SVM (proposed) | `mildsvm::omisvm` | `method = "qp-heuristic"` | [1] |
| MI-SVM (OVA) | `mildsvm::misvm_orova` | `method = "qp-heuristic"` | [2], [3] |
| SI-SVOREXC | `mildsvm::svor_exc` | `method = "smo"` | [4], [5] |
| MIOR (as written) | `mildsvm::mior` | `method = "qp-heuristic", option = "xiao"` | [6] |
| MIOR (corrected) | `mildsvm::mior` | `method = "qp-heuristic", option = "corrected"` | [6] |

## Experiments

| Data set | Experiment type | Name | ID | Section |
| --------------------- | -------------------- | --------- | --- | ---------- |
| Amazon review ratings | Performance | per-amrev | 1.0 | Evaluation |
| tissue micro-array | Performance | per-tma | 2.0 | Evaluation |
| IMDB movie reviews | Training data size | size-imdb | 3.0 | Evaluation |
| SWD | Training data size | size-swd | 4.0 | Appendix |
| winequality-red | Training data size | size-wq | 5.0 | Appendix |
| CAR | Witness rate changes | wr-car | 6.0 | Appendix |
| ERA | Witness rate changes | wr-era | 7.0 | Appendix |

Most data from the experiments is available from open-source repositories:

* Amazon review ratings: (Wang et al. 2011)
* IMDB movie reviews: (Maas et al. 2011)
* winequality-red: (Cortez et al. 2009)
* SWD:
* CAR: (Bohanec and Rajkovič, 1988; Dua and Graff, 2017)
* ERA:

The TMA data set is not publicly available due to privacy restrictions. It can be made available from the authors.

## References

[1] Kent, S., & Yu, M. (2023+). Ordinal multiple instance support vector machines. *In prep*.

[2] Andrews, S., Tsochantaridis, I., & Hofmann, T. (2003). Support vector machines for multiple-instance learning. *Advances in Neural Information Processing Systems* *15*, 577–584).

[4] Hsu, C.-W., & Lin, C.-J. (2002). A comparison of methods for multiclass support vector machines. *IEEE Transactions on Neural Networks*, *13*(2), 415–425. https://doi.org/10.1109/72.991427

[4] Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. *Neural Computation*, *19*(3), 792–815. https://doi.org/10.1162/neco.2007.19.3.792

[5] Alpaydın, E., Cheplygina, V., Loog, M., & Tax, D. M. J. (2015). Single- vs. Multiple-instance classification. *Pattern Recognition*, *48*(9), 2831–2838. https://doi.org/10.1016/j.patcog.2015.04.006

[6] Xiao, Y., Liu, B., & Hao, Z. (2018). Multiple-instance ordinal regression. *IEEE Transactions on Neural Networks and Learning Systems*, *29*(9), 4398–4413. https://doi.org/10.1109/TNNLS.2017.2766164

[7] Wang, H., Wang, C., Zhai, C., & Han, J. (2011). Learning online discussion structures by conditional random fields. *Proceedings of the 34th International Acm Sigir Conference on Research and Development in Information Retrieval*, 435–444. https://doi.org/10.1145/2009916.2009976

[8] Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. *Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies*, 142–150.

[9] Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. *Decision Support Systems*, 47(4), 547–553. https://doi.org/10.1016/j.dss.2009.05.016

[10] Bohanec, M., & Rajkovič, V. (1988). Knowledge acquisition and explanation for multi-attribute decision. *8th International Workshop on Expert Systems and Their Applications*, 1, 59–78.

[11] Dua, D., & Graff, C. (2017). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml