https://github.com/hbldh/skboost

MILBoost and other boosting algorithms, compatible with scikit-learn
https://github.com/hbldh/skboost

boosting boosting-algorithms machine-learning scikit-learn

Last synced: about 1 year ago
JSON representation

MILBoost and other boosting algorithms, compatible with scikit-learn

Host: GitHub
URL: https://github.com/hbldh/skboost
Owner: hbldh
License: mit
Created: 2016-02-08T13:57:24.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2024-11-15T15:47:18.000Z (over 1 year ago)
Last Synced: 2025-03-27T21:52:06.915Z (over 1 year ago)
Topics: boosting, boosting-algorithms, machine-learning, scikit-learn
Language: Python
Size: 1.37 MB
Stars: 14
Watchers: 4
Forks: 12
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# skboost

[![Build and test](https://github.com/hbldh/skboost/actions/workflows/python-package.yml/badge.svg)](https://github.com/hbldh/skboost/actions/workflows/python-package.yml)

Boosting Algorithms compatible with [scikit-learn](http://scikit-learn.org/).

## Boosting Algorithms

The `skboost` package contains implementations of some boosting algorithms that
are outside the scope of [scikit-learn](http://scikit-learn.org/).

The main point of interest is the MILBoost algorithm, which performs boosting
with a Multiple Instance Learning formulation.

### MILBoost

See \[1\], \[2\] and \[4\].

### GentleBoost

See \[3\].

### LogitBoost

See \[3\].

## Datasets

This repository includes a vendored copy of the MUSK datasets (\[5\]), both version 1 and version 2.
These are used for multiple instance learning benchmarks:

> This dataset describes a set of 92 molecules of which 47 are judged by human experts
> to be musks and the remaining 45 molecules are judged to be non-musks. The goal is
> to learn to predict whether new molecules will be musks or non-musks. However, the 166
> features that describe these molecules depend upon the exact shape, or conformation,
> of the molecule. Because bonds can rotate, a single molecule can adopt many different
> shapes. To generate this data set, the low-energy conformations of the molecules were
> generated and then filtered to remove highly similar conformations. This left 476
> conformations. Then, a feature vector was extracted that describes each conformation.
>
> This many-to-one relationship between feature vectors and molecules is
> called the "multiple instance problem". When learning a classifier for this data,
> the classifier should classify a molecule as "musk" if ANY of its conformations is
> classified as a musk. A molecule should be classified as "non-musk" if NONE of its
> conformations is classified as a musk.

## References

\[1\] [B. Babenko, P. Dollar, Z. Tu, and S. Belongie. Simultaneous learning
and alignment: Multi-instance and multi-pose learning. In Faces in
Real-Life Images, October 2008.](http://vision.ucsd.edu/~pdollar/research/papers/BabenkoEtAlECCV08simul.pdf)

\[2\] [Babenko, B.; Ming-Hsuan Yang; Belongie, S., "Robust Object Tracking
with Online Multiple Instance Learning," in Pattern Analysis and Machine
Intelligence, IEEE Transactions on , vol.33, no.8, pp.1619-1632, Aug. 2011
doi: 10.1109/TPAMI.2010.226](http://vision.ucsd.edu/~bbabenko/data/miltrack-pami-final.pdf)

\[3\] [Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert & others (2000).
Additive logistic regression: a statistical view of
boosting (with discussion and a rejoinder by the authors).
The annals of statistics, 28, 337-407.](https://web.stanford.edu/~hastie/Papers/AdditiveLogisticRegression/alr.pdf)

\[4\] [Paul Viola, John C. Platt, and Cha Zhang. Multiple instance boosting
for object detection. In In NIPS 18, pages 1419–1426. MIT Press, 2006.](http://vision.ucsd.edu/~bbabenko/data/miltrack-pami-final.pdf)

\[5\] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml](http://archive.ics.uci.edu/ml).
Irvine, CA: University of California, School of Information and Computer Science.

### Other references

\[6\] C.M. Bishop. Pattern recognition and machine learning. Information
science and statistics. Springer, 2006.

\[7\] Stephen Boyd and Lieven Vandenberghe. Convex Optimization.
Cambridge University Press, March 2004.

\[8\] Thomas G. Dietterich and Richard H. Lathrop. Solving the multiple-
instance problem with axis-parallel rectangles. Artificial Intelligence,
89:31–71, 1997.

\[9\] Yoav Freund and Robert E. Schapire. A short introduction to boosting,
1999.

\[10\] Jerome H. Friedman. Stochastic gradient boosting. Computational
Statistics and Data Analysis, 38:367–378, 1999.

\[11\] Jerome H. Friedman. Greedy function approximation: A gradient
boosting machine. Annals of Statistics, 29:1189–1232, 2000.

\[12\] James D. Keeler, David E. Rumelhart, and Wee Kheng Leow. Integrated
segmentation and recognition of hand-printed numerals. In
NIPS’90, pages 557–563, 1990.

\[13\] Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean.
Boosting algorithms as gradient descent in function space, 1999.

\[14\] William H. Press, Saul A. Teukolsky, William T. Vetterling, and
Brian P. Flannery. Numerical Recipes 3rd Edition: The Art of Sci-
entific Computing. Cambridge University Press, New York, NY, USA,
3 edition, 2007.

\[15\] Vladimir N. Vapnik. The nature of statistical learning theory. Springer-
Verlag New York, Inc., New York, NY, USA, 1995.

\[16\] Paul Viola and Michael Jones. Robust real-time object detection.
International Journal of Computer Vision, 57(2):137–154, 2002.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/hbldh/skboost

Awesome Lists containing this project

README