https://github.com/andrewwango/femda

FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data. Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well in noisy and contaminated datasets.

Host: GitHub
URL: https://github.com/andrewwango/femda
Owner: Andrewwango
License: mit
Created: 2021-03-22T09:43:04.000Z
Default Branch: main
Last Pushed: 2022-09-06T13:17:59.000Z
Last Synced: 2025-02-02T17:30:13.874Z
Topics: 20newsgroup, classification, discriminant-analysis, em-algorithm, fashion-mnist, linear-discriminant-analysis, machine-learning, quadratic-discriminant-analysis, robust-estimation, robust-statistics
Language: Python
Homepage:
Size: 16.6 MB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data

Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well in noisy and contaminated datasets.

Code for the paper on [IEEE](https://ieeexplore.ieee.org/document/9747576) and [arXiv](https://arxiv.org/abs/2201.02967).

### Authors

Andrew Wang, University of Cambridge, Cambridge, UK

Pierre Houdouin, CentraleSupélec, Paris, France

## Installation

`pip install -i https://test.pypi.org/simple/ femda`

## Get started

```python
>>> from sklearn.datasets import load_iris
>>> from femda import FEMDA
>>> X, y = load_iris(return_X_y=True)
>>> clf = FEMDA()
>>> clf.fit(X, y)
FEMDA()
>>> clf.score(X, y)
0.9666666666666667
```

Using a specific dataset...

```python
>>> import femda.experiments.preprocessing as pre
>>> from femda import FEMDA
>>> X_train, y_train, X_test, y_test = pre.statlog(r"root\datasets\\")
>>> FEMDA().fit(X_train, y_train).score(X_test, y_test)
...
```

Using a `sklearn.pipeline.Pipeline`...

```python
>>> from sklearn.datasets import load_digits
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.decomposition import PCA
>>> from femda import FEMDA
>>> X, y = load_digits(return_X_y=True)
>>> pipe = make_pipeline(PCA(n_components=5), FEMDA()).fit(X, y)
>>> pipe.predict(X)
...
```

## Run all experiments presented in the paper

```python
>>> from femda.experiments import run_experiments
>>> run_experiments()
...
```

See [demo.ipynb](demo.ipynb) for more.

## Abstract

Linear and Quadratic Discriminant Analysis are well-known classical methods, but they suffer heavily from non-Gaussian class distributions and are not robust to contaminated datasets. In this paper, we present a new discriminant-analysis-style classification algorithm that directly models noise and diverse distribution shapes, and can therefore handle a wide range of datasets.

Each data point is modelled by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter, which directly models highly heterogeneous, non-i.i.d. datasets. We show that maximum-likelihood parameter estimation and classification are simple and fast under this model.
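As a sketch of why estimation and classification stay simple under this model, the derivation below uses our own notation and assumes that each scale $\tau_i$ is an unknown nuisance parameter that is maximized out. Here $m$ is the number of features, $i \in k$ ranges over the $n_k$ training points labelled $k$, $A_i$ is the normalizing constant of the density generator $g_i$, and $q_{ik}$ is the squared Mahalanobis distance of $x_i$ to class $k$. Under these assumptions the likelihood depends on $x_i$ only through $q_{ik}$, and setting the gradient of $\sum_{i \in k}\left[-\tfrac{1}{2}\log|\Sigma_k| - \tfrac{m}{2}\log q_{ik}\right]$ to zero gives Tyler-type fixed-point equations for $\mu_k$ and $\Sigma_k$. This is an illustration rather than a transcription of the paper, whose exact estimators and decision rule may differ (for example by including class priors).

```math
\begin{aligned}
& x_i \sim \mathrm{ES}_m(\mu_k,\ \tau_i \Sigma_k,\ g_i), \qquad q_{ik} = (x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k), \\
& f(x_i \mid k) = A_i\, |\tau_i \Sigma_k|^{-1/2}\, g_i\!\left(\frac{q_{ik}}{\tau_i}\right) = A_i\, |\Sigma_k|^{-1/2}\, \tau_i^{-m/2}\, g_i\!\left(\frac{q_{ik}}{\tau_i}\right), \\
& \max_{\tau_i > 0} f(x_i \mid k) = A_i\, C_i\, |\Sigma_k|^{-1/2}\, q_{ik}^{-m/2}, \qquad C_i = \sup_{t > 0}\, t^{m/2} g_i(t) \quad \left(t = \frac{q_{ik}}{\tau_i}\right), \\
& \hat{k}(x) = \arg\max_k \left[ -\tfrac{1}{2} \log|\Sigma_k| - \tfrac{m}{2} \log\!\left( (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right) \right], \\
& \mu_k = \frac{\sum_{i \in k} x_i / q_{ik}}{\sum_{i \in k} 1 / q_{ik}}, \qquad \Sigma_k = \frac{m}{n_k} \sum_{i \in k} \frac{(x_i - \mu_k)(x_i - \mu_k)^\top}{q_{ik}}.
\end{aligned}
```

The constant $C_i$ (assumed finite, as it is for the Gaussian generator $g(t) = e^{-t/2}$) does not depend on the class, so it cancels in the decision rule. The rule is also unchanged when $\Sigma_k$ is multiplied by any positive constant, so $\Sigma_k$ is only determined up to scale and can be normalized, e.g. to trace $m$.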

We highlight the flexibility of the model to a wide range of Elliptically Symmetrical distribution shapes and varying levels of contamination in synthetic datasets. Then, we show that our algorithm outperforms other robust methods on contaminated datasets from Computer Vision and NLP.
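The synthetic setting described above can be imitated with a short NumPy sketch, assuming compound-Gaussian data in which every point $x_i$ gets its own scale $\tau_i$ (drawn log-uniformly on $[0.1, 10]$) and 10% of the training points are replaced by broad Gaussian noise. Each class is fitted with the Tyler-type fixed-point iteration $\mu_k = \sum_i w_i x_i / \sum_i w_i$, $\Sigma_k \propto \sum_i w_i (x_i-\mu_k)(x_i-\mu_k)^\top$ with weights $w_i = 1/q_{ik}$ (the squared Mahalanobis distance), and points are assigned to $\arg\max_k[-\tfrac12\log|\Sigma_k| - \tfrac m2\log q_k(x)]$, which follows from maximizing out each unknown $\tau_i$. This is not the `femda` implementation (use `FEMDA()` for that): the helper names (`sample_heterogeneous`, `fit_class`, `class_scores`, `fit`, `predict`), the scale distribution and the contamination scheme are hypothetical choices for illustration, not the paper's experimental protocol.

```python
import numpy as np

# Hypothetical sketch of a FEMDA-style classifier; not the femda package's implementation.


def mahalanobis_sq(X, mu, sigma):
    """q_i = (x_i - mu)^T sigma^{-1} (x_i - mu) for every row x_i of X."""
    d = X - mu
    return np.einsum("ij,ij->i", d, np.linalg.solve(sigma, d.T).T)


def fit_class(X, n_iter=200, tol=1e-8):
    """Tyler-type fixed-point estimates of (mu, sigma) for one class, sigma scaled to trace m."""
    n, m = X.shape
    mu, sigma = np.median(X, axis=0), np.eye(m)  # robust starting point
    for _ in range(n_iter):
        d = X - mu
        w = 1.0 / np.maximum(mahalanobis_sq(X, mu, sigma), 1e-12)  # weights w_i = 1/q_i
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        sigma_new = (m / n) * (d * w[:, None]).T @ d
        sigma_new *= m / np.trace(sigma_new)  # sigma is only identifiable up to scale
        done = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max()) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma


def class_scores(X, params):
    """One column per class: -0.5 * log|sigma_k| - (m / 2) * log q_k(x)."""
    m = X.shape[1]
    cols = []
    for mu, sigma in params:
        _, logdet = np.linalg.slogdet(sigma)
        q = np.maximum(mahalanobis_sq(X, mu, sigma), 1e-12)
        cols.append(-0.5 * logdet - 0.5 * m * np.log(q))
    return np.column_stack(cols)


def fit(X, y):
    classes = np.unique(y)
    return classes, [fit_class(X[y == c]) for c in classes]


def predict(X, classes, params):
    return classes[class_scores(X, params).argmax(axis=1)]


def sample_heterogeneous(rng, mu, n):
    """Compound-Gaussian draws: each point has its own scale tau_i, log-uniform on [0.1, 10]."""
    tau = 10.0 ** rng.uniform(-1.0, 1.0, size=n)
    return mu + np.sqrt(tau)[:, None] * rng.standard_normal((n, mu.size))


rng = np.random.default_rng(0)
m = 4
means = [np.zeros(m), np.full(m, 2.5)]
X_train = np.vstack([sample_heterogeneous(rng, mu, 200) for mu in means])
y_train = np.repeat([0, 1], 200)

# Hypothetical contamination: 10% of training points replaced by broad noise, labels kept.
idx = rng.choice(len(X_train), size=40, replace=False)
X_train[idx] = rng.normal(scale=10.0, size=(40, m))

X_test = np.vstack([sample_heterogeneous(rng, mu, 500) for mu in means])
y_test = np.repeat([0, 1], 500)

classes, params = fit(X_train, y_train)
accuracy = (predict(X_test, classes, params) == y_test).mean()
print(f"test accuracy on clean data: {accuracy:.3f}")
```

Because each weight $w_i = 1/q_{ik}$ shrinks as a point moves away from the class center, distant outliers barely move $\mu_k$, and each point enters $\Sigma_k$ only through its normalized direction $(x_i-\mu_k)/\sqrt{q_{ik}}$, so the point-specific scales $\tau_i$ do not distort the scatter estimate.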
