https://github.com/andrewwango/femda
FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data. Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well in noisy and contaminated datasets.
- Host: GitHub
- URL: https://github.com/andrewwango/femda
- Owner: Andrewwango
- License: mit
- Created: 2021-03-22T09:43:04.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-09-06T13:17:59.000Z (over 3 years ago)
- Last Synced: 2025-02-02T17:30:13.874Z (12 months ago)
- Topics: 20newsgroup, classification, discriminant-analysis, em-algorithm, fashion-mnist, linear-discriminant-analysis, machine-learning, quadratic-discriminant-analysis, robust-estimation, robust-statistics
- Language: Python
- Homepage:
- Size: 16.6 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data
Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well in noisy and contaminated datasets.
Code for the paper on [IEEE](https://ieeexplore.ieee.org/document/9747576) and [arXiv](https://arxiv.org/abs/2201.02967).
### Authors
Andrew Wang, University of Cambridge, Cambridge, UK
Pierre Houdouin, CentraleSupélec, Paris, France
## Installation
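FEMDA is published on TestPyPI:
`pip install -i https://test.pypi.org/simple/ femda`
If pip cannot find the package's dependencies on TestPyPI, add `--extra-index-url https://pypi.org/simple/` so they are fetched from PyPI.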
## Get started
```python
>>> from sklearn.datasets import load_iris
>>> from femda import FEMDA
>>> X, y = load_iris(return_X_y=True)
>>> clf = FEMDA()
>>> clf.fit(X, y)
FEMDA()
>>> clf.score(X, y)
0.9666666666666667
```
Using a specific dataset (here Statlog, loaded with the experiment preprocessing helpers)...
```python
>>> import femda.experiments.preprocessing as pre
>>> X_train, y_train, X_test, y_test = pre.statlog(r"root\datasets\\")
>>> FEMDA().fit(X_train, y_train).score(X_test, y_test)
...
```
Using a `sklearn.pipeline.Pipeline`...
```python
>>> from sklearn.datasets import load_digits
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.decomposition import PCA
>>> X, y = load_digits(return_X_y=True)
>>> pipe = make_pipeline(PCA(n_components=5), FEMDA()).fit(X, y)
>>> pipe.predict(X)
...
```
## Run all experiments presented in the paper
```python
>>> from femda.experiments import run_experiments
>>> run_experiments()
...
```
See  for more.
## Abstract
Linear and Quadratic Discriminant Analysis are well-known classical methods, but they degrade badly when class distributions are non-Gaussian and are highly non-robust on contaminated datasets. In this paper, we present a new discriminant-analysis-style classification algorithm that directly models noise and diverse distribution shapes, allowing it to handle a wide range of datasets.
Each data point is modelled by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter, which directly models highly heterogeneous, non-i.i.d. datasets. We show that maximum-likelihood parameter estimation and classification are simple and fast under this model.
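The estimation and classification steps can be illustrated with a minimal NumPy sketch. It rests on an assumption made here for illustration: maximising the likelihood over each point's unknown scale leaves a criterion that depends on a point only through its squared Mahalanobis distance d_i = (x_i - μ)ᵀ Σ⁻¹ (x_i - μ). Under that assumption you get Tyler-type fixed-point updates, μ = Σ_i (x_i / d_i) / Σ_i (1 / d_i) and Σ = (p/n) Σ_i (x_i - μ)(x_i - μ)ᵀ / d_i, together with a per-class score of -½ log|Σ_k| - (p/2) log d_k(x). The names `tyler_location_scatter` and `TylerDiscriminant` are hypothetical. This is not the `femda` package's implementation, which remains the `FEMDA` class.

```python
import numpy as np


def _sq_mahalanobis(X, mu, sigma, eps=1e-12):
    """Squared Mahalanobis distances d_i = (x_i - mu)^T Sigma^{-1} (x_i - mu)."""
    diff = X - mu
    inv = np.linalg.inv(sigma)
    d = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.maximum(d, eps)  # avoid division by zero and log(0)


def tyler_location_scatter(X, n_iter=200, tol=1e-8):
    """Fixed-point estimate of location mu and shape Sigma (trace normalised to p).

    Each point is weighted by 1 / d_i, so points with a large (unknown)
    scale are down-weighted instead of distorting the estimate.
    """
    n, p = X.shape
    mu = np.median(X, axis=0)  # robust initialisation
    sigma = np.eye(p)
    for _ in range(n_iter):
        w = 1.0 / _sq_mahalanobis(X, mu, sigma)
        diff = X - mu
        new_mu = w @ X / w.sum()
        new_sigma = (p / n) * (w[:, None] * diff).T @ diff
        new_sigma *= p / np.trace(new_sigma)  # overall scale is not identifiable
        done = (np.linalg.norm(new_mu - mu) < tol
                and np.linalg.norm(new_sigma - sigma) < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma


class TylerDiscriminant:
    """Hypothetical discriminant classifier built on the estimator above."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = [tyler_location_scatter(X[y == k]) for k in self.classes_]
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        p = X.shape[1]
        scores = []
        for mu, sigma in self.params_:
            _, logdet = np.linalg.slogdet(sigma)
            d = _sq_mahalanobis(X, mu, sigma)
            # per-class score: -1/2 log|Sigma_k| - (p/2) log d_k(x)
            scores.append(-0.5 * logdet - 0.5 * p * np.log(d))
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```

The trace normalisation does not affect classification. Rescaling Σ_k by a constant c shifts -½ log|Σ_k| by -(p/2) log c and shifts -(p/2) log d_k by +(p/2) log c, so each class score is left unchanged.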
We demonstrate the model's flexibility across a wide range of Elliptically Symmetrical distribution shapes and contamination levels on synthetic datasets. We then show that our algorithm outperforms other robust methods on contaminated datasets from Computer Vision and NLP.
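This README does not describe the synthetic setup, so the following is a hypothetical sketch of how heterogeneous ES data can be simulated. It draws compound-Gaussian samples x_i = μ + √τ_i A z_i, where A Aᵀ = Σ and z_i is standard normal. The texture τ_i gives each point its own scale, and the texture's distribution sets the shape: a constant τ yields a Gaussian, ν/χ²_ν yields a multivariate Student-t, and Gamma(ν, 1/ν) yields a K-distribution. Contamination is mimicked by replacing a fraction of rows with broad Gaussian noise. All function names and the contamination scheme are assumptions, not the paper's protocol.

```python
import numpy as np


def sample_compound_gaussian(n, mu, sigma, texture, rng):
    """Draw n points x_i = mu + sqrt(tau_i) * A z_i, where A A^T = sigma.

    `texture(n, rng)` returns the n positive scales tau_i; its distribution
    determines which elliptical shape the samples follow.
    """
    mu = np.asarray(mu, dtype=float)
    A = np.linalg.cholesky(sigma)
    z = rng.standard_normal((n, mu.size))
    tau = texture(n, rng)
    return mu + np.sqrt(tau)[:, None] * (z @ A.T)


def gaussian_texture(n, rng):
    return np.ones(n)  # constant scale: ordinary multivariate Gaussian


def student_t_texture(nu):
    # tau = nu / chi2_nu gives a multivariate Student-t with nu degrees of freedom
    return lambda n, rng: nu / rng.chisquare(nu, size=n)


def k_texture(nu):
    # Gamma(shape=nu, scale=1/nu) texture (mean 1) gives a K-distribution
    return lambda n, rng: rng.gamma(shape=nu, scale=1.0 / nu, size=n)


def contaminate(X, rate, scale, rng):
    """Hypothetical contamination: replace a fraction `rate` of rows with N(0, scale^2 I) noise."""
    X = X.copy()
    idx = rng.choice(len(X), size=int(round(rate * len(X))), replace=False)
    X[idx] = rng.normal(0.0, scale, size=(idx.size, X.shape[1]))
    return X, idx
```

Swapping the texture changes only the tail behaviour of the samples while keeping the same location μ and shape Σ. This makes it easy to sweep distribution shapes and contamination rates independently.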