Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/maxhalford/prince

:crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA
https://github.com/maxhalford/prince

ca correspondence-analysis factor-analysis famd mca mfa multiple-correspondence-analysis multiple-factor-analysis pandas pca principal-component-analysis procrustes python scikit-learn svd

Last synced: 26 days ago
JSON representation

:crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA

Awesome Lists containing this project

README

        


prince_logo





documentation



pypi



pepy



pepy_month



Unit tests



Code quality



license


Prince is a Python library for multivariate exploratory data analysis in Python. It includes a variety of methods for summarizing tabular data, including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [correspondence analysis (CA)](https://www.wikiwand.com/en/Correspondence_analysis). Prince provides efficient implementations, using a scikit-learn API.

## Example usage

```py
>>> import prince

>>> dataset = prince.datasets.load_decathlon()
>>> decastar = dataset.query('competition == "Decastar"')

>>> pca = prince.PCA(n_components=5)
>>> pca = pca.fit(decastar, supplementary_columns=['rank', 'points'])
>>> pca.eigenvalues_summary
eigenvalue % of variance % of variance (cumulative)
component
0 3.114 31.14% 31.14%
1 2.027 20.27% 51.41%
2 1.390 13.90% 65.31%
3 1.321 13.21% 78.52%
4 0.861 8.61% 87.13%

>>> pca.transform(dataset).tail()
component 0 1 2 3 4
competition athlete
OlympicG Lorenzo 2.070933 1.545461 -1.272104 -0.215067 -0.515746
Karlivans 1.321239 1.318348 0.138303 -0.175566 -1.484658
Korkizoglou -0.756226 -1.975769 0.701975 -0.642077 -2.621566
Uldal 1.905276 -0.062984 -0.370408 -0.007944 -2.040579
Casarsa 2.282575 -2.150282 2.601953 1.196523 -3.571794

```

```py
>>> chart = pca.plot(dataset)

```




This chart is interactive, which doesn't show on GitHub. The green points are the column loadings.




```py
>>> chart = pca.plot(
... dataset,
... show_row_labels=True,
... show_row_markers=False,
... row_labels_column='athlete',
... color_rows_by='competition'
... )

```



## Installation

```sh
pip install prince
```

🎨 Prince uses [Altair](https://altair-viz.github.io/) for making charts.

## Methods

```mermaid
flowchart TD
cat?(Categorical data?) --> |"✅"| num_too?(Numerical data too?)
num_too? --> |"✅"| FAMD
num_too? --> |"❌"| multiple_cat?(More than two columns?)
multiple_cat? --> |"✅"| MCA
multiple_cat? --> |"❌"| CA
cat? --> |"❌"| groups?(Groups of columns?)
groups? --> |"✅"| MFA
groups? --> |"❌"| shapes?(Analysing shapes?)
shapes? --> |"✅"| GPA
shapes? --> |"❌"| PCA
```

### [Principal component analysis (PCA)](https://maxhalford.github.io/prince/pca)

### [Correspondence analysis (CA)](https://maxhalford.github.io/prince/ca)

### [Multiple correspondence analysis (MCA)](https://maxhalford.github.io/prince/mca)

### [Multiple factor analysis (MFA)](https://maxhalford.github.io/prince/mfa)

### [Factor analysis of mixed data (FAMD)](https://maxhalford.github.io/prince/famd)

### [Generalized procrustes analysis (GPA)](https://maxhalford.github.io/prince/gpa)

## Correctness

Prince is tested against scikit-learn and [FactoMineR](http://factominer.free.fr/). For the latter, [rpy2](https://rpy2.github.io/) is used to run code in R, and convert the results to Python, which allows running automated tests. See more in the [`tests`](/tests/) directory.

## Citation

Please use this citation if you use this software as part of a scientific publication.

```bibtex
@software{Halford_Prince,
author = {Halford, Max},
license = {MIT},
title = {{Prince}},
url = {https://github.com/MaxHalford/prince}
}
```

## Support

I made Prince when I was at university, back in 2016. I've had very little time over the years to maintain this package. I spent a significant amount of time in 2022 to revamp the entire package. Prince has now been downloaded over [1 million times](https://pepy.tech/project/prince). I would be grateful to anyone willing to [sponsor](https://github.com/sponsors/MaxHalford) me. Sponsorships allow me to spend more time working on open source software, including Prince.

## License

The MIT License (MIT). Please see the [license file](LICENSE) for more information.