https://github.com/maxhalford/prince
:crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA
ca correspondence-analysis factor-analysis famd mca mfa multiple-correspondence-analysis multiple-factor-analysis pandas pca principal-component-analysis procrustes python scikit-learn svd
- Host: GitHub
- URL: https://github.com/maxhalford/prince
- Owner: MaxHalford
- License: mit
- Created: 2016-10-22T12:36:06.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2024-09-08T06:00:12.000Z (about 2 months ago)
- Last Synced: 2024-09-28T05:02:04.847Z (about 1 month ago)
- Topics: ca, correspondence-analysis, factor-analysis, famd, mca, mfa, multiple-correspondence-analysis, multiple-factor-analysis, pandas, pca, principal-component-analysis, procrustes, python, scikit-learn, svd
- Language: Python
- Homepage: https://maxhalford.github.io/prince
- Size: 8.14 MB
- Stars: 1,255
- Watchers: 26
- Forks: 182
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Citation: CITATION.cff
Prince is a Python library for multivariate exploratory data analysis. It provides a variety of methods for summarizing tabular data, such as [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [correspondence analysis (CA)](https://www.wikiwand.com/en/Correspondence_analysis). The implementations are efficient and follow the scikit-learn API.
## Example usage
```py
>>> import prince

>>> dataset = prince.datasets.load_decathlon()
>>> decastar = dataset.query('competition == "Decastar"')

>>> pca = prince.PCA(n_components=5)
>>> pca = pca.fit(decastar, supplementary_columns=['rank', 'points'])
>>> pca.eigenvalues_summary
          eigenvalue % of variance % of variance (cumulative)
component
0              3.114        31.14%                     31.14%
1              2.027        20.27%                     51.41%
2              1.390        13.90%                     65.31%
3              1.321        13.21%                     78.52%
4              0.861         8.61%                     87.13%

>>> pca.transform(dataset).tail()
component                       0         1         2         3         4
competition athlete
OlympicG    Lorenzo      2.070933  1.545461 -1.272104 -0.215067 -0.515746
            Karlivans    1.321239  1.318348  0.138303 -0.175566 -1.484658
            Korkizoglou -0.756226 -1.975769  0.701975 -0.642077 -2.621566
            Uldal        1.905276 -0.062984 -0.370408 -0.007944 -2.040579
            Casarsa      2.282575 -2.150282  2.601953  1.196523 -3.571794
```
```py
>>> chart = pca.plot(dataset)
```
This chart is interactive, which doesn't show on GitHub. The green points are the column loadings.
```py
>>> chart = pca.plot(
...     dataset,
...     show_row_labels=True,
...     show_row_markers=False,
...     row_labels_column='athlete',
...     color_rows_by='competition'
... )
```
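
The `% of variance` columns in the eigenvalue summary above can be reproduced by hand: each eigenvalue is divided by the total inertia, and the cumulative column is a running sum. The sketch below is plain Python and does not call Prince. The total inertia of 10.0 is an assumption inferred from the table itself (3.114 / 0.3114 = 10), and the helper name `variance_summary` is made up for illustration.

```python
# Eigenvalues copied from the `pca.eigenvalues_summary` output above.
eigenvalues = [3.114, 2.027, 1.390, 1.321, 0.861]

# Assumption: total inertia is 10.0, the value implied by the table
# (3.114 / 0.3114 = 10). The README does not print it directly.
TOTAL_INERTIA = 10.0


def variance_summary(eigenvalues, total_inertia):
    """Return (eigenvalue, % of variance, cumulative %) rows."""
    rows = []
    running = 0.0
    for value in eigenvalues:
        running += value
        share = round(100 * value / total_inertia, 2)
        cumulative = round(100 * running / total_inertia, 2)
        rows.append((value, share, cumulative))
    return rows


summary = variance_summary(eigenvalues, TOTAL_INERTIA)
for component, (value, share, cumulative) in enumerate(summary):
    print(f"{component}  {value:.3f}  {share:.2f}%  {cumulative:.2f}%")
```

Read this way, the five retained components account for roughly 87% of the inertia, which is the last figure in the cumulative column.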
## Installation
```sh
pip install prince
```

🎨 Prince uses [Altair](https://altair-viz.github.io/) for making charts.
## Methods
```mermaid
flowchart TD
cat?(Categorical data?) --> |"✅"| num_too?(Numerical data too?)
num_too? --> |"✅"| FAMD
num_too? --> |"❌"| multiple_cat?(More than two columns?)
multiple_cat? --> |"✅"| MCA
multiple_cat? --> |"❌"| CA
cat? --> |"❌"| groups?(Groups of columns?)
groups? --> |"✅"| MFA
groups? --> |"❌"| shapes?(Analysing shapes?)
shapes? --> |"✅"| GPA
shapes? --> |"❌"| PCA
```

### [Principal component analysis (PCA)](https://maxhalford.github.io/prince/pca)
### [Correspondence analysis (CA)](https://maxhalford.github.io/prince/ca)
### [Multiple correspondence analysis (MCA)](https://maxhalford.github.io/prince/mca)
### [Multiple factor analysis (MFA)](https://maxhalford.github.io/prince/mfa)
### [Factor analysis of mixed data (FAMD)](https://maxhalford.github.io/prince/famd)
### [Generalized procrustes analysis (GPA)](https://maxhalford.github.io/prince/gpa)
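
The flowchart at the top of this section can also be read as a small decision function. The sketch below only transcribes its branches. The function `choose_method` and its parameter names are hypothetical and not part of Prince, and it returns the method's abbreviation as a string rather than an estimator.

```python
def choose_method(categorical, numerical=False, many_categorical_columns=False,
                  grouped_columns=False, shapes=False):
    """Walk the flowchart and return the abbreviation of the suggested method.

    All parameter names are illustrative; they mirror the questions in the
    flowchart ("Categorical data?", "Numerical data too?", and so on).
    """
    if categorical:
        if numerical:
            return "FAMD"  # mixed categorical and numerical data
        # Purely categorical: more than two columns -> MCA, otherwise CA.
        return "MCA" if many_categorical_columns else "CA"
    if grouped_columns:
        return "MFA"  # numerical columns organised in groups
    # Numerical data without groups: shapes -> GPA, otherwise PCA.
    return "GPA" if shapes else "PCA"


print(choose_method(categorical=False))                 # PCA
print(choose_method(categorical=True, numerical=True))  # FAMD
```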
## Correctness
Prince is tested against scikit-learn and [FactoMineR](http://factominer.free.fr/). For the latter, [rpy2](https://rpy2.github.io/) is used to run code in R, and convert the results to Python, which allows running automated tests. See more in the [`tests`](/tests/) directory.
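
One practical wrinkle when comparing coordinates against a reference implementation is that SVD-based components are only defined up to a sign flip, so two correct implementations can return a column with opposite signs. The helper below is an illustrative sketch of a sign-tolerant comparison in plain Python. It is not taken from Prince's test suite, and the name `equal_up_to_sign` is hypothetical.

```python
import math


def equal_up_to_sign(ours, reference, tol=1e-6):
    """Check that two coordinate tables match column by column, up to a sign flip.

    Both arguments are lists of rows, each row being a list of floats.
    """
    if len(ours) != len(reference):
        return False
    n_columns = len(ours[0]) if ours else 0
    if any(len(row) != n_columns for row in ours + reference):
        return False
    for j in range(n_columns):
        a = [row[j] for row in ours]
        b = [row[j] for row in reference]
        same = all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))
        flipped = all(math.isclose(x, -y, abs_tol=tol) for x, y in zip(a, b))
        if not (same or flipped):
            return False
    return True


# First two components of two rows from the transform output in the README.
ours = [[2.070933, 1.545461], [1.321239, 1.318348]]
# Same values with the second component flipped, as another library might return.
reference = [[2.070933, -1.545461], [1.321239, -1.318348]]
print(equal_up_to_sign(ours, reference))  # True
```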
## Citation
Please use this citation if you use this software as part of a scientific publication.
```bibtex
@software{Halford_Prince,
author = {Halford, Max},
license = {MIT},
title = {{Prince}},
url = {https://github.com/MaxHalford/prince}
}
```

## Support
I made Prince when I was at university, back in 2016. I've had very little time over the years to maintain this package. I spent a significant amount of time in 2022 revamping the entire package. Prince has now been downloaded over [1 million times](https://pepy.tech/project/prince). I would be grateful to anyone willing to [sponsor](https://github.com/sponsors/MaxHalford) me. Sponsorships allow me to spend more time working on open source software, including Prince.
## License
The MIT License (MIT). Please see the [license file](LICENSE) for more information.