{"id":14958240,"url":"https://github.com/maxhalford/prince","last_synced_at":"2025-04-11T03:32:55.127Z","repository":{"id":38325851,"uuid":"71637984","full_name":"MaxHalford/prince","owner":"MaxHalford","description":":crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA","archived":false,"fork":false,"pushed_at":"2025-03-09T22:06:43.000Z","size":8747,"stargazers_count":1337,"open_issues_count":0,"forks_count":184,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-04-03T20:41:47.705Z","etag":null,"topics":["ca","correspondence-analysis","factor-analysis","famd","mca","mfa","multiple-correspondence-analysis","multiple-factor-analysis","pandas","pca","principal-component-analysis","procrustes","python","scikit-learn","svd"],"latest_commit_sha":null,"homepage":"https://maxhalford.github.io/prince","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaxHalford.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"MaxHalford"}},"created_at":"2016-10-22T12:36:06.000Z","updated_at":"2025-04-02T06:54:21.000Z","dependencies_parsed_at":"2023-12-21T13:30:42.176Z","dependency_job_id":"49b200ae-fc3c-4ab3-87d0-4e1f6eaf94c5","html_url":"https://github.com/MaxHalford/prince","commit_stats":{"total_commits":396,"total_committers":15,"mean_commits":26.4,"dds":0.09848484848484851,"last_synced_commit":"c66cbda1fb7014014fce7f18ec6f2a3f996a377e"},"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repos
itory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fprince","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fprince/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fprince/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxHalford%2Fprince/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaxHalford","download_url":"https://codeload.github.com/MaxHalford/prince/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248335649,"owners_count":21086633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ca","correspondence-analysis","factor-analysis","famd","mca","mfa","multiple-correspondence-analysis","multiple-factor-analysis","pandas","pca","principal-component-analysis","procrustes","python","scikit-learn","svd"],"created_at":"2024-09-24T13:16:34.850Z","updated_at":"2025-04-11T03:32:55.103Z","avatar_url":"https://github.com/MaxHalford.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/static/images/logo.png\" alt=\"prince_logo\" width=\"80%\" /\u003e\n\u003c/div\u003e\n\n\u003cbr/\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003c!-- Documentation --\u003e\n  \u003ca href=\"https://maxhalford.github.io/prince\"\u003e\n    \u003cimg src=\"https://img.shields.io/website?label=docs\u0026style=flat-square\u0026url=https://maxhalford.github.io/prince\" alt=\"documentation\"\u003e\n  \u003c/a\u003e\n  
\u003c!-- PyPi --\u003e\n  \u003ca href=\"https://pypi.org/project/prince/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/prince.svg\" alt=\"pypi\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- PePy --\u003e\n  \u003ca href=\"https://pepy.tech/project/prince\"\u003e\n    \u003cimg src=\"https://static.pepy.tech/badge/prince\" alt=\"pepy\"\u003e\n  \u003c/a\u003e\n  \u003c!-- PePy by month --\u003e\n  \u003ca href=\"https://pepy.tech/project/prince\"\u003e\n    \u003cimg src=\"https://static.pepy.tech/badge/prince/month\" alt=\"pepy_month\"\u003e\n  \u003c/a\u003e\n  \u003c!-- Unit tests --\u003e\n  \u003ca href=\"https://github.com/MaxHalford/prince/actions/workflows/unit-tests.yml\"\u003e\n    \u003cimg src=\"https://github.com/MaxHalford/prince/actions/workflows/unit-tests.yml/badge.svg\" alt=\"Unit tests\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- Code quality --\u003e\n  \u003ca href=\"https://github.com/MaxHalford/prince/actions/workflows/code-quality.yml\"\u003e\n    \u003cimg src=\"https://github.com/MaxHalford/prince/actions/workflows/code-quality.yml/badge.svg\" alt=\"Code quality\" /\u003e\n  \u003c/a\u003e\n  \u003c!-- License --\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\n    \u003cimg src=\"http://img.shields.io/:license-mit-ff69b4.svg\" alt=\"license\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cbr/\u003e\n\nPrince is a Python library for multivariate exploratory data analysis. It includes a variety of methods for summarizing tabular data, including [principal component analysis (PCA)](https://www.wikiwand.com/en/Principal_component_analysis) and [correspondence analysis (CA)](https://www.wikiwand.com/en/Correspondence_analysis). Prince provides efficient implementations, using a scikit-learn API.\n\nI made Prince when I was at university, back in 2016. I spent a significant amount of time in 2022 revamping the entire package. 
It is thoroughly tested and supports many features, such as supplementary rows/columns, as well as row/column weights.\n\n## Example usage\n\n```py\n\u003e\u003e\u003e import prince\n\n\u003e\u003e\u003e dataset = prince.datasets.load_decathlon()\n\u003e\u003e\u003e decastar = dataset.query('competition == \"Decastar\"')\n\n\u003e\u003e\u003e pca = prince.PCA(n_components=5)\n\u003e\u003e\u003e pca = pca.fit(decastar, supplementary_columns=['rank', 'points'])\n\u003e\u003e\u003e pca.eigenvalues_summary\n          eigenvalue % of variance % of variance (cumulative)\ncomponent\n0              3.114        31.14%                     31.14%\n1              2.027        20.27%                     51.41%\n2              1.390        13.90%                     65.31%\n3              1.321        13.21%                     78.52%\n4              0.861         8.61%                     87.13%\n\n\u003e\u003e\u003e pca.transform(dataset).tail()\ncomponent                       0         1         2         3         4\ncompetition athlete\nOlympicG    Lorenzo      2.070933  1.545461 -1.272104 -0.215067 -0.515746\n            Karlivans    1.321239  1.318348  0.138303 -0.175566 -1.484658\n            Korkizoglou -0.756226 -1.975769  0.701975 -0.642077 -2.621566\n            Uldal        1.905276 -0.062984 -0.370408 -0.007944 -2.040579\n            Casarsa      2.282575 -2.150282  2.601953  1.196523 -3.571794\n\n```\n\n```py\n\u003e\u003e\u003e chart = pca.plot(dataset)\n\n```\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"figures/decastar.svg\" width=\"74%\" /\u003e\n  \u003cp\u003e\n    \u003ci\u003eThis chart is interactive, which doesn't show on GitHub. The green points are the column loadings.\u003c/i\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n```py\n\u003e\u003e\u003e chart = pca.plot(\n...     dataset,\n...     show_row_labels=True,\n...     show_row_markers=False,\n...     row_labels_column='athlete',\n...     color_rows_by='competition'\n... 
)\n\n```\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"figures/decastar_bis.svg\" width=\"74%\" /\u003e\n\u003c/div\u003e\n\n## Installation\n\n```sh\npip install prince\n```\n\n🎨 Prince uses [Altair](https://altair-viz.github.io/) for making charts.\n\n## Methods\n\n```mermaid\nflowchart TD\n    cat?(Categorical data?) --\u003e |\"✅\"| num_too?(Numerical data too?)\n    num_too? --\u003e |\"✅\"| FAMD\n    num_too? --\u003e |\"❌\"| multiple_cat?(More than two columns?)\n    multiple_cat? --\u003e |\"✅\"| MCA\n    multiple_cat? --\u003e |\"❌\"| CA\n    cat? --\u003e |\"❌\"| groups?(Groups of columns?)\n    groups? --\u003e |\"✅\"| MFA\n    groups? --\u003e |\"❌\"| shapes?(Analysing shapes?)\n    shapes? --\u003e |\"✅\"| GPA\n    shapes? --\u003e |\"❌\"| PCA\n```\n\n### [Principal component analysis (PCA)](https://maxhalford.github.io/prince/pca)\n\n### [Correspondence analysis (CA)](https://maxhalford.github.io/prince/ca)\n\n### [Multiple correspondence analysis (MCA)](https://maxhalford.github.io/prince/mca)\n\n### [Multiple factor analysis (MFA)](https://maxhalford.github.io/prince/mfa)\n\n### [Factor analysis of mixed data (FAMD)](https://maxhalford.github.io/prince/famd)\n\n### [Generalized procrustes analysis (GPA)](https://maxhalford.github.io/prince/gpa)\n\n## Correctness\n\nPrince is tested against scikit-learn and [FactoMineR](http://factominer.free.fr/). For the latter, [rpy2](https://rpy2.github.io/) is used to run code in R, and convert the results to Python, which allows running automated tests. See more in the [`tests`](/tests/) directory.\n\n## Citation\n\nPlease use this citation if you use this software as part of a scientific publication.\n\n```bibtex\n@software{Halford_Prince,\n    author = {Halford, Max},\n    license = {MIT},\n    title = {{Prince}},\n    url = {https://github.com/MaxHalford/prince}\n}\n```\n\n## License\n\nThe MIT License (MIT). 
Please see the [license file](LICENSE) for more information.\n","funding_links":["https://github.com/sponsors/MaxHalford"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhalford%2Fprince","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxhalford%2Fprince","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhalford%2Fprince/lists"}