[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)

# package.rari <img src="./tools/rari_logo.png" alt="rari logo" align="right" height="138.5" style="display: inline-block;">

`rari` is a Python implementation of Pinto et al.'s ranked adjusted Rand index (RARI) from
[Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement](https://doi.org/10.1186/1471-2105-8-44).
RARI is an extension of the [adjusted Rand index](https://en.wikipedia.org/wiki/Rand_index) (ARI)
that measures the agreement between two independent clustering solutions while also incorporating the distances
between instances/clusters within each solution.

* **RARI = 1:** Perfect agreement between cluster solutions 'A' and 'B': identical cluster partitions and equally *ranked* relative distances between clusters in both solutions.

* **RARI = 0:** No agreement between cluster solutions 'A' and 'B'. This occurs only when, in cluster solution 'A', all instances are in the same cluster and, in cluster solution 'B', every instance is in its own cluster and all clusters are equidistant from each other.

Roughly speaking, RARI's benefit is that it penalizes the ARI when a given pair of instances is close together in cluster
solution 'A' but far apart in cluster solution 'B'.

## Lightning Example

* Below is a comparison of the agreement between hierarchical and k-means clustering solutions on the iris data set. The same metric (Euclidean) is used to compute the pairwise distances between iris instances for both solutions, but this is not a requirement.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import pairwise_distances
from rari import rari

X = load_iris().data

model_1 = AgglomerativeClustering(n_clusters=3, linkage='ward')
x = model_1.fit_predict(X)

# KMeans is randomly initialized, so the score can vary slightly between runs
model_2 = KMeans(n_clusters=3)
y = model_2.fit_predict(X)

dist_x = pairwise_distances(X, metric='euclidean')
dist_y = pairwise_distances(X, metric='euclidean')

rari(x, y, dist_x, dist_y)
```
Out[1]: **.975**

## Install

* Development

```shell
pip install git+https://github.com/nredell/rari
```

## Intuition

Below is Figure 1 from Pinto et al.'s article, which demonstrates the impact of inter-cluster distances on RARI
compared with, say, the ARI.

<p align="center">
  <img src="tools/figure_1.PNG" width="400px"></img>
</p>

## Examples

### Example 1: ARI vs. RARI, Few Clusters, High Agreement

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score, pairwise_distances
from rari import rari

# labels_true holds the generating blob labels (stored in `data`, e.g. for plotting)
X, labels_true = make_blobs(n_samples=[50, 50, 50], n_features=2, cluster_std=1.0, center_box=(-5.0, 5.0), shuffle=True, random_state=224)
data = pd.DataFrame(np.hstack([X, labels_true[:, np.newaxis]]), columns=["X1", "X2", "Cluster"])

model_1 = AgglomerativeClustering(n_clusters=3, linkage='ward')
x = model_1.fit_predict(X)

model_2 = KMeans(n_clusters=3)
y = model_2.fit_predict(X)

dist_x = pairwise_distances(X, metric='euclidean')
dist_y = pairwise_distances(X, metric='euclidean')
```

![](./tools/example_1_plot_1.png) ![](./tools/example_1_plot_2.png) ![](./tools/example_1_plot_3.png)

```python
adjusted_rand_score(x, y)
rari(x, y, dist_x, dist_y)
```
**ARI:** .83

**RARI:** .89

### Example 2: ARI vs. RARI, A New Data Point

The toy 1D example below illustrates how RARI changes as the distances between clusters change, while the ARI,
which depends only on the cluster labels, stays the same.

Imagine that the moving data point is a new data point added to the data set, after which each of the 2 models is
re-run and the clusters are re-labeled. For the sake of illustration, the labels each model assigns to this new data point
are held constant across all 11 analyses to emphasize the impact of cluster spacing. In a real problem, the moving data point
would likely be labeled '2' as it approaches the yellow '2' on the right-hand side of each plot. However, this change of
labels may not occur even in a simple 2D example with a method like spectral clustering. Our intuitions fail us in higher
dimensions, but RARI can account for these changes in cluster orientation if desired.

![](./tools/rari_points.gif)

## Implementation Details

At present, inter-cluster distances are based on the Euclidean distance between pairs of instances in `dist_x` and `dist_y`.
That is, even if the input pairwise distance matrices are, for example, cosine and Manhattan, the inter-cluster distance ranks
are still based on a Euclidean, complete-linkage measure of these pairwise distances. This restriction will be relaxed in the
future with support for additional input arguments.
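The paragraph above describes inter-cluster distance ranks built from a complete-linkage measure. The sketch below illustrates one common reading of that idea on a precomputed pairwise distance matrix. The complete-linkage distance between two clusters is taken to be the largest distance between any of their members, and those inter-cluster distances are then ranked. This is an illustrative assumption, not `rari`'s internal code. The helpers `complete_linkage_matrix` and `rank_linkage` are hypothetical, they take `dist` as given rather than reproducing the Euclidean step described above, and tied distances are assumed to share their average rank.

```python
import numpy as np


def complete_linkage_matrix(labels, dist):
    """Hypothetical helper (not part of rari): complete-linkage distances between clusters.

    labels: length-n array of cluster labels.
    dist:   n x n precomputed pairwise distance matrix (any metric).
    Returns (clusters, linkage), where linkage[i, j] is the largest distance
    between a member of clusters[i] and a member of clusters[j].
    """
    labels = np.asarray(labels)
    dist = np.asarray(dist, dtype=float)
    clusters = np.unique(labels)
    k = len(clusters)
    linkage = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Sub-block of distances between members of cluster i and members of cluster j
            block = dist[np.ix_(labels == clusters[i], labels == clusters[j])]
            linkage[i, j] = linkage[j, i] = block.max()
    return clusters, linkage


def rank_linkage(linkage):
    """Hypothetical helper: rank off-diagonal inter-cluster distances (1 = closest pair).

    Tied distances share the average of the ranks they span; the diagonal stays 0.
    """
    k = linkage.shape[0]
    iu = np.triu_indices(k, k=1)
    values = linkage[iu]
    # Ordinal ranks 1..m in ascending order of distance
    ordinal = np.empty(len(values))
    ordinal[np.argsort(values, kind="stable")] = np.arange(1, len(values) + 1)
    # Replace the ordinal ranks of each group of tied values with their mean
    _, inverse = np.unique(values, return_inverse=True)
    mean_rank = np.bincount(inverse, weights=ordinal) / np.bincount(inverse)
    ranks = np.zeros((k, k))
    ranks[iu] = mean_rank[inverse]
    return ranks + ranks.T  # symmetric rank matrix


# Toy 1D data: three clusters on a line
points = np.array([0.0, 1.0, 10.0, 11.0, 30.0])
labels = np.array([0, 0, 1, 1, 2])
dist = np.abs(points[:, None] - points[None, :])

clusters, linkage = complete_linkage_matrix(labels, dist)
print(linkage)
print(rank_linkage(linkage))
```

Any square distance matrix, such as `dist_x` from the examples above, could be passed as `dist`. Swapping complete linkage for another rule, such as the mean distance for average linkage, would only change the `block.max()` line.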