{"id":24630728,"url":"https://github.com/arrrrrmin/ranking-metrics","last_synced_at":"2025-03-20T05:42:08.903Z","repository":{"id":190416611,"uuid":"676864051","full_name":"arrrrrmin/ranking-metrics","owner":"arrrrrmin","description":"A repository to understand ranking metrics as described by Musgrave et al. (2020). Plus some other metrics utilising confidence values.","archived":false,"fork":false,"pushed_at":"2023-09-18T17:50:20.000Z","size":2015,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-15T03:27:43.528Z","etag":null,"topics":["erc","error-reject-curve","learning-project","mapr","metrics","ranking-metrics","recall"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arrrrrmin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-10T07:27:52.000Z","updated_at":"2024-10-29T23:39:41.000Z","dependencies_parsed_at":"2023-08-24T15:40:01.861Z","dependency_job_id":null,"html_url":"https://github.com/arrrrrmin/ranking-metrics","commit_stats":null,"previous_names":["arrrrrmin/ranking-metrics"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrrrrmin%2Franking-metrics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrrrrmin%2Franking-metrics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrrrrmin%2Franking-metrics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arrrrrmin%2Franking-metrics/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arrrrrmin","download_url":"https://codeload.github.com/arrrrrmin/ranking-metrics/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244560384,"owners_count":20472218,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erc","error-reject-curve","learning-project","mapr","metrics","ranking-metrics","recall"],"created_at":"2025-01-25T07:12:55.372Z","updated_at":"2025-03-20T05:42:08.879Z","avatar_url":"https://github.com/arrrrrmin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ranking-metrics\n\n\u003e A repository to understand ranking metrics as described by Musgrave et al. (2020).\n\nResources used:\n* [faiss getting started guide](https://github.com/facebookresearch/faiss/wiki/Getting-started)\n* [probabilistic-embeddings](https://github.com/tinkoff-ai/probabilistic-embeddings)\n\n## Basic metrics\n\n$R-Precision = \\frac{r}{R}$, where R is the number of nearest neighbor embeddings for a query\nand r is the number of those embeddings that actually belong to the query's class.\n\n$$ MAP@R = \\frac{1}{R} \\sum_{i=1}^{R}{P(i)},\\ where\\ P(i) = \\begin{cases} precision\\ at\\ i, \u0026 if\\ the\\ ith\\ retrieval\\ is\\ correct \\\\\n 0, \u0026 \\text{otherwise} \\end{cases} $$\n\nOther ranking metrics are described in \n[Assessing ranking metrics in top-N recommendation](https://link.springer.com/article/10.1007/s10791-020-09377-x),\nby Valcarce et al. (2020). 
These are largely not used here but give a good introduction to established\nranking metrics. In this repository we only use *recall@k*, since it is useful in combination with confidence\nvalues, as in the *ERC* (Error vs. Reject Curve). Professional researchers additionally use *MAP@R* in \ncombination with *ERC*.\n\n$$ Recall@k = \\frac{|L_{k}\\cap R|}{|R|} \\\\ L_{k} = List\\ of\\ top\\ k,\\ R=Recommendation $$\n\nThe following examples replicate the toy example of Musgrave et al. in \n[*A Metric Learning Reality Check*](https://arxiv.org/abs/2003.08505). \nPlots are generated by running the tests in \n[test_reality_check.py](tests/test_reality_check.py). The examples show how *MAP@R* \nrewards well-clustered embedding spaces. \n\n![Example 1](figures/reality_check_example1.png)\n\n![Example 2](figures/reality_check_example2.png)\n\n![Example 3](figures/reality_check_example3.png)\n\nThe code for calculating the metrics can be found in \n[embed_metrics.py](src/ranking_metrics/embed_metrics.py) and, thanks to faiss, it is \nfairly short. Faiss takes care of finding the nearest neighbors for a query.\n\n## Additional metrics using confidences\n\nAs mentioned above, *recall@1* and *MAP@R* can be used to see the effect of model confidences or\nuncertainties. The stated assumption is: *If a model can properly predict confidence values on \nambiguous inputs, excluding low-confidence samples will increase the metric.* This can be shown\nby using an *Error vs. Reject Curve* (ERC).\n\nThe following example shows 3 different metrics, all using confidences as an indicator of the \nembedding space's cluster quality.\n\n![Example 2 with simulated confidences](figures/confidence_example2.png)\n\nThe opacity in the plot above corresponds to the simulated confidences. 
A few errors were injected,\nfor which we can control the confidence and see how scores behave when we change the confidences of\nthese erroneous samples.\n\n![Confidence metrics](figures/confidence_recall_metrics.png)\n\nAs we increase the model's confidence on the x-axis, the scores drop and errors increase, because\nthe model assigns increasing confidence to erroneous samples. The remaining confidences are kept the same.\nFor more detail on the illustrations see the tests in [test_uncertainty.py](tests/test_uncertainty.py).\n\nPlease note that *Confidence vs Recall@1* only works with confidences in the probabilistic range $c \\in [0, 1]$.\n*ERCs* will still work, since they just sort the samples by confidence (regardless of range). \n\n## Ground truth $\\sigma$\n\nIn probabilistic embeddings $\\sigma$ is often used as the confidence value for a prediction. It is either \nlearned or predicted implicitly from the $\\mu$ embedding (e.g. the l2-norm of $\\mu$) \n([Scott et al. 2021](https://arxiv.org/abs/2103.15718)). Common strategies for creating $\\sigma$-targets\ninclude ambiguity through label entropy or augmentation, e.g. cropping or blurring images \n([Wu \u0026 Goodman 2020](https://arxiv.org/abs/2010.02038)). Target $\\sigma$ values are often created with\n$\\sigma \\in [0,1]$. Models are likely to output values in other ranges. In these \ncases it is useful to use a rank correlation metric. Here is a small example:\n\n![Rank correlation over simulated sigmas](figures/rank_corr_over_simulated_sigmas.png)\n\nThe score is highest at the point $x=6827.5$, since this is the mean of the corrupted confidences\nin the ground truth labels.\n\n## Credible Intervals\n\nWhen models estimate posterior distributions in the embedding space, credible intervals can be used\nto show how good models are at retrieving similar data (same class or same instance) using an interval \nover the confidence parameter $\\sigma$. 
This is usually done by selecting the mode of a posterior\ndistribution and measuring the highest posterior density interval (HPDI) for a value $p$, e.g. $p=0.95$\ngives the 95% CI around the mode of the posterior distribution.\n\n## Reference implementations\n\n* [powerful-benchmarker](https://github.com/KevinMusgrave/powerful-benchmarker)\n* [pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning)\n* [probabilistic-embeddings](https://github.com/tinkoff-ai/probabilistic-embeddings/tree/main/src/probabilistic_embeddings/metrics)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farrrrrmin%2Franking-metrics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farrrrrmin%2Franking-metrics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farrrrrmin%2Franking-metrics/lists"}