{"id":13569529,"url":"https://github.com/jacobgil/confidenceinterval","last_synced_at":"2025-10-14T16:01:45.582Z","repository":{"id":139916796,"uuid":"612653565","full_name":"jacobgil/confidenceinterval","owner":"jacobgil","description":"The long missing library for python confidence intervals","archived":false,"fork":false,"pushed_at":"2024-05-24T04:03:04.000Z","size":55,"stargazers_count":137,"open_issues_count":8,"forks_count":17,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-31T06:02:32.702Z","etag":null,"topics":["data-science","machine-learning","metrics","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacobgil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-11T15:34:10.000Z","updated_at":"2025-02-19T08:22:19.000Z","dependencies_parsed_at":"2023-07-11T01:15:10.610Z","dependency_job_id":"f24ffc7a-81c5-4967-9985-bdc041002f99","html_url":"https://github.com/jacobgil/confidenceinterval","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobgil%2Fconfidenceinterval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobgil%2Fconfidenceinterval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobgil%2Fconfidenceinterval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobgil%2Fconfidenceinterval/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacobgil","download_url":"https://codeload.github.com/jacobgil/confidenceinterval/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247618068,"owners_count":20967721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","machine-learning","metrics","statistics"],"created_at":"2024-08-01T14:00:41.061Z","updated_at":"2025-10-14T16:01:45.488Z","avatar_url":"https://github.com/jacobgil.png","language":"Python","funding_links":[],"categories":["Uncategorized","Resources"],"sub_categories":["Uncategorized",":microscope: Experiments \u0026 Analysis"],"readme":"# The long missing python library for confidence intervals\n![logo](logo.png)\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n![Build Status](https://github.com/jacobgil/confidenceinterval/workflows/Tests/badge.svg)\n[![Downloads](https://static.pepy.tech/personalized-badge/confidenceinterval?period=month\u0026units=international_system\u0026left_color=black\u0026right_color=brightgreen\u0026left_text=Monthly%20Downloads)](https://pepy.tech/project/confidenceinterval)\n[![Downloads](https://static.pepy.tech/personalized-badge/confidenceinterval?period=total\u0026units=international_system\u0026left_color=black\u0026right_color=blue\u0026left_text=Total%20Downloads)](https://pepy.tech/project/confidenceinterval)\n\n`pip install confidenceinterval`\n\nThis is a package that computes common machine learning metrics like F1, and 
returns their confidence intervals.\n\n\n⭐ Very easy to use, with the standard scikit-learn naming convention and interface.\n\n⭐ Support for many metrics, with modern confidence interval methods.\n\n⭐ The only package with analytical computation of the CI for Macro/Micro/Binary averaging F1, Precision and Recall.\n\n⭐ Support for both analytical computation of the confidence intervals, and bootstrapping methods.\n\n⭐ Easy to use interface to compute confidence intervals on new metrics that don't appear here, with bootstrapping.\n\n## The motivation\n\nA confidence interval gives you a lower and upper bound on your metric. It's affected by the sample size, and by how sensitive the metric is to changes in the data.\n\nWhen making decisions based on metrics, you should prefer narrow intervals. If the interval is wide, you can't be confident that a high metric value is not just due to luck.\n\nWhile confidence intervals are commonly used by statisticians, with many great R language implementations,\nthey are astonishingly rarely used by Python users, even though Python took over the data science world!\n\nPart of the reason is that there were no simple-to-use Python packages for this.\n\n\n## Getting started\n\n```python\n# All the possible imports:\nfrom confidenceinterval import roc_auc_score\nfrom confidenceinterval import precision_score, recall_score, f1_score\nfrom confidenceinterval import (accuracy_score,\n                                ppv_score,\n                                npv_score,\n                                tpr_score,\n                                fpr_score,\n                                tnr_score)\nfrom confidenceinterval.bootstrap import bootstrap_ci\n\n\n# Analytic CI:\nauc, ci = roc_auc_score(y_true,\n                        y_pred,\n                        confidence_level=0.95)\n# Bootstrap CI:\nauc, ci = roc_auc_score(y_true,\n                        y_pred,\n                        confidence_level=0.95,\n                        
method='bootstrap_bca',\n                        n_resamples=5000)\n\n\n\n```\n\n## All methods do an analytical computation by default, but can do bootstrapping instead\nBy default all the methods return an analytical computation of the confidence interval (CI).\n\nFor a bootstrap computation of the CI for any of the methods below, just specify method='bootstrap_bca', method='bootstrap_percentile' or method='bootstrap_basic'.\nThese are different ways of doing the bootstrapping, but method='bootstrap_bca' is the generally recommended method.\n\nYou can also pass the number of bootstrap resamples (n_resamples), and a random generator (random_state) to control reproducibility:\n\n```python\nimport numpy as np\n\n# Pass a seed, e.g. default_rng(0), to make the resampling reproducible\nrandom_state = np.random.default_rng()\nn_resamples=9999\n```\n\n## Support for binary, macro and micro averaging for F1, Precision and Recall.\n```python\nfrom confidenceinterval import precision_score, recall_score, f1_score\nbinary_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='binary')\nmacro_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='macro')\nmicro_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='micro')\nbootstrap_binary_f1, ci = f1_score(y_true, y_pred, confidence_level=0.95, average='binary', method='bootstrap_bca', n_resamples=5000)\n\n```\n\nThe analytical computation here uses the (amazing) 2022 paper of Takahashi et al. (reference below).\nThe paper derived recall and precision only for micro averaging.\nWe derive the recall and precision confidence intervals for macro averaging as well, using the delta method.\n\n\n## ROC AUC\n```python\nfrom confidenceinterval import roc_auc_score\n```\nThe analytical computation here is a fast implementation of the DeLong method.\n\n\n## Binary metrics\n```python\nfrom confidenceinterval import (accuracy_score,\n                                ppv_score,\n                                npv_score,\n                                tpr_score,\n                          
      fpr_score,\n                                tnr_score)\n# Wilson is used by default:\nppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='wilson')\nppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='jeffreys')\nppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='agresti_coull')\nppv, ci = ppv_score(y_true, y_pred, confidence_level=0.95, method='bootstrap_bca')\n\n```\n\nFor these methods, the confidence interval is estimated by treating the ratio as a binomial proportion,\nsee the [wiki page](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval).\n\nBy default method='wilson' (the Wilson interval), which behaves better for smaller samples.\n\nmethod can be one of ['wilson', 'normal', 'agresti_coull', 'beta', 'jeffreys', 'binom_test'], or one of the bootstrap methods.\n\n## Get a Classification Report\nThe `classification_report_with_ci` function in [classification_report.py](confidenceinterval%2Fclassification_report.py) builds a text report showing the main classification metrics and their confidence intervals.\nEach class is first treated as a binary classification problem; by default Wilson is used for the precision and recall CIs, and Takahashi-binary for F1. 
Then the \nmicro and macro multi-class metrics are calculated using the Takahashi methods.\n\n```python\nfrom confidenceinterval import classification_report_with_ci\n\ny_true = [0, 1, 2, 2, 2, 1, 1, 1, 0, 2, 2, 1, 0, 2, 2, 1, 2, 2, 1, 1]\ny_pred = [0, 1, 0, 0, 2, 1, 1, 1, 0, 2, 2, 1, 0, 1, 2, 1, 2, 2, 1, 1]\n\nclassification_report_with_ci(y_true, y_pred)\n\n     Class  Precision  Recall  F1-Score    Precision CI       Recall CI     F1-Score CI  Support\n0  Class 0      0.600   1.000     0.750  (0.231, 0.882)    (0.439, 1.0)  (0.408, 1.092)        3\n1  Class 1      0.889   1.000     0.941   (0.565, 0.98)    (0.676, 1.0)  (0.796, 1.086)        8\n2  Class 2      1.000   0.667     0.800     (0.61, 1.0)  (0.354, 0.879)  (0.562, 1.038)        9\n3    micro      0.850   0.850     0.850  (0.694, 1.006)  (0.694, 1.006)  (0.694, 1.006)       20\n4    macro      0.830   0.889     0.830  (0.702, 0.958)  (0.775, 1.002)  (0.548, 1.113)       20\n```\nYou can also provide a custom mapping for the class names, as well as modify the binary CI method and rounding.\n```python\nfrom confidenceinterval import classification_report_with_ci\n\ny_true = [0, 1, 2, 2, 2, 1, 1, 1, 0, 2, 2, 1, 0, 2, 2, 1, 2, 2, 1, 1]\ny_pred = [0, 1, 0, 0, 2, 1, 1, 1, 0, 2, 2, 1, 0, 1, 2, 1, 2, 2, 1, 1]\n\nnumerical_to_label = {\n    0: \"Cherries\",\n    1: \"Olives\",\n    2: \"Tangerines\"\n}\n\nclassification_report_with_ci(y_true, y_pred, round_ndigits=2, numerical_to_label_map=numerical_to_label, binary_method='wilson')\n\n        Class  Precision  Recall  F1-Score  Precision CI     Recall CI   F1-Score CI  Support\n0    Cherries       0.60    1.00      0.75  (0.23, 0.88)   (0.44, 1.0)  (0.41, 1.09)        3\n1      Olives       0.89    1.00      0.94  (0.57, 0.98)   (0.68, 1.0)   (0.8, 1.09)        8\n2  Tangerines       1.00    0.67      0.80   (0.61, 1.0)  (0.35, 0.88)  (0.56, 1.04)        9\n3       micro       0.85    0.85      0.85  (0.69, 1.01)  (0.69, 1.01)  (0.69, 1.01)       20\n4     
  macro       0.83    0.89      0.83   (0.7, 0.96)   (0.78, 1.0)  (0.55, 1.11)       20\n```\n\n\n## Get a confidence interval for any custom metric with Bootstrapping\nWith the bootstrap_ci method, you can get the CI for any metric function that takes y_true and y_pred as arguments.\n\nAs an example, let's get the CI for the balanced accuracy metric from scikit-learn.\n\n```python\nimport numpy as np\nimport sklearn.metrics\n\nfrom confidenceinterval.bootstrap import bootstrap_ci\n# You can specify a random generator for reproducibility, or pass None\nrandom_generator = np.random.default_rng()\nbootstrap_ci(y_true=y_true,\n             y_pred=y_pred,\n             metric=sklearn.metrics.balanced_accuracy_score,\n             confidence_level=0.95,\n             n_resamples=9999,\n             method='bootstrap_bca',\n             random_state=random_generator)\n```\n\n\n\n----------\n\n## Citation\n\nIf you use this for research, please cite. Here is an example BibTeX entry:\n\n```\n@misc{jacobgildenblatconfidenceinterval,\n  title={A python library for confidence intervals},\n  author={Jacob Gildenblat},\n  year={2023},\n  publisher={GitHub},\n  howpublished={\\url{https://github.com/jacobgil/confidenceinterval}},\n}\n```\n\n----------\n\n## References\n\nThe binomial confidence interval computation uses the statsmodels package:\nhttps://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html\n\nYandex Data School implementation of the fast DeLong method:\nhttps://github.com/yandexdataschool/roc_comparison\n\nhttps://ieeexplore.ieee.org/document/6851192\nX. Sun and W. Xu, \"Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves,\" in IEEE Signal Processing Letters, vol. 21, no. 11, pp. 1389-1393, Nov. 
2014, doi: 10.1109/LSP.2014.2337313.\n\nhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8936911/#APP2\n\nConfidence interval for micro-averaged F1 and macro-averaged F1 scores\n`Kanae Takahashi, Kouji Yamamoto, Aya Kuchiba, and Tatsuki Koyama`\n\nB. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman \u0026 Hall/CRC, Boca Raton, FL, USA (1993)\n\nhttp://users.stat.umn.edu/~helwig/notes/bootci-Notes.pdf\n`Nathaniel E. Helwig, “Bootstrap Confidence Intervals”`\n\n\nBootstrapping (statistics), Wikipedia, https://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacobgil%2Fconfidenceinterval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacobgil%2Fconfidenceinterval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacobgil%2Fconfidenceinterval/lists"}