{"id":13737159,"url":"https://github.com/oracle/macest","last_synced_at":"2025-08-19T14:32:45.563Z","repository":{"id":40955297,"uuid":"394444813","full_name":"oracle/macest","owner":"oracle","description":"Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores","archived":false,"fork":false,"pushed_at":"2023-10-26T12:15:19.000Z","size":11554,"stargazers_count":100,"open_issues_count":9,"forks_count":20,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-07T01:14:05.962Z","etag":null,"topics":["confidence-estimation","data-science","machine-learning","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"upl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oracle.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-08-09T21:33:53.000Z","updated_at":"2024-08-04T05:46:35.000Z","dependencies_parsed_at":"2024-01-11T13:20:12.956Z","dependency_job_id":null,"html_url":"https://github.com/oracle/macest","commit_stats":{"total_commits":8,"total_committers":4,"mean_commits":2.0,"dds":0.375,"last_synced_commit":"4a38cb5fac4b4bb18331636394754c9ea149bc8e"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fmacest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fmacest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fmacest/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fmacest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oracle","download_url":"https://codeload.github.com/oracle/macest/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230359935,"owners_count":18214157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["confidence-estimation","data-science","machine-learning","python"],"created_at":"2024-08-03T03:01:36.404Z","updated_at":"2025-08-19T14:32:45.548Z","avatar_url":"https://github.com/oracle.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# MACEst (Model Agnostic Confidence Estimator)\n## What is MACEst?\nMACEst is a confidence estimator that can be used alongside any model (regression or \nclassification) which uses previously seen data (i.e. any supervised learning model) to produce a \npoint prediction.\n\nIn the regression case, MACEst produces a _confidence interval_ about the point prediction, e.g. \n\"the point prediction is 10 and I am 90% confident that the prediction lies between 8 and 12.\"\n\nIn the classification case, MACEst produces a _confidence score_ for the point prediction, e.g. \n\"the point prediction is class 0 and I am 90% sure that the prediction is correct.\"\n\nMACEst produces well-calibrated confidence estimates, i.e. 90% confidence means that you will on \naverage be correct 90% of the time. \nIt is also aware of the model's limitations, i.e. 
when a model is being asked to predict a point for which \nit does not have the necessary knowledge (data) to predict confidently. \nIn these cases, MACEst is able to incorporate this (epistemic) uncertainty and return a \nvery low confidence prediction (in regression, this means a large prediction interval).\n\n## Why use MACEst?\nMachine learning has become an integral part of many of the tools that are used every day. \nThere has been a huge amount of progress on improving the global accuracy of machine learning \nmodels, but estimating how likely a single prediction is to be correct has seen considerably less \nprogress.\n\nMost algorithms will still produce a prediction, even for a point in a part of the feature space \nthat the algorithm has no information about. \nThis could be because the feature vector is unlike anything seen during training, or because the \nfeature vector falls in a part of the feature space where there is a large amount of uncertainty, \nsuch as a region where two classes overlap.\nIn cases like this, the prediction may well be meaningless. \nIn most models, it is impossible to distinguish this sort of meaningless prediction from a sensible \nprediction. \nMACEst addresses this situation by providing an additional confidence estimate.\n\nIn some areas, such as finance, infrastructure, or healthcare, making a single bad prediction can \nhave major consequences.\nIn these situations, it is important to understand how likely any prediction a model makes is to be \ncorrect before acting upon it. \nIt is often even more important in these situations that any model *knows what it doesn't know* so \nthat it will not blindly make bad predictions.\n\n## Summary of the Methodology\n### TL;DR\nMACEst produces confidence estimates for a given point x by considering two factors:\n1. How accurate is the model when predicting previously seen points that are **similar** to x? 
\nLess confident if the model is less accurate in the region close to x.\n2. How **similar** is x to the points that we have seen previously? \nLess confident if x is not **similar** to the data used to train the model.\n\n### Longer Explanation\nMACEst seeks to provide reliable confidence estimates for both regression and classification. \nIt draws from ideas present in trust scores, conformal learning, Gaussian processes, and Bayesian \nmodelling.\n\nThe general idea is that confidence is a local quantity. \nEven when the model is accurate globally, there are likely still some predictions about which it \nshould not be very confident. \nConversely, if the model is not accurate globally, there may still be some predictions about which \nthe model can be very confident.\n\nTo model this local confidence for a given prediction on a point x, we define the local \nneighbourhood by finding the k nearest neighbours to x. \nWe then attempt to directly model the two causes of uncertainty:\n1. _Aleatoric Uncertainty_: Even with lots of (possibly infinite) data, there will be some \nvariance/noise in the predictions.\nOur local approximation to this is a local accuracy estimate, i.e. how accurate were the predictions \nfor the k nearest neighbours?\n2. _Epistemic Uncertainty_: The model can only know relationships learnt from the training data. \nIf the model has not seen any data points similar to x, then it does not have as much knowledge \nabout points like x, so the confidence estimate should be lower. \nMACEst estimates this by calculating how **similar** x is to the k nearest (most similar) points \nthat it has previously seen.\n\nWe define a simple parametric function of these two quantities and calibrate this function so that \nour confidence estimates approximate the empirical accuracy, i.e. 90% confident -\u003e 90% correct on \naverage. 
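\n\nTo make the two quantities concrete, the sketch below follows the cat/dog example further down. For each class, it finds the k most similar previously seen points of that class, then measures the point prediction model's error rate on them (the aleatoric term) and their mean distance to x (the epistemic term), and combines the two. The function name `per_class_confidence`, the brute-force Euclidean search, and the form `1 / (alpha * error + beta * distance)` are illustrative assumptions, not MACEst's actual function or API. MACEst calibrates its parameters rather than fixing them by hand as done here.\n\n```python\nimport numpy as np\n\n\n# Hypothetical sketch, not MACEst's API: per-class confidence for a query point x.\n# X_seen:  previously seen points, shape (n_samples, n_features)\n# y_seen:  their true class labels\n# correct: 1.0 where the point prediction model was right on X_seen, else 0.0\n# alpha, beta: weights of the two uncertainty terms (fixed here, calibrated in MACEst)\ndef per_class_confidence(x, X_seen, y_seen, correct, k=3, alpha=1.0, beta=1.0):\n    classes = np.unique(y_seen)\n    scores = np.empty(len(classes))\n    for i, c in enumerate(classes):\n        X_c, correct_c = X_seen[y_seen == c], correct[y_seen == c]\n        # Brute-force Euclidean search for the k most similar points of class c\n        dists = np.linalg.norm(X_c - x, axis=1)\n        nearest = np.argsort(dists)[:k]\n        local_error = 1.0 - correct_c[nearest].mean()  # aleatoric term\n        mean_dist = dists[nearest].mean()              # epistemic term\n        # A higher local error or a larger distance lowers the score\n        scores[i] = 1.0 / (alpha * local_error + beta * mean_dist + 1e-12)\n    # Normalise so the confidences over all classes sum to 1\n    return classes, scores / scores.sum()\n\n\n# Two well-separated classes, all predicted correctly by the point model\nX_seen = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)\ny_seen = np.array([0, 0, 0, 0, 1, 1, 1, 1])\ncorrect = np.ones(len(y_seen))\n\n_, near = per_class_confidence(np.array([0.5, 0.5]), X_seen, y_seen, correct)\n_, far = per_class_confidence(np.array([100.0, -100.0]), X_seen, y_seen, correct)\nprint(near)  # class 0 dominates (about 0.91)\nprint(far)   # close to uniform: x is unlike any point seen before\n```\n\nOn this toy data, the query near the first cluster receives most of the confidence for class 0. The distant query receives roughly equal confidence for both classes, as in the horse example.\n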
\nBy directly modelling these two effects, MACEst captures the local variance accurately whilst also \nrecognising when the model is being asked to predict a point that is very different from the data it \nwas trained on. \nThis will make it robust to problems such as overconfident extrapolation and out-of-sample \npredictions.\n\n### Example\nIf a model has been trained to classify images of cats and dogs, and we want to classify an image of \na poodle, we find the k most poodle-like cats and the k most poodle-like dogs. \nWe then calculate how accurate the model was on these sets of images, and how similar the poodle is \nto each of these k cats and k dogs. We combine these two to produce a confidence estimate for each \nclass.\n\nAs the poodle-like cats will likely be unusual cats, they will be harder to classify, so the model's \naccuracy on them will be lower than on the poodle-like dogs. \nCombined with the fact that the image will be considerably more similar to the poodle-like dogs, \nthis means the confidence of the dog prediction will be high.\n\nIf we now try to classify an image of a horse, we find that the new image is very **dissimilar** to \nboth cats and dogs, so the similarity term dominates and MACEst will return an approximately \nuniform distribution over the classes. This can be interpreted as MACEst saying \"I don't know what \nthis is because I've never seen an image of a horse!\".\n\n## Getting Started\nWe recommend using Python 3.10 for MACEst.\n\nCreate a virtual environment and activate it:\n```bash\npython3.10 -m venv venv\nsource venv/bin/activate\n```\n\nInstall the dependencies and MACEst: \n```bash\npip install -r requirements.txt\npip install -r requirements_notebooks.txt\npip install macest\n```\n\nOr add `macest` to your project's `requirements.txt` file as a dependency. \n\n### Software Prerequisites\nTo import and use MACEst, we recommend Python version \u003e= `3.10`. 
\n\n## Basic Usage\nBelow are examples of using MACEst for classification and regression.\nFor more examples, and for advanced usage, please see the example [notebooks](./notebooks).\n\n### Classification\nTo use MACEst for a classification task, the following example can be used:\n```python\nfrom macest.classification import models as cl_mod\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn import datasets\nfrom sklearn.model_selection import train_test_split\n\nX, y = datasets.make_circles(n_samples=2 * 10**4, noise=0.4, factor=0.001)\n\n# Split into four disjoint sets: point prediction model training, MACEst training,\n# calibration, and test\nX_pp_train, X_conf_train, y_pp_train, y_conf_train = train_test_split(X, y, test_size=0.66, random_state=10)\nX_conf_train, X_cal, y_conf_train, y_cal = train_test_split(X_conf_train, y_conf_train,\n                                                            test_size=0.5, random_state=0)\nX_cal, X_test, y_cal, y_test = train_test_split(X_cal, y_cal, test_size=0.5, random_state=0)\n\n# Train the point prediction model\npoint_pred_model = RandomForestClassifier(random_state=0, n_estimators=800, n_jobs=-1)\npoint_pred_model.fit(X_pp_train, y_pp_train)\n\n# Wrap the point prediction model with MACEst, using data it was not trained on\nmacest_model = cl_mod.ModelWithConfidence(point_pred_model, X_conf_train, y_conf_train)\n\n# Calibrate MACEst on the calibration set\nmacest_model.fit(X_cal, y_cal)\n\n# Confidence that each point prediction on the test set is correct\nconf_preds = macest_model.predict_confidence_of_point_prediction(X_test)\n```\n\n### Regression\nTo use MACEst for a regression task, the following example can be used:\n```python\nimport numpy as np\nfrom macest.regression import models as reg_mod\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.model_selection import train_test_split\n\nX = np.linspace(0, 1, 10**3)\ny = 2 * X * np.sin(2 * X)**2 + np.random.normal(0, 1, len(X))\n\nX_pp_train, X_conf_train, y_pp_train, y_conf_train = train_test_split(X, y, test_size=0.66, random_state=0)\nX_conf_train, X_cal, y_conf_train, y_cal = train_test_split(X_conf_train, y_conf_train,\n                                                            test_size=0.5, random_state=1)\nX_cal, X_test, y_cal, y_test = train_test_split(X_cal, y_cal, test_size=0.5, random_state=1)\n\n# scikit-learn expects 2D feature arrays, hence the [:, None]\npoint_pred_model = LinearRegression()\npoint_pred_model.fit(X_pp_train[:, None], y_pp_train)\n\n# Absolute errors of the point prediction model on the MACEst training set\npreds = point_pred_model.predict(X_conf_train[:, None])\ntrain_error = abs(preds - y_conf_train)\n\nmacest_model = reg_mod.ModelWithPredictionInterval(point_pred_model,\n                                                   X_conf_train[:, None],\n                                                   train_error)\n\nmacest_model.fit(X_cal[:, None], y_cal)\n\n# 90% prediction intervals for the test points\nconf_preds = macest_model.predict_interval(X_test[:, None], conf_level=90)\n```\n\n### MACEst with sparse data (see notebooks for more details)\n```python\nimport numpy as np\nfrom scipy.sparse import 
csr_matrix\nfrom scipy.sparse import random as sp_rand\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.ensemble import RandomForestClassifier\nfrom macest.classification import models as clmod\nimport nmslib\n\n# Random sparse feature matrix with random binary labels\nn_rows = 10**3\nn_cols = 5 * 10**3\nX = csr_matrix(sp_rand(n_rows, n_cols))\ny = np.random.randint(0, 2, n_rows)\n\nX_pp_train, X_conf_train, y_pp_train, y_conf_train = train_test_split(X, y, test_size=0.66, random_state=10)\nX_conf_train, X_cal, y_conf_train, y_cal = train_test_split(X_conf_train, y_conf_train,\n                                                            test_size=0.5, random_state=0)\nX_cal, X_test, y_cal, y_test = train_test_split(X_cal, y_cal, test_size=0.5, random_state=0)\n\nmodel = RandomForestClassifier(random_state=0,\n                               n_estimators=800,\n                               n_jobs=-1)\n\nmodel.fit(csr_matrix(X_pp_train), y_pp_train)\n\nparam_bounds = clmod.SearchBounds(alpha_bounds=(0, 500), k_bounds=(5, 15))\n# Nearest neighbour search with an nmslib HNSW graph in a sparse cosine-similarity space\nneighbour_search_params = clmod.HnswGraphArgs(query_args=dict(ef=1100),\n                                              init_args=dict(method=\"hnsw\",\n                                                             space=\"cosinesimil_sparse\",\n                                                             data_type=nmslib.DataType.SPARSE_VECTOR))\nmacest_model = clmod.ModelWithConfidence(model,\n                                         X_conf_train,\n                                         y_conf_train,\n                                         search_method_args=neighbour_search_params)\n\nmacest_model.fit(X_cal, y_cal)\n\nmacest_point_prediction_conf = macest_model.predict_confidence_of_point_prediction(X_test)\n```\n\n## Contributing\nSee the [`CONTRIBUTING.md`](./CONTRIBUTING.md) file for information about contributing to MACEst.\n\n## Related Publications\n\nFor more information about the methodology behind MACEst, please refer to our \naccompanying research 
paper on arXiv:\n\n* Rhys Green, Matthew Rowe, and Alberto Polleri. 2021. [MACEst: The reliable and trustworthy Model Agnostic Confidence Estimator](https://arxiv.org/abs/2109.01531). arXiv:2109.01531.\n\n## Security\n\nPlease consult the [security guide](./SECURITY.md) for our responsible security vulnerability disclosure process.\n\n## License\nCopyright (c) 2021, 2023 Oracle and/or its affiliates. All rights reserved.\n\nThis library is licensed under the Universal Permissive License (UPL) 1.0 as shown at \nhttps://oss.oracle.com/licenses/upl\n\nSee [LICENSE.txt](./LICENSE.txt) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle%2Fmacest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foracle%2Fmacest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle%2Fmacest/lists"}