{"id":42414792,"url":"https://github.com/adrian-lison/interval-scoring","last_synced_at":"2026-01-28T01:57:25.313Z","repository":{"id":106819662,"uuid":"283722797","full_name":"adrian-lison/interval-scoring","owner":"adrian-lison","description":"This repository contains python implementations of scoring rules for forecasts provided in a prediction interval format.","archived":false,"fork":false,"pushed_at":"2020-07-31T12:25:52.000Z","size":18,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-01-30T15:23:38.107Z","etag":null,"topics":["forecasting","interval","quantile","quantile-regression","scoring"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adrian-lison.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-07-30T09:06:28.000Z","updated_at":"2023-12-23T15:53:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"e0a519ad-d865-489f-9d8b-e4900055da51","html_url":"https://github.com/adrian-lison/interval-scoring","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"bf85ddbbf314ff16629d4e0320dbe129a8c8a4fd"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/adrian-lison/interval-scoring","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrian-lison%2Finterval-scoring","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrian-lison%2Finterval-scoring/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/adrian-lison%2Finterval-scoring/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrian-lison%2Finterval-scoring/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adrian-lison","download_url":"https://codeload.github.com/adrian-lison/interval-scoring/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrian-lison%2Finterval-scoring/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28833134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T23:29:49.665Z","status":"ssl_error","status_checked_at":"2026-01-27T23:25:58.379Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["forecasting","interval","quantile","quantile-regression","scoring"],"created_at":"2026-01-28T01:57:23.692Z","updated_at":"2026-01-28T01:57:25.307Z","avatar_url":"https://github.com/adrian-lison.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Interval Scoring\nThis repository contains python implementations of scoring rules for forecasts provided in a prediction interval format, for example when only quantiles of the predictive distribution are given.\n\n- **Interval Score** (Gneiting \u0026 Raftery, 2007): Scores 
sharpness and calibration of a specific prediction interval. \n- **Weighted Interval Score** (Bracher, Gneiting \u0026 Reich, 2020): A weighted sum of sharpness and calibration scores of several prediction intervals. By suitable weighting, this can be used to approximate the CRPS.\n- **Outside Interval Count**: Simple count of observations outside the prediction interval.\n- **Interval Consistency Score**: Adapted version of the interval score which compares consecutive forecasts for the same point in time and evaluates their consistency (see below).\n\nAll credits for the interval score and weighted interval score go to the corresponding authors.\n\nFor exemplary uses of the scoring functions, see [here](#example). For the docstring of the functions, see [here](#docs).\n\n\n#### References\n- Gneiting, T. and A. E. Raftery (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477), 359–378, DOI: [10.1198/016214506000001437](https://doi.org/10.1198/016214506000001437)\n- Bracher, J., Ray, E. L., Gneiting, T., \u0026 Reich, N. G. (2020). Evaluating epidemic forecasts in an interval format. 
arXiv preprint [arXiv:2005.12881](https://arxiv.org/abs/2005.12881).\n\n## \u003ca id='example'\u003e\u003c/a\u003eExemplary Usage of Scoring Functions\nThe following notebook snippets show the usage of different scoring functions with exemplary data.\n\n\n```python\nimport timeit # timeit for measuring runtime of approaches\n```\n\n\n```python\nimport scoring as scoring # the scoring module\nimport numpy as np\n```\n\n### Test data\n\n\n```python\n# Exemplary ground truth observations as array\nobservations_test = np.array([4,7,4,6,2,1,3,8])\n\n# Exemplary point forecasts\npoint_dict_test = {\n    0.1: np.array([2, 4.7, 5.2 , 9.6, 1.8, -2  , 0.4, 8.8]),\n    0.2: np.array([2, 4.7, 5.2 , 9.6, 1.8, -2  , 0.4, 8.8]),\n    0.5: np.array([2, 4.7, 5.2 , 9.6, 1.8, -2  , 0.4, 8.8]),\n    0.8: np.array([2, 4.7, 5.2 , 9.6, 1.8, -2  , 0.4, 8.8]),\n    0.9: np.array([2, 4.7, 5.2 , 9.6, 1.8, -2  , 0.4, 8.8]),\n}\n\n# Exemplary quantile forecasts\nquantile_dict_test = {\n    0.1: np.array([2, 3  , 5   , 9  , 1  , -3  , 0.2, 8.7]),\n    0.2: np.array([2, 4.6, 5   , 9.4, 1.4, -2  , 0.4, 8.8]),\n    0.5: np.array([2, 4.7, 5.2 , 9.6, 1.8, -2  , 0.4, 8.8]),\n    0.8: np.array([4, 4.8, 5.7 , 12 , 4.3, -1.5, 2  , 8.9]),\n    0.9: np.array([5, 5  , 7   , 13 , 5  , -1  , 3  , 9])\n}\n\n# Exemplary alpha level\nalpha_test=0.2\n```\n\n### Interval Score\n\nCompute interval scores by providing a left and right quantile.\n\n\n```python\nscoring.interval_score(observations_test,alpha_test,q_left=quantile_dict_test[0.1],q_right=quantile_dict_test[0.9])\n```\n\n\n\n\n    (array([ 3. , 22. , 12. , 34. ,  4. , 22. ,  2.8,  7.3]),\n     array([3. , 2. , 2. , 4. , 4. , 2. , 2.8, 0.3]),\n     array([ 0., 20., 10., 30.,  0., 20.,  0.,  7.]))\n\n\n\nCompute interval scores by providing a dictionary of quantiles. 
The quantiles needed for the alpha level specified are automatically selected from the dictionary.\n\n\n```python\nscoring.interval_score(observations_test,alpha_test,q_dict=quantile_dict_test)\n```\n\n\n\n\n    (array([ 3. , 22. , 12. , 34. ,  4. , 22. ,  2.8,  7.3]),\n     array([3. , 2. , 2. , 4. , 4. , 2. , 2.8, 0.3]),\n     array([ 0., 20., 10., 30.,  0., 20.,  0.,  7.]))\n\n\n\nTest whether the percentage version of the interval score works:\nWe use the (1-[alpha=1])-interval, i.e. the median, for the interval score, set `percent=True`, and check whether the result is two times the absolute percentage error of the median.\n\n\n```python\nprint(scoring.interval_score(observations_test,1,q_dict=quantile_dict_test,percent=True)[0])\nprint(2*np.abs((observations_test-quantile_dict_test[0.5])/observations_test))\n```\n\n    [1.         0.65714286 0.6        1.2        0.2        6.\n     1.73333333 0.2       ]\n    [1.         0.65714286 0.6        1.2        0.2        6.\n     1.73333333 0.2       ]\n    \n\n### Weighted Interval Score\n\nCompute weighted interval scores by providing a dictionary of quantiles and a list of alpha levels and corresponding weights.\n\nThere are two implementations: a simple one based on iterated computation of interval scores and a faster one based on joint computation of all interval scores. They yield the same results, so the faster one is preferable; the basic one is only provided for illustration.\n\n\n```python\nprint(scoring.weighted_interval_score(observations_test,alphas=[0.2,0.4],weights=[2,5],q_dict=quantile_dict_test))\nprint(scoring.weighted_interval_score_fast(observations_test,alphas=[0.2,0.4],weights=[2,5],q_dict=quantile_dict_test))\n```\n\n    (array([ 2.28571429, 14.28571429,  7.5       , 23.71428571,  3.21428571,\n           15.57142857,  5.51428571,  5.01428571]), array([2.28571429, 0.71428571, 1.07142857, 3.        , 3.21428571,\n           0.92857143, 1.94285714, 0.15714286]), array([ 0.        
, 13.57142857,  6.42857143, 20.71428571,  0.        ,\n           14.64285714,  3.57142857,  4.85714286]))\n    (array([ 2.28571429, 14.28571429,  7.5       , 23.71428571,  3.21428571,\n           15.57142857,  5.51428571,  5.01428571]), array([2.28571429, 0.71428571, 1.07142857, 3.        , 3.21428571,\n           0.92857143, 1.94285714, 0.15714286]), array([ 0.        , 13.57142857,  6.42857143, 20.71428571,  0.        ,\n           14.64285714,  3.57142857,  4.85714286]))\n    \n\nCompute weighted interval scores by providing a dictionary of quantiles and a list of alpha levels, but no corresponding weights. This means setting `weights=alphas/2`, yielding an approximation of the CRPS.\n\n\n```python\nscoring.weighted_interval_score_fast(observations_test,alphas=[0.2,0.4],weights=None,q_dict=quantile_dict_test)\n```\n\n\n\n\n    (array([ 2.33333333, 14.8       ,  7.8       , 24.4       ,  3.26666667,\n            16.        ,  5.33333333,  5.16666667]),\n     array([2.33333333, 0.8       , 1.13333333, 3.06666667, 3.26666667,\n            1.        , 2.        , 0.16666667]),\n     array([ 0.        , 14.        ,  6.66666667, 21.33333333,  0.        ,\n            15.        ,  3.33333333,  5.        ]))\n\n\n\nCompute the percentage version of the weighted interval scores (once with just the median, which equals the simple interval score, and once with several intervals).\n\n\n```python\nprint(scoring.weighted_interval_score_fast(observations_test,alphas=[1],weights=None,q_dict=point_dict_test,percent=True)[0])\nprint(scoring.weighted_interval_score_fast(observations_test,alphas=[0.2,0.4,1],weights=None,q_dict=point_dict_test,percent=True)[0])\n```\n\n    [1.         
0.65714286 0.6        1.2        0.2        6.\n     1.73333333 0.2       ]\n    [ 1.875       1.23214286  1.125       2.25        0.375      11.25\n      3.25        0.375     ]\n    \n\n#### Compare runtimes of weighted interval score methods\n\nRuntimes of simple implementation for different sizes\n\n\n```python\nprint({n: min(timeit.repeat(lambda: scoring.weighted_interval_score(observations_test,alphas=[0.2]*n,weights=None,q_dict=quantile_dict_test),repeat=100,number=100)) for n in [2,5,10,20]})\n```\n\n    {2: 0.007953100000000823, 5: 0.014574499999999269, 10: 0.02527900000000116, 20: 0.04693550000000002}\n    \n\nRuntimes of fast implementation for different sizes\n\n\n```python\nprint({n: min(timeit.repeat(lambda: scoring.weighted_interval_score_fast(observations_test,alphas=[0.2]*2,weights=None,q_dict=quantile_dict_test),repeat=100,number=100)) for n in [2,5,10,20,40,80]})\n```\n\n    {2: 0.008790000000001186, 5: 0.00875909999999891, 10: 0.008776400000002127, 20: 0.008793000000000717, 40: 0.008763800000000543, 80: 0.008803100000001507}\n    \n\nAs can be seen, the fast implementation is in fact quicker for more than 2 or 3 intervals. Its runtime stays almost constant.\n\n### Outside Interval Count\n\nCount number of times the true observation was outside an interval.\n\n\n```python\nprint(scoring.outside_interval(observations_test,lower=quantile_dict_test[0.1],upper=quantile_dict_test[0.9]))\n```\n\n    [0 1 1 1 0 1 0 1]\n    \n\n### Interval Consistency Score\n\nCompute adapted variant of the interval score which measures the consistency of updated intervals over time.\n\nThe interval consistency score does not evaluate the sharpness of the intervals. It only penalizes the difference between old and new forecasted intervals when the new interval is (partially) outside the old interval.\nIdeally, updated predicted intervals would always be within the previous estimates of the interval, yielding\na score of zero (best). 
The underlying logic is that the old forecasts have been made with less perfect knowledge than the new ones and should reflect this epistemic uncertainty in wider prediction intervals.\n\n\n```python\nprint(scoring.interval_consistency_score(lower_old=quantile_dict_test[0.1],upper_old=quantile_dict_test[0.9],lower_new=quantile_dict_test[0.1],upper_new=quantile_dict_test[0.8]))\n```\n\n    [0. 0. 0. 0. 0. 0. 0. 0.]\n    \n\n\n```python\nprint(scoring.interval_consistency_score(lower_old=quantile_dict_test[0.1],upper_old=quantile_dict_test[0.8],lower_new=quantile_dict_test[0.1],upper_new=quantile_dict_test[0.9]))\n```\n\n    [1.  0.2 1.3 1.  0.7 0.5 1.  0.1]\n    \n\n## \u003ca id='docs'\u003e\u003c/a\u003eFunction Docs\n\n### interval_score\n    Compute interval scores for an array of observations and predicted intervals.\n    \n    Either a dictionary with the respective (alpha/2) and (1-(alpha/2)) quantiles via q_dict needs to be specified or the quantiles need to be specified via q_left and q_right.\n\n    Parameters\n    ----------\n    observations : array_like\n        Ground truth observations.\n    alpha : numeric\n        Alpha level for (1-alpha) interval.\n    q_dict : dict, optional\n        Dictionary with predicted quantiles for all instances in `observations`.\n    q_left : array_like, optional\n        Predicted (alpha/2)-quantiles for all instances in `observations`.\n    q_right : array_like, optional\n        Predicted (1-(alpha/2))-quantiles for all instances in `observations`.\n    percent: bool, optional\n        If `True`, score is scaled by absolute value of observations to yield a percentage error. Default is `False`.\n    check_consistency: bool, optional\n        If `True`, quantiles in `q_dict` are checked for consistency. 
Default is `True`.\n\n    Returns\n    -------\n    total : array_like\n        Total interval scores.\n    sharpness : array_like\n        Sharpness component of interval scores.\n    calibration : array_like\n        Calibration component of interval scores.\n        \n### weighted_interval_score / weighted_interval_score_fast\n    Compute weighted interval scores for an array of observations and a number of different predicted intervals.\n    \n    This function implements the WIS-score. A dictionary with the respective (alpha/2)\n    and (1-(alpha/2)) quantiles for all alpha levels given in `alphas` needs to be specified.\n    \n    Parameters\n    ----------\n    observations : array_like\n        Ground truth observations.\n    alphas : iterable\n        Alpha levels for (1-alpha) intervals.\n    q_dict : dict\n        Dictionary with predicted quantiles for all instances in `observations`.\n    weights : iterable, optional\n        Corresponding weights for each interval. If `None`, `weights` is set to `alphas`, yielding the WIS^alpha-score.\n    percent: bool, optional\n        If `True`, score is scaled by absolute value of observations to yield the double absolute percentage error. Default is `False`.\n    check_consistency: bool, optional\n        If `True`, quantiles in `q_dict` are checked for consistency. 
Default is `True`.\n        \n    Returns\n    -------\n    total : array_like\n        Total weighted interval scores.\n    sharpness : array_like\n        Sharpness component of weighted interval scores.\n    calibration : array_like\n        Calibration component of weighted interval scores.\n        \n### interval_consistency_score\n    Compute interval consistency scores for an old and a new interval.\n    \n    Adapted variant of the interval score which measures the consistency of updated intervals over time.\n    Ideally, updated predicted intervals would always be within the previous estimates of the interval, yielding\n    a score of zero (best).\n    \n    Parameters\n    ----------\n    lower_old : array_like\n        Previous lower interval boundary for all instances in `observations`.\n    upper_old : array_like, optional\n        Previous upper interval boundary for all instances in `observations`.\n    lower_new : array_like\n        New lower interval boundary for all instances in `observations`. Ideally higher than the previous boundary.\n    upper_new : array_like, optional\n        New upper interval boundary for all instances in `observations`. Ideally lower than the previous boundary.\n    check_consistency: bool, optional\n        If `True`, interval boundaries are checked for consistency. 
Default is `True`.\n        \n    Returns\n    -------\n    scores : array_like\n        Interval consistency scores.\n        \n### outside_interval\n    Indicate whether observations are outside a predicted interval for an array of observations and predicted intervals.\n    \n    Parameters\n    ----------\n    observations : array_like\n        Ground truth observations.\n    lower : array_like, optional\n        Predicted lower interval boundary for all instances in `observations`.\n    upper : array_like, optional\n        Predicted upper interval boundary for all instances in `observations`.\n    check_consistency: bool, optional\n        If `True`, interval boundaries are checked for consistency. Default is `True`.\n        \n    Returns\n    -------\n    Out : array_like\n        Array of zeroes (False) and ones (True) indicating, for each observation, whether it was outside the interval.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrian-lison%2Finterval-scoring","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadrian-lison%2Finterval-scoring","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrian-lison%2Finterval-scoring/lists"}