{"id":26590224,"url":"https://github.com/wrwrwr/scikit-gof","last_synced_at":"2025-03-23T13:36:28.282Z","repository":{"id":57464457,"uuid":"45170505","full_name":"wrwrwr/scikit-gof","owner":"wrwrwr","description":"Variations on goodness of fit tests for SciPy.","archived":false,"fork":false,"pushed_at":"2020-08-17T19:59:46.000Z","size":65,"stargazers_count":7,"open_issues_count":4,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-10T03:53:50.353Z","etag":null,"topics":["statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wrwrwr.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-29T08:37:13.000Z","updated_at":"2021-04-08T23:25:02.000Z","dependencies_parsed_at":"2022-08-31T03:10:11.914Z","dependency_job_id":null,"html_url":"https://github.com/wrwrwr/scikit-gof","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wrwrwr%2Fscikit-gof","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wrwrwr%2Fscikit-gof/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wrwrwr%2Fscikit-gof/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wrwrwr%2Fscikit-gof/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wrwrwr","download_url":"https://codeload.github.com/wrwrwr/scikit-gof/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2451093
42,"owners_count":20562183,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["statistics"],"created_at":"2025-03-23T13:36:27.714Z","updated_at":"2025-03-23T13:36:28.257Z","avatar_url":"https://github.com/wrwrwr.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"==========\nscikit-gof\n==========\n\nProvides variants of Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling\ngoodness of fit tests for fully specified continuous distributions.\n\nExample\n=======\n\n.. code:: python\n\n    \u003e\u003e\u003e from scipy.stats import norm, uniform\n    \u003e\u003e\u003e from skgof import ks_test, cvm_test, ad_test\n\n    \u003e\u003e\u003e ks_test((1, 2, 3), uniform(0, 4))\n    GofResult(statistic=0.25, pvalue=0.97...)\n\n    \u003e\u003e\u003e cvm_test((1, 2, 3), uniform(0, 4))\n    GofResult(statistic=0.04..., pvalue=0.95...)\n\n    \u003e\u003e\u003e data = norm(0, 1).rvs(random_state=1, size=100)\n    \u003e\u003e\u003e ad_test(data, norm(0, 1))\n    GofResult(statistic=0.75..., pvalue=0.51...)\n    \u003e\u003e\u003e ad_test(data, norm(.3, 1))\n    GofResult(statistic=3.52..., pvalue=0.01...)\n\nSimple tests\n============\n\nScikit-gof currently only offers three nonparametric tests that let you\ncompare a sample with a reference probability distribution. 
These are:\n\n``ks_test()``\n    Kolmogorov-Smirnov supremum statistic; almost the same as\n    ``scipy.stats.kstest()`` with ``alternative='two-sided'`` but with\n    (hopefully) somewhat more precise p-value calculation;\n\n``cvm_test()``\n    Cramer-von Mises L2 statistic, with a rather crude estimation of the\n    statistic distribution (but seemingly the best available);\n\n``ad_test()``\n    Anderson-Darling statistic with a fair approximation of its distribution;\n    unlike the composite ``scipy.stats.anderson()`` this one needs a fully\n    specified hypothesized distribution.\n\nSimple test functions use a common interface, taking as the first argument the\ndata (sample) to be compared and as the second argument a frozen ``scipy.stats``\ndistribution.\nThey return a named tuple with two fields: ``statistic`` and ``pvalue``.\n\nFor a simple example, consider the hypothesis that the sample (.4, .1, .7) comes\nfrom the uniform distribution on [0, 1]:\n\n.. code:: python\n\n    if ks_test((.4, .1, .7), uniform(0, 1)).pvalue \u003c .05:\n        print(\"Hypothesis rejected at the 5% significance level.\")\n\nIf your samples are very large and you have them sorted ahead of time, pass\n``assume_sorted=True`` to save some time that would be wasted re-sorting.\n\nExtending\n=========\n\nSimple tests are composed of two phases: calculating the test statistic and\ndetermining how likely the resulting value is (under the hypothesis).\nNew tests may be defined by providing a new statistic calculation routine or an\nalternative distribution for a statistic.\n\nFunctions calculating statistics are given evaluations of the reference\ncumulative distribution function on sorted data and are expected to return\na single number.\nFor a simple test, if the sample indeed comes from the hypothesized (continuous)\ndistribution, the values passed to the function should be uniformly distributed\nover [0, 1].\n\nHere is a simplistic example of what a statistic function might look like:\n\n.. 
code:: python\n\n    def ex_stat(data):\n        return abs(data.sum() - data.size / 2)\n\nStatistic functions for the provided tests, ``ks_stat()``, ``cvm_stat()``,\nand ``ad_stat()``, can be imported from ``skgof.ecdfgof``.\n\nStatistic distributions should derive from ``rv_continuous`` and implement\nat least one of the abstract ``_cdf()`` or ``_pdf()`` methods (you might\nalso consider directly coding ``_sf()`` for increased precision of results\nclose to 1). For example:\n\n.. code:: python\n\n    from numpy import sqrt\n    from scipy.stats import norm, rv_continuous\n\n    class ex_unif_gen(rv_continuous):\n        def _cdf(self, statistic, samples):\n            return 1 - 2 * norm.cdf(-statistic, scale=sqrt(samples / 12))\n\n    ex_unif = ex_unif_gen(a=0, name='ex-unif', shapes='samples')\n\nThe provided distributions live in separate modules, respectively ``ksdist``,\n``cvmdist``, and ``addist``.\n\nOnce you have a statistic calculation function and a statistic distribution the\ntwo parts can be combined using ``simple_test``:\n\n.. code:: python\n\n    from functools import partial\n    from skgof.ecdfgof import simple_test\n\n    ex_test = partial(simple_test, stat=ex_stat, pdist=ex_unif)\n\n**Exercise**: The example test has a fundamental flaw. Can you point it out?\n\n..  The test is not consistent under all alternatives. For instance, if the\n    hypothesis was that samples come from the uniform distribution on [0, 1],\n    but they really were \"drawn\" from the degenerate distribution at .5, the\n    test would never notice, even for arbitrarily large sample sizes.\n\n    Moreover, the asymptotic distribution is not a good approximation of the\n    actual statistic distribution for small sample sizes.\n\nInstallation\n============\n\n.. 
code:: bash\n\n    pip install scikit-gof\n\nRequires recent versions of Python (\u003e 3), NumPy (\u003e= 1.10) and SciPy.\n\nPlease fix or point out any errors, inaccuracies or typos you notice.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwrwrwr%2Fscikit-gof","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwrwrwr%2Fscikit-gof","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwrwrwr%2Fscikit-gof/lists"}