{"id":18931957,"url":"https://github.com/crflynn/sklearn-instrumentation","last_synced_at":"2025-04-15T16:33:29.834Z","repository":{"id":37899706,"uuid":"311867479","full_name":"crflynn/sklearn-instrumentation","owner":"crflynn","description":"Generalized scikit-learn machine learning model instrumentation library","archived":false,"fork":false,"pushed_at":"2022-07-21T16:02:26.000Z","size":303,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-28T23:43:28.770Z","etag":null,"topics":["instrumentation","machine-learning","scikit-learn"],"latest_commit_sha":null,"homepage":"https://sklearn-instrumentation.readthedocs.io/en/stable/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crflynn.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-11T04:59:44.000Z","updated_at":"2024-03-21T13:11:53.000Z","dependencies_parsed_at":"2022-08-08T22:16:21.157Z","dependency_job_id":null,"html_url":"https://github.com/crflynn/sklearn-instrumentation","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crflynn%2Fsklearn-instrumentation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crflynn%2Fsklearn-instrumentation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crflynn%2Fsklearn-instrumentation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crflynn%2Fsklearn-instrumentation/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/crflynn","download_url":"https://codeload.github.com/crflynn/sklearn-instrumentation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249108476,"owners_count":21214002,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["instrumentation","machine-learning","scikit-learn"],"created_at":"2024-11-08T11:47:20.131Z","updated_at":"2025-04-15T16:33:29.489Z","avatar_url":"https://github.com/crflynn.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"sklearn-instrumentation\n=======================\n\n|actions| |rtd| |pypi| |pyversions|\n\n.. |actions| image:: https://github.com/crflynn/sklearn-instrumentation/actions/workflows/build.yml/badge.svg\n    :target: https://github.com/crflynn/sklearn-instrumentation/actions\n\n.. |rtd| image:: https://img.shields.io/readthedocs/sklearn-instrumentation.svg\n    :target: http://sklearn-instrumentation.readthedocs.io/en/latest/\n\n.. |pypi| image:: https://img.shields.io/pypi/v/sklearn-instrumentation.svg\n    :target: https://pypi.python.org/pypi/sklearn-instrumentation\n\n.. |pyversions| image:: https://img.shields.io/pypi/pyversions/sklearn-instrumentation.svg\n    :target: https://pypi.python.org/pypi/sklearn-instrumentation\n\n\nGeneralized instrumentation tooling for scikit-learn models. 
``sklearn_instrumentation`` allows instrumenting the ``sklearn`` package and any scikit-learn-compatible package whose estimators and transformers inherit from ``sklearn.base.BaseEstimator``.\n\nInstrumentation applies decorators to methods of ``BaseEstimator``-derived classes or instances. By default, the instrumentor applies instrumentation to the following methods (except when they are properties of instances):\n\n* fit\n* fit_transform\n* predict\n* predict_log_proba\n* predict_proba\n* transform\n* _fit\n* _fit_transform\n* _predict\n* _predict_log_proba\n* _predict_proba\n* _transform\n\n**sklearn-instrumentation** supports instrumentation of entire sklearn-compatible packages, as well as recursive instrumentation of models (metaestimators like ``Pipeline``, or even single estimators like ``RandomForestClassifier``).\n\nInstallation\n------------\n\nThe sklearn-instrumentation package is available on PyPI and can be installed using pip:\n\n.. code-block:: bash\n\n    pip install sklearn-instrumentation\n\n\nPackage instrumentation\n-----------------------\n\nInstrument any sklearn-compatible package that has ``BaseEstimator``-derived classes.\n\n.. code-block:: python\n\n    from sklearn_instrumentation import SklearnInstrumentor\n\n    instrumentor = SklearnInstrumentor(instrument=my_instrument)\n    instrumentor.instrument_packages([\"sklearn\", \"xgboost\", \"lightgbm\"])\n\n\nFull example:\n\n.. 
code-block:: python\n\n    import logging\n\n    from sklearn.datasets import load_iris\n    from sklearn.decomposition import PCA\n    from sklearn.ensemble import RandomForestClassifier\n    from sklearn.pipeline import FeatureUnion\n    from sklearn.pipeline import Pipeline\n    from sklearn.preprocessing import StandardScaler\n\n    from sklearn_instrumentation import SklearnInstrumentor\n    from sklearn_instrumentation.instruments.logging import TimeElapsedLogger\n\n    logging.basicConfig(level=logging.INFO)\n\n    # Create an instrumentor and instrument sklearn\n    instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())\n    instrumentor.instrument_packages([\"sklearn\"])\n\n    # Create a toy model for classification\n    ss = StandardScaler()\n    pca = PCA(n_components=3)\n    rf = RandomForestClassifier()\n    classification_model = Pipeline(\n        steps=[\n            (\n                \"fu\",\n                FeatureUnion(\n                    transformer_list=[\n                        (\"ss\", ss),\n                        (\"pca\", pca),\n                    ]\n                ),\n            ),\n            (\"rf\", rf),\n        ]\n    )\n    X, y = load_iris(return_X_y=True)\n\n    # Observe logging\n    classification_model.fit(X, y)\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0006406307220458984 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0001430511474609375 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.\n    # 
INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0006711483001708984 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.0026731491088867188 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.1768970489501953 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.17983102798461914 seconds\n\n    # Observe logging\n    classification_model.predict(X)\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.\n    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00024509429931640625 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.0002181529998779297 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0012080669403076172 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.013531208038330078 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.013692140579223633 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.015219926834106445 seconds\n\n    # Remove instrumentation\n    instrumentor.uninstrument_packages([\"sklearn\"])\n\n    # Observe no logging\n    
classification_model.predict(X)\n\n\nInstance instrumentation\n------------------------\n\nInstrument any sklearn-compatible trained estimator or metaestimator.\n\n.. code-block:: python\n\n    from sklearn_instrumentation import SklearnInstrumentor\n\n    instrumentor = SklearnInstrumentor(instrument=my_instrument)\n    instrumentor.instrument_instance(estimator=my_ml_pipeline)\n\n\nExample:\n\n.. code-block:: python\n\n    import logging\n\n    from sklearn.datasets import load_iris\n    from sklearn.ensemble import RandomForestClassifier\n\n    from sklearn_instrumentation import SklearnInstrumentor\n    from sklearn_instrumentation.instruments.logging import TimeElapsedLogger\n\n    logging.basicConfig(level=logging.INFO)\n\n    # Train a classifier\n    X, y = load_iris(return_X_y=True)\n    rf = RandomForestClassifier()\n\n    rf.fit(X, y)\n\n    # Create an instrumentor which decorates BaseEstimator methods with\n    # logging output when entering and exiting methods, with time elapsed logged\n    # on exit.\n    instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())\n\n    # Apply the instrumentation to the fitted rf instance\n    instrumentor.instrument_instance(rf)\n\n    # Observe the logging output\n    rf.predict(X)\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.014165163040161133 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.014327764511108398 seconds\n\n    # Remove the instrumentation from the rf instance\n    instrumentor.uninstrument_instance(rf)\n\n    # No more logging\n    rf.predict(X)\n\n\nInstance class instrumentation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDuring fitting, some metaestimators will 
copy estimator instances using scikit-learn's ``clone`` function. As a result, the cloned estimators that are actually fitted are not instrumented. To get around this, we can instrument the classes rather than the instances.\n\n.. code-block:: python\n\n    import logging\n\n    from sklearn.datasets import load_iris\n    from sklearn.decomposition import PCA\n    from sklearn.ensemble import RandomForestClassifier\n    from sklearn.pipeline import FeatureUnion\n    from sklearn.pipeline import Pipeline\n    from sklearn.preprocessing import StandardScaler\n\n    from sklearn_instrumentation import SklearnInstrumentor\n    from sklearn_instrumentation.instruments.logging import TimeElapsedLogger\n\n    logging.basicConfig(level=logging.INFO)\n\n    ss = StandardScaler()\n    pca = PCA(n_components=3)\n    rf = RandomForestClassifier()\n    classification_model = Pipeline(\n        steps=[\n            (\n                \"fu\",\n                FeatureUnion(\n                    transformer_list=[\n                        (\"ss\", ss),\n                        (\"pca\", pca),\n                    ]\n                ),\n            ),\n            (\"rf\", rf),\n        ]\n    )\n    X, y = load_iris(return_X_y=True)\n\n    instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())\n    instrumentor.instrument_instance_classes(classification_model)\n\n    classification_model.fit(X, y)\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0006749629974365234 seconds\n    # 
INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0007731914520263672 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00016427040100097656 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0002810955047607422 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0004239082336425781 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0005612373352050781 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.002705097198486328 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.002802133560180664 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.\n    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.16085195541381836 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.16097569465637207 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.1639721393585205 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.16404390335083008 seconds\n    classification_model.predict(X)\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.\n    # 
INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0001049041748046875 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00017309188842773438 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.\n    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.0001690387725830078 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.00023698806762695312 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0008630752563476562 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0009222030639648438 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.01138925552368164 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.011497974395751953 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 
0.011577844619750977 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.011635780334472656 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.012682199478149414 seconds\n    # INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.012733936309814453 seconds\n\n    instrumentor.uninstrument_instance_classes(classification_model)\n\n    classification_model.predict(X)\n\nInstruments\n-----------\n\nThe package comes with a handful of instruments that log information about ``X`` or execution time. You can create your own instrument simply by writing a decorator that follows this pattern:\n\n.. code-block:: python\n\n    from functools import wraps\n\n\n    def my_instrumentation(estimator, func, **dkwargs):\n        \"\"\"Wrap an estimator method with instrumentation.\n\n        :param estimator: The class or instance on which to apply instrumentation\n        :param func: The method to be instrumented.\n        :param dkwargs: Decorator kwargs, which can be passed to the\n            decorator at decoration time. For instance instrumentation,\n            this allows a different parametrization for each ML model.\n        \"\"\"\n        @wraps(func)\n        def wrapper(*args, **kwargs):\n            \"\"\"Wrapping function.\n\n            :param args: The args passed to methods, typically\n                just ``X`` and/or ``y``\n            :param kwargs: The kwargs passed to methods, usually\n                weights or other params\n            \"\"\"\n            # Code goes here before execution of the estimator method\n            retval = func(*args, **kwargs)\n            # Code goes here after execution of the estimator method\n            return retval\n\n        return wrapper\n\n\nTo create a stateful instrument, use a class that implements the decorator in its ``__call__`` method:\n\n.. 
code-block:: python\n\n    from functools import wraps\n\n    from sklearn_instrumentation.instruments.base import BaseInstrument\n\n\n    class MyInstrument(BaseInstrument):\n\n        def __init__(self, *args, **kwargs):\n            # Initialize any instrument state here\n            pass\n\n        def __call__(self, estimator, func, **dkwargs):\n            \"\"\"Wrap an estimator method with instrumentation.\n\n            :param estimator: The class or instance on which to apply instrumentation\n            :param func: The method to be instrumented.\n            :param dkwargs: Decorator kwargs, which can be passed to the\n                decorator at decoration time. For instance instrumentation,\n                this allows a different parametrization for each ML model.\n            \"\"\"\n            @wraps(func)\n            def wrapper(*args, **kwargs):\n                \"\"\"Wrapping function.\n\n                :param args: The args passed to methods, typically\n                    just ``X`` and/or ``y``\n                :param kwargs: The kwargs passed to methods, usually\n                    weights or other params\n                \"\"\"\n                # Code goes here before execution of the estimator method\n                retval = func(*args, **kwargs)\n                # Code goes here after execution of the estimator method\n                return retval\n\n            return wrapper\n\n\nTo pass different instrument kwargs to different ML models:\n\n.. 
code-block:: python\n\n    instrumentor = SklearnInstrumentor(instrument=my_instrument)\n\n    instrumentor.instrument_instance(estimator=ml_model_1, instrument_kwargs={\"name\": \"awesome_model\"})\n    instrumentor.instrument_instance(estimator=ml_model_2, instrument_kwargs={\"name\": \"better_model\"})\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrflynn%2Fsklearn-instrumentation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrflynn%2Fsklearn-instrumentation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrflynn%2Fsklearn-instrumentation/lists"}