{"id":24797230,"url":"https://github.com/shimantorahman/empulse","last_synced_at":"2025-10-12T22:32:09.345Z","repository":{"id":217912374,"uuid":"654945788","full_name":"ShimantoRahman/empulse","owner":"ShimantoRahman","description":"Value-driven and cost-sensitive analysis for scikit-learn","archived":false,"fork":false,"pushed_at":"2025-01-28T22:41:29.000Z","size":13210,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-28T22:47:30.527Z","etag":null,"topics":["cost-sensitive","cost-sensitive-learning","data-science","machine-learning","profit-driven","profit-driven-analytics","python","scikit-learn","sklearn","value-driven","value-driven-analytics"],"latest_commit_sha":null,"homepage":"https://empulse.readthedocs.io/en/stable/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShimantoRahman.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-17T12:13:02.000Z","updated_at":"2025-01-28T22:37:52.000Z","dependencies_parsed_at":"2024-01-21T23:45:38.914Z","dependency_job_id":"3c94e25c-a4e5-4135-ac7d-267545f6ddff","html_url":"https://github.com/ShimantoRahman/empulse","commit_stats":null,"previous_names":["shimantorahman/empulse"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShimantoRahman%2Fempulse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShimantoRahman%2Fempulse
/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShimantoRahman%2Fempulse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShimantoRahman%2Fempulse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShimantoRahman","download_url":"https://codeload.github.com/ShimantoRahman/empulse/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236280003,"owners_count":19123476,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cost-sensitive","cost-sensitive-learning","data-science","machine-learning","profit-driven","profit-driven-analytics","python","scikit-learn","sklearn","value-driven","value-driven-analytics"],"created_at":"2025-01-30T01:18:41.493Z","updated_at":"2025-10-12T22:32:09.327Z","avatar_url":"https://github.com/ShimantoRahman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI Downloads](https://static.pepy.tech/badge/empulse)](https://pepy.tech/projects/empulse)\n[![Python Version](https://img.shields.io/pypi/v/empulse)](https://pypi.org/project/empulse/)\n[![GitHub 
license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/ShimantoRahman/empulse)\n![](https://img.shields.io/pypi/pyversions/empulse)\n![Tests](https://github.com/ShimantoRahman/empulse/actions/workflows/tests.yml/badge.svg)\n[![Docs](https://img.shields.io/readthedocs/empulse)](https://empulse.readthedocs.io/en/latest/)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![DOI](https://zenodo.org/badge/654945788.svg)](https://zenodo.org/doi/10.5281/zenodo.11185663)\n\n# Empulse\n\n\u003ca href=\"https://empulse.readthedocs.io/en/latest/\"\u003e\u003cimg src=\"https://empulse.readthedocs.io/en/latest/_static/assets/empulse_logo_light.png\" alt=\"Empulse Logo\" width=\"25%\" height=\"25%\" align=\"right\" /\u003e\u003c/a\u003e\n\n\u003c!-- start-of-readme-intro --\u003e\n\nEmpulse is a package that enables value-driven and cost-sensitive analysis in Python.\nThe package implements popular value-driven and cost-sensitive metrics and algorithms \nin accordance with scikit-learn conventions.\nThis allows the measures to integrate seamlessly into existing ML workflows.\n\n## Installation\n\nEmpulse requires Python 3.10 or higher.\n\nInstall `empulse` via pip:\n\n```bash\npip install empulse\n```\n\n\u003c!-- end-of-readme-install --\u003e\n\n## Documentation\nYou can find the documentation [here](https://empulse.readthedocs.io/en/stable/).\n\n\u003c!-- end-of-readme-intro --\u003e\n\n## Features\n\n- [Ready to use out of the box with scikit-learn](#ready-to-use-out-of-the-box-with-scikit-learn)\n- [Use-case specific profit and cost metrics](#use-case-specific-profit-and-cost-metrics)\n- [Build your own profit and cost metrics](#build-your-own-profit-and-cost-metrics)\n- [Various profit-driven and cost-sensitive models](#various-profit-driven-and-cost-sensitive-models)\n- [Easy passing of instance-dependent 
costs](#easy-passing-of-instance-dependent-costs)\n- [Cost-aware resampling and relabeling](#cost-aware-resampling-and-relabeling)\n- [Find the optimal decision threshold](#find-the-optimal-decision-threshold)\n- [Easy access to real-world datasets for benchmarking](#easy-access-to-real-world-datasets-for-benchmarking)\n\n## Take the tour\n\n### Ready to use out of the box with scikit-learn\n\nAll components of the package are designed to work seamlessly with scikit-learn.\n\nModels are implemented as scikit-learn estimators and can be used anywhere a scikit-learn estimator can be used.\n\n#### Pipelines\n```python\nfrom empulse.models import CSLogitClassifier\nfrom sklearn.datasets import make_classification\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nX, y = make_classification()\npipeline = Pipeline([\n    (\"scaler\", StandardScaler()),\n    (\"model\", CSLogitClassifier())\n])\npipeline.fit(X, y, model__fp_cost=10, model__fn_cost=1)\n```\n\n#### Cross-validation\n```python\nfrom sklearn.model_selection import cross_val_score\n\ncross_val_score(\n    pipeline, \n    X, \n    y, \n    scoring=\"roc_auc\", \n    params={\"model__fp_cost\": 10, \"model__fn_cost\": 1}\n)\n```\n\n#### Grid search\n```python\nfrom sklearn.model_selection import GridSearchCV\n\nparam_grid = {\"model__C\": [0.1, 1, 10]}\ngrid_search = GridSearchCV(pipeline, param_grid, scoring=\"roc_auc\")\ngrid_search.fit(X, y, model__fp_cost=10, model__fn_cost=1)\n```\n\nAll metrics can easily be converted as scikit-learn scorers \nand can be used in the same way as any other scikit-learn scorer.\n\n```python\nfrom empulse.metrics import expected_cost_loss\nfrom sklearn.metrics import make_scorer\n\nscorer = make_scorer(\n    expected_cost_loss, \n    response_method=\"predict_proba\", \n    greater_is_better=False,\n    fp_cost=10,\n    fn_cost=1\n)\n\ncross_val_score(\n    pipeline, \n    X, \n    y, \n    scoring=scorer, \n    
params={\"model__fp_cost\": 10, \"model__fn_cost\": 1}\n)\n```\n\n### Use-case specific profit and cost metrics\n\nEmpulse offers a wide range of profit and cost metrics that are tailored to specific use cases such as:\n- [customer churn](https://empulse.readthedocs.io/en/stable/reference/metrics.html#customer-churn-metrics), \n- [customer acquisition](https://empulse.readthedocs.io/en/stable/reference/metrics.html#customer-acquisition-metrics),\n- [credit scoring](https://empulse.readthedocs.io/en/stable/reference/metrics.html#credit-scoring-metrics),\n- and fraud detection (coming soon).\n\nFor other use cases, the package provides generic implementations of:\n- the [cost loss](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.cost_loss.html),\n- the [expected cost loss](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.expected_cost_loss.html),\n- the [expected log cost loss](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.expected_log_cost_loss.html),\n- the [savings score](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.savings_score.html),\n- the [expected savings score](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.expected_savings_score.html),\n- and the [maximum profit score](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.max_profit_score.html).\n\n\n### Build your own profit and cost metrics\n\nThe [Metric](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.Metric.html) class allows\nyou to easily build your own profit and cost metrics.\n\nFirst, define the cost matrix \nusing the [CostMatrix](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.CostMatrix.html)\nclass.\n\nA tiny example: suppose you’re building a spam filter.\n- False positive (legit email marked as spam) costs 5, because you might miss something 
important\n- False negative (spam slips through) costs 1, because it wastes your time\n\n```python\nfrom empulse.metrics import CostMatrix\n\ncost_matrix = CostMatrix().add_fp_cost('fp').add_fn_cost('fn')\ncost_matrix.alias({'opportunity_cost': 'fp', 'time_wasted_cost': 'fn'})\n```\n\nNow, you can define your own profit and cost metrics using different strategies.\n\n- [Cost](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.Cost.html): lower is better.\n- [Savings](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.Savings.html): higher is better.\n- [MaxProfit](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.MaxProfit.html):\n  higher is better.\n\n```python\nfrom empulse.metrics import Metric, Cost, Savings, MaxProfit\n\nexpected_cost_loss = Metric(cost_matrix=cost_matrix, strategy=Cost()) \nexpected_savings_score = Metric(cost_matrix=cost_matrix, strategy=Savings())\nexpected_max_profit_score = Metric(cost_matrix=cost_matrix, strategy=MaxProfit())\n\nexpected_cost_loss(\n    y, pipeline.predict_proba(X)[:, 1], opportunity_cost=5, time_wasted_cost=1\n)\n```\n\nYour custom metric can also be optimized by the models:\n\n```python\nfrom empulse.models import CSBoostClassifier\n\ncsboost = CSBoostClassifier(loss=expected_cost_loss)\ncsboost.fit(X, y, opportunity_cost=5, time_wasted_cost=1)\n```\n\nRead more in the [User Guide](https://empulse.readthedocs.io/en/stable/guide/metrics/user_defined_value_metric.html).\n\n### Various profit-driven and cost-sensitive models\n\nEmpulse provides a range of profit-driven and cost-sensitive models such as:\n- [CSLogitClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.CSLogitClassifier.html),\n- [CSBoostClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.CSBoostClassifier.html),\n- 
[CSTreeClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.CSTreeClassifier.html),\n- [CSForestClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.CSForestClassifier.html),\n- [CSBaggingClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.CSBaggingClassifier.html),\n- [B2BoostClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.B2BoostClassifier.html),\n- [RobustCSClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.RobustCSClassifier.html),\n- [ProfLogitClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.ProfLogitClassifier.html),\n- [BiasRelabelingClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.BiasRelabelingClassifier.html),\n- [BiasResamplingClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.BiasResamplingClassifier.html),\n- and [BiasReweighingClassifier](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.BiasReweighingClassifier.html).\n\nEach classifier tries to mimic the behaviour of sklearn's classifiers with a cost-sensitive twist.\n\n```python\nfrom empulse.models import CSTreeClassifier\nfrom sklearn.tree import DecisionTreeClassifier\n\ncstree = CSTreeClassifier(max_depth=2, min_samples_leaf=1, random_state=42)\ndtree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=1, random_state=42)\n\ncstree.fit(X, y, fp_cost=10, fn_cost=1)\ndtree.fit(X, y)\n```\n\n### Easy passing of instance-dependent costs\n\nInstance-dependent costs can easily be passed to the models through [metadata routing](https://scikit-learn.org/stable/metadata_routing.html).\n\nFor instance, the instance-dependent costs are passed dynamically to each fold of the cross-validation\nthrough requesting the costs in the `set_fit_request` method of the model \nand the 
`set_score_request` method of the scorer.\n    \n```python\nimport numpy as np\nfrom empulse.models import CSTreeClassifier\nfrom empulse.metrics import expected_cost_loss\nfrom sklearn import set_config\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import cross_val_score\nfrom sklearn.metrics import make_scorer\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nset_config(enable_metadata_routing=True)\n\nX, y = make_classification()\nfp_cost = np.random.rand(y.size)\nfn_cost = np.random.rand(y.size)\n\npipeline = Pipeline([\n    (\"scale\", StandardScaler()),\n    (\"model\", CSTreeClassifier().set_fit_request(fp_cost=True, fn_cost=True))\n])\n\nscorer = make_scorer(\n    expected_cost_loss,\n    response_method=\"predict_proba\",\n    greater_is_better=False,\n).set_score_request(fp_cost=True, fn_cost=True)\n\ncross_val_score(pipeline, X, y, scoring=scorer, params={\"fp_cost\": fp_cost, \"fn_cost\": fn_cost})\n```\n\n### Cost-aware resampling and relabeling\n\nEmpulse uses the [imbalanced-learn](https://imbalanced-learn.org/) \npackage to provide cost-aware resampling and relabeling techniques:\n- [CostSensitiveSampler](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.samplers.CostSensitiveSampler.html)\n- [BiasResampler](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.samplers.BiasResampler.html)\n- [BiasRelabler](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.samplers.BiasRelabler.html)\n\n```python\nfrom empulse.samplers import CostSensitiveSampler\nfrom sklearn.datasets import make_classification\n\nX, y = make_classification()\nsampler = CostSensitiveSampler()\nX_resampled, y_resampled = sampler.fit_resample(X, y, fp_cost=2, fn_cost=1)\n```\n\nThey can be used in an imbalanced-learn pipeline:\n\n```python\nimport numpy as np\nfrom empulse.samplers import CostSensitiveSampler\nfrom imblearn.pipeline import Pipeline\nfrom 
sklearn import set_config\nfrom sklearn.datasets import make_classification\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.linear_model import LogisticRegression\n\nset_config(enable_metadata_routing=True)\n\nX, y = make_classification()\nfp_cost = np.random.rand(y.size)\nfn_cost = np.random.rand(y.size)\npipeline = Pipeline([\n    (\"scaler\", StandardScaler()),\n    (\"sampler\", CostSensitiveSampler().set_fit_resample_request(fp_cost=True, fn_cost=True)),\n    (\"model\", LogisticRegression())\n])\n\npipeline.fit(X, y, fp_cost=fp_cost, fn_cost=fn_cost)\n```\n\n### Find the optimal decision threshold\n\nEmpulse provides the \n[`CSThresholdClassifier`](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.models.CSThresholdClassifier.html)\nwhich finds the decision threshold that minimizes the expected cost loss for a given cost matrix.\n\nThe meta-estimator changes the `predict` method of the base estimator to predict the class with the lowest expected cost.\n\n```python\nfrom empulse.models import CSThresholdClassifier\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import LogisticRegression\n\nX, y = make_classification()\nmodel = CSThresholdClassifier(estimator=LogisticRegression())\nmodel.fit(X, y)\nmodel.predict(X, fp_cost=2, fn_cost=1)\n```\n\nMetrics like the maximum profit score conveniently return the optimal target threshold.\nFor example, the Expected Maximum Profit measure for customer churn (EMPC) \ntells you what fraction of the customer base should be targeted to maximize profit.\n\n```python\nfrom empulse.metrics import empc\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import LogisticRegression\n\nX, y = make_classification()\nmodel = LogisticRegression()\npredictions = model.fit(X, y).predict_proba(X)[:, 1]\n\nscore, threshold = empc(y, predictions, clv=50)\n```\n\nThis fraction can then be converted to a decision threshold by using the 
\n[`classification_threshold`](https://empulse.readthedocs.io/en/stable/reference/generated/empulse.metrics.classification_threshold.html) \nfunction.\n\n```python\nfrom empulse.metrics import classification_threshold\n\ndecision_threshold = classification_threshold(y, predictions, customer_threshold=threshold)\n```\n\nThe decision threshold can then be combined with scikit-learn's \n[`FixedThresholdClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.FixedThresholdClassifier.html)\nto create a model that predicts the class with the highest expected profit.\n\n```python\nfrom sklearn.model_selection import FixedThresholdClassifier\n\nmodel = FixedThresholdClassifier(estimator=model, threshold=decision_threshold)\nmodel.predict(X)\n```\n\n### Easy access to real-world datasets for benchmarking\n\nEmpulse provides easy access to real-world datasets for benchmarking cost-sensitive models.\n\nEach dataset returns the features, the target, and the instance-dependent costs, ready to use in a cost-sensitive model.\n\n```python\nfrom empulse.datasets import load_give_me_some_credit\nfrom empulse.models import CSLogitClassifier\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nX, y, tp_cost, fp_cost, tn_cost, fn_cost = load_give_me_some_credit(return_X_y_costs=True)\n\npipeline = Pipeline([\n    ('scaler', StandardScaler()),\n    ('model', CSLogitClassifier())\n])\npipeline.fit(\n    X, \n    y, \n    model__tp_cost=tp_cost, \n    model__fp_cost=fp_cost, \n    model__tn_cost=tn_cost, \n    model__fn_cost=fn_cost\n)\n```\n\n\u003c!-- end-of-readme-usage --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshimantorahman%2Fempulse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshimantorahman%2Fempulse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshimantorahman%2Fempulse/lists"}