{"id":19283019,"url":"https://github.com/david-cortes/costsensitive","last_synced_at":"2026-03-01T13:03:26.227Z","repository":{"id":43478890,"uuid":"125932588","full_name":"david-cortes/costsensitive","owner":"david-cortes","description":"(Python, R) Cost-sensitive multiclass classification (Weighted-All-Pairs, Filter-Tree \u0026 others)","archived":false,"fork":false,"pushed_at":"2025-05-09T18:16:59.000Z","size":1797,"stargazers_count":49,"open_issues_count":2,"forks_count":19,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-01-06T23:11:16.202Z","etag":null,"topics":["cost-sensitive-classification","multi-label-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/david-cortes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-03-19T23:17:44.000Z","updated_at":"2025-09-12T16:42:40.000Z","dependencies_parsed_at":"2024-06-18T21:08:49.550Z","dependency_job_id":"9f9b441d-a79e-490d-9b46-809ca3df8e32","html_url":"https://github.com/david-cortes/costsensitive","commit_stats":{"total_commits":49,"total_committers":2,"mean_commits":24.5,"dds":"0.10204081632653061","last_synced_commit":"ce9e93d5ead16d7523a7b1aa3f1cd75c4a902789"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/david-cortes/costsensitive","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fcostsensitive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/david-cortes%2Fcostsensitive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fcostsensitive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fcostsensitive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/david-cortes","download_url":"https://codeload.github.com/david-cortes/costsensitive/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/david-cortes%2Fcostsensitive/sbom","scorecard":{"id":325840,"data":{"date":"2025-08-11","repo":{"name":"github.com/david-cortes/costsensitive","commit":"0a3c4411fb7ba931509f2119319748b210abd162"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.5,"checks":[{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 
0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy 
file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: BSD 2-Clause \"Simplified\" License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":6,"reason":"4 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2022-288 / GHSA-6hrg-qmvc-2xh8","Warn: Project is vulnerable 
to: PYSEC-2019-156 / GHSA-xp76-357g-9wqq","Warn: Project is vulnerable to: PYSEC-2023-102","Warn: Project is vulnerable to: PYSEC-2023-114"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T02:28:08.830Z","repository_id":43478890,"created_at":"2025-08-18T02:28:08.830Z","updated_at":"2025-08-18T02:28:08.830Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29969700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T12:56:10.327Z","status":"ssl_error","status_checked_at":"2026-03-01T12:55:24.744Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cost-sensitive-classification","multi-label-classification"],"created_at":"2024-11-09T21:29:20.094Z","updated_at":"2026-03-01T13:03:26.211Z","avatar_url":"https://github.com/david-cortes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cost-Sensitive Multi-Class Classification\n\nThis Python/R package contains implementations of reduction-based algorithms for cost-sensitive multi-class classification from different papers, plus some simpler heuristics for comparison purposes.\n\n## Problem 
description\n\nCost-sensitive multi-class classification is a problem related to multi-class classification in which, instead of each observation having one or more \"correct\" labels, each observation has an associated vector of costs (one cost for assigning it to each possible label), and the goal is to build a classifier that predicts the class with the minimum expected cost.\n\nIt is a more general problem than classification with costs defined over the confusion matrix (i.e. specifying how costly it is to predict each label when the true label is each of the other labels) or multi-class classification with observation weights (i.e. misclassifying each observation has a different cost, but this cost is the same regardless of which wrong class is predicted), as here each observation can have a different cost for each type of misprediction.\n\nWhen the costs are of the form `C = {I(yhat != y)}` (that is, the cost of predicting the right label is zero, while the cost of predicting a wrong label is one), the problem is equivalent to maximizing multi-class classification accuracy.\n\nThe aim of the algorithms here is to reduce this problem to binary classification with sample weights, which is a better-studied problem for which many good algorithms are available. 
A further reduction to binary classification without sample weights is possible through the cost-proportionate rejection sampling method, also implemented here.\n\n\n## Algorithms\n\nThe following algorithms are implemented:\n* `WeightedAllPairs` (see \"Error limiting reductions between classification tasks\" and \"Machine learning techniques—reductions between prediction quality metrics\")\n* `RegressionOneVsRest` (see \"Machine learning techniques—reductions between prediction quality metrics\")\n* `WeightedOneVsRest` (a heuristic without theoretical guarantees, based on the minimum cost of the 'rest' choice)\n* `FilterTree` (see \"Multiclass classification with filter trees\"; Python only)\n\nFor binary classifiers that don't support importance weighting, an implementation of Cost-Proportionate Rejection Sampling is also provided (`CostProportionateClassifier`, see \"Machine learning techniques—reductions between prediction quality metrics\").\n\nThese are implemented as classes with the same names as above, each with `fit` and `predict` methods, plus a `decision_function` method (available only when the base classifier has a `predict_proba` method). As input they require a base classifier with `fit` and `predict` methods whose `fit` method accepts a `sample_weight` argument (e.g. nearly all classifiers from scikit-learn and scikit-learn-compatible libraries such as xgboost); `RegressionOneVsRest` takes a regressor instead.\n\nThey also offer options for slight variations, such as defining the weights for `WeightedAllPairs` simply as the difference between the costs of the two classes being compared. These variations don't enjoy the same theoretical regret bounds, but in practice they can do better than the more elaborate choices. 
See the documentation of each algorithm for the available options.\n\nThe variants implemented here are based on multiple oracle calls (building a series of classifiers) rather than on a single oracle call with the class indices as features (building only one classifier, with the labels being compared added as extra columns in the data), as the former tend to produce easier subproblems and more consistent results across problems.\n\n\n## Installation\n\n\n* Python:\n\n```\npip install costsensitive\n```\n\n** *\n**IMPORTANT:** the setup script will try to add the compilation flag `-march=native`. This instructs the compiler to tune the package for the CPU on which it is being installed (e.g. by using AVX instructions if available), but the result might not be usable on other computers. If you are building a binary wheel of this package or putting it into a Docker image that will be used on different machines, this can be overridden either by (a) defining the environment variable `DONT_SET_MARCH=1`, or (b) manually supplying architecture-related compilation flags through the `CFLAGS` environment variable. 
For maximum compatibility (at the cost of speed), the flag can be disabled like this:\n\n```\nexport DONT_SET_MARCH=1\npip install costsensitive\n```\n\nor replaced with a generic architecture flag:\n```\nexport CFLAGS=\"-march=x86-64\"\npip install costsensitive\n```\n** *\n\n\n* R:\n\n```r\ninstall.packages(\"costsensitive\")\n```\n\n\n## Sample usage\n\n(For the R version, see the examples in the package documentation, also available on [CRAN](https://cran.r-project.org/web/packages/costsensitive/index.html).)\n\n```python\nimport numpy as np\nfrom sklearn.linear_model import LogisticRegression, Ridge\nfrom costsensitive import (WeightedAllPairs, WeightedOneVsRest,\n                           RegressionOneVsRest, FilterTree, CostProportionateClassifier)\n\n### Generate random observations and costs\n### This is a dataset with 1000 observations, 20 features, and 5 classes\nX = np.random.normal(size=(1000, 20))\nC = np.random.gamma(1, 5, size=(1000, 5))\n\n### In case your classifier doesn't support sample weights\nclassifier_with_weights = CostProportionateClassifier(LogisticRegression())\n\n### WeightedAllPairs\ncostsensitive_classifier = WeightedAllPairs(LogisticRegression(), weigh_by_cost_diff=True)\ncostsensitive_classifier.fit(X, C)\ncostsensitive_classifier.predict(X, method='most-wins')\ncostsensitive_classifier.decision_function(X, method='goodness')\n\n### WeightedOneVsRest\ncostsensitive_classifier = WeightedOneVsRest(LogisticRegression(), weight_simple_diff=False)\ncostsensitive_classifier.fit(X, C)\ncostsensitive_classifier.predict(X)\ncostsensitive_classifier.decision_function(X)\n\n### RegressionOneVsRest\n### Takes a regressor rather than a classifier\ncostsensitive_classifier = RegressionOneVsRest(Ridge())\ncostsensitive_classifier.fit(X, C)\ncostsensitive_classifier.predict(X)\ncostsensitive_classifier.decision_function(X)\n\n### FilterTree\n### Implemented for comparison purposes, not recommended for practical use\ncostsensitive_classifier = FilterTree(LogisticRegression())\ncostsensitive_classifier.fit(X, C)\ncostsensitive_classifier.predict(X)\n```\n\nFor a more detailed example, see the IPython notebook [Cost-Sensitive Multi-Class Classification](http://nbviewer.jupyter.org/github/david-cortes/costsensitive/blob/master/example/costsensitive_multiclass_classification.ipynb).\n\n**Results on the CovType data set with artificially set costs (see the notebook linked above)**\n![image](plots/covtype_results.png \"simulation_covtype\")\n\n## Documentation\n\nDocumentation for Python is available at [http://costsensitive.readthedocs.io/en/latest/](http://costsensitive.readthedocs.io/en/latest/). For R, it's available as part of the package (see the [CRAN page](https://cran.r-project.org/web/packages/costsensitive/index.html)).\n\nAll code is internally documented through docstrings (e.g. you can try `help(WeightedAllPairs)`, `help(WeightedAllPairs.fit)`, `help(WeightedAllPairs.predict)`, etc.; in R: `help(costsensitive::weighted.all.pairs)` and so on).\n\n## Some comments\n\nIn general, you would most likely be best served by using `WeightedAllPairs` with default arguments. The pairwise weighting technique from \"Error limiting reductions between classification tasks\" doesn't seem to improve expected cost in practice compared to simply defining the weight as the difference in cost between the two classes.\n\nHowever, All-Pairs requires fitting `m*(m-1)/2` classifiers, where `m` is the number of classes. With many classes this becomes a very large number of classifiers, in which case you might want to consider `RegressionOneVsRest`, which, as the name suggests, works with a regressor rather than a classifier.\n\nThe `FilterTree` method from \"Multiclass classification with filter trees\" tends to perform very poorly in practice with linear classifiers such as logistic regression, as its binary subproblems at internal nodes involve groups of classes mixed together on each side, which can result in very hard classification problems. 
It is only recommended for tree-based classifiers.\n\n## References\n\n* Beygelzimer, A., Dani, V., Hayes, T., Langford, J., \u0026 Zadrozny, B. (2005, August). Error limiting reductions between classification tasks. In Proceedings of the 22nd international conference on Machine learning (pp. 49-56). ACM. \n* Beygelzimer, A., Langford, J., \u0026 Zadrozny, B. (2008). Machine learning techniques—reductions between prediction quality metrics. In Performance Modeling and Engineering (pp. 3-28). Springer US. \n* Beygelzimer, A., Langford, J., \u0026 Ravikumar, P. (2007). Multiclass classification with filter trees. Preprint, June, 2. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-cortes%2Fcostsensitive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavid-cortes%2Fcostsensitive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavid-cortes%2Fcostsensitive/lists"}