{"id":33253376,"url":"https://github.com/felix-last/kmeans_smote","last_synced_at":"2025-11-21T18:02:55.155Z","repository":{"id":51233523,"uuid":"108667830","full_name":"felix-last/kmeans_smote","owner":"felix-last","description":"Oversampling for imbalanced learning based on k-means and SMOTE","archived":false,"fork":false,"pushed_at":"2021-05-19T16:02:28.000Z","size":41,"stargazers_count":128,"open_issues_count":0,"forks_count":59,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-09-23T12:00:54.438Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/felix-last.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-28T17:04:36.000Z","updated_at":"2025-08-06T08:59:32.000Z","dependencies_parsed_at":"2022-08-29T20:41:01.229Z","dependency_job_id":null,"html_url":"https://github.com/felix-last/kmeans_smote","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/felix-last/kmeans_smote","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-last%2Fkmeans_smote","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-last%2Fkmeans_smote/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-last%2Fkmeans_smote/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-last%2Fkmeans_smote/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/felix-last","download_url":"https://codeload.github.com/felix-last/kmeans_smote/tar.gz/ref
s/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felix-last%2Fkmeans_smote/sbom","scorecard":{"id":396230,"data":{"date":"2025-08-11","repo":{"name":"github.com/felix-last/kmeans_smote","commit":"e65e17e5456a8a0722cd81334c4a6661c92d400c"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source 
repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":0,"reason":"Found 0/15 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project 
is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 18 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T19:05:23.943Z","repository_id":51233523,"created_at":"2025-08-18T19:05:23.943Z","updated_at":"2025-08-18T19:05:23.943Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285663971,"owners_count":27210638,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-21T02:00:06.175Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-11-17T01:00:33.918Z","updated_at":"2025-11-21T18:02:55.150Z","avatar_url":"https://github.com/felix-last.png","language":"Python","funding_links":[],"categories":["Exploration"],"sub_categories":[],"readme":"Oversampling for Imbalanced Learning based on K-Means and SMOTE\n---------------------------------------------------------------\n\n|PyPI version| |Build Status| |Docs Status| |codecov|\n\nK-Means SMOTE is an oversampling method for class-imbalanced data. It\naids classification by generating minority class samples in safe and\ncrucial areas of the input space. The method avoids the generation of\nnoise and effectively overcomes imbalances between and within classes.\n\nThis project is a python implementation of k-means SMOTE. 
It is\ncompatible with the scikit-learn-contrib project\n`imbalanced-learn \u003chttps://github.com/scikit-learn-contrib/imbalanced-learn\u003e`__.\n\nInstallation\n------------\n\nDependencies\n~~~~~~~~~~~~\n\nThe implementation is tested under Python 3.6 and works with the latest\nrelease of the imbalanced-learn framework:\n\n-  imbalanced-learn (\u003e=0.4.0, \u003c0.5)\n-  numpy (\u003e=1.13, \u003c1.16)\n-  scikit-learn (\u003e=0.19.0, \u003c0.21)\n\nInstallation\n~~~~~~~~~~~~\n\nPyPI\n^^^^\n\n.. code:: sh\n\n    pip install kmeans-smote\n\nFrom Source\n^^^^^^^^^^^\n\nClone this repository and run the setup.py file. Use the following\ncommands to get a copy from GitHub and install all dependencies:\n\n.. code:: sh\n\n    git clone https://github.com/felix-last/kmeans_smote.git\n    cd kmeans_smote\n    pip install .\n\nDocumentation\n-------------\n\nFind the API documentation at https://kmeans_smote.readthedocs.io. As\nthis project follows the imbalanced-learn API, the `imbalanced-learn\ndocumentation \u003chttp://contrib.scikit-learn.org/imbalanced-learn\u003e`__\nmight also prove helpful.\n\nExample Usage\n~~~~~~~~~~~~~\n\n.. 
code:: python\n\n    import numpy as np\n    from imblearn.datasets import fetch_datasets\n    from kmeans_smote import KMeansSMOTE\n\n    datasets = fetch_datasets(filter_data=['oil'])\n    X, y = datasets['oil']['data'], datasets['oil']['target']\n\n    [print('Class {} has {} instances'.format(label, count))\n     for label, count in zip(*np.unique(y, return_counts=True))]\n\n    kmeans_smote = KMeansSMOTE(\n        kmeans_args={\n            'n_clusters': 100\n        },\n        smote_args={\n            'k_neighbors': 10\n        }\n    )\n    X_resampled, y_resampled = kmeans_smote.fit_sample(X, y)\n\n    [print('Class {} has {} instances after oversampling'.format(label, count))\n     for label, count in zip(*np.unique(y_resampled, return_counts=True))]\n\nExpected Output:\n\n::\n\n    Class -1 has 896 instances\n    Class 1 has 41 instances\n    Class -1 has 896 instances after oversampling\n    Class 1 has 896 instances after oversampling\n\nTake a look at `imbalanced-learn\npipelines \u003chttp://contrib.scikit-learn.org/imbalanced-learn/stable/generated/imblearn.pipeline.Pipeline.html\u003e`__\nfor efficient usage with cross-validation.\n\nAbout\n-----\n\nK-means SMOTE works in three steps:\n\n1. Cluster the entire input space using k-means [1].\n2. Distribute the number of samples to generate across clusters:\n\n   1. Filter out clusters which have a high number of majority class\n      samples.\n   2. Assign more synthetic samples to clusters where minority class\n      samples are sparsely distributed.\n\n3. Oversample each filtered cluster using SMOTE [2].\n\nContributing\n~~~~~~~~~~~~\n\nPlease feel free to submit an issue if things work differently than\nexpected. 
Pull requests are also welcome - just make sure that tests are\ngreen by running ``pytest`` before submitting.\n\nCitation\n~~~~~~~~\n\nIf you use k-means SMOTE in a scientific publication, we would\nappreciate citations to the following\n`paper \u003chttps://arxiv.org/abs/1711.00837\u003e`__:\n\n::\n\n    @article{kmeans_smote,\n        title = {Oversampling for Imbalanced Learning Based on K-Means and SMOTE},\n        author = {Last, Felix and Douzas, Georgios and Bacao, Fernando},\n        year = {2017},\n        archivePrefix = \"arXiv\",\n        eprint = \"1711.00837\",\n        primaryClass = \"cs.LG\"\n    }\n\nReferences\n~~~~~~~~~~\n\n[1] MacQueen, J. “Some Methods for Classification and Analysis of\nMultivariate Observations.” Proceedings of the Fifth Berkeley Symposium\non Mathematical Statistics and Probability, 1967, p. 281-297.\n\n[2] Chawla, Nitesh V., et al. “SMOTE: Synthetic Minority Over-sampling\nTechnique.” Journal of Artificial Intelligence Research, vol. 16, Jan.\n2002, p. 321-357, doi:10.1613/jair.953.\n\n.. |PyPI version| image:: https://badge.fury.io/py/kmeans-smote.svg\n   :target: https://badge.fury.io/py/kmeans-smote\n.. |Build Status| image:: https://travis-ci.org/felix-last/kmeans_smote.svg?branch=master\n   :target: https://travis-ci.org/felix-last/kmeans_smote\n.. |Docs Status| image:: https://readthedocs.org/projects/kmeans-smote/badge/?version=latest\n   :target: http://kmeans-smote.readthedocs.io/en/latest/?badge=latest\n.. |codecov| image:: https://codecov.io/gh/felix-last/kmeans_smote/branch/master/graph/badge.svg\n   :target: https://codecov.io/gh/felix-last/kmeans_smote\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelix-last%2Fkmeans_smote","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffelix-last%2Fkmeans_smote","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelix-last%2Fkmeans_smote/lists"}