{"id":24767396,"url":"https://github.com/ofai/hub-toolbox-python3","last_synced_at":"2025-10-11T17:30:27.494Z","repository":{"id":62569726,"uuid":"44373317","full_name":"OFAI/hub-toolbox-python3","owner":"OFAI","description":"Hubness analysis and removal functions","archived":false,"fork":false,"pushed_at":"2023-04-11T08:52:04.000Z","size":1982,"stargazers_count":19,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-09-28T05:29:07.865Z","etag":null,"topics":["data-mining","high-dimensional-data","hubness","machine-learning"],"latest_commit_sha":null,"homepage":"http://www.ofai.at","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OFAI.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-16T08:45:34.000Z","updated_at":"2024-03-11T11:44:19.000Z","dependencies_parsed_at":"2022-11-03T17:02:19.272Z","dependency_job_id":null,"html_url":"https://github.com/OFAI/hub-toolbox-python3","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/OFAI/hub-toolbox-python3","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFAI%2Fhub-toolbox-python3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFAI%2Fhub-toolbox-python3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFAI%2Fhub-toolbox-python3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFAI%2Fhub-toolbox-python3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OFAI","download_url":"https://
codeload.github.com/OFAI/hub-toolbox-python3/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OFAI%2Fhub-toolbox-python3/sbom","scorecard":{"id":103549,"data":{"date":"2025-08-11","repo":{"name":"github.com/OFAI/hub-toolbox-python3","commit":"b76fa405dc6ffc80484a9bfed7e68fa828b7dc8e"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/27 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if 
the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license 
file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"15 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2022-288 / GHSA-6hrg-qmvc-2xh8","Warn: Project is vulnerable to: PYSEC-2018-34 / GHSA-2fc2-6r4j-p65h","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: PYSEC-2019-108 / GHSA-9fq2-x9r6-wfmf","Warn: Project is vulnerable to: PYSEC-2018-33 / GHSA-cw6w-4rcx-xphc","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2017-1 / GHSA-frgw-fgh6-9g52","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2020-107 / GHSA-jjw5-xxj6-pcv5","Warn: Project is vulnerable to: PYSEC-2024-110 / GHSA-jw8x-6495-233v","Warn: Project is vulnerable to: PYSEC-2020-108","Warn: Project is vulnerable to: 
PYSEC-2019-156 / GHSA-xp76-357g-9wqq","Warn: Project is vulnerable to: PYSEC-2023-102","Warn: Project is vulnerable to: PYSEC-2023-114"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 5 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-15T10:40:01.057Z","repository_id":62569726,"created_at":"2025-08-15T10:40:01.057Z","updated_at":"2025-08-15T10:40:01.057Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279008116,"owners_count":26084396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-mining","high-dimensional-data","hubness","machine-learning"],"created_at":"2025-01-29T00:53:35.044Z","updated_at":"2025-10-11T17:30:27.060Z","avatar_url":"https://github.com/OFAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. 
image:: https://badge.fury.io/py/hub-toolbox.svg\n    :target: https://badge.fury.io/py/hub-toolbox\n\n.. image:: https://readthedocs.org/projects/hub-toolbox-python3/badge/?version=latest\n\t:target: http://hub-toolbox-python3.readthedocs.io/en/latest/?badge=latest\n\t:alt: Documentation Status\n\n.. image:: https://travis-ci.org/OFAI/hub-toolbox-python3.svg?branch=master\n    :target: https://travis-ci.org/OFAI/hub-toolbox-python3\n\n.. image:: https://coveralls.io/repos/github/OFAI/hub-toolbox-python3/badge.svg?branch=master\n\t:target: https://coveralls.io/github/OFAI/hub-toolbox-python3?branch=master \n\n.. image:: https://img.shields.io/aur/license/yaourt.svg?maxAge=2592000   \n\t:target: https://github.com/OFAI/hub-toolbox-python3/blob/master/LICENSE.txt\n\n\nHUB-TOOLBOX\n===========\n\n#-----------------------------------------------------------------------------------\n\nCheck out our new project `scikit-hubness \u003chttps://github.com/VarIr/scikit-hubness\u003e`_\nwhich provides the functionality of the Hub-Toolbox while integrating nicely into\n`scikit-learn` workflows.\n\nUse `skhubness.neighbors` as a drop-in replacement for `sklearn.neighbors`.\nIt offers the same functionality and adds transparent support for hubness reduction,\napproximate nearest neighbor search (HNSW, LSH), and approximate hubness reduction.\n\nWe strive to improve usability of hubness reduction with the development of\n`scikit-hubness`, and we are very interested in\n`user feedback \u003chttps://github.com/VarIr/scikit-hubness/issues\u003e`_!\n\n#-----------------------------------------------------------------------------------\n\nThe Hub Toolbox is a software suite for hubness analysis and\nhubness reduction in high-dimensional data.\n\nIt allows you to\n\n- analyze whether your datasets show hubness\n- reduce hubness via a variety of different techniques \n  (including scaling and centering approaches)\n  and obtain secondary distances for downstream analysis inside or \n  
outside the Hub Toolbox\n- perform evaluation tasks with both internal and external measures\n  (e.g. Goodman-Kruskal index and k-NN classification)\n- NEW IN 2.5:\n  The ``approximate`` module provides approximate hubness reduction methods\n  with linear complexity which allow analyzing large datasets.\n- NEW IN 2.5:\n  Measure hubness with the recently proposed Robin-Hood index\n  for fast and reliable hubness estimation.\n\t\nInstallation\n------------\n\nMake sure you have a working Python3 environment (at least 3.6) with\nnumpy, scipy and scikit-learn packages. Use pip3 to install the latest \nstable version:\n\n.. code-block:: bash\n\n  pip3 install hub-toolbox\n\nFor more details and alternatives, please see the `Installation instructions\n\u003chttp://hub-toolbox-python3.readthedocs.io/en/latest/user/installation.html\u003e`_.\n\nDocumentation\n-------------\n\nDocumentation is available online: \nhttp://hub-toolbox-python3.readthedocs.io/en/latest/index.html\n\nExample\n-------\n\nTo run a full hubness analysis on the example dataset (DEXTER) \nusing some of the provided hubness reduction methods, \nsimply run the following in a Python shell:\n\n.. code-block:: python\n\n\t\u003e\u003e\u003e from hub_toolbox.HubnessAnalysis import HubnessAnalysis\n\t\u003e\u003e\u003e ana = HubnessAnalysis()\n\t\u003e\u003e\u003e ana.analyze_hubness()\n\t\nSee how you can conduct the individual analysis steps:\n\n.. 
code-block:: python\n\n\timport hub_toolbox\n\t\n\t# load the DEXTER example dataset\n\tD, labels, vectors = hub_toolbox.io.load_dexter()\n\n\t# calculate intrinsic dimension estimate\n\td_mle = hub_toolbox.intrinsic_dimension.intrinsic_dimension(vectors)\n\t\n\t# calculate hubness (here, skewness of 5-occurrence)\n\tS_k, _, _ = hub_toolbox.hubness.hubness(D=D, k=5, metric='distance')\n\n\t# perform k-NN classification LOO-CV for two different values of k\n\tacc, _, _ = hub_toolbox.knn_classification.score(\n                D=D, target=labels, k=[1,5], metric='distance')\n\n\t# calculate Goodman-Kruskal index\n\tgamma = hub_toolbox.goodman_kruskal.goodman_kruskal_index(\n\t    D=D, classes=labels, metric='distance')\n\t \t\n\t# Reduce hubness with Mutual Proximity (Empiric distance distribution)\n\tD_mp = hub_toolbox.global_scaling.mutual_proximity_empiric(\n\t    D=D, metric='distance')\n\t\t\n\t# Reduce hubness with Local Scaling variant NICDM\n\tD_nicdm = hub_toolbox.local_scaling.nicdm(D=D, k=10, metric='distance')\n\t\n\t# Check whether indices improve after hubness reduction\n\tS_k_mp, _, _ = hub_toolbox.hubness.hubness(D=D_mp, k=5, metric='distance')\n\tacc_mp, _, _ = hub_toolbox.knn_classification.score(\n\t\tD=D_mp, target=labels, k=[1,5], metric='distance')\n\tgamma_mp = hub_toolbox.goodman_kruskal.goodman_kruskal_index(\n\t\tD=D_mp, classes=labels, metric='distance')\n\t\t\n\t# Repeat the last steps for all secondary distances you calculated\n\t...\n\nCheck the `Tutorial\n\u003chttp://hub-toolbox-python3.readthedocs.io/en/latest/user/tutorial.html\u003e`_ \nfor in-depth explanations of the same. \n\n\nDevelopment\n-----------\n\nDevelopment of the Hub Toolbox has finished. Check out its successor\n`scikit-hubness \u003chttps://github.com/VarIr/scikit-hubness\u003e`_ for fully\nscikit-learn compatible hubness analysis and approximate neighbor search.\n\n.. 
code-block:: text\n\n\t(c) 2011-2018, Dominik Schnitzer and Roman Feldbauer\n\tAustrian Research Institute for Artificial Intelligence (OFAI)\n\tContact: \u003croman.feldbauer@ofai.at\u003e\n\nCitation\n--------\n\nIf you use the Hub Toolbox in your scientific publication, please cite:\n\n.. code-block:: text\n\n\t@InProceedings{Feldbauer2018b,\n                   author        = {Roman Feldbauer and Maximilian Leodolter and Claudia Plant and Arthur Flexer},\n                   title         = {Fast Approximate Hubness Reduction for Large High-Dimensional Data},\n                   booktitle     = {2018 {IEEE} International Conference on Big Knowledge, {ICBK} 2018, Singapore, November 17-18, 2018},\n                   year          = {2018},\n                   editor        = {Xindong Wu and Yew{-}Soon Ong and Charu C. Aggarwal and Huanhuan Chen},\n                   pages         = {358--367},\n                   publisher     = {{IEEE} Computer Society},\n                   bibsource     = {dblp computer science bibliography, https://dblp.org},\n                   biburl        = {https://dblp.org/rec/conf/icbk/FeldbauerLPF18.bib},\n                   doi           = {10.1109/ICBK.2018.00055},\n                 }\n\nRelevant literature:\n\n2018: ``Fast approximate hubness reduction for large high-dimensional data``, available as\ntechnical report at `\u003chttp://www.ofai.at/cgi-bin/tr-online?number+2018-02\u003e`_.\n\n2018: ``A comprehensive empirical comparison of hubness reduction in high-dimensional spaces``,\nfull paper available at https://doi.org/10.1007/s10115-018-1205-y\n\n2016: ``Centering Versus Scaling for Hubness Reduction``, available as technical report\nat `\u003chttp://www.ofai.at/cgi-bin/tr-online?number+2016-05\u003e`_ .\n\n2012: ``Local and Global Scaling Reduce Hubs in Space``, full paper available at\n`\u003chttp://www.jmlr.org/papers/v13/schnitzer12a.html\u003e`_ .\n\nLicense\n-------\nThe HUB TOOLBOX is licensed under the terms of the GNU 
GPLv3.\n\nAcknowledgements\n----------------\nPyVmMonitor is being used to support the development of this free open source \nsoftware package. For more information go to http://www.pyvmmonitor.com\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fofai%2Fhub-toolbox-python3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fofai%2Fhub-toolbox-python3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fofai%2Fhub-toolbox-python3/lists"}