{"id":18521618,"url":"https://github.com/transferwise/hisel","last_synced_at":"2025-04-07T16:53:18.867Z","repository":{"id":191595139,"uuid":"620456799","full_name":"transferwise/hisel","owner":"transferwise","description":"Feature selection tool based on Hilbert-Schmidt Independence Criterion","archived":false,"fork":false,"pushed_at":"2024-05-03T21:37:35.000Z","size":367,"stargazers_count":3,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"trunk","last_synced_at":"2025-02-13T19:18:26.476Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/transferwise.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-28T18:07:26.000Z","updated_at":"2025-01-13T15:28:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"2a91ef4d-a94c-4879-abc7-7fe5b3138f36","html_url":"https://github.com/transferwise/hisel","commit_stats":null,"previous_names":["transferwise/hisel"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fhisel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fhisel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fhisel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/transferwise%2Fhisel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/transferwise","download_url":"https://codeload.github.com/transferwise/hisel/tar.gz/refs/heads/trunk","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247694892,"owners_count":20980731,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T17:26:45.441Z","updated_at":"2025-04-07T16:53:18.841Z","avatar_url":"https://github.com/transferwise.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HISEL\n## Feature selection tool based on Hilbert-Schmidt Independence Criterion\nFeature selection is\nthe machine learning \ntask\nof selecting from a data set\nthe features \nthat are relevant \nfor the prediction of a given target.\nThe `hisel` package \nprovides feature selection methods \nbased on \nthe Hilbert-Schmidt Independence Criterion (HSIC).\nIn particular,\nit provides an implementation of the HSIC Lasso algorithm of\n[Yamada, M. et al. (2012)](https://arxiv.org/abs/1202.0515). 
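
HSIC measures statistical dependence through kernel Gram matrices. The following NumPy sketch is a rough illustration, not `hisel`'s own code. It computes the standard biased empirical estimate HSIC(X, Y) = tr(K H L H) / (n - 1)^2. Here K and L are Gaussian Gram matrices of the two samples, and H = I - 11^T / n is the centring matrix. The helper names `gaussian_gram` and `hsic` and the fixed kernel width `sigma=1.0` are assumptions made for this example.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    # Gaussian (RBF) Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 * sigma^2))
    # for a 1-D sample; the fixed width sigma is an illustrative assumption.
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dists = (x - x.T) ** 2  # broadcasting: all pairwise squared distances
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    # Biased empirical HSIC: trace(K H L H) / (n - 1)^2,
    # with centring matrix H = I - ones((n, n)) / n.
    n = len(x)
    k_x = gaussian_gram(x, sigma)
    k_y = gaussian_gram(y, sigma)
    h = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(k_x @ h @ k_y @ h)) / (n - 1) ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_dep = x ** 2 + 0.1 * rng.normal(size=300)  # dependent on x, uncorrelated in population
y_ind = rng.normal(size=300)  # independent of x
print(hsic(x, y_dep), hsic(x, y_ind))
```

A pair like x and x**2 has zero population correlation, yet its HSIC should be well above that of an independent pair. That is the kind of nonlinear dependence a kernel criterion can pick up. Each Gram matrix is n-by-n, so this naive version needs O(n^2) memory per variable.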
\n\n## Why is `hisel` cool?\n\n#### `hisel` is accurate\nHSIC Lasso is an excellent algorithm for feature selection.\nThis makes `hisel` an accurate tool for your machine learning modelling.\nMoreover, \n`hisel` implements clever routines \nthat address common causes of poor accuracy in other feature selection methods.\n\nExamples where `hisel` outperforms the methods in \n[sklearn.feature\\_selection](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection)\nare given in the notebooks\n`ensemble-example.ipynb`\nand\n`nonlinear-transform.ipynb`.\n\n\n#### `hisel` is fast\nA crucial step in the HSIC Lasso algorithm \nis the computation of\ncertain Gram matrices. \n`hisel` implements these computations\nin a highly vectorised, performant way. \nMoreover, \n`hisel` allows you to \naccelerate these computations\nusing a GPU. \nThe image below shows \nthe average run time \nof the Gram matrix computations \nvia \n`hisel` on CPU, \nvia\n`hisel` on GPU,\nand \nvia \n[pyHSICLasso](https://pypi.org/project/pyHSICLasso/).\nThe run times were measured \non computing the Gram matrices \nthat HSIC Lasso requires \nto select \nfrom a dataset of 300 features, \nwith the number of samples shown on the x-axis. \n\n![gramtimes](gramtimes.png)\n\n\n#### `hisel` has a friendly user interface\n\nGetting started with `hisel` is as straightforward as the following code snippet:\n```python\n    \u003e\u003e\u003e import pandas as pd\n    \u003e\u003e\u003e import hisel\n    \u003e\u003e\u003e df = pd.read_csv('mydata.csv')\n    \u003e\u003e\u003e xdf = df.iloc[:, :-1]\n    \u003e\u003e\u003e yser = df.iloc[:, -1]\n    \u003e\u003e\u003e hisel.feature_selection.select_features(xdf, yser)\n    ['d2', 'd7', 'c3', 'c10', 'c12', 'c24', 'c22', 'c21', 'c5']\n```\nIf you are not interested in more details, \nplease read no further. 
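
For readers who want more detail on what the algorithm computes, here is a minimal NumPy sketch of the HSIC Lasso formulation of Yamada et al. (2012), under simplifying assumptions of our own. Each feature gets a centred Gaussian Gram matrix, normalised to unit Frobenius norm (a common variant), and so does the target. A non-negative Lasso then regresses the target's matrix on the feature matrices by minimising `0.5 * ||vec(L) - A @ alpha||^2 + lam * sum(alpha)` over `alpha >= 0`, so features with larger weights are more relevant. The names `centred_gram` and `hsic_lasso`, the fixed kernel width, the penalty `lam` and the plain coordinate-descent solver are illustrative choices, not `hisel`'s API or implementation.

```python
import numpy as np


def centred_gram(v, sigma=1.0):
    # Gaussian Gram matrix of a standardised 1-D sample, centred as H K H
    # and scaled to unit Frobenius norm (normalisation assumed here).
    v = np.asarray(v, dtype=float)
    v = (v - v.mean()) / v.std()
    k = np.exp(-((v[:, None] - v[None, :]) ** 2) / (2.0 * sigma ** 2))
    n = len(v)
    h = np.eye(n) - np.ones((n, n)) / n
    kc = h @ k @ h
    return kc / np.linalg.norm(kc)  # Frobenius norm for a 2-D array


def hsic_lasso(x, y, lam=1e-3, n_iter=100):
    # Non-negative Lasso: minimise over alpha >= 0
    #   0.5 * ||vec(L) - A @ alpha||^2 + lam * sum(alpha),
    # where column j of A is vec(K_j); solved by cyclic coordinate descent.
    d = x.shape[1]
    a_mat = np.column_stack([centred_gram(x[:, j]).ravel() for j in range(d)])
    residual = centred_gram(y).ravel()  # equals vec(L) while alpha is zero
    alpha = np.zeros(d)
    col_sq = (a_mat ** 2).sum(axis=0)  # all 1 after normalisation
    for _ in range(n_iter):
        for j in range(d):
            residual += a_mat[:, j] * alpha[j]  # drop feature j from the fit
            rho = a_mat[:, j] @ residual
            alpha[j] = max(0.0, (rho - lam) / col_sq[j])  # clipped update
            residual -= a_mat[:, j] * alpha[j]
    return alpha


rng = np.random.default_rng(0)
x = rng.normal(size=(300, 5))
# Synthetic target that depends nonlinearly on columns 0 and 3 only.
y = x[:, 0] ** 2 + 2.0 * np.sin(x[:, 3]) + 0.1 * rng.normal(size=300)
alpha = hsic_lasso(x, y)
print(np.round(alpha, 3))
```

Ranking the columns by their weights gives a selection; on this synthetic target, columns 0 and 3 are the ones the sketch is meant to pick out. The d Gram matrices of size n-by-n dominate time and memory here. This is why efficient, vectorised Gram matrix computation matters for larger datasets.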
\nIf you would like to \nexplore more about\nhow to tune the hyper-parameters used by `hisel` \nor \nhow to gain finer control over `hisel`'s selection,\nplease browse the examples in \n[examples/](https://github.com/transferwise/hisel/tree/trunk/examples)\nand in\n[notebooks](https://github.com/transferwise/hisel/tree/trunk/notebooks).\n\n\n\n\n## Installation\n\n### Install via `pip`\n\nThe package [hisel](https://pypi.org/project/hisel/) is available from [PyPI](https://pypi.org/). \nYou can install it via `pip`:\n```\npip install hisel \n```\n\nTo install the extra support for GPU computations, run\n```\npip install hisel[cudaXXX]\n```\nwhere `cudaXXX` is one of the following:\n\n- `cuda102` if you have version 10.2 of the CUDA Toolkit;\n- `cuda110` if you have version 11.0 of the CUDA Toolkit;\n- `cuda111` if you have version 11.1 of the CUDA Toolkit;\n- `cuda11x` if you have versions 11.2 to 11.8 of the CUDA Toolkit;\n- `cuda12x` if you have version 12.x of the CUDA Toolkit.\n\n### Install from source\n\n#### Basic installation\nCheck out the repository and navigate to its root directory. Then run\n```\npoetry install\n```\n\n\n#### Installation with GPU support\nYou need to have the CUDA Toolkit installed, and you need to know its version.\nTo find it, you can run \n```\nnvidia-smi\n```\nand read the CUDA version from the top right corner of the table that it prints. \nNote that `nvidia-smi` reports the highest CUDA version supported by your driver, which can differ from the installed toolkit; `nvcc --version` reports the toolkit version itself. 
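
Going from a CUDA Toolkit version to the name of the extra is a simple lookup. The hypothetical helper `cuda_extra` below is not part of `hisel`. It only encodes the version-to-extra mapping that this README lists above for the `pip` install, and it assumes the version is given as a `major.minor` string.

```python
def cuda_extra(version):
    # Hypothetical helper (not part of hisel): map a CUDA Toolkit version
    # string such as '11.4' to the extra names listed in this README.
    major, minor = (int(part) for part in version.split('.')[:2])
    if (major, minor) == (10, 2):
        return 'cuda102'
    if (major, minor) == (11, 0):
        return 'cuda110'
    if (major, minor) == (11, 1):
        return 'cuda111'
    if major == 11 and 2 <= minor <= 8:
        return 'cuda11x'
    if major == 12:
        return 'cuda12x'
    raise ValueError('no hisel extra is listed for CUDA ' + version)


extra = cuda_extra('11.8')
print('pip install hisel[' + extra + ']')
print('poetry install -E ' + extra)
```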
\nOnce you know your CUDA version, run \n```\npoetry install -E cudaXXX\n```\nwhere `cudaXXX` is one of the following:\n\n- `cuda102` if you have version 10.2;\n- `cuda110` if you have version 11.0;\n- `cuda111` if you have version 11.1;\n- `cuda11x` if you have versions 11.2 to 11.8;\n- `cuda12x` if you have version 12.x.\n\nThis aligns with the [installation guide of CuPy](https://docs.cupy.dev/en/stable/install.html#installing-cupy).\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftransferwise%2Fhisel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftransferwise%2Fhisel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftransferwise%2Fhisel/lists"}