{"id":21186532,"url":"https://github.com/thieu1995/mafese","last_synced_at":"2025-08-14T01:31:12.356Z","repository":{"id":112540596,"uuid":"545209353","full_name":"thieu1995/mafese","owner":"thieu1995","description":"Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python","archived":false,"fork":false,"pushed_at":"2025-06-03T17:09:44.000Z","size":4695,"stargazers_count":81,"open_issues_count":0,"forks_count":25,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-05T15:14:12.813Z","etag":null,"topics":["decision-tree-classifier","dimensionality-reduction","feature-extraction","feature-selection","genetic-algorithm","harris-hawks-optimization","knn-classifier","machine-learning","mutual-information","optimization","pearson-correlation-coefficient","relief-f","subset-selection","svm-classifier","wrapper-methods"],"latest_commit_sha":null,"homepage":"https://mafese.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thieu1995.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-10-04T01:04:49.000Z","updated_at":"2025-06-23T08:25:21.000Z","dependencies_parsed_at":"2024-06-07T04:29:09.210Z","dependency_job_id":"94fe15c2-62ac-4adb-9e38-30554568616a","html_url":"https://github.com/thieu1995/mafese","commit_stats":null,"previous_names":["thieu1995/mafese"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/thieu1995/mafese","repository_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/thieu1995%2Fmafese","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thieu1995%2Fmafese/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thieu1995%2Fmafese/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thieu1995%2Fmafese/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thieu1995","download_url":"https://codeload.github.com/thieu1995/mafese/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thieu1995%2Fmafese/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270347431,"owners_count":24568569,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-13T02:00:09.904Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decision-tree-classifier","dimensionality-reduction","feature-extraction","feature-selection","genetic-algorithm","harris-hawks-optimization","knn-classifier","machine-learning","mutual-information","optimization","pearson-correlation-coefficient","relief-f","subset-selection","svm-classifier","wrapper-methods"],"created_at":"2024-11-20T18:24:17.123Z","updated_at":"2025-08-14T01:31:12.350Z","avatar_url":"https://github.com/thieu1995.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp 
align=\"center\"\u003e\n\u003cimg style=\"max-width:100%;\" \nsrc=\"https://thieu1995.github.io/post/2023-08/mafese-02.png\" \nalt=\"MAFESE\"/\u003e\n\u003c/p\u003e\n\n---\n\n[![GitHub release](https://img.shields.io/badge/release-1.0.0-yellow.svg)](https://github.com/thieu1995/mafese/releases)\n[![Wheel](https://img.shields.io/pypi/wheel/mafese.svg)](https://pypi.python.org/pypi/mafese) \n[![PyPI version](https://badge.fury.io/py/mafese.svg)](https://badge.fury.io/py/mafese)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mafese.svg)\n![PyPI - Downloads](https://img.shields.io/pypi/dm/mafese.svg)\n[![Downloads](https://static.pepy.tech/badge/mafese)](https://pepy.tech/project/mafese)\n[![Run Tests](https://github.com/thieu1995/mafese/actions/workflows/test.yml/badge.svg)](https://github.com/thieu1995/mafese/actions/workflows/test.yml)\n[![Documentation Status](https://readthedocs.org/projects/mafese/badge/?version=latest)](https://mafese.readthedocs.io/en/latest/?badge=latest)\n[![Chat](https://img.shields.io/badge/Chat-on%20Telegram-orange)](https://t.me/+fRVCJGuGJg1mNDg1)\n[![DOI](https://img.shields.io/badge/DOI-10.1016%2Fj.future.2024.06.006-blue)](https://doi.org/10.1016/j.future.2024.06.006)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-yellow.svg)](https://www.gnu.org/licenses/gpl-3.0)\n\n\n---\n\n**MAFESE (Metaheuristic Algorithms for FEature SElection)** is the **largest open-source Python library** dedicated to \nthe feature selection (FS) problem using metaheuristic algorithms. 
It contains filter, wrapper, embedded, and unsupervised-based methods with modern optimization techniques.\nWhether you're tackling classification or regression tasks, MAFESE helps automate and enhance feature selection to improve model performance.\n\n---\n\n## 🔥 Key Features\n\n* **🆓 Free software:** GNU General Public License (GPL) V3 license\n* **🔄 Total Wrapper-based (Metaheuristic Algorithms):** \u003e 200 methods\n* **📊 Total Filter-based (Statistical-based):** \u003e 15 methods\n* **🌳 Total Embedded-based (Tree and Lasso):** \u003e 10 methods\n* **🔍 Total Unsupervised-based:** ≥ 4 methods\n* **📂 Built-in Datasets**: ≥ 30 datasets (47 classifications, 7 regressions) \n* **📈 Total performance metrics:** ≥ 61 (45 regressions and 16 classifications)\n* **⚙️ Total objective functions (as fitness functions):** ≥ 61 (45 regressions and 16 classifications)\n* **📖 Documentation:** [https://mafese.readthedocs.io/en/latest/](https://mafese.readthedocs.io/en/latest/)\n* **🐍 Python versions:** ≥ 3.8.x\n* **📦 Dependencies:** `numpy`, `scipy`, `scikit-learn`, `pandas`, `mealpy`, `permetrics`, `plotly`, `kaleido`\n\n\n## 🎯 Goals\nMAFESE provides all state-of-the-art feature selection (FS) methods:\n\n* 🧠 Unsupervised-based FS\n\n* 🔎 Filter-based FS\n\n* 🌲 Embedded-based FS\n  * Regularization (Lasso-based)\n  * Tree-based methods\n\n* ⚙️ Wrapper-based FS\n\n  * Sequential-based: forward and backward\n  * Recursive-based\n  * MHA-based: Metaheuristic Algorithms\n\n\n## 📝 Citation\n\nPlease include these citations if you plan to use this incredible library:\n\n```bibtex\n@article{van2024feature,\n  title={Feature selection using metaheuristics made easy: Open source MAFESE library in Python},\n  author={Van Thieu, Nguyen and Nguyen, Ngoc Hung and Heidari, Ali Asghar},\n  journal={Future Generation Computer Systems},\n  year={2024},\n  publisher={Elsevier},\n  doi={10.1016/j.future.2024.06.006},\n  
url={https://doi.org/10.1016/j.future.2024.06.006},\n}\n\n@article{van2023mealpy,\n  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},\n  author={Van Thieu, Nguyen and Mirjalili, Seyedali},\n  journal={Journal of Systems Architecture},\n  year={2023},\n  publisher={Elsevier},\n  doi={10.1016/j.sysarc.2023.102871}\n}\n```\n\n## Installation\n\nInstall the latest release from PyPI:\n\n```bash\n$ pip install mafese\n```\n\nAfter installation, check the version:\n\n```bash\n$ python\n\u003e\u003e\u003e import mafese\n\u003e\u003e\u003e mafese.__version__\n```\n\n\n## 🚀 Quick Start\n\n### 1. Load Dataset\n\nUse a built-in dataset:\n\n```python\nfrom mafese import get_dataset\ndata = get_dataset(\"Arrhythmia\")\n```\n\nOr load your own:\n\n```python\nimport pandas as pd\nfrom mafese import Data\n\ndf = pd.read_csv('examples/dataset.csv', index_col=0).values\nX, y = df[:, :-1], df[:, -1]\ndata = Data(X, y)\n```\n\n### 2. Next, prepare your dataset\n\n#### Split Train/Test\n\n```python\ndata.split_train_test(test_size=0.2)\nprint(data.X_train[:2].shape)\nprint(data.y_train[:2].shape)\n```\n\n#### Scale Features and Labels\n\n```python\ndata.X_train, scaler_X = data.scale(data.X_train, scaling_methods=(\"standard\", \"minmax\"))\ndata.X_test = scaler_X.transform(data.X_test)\n\ndata.y_train, scaler_y = data.encode_label(data.y_train)  # Classification only\ndata.y_test = scaler_y.transform(data.y_test)\n```\n\n### 3. 
Choose a Feature Selection Method\n\n```python\n## First way (recommended): import from the top-level package\nfrom mafese import UnsupervisedSelector, FilterSelector, LassoSelector, TreeSelector\nfrom mafese import SequentialSelector, RecursiveSelector, MhaSelector, MultiMhaSelector\n\n## Second way: import from the submodules\nfrom mafese.unsupervised import UnsupervisedSelector\nfrom mafese.filter import FilterSelector\nfrom mafese.embedded.lasso import LassoSelector\nfrom mafese.embedded.tree import TreeSelector\nfrom mafese.wrapper.sequential import SequentialSelector\nfrom mafese.wrapper.recursive import RecursiveSelector\nfrom mafese.wrapper.mha import MhaSelector, MultiMhaSelector\n```\n\n### 4. Next, create an instance of the Selector class you want to use:\n\n```python\n## Pick one of the selectors below; each assignment replaces the previous one\nfeat_selector = UnsupervisedSelector(problem='classification', method='DR', n_features=5)\n\nfeat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)\n\nfeat_selector = LassoSelector(problem=\"classification\", estimator=\"lasso\", estimator_paras={\"alpha\": 0.1})\n\nfeat_selector = TreeSelector(problem=\"classification\", estimator=\"tree\")\n\nfeat_selector = SequentialSelector(problem=\"classification\", estimator=\"knn\", n_features=3, direction=\"forward\")\n\nfeat_selector = RecursiveSelector(problem=\"classification\", estimator=\"rf\", n_features=5)\n\nfeat_selector = MhaSelector(problem=\"classification\", obj_name=\"AS\",\n                            estimator=\"knn\", estimator_paras=None,\n                            optimizer=\"BaseGA\", optimizer_paras=None,\n                            mode='single', n_workers=None, termination=None, seed=None, verbose=True)\n\nfeat_selector = MultiMhaSelector(problem=\"classification\", obj_name=\"AS\",\n                                 estimator=\"knn\", estimator_paras=None,\n                                 list_optimizers=(\"OriginalWOA\", \"OriginalGWO\", \"OriginalTLO\", \"OriginalGSKA\"),\n                                 list_optimizer_paras=[{\"epoch\": 10, \"pop_size\": 30}, ]*4,\n                                 mode='single', n_workers=None, termination=None, seed=None, verbose=True)\n```\n\n### 5. Fit the model to X_train and y_train\n\n```python\nfeat_selector.fit(data.X_train, data.y_train)\n```\n\n### 6. Get the information\n\n```python\n# check selected features - True (or 1) is selected, False (or 0) is not selected\nprint(feat_selector.selected_feature_masks)\nprint(feat_selector.selected_feature_solution)\n\n# check the index of selected features\nprint(feat_selector.selected_feature_indexes)\n```\n\n### 7. Call transform() on the X that you want to reduce to the selected features\n\n```python\nX_train_selected = feat_selector.transform(data.X_train)\nX_test_selected = feat_selector.transform(data.X_test)\n```\n\n### 8. You can build your own evaluation method or use ours\n\nIf you use our method, don't transform the data beforehand.\n\n#### 8.1 You can use a different estimator than the one used in the feature selection process\n\n```python\nfeat_selector.evaluate(estimator=\"svm\", data=data, metrics=[\"AS\", \"PS\", \"RS\"])\n\n## Here, we pass the data that was loaded above, so it contains both the train and test sets.\n## The results will look like this:\n## {'AS_train': 0.77176, 'PS_train': 0.54177, 'RS_train': 0.6205, 'AS_test': 0.72636, 'PS_test': 0.34628, 'RS_test': 0.52747}\n```\n\n#### 8.2 You can use the same estimator as in the feature selection process\n\n```python\nX_test, y_test = data.X_test, data.y_test\nfeat_selector.evaluate(estimator=None, data=data, metrics=[\"AS\", \"PS\", \"RS\"])\n```\n\nFor more usage examples, please look at the [examples](/examples) folder.\n\n\n## ❓ Troubleshooting\n\n1. Where do I find the supported metrics, such as [\"AS\", \"PS\", \"RS\"] above? 
What do they mean?\n\nYou can find them here: https://github.com/thieu1995/permetrics or print them directly:\n\n```python\nfrom mafese import MhaSelector \n\nprint(MhaSelector.SUPPORTED_REGRESSION_METRICS)\nprint(MhaSelector.SUPPORTED_CLASSIFICATION_METRICS)\n```\n\n2. How do I know which estimators and methods my Selector supports?\n\n```python\nprint(feat_selector.SUPPORT) \n```\nOr, better, read the documentation at: https://mafese.readthedocs.io/en/latest/\n\n3. I got this type of error. How do I solve it?\n\n```python\nraise ValueError(\"Existed at least one new label in y_pred.\")\nValueError: Existed at least one new label in y_pred.\n```\n\n\u003e This occurs only when you are working on a classification problem with a small dataset that has many classes. For \n  instance, the \"Zoo\" dataset contains only 101 samples, but it has 7 classes. If you split the dataset into a \n  training and testing set with a ratio of around 80% - 20%, there is a chance that one or more classes may appear \n  in the testing set but not in the training set. As a result, when you calculate the performance metrics, you may \n  encounter this error. The model cannot assign new data to a label that it never saw during training. 
There are several solutions to this problem.\n\n\n+ 1st: Use the SMOTE method to address imbalanced data and ensure that all classes have the same number of samples.\n\n```python\nfrom imblearn.over_sampling import SMOTE\nimport pandas as pd\nfrom mafese import Data\n\ndataset = pd.read_csv('examples/dataset.csv', index_col=0).values\nX, y = dataset[:, 0:-1], dataset[:, -1]\n\n# Oversample the minority classes before building the Data object\nX_new, y_new = SMOTE().fit_resample(X, y)\ndata = Data(X_new, y_new)\n```\n\n+ 2nd: Use a different random_state value in the split_train_test() function.\n\n```python\nimport pandas as pd \nfrom mafese import Data \n\ndataset = pd.read_csv('examples/dataset.csv', index_col=0).values\nX, y = dataset[:, 0:-1], dataset[:, -1]\ndata = Data(X, y)\ndata.split_train_test(test_size=0.2, random_state=10)   # Try a different random_state value \n```\n\n\n\n## 📞 Community \u0026 Support\n\n- 📖 [Official Source Code](https://github.com/thieu1995/mafese)\n- 📖 [Official Releases](https://pypi.org/project/mafese/)\n- 📖 [Official Docs](https://mafese.readthedocs.io/)\n- 💬 [Telegram Chat](https://t.me/+fRVCJGuGJg1mNDg1)\n- 🐛 [Report Issues](https://github.com/thieu1995/mafese/issues)\n- 🔄 [Changelog](https://github.com/thieu1995/mafese/blob/main/ChangeLog.md)\n\n\n---\n\nDeveloped by: [Thieu](mailto:nguyenthieu2102@gmail.com?Subject=Mafese_QUESTIONS) @ 2023\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthieu1995%2Fmafese","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthieu1995%2Fmafese","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthieu1995%2Fmafese/lists"}