{"id":20858051,"url":"https://github.com/adrienc21/vulpes","last_synced_at":"2025-10-25T05:11:37.837Z","repository":{"id":41430958,"uuid":"506321663","full_name":"AdrienC21/vulpes","owner":"AdrienC21","description":"Vulpes: Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset","archived":false,"fork":false,"pushed_at":"2023-08-27T16:41:17.000Z","size":3020,"stargazers_count":8,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-15T10:54:27.222Z","etag":null,"topics":["automl","data-analysis","data-science","machine-learning","models","package","python","scikit-learn","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AdrienC21.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-22T16:18:41.000Z","updated_at":"2023-03-06T19:20:28.000Z","dependencies_parsed_at":"2024-06-12T17:25:42.854Z","dependency_job_id":"61a97124-2c7b-43b2-a4fd-754a0f0d6177","html_url":"https://github.com/AdrienC21/vulpes","commit_stats":{"total_commits":25,"total_committers":2,"mean_commits":12.5,"dds":"0.19999999999999996","last_synced_commit":"419428c31a79c8a2cd2fc61833e581ae0c0c79fc"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdrienC21%2Fvulpes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdrienC21%2Fvulp
es/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdrienC21%2Fvulpes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdrienC21%2Fvulpes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AdrienC21","download_url":"https://codeload.github.com/AdrienC21/vulpes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225130805,"owners_count":17425506,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","data-analysis","data-science","machine-learning","models","package","python","scikit-learn","statistics"],"created_at":"2024-11-18T04:44:23.352Z","updated_at":"2025-10-25T05:11:32.779Z","avatar_url":"https://github.com/AdrienC21.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vulpes\n\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![pypi version](https://img.shields.io/pypi/v/vulpes.svg)](https://pypi.python.org/pypi/vulpes)\n[![Documentation Status](https://readthedocs.org/projects/vulpes/badge/?version=latest)](https://vulpes.readthedocs.io/en/latest/?badge=latest)\n[![visitors](https://visitor-badge.laobi.icu/badge?page_id=AdrienC21.vulpes\u0026right_color=%23FFA500)](https://github.com/AdrienC21/vulpes)\n[![Downloads](https://static.pepy.tech/badge/vulpes)](https://pepy.tech/project/vulpes)\n\n\u003cimg src=\"https://github.com/AdrienC21/vulpes/blob/main/logo_large.png?raw=true\"  width=60% 
height=60%\u003e\n\n**Vulpes: Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset.**\n\nVulpes 🦊 is a Python package that allows you to test many models, whether you want to do classification, regression or clustering in your projects. It calculates many metrics for each model to compare them. It is highly customizable and it contains many features to save time building robust ML models.\n\nIf you like this project, please leave a star ⭐ on GitHub!\n\nAlpha version.\n\nAuthor \u0026 Maintainer: Adrien Carrel.\n\n## Installation\n\nUsing pip:\n\n```bash\npip install vulpes\n```\n\n## Dependencies\n\nvulpes requires:\n\n- Python (\u003e= 3.7)\n- numpy (\u003e= 1.22)\n- pandas (\u003e= 1.3.5)\n- scikit-learn (\u003e= 1.0.2)\n- tqdm (\u003e= 4.64.0)\n- xgboost (\u003e= 1.6.1)\n- lightgbm (\u003e= 3.3.2)\n\n## Documentation\n\nLink to the documentation: [https://vulpes.readthedocs.io/en/latest/](https://vulpes.readthedocs.io/en/latest/)\n\n## Examples\n\nIn the general case, import one of the classes Classifiers, Regressions or Clustering from vulpes.automl, optionally pass some parameters to the object, and fit it on your dataset:\n\n```python\nfrom vulpes.automl import Classifiers\nclassifiers = Classifiers()\nclassifiers.fit(X, y)\n```\n\nMore examples are given below and in the notebooks in the **examples** folder.\n\n### Classification\n\nFit many classification algorithms on the iris dataset from scikit-learn:\n\n```python\nimport pandas as pd\nfrom sklearn.datasets import load_iris\nfrom vulpes.automl import Classifiers\n\ndataset = load_iris()\nX = pd.DataFrame(dataset[\"data\"], columns=dataset[\"feature_names\"])\ny = dataset[\"target\"]\n\nclassifiers = Classifiers(preprocessing=\"default\")\ndf_models = classifiers.fit(X, y)\ndf_models\n```\n\nAnalysis of each model using different metrics and repeated K-fold cross-validation:\n\n|                          Model | Balanced Accuracy | Accuracy | Precision |   Recall | 
F1 Score |    AUROC |    AUPRC | Micro avg Precision | Running time |\n|-------------------------------:|------------------:|---------:|----------:|---------:|---------:|---------:|---------:|--------------------:|-------------:|\n|   LinearDiscriminantAnalysis   |          0.977625 | 0.977333 |  0.978024 | 0.977625 | 0.976933 | 0.998161 | 0.996891 |            0.996940 |     4.372556 |\n|  QuadraticDiscriminantAnalysis |          0.973219 | 0.973333 |  0.975460 | 0.973219 | 0.973162 | 0.999063 | 0.997595 |            0.997634 |     4.470590 |\n|      LogisticRegressionCV      |          0.961609 | 0.961333 |  0.964101 | 0.961609 | 0.960668 | 0.997218 | 0.993264 |            0.993375 |    12.895212 |\n|               SVC              |          0.961287 | 0.960000 |  0.962045 | 0.961287 | 0.959960 | 0.996825 | 0.994421 |            0.994510 |     4.437862 |\n|     RandomForestClassifier     |          0.957220 | 0.956000 |  0.959982 | 0.957220 | 0.955394 | 0.993473 | 0.990367 |            0.989958 |    10.645725 |\n|           GaussianNB           |          0.957169 | 0.954667 |  0.956188 | 0.957169 | 0.954521 | 0.993825 | 0.990463 |            0.990619 |     4.345500 |\n|      ExtraTreesClassifier      |          0.956438 | 0.956000 |  0.958665 | 0.956438 | 0.955157 | 0.995156 | 0.991795 |            0.991704 |    10.440453 |\n|       LogisticRegression       |          0.956094 | 0.954667 |  0.957273 | 0.956094 | 0.954427 | 0.997726 | 0.994765 |            0.994848 |     5.691309 |\n|   GradientBoostingClassifier   |          0.955871 | 0.953333 |  0.956984 | 0.955871 | 0.953364 | 0.983221 | 0.967145 |            0.971317 |     9.005045 |\n|          XGBClassifier         |          0.952846 | 0.950667 |  0.952745 | 0.952846 | 0.950324 | 0.985892 | 0.969083 |            0.972853 |     4.802282 |\n|        BaggingClassifier       |          0.952712 | 0.950667 |  0.955214 | 0.952712 | 0.950581 | 0.985295 | 0.982312 |            0.971742 |     8.354026 |\n|      
KNeighborsClassifier      |          0.952699 | 0.950667 |  0.951586 | 0.952699 | 0.950683 | 0.990842 | 0.986716 |            0.980262 |     6.960091 |\n|       AdaBoostClassifier       |          0.950432 | 0.946667 |  0.949250 | 0.950432 | 0.947114 | 0.988202 | 0.981889 |            0.977999 |     8.127254 |\n|         LGBMClassifier         |          0.950009 | 0.948000 |  0.950426 | 0.950009 | 0.947522 | 0.991721 | 0.985483 |            0.985704 |     5.063474 |\n|         LabelSpreading         |          0.948757 | 0.945333 |  0.947960 | 0.948757 | 0.946091 | 0.988827 | 0.981177 |            0.981552 |     4.332253 |\n| HistGradientBoostingClassifier |          0.948195 | 0.945333 |  0.949260 | 0.948195 | 0.945352 | 0.988212 | 0.976375 |            0.976866 |     7.706454 |\n|        LabelPropagation        |          0.946091 | 0.944000 |  0.946373 | 0.946091 | 0.944250 | 0.990341 | 0.984098 |            0.984373 |     4.406253 |\n|          MLPClassifier         |          0.944773 | 0.941333 |  0.945336 | 0.944773 | 0.942314 | 0.992075 | 0.985516 |            0.985762 |     7.662322 |\n|     DecisionTreeClassifier     |          0.942681 | 0.941333 |  0.944493 | 0.942681 | 0.940183 | 0.957011 | 0.951111 |            0.908000 |     4.367503 |\n|            LinearSVC           |          0.936713 | 0.936000 |  0.937548 | 0.936713 | 0.933929 | 0.989648 | 0.983251 |            0.983539 |     4.474272 |\n|       ExtraTreeClassifier      |          0.933964 | 0.932000 |  0.934967 | 0.933964 | 0.931137 | 0.950473 | 0.943333 |            0.893289 |     4.336813 |\n|          SGDClassifier         |          0.922581 | 0.918667 |  0.927593 | 0.922581 | 0.919651 | 0.981940 | 0.962839 |            0.963484 |     5.666082 |\n|     CalibratedClassifierCV     |          0.894860 | 0.888000 |  0.896616 | 0.894860 | 0.887397 | 0.972231 | 0.957643 |            0.958332 |     5.699280 |\n|           Perceptron           |          0.873581 | 0.865333 |  0.887799 | 0.873581 
| 0.864172 | 0.976069 | 0.945789 |            0.946695 |     4.482433 |\n|         NearestCentroid        |          0.854566 | 0.854667 |  0.854707 | 0.854566 | 0.849341 | 0.973214 | 0.963677 |            0.964257 |     5.783815 |\n|         RidgeClassifier        |          0.843743 | 0.834667 |  0.848879 | 0.843743 | 0.831310 | 0.945148 | 0.920905 |            0.922219 |     4.415888 |\n|        RidgeClassifierCV       |          0.841049 | 0.832000 |  0.846498 | 0.841049 | 0.828592 | 0.944421 | 0.919460 |            0.920816 |     4.484041 |\n|           BernoulliNB          |          0.757425 | 0.758667 |  0.771867 | 0.757425 | 0.728847 | 0.883542 | 0.839397 |            0.823834 |     4.479535 |\n|         DummyClassifier        |          0.333333 | 0.249333 |  0.083111 | 0.333333 | 0.132452 | 0.500000 | 0.379100 |            0.299444 |     4.396426 |\n\nHere, the \"default\" preprocessing pipeline has been used. 
It consists of SimpleImputer (median strategy) with a StandardScaler for the features and a OneHotEncoder for the categorical features.\n\n### Regressions\n\nFit many regression algorithms:\n\n```python\nfrom sklearn.datasets import make_regression\nfrom vulpes.automl import Regressions\n\nX, y = make_regression(\n          n_samples=100, n_features=4, random_state=42, noise=4.0,\n          bias=100.0)\n\nregressions = Regressions()\ndf_models = regressions.fit(X, y)\ndf_models\n```\n\n### Clustering\n\nFit many clustering algorithms on the iris dataset from scikit-learn:\n\n```python\nimport pandas as pd\nfrom sklearn.datasets import load_iris\nfrom vulpes.automl import Clustering\n\ndataset = load_iris()\nX = pd.DataFrame(dataset[\"data\"], columns=dataset[\"feature_names\"])\n\nclustering = Clustering()\ndf_models = clustering.fit(X)\ndf_models\n```\n\n### Fit a \"best model\"\n\nWe can automatically build a VotingClassifier or a VotingRegressor using the build_best_models method once the models are fitted.\n\n```python\ndf_best = classifiers.build_best_models(X, y, nb_models=3)\ndf_best\n```\n\n|           Model | Balanced Accuracy | Accuracy | Precision |  Recall | F1 Score | Running time |\n|----------------:|------------------:|---------:|----------:|--------:|---------:|-------------:|\n| Voting (3-best) |           0.97508 | 0.974667 |  0.976034 | 0.97508 | 0.974447 |     11.82946 |\n\n### Check missing data\n\n```python\nimport pandas as pd\nimport numpy as np\ndf = pd.DataFrame([[\"a\", \"x\"],\n                   [np.nan, \"y\"],\n                   [\"a\", np.nan],\n                   [\"b\", np.nan]],\n                  dtype=\"category\",\n                  columns=[\"feature1\", \"feature2\"])\nclassifiers.missing_data(df)\n```\n\n|          | Total Missing | Percentage (%) |\n|---------:|--------------:|---------------:|\n| feature2 |             2 |           50.0 |\n| feature1 |             1 |           25.0 |\n\n## Testing\n\nIf you want to 
submit a pull request or test the package locally, you can run the tests with pytest using the following command:\n\n```bash\npytest vulpes/tests/\n```\n\n## Why Vulpes?\n\nVulpes stands for: **V**ector (**U**n)supervised **L**earning **P**rogram **E**stimation **S**ystem.\n\nNah, I'm kidding, I just love foxes, they are cute! The most common and widespread species of fox is the red fox (Vulpes vulpes).\n\n![alt text](https://github.com/AdrienC21/vulpes/blob/main/fox.jpg?raw=true)\n\n## Acknowledgment\n\n- Shankar Rao Pandala (and some contributors). Their package (Lazy Predict) has been an inspiration.\n\n## License\n\n[MIT](https://choosealicense.com/licenses/mit/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrienc21%2Fvulpes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadrienc21%2Fvulpes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrienc21%2Fvulpes/lists"}