{"id":15063966,"url":"https://github.com/g0bel1n/tinyautoml","last_synced_at":"2025-07-21T04:06:20.620Z","repository":{"id":40304111,"uuid":"458900735","full_name":"g0bel1n/TinyAutoML","owner":"g0bel1n","description":"TinyAutoML is a comprehensive Pipeline Classifier Project thought as a Scikit-learn plugin","archived":false,"fork":false,"pushed_at":"2023-05-19T18:40:03.000Z","size":2863,"stargazers_count":4,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-01T11:51:46.529Z","etag":null,"topics":["automl-pipeline","machine-learning","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/g0bel1n.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-13T18:50:07.000Z","updated_at":"2023-11-11T13:14:30.000Z","dependencies_parsed_at":"2024-10-13T00:00:56.645Z","dependency_job_id":"34fbf81f-f5fd-4c3b-a801-5e0f1740779a","html_url":"https://github.com/g0bel1n/TinyAutoML","commit_stats":{"total_commits":175,"total_committers":5,"mean_commits":35.0,"dds":"0.20571428571428574","last_synced_commit":"dd4fc4fdb6f5e010bf428d6ef84f2fe395eb219b"},"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/g0bel1n/TinyAutoML","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0bel1n%2FTinyAutoML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0bel1n%2FTinyAutoML/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/g0bel1n%2FTinyAutoML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0bel1n%2FTinyAutoML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/g0bel1n","download_url":"https://codeload.github.com/g0bel1n/TinyAutoML/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/g0bel1n%2FTinyAutoML/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266236765,"owners_count":23897247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl-pipeline","machine-learning","scikit-learn"],"created_at":"2024-09-25T00:09:27.653Z","updated_at":"2025-07-21T04:06:20.517Z","avatar_url":"https://github.com/g0bel1n.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  \u003cimg alt=\"TinyAutoML Logo\" src=\"https://user-images.githubusercontent.com/73651505/166115086-2cd01294-75ed-4e36-a65f-419c530a0dbe.png\" width=\"448px\"/\u003e\u003cbr/\u003e\n\u003c/h1\u003e\n\n\n\u003cp align=\"center\"\u003eTinyAutoML is a Machine Learning Python3.9 library thought as an extension of Scikit-Learn.\u003cbr/\u003e It builds an \u003cb\u003eadaptable\u003c/b\u003e and \u003cb\u003eauto-tuned\u003c/b\u003e pipeline to handle binary classification tasks.\u003cbr/\u003e \u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/g0bel1n/TinyAutoML/actions/workflows/python-app.yml\" \ntarget=\"_blank\"\u003e\u003cimg 
src=\"https://github.com/g0bel1n/TinyAutoML/actions/workflows/python-app.yml/badge.svg?branch=master\" alt=\"Tests\" /\u003e\u003c/a\u003e\n\u003cimg src=\"https://img.shields.io/github/license/g0bel1n/TinyAutoML?style=flat-square\" alt=\"Licence MIT\" /\u003e\n\u003cimg src=\"https://img.shields.io/pypi/v/TinyAutoML?style=flat-square\" alt=\"Pypi\" /\u003e\n\u003cimg src=\"https://img.shields.io/github/repo-size/g0bel1N/TinyAutoML?style=flat-square\" alt=\"Size\" /\u003e\n\u003cimg src=\"https://img.shields.io/github/commit-activity/m/g0bel1n/TinyAutoML?style=flat-square\" alt=\"Commits\" /\u003e\n\u003ca href=\"https://www.python.org/downloads/release/python-390/\" \ntarget=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.9-blue.svg\" alt=\"Python Version\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n\u003cp align=\"center\"\u003e\nIn short, your data goes through two main preprocessing steps. \u003cbr/\u003e\nThe first is scaling and non-stationarity correction, which is followed by Lasso feature selection.\u003cbr/\u003e\nFinally, one of the three \u003cb\u003eMetaModels\u003c/b\u003e is fitted on the transformed data.\n\u003c/p\u003e\n\n\n---\n\n### Latest News ! 
:\n\n* Logging format changed from default to [TinyAutoML]\n* Added a GitHub Actions workflow for CI and for updating the README.md\n* Added parallel computation of `LassoFeatureSelector` -\u003e [LassoFeatureSelectionParallel](https://github.com/g0bel1n/TinyAutoML/blob/master/TinyAutoML/Preprocessing/LassoFeatureSelectionParallel.py)\n* New [example notebook](https://github.com/g0bel1n/TinyAutoML/blob/master/notebooks/vix_example.ipynb) based on VIX index directional forecasting\n\n\n## ⚡️ Quick start \n\nFirst, let's install and import the library!\n\n- Install the latest release using pip\n\n```python\n%pip install TinyAutoML\n```\n\n\n```python\nimport os\nos.chdir('..')  # Only needed for GitHub CI; you don't have to run this\n```\n\n\n```python\nfrom TinyAutoML.Models import *\nfrom TinyAutoML import MetaPipeline\n```\n\n## `MetaModels`\n\n`MetaModels` inherit from the `MetaModel` abstract class. They all implement ensemble methods and are therefore based on `EstimatorPools`.\n\nWhen training `EstimatorPools`, you face a choice: run `parameterTuning` on entire pipelines with the estimators on top, or train the estimators using the same pipeline and train only the top. 
The first case is what we call `comprehensiveSearch`.\n\nMoreover, as we will see in detail later, those `EstimatorPools` can be shared across `MetaModels`.\n\nThey are all initialised with at least these arguments:\n\n```python\nMetaModel(comprehensiveSearch: bool = True, parameterTuning: bool = True, metrics: str = 'accuracy', nSplits: int=10)\n```\n- `nSplits` is the number of cross-validation splits\n- The other parameters are self-explanatory\n\n\n**They need to be put in the `MetaPipeline` wrapper to work**\n\n**There are 3 `MetaModels`**\n\n1- `BestModel`: selects the best-performing model of the pool\n\n\n```python\nbest_model = MetaPipeline(BestModel(comprehensiveSearch = False, parameterTuning = False))\n```\n\n2- `OneRulerForAll`: implements stacking, using a `RandomForestClassifier` by default. The user is free to use another classifier using the ruler argument\n\n\n```python\norfa_model = MetaPipeline(OneRulerForAll(comprehensiveSearch=False, parameterTuning=False))\n```\n\n3- `DemocraticModel`: implements soft and hard voting models through the `voting` argument\n\n\n```python\ndemocratic_model = MetaPipeline(DemocraticModel(comprehensiveSearch=False, parameterTuning=False, voting='soft'))\n```\n\nAs of release v0.2.3.2 (13/04/2022), the `MetaModels` rely on 5 models in the `EstimatorPool`:\n- Random Forest Classifier\n- Logistic Regression\n- Gaussian Naive Bayes\n- Linear Discriminant Analysis\n- XGBoost\n\n\n***\n\n\nWe'll use the breast_cancer dataset from `sklearn` as an example:\n\n\n```python\nimport pandas as pd\nfrom sklearn.datasets import load_breast_cancer\n\ncancer = load_breast_cancer()\n \nX = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)\ny = cancer.target\n\ncut = int(len(y) * 0.8)\n\nX_train, X_test = X[:cut], X[cut:]\ny_train, y_test = y[:cut], y[cut:]\n```\n\nLet's train a `BestModel` first and reuse its Pool for the other 
`MetaModels`\n\n\n```python\nbest_model.fit(X_train,y_train)\n```\n\n    [TinyAutoML] Training models...\n    [TinyAutoML] The best estimator is random forest classifier with a cross-validation accuracy (in Sample) of 1.0\n\n\n\n\n\n    MetaPipeline(model=BestModel(comprehensiveSearch=False, parameterTuning=False))\n\n\n\nWe can now extract the pool\n\n\n```python\npool = best_model.get_pool()\n```\n\nAnd use it when fitting the other `MetaModels` to skip the fitting of the underlying models:\n\n\n```python\norfa_model.fit(X_train,y_train,pool=pool)\ndemocratic_model.fit(X_train,y_train,pool=pool)\n```\n\n    [TinyAutoML] Training models...\n    [TinyAutoML] Training models...\n\n\n\n\n\n    MetaPipeline(('model', Democratic Model))\n\n\n\nGreat! Let's look at the results with the scikit-learn `classification_report`:\n\n\n```python\norfa_model.classification_report(X_test,y_test)\n```\n\n                  precision    recall  f1-score   support\n    \n               0       0.89      0.92      0.91        26\n               1       0.98      0.97      0.97        88\n    \n        accuracy                           0.96       114\n       macro avg       0.93      0.94      0.94       114\n    weighted avg       0.96      0.96      0.96       114\n    \n\n\nLooking good! What about the `roc_curve`?\n\n\n```python\ndemocratic_model.roc_curve(X_test,y_test)\n```\n\n\n    \n![png](README_files/README_24_0.png)\n    \n\n\nLet's see how the estimators of the pool are doing individually:\n\n\n```python\nbest_model.get_scores(X_test,y_test)\n```\n\n\n\n\n    [('random forest classifier', 1.0),\n     ('Logistic Regression', 0.9473684210526315),\n     ('Gaussian Naive Bayes', 0.956140350877193),\n     ('LDA', 0.9473684210526315),\n     ('xgb', 0.956140350877193)]\n\n\n\n## What's next? \n\nYou can repeat the same steps with `comprehensiveSearch` set to `True` if you have the time and want to improve your results. 
You can also try new rulers and so on.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fg0bel1n%2Ftinyautoml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fg0bel1n%2Ftinyautoml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fg0bel1n%2Ftinyautoml/lists"}