{"id":21492563,"url":"https://github.com/societe-generale/aikit","last_synced_at":"2025-07-15T18:30:42.886Z","repository":{"id":44677033,"uuid":"148752098","full_name":"societe-generale/aikit","owner":"societe-generale","description":"Automated machine learning package","archived":false,"fork":false,"pushed_at":"2023-03-08T09:04:47.000Z","size":1569,"stargazers_count":27,"open_issues_count":31,"forks_count":11,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-11-20T09:22:31.401Z","etag":null,"topics":["automl","data-science","machine-learning","python"],"latest_commit_sha":null,"homepage":"http://aikit.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/societe-generale.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-09-14T07:28:28.000Z","updated_at":"2023-07-26T18:35:42.000Z","dependencies_parsed_at":"2024-01-15T03:41:44.398Z","dependency_job_id":null,"html_url":"https://github.com/societe-generale/aikit","commit_stats":{"total_commits":103,"total_committers":6,"mean_commits":"17.166666666666668","dds":0.6116504854368932,"last_synced_commit":"395800ef57cd4533dd9e4b6c1373f8282b35b7fa"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/societe-generale%2Faikit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/societe-generale%2Faikit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/societe-generale%2Faikit/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/societe-generale%2Faikit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/societe-generale","download_url":"https://codeload.github.com/societe-generale/aikit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226061657,"owners_count":17567707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","data-science","machine-learning","python"],"created_at":"2024-11-23T15:30:00.862Z","updated_at":"2024-11-23T15:30:02.465Z","avatar_url":"https://github.com/societe-generale.png","language":"Python","funding_links":[],"categories":["Libraries"],"sub_categories":[],"readme":"![Build Status](https://travis-ci.org/societe-generale/aikit.svg?branch=master)\n[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://github.com/societe-generale/aikit)\n[![PyPI version](https://badge.fury.io/py/aikit.svg)](https://badge.fury.io/py/aikit)\n[![Documentation Status](https://readthedocs.org/projects/aikit/badge/?version=latest)](https://aikit.readthedocs.io/en/latest/?badge=latest)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/societe-generale/aikit/master?filepath=/notebooks)\n\n# aikit\nAutomatic Tool Kit for Machine Learning and Data Science.\n\nThe objective is to provide tools that ease the repetitive parts of a data scientist's job, so that they can focus on modeling. 
This package is still in alpha, and more features will be added.\nIts main features are:\n * new and improved \"scikit-learn-like\" transformers;\n * GraphPipeline: an extension of the sklearn Pipeline that handles more general chains of transformations;\n * an AutoML tool that automatically searches through several transformers and models.\n\nFull documentation is available here: https://aikit.readthedocs.io/en/latest/\n\nYou can run examples [here](https://mybinder.org/v2/gh/societe-generale/aikit/master?filepath=/notebooks), thanks to [Binder](https://mybinder.org).\n\n\n### GraphPipeline\n\nThe GraphPipeline object extends `sklearn.pipeline.Pipeline` so that transformers and models can be chained along any directed graph, not just a linear sequence.\n\nThe object takes two arguments as input:\n * models: a dictionary of models (each key is the name of a node, and the corresponding value is the transformer or model for that node);\n * edges: a list of tuples linking the nodes to each other.\n\nExample:\n```python\nfrom aikit.pipeline import GraphPipeline\nfrom aikit.transformers import CountVectorizerWrapper, NumericalEncoder\nfrom sklearn.ensemble import RandomForestClassifier\n\n# Two preprocessing nodes (\"vect\" and \"cat\") both feed the final \"rf\" model\ngpipeline = GraphPipeline(\n    models={\n        \"vect\": CountVectorizerWrapper(analyzer=\"char\",\n                                       ngram_range=(1, 4),\n                                       columns_to_use=[\"text1\", \"text2\"]),\n        \"cat\": NumericalEncoder(columns_to_use=[\"cat1\", \"cat2\"]),\n        \"rf\": RandomForestClassifier(n_estimators=100)\n    },\n    edges=[(\"vect\", \"rf\"), (\"cat\", \"rf\")]\n)\n```\n\n![GraphPipeline merging the vect and cat nodes into rf](docs/img/graphpipeline_mergingpipe.png?raw=true \"GraphPipeline\")\n\n### AutoML\n\nAikit also includes an AutoML component that tests several models and transformers on a given dataset.\n\nFor example, you can create the following Python script `run_automl_titanic.py`:\n```python\nfrom aikit.datasets import load_dataset, DatasetEnum\nfrom aikit.ml_machine import MlMachineLauncher\n\ndef loader():\n    dfX, y, *_ = load_dataset(DatasetEnum.titanic)\n    return dfX, y\n\ndef set_configs(launcher):\n    \"\"\" Modify this function to 
change the launcher configuration. \"\"\"\n    launcher.job_config.score_base_line = 0.75\n    launcher.job_config.allow_approx_cv = True\n    return launcher\n\nif __name__ == \"__main__\":\n    launcher = MlMachineLauncher(base_folder=\"~/automl/titanic\",\n                                 name=\"titanic\",\n                                 loader=loader,\n                                 set_configs=set_configs)\n    launcher.execute_processed_command_argument()\n```\n\nThen run the following command:\n```\npython run_automl_titanic.py run -n 4\n```\n\nThis runs the AutoML with 4 workers; the results are stored in the folder given by `base_folder`.\nYou can then aggregate the results with:\n```\npython run_automl_titanic.py result\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsociete-generale%2Faikit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsociete-generale%2Faikit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsociete-generale%2Faikit/lists"}