{"id":15288134,"url":"https://github.com/lai-bluejay/diego","last_synced_at":"2025-04-13T07:33:28.414Z","repository":{"id":57418153,"uuid":"173958665","full_name":"lai-bluejay/diego","owner":"lai-bluejay","description":"Diego: Data in, IntElliGence Out. A fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (Study) and generate correlated trials (Trial). Then run the code and get a machine learning model. Implemented using Scikit-learn API glossary, using Bayesian optimization and genetic algorithms for automated machine learning. Inspired by [Fast.ai](https://github.com/fastai/fastai).","archived":false,"fork":false,"pushed_at":"2021-02-03T07:35:23.000Z","size":181,"stargazers_count":8,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-16T13:05:00.755Z","etag":null,"topics":["automl","autosklearn","bayesian-optimization","generation-algorithms","hyperparameter-optimization","machine-learning","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lai-bluejay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-05T14:10:06.000Z","updated_at":"2022-11-19T23:33:31.000Z","dependencies_parsed_at":"2022-09-03T08:52:09.563Z","dependency_job_id":null,"html_url":"https://github.com/lai-bluejay/diego","commit_stats":null,"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lai-bluejay%2Fdiego","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/lai-bluejay%2Fdiego/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lai-bluejay%2Fdiego/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lai-bluejay%2Fdiego/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lai-bluejay","download_url":"https://codeload.github.com/lai-bluejay/diego/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240045016,"owners_count":19739186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","autosklearn","bayesian-optimization","generation-algorithms","hyperparameter-optimization","machine-learning","scikit-learn"],"created_at":"2024-09-30T15:44:17.674Z","updated_at":"2025-02-23T02:30:40.908Z","avatar_url":"https://github.com/lai-bluejay.png","language":"Python","funding_links":[],"categories":["Libraries"],"sub_categories":[],"readme":"\n# Diego\n\nDiego: Data in,  IntElliGence Out.\n\n[简体中文](README_zh_CN.md)\n\nA fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (`Study`) and generate correlated trials (`Trial`). Then run the code and get a machine learning model. 
It is implemented following the scikit-learn API [glossary](https://scikit-learn.org/stable/glossary.html) and uses Bayesian optimization and genetic algorithms for automated machine learning.\n\nInspired by [Fast.ai](https://github.com/fastai/fastai) and [Microsoft NNI](https://github.com/Microsoft/nni).\n\n[![Build Status](https://travis-ci.org/lai-bluejay/diego.svg?branch=master)](https://travis-ci.org/lai-bluejay/diego)\n![PyPI](https://img.shields.io/pypi/v/diego.svg?style=flat)\n![GitHub](https://img.shields.io/github/license/lai-bluejay/diego.svg)\n![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/lai-bluejay/diego.svg)\n\n- [x] Classifiers trained by a `Study`.\n- [x] AutoML classifier compatible with the scikit-learn API. Models can be exported and used directly.\n- [x] Hyperparameter optimization using Bayesian optimization and genetic algorithms.\n- [x] Bucketing/binning algorithm and LUS sampling method for preprocessing.\n- [ ] Custom classifiers that follow the scikit-learn API, for parameter search and hyperparameter optimization.\n\n\n## Installation\n\nYou need to install swig first, since some dependencies rely on compiling C/C++ interfaces. 
Installing with conda is recommended:\n\n```shell\nconda install --yes pip gcc swig libgcc=5.2.0\npip install diego\n```\n\nAfter installation, six lines of code are enough to solve a machine learning classification problem.\n\n## Usage\n\nEach task is a `Study`, and each Study consists of multiple `Trial`s.\nIt is recommended to create a Study first and then generate Trials from it:\n\n```python\nfrom diego.study import create_study\nimport sklearn.datasets\nimport sklearn.model_selection\n\n# load the digits dataset and hold out 25% of it for evaluation\ndigits = sklearn.datasets.load_digits()\nX_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(\n    digits.data, digits.target, train_size=0.75, test_size=0.25\n)\n\ns = create_study(X_train, y_train)\n# can use default trials in Study\n\n# or generate one\n# s.generate_trials(mode='fast')\ns.optimize(X_test, y_test)\n# all_trials = s.get_all_trials()\n# for t in all_trials:\n#     print(t.__dict__)\n#     print(t.clf.score(X_test, y_test))\n\n```\n\n## RoadMap\nIdeas for future releases:\n- [ ] Regression.\n- [ ] Add documentation.\n- [ ] Different types of Trial: TPE, BayesOpt, RandomSearch.\n- [ ] Custom Trials: Trials by custom Classifier (like sklearn, xgboost).\n- [ ] Model persistence.\n- [ ] Model output.\n- [ ] Basic Classifier.\n- [ ] Fix macOS hang in the optimize pipeline.\n- [ ] Add preprocessor.\n- [ ] Add FeatureTools for automated feature engineering.\n\n\n## Project Structure\n\n### study, trials\nStudy: \n\nTrial:\n\n### If multiprocessing hangs, crashes, or freezes on OS X or Linux\n\nRunning with `n_jobs \u003e 1` may hang during parallelization. A similar problem is documented for [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux).\n\nIn Python 3.4+, one solution is to configure `multiprocessing` to start the process pool with `forkserver` or `spawn` (instead of the default `fork`). 
For example, `forkserver` mode can be enabled globally in the code:\n\n```python\nimport multiprocessing\n# other imports, custom code, load data, define model...\nif __name__ == '__main__':\n    multiprocessing.set_start_method('forkserver')\n\n    # call scikit-learn utils with n_jobs \u003e 1 here\n```\n\nMore info: [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)\n\n### core\n\n#### storage\n\nFor each study, the data, the parameters, and the model are stored in a separate `Storage` object. This ensures that a Study only controls its trials; each Trial writes its results to the storage when it finishes and updates the best result.\n\n#### update result\n\nWhen creating a `Study`, you need to specify the direction of optimization, `maximize` or `minimize`. You also specify the metric to optimize when creating `Trials`. The default is `maximize accuracy`.\n\n## AutoML completion plan\n\n[overview](https://hackernoon.com/a-brief-overview-of-automatic-machine-learning-solutions-automl-2826c7807a2a)\n\n### bayes opt\n\n1. [fmfn/bayes](https://github.com/fmfn/BayesianOptimization)\n2. [auto-sklearn](https://github.com/automl/auto-sklearn)\n\n### grid search\n\n1. H2O.ai\n\n### tree parzen\n\n1. hyperopt\n2. mlbox\n\n### metaheuristics grid search\n\n1. pybrain\n\n### generation\n\n1. tpot\n\n### dl\n\n1. ms nni\n\n## issues\n\n## updates\n\n### TODO: update the documentation.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flai-bluejay%2Fdiego","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flai-bluejay%2Fdiego","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flai-bluejay%2Fdiego/lists"}