{"id":28471609,"url":"https://github.com/openml/sklearn-bot","last_synced_at":"2025-08-04T06:33:36.210Z","repository":{"id":66771329,"uuid":"142508461","full_name":"openml/sklearn-bot","owner":"openml","description":"Random bot running sklearn classifiers on OpenML","archived":false,"fork":false,"pushed_at":"2023-03-13T19:52:05.000Z","size":20725,"stargazers_count":1,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-01T22:36:25.615Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":"openml","open_collective":"openml"}},"created_at":"2018-07-27T00:33:29.000Z","updated_at":"2023-03-13T18:52:37.000Z","dependencies_parsed_at":"2023-09-25T07:00:29.901Z","dependency_job_id":null,"html_url":"https://github.com/openml/sklearn-bot","commit_stats":{"total_commits":93,"total_committers":1,"mean_commits":93.0,"dds":0.0,"last_synced_commit":"3fdb449f18ca1aa78c6c0f4843cc376abef34170"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/openml/sklearn-bot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fsklearn-bot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fsklearn-bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fsklearn-bot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fsklearn-bot/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openml","download_url":"https://codeload.github.com/openml/sklearn-bot/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fsklearn-bot/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268657995,"owners_count":24285611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-07T11:08:04.224Z","updated_at":"2025-08-04T06:33:36.190Z","avatar_url":"https://github.com/openml.png","language":"Python","funding_links":["https://github.com/sponsors/openml","https://opencollective.com/openml"],"categories":[],"sub_categories":[],"readme":"# sklearn-bot\n[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\nsklearn-bot that can be used to automatically run scikit-learn classifiers\non OpenML tasks. This is an improved version of the sklearn-bot that was created\nfor the [Parameter IMPortance](https://github.com/janvanrijn/openml-pimp)\nproject. \n\n## Usage\nCurrently, two use cases are supported. Furthermore, we describe how to obtain\nthe results back from OpenML, once uploaded. 
\n\n### Single Task\nTo run the sklearn-bot on a single task from OpenML, please use the following\ncommand:\n```\npython examples/run_on_task.py --task_id 5 --openml_server https://www.openml.org/api/v1/ --openml_apikey abcdef --upload_result\n```\n\nThe following command line options are accepted: \n* `n_executions`: By default, the sklearn-bot will execute 1000 runs and \nterminate after these. This option overrides that default with any\nother number of runs. \n* `task_id`: the OpenML task id to run on. \n* `openml_server`: due to the beta-state of the sklearn-bot, the default \nbehavior is to upload results to the test server. By using this option, this \nbehavior can be overridden. \n* `openml_apikey`: API key to authenticate yourself with. Can be found on your\nOpenML profile. \n* `classifier_name`: the classifier to run. The default behavior is that a \nrandom classifier will be selected. Select one from the list returned by\n`sklearnbot.config_spaces.get_available_config_spaces(False)`\n* `output_dir`: local directory where the results can be stored before\nuploading. \n* `upload_result`: the default behavior of the sklearn-bot is to store the runs\nlocally, before uploading them to the server. By specifying this flag, the runs\nwill be uploaded and the local files will be deleted.\n\nAdditionally, the sklearn-bot can be run on an OpenML benchmark suite, for\nexample the [OpenML100](https://arxiv.org/abs/1708.03731). The sklearn-bot will\nexecute a set of `n_executions`, each time selecting a task at random from the\nfull set of tasks. \n\n### Benchmark suite\nTo run the sklearn-bot on a benchmark suite from OpenML, please use the\nfollowing command:\n\n```\npython examples/run_on_study.py --study_id OpenML100 --openml_server https://www.openml.org/api/v1/ --openml_apikey abcdef --upload_result\n```\n\nThis script has the same command line options as `run_on_task`, except for the\noption `task_id`. 
Additionally,\n* `study_id` (int or string) refers to the study ID on which the sklearn-bot\nwill be run. \n\n### Obtaining results\nUsually, running the sklearn-bot is done so that the results can be reused\nin one way or another. Once the results have been stored on OpenML, it is \nimportant to be able to retrieve them. Although it is slightly out of the\nscope of the sklearn-bot, the following command obtains all results that have \nbeen created using the given search space. Note that this can also include\nresults from other people who ran the sklearn-bot, or happened to run a\nscikit-learn classifier with hyperparameter settings that also fell within the\nsearch range. \n\nTo obtain results from the sklearn-bot that were uploaded to OpenML (using the\n`--upload_result` flag), please use the following command:\n\n```\npython examples/obtain_results.py --study_id OpenML100 --openml_server https://www.openml.org/api/v1/ --scoring predictive_accuracy --classifier_name decision_tree\n```\n\nThe following command line options are accepted: \n* `output_directory`: This is where the results will be placed as an ARFF file. \nCache files that allow for fast regeneration of the datasets are also\nstored here.\n* `num_runs`: The number of runs per task that will be obtained. Setting this to\na number lower than the actual available runs will allow for efficient caching.\n* `study_id`: Refers to the benchmark suite (which tasks will be included)\n* `scoring`: The performance measure to download. Defaults to \n`predictive_accuracy`, but for example `area_under_roc_curve`, `f_measure` and \n`precision` are also sensible options. \n* `normalize` (flag): If set, all performance results will be normalized to the\ninterval [0, 1] per task. \n* `openml_server`: The server from which the results should be downloaded. Make\nsure this is the same as the server to which the results were uploaded.\n* `openml_apikey`: API key to authenticate yourself with. 
Can be found on your\nOpenML profile. (Although most operations involve obtaining results, we need to\ndo a single post-request to the server as well.)\n* `classifier_name`: The classifier from which the results should be downloaded.\nMake sure that this is the same as the classifier with which the bot was run. \nNote that the current functionality does not yet support obtaining the\nresults from all classifiers at once. \n\n## Feature Requests\n\nThe following features will gradually be added to the sklearn-bot (contributors\nare welcome):\n* Non-static pipelines. Although the current definition of a pipeline is fixed, \nwe aim to add the notion of non-static pipelines. This should be incorporated\nin a modular way. \n* More sampling methods. Currently, the sklearn-bot samples uniformly from a set\nof tasks; however, it would be great if it were able to sample according to the\nnumber of runs per task present on OpenML. \n\n\n## Dependencies\n* [OpenML-Python](https://pypi.org/project/openml/) - Base functionality for\nconnecting with OpenML. \n* [OpenML-Python-Contrib](https://github.com/openml/openml-python-contrib/) - \nNot on PyPI yet. Used for convenience functions to obtain the results from \nOpenML. The bot itself does not rely on this.\n* [Scikit-learn](https://pypi.org/project/scikit-learn/) - Version 0.20.0 and\nup.\n* [ConfigSpace](https://pypi.org/project/ConfigSpace/) - For defining search \nspaces. \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenml%2Fsklearn-bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenml%2Fsklearn-bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenml%2Fsklearn-bot/lists"}