{"id":18520152,"url":"https://github.com/mlr-org/mlr3oml","last_synced_at":"2025-08-09T12:33:00.336Z","repository":{"id":37857197,"uuid":"214813032","full_name":"mlr-org/mlr3oml","owner":"mlr-org","description":"Connect mlr3 with OpenML","archived":false,"fork":false,"pushed_at":"2024-08-12T17:32:39.000Z","size":7597,"stargazers_count":7,"open_issues_count":13,"forks_count":5,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-24T04:08:47.793Z","etag":null,"topics":["data","data-science","datasets","machine-learning","mlr3","openml","r","r-package"],"latest_commit_sha":null,"homepage":"https://mlr3oml.mlr-org.com","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlr-org.png","metadata":{"funding":{"github":"mlr-org"},"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-13T12:05:23.000Z","updated_at":"2025-03-07T21:06:42.000Z","dependencies_parsed_at":"2024-01-09T10:59:58.341Z","dependency_job_id":"65392417-9e1a-43ad-9a15-f08c8c74cbce","html_url":"https://github.com/mlr-org/mlr3oml","commit_stats":{"total_commits":219,"total_committers":6,"mean_commits":36.5,"dds":0.4429223744292238,"last_synced_commit":"ce5012972b7707e9087728841280137ee37692b5"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmlr3oml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmlr3oml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmlr3oml
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlr-org%2Fmlr3oml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlr-org","download_url":"https://codeload.github.com/mlr-org/mlr3oml/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248012599,"owners_count":21033226,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-science","datasets","machine-learning","mlr3","openml","r","r-package"],"created_at":"2024-11-06T17:18:49.210Z","updated_at":"2025-04-09T09:32:40.729Z","avatar_url":"https://github.com/mlr-org.png","language":"R","funding_links":["https://github.com/sponsors/mlr-org"],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r, include = FALSE}\nlibrary(\"mlr3\")\nlibrary(\"mlr3oml\")\nlgr::get_logger(\"mlr3\")$set_threshold(\"warn\")\nlgr::get_logger(\"mlr3oml\")$set_threshold(\"warn\")\nset.seed(1)\noptions(datatable.print.class = FALSE, datatable.print.keys = FALSE, mlr3oml.verbose = FALSE)\n```\n\n# mlr3oml\n\nPackage website: [release](https://mlr3oml.mlr-org.com/) | [dev](https://mlr3oml.mlr-org.com/dev/)\n\nOpenML integration to the [mlr3 ecosystem](https://mlr-org.com/).\n\n[![r-cmd-check](https://github.com/mlr-org/mlr3oml/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3oml/actions/workflows/r-cmd-check.yml)\n[![CRAN Status 
Badge](https://www.r-pkg.org/badges/version-ago/mlr3oml)](https://cran.r-project.org/package=mlr3oml)\n[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)\n[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)\n\n\n## What is `mlr3oml`?\n\n[OpenML](https://www.openml.org) is an open-source platform that facilitates the sharing and dissemination of machine learning research data.\nAll entities on the platform have unique identifiers and standardized (meta)data that can be accessed via an open-access REST API or the web interface.\n`mlr3oml` lets you work with the REST API from R and integrates [OpenML](https://www.openml.org) with the `mlr3` ecosystem.\nNote that some upload options are currently not supported; use the [OpenML package](https://cran.r-project.org/package=OpenML) for these.\n\nAs a brief demo, we show how to access an OpenML task, convert it to an `mlr3::Task` and its associated `mlr3::Resampling`, and conduct a simple resample experiment.\n\n```{r}\nlibrary(mlr3oml)\nlibrary(mlr3)\n\n# Download and print the OpenML task with ID 145953\noml_task = otsk(145953)\noml_task\n\n# Access the OpenML data object on which the task is built\noml_task$data\n\n# Convert the OpenML task to an mlr3 task and resampling\ntask = as_task(oml_task)\nresampling = as_resampling(oml_task)\n\n# Conduct a simple resample experiment\nrr = resample(task, lrn(\"classif.rpart\"), resampling)\nrr$aggregate()\n```\n\nBesides working with objects with known IDs, data of interest can also be queried using listing functions. 
Below, we search for datasets with 10 to 20 features, 100 to 10000 observations, and 2 classes.\n\n```{r}\nodatasets = list_oml_data(\n  number_features = c(10, 20),\n  number_instances = c(100, 10000),\n  number_classes = 2\n)\n\nhead(odatasets[, c(\"data_id\", \"name\")])\n```\n\nTo retrieve an individual dataset, you can use `odt()` and then either construct a new `Task` object with `as_task()` or access the data in `data.table` format.\n\n```{r}\nodataset = odt(29)\n\n# Dataset as data.table\nstr(odataset$data)\n\n# Creating a new task\notask = as_task(odataset)\notask\n```\n\n## Feature Overview\n\n* Datasets, tasks, flows, runs, and collections can be downloaded from [OpenML](https://www.openml.org) and are represented as `R6` classes.\n* OpenML objects can be easily converted to their `mlr3` counterparts.\n* Filtering of OpenML objects can be achieved using listing functions.\n* Downloaded objects can be cached by setting the `mlr3oml.cache` option.\n* Both the `arff` and `parquet` file types are supported for datasets.\n* You can upload datasets, tasks, and collections to OpenML.\n\n## Documentation\n\n* Start by reading the [Large-Scale Benchmarking chapter](https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html) from the `mlr3` book.\n* The [package website](https://mlr3oml.mlr-org.com/dev/) contains a getting started guide.\n* The OpenML [API documentation](https://www.openml.org/apis) is also a good resource.\n\n\n## Bugs, Questions, Feedback\n\n*mlr3oml* is a free and open source software project that\nencourages participation and feedback. 
If you have any issues,\nquestions, suggestions or feedback, please do not hesitate to open an\n“issue” about it on the GitHub page!\n\nIn case of problems / bugs, it is often helpful if you provide a\n“minimum working example” that showcases the behaviour (but don’t\nworry about this if the bug is obvious).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlr-org%2Fmlr3oml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlr-org%2Fmlr3oml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlr-org%2Fmlr3oml/lists"}