https://github.com/mlr-org/mlr3oml
Connect mlr3 with OpenML
- Host: GitHub
- URL: https://github.com/mlr-org/mlr3oml
- Owner: mlr-org
- License: lgpl-3.0
- Created: 2019-10-13T12:05:23.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2024-08-12T17:32:39.000Z (4 months ago)
- Last Synced: 2024-10-12T23:42:43.808Z (2 months ago)
- Topics: data, data-science, datasets, machine-learning, mlr3, openml, r, r-package
- Language: R
- Homepage: https://mlr3oml.mlr-org.com
- Size: 7.25 MB
- Stars: 6
- Watchers: 8
- Forks: 5
- Open Issues: 13
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
---
output: github_document
---

```{r, include = FALSE}
library("mlr3")
library("mlr3oml")
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("mlr3oml")$set_threshold("warn")
set.seed(1)
options(datatable.print.class = FALSE, datatable.print.keys = FALSE, mlr3oml.verbose = FALSE)
```

# mlr3oml

Package website: [release](https://mlr3oml.mlr-org.com/) | [dev](https://mlr3oml.mlr-org.com/dev/)
OpenML integration to the [mlr3 ecosystem](https://mlr-org.com/).
[![r-cmd-check](https://github.com/mlr-org/mlr3oml/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3oml/actions/workflows/r-cmd-check.yml)
[![CRAN Status Badge](https://www.r-pkg.org/badges/version-ago/mlr3oml)](https://cran.r-project.org/package=mlr3oml)
[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)

## What is `mlr3oml`?

[OpenML](https://www.openml.org) is an open-source platform that facilitates the sharing and dissemination of machine learning research data.
All entities on the platform have unique identifiers and standardized (meta)data that can be accessed via an open-access REST API or the web interface.
`mlr3oml` lets you work with the REST API from R and integrates [OpenML](https://www.openml.org) with the `mlr3` ecosystem.
Note that some upload options are currently not supported; use the [OpenML package](https://cran.r-project.org/package=OpenML) for these.

As a brief demo, we show how to access an OpenML task, convert it to an `mlr3::Task` and an associated `mlr3::Resampling`, and conduct a simple resample experiment.

```{r}
library(mlr3oml)
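# Optional and illustrative (not part of the original demo): keep downloaded
# OpenML objects on disk so that repeated runs can skip the download.
# This assumes the `mlr3oml.cache` option accepts TRUE (default cache
# directory) or a path to a directory on the local file system.
options(mlr3oml.cache = TRUE)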
library(mlr3)

# Download and print the OpenML task with ID 145953
oml_task = otsk(145953)
oml_task

# Access the OpenML data object on which the task is built
oml_task$data

# Convert the OpenML task to an mlr3 task and resampling
task = as_task(oml_task)
resampling = as_resampling(oml_task)

# Conduct a simple resample experiment
rr = resample(task, lrn("classif.rpart"), resampling)
rr$aggregate()
```

Besides working with objects with known IDs, you can also query data of interest using listing functions. Below, we search for datasets with 10 to 20 features, 100 to 10,000 observations, and 2 classes.

```{r}
odatasets = list_oml_data(
number_features = c(10, 20),
number_instances = c(100, 10000),
number_classes = 2
)

head(odatasets[, c("data_id", "name")])
```

To retrieve an individual dataset, use `odt()`. You can then either construct a new `Task` object from it with `as_task()` or work with it in `data.table` format.

```{r}
odataset = odt(29)

# Dataset as data.table
str(odataset$data)

# Creating a new task
otask = as_task(odataset)
otask
```

## Feature Overview

* Datasets, tasks, flows, runs, and collections can be downloaded from [OpenML](https://www.openml.org) and are represented as `R6` classes.
* OpenML objects can be easily converted to the corresponding `mlr3` counterpart.
* Filtering of OpenML objects can be achieved using listing functions.
* Downloaded objects can be cached by setting the `mlr3oml.cache` option.
* Both the `arff` and `parquet` file types are supported for datasets.
* You can upload datasets, tasks, and collections to OpenML.

## Documentation

* Start by reading the [Large-Scale Benchmarking chapter](https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html) from the `mlr3` book.
* The [package website](https://mlr3oml.mlr-org.com/dev/) contains a getting started guide.
* The OpenML [API documentation](https://www.openml.org/apis) is also a good resource.

## Bugs, Questions, Feedback

*mlr3oml* is a free and open source software project that
encourages participation and feedback. If you have any issues,
questions, suggestions, or feedback, please do not hesitate to open an
“issue” about it on the GitHub page!

In case of problems or bugs, it is often helpful if you provide a
“minimum working example” that showcases the behaviour (but don’t
worry about this if the bug is obvious).
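
As a rough illustration of what such an example could look like, the sketch below reuses the task ID from the demo above. The layout (load the packages, make the smallest call that triggers the problem, print session details) is only a suggestion and not a template required by the maintainers. Replace the calls with the ones that fail for you.

```r
# Hypothetical skeleton for a bug report; swap in the call that fails for you.
library(mlr3)
library(mlr3oml)

# Smallest call that reproduces the problem (ID taken from the demo above)
oml_task = otsk(145953)
task = as_task(oml_task)

# Package versions and platform details help others reproduce the issue
sessionInfo()
```

Including the output of `sessionInfo()` shows which versions of `mlr3` and `mlr3oml` were loaded when the problem occurred.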