{"id":24537543,"url":"https://github.com/tigthor/r-dataflow-programming","last_synced_at":"2025-03-16T01:23:02.672Z","repository":{"id":56183468,"uuid":"314959132","full_name":"tigthor/r-dataflow-programming","owner":"tigthor","description":"Dataflow Programming for Machine Learning in R","archived":false,"fork":false,"pushed_at":"2020-11-22T04:33:48.000Z","size":4735,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-15T03:46:00.488Z","etag":null,"topics":["dataflow-programming","feedback","learner","machine-learning","mlr3pipelines"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tigthor.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-22T04:29:39.000Z","updated_at":"2024-09-13T13:48:29.000Z","dependencies_parsed_at":"2022-08-15T14:11:00.622Z","dependency_job_id":null,"html_url":"https://github.com/tigthor/r-dataflow-programming","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tigthor%2Fr-dataflow-programming","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tigthor%2Fr-dataflow-programming/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tigthor%2Fr-dataflow-programming/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tigthor%2Fr-dataflow-programming/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tigthor","download_url":"https://
codeload.github.com/tigthor/r-dataflow-programming/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243811081,"owners_count":20351650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataflow-programming","feedback","learner","machine-learning","mlr3pipelines"],"created_at":"2025-01-22T14:13:08.697Z","updated_at":"2025-03-16T01:23:02.632Z","avatar_url":"https://github.com/tigthor.png","language":"R","readme":"---\noutput: github_document\n---\n\n# mlr3pipelines \u003cimg src=\"man/figures/logo.png\" align=\"right\" /\u003e\n\nPackage website: [release](https://mlr3pipelines.mlr-org.com/) | [dev](https://mlr3pipelines.mlr-org.com/dev/)\n\nDataflow Programming for Machine Learning in R.\n\n\u003c!-- badges: start --\u003e\n[![tic](https://github.com/mlr-org/mlr3pipelines/workflows/tic/badge.svg?branch=master)](https://github.com/mlr-org/mlr3pipelines/actions)\n[![CRAN](https://www.r-pkg.org/badges/version/mlr3pipelines)](https://cran.r-project.org/package=mlr3pipelines)\n[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)\n[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)\n\u003c!-- badges: end --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  cache = FALSE,\n  collapse = TRUE,\n  comment = 
\"#\u003e\"\n)\nset.seed(8008135)\nlibrary(\"paradox\")\nlibrary(\"mlr3\")\nlibrary(\"mlr3pipelines\")\nlibrary(\"mlr3learners\")\nlgr::get_logger(\"mlr3\")$set_threshold(\"warn\")\n```\n\n## What is `mlr3pipelines`?\n\n\nWatch our \"WhyR 2020\" Webinar Presentation on YouTube for an introduction! Find the slides [here](https://github.com/mlr-org/mlr-outreach/raw/master/2020_whyr/slides.pdf).\n\n[![WhyR 2020\nmlr3pipelines](https://img.youtube.com/vi/4r8K3GO5wk4/0.jpg)](https://www.youtube.com/watch?v=4r8K3GO5wk4)\n\n**`mlr3pipelines`** is a [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming) toolkit for machine learning in R utilising the **[mlr3](https://github.com/mlr-org/mlr3)** package. Machine learning workflows can be written as directed \"Graphs\" that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the **[mlr3tuning](https://github.com/mlr-org/mlr3tuning)** package, it is even possible to simultaneously optimize parameters of multiple processing units.\n\nIn principle, *mlr3pipelines* is about defining singular data and model manipulation steps as \"PipeOps\":\n```{r}\npca        = po(\"pca\")\nfilter     = po(\"filter\", filter = mlr3filters::flt(\"variance\"), filter.frac = 0.5)\nlearner_po = po(\"learner\", learner = lrn(\"classif.rpart\"))\n```\n\nThese pipeops can then be combined to define machine learning pipelines. These can be wrapped in a `GraphLearner` that behaves like any other `Learner` in `mlr3`.\n```{r}\ngraph = pca %\u003e\u003e% filter %\u003e\u003e% learner_po\nglrn = GraphLearner$new(graph)\n```\nThis learner can be used for resampling, benchmarking, and even tuning.\n```{r}\nresample(tsk(\"iris\"), glrn, rsmp(\"cv\"))\n```\n## Feature Overview\n\nSingle computational steps can be represented as so-called **PipeOps**, which can then be connected with directed edges in a **Graph**. 
The scope of *mlr3pipelines* is still growing; currently supported features are:\n\n* Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering\n* Task subsampling for speed and outcome class imbalance handling\n* *mlr3* *Learner* operations for prediction and stacking\n* Simultaneous path branching (data going both ways)\n* Alternative path branching (data going one specific way, controlled by hyperparameters)\n* Ensemble methods and aggregation of predictions\n\n## Documentation\n\nThe easiest way to get started is reading some of the vignettes that are shipped with the package, which can also be viewed online:\n\n* [Quick Introduction](https://mlr3book.mlr-org.com/pipelines.html), with short examples to get started\n\n## Bugs, Questions, Feedback\n\n*mlr3pipelines* is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an \"issue\" about it on the GitHub page!\n\nIn case of problems / bugs, it is often helpful if you provide a \"minimum working example\" that showcases the behaviour (but don't worry about this if the bug is obvious).\n\nPlease understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.\n\n## Similar Projects\n\nA predecessor to this package is the [*mlrCPO*-package](https://github.com/mlr-org/mlrCPO), which works with *mlr* 2.x. 
Other packages that provide, to varying degrees, some preprocessing functionality or a machine learning domain-specific language are the *[caret](https://github.com/topepo/caret)* package and the related *[recipes](https://recipes.tidymodels.org/)* project, and the *[dplyr](https://github.com/tidyverse/dplyr)* package.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftigthor%2Fr-dataflow-programming","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftigthor%2Fr-dataflow-programming","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftigthor%2Fr-dataflow-programming/lists"}