{"id":27396788,"url":"https://github.com/dachosen1/formulaic","last_synced_at":"2025-04-14T00:22:17.041Z","repository":{"id":54756717,"uuid":"186492223","full_name":"dachosen1/formulaic","owner":"dachosen1","description":"Dynamic Generation and Quality Checks of Formula Objects","archived":false,"fork":false,"pushed_at":"2024-01-27T04:11:30.000Z","size":99144,"stargazers_count":10,"open_issues_count":1,"forks_count":2,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-03-20T09:07:18.457Z","etag":null,"topics":["cran","formula-objects","formula-parser","machine-learning","quality-check","r","statistical-models"],"latest_commit_sha":null,"homepage":"https://dachosen1.github.io/formulaic/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dachosen1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-13T20:39:29.000Z","updated_at":"2025-02-16T17:44:44.000Z","dependencies_parsed_at":"2022-08-14T01:50:53.164Z","dependency_job_id":null,"html_url":"https://github.com/dachosen1/formulaic","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dachosen1%2Fformulaic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dachosen1%2Fformulaic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dachosen1%2Fformulaic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dachosen1%2Fformulaic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dachosen1","downloa
d_url":"https://codeload.github.com/dachosen1/formulaic/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248800048,"owners_count":21163404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cran","formula-objects","formula-parser","machine-learning","quality-check","r","statistical-models"],"created_at":"2025-04-14T00:22:16.181Z","updated_at":"2025-04-14T00:22:17.008Z","avatar_url":"https://github.com/dachosen1.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# formulaic\n[![CodeFactor](https://www.codefactor.io/repository/github/dachosen1/formulaic/badge/master)](https://www.codefactor.io/repository/github/dachosen1/formulaic/overview/master)\n![R-CMD-check](https://github.com/dachosen1/formulaic/workflows/R-CMD-check/badge.svg)\n[![codecov](https://codecov.io/gh/dachosen1/formulaic/branch/master/graph/badge.svg)](https://codecov.io/gh/dachosen1/formulaic)\n[![license](https://img.shields.io/badge/license-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)\n[![CRAN status](https://www.r-pkg.org/badges/version/formulaic)](https://CRAN.R-project.org/package=formulaic)\n[![](http://cranlogs.r-pkg.org/badges/grand-total/formulaic?color=blue)](https://cran.r-project.org/package=formulaic)\n[![](https://cranlogs.r-pkg.org/badges/formulaic)](https://cran.r-project.org/package=formulaic)\n\n\n## Overview \n\n**Formulaic** is useful in creating a dynamic formula with multiple features. 
It not only reduces the time required to build and implement a model, but also improves the quality of the result.\nMany statistical models and analyses in R are implemented through formula objects. The formulaic package creates a unified approach for programmatically and dynamically generating formula objects in R. Users may specify the inputs and outcomes of a model directly, search for variables to include based upon naming patterns, and identify variables to exclude. A wide range of quality checks is implemented to identify issues such as misspecified variables, duplication, a lack of contrast in the inputs, and a large number of levels in categorical data. These issues are documented and reported in a manner that provides greater accountability and useful information to guide the investigator's choices in selecting features.\nDynamically generated formula objects can enable the development of applications that incorporate a user's inputs for selecting the variables of a model and the subsets of data to include. The quality checks enable a programmatic reduction of the inputs to those that are appropriate for inclusion in the model. By automatically limiting these selections, many models that would lead to errors or intractable computations may be proactively avoided. By performing this process in the creation of the formula, all statistical models and analyses relying on formula objects can easily make use of these features. Furthermore, the list of quality checks can be easily expanded to incorporate additional concerns.\n\n## Install the current release from CRAN:\n```r\ninstall.packages('formulaic')\n```\n\n## Install the development version from GitHub:\n```r\ndevtools::install_github('dachosen1/formulaic')\n```\n\n## Usage \n\nThe formulaic package has two main functions – **create.formula** and **reduce.existing.formula** – and one subsidiary function, **add.backticks**. 
The package was developed to help users build robust models faster and more conveniently.\n \n**create.formula** automatically creates a formula from a provided list of input variables and the outcome variable. The variables undergo a series of quality checks, such as automatic reduction of variables and categories and the elimination of typos, duplicates, and inputs that lack contrast, so that only suitable features are used for modeling. This reduces the time needed to build a model and frees users from the tedious task of typing out every variable by hand. The resulting formula can be used in a wide range of models, from simple linear regression to machine learning techniques such as random forests and neural networks. \n\nThe principal advantages of using create.formula are as follows:\n \n1) Being able to dynamically generate a formula from a vector of inputs, without necessarily having to spell them all out by name. \n\n2) Adding variables by searching for patterns.\n \n3) Simple integration of interactions.\n\n4) Easy removal of specific variables. \n \n5) Quality checks that resolve a variety of issues -- typos, duplication, lack of contrast, etc. -- while providing a transparent explanation.\n \n**reduce.existing.formula** trims an existing formula. Users pass an existing formula to the function, and it undergoes the same checks as in create.formula.\n \n**add.backticks** applies backticks to the variables that require them in order to be used in a formula; this is the default. Users can also add backticks to all the variables, although this is not necessary.   
\n\n \n### Example \n \n``` r \n# simulate a small data set with eight variables\nn \u003c- 10\ndd \u003c- data.table::data.table(w = rnorm(n = n), x = rnorm(n = n), pixel_1 = rnorm(n = n))\ndd[, 'pixel 2' := 0.3 * pixel_1 + rnorm(n)]\ndd[, pixel_3 := 0.3 * pixel_1 + rnorm(n)]\ndd[, item_1 := 0.3 * pixel_3 + rnorm(n)]\ndd[, item_2 := 0.3 * pixel_3 + rnorm(n)]\ndd[, y := 5 * x + 3 * pixel_1 + 2 * pixel_3 + rnorm(n)]\n```\nThe script above creates a data.table with 8 variables: seven candidate inputs and the outcome y.\n\n```r \nnames(dd)\n# \"w\"       \"x\"       \"pixel_1\" \"pixel 2\" \"pixel_3\" \"item_1\"  \"item_2\"  \"y\"  \n```\nTraditionally, creating a formula in R required the user to type out the desired variables or use the `y ~ .` notation to select all the features. **`formulaic`** makes it possible to choose the variables to include programmatically and to perform quality checks against the inputs.\n\n**Use case:** Creating a model with all pixel patterns \n\n``` r \nlibrary(formulaic)\n\n# create formula object \nformula1 \u003c- create.formula(outcome.name = \"y\", input.names = c(\"x\",\"Random error\", \"y\"), input.patterns = c(\"pix\"), dat = dd)\n\nformula1$formula\n# y ~ x + pixel_1 + `pixel 2` + pixel_3\n```\nThe result is a formula object containing all the pixel variables and the input x. Notice that the 'Random error' variable was automatically excluded from the output because it is not a column of dd, and that backticks were automatically added around the variable pixel 2. Because the outcome variable y was also listed in input.names, it was excluded from the right-hand side of the formula. 
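\n\nThe selection rules described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the small data frame and the names `candidates`, `keep`, `terms`, and `manual_formula` are hypothetical, and only base R functions (`grep`, `make.names`, `reformulate`) are used.\n\n```r\n# Base R sketch of the selection rules; NOT how formulaic is implemented.\n# Hypothetical toy data with one non-syntactic column name.\ndd \u003c- data.frame(x = rnorm(5), pixel_1 = rnorm(5), y = rnorm(5))\ndd[[\"pixel 2\"]] \u003c- rnorm(5)\n\noutcome \u003c- \"y\"\n\n# Combine the explicitly named inputs with the pattern matches.\ncandidates \u003c- unique(c(c(\"x\", \"Random error\", \"y\"),\n                       grep(\"pix\", names(dd), value = TRUE)))\n\n# Keep names that exist in the data and are not the outcome.\nkeep \u003c- candidates[candidates %in% names(dd) \u0026 candidates != outcome]\n\n# Add backticks only to names that are not syntactically valid.\nterms \u003c- ifelse(keep == make.names(keep), keep, paste0(\"`\", keep, \"`\"))\n\nmanual_formula \u003c- reformulate(termlabels = terms, response = outcome)\nmanual_formula\n# y ~ x + pixel_1 + `pixel 2`\n```\nThe sketch produces a formula of the same shape as the one returned by create.formula, but it offers no record of why each variable was kept or dropped.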
\n\nDetails of each candidate variable, including why it was kept or excluded, are provided in the inclusion table: \n\n```r \n# inclusion table\n\u003e formula1$inclusion.table\n       variable   class order specified.from exclude.user.specified exclude.not.in.names.dat exclude.matches.outcome.name include.variable\n1:            x numeric     1    input.names                  FALSE                    FALSE                        FALSE             TRUE\n2: Random error    \u003cNA\u003e     2    input.names                  FALSE                     TRUE                        FALSE            FALSE\n3:            y numeric     3    input.names                  FALSE                    FALSE                         TRUE            FALSE\n4:      pixel_1 numeric     4 input.patterns                  FALSE                    FALSE                        FALSE             TRUE\n5:      pixel 2 numeric     5 input.patterns                  FALSE                    FALSE                        FALSE             TRUE\n6:      pixel_3 numeric     6 input.patterns                  FALSE                    FALSE                        FALSE             TRUE\n\n# fit a model with the generated formula (stored in the $formula element)\nmodel \u003c- lm(formula = formula1$formula, data = dd)\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdachosen1%2Fformulaic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdachosen1%2Fformulaic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdachosen1%2Fformulaic/lists"}