{"id":32204165,"url":"https://github.com/stefan-stein/igate","last_synced_at":"2025-10-22T04:51:34.111Z","repository":{"id":56936744,"uuid":"206361964","full_name":"stefan-stein/igate","owner":"stefan-stein","description":"Guided Analytics for Testing Manufacturing Parameters","archived":false,"fork":false,"pushed_at":"2020-10-30T12:52:36.000Z","size":294,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-08T15:48:18.602Z","etag":null,"topics":["igate"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stefan-stein.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-04T16:16:58.000Z","updated_at":"2021-03-08T01:00:47.000Z","dependencies_parsed_at":"2022-08-21T01:10:26.194Z","dependency_job_id":null,"html_url":"https://github.com/stefan-stein/igate","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/stefan-stein/igate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefan-stein%2Figate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefan-stein%2Figate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefan-stein%2Figate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stefan-stein%2Figate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stefan-stein","download_url":"https://codeload.github.com/stefan-stein/igate/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/stefan-stein%2Figate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280382978,"owners_count":26321423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["igate"],"created_at":"2025-10-22T04:51:31.835Z","updated_at":"2025-10-22T04:51:34.099Z","avatar_url":"https://github.com/stefan-stein.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n# igate\n\n\u003c!-- badges: start --\u003e\n[![Travis build status](https://travis-ci.org/stefan-stein/igate.svg?branch=master)](https://travis-ci.org/stefan-stein/igate)\n[![CRAN Version](https://www.r-pkg.org/badges/version/igate)](https://CRAN.R-project.org/package=igate)\n[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/igate?color=blue)](https://r-pkg.org/pkg/igate)\n\u003c!-- badges: end --\u003e\n\n\n\nThe goal of igate is to provide you with a quick and powerful, yet easy to understand toolbox that lets you extract relevant process parameters from manufacturing data, validate these parameters, find their optimal ranges, and automatically create concise reports of the conducted analysis. The igate methodology has been published in [A guided analytics tool for feature selection in steel manufacturing with an application to blast furnace top gas efficiency](https://doi.org/10.1016/j.commatsci.2020.110053).\n\n# Methodology\n\nThe igate package implements the **i**nitial **G**uided **A**nalytics for parameter **T**esting and controlband **E**xtraction (iGATE) framework for manufacturing data.\n\nHaving identified a manufacturing ‘problem’ to be investigated, a data set is assembled for a ‘typical’ period of\noperation, i.e. excluding known disturbances such as maintenance or equipment failures. This\ndata set includes the so-called *target variable*, a direct indication or proxy for the problem under\nconsideration and the variable whose variation we want to explain. It also includes a number of covariate parameters representing *suspected sources of variation*\n(SSVs), i.e. variables that we consider potentially influential for the value of the `target`. 
Parameters with known and explainable relationships with the target variable should be\nexcluded from the analysis, although this can be addressed in an iterative manner through\nsubsequent exclusion and repetition of the process. The iGATE procedure consists of the following seven steps (detailed explanations follow below):\n\n1. Select 8 Best of the Best (BOB) and 8 Worst of the Worst (WOW) products. The number of observations chosen can be changed using the `versus` argument of `igate`/ `categorical.igate`.\n1. Perform the Tukey-Duckworth test for each SSV (see details below).\n1. For each SSV selected by this test, perform the Wilcoxon rank test.\n1. Extract upper/lower control limits for the retained parameters.\n1. Perform sanity check via regression plot; decide which parameters to keep.\n1. Validate choice of parameters and control limits.\n1. Report findings in standardized format.\n\nSteps 1-4 are performed using the `igate` function for continuous target variables or the `categorical.igate` function for categorical target variables. For categorical targets with few categories in particular, `robust.categorical.igate`, a robustified version of `categorical.igate`, should be considered.\n\nWhen running `igate`/ `categorical.igate` with default settings, any outliers for the target variable are excluded and the observations corresponding to the best 8 (B) and worst 8 (W) instances of the target variable are identified. For each of\nthese 16 observations, each SSV is inspected in turn. The distribution of the values of the SSV of the 8 BOB and 8 WOW is analyzed by applying the [Tukey-Duckworth test](https://en.wikipedia.org/wiki/Tukey–Duckworth_test) (see reference in link for original paper). If the count statistic returned by the test is larger than 6 (this corresponds to a p-value of less than 0.05), the SSV is retained as being potentially significant.\nThis test was chosen for its simplicity and ease of interpretation and visualization. 
SSVs failing the test are highly unlikely to be influential whilst SSVs\npassing the test may be influential. The Wilcoxon-Rank test performed in step three of iGATE serves as a possibly more widely known alternative that might, however, be harder to explain to non-statisticians. The main function of these steps is to facilitate dimensionality reduction in the data set to generate a manageable population for expert consideration.\n\nStep 5 is performed by calling `igate.regressions`, resp. `categorical.freqplot`. These functions produce a regression (for continuous targets) resp. frequency (for categorical targets) plot and save it to the current working directory. A domain expert familiar with the manufacturing process should review these plots and decide which parameters to keep for further analysis based on goodness of fit of the data to the plot.\n\nFor the validation step, the production period from which the validation data is selected is dependent on the business situation, but should be from a period of operation consistent with that from which the initial\npopulation was drawn, i.e. similar product types, similar level of equipment status etc. The\nvalidation step then considers all the retained SSVs as a collective in terms of good and bad\nbands, and extracts from the validation sample all the records which satisfy the condition that all\nretained SSVs lie within these bands. The expectation is that where all the SSVs lie within the\ngood band, then the target should also correspond to the best performance, and vice versa\nwhere the retained SSVs all lie in the bad bands we expect to see bad performance. The application gives feedback on the extent to\nwhich this criterion is satisfied in order to help the user conclude the exploration and make\nrecommendations for subsequent improvements. 
Validation is performed via the `validate` function.\n\nWe consider the last step, the reporting of the results in a standardized manner, an integral part of iGATE that ensures that knowledge about past analyses is retained within a company. This is achieved by calling the `report` function.\n\n\n## Installation\n\nYou can install the package directly from CRAN.\n```{r, eval=FALSE}\ninstall.packages(\"igate\")\n```\n\n\nOr you can install the development version from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"devtools\")\ndevtools::install_github(\"stefan-stein/igate\")\n```\n\n## Example\n\nThis is a basic example which shows you how to conduct an iGATE analysis for a continuous target variable. We are using the built-in `iris` data set and consider `\"Sepal.Length\"` as our target.\n\n```{r example, eval=FALSE}\nlibrary(igate)\n\nset.seed(123)\nn \u003c- nrow(iris)*2/3\nrows \u003c- sample(1:nrow(iris), n)\ndf \u003c- iris[rows, ]\nresults \u003c- igate(df, target = \"Sepal.Length\", good_end = \"high\", savePlots = TRUE)\nresults\n```\n\nThe significant variables are shown alongside their count summary statistic from the Tukey-Duckworth Test as well as the p-value from the Wilcoxon-Rank test. Also, we see the good and bad control bands as well as several summary statistics to ascertain the randomness in the results (see documentation of `igate` for details). Remember to use the option `savePlots = TRUE` if you want to save the boxplot of the target variable as a png. 
This png will be needed for producing the final report of the analysis.\n\nFor details on how to conduct the other steps in the iGATE framework, please refer to the package vignette, by running\n\n```{r, eval=FALSE}\nbrowseVignettes(\"igate\")\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstefan-stein%2Figate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstefan-stein%2Figate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstefan-stein%2Figate/lists"}