{"id":13689243,"url":"https://github.com/dominance-analysis/dominance-analysis","last_synced_at":"2025-05-01T23:33:11.909Z","repository":{"id":33883453,"uuid":"160509963","full_name":"dominance-analysis/dominance-analysis","owner":"dominance-analysis","description":"This package can be used for dominance analysis or Shapley Value Regression for finding relative importance of predictors on given dataset. This library can be used for key driver analysis or marginal resource allocation models.","archived":false,"fork":false,"pushed_at":"2023-08-17T12:57:54.000Z","size":4758,"stargazers_count":148,"open_issues_count":5,"forks_count":57,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-09-16T19:22:20.048Z","etag":null,"topics":["classification-model","dominance","dominance-analysis","dominance-statistics","feature-engineering","feature-importance","feature-selection","keydrivers","logistic-regression","multiple-regression","predictor","predictor-importance","pseudo-r-square","r-square","regression-models","relative-importance","shapley-value"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/dominance-analysis/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dominance-analysis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-12-05T11:44:41.000Z","updated_at":"2024-09-06T16:39:25.000Z","dependencies_parsed_at":"2023-10-21T10:56:38.516Z","dependency_job_id":null,"html_url":"https://github.com/dominance-analysis/dominance-analysis","commit_stats":{"total_commits":497,"total_committers":8,"mean_commits":62.125,"dds":0.52515090
54325956,"last_synced_commit":"4dc18f7ee00a8e04f6bf2ad387ac543efbd65ccf"},"previous_names":["bhagatsajan0073/dominance_analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominance-analysis%2Fdominance-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominance-analysis%2Fdominance-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominance-analysis%2Fdominance-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominance-analysis%2Fdominance-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dominance-analysis","download_url":"https://codeload.github.com/dominance-analysis/dominance-analysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224282190,"owners_count":17285786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification-model","dominance","dominance-analysis","dominance-statistics","feature-engineering","feature-importance","feature-selection","keydrivers","logistic-regression","multiple-regression","predictor","predictor-importance","pseudo-r-square","r-square","regression-models","relative-importance","shapley-value"],"created_at":"2024-08-02T15:01:39.714Z","updated_at":"2024-11-12T13:31:08.850Z","avatar_url":"https://github.com/dominance-analysis.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# 
Dominance-Analysis : A Python Library for Accurate and Intuitive Relative Importance of Predictors \n[![PyPI version](https://badge.fury.io/py/dominance-analysis.svg)](https://badge.fury.io/py/dominance-analysis)\n[![Downloads](https://static.pepy.tech/badge/dominance-analysis)](https://pepy.tech/project/dominance-analysis) \n\u003ca href=\"https://github.com/dominance-analysis/dominance-analysis/blob/master/LICENSE\"\u003e\u003cimg alt=\"GitHub license\" src=\"https://img.shields.io/github/license/dominance-analysis/dominance-analysis\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/dominance-analysis/dominance-analysis/stargazers\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/stars/dominance-analysis/dominance-analysis.svg\"\u003e\n\u003c/a\u003e\n\u003ca href=\"https://github.com/dominance-analysis/dominance-analysis/network/members\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/forks/dominance-analysis/dominance-analysis.svg\"\u003e\n\u003c/a\u003e\n\u003ca href=\"https://github.com/dominance-analysis/dominance-analysis/issues\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/issues/dominance-analysis/dominance-analysis.svg\"\u003e\n\u003c/a\u003e\n[![Build Status](https://travis-ci.org/dominance-analysis/dominance-analysis.svg?branch=master)](https://travis-ci.org/dominance-analysis/dominance-analysis)\n[![Build Status](https://dev.azure.com/dominanceanalysis/dominance-analysis/_apis/build/status/dominance-analysis.dominance-analysis?branchName=master)](https://dev.azure.com/dominanceanalysis/dominance-analysis/_build/latest?definitionId=1\u0026branchName=master)\n[![Build status](https://ci.appveyor.com/api/projects/status/fqly6m0p7kuocic1?svg=true)](https://ci.appveyor.com/project/dominance-analysis/dominance-analysis)\n[![Documentation 
Status](https://readthedocs.org/projects/docs/badge/?version=latest)](https://dominance-analysis.readthedocs.io/en/latest/?badge=latest)\n[![Maintainability](https://api.codeclimate.com/v1/badges/6bb5585a19b168e5a425/maintainability)](https://codeclimate.com/github/dominance-analysis/dominance-analysis/maintainability)\n[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/dominance-analysis/dominance-analysis/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/dominance-analysis/dominance-analysis/?branch=master)\n[![Build Status](https://scrutinizer-ci.com/g/dominance-analysis/dominance-analysis/badges/build.png?b=master)](https://scrutinizer-ci.com/g/dominance-analysis/dominance-analysis/build-status/master)\n[![Code Intelligence Status](https://scrutinizer-ci.com/g/dominance-analysis/dominance-analysis/badges/code-intelligence.svg?b=master)](https://scrutinizer-ci.com/code-intelligence)\n[![Join the chat at https://gitter.im/dominance_analysis/community](https://badges.gitter.im/dominance_analysis/community.svg)](https://gitter.im/dominance_analysis/community?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/6769/badge)](https://bestpractices.coreinfrastructure.org/projects/6769)\n\n*This package is designed to determine relative importance of predictors for both regression and classification models. The determination of relative importance depends on how one defines importance; Budescu (1993) and Azen and Budescu (2003) proposed using dominance analysis (DA) because it invokes a general and intuitive definition of \"relative importance\" that is based on the additional contribution of a predictor in all subset models. 
The purpose of determining predictor importance in the context of DA is not model selection but rather uncovering the individual contributions of the predictors.*\n\n*In case the target is a continuous variable, the package determines the dominance of one predictor over another by comparing their incremental R-squared contribution across all subset models. In case the target variable is binary, the package determines the dominance of one predictor over another by comparing their incremental Pseudo R-Squared contribution across all subset models.*\n\n\u003chr\u003e\n\n### Installation\n\nUse the following command to install the package:\n```  \npip install dominance-analysis\n```  \n\u003chr\u003e\n\n## Important Parameters \n \n  * data : Complete dataset; it should be a Pandas DataFrame.   \n  * target : Name of the target variable; it should be present in the passed dataset.\n  * top_k : Number of features to choose from all available features. By default, the package will run for the top 15 features.\n  * objective : It can take the value 0 or 1: 0 for Classification and 1 for Regression. By default, the package will run for Regression.\n  * pseudo_r2 : It can take one of the Pseudo R-Squared measures - \"mcfadden\", \"nagelkerke\", \"cox_and_snell\" or \"estrella\", where default=\"mcfadden\". It's not needed in case of regression (objective=1).\n  * data_format: It can take the value 0, 1 or 2. 0 is for raw data, 1 is when a correlation matrix (correlation of predictors with the target variable) is being passed, 2 is when a covariance matrix (covariance of predictors with the target variable) is being passed. By default, the package will run for raw data (data_format=0). This parameter is not needed in case of classification.\n\n\u003chr\u003e\n\n### Dominance Analysis - The Significance!\n\nDominance Analysis, according to Azen and Budescu, meets three important criteria for measuring relative importance. 
First, the technique should be defined in terms of its ability to reduce error in predicting the outcome variable. Next, it should permit direct comparison of measures within a model (that is, X\u003csub\u003e1\u003c/sub\u003e is twice as important as X\u003csub\u003e2\u003c/sub\u003e). Finally, the technique should permit inferences concerning an attribute's direct effect (that is, when considered by itself), total effect (that is, when considered with other attributes) and partial effect (that is, when considered with various combinations of other predictors). Hence, Dominance analysis is both robust and intuitive and its interpretation is also very straightforward.\n\n\n### Dominance Analysis - The Math!\n\nDominance Analysis is unique as it measures relative importance in a pairwise fashion, and the two predictors are compared\nin the context of all 2\u003csup\u003e(p−2)\u003c/sup\u003e models that contain some subset of the other predictors. So, if we have a total of 'p' predictors, we will build 2\u003csup\u003ep\u003c/sup\u003e-1 models (all possible subset models) and compute the incremental R\u003csup\u003e2\u003c/sup\u003e contribution of each predictor to the subset model of all other predictors. The additional contribution of a given predictor is measured by the increase in R\u003csup\u003e2\u003c/sup\u003e that results from adding that predictor to the regression model.\n\nLet's consider a scenario when we have 4 predictors; X\u003csub\u003e1\u003c/sub\u003e, X\u003csub\u003e2\u003c/sub\u003e, X\u003csub\u003e3\u003c/sub\u003e and X\u003csub\u003e4\u003c/sub\u003e. We will have to build a total of 2\u003csup\u003e4\u003c/sup\u003e-1 models i.e. 
15 models- \u003csup\u003e4\u003c/sup\u003eC\u003csub\u003e1\u003c/sub\u003e = 4 models with only one predictor, \u003csup\u003e4\u003c/sup\u003eC\u003csub\u003e2\u003c/sub\u003e = 6 models with two predictors each, \u003csup\u003e4\u003c/sup\u003eC\u003csub\u003e3\u003c/sub\u003e = 4 models with three predictors each and 1  (\u003csup\u003e4\u003c/sup\u003eC\u003csub\u003e4\u003c/sub\u003e) complete model with all 4 predictors. Thus, the additional contributions of X\u003csub\u003e1\u003c/sub\u003e are computed as the increases in the proportion of variance accounted for when X\u003csub\u003e1\u003c/sub\u003e is added to each subset of the remaining predictors (i.e., the null subset {.}, {X\u003csub\u003e2\u003c/sub\u003e}, {X\u003csub\u003e3\u003c/sub\u003e}, {X\u003csub\u003e4\u003c/sub\u003e}, {X\u003csub\u003e2\u003c/sub\u003eX\u003csub\u003e3\u003c/sub\u003e}, {X\u003csub\u003e2\u003c/sub\u003eX\u003csub\u003e4\u003c/sub\u003e}, {X\u003csub\u003e3\u003c/sub\u003eX\u003csub\u003e4\u003c/sub\u003e} and {X\u003csub\u003e2\u003c/sub\u003eX\u003csub\u003e3\u003c/sub\u003eX\u003csub\u003e4\u003c/sub\u003e}). 
Similarly, the additional contributions of X\u003csub\u003e2\u003c/sub\u003e are the increases in the proportion of variance accounted for when X\u003csub\u003e2\u003c/sub\u003e is added to each subset of the remaining predictors (i.e., the null subset {.}, {X\u003csub\u003e1\u003c/sub\u003e}, {X\u003csub\u003e3\u003c/sub\u003e}, {X\u003csub\u003e4\u003c/sub\u003e}, {X\u003csub\u003e1\u003c/sub\u003eX\u003csub\u003e3\u003c/sub\u003e}, {X\u003csub\u003e1\u003c/sub\u003eX\u003csub\u003e4\u003c/sub\u003e}, {X\u003csub\u003e3\u003c/sub\u003eX\u003csub\u003e4\u003c/sub\u003e} and {X\u003csub\u003e1\u003c/sub\u003eX\u003csub\u003e3\u003c/sub\u003eX\u003csub\u003e4\u003c/sub\u003e}).\n\nBelow is an illustration of the formulas used to compute the averaged additional contributions of X\u003csub\u003e1\u003c/sub\u003e and X\u003csub\u003e2\u003c/sub\u003e within each model size in the population with four predictors (we use the notation \u003cimg src='images/formula1.JPG'\u003e to represent the proportion of variance in Y that is accounted for by the predictors in the model X. For example, \u003cimg src='images/formula2.JPG'\u003e represents the proportion of variance in Y that is accounted for by the model consisting of X\u003csub\u003e1\u003c/sub\u003e and X\u003csub\u003e3\u003c/sub\u003e. 
The additional contribution of a given predictor is measured by the increase in proportion of variance that results from adding that predictor to the regression model):\n\n\u003cimg src='images/formulas.JPG'\u003e \n\u003cp align=\"center\"\u003e Table 1\u003c/p\u003e\n\n\nThe measure for proportion of variance that we have used for regression is R\u003csup\u003e2\u003c/sup\u003e, but since we don't have R\u003csup\u003e2\u003c/sup\u003e in logistic regression/classification models, we have used Pseudo R\u003csup\u003e2\u003c/sup\u003e.\n\n\nThe beauty of the math of Dominance Analysis is that the sum of the overall average incremental R\u003csup\u003e2\u003c/sup\u003e of all predictors is equal to the R\u003csup\u003e2\u003c/sup\u003e of the complete model (model with all predictors). Hence, the total R\u003csup\u003e2\u003c/sup\u003e can be apportioned among the predictors in the model. Below is an illustration of Dominance Analysis in the population for a hypothetical example with four predictors:\n\n\u003cimg src='images/PercentRel.jpg'\u003e\n\u003cp align=\"center\"\u003e Table 2\u003c/p\u003e\n\nIt can be seen that the Percentage Relative Importance of predictors has been computed by dividing the Overall Average Incremental R\u003csup\u003e2\u003c/sup\u003e contribution of predictors by the R\u003csup\u003e2\u003c/sup\u003e of the complete model. This explains the intuitive nature of Dominance Analysis wherein the overall R\u003csup\u003e2\u003c/sup\u003e of the model can be attributed to individual predictors within the model.\n\n\n\u003chr\u003e\n\n### Pseudo R-Squared for Classification Task / Logistic Regression\n\nMeasures of fit in logistic regression can be classified into those based on sums of squares and those based on maximum likelihood statistics. Reviews of a variety of measures of fit proposed for logistic regression can be found in Amemiya (1981), Menard (2000), Mittlbock and Schemper (1996) and Zheng and Agresti (2000). 
Given the large number of proposed measures, criteria for defining appropriate R\u003csup\u003e2\u003c/sup\u003e analogues need to be determined. The following criteria, which are also found in the linear regression literature (e.g., Kvålseth, 1985; Van den Burg \u0026 Lewis, 1988), were used to select R\u003csup\u003e2\u003c/sup\u003e analogues for logistic regression:\n1. Boundedness: The measure should vary between a minimum of zero, indicating complete lack of fit, and a maximum of one, indicating perfect fit. \n2. Linear invariance: The measure should be invariant to nonsingular linear transformations of the variables (Ys and Xs). \n3. Monotonicity: The measure should not decrease with the addition of a predictor.\n4. Intuitive Interpretability: The measure of fit is intuitively interpretable, in that it agrees with the scale of the linear case for intermediate values.\n\nBased on these criteria, the following four R\u003csup\u003e2\u003c/sup\u003e analogues were chosen that satisfied at least three of these four properties:\n\n**1. McFadden's Pseudo-R Squared**\n\n McFadden's Pseudo-R squared measure is defined as:\n\n\u003cimg src=\"https://latex.codecogs.com/svg.latex?\\Large\u0026space;R_{McFadden}^{2}=1-\\frac{log(L_{full})}{log(L_{null})}\" title=\"\\Large R_{McFadden}^{2}=1-\\frac{log(L_{full})}{log(L_{null})}\" /\u003e\n\nThis measure satisfies all four properties.\n\n**2. Nagelkerke Pseudo-R Squared**\n\nNagelkerke Pseudo-R squared measure is defined as:\n\n\u003cimg src=\"https://latex.codecogs.com/svg.latex?\\Large\u0026space;R_{Nagelkerke}^{2}=\\frac{1-\\{\\frac{L_{null}}{L_{full}}\\}^{2/N}}{1-L_{null}^{2/N}}\" title=\"\\Large R_{Nagelkerke}^{2}=\\frac{1-\\{\\frac{L_{null}}{L_{full}}\\}^{2/N}}{1-L_{null}^{2/N}}\" /\u003e\n\nThis measure satisfies three of the four properties and doesn't satisfy the property of Interpretability.\n\n**3. 
Cox and Snell R-Squared**\n\nCox and Snell Pseudo-R squared measure is defined as:\n\n\u003cimg src=\"https://latex.codecogs.com/svg.latex?\\Large\u0026space;R_{Cox\\\u0026Snell}^{2}=1-\\{\\frac{L_{null}}{L_{full}}\\}^{2/N}\" title=\"\\Large R_{Cox\\\u0026Snell}^{2}=1-\\{\\frac{L_{null}}{L_{full}}\\}^{2/N}\" /\u003e\n\nThis measure satisfies three of the four properties.\n\n**4. Estrella R-Squared**\n\nEstrella Pseudo-R squared measure is defined as:\n\n\u003cimg src=\"https://latex.codecogs.com/svg.latex?\\Large\u0026space;R_{Estrella}^{2}=1-\\{\\frac{LL_{full}}{LL_{null}}\\}^{\\frac{-2}{N}*LL_{null}}\" title=\"\\Large R_{Estrella}^{2}=1-\\{\\frac{LL_{full}}{LL_{null}}\\}^{\\frac{-2}{N}*LL_{null}}\" /\u003e\n\nThis measure satisfies all four properties.\n\nUsing each of these four R\u003csup\u003e2\u003c/sup\u003e analogues, the additional contribution of a given predictor to a specific logistic model can be measured as the change (i.e., increase) in the R\u003csup\u003e2\u003c/sup\u003e analogue when the predictor is added to the model. Even though all four measures will give similar results, we recommend using either Estrella's (1998) model fit measure or McFadden's (1974) measure for conducting dominance analysis in logistic regression. We have a slight preference for McFadden's measure (and that is what the package will compute by default) because it is computationally simpler, but both McFadden's and Estrella’s measures satisfy the minimum requirements for an R\u003csup\u003e2\u003c/sup\u003e analogue.\n\n\u003chr\u003e\n\nNote: Since Dominance Analysis is computationally intensive, as it builds all subset models (2\u003csup\u003ep\u003c/sup\u003e-1 models), we have provided the user the flexibility to choose the number of top predictors that they want to compute relative importance for. For regression, the Top K features are selected based on F-regression, and for classification the selection is based on the Chi-Squared statistic. 
Dominance Analysis can be used in combination with Principal Component Analysis (PCA) or Factor Analysis or any other feature reduction algorithm for getting accurate and intuitive importance of predictors.\n\n\u003chr\u003e\n\n### Dominance Statistics\n\nAs described earlier, a relative importance measure should be able to describe a predictor's direct, total and partial effect. Therefore, in the Dominance Statistics, we have come up with four different types of Dominance measures. **These measures have been conceptualized, defined and formulated by us and are unique to this library**. Below are the definitions and interpretations of the measures:\n\n1. **Interactional Dominance** - This is the incremental R\u003csup\u003e2\u003c/sup\u003e contribution of the predictor to the complete model. Hence, the Interactional Dominance of a particular predictor 'X' will be the difference between the R\u003csup\u003e2\u003c/sup\u003e of the complete model and the R\u003csup\u003e2\u003c/sup\u003e of the model with all other predictors except the particular predictor 'X'. \u003cbr\u003e\nConsider a scenario when we have Y as the dependent variable and four predictors X\u003csub\u003e1\u003c/sub\u003e, X\u003csub\u003e2\u003c/sub\u003e, X\u003csub\u003e3\u003c/sub\u003e and X\u003csub\u003e4\u003c/sub\u003e. Let R\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eY.X\u003csub\u003e1\u003c/sub\u003e,X\u003csub\u003e2\u003c/sub\u003e\u003c/sub\u003e be the R\u003csup\u003e2\u003c/sup\u003e of the model between Y and X\u003csub\u003e1\u003c/sub\u003e, X\u003csub\u003e2\u003c/sub\u003e;\nR\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eY.X\u003csub\u003e1\u003c/sub\u003e,X\u003csub\u003e3\u003c/sub\u003e\u003c/sub\u003e be the R\u003csup\u003e2\u003c/sup\u003e of the model between Y and X\u003csub\u003e1\u003c/sub\u003e, X\u003csub\u003e3\u003c/sub\u003e; and so on. 
In this case, the interactional dominance of predictor X\u003csub\u003e1\u003c/sub\u003e will be R\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eY.X\u003csub\u003e1\u003c/sub\u003e,X\u003csub\u003e2\u003c/sub\u003e,X\u003csub\u003e3\u003c/sub\u003e,X\u003csub\u003e4\u003c/sub\u003e\u003c/sub\u003e - R\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eY.X\u003csub\u003e2\u003c/sub\u003e,X\u003csub\u003e3\u003c/sub\u003e,X\u003csub\u003e4\u003c/sub\u003e\u003c/sub\u003e. \u003cbr\u003e\nHence, interactional dominance can be interpreted as the incremental impact or incremental variability explained by the predictor in the presence of all other predictors.\n\n2. **Individual Dominance -** The individual dominance of a predictor is the R\u003csup\u003e2\u003c/sup\u003e of the model between the dependent variable and the predictor. So, the individual dominance of predictor X\u003csub\u003e1\u003c/sub\u003e will be R\u003csup\u003e2\u003c/sup\u003e\u003csub\u003eY.X\u003csub\u003e1\u003c/sub\u003e\u003c/sub\u003e. \u003cbr\u003e\nHence, individual dominance can be interpreted as the variability explained by the predictor alone or the quantum of impact that a predictor will have in the absence of all other predictors.\n\n3. **Average Partial Dominance -** This is the average of the average incremental R\u003csup\u003e2\u003c/sup\u003e contributions of the predictor to all subset models except the complete model and the bi-variate model (when only one predictor is present). \u003cbr\u003e\nHence, this can be interpreted as the average impact that a predictor has when it is available in all possible combinations with other predictors except the combination when all predictors are available.\n\n4. **Total Dominance -** The last measure of dominance summarizes the additional contributions of each predictor to all subset models by averaging all the conditional values. In the example in Table 3, this consists of averaging the four averaged entries in each column. 
\u003cbr\u003e\n\nIn the table below, we have illustrated the calculation used to arrive at the four measures of dominance.\n\n\u003cimg src='images/Dom Stat.jpg'\u003e\n\u003cp align=\"center\"\u003e Table 3\u003c/p\u003e\n\u003cbr\u003e\nIf we calculate the four measures of dominance from the above example, we will get the following values:\n\u003cimg src='images/Dominance_Analysis.JPG'\u003e\n\u003cp align=\"center\"\u003e Table 4\u003c/p\u003e\n\n### Dominance Levels\nThe following three levels of dominance can be achieved between each pair of predictors in Dominance Analysis: \n* **Complete Dominance** - One predictor is said to completely dominate another predictor if its dominance holds across all possible subset models (that do not include the two predictors under comparison). Returning to the four-predictor model, for example, complete dominance of X\u003csub\u003e1\u003c/sub\u003e over X\u003csub\u003e2\u003c/sub\u003e is achieved if the additional R\u003csup\u003e2\u003c/sup\u003e contribution of X\u003csub\u003e1\u003c/sub\u003e is more than that of X\u003csub\u003e2\u003c/sub\u003e to the null model, the model consisting of X\u003csub\u003e3\u003c/sub\u003e, the model consisting of X\u003csub\u003e4\u003c/sub\u003e, and the model consisting of both X\u003csub\u003e3\u003c/sub\u003e and X\u003csub\u003e4\u003c/sub\u003e. In Table 3, we can see that the incremental R\u003csup\u003e2\u003c/sup\u003e of X\u003csub\u003e1\u003c/sub\u003e is greater than that of X\u003csub\u003e2\u003c/sub\u003e for all subset models and hence X\u003csub\u003e1\u003c/sub\u003e completely dominates X\u003csub\u003e2\u003c/sub\u003e.\n\u003cbr\u003e If the additional contributions do not consistently favor the same predictor across all subset models, then complete dominance is undetermined while weaker levels of dominance may still be achieved. 
\n* **Conditional Dominance** - If a predictor’s averaged additional contribution within each model size is greater than that of another predictor, then the first predictor is said to conditionally dominate the latter. Here, the model size is indicated by the number of predictors included in a given model. If a predictor’s averaged additional contribution is greater for some model sizes but not for all, then conditional dominance between the two predictors cannot be established.\n* **General Dominance** - If the overall averaged additional R\u003csup\u003e2\u003c/sup\u003e contribution of one predictor is greater than that of the other, then that predictor is said to generally dominate the other. \u003cbr\u003e\n\nThe three levels of dominance (complete, conditional, and general) are related to each other in a hierarchical fashion: Complete dominance implies conditional dominance, which, in turn, implies general dominance. However, for p \u003e 3 the converse may not\nhold; that is, general dominance does not imply conditional dominance and conditional dominance does not necessarily imply complete dominance.\n\n\u003chr\u003e\n\n*Complete code for the examples below is available in the example folder or the following public kernels on Kaggle:\nRegression - \u003ca href=\"https://www.kaggle.com/iamsajanbhagat/dominance-analysis-on-boston-house-prices-dataset\" target=\"_blank\"\u003eDominance Analysis on Boston House Price Data\u003c/a\u003e \u0026\nClassification - \u003ca href=\"https://www.kaggle.com/iamsajanbhagat/dominance-analysis-on-breast-cancer-dataset\" target=\"_blank\"\u003e Dominance Analysis on Breast Cancer Dataset\u003c/a\u003e*\n\n\u003chr\u003e\n\n### User Guide for computing Relative Importance when the response variable is Continuous\n\n**Using Boston Housing Dataset downloaded from: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html**\n\n**Selecting top K features and getting R\u003csup\u003e2\u003c/sup\u003e of the Complete Model**\n\n```\nfrom 
dominance_analysis import Dominance_Datasets\nfrom dominance_analysis import Dominance\nboston_dataset=Dominance_Datasets.get_boston()\ndominance_regression=Dominance(data=boston_dataset,target='House_Price',objective=1)\n``` \n\u003cimg src='images/Housing_Data_Dominance.JPG'\u003e\n\n\u003chr\u003e\n\n**Incremental R-Squared**\n```\nincr_variable_rsquare=dominance_regression.incremental_rsquare()\n```\n\u003cimg src='images/Housing_Data_Model_Training.JPG'\u003e\n\n\u003chr\u003e\n\n**Plot Incremental R-Squared and the Dominance Curve**\n```\ndominance_regression.plot_incremental_rsquare()\n```\n\u003cimg src='images/bar_boston_house_price.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/pie_boston_house_price.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/WaterFall_Boston.png'\u003e\n\u003chr\u003e\n\n\n**Dominance Statistics (R-Squared)**\n```\ndominance_regression.dominance_stats()\n```\n\u003cimg src='images/housing_data_dominance_stats.JPG'\u003e\n\n\u003chr\u003e\n\n\n**Dominance Level**\n```\ndominance_regression.dominance_level()\n```\n\u003cimg src='images/Dominance_Level_Boston.JPG'\u003e\n\n\u003chr\u003e\n\n\n\n### User Guide for computing Relative Importance when the response variable is Binary\n\n**Breast Cancer Wisconsin (Diagnostic) dataset downloaded from: https://goo.gl/U2Uwz2**\n\n**Selecting top K features and getting Pseudo R\u003csup\u003e2\u003c/sup\u003e of the Complete Model**\n```\nfrom dominance_analysis import Dominance_Datasets\nfrom dominance_analysis import Dominance\nbreast_cancer_data=Dominance_Datasets.get_breast_cancer()\ndominance_classification=Dominance(data=breast_cancer_data,target='target',objective=0,pseudo_r2=\"mcfadden\")\n``` \n\u003cimg src='images/Breast_Cancer_Data_Dominance_Latest.JPG'\u003e\n\n\u003chr\u003e\n\n**Incremental Pseudo R-Squared**\n```\nincr_variable_rsquare=dominance_classification.incremental_rsquare()\n```\n\u003cimg 
src='images/Breast_Cancer_Data_Model_Training_Latest.JPG'\u003e\n\n\u003chr\u003e\n\n**Plot Incremental Pseudo R-Squared**\n```\ndominance_classification.plot_incremental_rsquare()\n```\n\u003cimg src='images/bar_breast_cancer_latest.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/pie_breast_cancer_latest.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/WaterFall_Breast_Cancer.png'\u003e\n\u003chr\u003e\n\n\n**Dominance Statistics (R-Squared)**\n```\ndominance_classification.dominance_stats()\n```\n\u003cimg src='images/breast_cancer_data_dominance_stats.JPG'\u003e\n\n\u003chr\u003e\n\n**Dominance Level**\n\n```\ndominance_classification.dominance_level()\n```\n\u003cimg src='images/Dominance_Levels_Breast_Cancer.JPG'\u003e\n\n\u003chr\u003e\n\n\n### User Guide for computing Relative Importance when a Correlation Matrix is provided\n\n**Using Boston Housing Dataset downloaded from: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html**\n\n**Getting R\u003csup\u003e2\u003c/sup\u003e of the Complete Model**\n\n```\nfrom dominance_analysis import Dominance_Datasets\nfrom dominance_analysis import Dominance\nboston_dataset=Dominance_Datasets.get_boston()\ncorr_data = boston_dataset.corr()\ndominance_regression=Dominance(data=corr_data,target='House_Price',data_format=1)\n``` \n\u003cimg src='images/Housing_Data_Corr_Dominance.PNG'\u003e\n\n\u003chr\u003e\n\n**Incremental R-Squared**\n```\nincr_variable_rsquare=dominance_regression.incremental_rsquare()\n```\n\u003cimg src='images/Housing_Data_Correlation_Model_Training.PNG'\u003e\n\n\u003chr\u003e\n\n**Plot Incremental R-Squared and the Dominance Curve**\n```\ndominance_regression.plot_incremental_rsquare()\n```\n\u003cimg src='images/bar_boston_house_price_corr.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/pie_boston_house_price_corr.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/WaterFall_Boston_corr.png'\u003e\n\u003chr\u003e\n\n\n**Dominance Statistics 
(R-Squared)**\n```\ndominance_regression.dominance_stats()\n```\n\u003cimg src='images/housing_data_corr_dominance_stats.PNG'\u003e\n\n\u003chr\u003e\n\n**Dominance Level**\n\n```\ndominance_regression.dominance_level()\n```\n\u003cimg src='images/Dominance_Level_Boston_Correlation.PNG'\u003e\n\n\u003chr\u003e\n\n### User Guide for computing Relative Importance when a Covariance Matrix is provided\n\n**Using Boston Housing Dataset downloaded from: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html**\n\n**Getting R\u003csup\u003e2\u003c/sup\u003e of the Complete Model**\n\n```\nfrom dominance_analysis import Dominance_Datasets\nfrom dominance_analysis import Dominance\nboston_dataset=Dominance_Datasets.get_boston()\ncov_data = boston_dataset.cov()\ndominance_regression=Dominance(data=cov_data,target='House_Price',data_format=2)\n``` \n\u003cimg src='images/Housing_Data_Covariance_Dominance.PNG'\u003e\n\n\u003chr\u003e\n\n**Incremental R-Squared**\n```\nincr_variable_rsquare=dominance_regression.incremental_rsquare()\n```\n\u003cimg src='images/Housing_Data_Covariance_Model_Training.PNG'\u003e\n\n\u003chr\u003e\n\n**Plot Incremental R-Squared and the Dominance Curve**\n```\ndominance_regression.plot_incremental_rsquare()\n```\n\u003cimg src='images/bar_boston_house_price_cov.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/pie_boston_house_price_cov.png'\u003e\n\u003chr\u003e\n\u003cimg src='images/WaterFall_Boston_cov.png'\u003e\n\u003chr\u003e\n\n\n**Dominance Statistics (R-Squared)**\n```\ndominance_regression.dominance_stats()\n```\n\u003cimg src='images/housing_data_cov_dominance_stats.PNG'\u003e\n\n\u003chr\u003e\n\n**Dominance Level**\n\n```\ndominance_regression.dominance_level()\n```\n\u003cimg src='images/Dominance_Level_Boston_Covariance.PNG'\u003e\n\n\u003chr\u003e\n\n### Authors \u0026 License\nThe Dominance Analysis package is based on the concept developed by Azen and Budescu (see references). 
This package is released under an \u003ca href=\"https://github.com/bhagatsajan0073/dominance_analysis/blob/master/LICENSE\"\u003e MIT License\u003c/a\u003e. The Dominance Analysis Python package has been developed by \u003ca href=\"https://github.com/quintshekhar\"\u003e Shashank Shekhar\u003c/a\u003e, \u003ca href=\"https://github.com/bhagatsajan0073\"\u003e Sajan Bhagat\u003c/a\u003e, \u003ca href=\"https://github.com/Vibish\"\u003e Kunjithapatham Sivakumar\u003c/a\u003e and \u003ca href=\"https://github.com/balakolluri\"\u003e Bala Koteshwar Kolluri\u003c/a\u003e. Pull requests submitted to the GitHub Repo are highly encouraged!\n\n\u003chr\u003e\n\n### References\n1. Azen, R. (2000). Inference for predictor comparisons: Dominance analysis and the distribution of R\u003csup\u003e2\u003c/sup\u003e differences. Dissertation Abstracts International B, 61/10, 5616.\n2. Azen, R., Budescu, D. V., \u0026 Reiser, B. (2001). Criticality of predictors in multiple regression. British Journal of Mathematical and Statistical Psychology, 54, 201–225.\n3. Azen, R., \u0026 Budescu, D. V. (2003). The Dominance Analysis Approach for Comparing Predictors in Multiple Regression. Psychological Methods, 2003, Vol. 8, No. 2, 129–148. https://doi.org/10.1037/1082-989X.8.2.129\n4. Azen, R., \u0026 Budescu, D. V. (2006). Comparing Predictors in Multivariate Regression Models: An Extension of Dominance Analysis. Journal of Educational and Behavioral Statistics Summer 2006, Vol. 31, No. 2, pp. 157-180. https://doi.org/10.3102/10769986031002157\n5. Azen, R., \u0026 Traxel, N. (2009). Using Dominance Analysis to Determine Predictor Importance in Logistic Regression. Journal of Educational and Behavioral Statistics September 2009, Vol. 34, No. 3, pp. 319-347. https://doi.org/10.3102/1076998609332754\n6. Budescu, D. V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114(3), 542-551. 
https://doi.org/10.1037/0033-2909.114.3.542\n7. Luo, W., \u0026 Azen, R. (2013). Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis. Journal of Educational and Behavioral Statistics, 38(1), 3-31. https://doi.org/10.3102/1076998612458319\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdominance-analysis%2Fdominance-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdominance-analysis%2Fdominance-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdominance-analysis%2Fdominance-analysis/lists"}