

International Journal of Mathematical Education in Science and Technology, Vol. 36, No. 4, 2005, 361–373

Incremental net effects in multiple regression

STAN LIPOVETSKY* and MICHAEL CONKLIN

GfK Custom Research Inc., 8401 Golden Valley Road, Minneapolis, MN 55427, USA

*Corresponding author. Email: [email protected]

(Received 13 February 2004)

A regular problem in regression analysis is estimating the comparative importance of the predictors in the model. This work considers the 'net effects', or shares of the predictors in the coefficient of multiple determination, which is a widely used characteristic of the quality of a regression model. Estimation of the net effects can be a difficult task because multicollinearity among the regressors can produce negative inputs to multiple determination. This paper suggests estimating the incremental net effects as subsequent marginal inputs to the coefficient of multiple determination, and it is shown that the results coincide with estimation by cooperative game theory. This approach guarantees positive and interpretable net effects, which offers a better interpretation of the regression results.

1. Introduction

In applied regression analysis the net effect values are widely used to evaluate the regressors' individual importance. Net effects are the predictors' shares in the coefficient of multiple determination R^2, which is one of the main characteristics of regression quality in practical research [1–5]. A net effect is a combination of the direct effect of a variable (as measured by its squared regression coefficient) and the indirect effects (measured by the combination of its correlations with the other variables). However, the net effect values are influenced by collinear redundancy in the data. Multicollinearity is not important for prediction of the criterion variable, but it can have detrimental effects in the analysis of the influence of individual variables on the criterion variable: parameter estimates could fluctuate wildly with negligible change in the sample; they could have signs opposite to the signs of easily understood pair correlations; theoretically important variables could have insignificant coefficients; and net effects can be of negative signs [6–8]. But even in the presence of multicollinearity, it is often desirable to keep all possible variables in the model and estimate their comparative importance relative to the dependent variable.

We consider evaluation of the net effects by marginal increments to multiple determination from each predictor, which produces clear results for net effect estimation, even with collinear regressors. This approach has several advantages over the traditional evaluation of the contribution of regressors. For example, highly correlated attributes will tend to have similar incremental net effects, but stepwise regression techniques tend to choose one of the highly correlated variables and exclude the others. Since we are trying to make specific recommendations, this arbitrary variable choice by regression modelling is counterproductive. If we are unsure of the cause of variation of the dependent variable, it is better to recommend attending to both attributes instead of picking one. Such an approach is described in [9–12], and in the comprehensive review of related techniques by Johnson [13].

We further show that this approach corresponds to an application of cooperative game theory to regression analysis. We can think of the particular model as a way of building coalitions among players (predictors) to maximize the total value (quality of fitting). Then we apply the so-called Shapley Value imputation to produce the net effect contributions. This method of arbitration in many-player coalitions was introduced by Shapley [14] and is described in numerous works on cooperative game theory [15–19]. Shapley partitioning is based on the axioms of symmetry (or anonymity) of players, dummy (or zero-value) player, and additivity. The latter two axioms are also known in the form of 'carriers' (or effectiveness) and 'linearity' (or aggregation). In numerous developments and generalizations it has been shown that these axioms can be weakened (see [20–23]). For our purposes the Shapley Value presents a useful and convenient instrument for solving practical problems of regression analysis. In our problem the predictors take the role of the players, and the goal of the game is the partitioning of the quality of fit of the data. We can interpret the regressors as player-representatives of the real players (the respondents, whose opinions constitute the observations on the attributes). The actual players (respondents), via the player-representatives (attributes), define the results of the arbitration (the model quality distributed over the attributes of influence) because their cumulative opinions are expressed in the attributes. We use the Shapley Value under the assumption of transferable utility, which, in the case of regression modelling, corresponds to the substitution of one of the correlated predictors by another in their mutual influence on the dependent variable. This approach provides convenient, stable and easily interpretable results for the predictors' contributions to a regression. An interesting way to interpret the Shapley Value imputation can be found in the information theory of 'lower entropy' evaluations [24]. Applications of Shapley Value analysis to practical problems can be found in Conklin and Lipovetsky [25] and Conklin et al. [26].

This paper is organized as follows: in Section 2 we discuss the estimation of predictor influence shares. In Section 3 we consider the Shapley Value analysis of regression quality. A numerical example is given in Section 4, and Section 5 summarizes the findings.

2. Marginal increments in multiple determination

Let us consider briefly some relations of least squares (LS) regression modelling. For the standardized variables (dependent y, and n predictor variables x_1, ..., x_n) a multiple linear regression with added noise is

y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + e    (1)


where e denotes deviations from the model, and the a are coefficients of the standardized regression. The LS objective of the squared deviations is

S^2 = \sum_{i=1}^{N} \left( y_i - a_1 x_{i1} - a_2 x_{i2} - \cdots - a_n x_{in} \right)^2    (2)

where observations are denoted by i = 1, ..., N. Minimizing equation (2) by the parameters of the model yields a normal system of equations in matrix form:

C\vec{a} = \vec{r}    (3)

where C = X'X is the matrix of correlations r_{ij} between the predictors, \vec{r} = X'y is the vector of n correlations r_{yj} of y with each variable x_j, \vec{a} is the vector of regression coefficients, X denotes the design matrix of order N by n, y is the vector-column of order N of observations of the dependent variable, and X' is the transposed matrix. The solution of equation (3) is

\vec{a} = C^{-1}\vec{r}    (4)

where C^{-1} is the inverted correlation matrix. Substituting the coefficients of regression (4) into the objective (2) yields the residual sum of squared errors S^2.

The characteristic of the regression quality is presented by the coefficient of multiple determination, which can be expressed in one of the following forms:

R^2 = 1 - S^2 = \vec{r}\,'\vec{a} = \vec{r}\,'C^{-1}\vec{r} = \vec{a}\,'C\vec{a}    (5)

where the prime denotes transposition. This coefficient belongs to the range from zero to one. The items r_{yj} a_j in the total R^2 are called the net effects (NEF) of each jth predictor:

R^2 = \sum_{j=1}^{n} r_{yj} a_j = \sum_{j=1}^{n} NEF_j    (6)

From (5) we can represent R^2 as follows:

R^2 = \vec{a}\,'C\vec{a} = \sum_{j} \left( a_j^2 + a_j \sum_{k \neq j} r_{jk} a_k \right)    (7)

where each net effect r_{yj} a_j in (6) is divided into direct and indirect inputs:

(NEF_{direct})_j = a_j^2, \qquad (NEF_{indirect})_j = a_j \sum_{k \neq j} r_{jk} a_k    (8)

Multicollinearity can change the sign of a coefficient a_j in the multiple regression in comparison with the pairwise correlation r_{yj} between the variables y and x_j. Then the jth net effect r_{yj} a_j in R^2 (equation (6)) is negative, so the indirect effect is negative and bigger in absolute value than the positive direct effect (equation (8)). But a negative net effect does not mean that this variable is not useful in the regression. In fact, any additional variable increases the coefficient of multiple determination. Thus, the possibility of the net effects being negative only means that the widely used definition (6) poorly characterises the variables' contributions, and it needs to be improved.
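To see these quantities in code, here is a minimal Python sketch of equations (4)-(8); the correlation matrix C and vector r below are invented for illustration and are not data from the paper.

```python
import numpy as np

# Invented correlations: three predictors (C) and their correlations with y (r).
C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
r = np.array([0.5, 0.2, 0.4])

a = np.linalg.solve(C, r)          # equation (4): a = C^{-1} r
R2 = r @ a                         # equation (5): R^2 = r'a
nef = r * a                        # equation (6): net effects r_yj * a_j
nef_direct = a**2                  # equation (8): direct inputs
nef_indirect = nef - nef_direct    # equation (8): indirect inputs

print("R^2 =", R2, "= sum of net effects =", nef.sum())
print("net effects:", nef)         # the second net effect comes out negative
```

With these invented correlations the strong collinearity r_{12} = 0.8 drives the second net effect below zero, reproducing the problem just described.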

Let us elaborate a better definition for the predictors' shares in the quality of a regression model. Consider, for instance, the influence of the last variable x_n in regression (1). The total matrix C (equation (3)) of correlations among all the x can be presented in block form:

C = \begin{pmatrix} A & \vec{c} \\ \vec{c}\,' & 1 \end{pmatrix}    (9)

where A and \vec{c} are the matrix and vector of order n-1 of correlations of the first n-1 variables x_1, ..., x_{n-1} among themselves and with the last variable x_n, respectively. Inverting the block matrix [27] we obtain:

C^{-1} = \begin{pmatrix} A^{-1} + q^{-1}\vec{b}\vec{b}\,' & -q^{-1}\vec{b} \\ -q^{-1}\vec{b}\,' & q^{-1} \end{pmatrix}    (10)

where we denote:

\vec{b} = A^{-1}\vec{c}, \qquad q = 1 - \vec{c}\,'\vec{b}    (11)

Using equations (11) we see that \vec{b} has a structure similar to (4), so it is the vector of coefficients in the regression of x_n on all the other n-1 variables x_1, ..., x_{n-1}. In Yule's notation [28, 29] such a regression can be written as

x_n = b_{n1.}x_1 + b_{n2.}x_2 + \cdots + b_{n,n-1.}x_{n-1}    (12)

where the dot means all the other variables, so in a more explicit form b_{n1.} = b_{n1.23...n-1}, etc. The scalar product \vec{c}\,'\vec{b} in (11) has the structure of the multiple determination (5) constructed for the regression (12), so we can denote it as R^2_{n.(-n)}, that is, the multiple determination in the model of the nth variable x_n by all the other x without x_n. The constant q is the residual sum of deviations in the model (12) and is represented as

q = 1 - R^2_{n.(-n)}    (13)

Now, using equation (10) in equation (4), we rewrite the solution for the multiple regression (1) in block form:

\vec{a} = C^{-1}\vec{r} = \begin{pmatrix} A^{-1} + q^{-1}\vec{b}\vec{b}\,' & -q^{-1}\vec{b} \\ -q^{-1}\vec{b}\,' & q^{-1} \end{pmatrix} \begin{pmatrix} \vec{r}_{y(-n)} \\ r_{yn} \end{pmatrix} = \begin{pmatrix} A^{-1}\vec{r}_{y(-n)} - q^{-1}\left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right)\vec{b} \\ q^{-1}\left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right) \end{pmatrix}    (14)

where we split the vector \vec{r} of all n correlations between y and all the x into the vector \vec{r}_{y(-n)} of n-1 correlations of y with all the x except the last one, and the component r_{yn} of this last correlation of y with x_n. In (14) the expression A^{-1}\vec{r}_{y(-n)} is the solution for the regression of y by n-1 predictors in the model:

y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{n-1} x_{n-1}    (15)

so we can use the notation

\vec{\beta} = A^{-1}\vec{r}_{y(-n)}    (16)

for the coefficients of regression (15). The left-hand side of formula (14) denotes the vector of n coefficients of regression (1). The right-hand side of (14) is expressed by a vector of order n-1 (it shows how to change the coefficients of the model (15) to construct the first n-1 coefficients of the model (1) as a linear combination of the two solutions \vec{\beta} (16) and \vec{b} (11)), and by the additional nth element a_n that defines the last coefficient of regression (1).

Multiplying the vector-row \vec{r}\,' = (\vec{r}\,'_{y(-n)}, r_{yn}) by the vector (14), we obtain the coefficient of multiple determination (5):

R^2 = \begin{pmatrix} \vec{r}_{y(-n)} \\ r_{yn} \end{pmatrix}' \begin{pmatrix} A^{-1}\vec{r}_{y(-n)} - q^{-1}\left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right)\vec{b} \\ q^{-1}\left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right) \end{pmatrix} = \vec{r}\,'_{y(-n)}A^{-1}\vec{r}_{y(-n)} + q^{-1}\left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right)^2    (17)

In the first term on the right-hand side of (17) we recognize the coefficient of multiple determination of the model (15), (16) of y by n-1 regressors (without the last one), so we can write it as

R^2_{y.(-n)} = \vec{r}\,'_{y(-n)}\vec{\beta} = \vec{r}\,'_{y(-n)}A^{-1}\vec{r}_{y(-n)}    (18)

Thus, we can present the coefficient of multiple determination of n regressors (17) by that of the n-1 of them without the last one (18) as

R^2 = R^2_{y.(-n)} + \left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right)^2 \Big/ \left(1 - R^2_{n.(-n)}\right)    (19)

where the definition (13) is used. The expression (19) shows that a positive increment in the multiple determination is achieved by including an additional predictor in the model (15) with n-1 variables, so it transforms into the model (1) with n variables.

The last row in (14) shows that

a_n = q^{-1}\left(r_{yn} - \vec{b}\,'\vec{r}_{y(-n)}\right)    (20)

so the relation (19) can be represented as

R^2 = R^2_{y.(-n)} + a_n^2\left(1 - R^2_{n.(-n)}\right)    (21)

Thus, the coefficient of multiple determination in the model (1) with n variables can be decomposed into the coefficient R^2_{y.(-n)} in the model (15) without the nth variable and the increment defined by the squared coefficient a_n^2 of the last variable in (1), adjusted by the residual sum (13) in the model (12) of the nth regressor by all the rest of them. More generally, taking any jth regressor in place of the last one, we represent (21) as

R^2 = R^2_{y.(-j)} + U_j    (22)

where R^2_{y.(-j)} is the multiple determination in the regression of y by the n-1 predictors without the variable x_j, and U_j is the increment defined by the jth coefficient of regression (1) and by the multiple determination R^2_{j.(-j)} in the regression of x_j by all the n-1 other predictors:

U_j = a_j^2\left(1 - R^2_{j.(-j)}\right)    (23)

Such a measure of predictor importance in multiple regression is considered in Darlington [30] and Harris [3]. The increments (23) can be compared across different regressors to find the most influential among them.
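A short numerical check of (22)-(23), under the same invented correlations as in the earlier sketch: removing a predictor x_j lowers R^2 by exactly a_j^2(1 - R^2_{j.(-j)}).

```python
import numpy as np

C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
r = np.array([0.5, 0.2, 0.4])
n = len(r)

a = np.linalg.solve(C, r)
R2_full = r @ a

for j in range(n):
    keep = [k for k in range(n) if k != j]
    A = C[np.ix_(keep, keep)]
    # R^2 of y by all predictors except x_j, as in (18)
    R2_without_j = r[keep] @ np.linalg.solve(A, r[keep])
    # R^2 of x_j by the other predictors, used in (23)
    R2_j_rest = C[j, keep] @ np.linalg.solve(A, C[keep, j])
    U_def = R2_full - R2_without_j              # increment by definition (22)
    U_formula = a[j]**2 * (1.0 - R2_j_rest)     # increment by formula (23)
    print(j + 1, round(U_def, 6), round(U_formula, 6))  # the two columns agree
```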


Let us consider how to interpret the increments (23). Taking the scalar product \vec{b}\,'\vec{r}_{y(-n)} in (20), we see that it is composed of the vector of coefficients in regression (12) and the vector \vec{r}_{y(-n)} = X'_{(-n)}y of correlations between y and the n-1 regressors without the last one. By X'_{(-n)} we denote the transposed design matrix without the nth predictor. This scalar product in (20) is

\vec{b}\,'\vec{r}_{y(-n)} = \vec{b}\,'X'_{(-n)}y = \tilde{x}_n' y \equiv \tilde{r}_{yn}    (24)

where \tilde{x}_n denotes the theoretical values of the nth predictor estimated via its regression (12) by the other predictors, so \tilde{r}_{yn} in (24) has the meaning of the correlation between y and the theoretical values of the nth regressor (12). Rewriting expression (20) for any coefficient of regression yields:

a_j = \left(r_{yj} - \tilde{r}_{yj}\right)\left(1 - R^2_{j.(-j)}\right)^{-1}    (25)

where \tilde{r}_{yj} are the correlations of y with the theoretical values of the predictors obtained in their mutual regressions of each jth by all the rest of them, and R^2_{j.(-j)} are the multiple determinations in these models. Formula (25) suggests an interesting interpretation of regression coefficients: each a_j equals the correlation of y with the residual deviation x_j - \tilde{x}_j divided by the residual sum of squares 1 - R^2_{j.(-j)} in the regression of x_j by the rest of the predictors. The second term in (25) equals the so-called variance inflation factor, or VIF [31, 5]. The VIF value for each regressor can be found from the diagonal elements of the inverted correlation matrix of all the x:

VIF_j = \left(1 - R^2_{j.(-j)}\right)^{-1} = (C^{-1})_{jj}    (26)

Using relations (25) and (26) we obtain several representations of the expression (23):

U_j = \frac{\left(r_{yj} - \tilde{r}_{yj}\right)^2}{1 - R^2_{j.(-j)}} = a_j^2\,VIF_j^{-1} = (NEF_{direct})_j\,VIF_j^{-1}    (27)

where the direct net effect is defined in (8). We see that the increments to the coefficient of multiple determination are equal to the direct net effects adjusted by the variance inflation factors.
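In code, relations (26) and (27) give the increments directly from the diagonal of the inverted correlation matrix; a sketch under the same invented correlations:

```python
import numpy as np

C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
r = np.array([0.5, 0.2, 0.4])

Cinv = np.linalg.inv(C)
a = Cinv @ r
vif = np.diag(Cinv)       # equation (26): VIF_j = (C^{-1})_jj
U = a**2 / vif            # equation (27): U_j = (NEF_direct)_j / VIF_j
print("VIF:", vif)
print("increments:", U)   # same values as in the previous sketch
```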

By averaging all n expressions (22) for the marginal contributions from the regressors, we obtain the coefficient of multiple determination expressed in a symmetric form:

R^2 = \frac{1}{n}\sum_{j=1}^{n} R^2_{y.(-j)} + \sum_{j=1}^{n} \frac{1}{n} U_j    (28)

with the increments defined in expression (27). Each coefficient R^2_{y.(-j)} corresponds to a regression of y by all the x except the jth one, so each R^2_{y.(-j)} contains contributions from all the x without x_j. This means that each coefficient R^2_{y.(-j)} in turn can be represented by the expression (28) for its components of multiple determinations in regressions by subsets of n-2 regressors and by the subsequent increments from these regressors. Combining all the marginal increments related to each jth variable, we obtain the share of this variable's importance in the multiple regression. We interpret these shares as the modified net effects. Due to the structure of (28) with always positive contributions (27), the modified net effects are also always positive, independently of the possible influence of multicollinearity.


For a clearer exposition of the results, let us consider some simple examples. In the case of just one predictor in the model, y = a_1 x_1, the matrix C (equation (3)) degenerates to the constant 1, so the solution (4), (5) is

a_1 = r_{y1}, \qquad R^2 = r^2_{y1}    (29)

and the net effect coincides with the coefficient of determination. Using the modified net effect, we see that the first sum in expression (28) is R^2_{y.(-1)} = 0, and the second sum is reduced to U_1 = a_1^2 = r^2_{y1}, which coincides with the result (29).

In the case of two predictors in the regression y = a_1 x_1 + a_2 x_2, the normal system (3) is:

\begin{pmatrix} 1 & r_{12} \\ r_{12} & 1 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} r_{y1} \\ r_{y2} \end{pmatrix}    (30)

so the solution (4) is

a_1 = \frac{r_{y1} - r_{y2}r_{12}}{1 - r_{12}^2}, \qquad a_2 = \frac{r_{y2} - r_{y1}r_{12}}{1 - r_{12}^2}    (31)

The coefficient of multiple determination (equation (5)) in this case is

R^2 = r_{y1}a_1 + r_{y2}a_2 = \frac{r^2_{y1} + r^2_{y2} - 2r_{y1}r_{y2}r_{12}}{1 - r_{12}^2}    (32)

By the regular definition (equation (6)), the net effects are the inputs r_{y1}a_1 and r_{y2}a_2, each of which can be negative if the coefficients (31) and the pair correlations are of opposite signs.

The case n = 2 reveals another reason to see the regular net effect as a poor measure of a variable's contribution to the regression. As was noted in Kendall and Stuart [29, chapter 27.27], if the correlation of one cofactor with the criterion variable is zero, for instance, r_{y2} = 0 while r_{12} \neq 0, then the multiple determination (32) is reduced to:

R^2 = r_{y1}a_1 + r_{y2}a_2 = \frac{r^2_{y1}}{1 - r_{12}^2} + 0 \equiv NEF_1 + NEF_2    (32a)

This value is always bigger than the coefficient of determination in the pair regression by the first variable, R^2 > r^2_{y1}, so the second predictor plays its role in improving the regression fit. This can be explained by a better adjustment of the dependent variable by a linear combination of two predictors with two varying parameters of the regression, even if one of the x is not directly correlated with y. However, judging by the net effects (expression (32a)), only the first variable constitutes the coefficient of multiple determination. Thus, the input of the second predictor should be evaluated more adequately than by a regular net effect.

Let us consider the incremental net effects in the case n = 2. In this case (28) can be divided into two inputs associated with the contribution of each predictor:

R^2 = \left[\frac{1}{2}R^2_{y.(-2)} + \frac{1}{2}a_1^2\left(1 - r_{12}^2\right)\right] + \left[\frac{1}{2}R^2_{y.(-1)} + \frac{1}{2}a_2^2\left(1 - r_{12}^2\right)\right]
    = \frac{1}{2}\left(r^2_{y1} + \frac{\left(r_{y1} - r_{y2}r_{12}\right)^2}{1 - r_{12}^2}\right) + \frac{1}{2}\left(r^2_{y2} + \frac{\left(r_{y2} - r_{y1}r_{12}\right)^2}{1 - r_{12}^2}\right) \equiv INEF_1 + INEF_2    (33)


The incremental net effects (INEF) in definition (33) are positive, and their sum equals the coefficient of multiple determination (32). For the example from Kendall and Stuart [29, chapter 27.27] with r_{y2} = 0 and r_{12} \neq 0, the incremental net effects (33) are reduced to

R^2 = INEF_1 + INEF_2 = r^2_{y1}\,\frac{1 - r_{12}^2/2}{1 - r_{12}^2} + r^2_{y1}\,\frac{r_{12}^2/2}{1 - r_{12}^2}    (33a)

so both variable inputs are positive. In comparison with the regular net effects (32a), the incremental net effects (33a) are more reasonable estimates of the predictors' contributions to the coefficient of multiple determination. By (33a) we see that only if the predictors are not correlated, r_{12} = 0, is the input from the second predictor zero, so that the coefficient of multiple determination coincides with the determination (29) in the pair regression.
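A quick numerical check of (33), with invented values r_{y1} = 0.6, r_{y2} = 0, r_{12} = 0.5, chosen so that the full correlation matrix is valid:

```python
ry1, ry2, r12 = 0.6, 0.0, 0.5        # the Kendall-Stuart case r_y2 = 0

a1 = (ry1 - ry2 * r12) / (1 - r12**2)        # equation (31): 0.8
a2 = (ry2 - ry1 * r12) / (1 - r12**2)        # equation (31): -0.4
R2 = ry1 * a1 + ry2 * a2                     # equation (32): 0.48

INEF1 = 0.5 * (ry1**2 + (ry1 - ry2 * r12)**2 / (1 - r12**2))   # (33): 0.42
INEF2 = 0.5 * (ry2**2 + (ry2 - ry1 * r12)**2 / (1 - r12**2))   # (33): 0.06

print(R2, INEF1 + INEF2)   # both print 0.48
```

Both inputs are positive (0.42 and 0.06) and sum to R^2 = 0.48, in agreement with (33a), whereas the regular net effects (32a) would assign the whole 0.48 to the first predictor.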

The relations (33a) could produce the impression that when r_{12} is close to one the multiple determination and the net effects can be greater than one. But the values of any three correlations are restricted by the requirement that their total correlation matrix is non-negative definite, so their Gram determinant is non-negative:

1 + 2r_{y1}r_{y2}r_{12} - r^2_{y1} - r^2_{y2} - r^2_{12} \geq 0    (34)

For given values of any two correlations, this quadratic inequality yields the range for the third correlation, for instance

r_{y2}r_{12} - \sqrt{\left(1 - r^2_{y2}\right)\left(1 - r^2_{12}\right)} \leq r_{y1} \leq r_{y2}r_{12} + \sqrt{\left(1 - r^2_{y2}\right)\left(1 - r^2_{12}\right)}    (35)

In the case r_{y2} = 0, the correlation r_{y1} is restricted by the inequality r^2_{y1} \leq 1 - r^2_{12}, so using its largest value in (33a) we reduce it to

R^2 = INEF_1 + INEF_2 = \left(1 - r_{12}^2/2\right) + r_{12}^2/2    (33b)

so both net effects are less than one.

For the next example, n = 3, the expression (28) for R^2 in explicit form is:

R^2 = \frac{1}{3}\left(R^2_{y.23} + R^2_{y.13} + R^2_{y.12}\right) + \frac{1}{3}\left(a^2_{y1.23}\left(1 - R^2_{1.23}\right) + a^2_{y2.13}\left(1 - R^2_{2.13}\right) + a^2_{y3.12}\left(1 - R^2_{3.12}\right)\right)    (36)

The items in the second parentheses of (36) correspond to the increments U_j in (28) for j = 1, 2, 3, and the items in the first parentheses we decompose further, using the same rule (28):

R^2_{y.23} = \frac{1}{2}\left(R^2_{y.2} + R^2_{y.3}\right) + \frac{1}{2}\left(a^2_{y2.3}\left(1 - R^2_{2.3}\right) + a^2_{y3.2}\left(1 - R^2_{3.2}\right)\right)    (37a)

R^2_{y.13} = \frac{1}{2}\left(R^2_{y.1} + R^2_{y.3}\right) + \frac{1}{2}\left(a^2_{y1.3}\left(1 - R^2_{1.3}\right) + a^2_{y3.1}\left(1 - R^2_{3.1}\right)\right)    (37b)

R^2_{y.12} = \frac{1}{2}\left(R^2_{y.1} + R^2_{y.2}\right) + \frac{1}{2}\left(a^2_{y1.2}\left(1 - R^2_{1.2}\right) + a^2_{y2.1}\left(1 - R^2_{2.1}\right)\right)    (37c)

In the formulae (36) and (37) the indices in the multiple determinations show specifically which model is used. For example, R^2_{y.23} corresponds to the model of y by x_2 and x_3; R^2_{3.12} is for the model of x_3 by x_1 and x_2; R^2_{2.1} is for the model of x_2 by x_1, etc. Also, the coefficients a in (36) are the parameters of the regression y = a_{y1.23}x_1 + a_{y2.13}x_2 + a_{y3.12}x_3 in Yule's notation (analogous to that used in (12)). Similarly, the coefficients of the model y = a_{y1.2}x_1 + a_{y2.1}x_2 are used in (37c), etc. Substitution of (37) into (36) yields a decomposition of the multiple determination by increments of all orders, and taking from it the items related to each predictor we obtain their incremental net effects. For instance, the first of them is:

INEF_1 = \frac{1}{3}R^2_{y.1} + \frac{1}{6}\left(a^2_{y1.2}\left(1 - R^2_{1.2}\right) + a^2_{y1.3}\left(1 - R^2_{1.3}\right)\right) + \frac{1}{3}a^2_{y1.23}\left(1 - R^2_{1.23}\right)    (38)

and the other two incremental net effects are defined similarly. The total of all three coincides with the multiple determination value. This can be easily seen if in (38) we represent the increments (23) via the differences of the subsequent coefficients of multiple determination (22):

INEF_1 = \frac{1}{3}R^2_{y.1} + \frac{1}{6}\left(\left(R^2_{y.12} - R^2_{y.2}\right) + \left(R^2_{y.13} - R^2_{y.3}\right)\right) + \frac{1}{3}\left(R^2_{y.123} - R^2_{y.23}\right)    (39a)

and similarly:

INEF_2 = \frac{1}{3}R^2_{y.2} + \frac{1}{6}\left(\left(R^2_{y.12} - R^2_{y.1}\right) + \left(R^2_{y.23} - R^2_{y.3}\right)\right) + \frac{1}{3}\left(R^2_{y.123} - R^2_{y.13}\right)    (39b)

INEF_3 = \frac{1}{3}R^2_{y.3} + \frac{1}{6}\left(\left(R^2_{y.23} - R^2_{y.2}\right) + \left(R^2_{y.13} - R^2_{y.1}\right)\right) + \frac{1}{3}\left(R^2_{y.123} - R^2_{y.12}\right)    (39c)

All incremental net effects (39) are positive, and their sum yields the value R^2_{y.123}, which is the total multiple determination R^2 for the model with three regressors.

For the generalization of the incremental estimations to any number of predictors, it is convenient to use the Shapley Value, a tool from cooperative game theory.

3. Shapley Value net effects

The Shapley Value (SV) evaluates the worth of each participant over all possible combinations of their coalitions. In the problem of the comparative usefulness of regressors, the SV assigns a value to each predictor calculated over all possible combinations of predictors in regression models. In most real-world situations, researchers and practitioners have neither a comprehensive theory nor complete control over all variables used in the regression modelling. While a modeller may introduce a set of n predictor variables, there is usually no guarantee that some other additional variables would not appear or be obtained for the problem, or that all these n variables will always be available as predictors in the model. The Shapley Value performs a weighted averaging over all variants of predictors in the model and produces results coinciding with those of the incremental net effects (28), (33), (38), (39).

The Shapley Value is defined as the jth participant's input to a coalition:

SV_j = \sum_{\text{all } M} \gamma_n(M)\left(v(M \cup \{j\}) - v(M)\right)    (40)

with the weights for entering into a coalition M defined as:

\gamma_n(M) = \frac{m!\,(n - m - 1)!}{n!}    (41)


In equations (40) and (41), n is the total number of participants, m is the number of participants in the Mth coalition, M \cup \{j\} is a subset of participants that includes the jth participant, M is a coalition without the jth participant, and v(\cdot) is a characteristic function used as a measure of utility for each coalition. In our case, the participants of a coalition are the predictors in the regression. We define the characteristic function via the R^2 values estimated by the results of regression modelling. For example, the characteristic function for the first predictor among all n of them is

v(\emptyset) = 0, \quad v(x_1) = R^2_{y.1}, \quad v(x_1, x_2) = R^2_{y.12}, \quad \ldots, \quad v(x_1, \ldots, x_n) = R^2_{y.12\ldots n}    (42)

where the right-hand side values are the multiple determination coefficients of all possible subsets of regressors including the first of them. Substituting the characteristic function (42) into the SV expression (40), we see that each difference in parentheses coincides with the increment U_j defined in (22). This means that SV_j for the jth predictor is a measure of its marginal contribution to the quality of fitting, averaged over all the models containing this regressor. The weights (41) of the SV imputation correspond to the subsequent sizes of subsets used in the series evaluating the marginal increments (28).
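The computation over all coalitions can be sketched directly from (40)-(42); the correlations are again the invented ones from the earlier sketches, and the brute-force enumeration over subsets is exponential in n, so this is an illustration rather than a production implementation.

```python
from itertools import combinations
from math import factorial
import numpy as np

C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
r = np.array([0.5, 0.2, 0.4])
n = len(r)

def v(subset):
    """Characteristic function (42): R^2 of y by the predictors in subset."""
    if not subset:
        return 0.0
    idx = list(subset)
    return r[idx] @ np.linalg.solve(C[np.ix_(idx, idx)], r[idx])

sv = np.zeros(n)
for j in range(n):
    others = [k for k in range(n) if k != j]
    for m in range(n):                                   # coalition size m
        for M in combinations(others, m):
            w = factorial(m) * factorial(n - m - 1) / factorial(n)   # (41)
            sv[j] += w * (v(M + (j,)) - v(M))                        # (40)

print("SV net effects:", sv)
print("sum:", sv.sum(), "total R^2:", v(tuple(range(n))))  # the two are equal
```

The shares sv / sv.sum() then give the percentage contributions of the predictors.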

For our example with n = 3, the weights (41) are

\gamma(0) = 1/3, \quad \gamma(1) = 1/6, \quad \gamma(2) = 1/3    (43)

Substituting (42) and (43) into (40) yields the first predictor's SV in explicit form:

SV_1 = \frac{1}{3}\left(R^2_{y.1} - 0\right) + \frac{1}{6}\left(\left(R^2_{y.12} - R^2_{y.2}\right) + \left(R^2_{y.13} - R^2_{y.3}\right)\right) + \frac{1}{3}\left(R^2_{y.123} - R^2_{y.23}\right)    (44)

This expression coincides with expression (39a) obtained above using formula (28) to account for all marginal increments from the first regressor. The SVs for the second and third predictors coincide with the expressions (39b) and (39c).

Similarly, we can produce the same results both by the iterative application of (28) for the incremental net effects and by using the SV formulae (40) and (41). Thus, we can rename the incremental net effects the Shapley Value net effects. The total of all SV net effects equals the coefficient of multiple determination in the model with all n regressors:

\sum_{j=1}^{n} SV_j = R^2_{y.12\ldots n}    (45)

The shares of the SVs in the total R^2 define the contributions of each predictor to the regression model. We can also note that the SV imputation, being a kind of weighted average, yields results not prone to the possible presence of multicollinearity among the predictors.

Regrouping the items in (40) with the help of (42), we represent the SV as follows:

SV_1 = \left(R^2_{y.1} - \bar{R}^2_{y.(1)}\right)\big/(n-1) + \left(\bar{R}^2_{y.1*} - \bar{R}^2_{y.(2)}\right)\big/(n-2) + \left(\bar{R}^2_{y.1**} - \bar{R}^2_{y.(3)}\right)\big/(n-3) + \cdots + \left(\bar{R}^2_{y.1*\ldots*} - \bar{R}^2_{y.(n-1)}\right)\big/(n-(n-1)) + R^2_{y.12\ldots n}\big/n    (46)

In the first item of the sum (46) we see the difference of R^2_{y.1} for the model with the first predictor and the mean value \bar{R}^2_{y.(1)} (marked by a bar over R^2) for all the models with just one predictor (marked by the sub-index 1 in parentheses). In the second item of this sum we see the difference between the mean \bar{R}^2_{y.1*} for all the models with two predictors, one of which is x_1 (marked by the sub-index 1* with an asterisk denoting any other variable x), and the mean \bar{R}^2_{y.(2)} for all the models with any two predictors (marked by the sub-index 2 in parentheses), etc. The last item corresponds to the share of the predictor x_1 in the total R^2 of the model with all predictors. All the other SVs can be written by a formula similar to (46), which presents them by the sequential inputs of the subsets with one, two, three, etc. regressors to the total SV net effects.
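A sketch of this sequential computation, accumulating the level-by-level terms of (46) for the predictor x_1 so that their stabilization can be monitored (the invented correlations and the all-subsets function v are those of the previous sketch):

```python
from itertools import combinations
import numpy as np

C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
r = np.array([0.5, 0.2, 0.4])
n = len(r)

def v(subset):
    if not subset:
        return 0.0
    idx = list(subset)
    return r[idx] @ np.linalg.solve(C[np.ix_(idx, idx)], r[idx])

j = 0                       # predictor x_1
sv1 = 0.0
for k in range(1, n + 1):   # models with k predictors
    if k < n:
        all_k = [v(S) for S in combinations(range(n), k)]
        with_j = [v(S) for S in combinations(range(n), k) if j in S]
        term = (np.mean(with_j) - np.mean(all_k)) / (n - k)   # items of (46)
    else:
        term = v(tuple(range(n))) / n                         # last item of (46)
    sv1 += term
    print(f"through size-{k} models: cumulative SV_1 = {sv1:.6f}")
```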

4. Numerical example

For a numerical illustration we consider data from a real research project on customer satisfaction with electrical tools. Data were elicited from 334 respondents in terms of Overall Satisfaction and ten independent attribute variables as follows: x_1 – Simple Tools, x_2 – Multiple Features, x_3 – Look and Feel, x_4 – Vision, x_5 – Discount Price, x_6 – Project Advice, x_7 – Warranty and Repair, x_8 – Convenient Service, x_9 – Superior Performance, x_{10} – Durable Product. All variables are positively correlated; the correlations r (equation (3)) with the dependent variable are presented in the first numerical column of table 1. The aim of the study was to evaluate the importance of the predictors and to present their contributions to the variability of the dependent variable graphically on a pie chart. A multiple linear regression of customer satisfaction on these variables was constructed; see the next two columns of table 1 for the original and standardized coefficients of regression (4). We see that the variable x_2, Multiple Features, has a negative coefficient in the multiple regression, which can be explained by multicollinearity, although it does not help in comparing predictor importance, and even less in a graphical representation of predictor shares. The next column of table 1, VIF, shows the values of the variance inflation factor (equation (26)), indicating the presence of some multicollinearity. The next column presents the net effects (equation (6)), their total equalling the coefficient of multiple determination, R^2 = 0.732; thus the model is of high quality. Table 1 also shows the net effect percentage shares in the total value of the coefficient of multiple determination.

Table 1. Regression model and shares of predictors.

Variable            Pair corr.   Coefficients of regression    VIF      Net effects          Incremental net effects
                    with y       Original     Standardized              Value     Share %    Shapley Value   Share %
Intercept                        -18.637
Simple Tools        0.364        0.275        0.029            1.293    0.010     1.420      0.021           2.915
Multiple Features   0.569        -0.494       -0.046           2.042    -0.026    -3.570     0.050           6.765
Look & Feel         0.693        1.277        0.126            2.546    0.087     11.940     0.097           13.207
Vision              0.670        1.283        0.122            2.480    0.082     11.210     0.085           11.648
Discount Price      0.013        0.751        0.085            1.173    0.001     0.150      0.005           0.669
Project Advice      0.516        0.199        0.018            1.943    0.010     1.300      0.043           5.806
Warranty            0.671        0.730        0.067            2.562    0.045     6.150      0.083           11.404
Convenient          0.559        1.696        0.169            1.838    0.095     12.940     0.064           8.805
Performance         0.768        3.499        0.332            3.206    0.255     34.880     0.154           21.071
Durable             0.740        2.513        0.233            3.292    0.173     23.580     0.130           17.712
R² = 0.732                                                              0.732     100        0.732           100


Estimated by the regular net effect, the contribution of the variable x_2 is negative, although any additional variable in the regression can only increase the quality of the data fitting. The more adequate estimation by the incremental net effects, performed via the Shapley Value (equation (45)), is presented in the last part of table 1. We see that all Shapley Value net effects are positive, and their shares in the total regression quality are positive as well. For instance, the attribute x_2 (Multiple Features) now, of course, has a positive contribution to customer satisfaction with the tools (6.76% by the SV share, instead of -3.57% by the regular evaluation of the net effect share). So the Shapley Value demonstrates interpretable results that can be used without any difficulties, particularly for a graphical representation of regressor importance. In the numerical estimations we use the expression (46) to find the sequential inputs to the total SV. Comparison of these cumulative values allows one to evaluate the stability of the SV for each predictor, and suggests an approach to reducing the computational burden by limiting the evaluation to those iterations where stability is achieved.

5. Summary

We considered incremental net effects defined via the marginal inputs of each predictor to the quality of the regression. We showed that these incremental estimates coincide with the shares of predictors produced by the Shapley Value imputation, as used in the evaluation of the relative usefulness of participants' strength in cooperative game theory. The results of this approach yield robust estimates of the importance of predictors in the regression, independently of the presence of multicollinearity among the predictors. These results can be understood, by the specific structure of the Shapley Value inputs, as mean values of the regressor shares averaged over all possible models. The incremental SV technique can be successfully used in multiple regression analysis, facilitating and enriching its application to numerous practical situations.

References

[1] Cooley, W., and Lohnes, P., 1971, Multivariate Data Analysis (New York: Wiley).
[2] Goldberger, A., 1964, Econometrics (New York: Wiley).
[3] Harris, R., 1975, A Primer of Multivariate Statistics (New York, London: Academic Press).
[4] Timm, N., 1975, Multivariate Analysis with Applications in Education and Psychology (Monterey, CA: Brooks/Cole).
[5] Weisberg, S., 1985, Applied Linear Regression (New York: Wiley).
[6] Lipovetsky, S., and Conklin, M., 2001, Analysis of regression in game theory approach, Applied Stochastic Models in Business and Industry, 17, 319–330.
[7] Lipovetsky, S., and Conklin, M., 2003, A model for considering multicollinearity, International Journal of Mathematical Education in Science and Technology, 34, 771–777.
[8] Lipovetsky, S., and Conklin, M., 2004, Enhance-synergism and suppression effects in multiple regression, International Journal of Mathematical Education in Science and Technology, 35, 391–402.
[9] Newton, R.G., and Spurrell, D.J., 1967, A development of multiple regression for the analysis of routine data, Applied Statistics, Journal of the Royal Statistical Society, Ser. C, 16, 51–64, and 165–172.
[10] Kruskal, W., 1987, Relative importance by averaging over orderings, The American Statistician, 41, 6–10.
[11] Kruskal, W., and Majors, R., 1989, Concepts of relative importance in recent scientific literature, The American Statistician, 43, 2–6.
[12] Budescu, D.V., 1993, Dominance analysis: a new approach to the problem of relative importance in multiple regression, Psychological Bulletin, 114, 542–551.
[13] Johnson, J.W., 2000, A heuristic method for estimating the relative weight of predictor variables in multiple regression, Multivariate Behavioral Research, 35, 1–19.
[14] Shapley, L.S., 1953, In: H.W. Kuhn and A.W. Tucker (eds), Contributions to the Theory of Games, II (Princeton, NJ: Princeton University Press), pp. 307–317.
[15] Luce, R.D., and Raiffa, H., 1958, Games and Decisions (New York: Wiley).
[16] Roth, A.E. (ed.), 1988, The Shapley Value – Essays in Honor of Lloyd S. Shapley (Cambridge: Cambridge University Press).
[17] Straffin, P.D., 1993, Game Theory and Strategy (Reston, VA: The Mathematical Association of America).
[18] Owen, G., 1995, Game Theory (Monterey, CA: Monterey Naval School).
[19] Jones, A.J., 2000, Game Theory: Mathematical Models of Conflict (Chichester: Horwood Publishing).
[20] Weber, R.J., 1988, In: A.E. Roth (ed.), The Shapley Value – Essays in Honor of Lloyd S. Shapley (Cambridge: Cambridge University Press), pp. 101–119.
[21] Nowak, A.S., and Radzik, T., 1995, On axiomatizations of the weighted Shapley Value, Games and Economic Behavior, 8, 389–405.
[22] Myerson, R.B., 1997, Game Theory: Analysis of Conflict (Cambridge, MA: Harvard University Press).
[23] Bilbao, J.M., 1998, Axioms for the Shapley value on convex geometries, European Journal of Operational Research, 110, 368–376.
[24] Dukhovny, A., 2002, General entropy of general measures, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10, 213–225.
[25] Conklin, M., and Lipovetsky, S., 2000, A winning tool for CPG, Marketing Research, 11(4), 23–27.
[26] Conklin, M., Powaga, K., and Lipovetsky, S., 2004, Customer satisfaction analysis: identification of key drivers, European Journal of Operational Research, 154, 819–827.
[27] Rao, C.R., 1965, Linear Statistical Inference and Its Applications (New York: Wiley).
[28] Yule, G.U., and Kendall, M.G., 1950, An Introduction to the Theory of Statistics (London: Griffin).
[29] Kendall, M.G., and Stuart, A., 1973, The Advanced Theory of Statistics, Vol. 2 (New York: Hafner).
[30] Darlington, R., 1968, Multiple regression in psychological research and practice, Psychological Bulletin, 69, 161–182.
[31] Marquardt, D., 1970, Generalized inverses, ridge regression and biased linear estimation, Technometrics, 12, 591–612.
