the self-weighting model

This article was downloaded by: [UQ Library] On: 05 November 2014, At: 18:34 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Communications in Statistics - Theory and Methods Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20 The Self-Weighting Model Edel Garcia a a Mi Islita.com , Bayamon , Puerto Rico Published online: 12 Mar 2012. To cite this article: Edel Garcia (2012) The Self-Weighting Model, Communications in Statistics - Theory and Methods, 41:8, 1421-1427, DOI: 10.1080/03610926.2011.654037 To link to this article: http://dx.doi.org/10.1080/03610926.2011.654037 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. 
Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions


Communications in Statistics - Theory and Methods, 41: 1421–1427, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1080/03610926.2011.654037

The Self-Weighting Model

EDEL GARCIA

Mi Islita.com, Bayamon, Puerto Rico

In this brief article, we present the Self-Weighting Model (SWM), a new weighting model for statistical analysis. SWM allows within/between-set comparisons, producing estimates with a discriminatory power not found through current weighting strategies. The model is applicable to a wide range of statistical problems for which conditional weighted means are required.

Keywords Meta analysis; Statistical analysis; Weighting strategies.

Mathematics Subject Classification Primary 62-07; Secondary 62G05.

1. Introduction

Pearson's product-moment correlation coefficient, $r$, can be defined as the covariance between two variables $(x, y)$ normalized by their standard deviations (Rodgers and Nicewander, 1988):

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} \tag{1}$$

Although covariances and variances are additive, correlation coefficients and standard deviations are not. Furthermore, it is evident from Eq. (1) that, since the product $s_x s_y$ is specific to a sample, any two $r$ values are dissimilar ratios, and such ratios are not additive. Therefore, computing an arithmetic mean from $k$ correlations, $\bar{r} = (1/k)\sum_{j=1}^{k} r_j$, is not a valid operation.

The purpose of this article is to present a solution to the problem of averaging correlations and other types of statistics frequently encountered in statistical analysis. This is accomplished with a new model for analysis: the Self-Weighting Model.

The discussion is organized as follows. In Sec. 2, a background on previous techniques is presented. In Sec. 3, we derive our model. Next, illustrative examples are described in Sec. 4. In Sec. 5, results from the proposed model are compared against current weighting strategies. Since this is a brief communication, limited results are provided. Finally, in Sec. 6, we present our conclusions and suggest possible areas for future research work.

Received February 8, 2011; Accepted December 27, 2011.
Address correspondence to Edel Garcia, Minerazzi Project, Microsoft Innovation Center, Inter American University of Puerto Rico, Metropolitan Campus; Road 1, Km 16.3; Corner Francisco Sein Street, Rio Piedras 00919; 787-250-1912 X-2027; PR; E-mail: [email protected]

2. Background

To overcome the problem of averaging correlations, several strategies have been proposed. For instance, a vendor of statistical software (Statsoft, 2011) has suggested converting $r$ values into coefficients of determination, $R_j = r_j^2$, which are additive, or into Fisher's Z scores, $Z_j = 0.5 \ln[(1 + r_j)/(1 - r_j)]$, which are also additive. The latter is known as Fisher's Z Transformation (Fisher, 1921). Once either approach is adopted, averages of the form $\bar{R} = (1/k)\sum_{j=1}^{k} R_j$ and $\bar{r} = \sqrt{\bar{R}}$, or of the form $\bar{Z} = (1/k)\sum_{j=1}^{k} Z_j$ and $\bar{r} = (e^{\bar{Z}} - e^{-\bar{Z}})/(e^{\bar{Z}} + e^{-\bar{Z}})$, are computed (Silver and Dunlap, 1987; Zsak, 2006).
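The two averaging routes just described can be sketched in a few lines of Python. This is only an illustration: the function names and the input correlations are hypothetical, and `math.atanh` / `math.tanh` are used because they are numerically the same as the logarithmic and exponential forms of Fisher's transformation and its inverse.

```python
import math

def mean_r_via_determination(rs):
    """Average the coefficients of determination R_j = r_j^2, then take the square root."""
    r_bar_sq = sum(r * r for r in rs) / len(rs)
    return math.sqrt(r_bar_sq)

def mean_r_via_fisher_z(rs):
    """Average Fisher Z scores, then back-transform the mean Z into a correlation."""
    # atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    z_bar = sum(math.atanh(r) for r in rs) / len(rs)
    # tanh(Z) = (e^Z - e^-Z) / (e^Z + e^-Z)
    return math.tanh(z_bar)

rs = [0.2, 0.5, 0.8]  # hypothetical correlations, for illustration only
print(mean_r_via_determination(rs))
print(mean_r_via_fisher_z(rs))
```

Note that the first route discards the sign of each $r_j$ when squaring, so it always returns a non-negative value.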

Other weighting strategies are found in the meta-analysis literature, specifically when correlations are taken for effect sizes. For instance, in Hunter-Schmidt's model (Hunter and Schmidt, 2000), a weighted mean of the form $\bar{r} = \sum_{j=1}^{k} n_j r_j / \sum_{j=1}^{k} n_j$ is computed, while in the Hedges-Olkin fixed effect model (Hedges and Olkin, 1985; Field, 2001), a weighted Z score of the form $\bar{Z} = \sum_{j=1}^{k} (n_j - 3) Z_j / \sum_{j=1}^{k} (n_j - 3)$ is computed and, if needed, transformed back into a correlation score. In both models, $n_j$ is the sample size, which is given in terms of the size of the $x$-$y$ dataset.

In spite of their success, these two meta-analysis models are not free from drawbacks. For instance, if a constant sample size is used during a meta-analysis, these models return the arithmetic means of the corresponding $r$ and $Z$ scores. Since correlations are not additive, studies based on arithmetically averaged correlations can be challenged. As noted by Field (2003), arithmetic means from correlations with mixed signs and of the same sample size can be misleading.
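The reduction to arithmetic means under a constant sample size can be checked numerically. The sketch below assumes hypothetical correlations and a hypothetical common sample size of 25; the function names are illustrative and are not taken from any meta-analysis package.

```python
import math

def hunter_schmidt(rs, ns):
    """Sample-size-weighted mean correlation: sum(n_j * r_j) / sum(n_j)."""
    return sum(n * r for n, r in zip(ns, rs)) / sum(ns)

def hedges_olkin_z(rs, ns):
    """Fixed-effect weighted mean of Fisher Z scores, with weights (n_j - 3)."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    return sum(w * z for w, z in zip(ws, zs)) / sum(ws)

rs = [0.3, -0.4, 0.7]   # hypothetical correlations with mixed signs
ns = [25, 25, 25]       # hypothetical constant sample size

arith_r = sum(rs) / len(rs)                          # plain mean of r
arith_z = sum(math.atanh(r) for r in rs) / len(rs)   # plain mean of Z

print(hunter_schmidt(rs, ns), arith_r)
print(hedges_olkin_z(rs, ns), arith_z)
```

With equal $n_j$, the weights cancel from numerator and denominator, which is why each pair of printed values agrees.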

Moreover, Zimmerman et al. (2003) showed that arbitrarily applying Fisher's Z Transformation to correlations, especially from distributions that violate bivariate normality, can lead to spurious results. To overcome all these drawbacks, we propose a new weighting strategy that we call the Self-Weighting Model (SWM).

3. Derivation

3.1. Initial Approach

The procedure adopted in SWM is a straightforward one. First, the statistic to be averaged and its own constituent statistical terms are identified. Next, local and global weights are constructed from constituent terms. These weights are then used to compute weighted means.

To illustrate, consider Eq. (1). Pearson's $r$ is a statistic that consists of three constituent statistics: $\mathrm{cov}_{xy}$, $s_x$, and $s_y$. Let us denote this by writing $m = 3$. Therefore, there are at least $2^m - 1 = 7$ ways of defining a local weight, $w$, from the constituent statistics of $r$. The possible weights that multiply $r$ are: $s_y$, $s_x$, $s_x s_y$, $1/\mathrm{cov}_{xy}$, $s_y/\mathrm{cov}_{xy}$, $s_x/\mathrm{cov}_{xy}$, and $(s_x s_y)/\mathrm{cov}_{xy} = 1/r$. Similarly, there are at least $2^m - 1$ ways of computing a global weight, $g$, from a set of $r$ values and, theoretically, an equal number of weighted means that can be constructed from this set.
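One way to see where the count $2^m - 1 = 7$ comes from is to treat the seven weights listed above as the non-empty products of the three factors $s_x$, $s_y$, and $1/\mathrm{cov}_{xy}$. This reading is an assumption made for illustration (the text only states the count and lists the weights); under it, the enumeration can be sketched as:

```python
from itertools import combinations

# Assumed building blocks of a local weight for r = cov_xy / (s_x * s_y)
factors = ["s_x", "s_y", "1/cov_xy"]
m = len(factors)

# Every non-empty subset of the factors, written as a product
weights = [
    " * ".join(subset)
    for size in range(1, m + 1)
    for subset in combinations(factors, size)
]

for w in weights:
    print(w)
print(len(weights))  # 2**m - 1 candidates
```

The last candidate, the product of all three factors, is the weight $(s_x s_y)/\mathrm{cov}_{xy} = 1/r$.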

Let us assume for now that $w = s_y$. Multiplying Eq. (1) by this weight, squaring the result, summing over $j = 1, \ldots, k$, multiplying by a global weight [defined in this example as $g = 1/\sum_{j=1}^{k} (w_j)^2$], and taking the square root of the result leads to a weighted mean that turns out to be equivalent to a root mean square (rms) ratio,

$$\bar{r} = \left[\frac{\sum_{j=1}^{k} (s_{y_j})^2 (r_j)^2}{\sum_{j=1}^{k} (s_{y_j})^2}\right]^{1/2} = \frac{\sqrt{\sum_{j=1}^{k} (s_{y_j})^2 (r_j)^2 / k}}{\sqrt{\sum_{j=1}^{k} (s_{y_j})^2 / k}}. \tag{2}$$

Applying a similar procedure to a coefficient of variation of the form $cv_x = s_x/\bar{x}$, where $w = \bar{x}$, it can be demonstrated that

$$\overline{cv}_x = \left[\frac{\sum_{j=1}^{k} (s_{x_j})^2}{\sum_{j=1}^{k} (\bar{x}_j)^2}\right]^{1/2} = \frac{\sqrt{\sum_{j=1}^{k} (s_{x_j})^2 / k}}{\sqrt{\sum_{j=1}^{k} (\bar{x}_j)^2 / k}} \tag{3}$$

which also reduces to an rms ratio. At first glance, this may look like an arbitrary heuristic approach. Therefore, a formal description of the model is presented in the following sections.
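The rms-ratio forms of Eqs. (2) and (3) can be verified numerically. The following is a minimal sketch with hypothetical input values; the helper names `rms`, `self_weighted_r`, and `self_weighted_cv` are illustrative only.

```python
import math

def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def self_weighted_r(s_y, r):
    """Left-hand form of Eq. (2): local weight w = s_y, power 2."""
    num = sum((s * rj) ** 2 for s, rj in zip(s_y, r))
    den = sum(s ** 2 for s in s_y)
    return math.sqrt(num / den)

def self_weighted_cv(s_x, x_bar):
    """Left-hand form of Eq. (3): local weight w = x_bar, power 2."""
    return math.sqrt(sum(s ** 2 for s in s_x) / sum(x ** 2 for x in x_bar))

# Hypothetical sample statistics, for illustration only
s_y = [2.0, 0.5, 1.5]
r = [0.9, 0.3, -0.6]
s_x = [1.2, 0.8, 2.0]
x_bar = [10.0, 8.0, 12.5]

# Right-hand (rms ratio) forms of the same equations
r_as_rms_ratio = rms([s * rj for s, rj in zip(s_y, r)]) / rms(s_y)
cv_as_rms_ratio = rms(s_x) / rms(x_bar)

print(self_weighted_r(s_y, r), r_as_rms_ratio)
print(self_weighted_cv(s_x, x_bar), cv_as_rms_ratio)
```

Because each term is squared, the sign of an individual $r_j$ does not affect the result in this $p = 2$ case.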

3.2. Systematic Derivation of the Model

Let $S = \{Y_1, Y_2, \ldots, Y_k\}$ be a set of $k$ statistics, where $Y_j$ consists of $m$ constituent statistics,

$$Y_j = f(X_{1j}, X_{2j}, \ldots, X_{mj}). \tag{4}$$

Assume that $w_j$ is a local weight defined in terms of the constituent statistics of $Y_j$. Multiplying Eq. (4) by $w_j$,

$$w_j Y_j = w_j f(X_{1j}, X_{2j}, \ldots, X_{mj}). \tag{5}$$

Raising the result to a power $p$ and summing over $j = 1, \ldots, k$,

$$\sum_{j=1}^{k} (w_j)^p (Y_j)^p = \sum_{j=1}^{k} (w_j)^p \left[f(X_{1j}, X_{2j}, \ldots, X_{mj})\right]^p. \tag{6}$$

Assume now that $g = 1/\sum_{j=1}^{k} (w_j)^p$ defines a global weight over a given set, $S$. Since any two sets, $S_1$ and $S_2$, have their own global weights, incorporating these weights into the model should allow comparisons between the sets. So, multiplying Eq. (6) by $g = 1/\sum_{j=1}^{k} (w_j)^p$ and taking the $p$th root leads to a statistic that we shall call the self-weighted power mean, $M(p, w)$,

$$M(p, w) = \left[\frac{\sum_{j=1}^{k} (w_j)^p (Y_j)^p}{\sum_{j=1}^{k} (w_j)^p}\right]^{1/p} = \left\{\frac{\sum_{j=1}^{k} (w_j)^p \left[f(X_{1j}, X_{2j}, \ldots, X_{mj})\right]^p}{\sum_{j=1}^{k} (w_j)^p}\right\}^{1/p}, \tag{7}$$

where

$$c_j = \frac{(w_j)^p (Y_j)^p}{\sum_{j=1}^{k} (w_j)^p (Y_j)^p} \tag{8}$$

is the contribution of each weighted $Y$ to the $M(p, w)$ of a given set, $S$. Thus, within-set comparisons are also possible. Several examples are provided in the next section.
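Eqs. (7) and (8) translate almost directly into code. The sketch below is one possible reading, not a reference implementation: the function names are hypothetical, and it assumes positive weights and statistics so that the $p$th root is well defined for any $p$.

```python
def self_weighted_power_mean(weights, stats, p):
    """M(p, w) of Eq. (7): [sum (w_j Y_j)^p / sum (w_j)^p]^(1/p)."""
    num = sum((w * y) ** p for w, y in zip(weights, stats))
    den = sum(w ** p for w in weights)
    return (num / den) ** (1.0 / p)

def contributions(weights, stats, p):
    """c_j of Eq. (8): share of each weighted term in the numerator of M(p, w)."""
    terms = [(w * y) ** p for w, y in zip(weights, stats)]
    total = sum(terms)
    return [t / total for t in terms]

# Hypothetical local weights and statistics, for illustration only
w = [1.0, 2.0]
y = [0.5, 0.25]

print(self_weighted_power_mean(w, y, p=1))  # (0.5 + 0.5) / 3
print(contributions(w, y, p=1))             # both weighted terms equal 0.5
```

By construction the contributions $c_j$ sum to one, which is what makes them usable for within-set comparisons.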

4. Illustrative Examples

Tables 1 and 2 list the family of $M(p, w)$ expressions that can be derived for a Pearson correlation coefficient and a coefficient of variation.

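When $p = 1$, a family of self-weighted means is obtained. When $p = 2$, the result is a family of self-weighted means that, as demonstrated before, are equivalent to rms ratios. When $p = 3$, a family of asymmetric estimates (in theory, applicable to datasets with mixed signs) is obtained, and so on.

Table 1. The $M(p, w)$ family of Pearson's correlation, where $Y = r = \mathrm{cov}_{xy}/(s_x s_y)$

| $w$ | $M(p, w)$ |
|---|---|
| $s_y$ | $\left[\frac{\sum_{j=1}^{k} (s_{y_j})^p (r_j)^p}{\sum_{j=1}^{k} (s_{y_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (\mathrm{cov}_{xy_j}/s_{x_j})^p}{\sum_{j=1}^{k} (s_{y_j})^p}\right]^{1/p}$ |
| $s_x$ | $\left[\frac{\sum_{j=1}^{k} (s_{x_j})^p (r_j)^p}{\sum_{j=1}^{k} (s_{x_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (\mathrm{cov}_{xy_j}/s_{y_j})^p}{\sum_{j=1}^{k} (s_{x_j})^p}\right]^{1/p}$ |
| $s_x s_y$ | $\left[\frac{\sum_{j=1}^{k} (s_{x_j} s_{y_j})^p (r_j)^p}{\sum_{j=1}^{k} (s_{x_j} s_{y_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (\mathrm{cov}_{xy_j})^p}{\sum_{j=1}^{k} (s_{x_j} s_{y_j})^p}\right]^{1/p}$ |
| $1/\mathrm{cov}_{xy}$ | $\left[\frac{\sum_{j=1}^{k} (1/\mathrm{cov}_{xy_j})^p (r_j)^p}{\sum_{j=1}^{k} (1/\mathrm{cov}_{xy_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (1/s_{x_j})^p (1/s_{y_j})^p}{\sum_{j=1}^{k} (1/\mathrm{cov}_{xy_j})^p}\right]^{1/p}$ |
| $s_y/\mathrm{cov}_{xy}$ | $\left[\frac{\sum_{j=1}^{k} (s_{y_j}/\mathrm{cov}_{xy_j})^p (r_j)^p}{\sum_{j=1}^{k} (s_{y_j}/\mathrm{cov}_{xy_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (1/s_{x_j})^p}{\sum_{j=1}^{k} (s_{y_j}/\mathrm{cov}_{xy_j})^p}\right]^{1/p}$ |
| $s_x/\mathrm{cov}_{xy}$ | $\left[\frac{\sum_{j=1}^{k} (s_{x_j}/\mathrm{cov}_{xy_j})^p (r_j)^p}{\sum_{j=1}^{k} (s_{x_j}/\mathrm{cov}_{xy_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (1/s_{y_j})^p}{\sum_{j=1}^{k} (s_{x_j}/\mathrm{cov}_{xy_j})^p}\right]^{1/p}$ |
| $1/r$ | $\left[\frac{\sum_{j=1}^{k} (1/r_j)^p (r_j)^p}{\sum_{j=1}^{k} (1/r_j)^p}\right]^{1/p} = \left[\frac{k}{\sum_{j=1}^{k} (1/r_j)^p}\right]^{1/p}$ |

Table 2. The $M(p, w)$ family of a coefficient of variation, where $Y = cv_x = s_x/\bar{x}$

| $w$ | $M(p, w)$ |
|---|---|
| $\bar{x}$ | $\left[\frac{\sum_{j=1}^{k} (\bar{x}_j)^p (cv_{x_j})^p}{\sum_{j=1}^{k} (\bar{x}_j)^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (s_{x_j})^p}{\sum_{j=1}^{k} (\bar{x}_j)^p}\right]^{1/p}$ |
| $1/s_x$ | $\left[\frac{\sum_{j=1}^{k} (1/s_{x_j})^p (cv_{x_j})^p}{\sum_{j=1}^{k} (1/s_{x_j})^p}\right]^{1/p} = \left[\frac{\sum_{j=1}^{k} (1/\bar{x}_j)^p}{\sum_{j=1}^{k} (1/s_{x_j})^p}\right]^{1/p}$ |
| $1/cv_x$ | $\left[\frac{\sum_{j=1}^{k} (1/cv_{x_j})^p (cv_{x_j})^p}{\sum_{j=1}^{k} (1/cv_{x_j})^p}\right]^{1/p} = \left[\frac{k}{\sum_{j=1}^{k} (1/cv_{x_j})^p}\right]^{1/p}$ |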

It can also be demonstrated that if the $k$ values $w_j Y_j$ form a vector, $\mathbf{wY}$, with $p$-norm equal to $\left[\sum_{j=1}^{k} (w_j Y_j)^p\right]^{1/p}$, and the $k$ values $w_j$ form another vector, $\mathbf{w}$, with $p$-norm equal to $\left[\sum_{j=1}^{k} (w_j)^p\right]^{1/p}$, then when $p = 2$, $M(p, w)$ is equivalent to an $L_2$ norm ratio. Similarly, when $p = 1$ and all $w_j Y_j$ and $w_j$ are real positive quantities, $M(p, w)$ is equivalent to an $L_1$ norm ratio; see Eqs. (2), (3), and (7).

4.1. Unfeasible Solutions

Depending on the nature of the statistics involved, some self-weighted power means might not be statistically feasible. For example, in Tables 1 and 2 the first two $M(p, w)$ expressions are valid solutions when $p = 2$, but not when $p = 1$. The reason is that when $p = 1$ these expressions involve additions of standard deviations, which are not additive quantities. Thus, before modeling with SWM, one must pay attention to the nature of the statistical constituents to be used as the building blocks of candidate $M(p, w)$ expressions.

5. Applications

A practical application of our model follows. Since this is a brief communication, the discussion is limited to the first $M(p, w)$ expression of Table 1.

Setting $p = 2$ and squaring the result yields an expression equivalent to the weighted average coefficient of determination derived by Faller (1981, 1982) and Glahn (1982), i.e.,

$$\bar{r}^2 = \frac{\sum_{j=1}^{k} (s_{y_j})^2 (r_j)^2}{\sum_{j=1}^{k} (s_{y_j})^2}. \tag{9}$$

To understand how SWM compares with the weighting strategies mentioned earlier, consider the two sets, $S_1$ and $S_2$, given in Table 3 and adapted from Faller's article. Assume that all four samples are of the same size. Hunter-Schmidt's model reduces to computing an arithmetic mean correlation of 0.64 for both sets. Similarly, Hedges-Olkin's fixed effect model returns a mean Z score of 1.07, which, when Z-to-r transformed, returns a correlation of 0.79 for both sets. That is, both meta-analysis models fail to discriminate between the two sets.

Computing an average correlation of the form $\bar{r} = \sqrt{\bar{R}}$, where $\bar{R} = (1/k)\sum_{j=1}^{k} R_j$ and $R_j = r_j^2$, as suggested by Statsoft (2011), would not help either, since this approach returns $\bar{R} = 0.50$ and $\bar{r} = 0.71$ for both sets. The reason why all these weighting strategies fail to discriminate between the sets can be ascribed to the fact that they do not incorporate variability information present in the original datasets.
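The failure of the three strategies to separate $S_1$ from $S_2$ can be reproduced from the $r_j^2$ values of Table 3. The sketch below takes $r_j$ as the positive square root of $r_j^2$ and uses a hypothetical common sample size of 30; any constant size above 3 gives the same outcome because the weights cancel.

```python
import math

# r_j^2 values for the two sets, as listed in Table 3
sets = {"S1": [0.90, 0.10], "S2": [0.10, 0.90]}
n = 30  # hypothetical common sample size (assumption, not from the article)

results = {}
for name, r_sq in sets.items():
    rs = [math.sqrt(v) for v in r_sq]
    # Hunter-Schmidt weighted mean correlation
    hs = sum(n * r for r in rs) / (n * len(rs))
    # Hedges-Olkin weighted mean Z and its back-transformed correlation
    z_bar = sum((n - 3) * math.atanh(r) for r in rs) / ((n - 3) * len(rs))
    ho = math.tanh(z_bar)
    # Averaged coefficient of determination route
    r_det = math.sqrt(sum(r_sq) / len(r_sq))
    results[name] = (hs, z_bar, ho, r_det)
    print(name, [round(v, 2) for v in results[name]])
```

Both sets yield the same four numbers. With unrounded inputs the Hunter-Schmidt mean comes out near 0.63 rather than the 0.64 quoted in the text; the small gap appears to come from averaging the rounded correlations 0.95 and 0.32.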

By contrast, SWM incorporates the missing piece of information through the local and global weights. Thus, in Table 3, SWM returns a self-weighted mean correlation of 0.91 for $S_1$ and of 0.42 for $S_2$. Therefore, between-set comparisons are possible.

Within-set comparisons are also possible. In $S_1$, $c_1 = 0.99$ and $c_2 = 0.01$, meaning that the first sample influences the self-weighted mean correlation of the set more than the second sample. In $S_2$, the difference in contributions from individual samples to the self-weighted mean is now smaller: $c_1 = 0.53$ vs. $c_2 = 0.47$. These results agree with those of Faller (1981). Accordingly, Faller's transformation is a particular solution of the SWM framework.

Table 3. SWM vs. meta-analysis results for two sets of correlation coefficients at the same sample size level. SWM columns use $M(p, w) = \bar{r}$ with $p = 2$ and $w = s_y$; the Hunter-Schmidt column uses $\bar{r} = \sum_{j=1}^{k} n_j r_j / \sum_{j=1}^{k} n_j$; the Hedges-Olkin columns use $\bar{Z} = \sum_{j=1}^{k} (n_j - 3) Z_j / \sum_{j=1}^{k} (n_j - 3)$.

| Set | $j$ | $s_{y_j}^2$ | $r_j^2$ | $r_j$ | $Z_j$ | SWM $c_j$ | SWM $\bar{r}^2$ | SWM $\bar{r}$ | Hunter-Schmidt $\bar{r}$ | Hedges-Olkin $\bar{Z}$ | Hedges-Olkin $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $S_1$ | 1 | 1.00 | 0.90 | 0.95 | 1.82 | 0.99 | 0.83 | 0.91 | 0.64 | 1.07 | 0.79 |
| $S_1$ | 2 | 0.10 | 0.10 | 0.32 | 0.33 | 0.01 | | | | | |
| $S_2$ | 1 | 1.00 | 0.10 | 0.32 | 0.33 | 0.53 | 0.17 | 0.42 | 0.64 | 1.07 | 0.79 |
| $S_2$ | 2 | 0.10 | 0.90 | 0.95 | 1.82 | 0.47 | | | | | |
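The SWM columns of Table 3 follow from Eqs. (8) and (9) applied to the tabulated $s_{y_j}^2$ and $r_j^2$ values. A minimal sketch, with hypothetical function and variable names:

```python
import math

# (s_yj)^2 and (r_j)^2 for each set, as listed in Table 3
table3 = {
    "S1": {"s_y_sq": [1.00, 0.10], "r_sq": [0.90, 0.10]},
    "S2": {"s_y_sq": [1.00, 0.10], "r_sq": [0.10, 0.90]},
}

def swm_p2(s_y_sq, r_sq):
    """Self-weighted mean correlation for p = 2 and w = s_y, plus contributions."""
    terms = [s * r for s, r in zip(s_y_sq, r_sq)]  # (s_yj)^2 (r_j)^2
    r_bar_sq = sum(terms) / sum(s_y_sq)            # Eq. (9)
    c = [t / sum(terms) for t in terms]            # Eq. (8)
    return math.sqrt(r_bar_sq), r_bar_sq, c

for name, data in table3.items():
    r_bar, r_bar_sq, c = swm_p2(**data)
    print(name, round(r_bar, 2), round(r_bar_sq, 2), [round(v, 2) for v in c])
# S1 0.91 0.83 [0.99, 0.01]
# S2 0.42 0.17 [0.53, 0.47]
```

Swapping which sample carries the large variance changes both the mean and the contributions, which is the discriminating behavior described above.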

6. Conclusion

The Self-Weighting Model and a new measure, the self-weighted power mean, $M(p, w)$, have been presented and compared with current meta-analysis models. We have shown that the Fisher, Hunter-Schmidt, and Hedges-Olkin transformations can fail to discriminate between correlation cases. Furthermore, when a constant sample size is used, Hunter-Schmidt's model returns an arithmetic average. Studies based on such averages can be challenged on the grounds that correlations are not additive and, therefore, that the corresponding means are invalid statistics.

The model herein presented is not limited to correlation coefficients or coefficients of variation. In our opinion, the model can be incorporated into current meta-analysis strategies and software as a new discriminatory layer for statistical modeling. We are currently testing the model with business and risk data and with multivariate statistics wherein matrices can be populated with $M(p, w)$ scores.

If $p = 1$ and all $w_j Y_j$ and $w_j$ are real positive quantities, $M(p, w)$ is equivalent to an $L_1$ norm ratio. On the other hand, when $p = 2$, $M(p, w)$ is equivalent to an rms ratio and an $L_2$ norm ratio. Establishing the meaning of $M(p, w)$ for higher $p$ values requires further studies. During revision of this article, we realized from the last row of Tables 1 and 2 that when $p = 1$, $w_j = 1/Y_j$, and $\sum_{j=1}^{k} (w_j)^p (Y_j)^p = k$, the $M(p, w)$ statistic reduces to the harmonic mean, a statistic that arises frequently in engineering and science. All these findings suggest that SWM could be used as a framework for a broad range of engineering, science, and data mining problems, not necessarily involving the constituent statistics herein discussed.
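The harmonic-mean special case can be confirmed against the Python standard library. The input values below are hypothetical, and `m_p1_inverse_weight` is an illustrative name for $M(p, w)$ evaluated at $p = 1$ with $w_j = 1/Y_j$.

```python
from statistics import harmonic_mean

def m_p1_inverse_weight(ys):
    """M(p, w) with p = 1 and local weights w_j = 1 / Y_j."""
    ws = [1.0 / y for y in ys]
    num = sum(w * y for w, y in zip(ws, ys))  # each term is 1, so this equals k
    den = sum(ws)                             # sum of reciprocals
    return num / den

ys = [0.2, 0.5, 0.8]  # hypothetical positive statistics
print(m_p1_inverse_weight(ys), harmonic_mean(ys))
```

The numerator collapses to $k$, leaving $k / \sum_{j} (1/Y_j)$, which is the definition of the harmonic mean.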

References

Faller, A. J. (1981). An average correlation coefficient. J. Appl. Meteorol. 20:203–205.

Faller, A. J. (1982). Reply. J. Appl. Meteorol. 21:1203–1205.

Field, A. P. (2001). Meta-analysis of correlation coefficients: A Monte Carlo comparison of fixed- and random-effects methods. Psycholog. Meth. 6-2:161–180.

Field, A. P. (2003). Can meta-analysis be trusted? Psychologist 16:642–645.

Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3–32.

Glahn, H. R. (1982). Comments on An average correlation coefficient. J. Appl. Meteorol. 21:1202–1203.

Hedges, L. V., Olkin, I. (1985). Statistical Methods for Meta-analysis. Orlando: Academic Press.

Hunter, J. E., Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models: Implications for cumulative knowledge in psychology. Int. J. Select. Assess. 8:275–292.

Rodgers, J. L., Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. Amer. Statistician 42(1):59–66.

Silver, N. C., Dunlap, W. P. (1987). Averaging correlation coefficients: should Fisher's Z Transformation be used? J. Appl. Psychol. 72(1):146–148.

Statsoft Textbook. (2011). Basic Statistics. Retrieved from http://www.statsoft.com/textbook/basic-statistics/#Correlationso

Zimmerman, D. W., Zumbo, B. D., Williams, R. H. (2003). Bias in estimation and hypothesis testing of correlation. Psicológica 24:133–158.

Zsak, M. I. (2006). Decision Support for Energy Technology Investments in Built Environment. Master thesis, p. 54 and Appendix 5. Norwegian University of Science and Technology. Retrieved from http://www.iot.ntnu.no/users/fleten/students/tidligere_veiledning/Zsak_V06.pdf
