reliability, validity, generalizability and the use of multi-item scales

Reliability, validity, generalizability and the use of multi-item scales

Edward Shiu (Dept of Marketing)

edward.shiu@strath.ac.uk

Reliable? Valid?

Generalizable?

Multi-item scales

How to use a questionnaire from published work

• Appendix with items

• Methodology section

Existing multi-item scales

• Used by many

• Reliability and validity may be known

• Good starting block

• Basis to compare / contrast results

Development of a Multi-item Scale (Doing it the HARD way!! See Malhotra & Birks, 2007)

1. Develop Theory

2. Generate Initial Pool of Items: Theory, Secondary Data, and Qualitative Research

3. Select a Reduced Set of Items Based on Qualitative Judgment

4. Collect Data from a Large Pretest Sample

5. Statistical Analysis

6. Develop Purified Scale

7. Collect More Data from a Different Sample

8. Evaluate Scale Reliability, Validity, and Generalizability

9. Final Scale

Example of Scale Development

• See Richins & Dawson (1992) “A Consumer Values Orientation for Materialism and its Measurement: Scale Development and Validation,” Journal of Consumer Research, 19 (December), 303-316.

• Materialism scale (7 items)

– Marketing Scales Handbook (Vol IV) p. 352.

1. It is important to me to have really nice things.

2. I would like to be rich enough to buy anything I want.

3. I’d be happier if I could afford to buy more things.

4. ......

• Note, published scales not always perfect!!!

Scale Evaluation (See Malhotra & Birks, 2007)

• Reliability

– Test/Retest

– Alternative Forms

– Internal Consistency

• Validity

– Content

– Criterion

– Construct: Convergent, Discriminant, Nomological

• Generalizability

Scale Evaluation

Reliability & Validity

• Reliability: the extent to which a measuring procedure yields consistent results on repeated administrations of the scale

• Validity: the degree to which a measuring procedure accurately reflects, assesses or captures the specific concept that the researcher is attempting to measure


Reliability

• Internal consistency reliability

DO THE ITEMS IN THE SCALE GEL WELL TOGETHER?

• Split-half reliability: the items on the scale are divided into two halves and the resulting half scores are correlated

• Cronbach alpha (α)

– average of all possible 'split-half' correlation coefficients resulting from different ways of splitting the scale items

– value normally varies from 0 to 1 (it can turn negative if the items are, on average, negatively correlated)

– α < 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p.358)

– Note: alpha tends to increase with an increase in the number of items in the scale
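As a rough illustration of what the alpha coefficient measures, the sketch below computes it with the usual variance-based formula, α = k/(k−1) × (1 − Σ item variances / variance of the summed scale). The function name `cronbach_alpha` and the response data are made up for this example and are not taken from COCB.sav or from SPSS.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                    # number of items
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers: 6 respondents x 4 items on a 1-5 scale (illustrative only)
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

Because the made-up items move together closely, the resulting alpha comfortably exceeds the 0.6 threshold quoted above.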

• Test-retest reliability: identical scale items administered at two different times to the same set of respondents

– assess (via correlation) if respondents give similar answers

• Alternative-forms reliability: two equivalent forms of the scale are constructed

– same respondents are measured at two different times, with a different form being used each time

– assess (via correlation) if respondents give similar answers

– Note: hardly ever practical
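The "assess via correlation" step used for both test-retest and alternative-forms reliability can be sketched as a plain Pearson correlation between the two sets of scale scores. The helper `pearson` and the scores below are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical summed scale scores for the same 6 respondents (illustrative only)
time_1 = [17, 9, 19, 13, 6, 17]   # first administration
time_2 = [16, 10, 18, 14, 7, 15]  # second administration (or alternative form)

r = pearson(time_1, time_2)
print(f"test-retest correlation = {r:.3f}")
```

A correlation close to 1 suggests that respondents gave similar answers on both occasions.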

Construct Validity

• Construct validity is evidenced if we can establish convergent validity, discriminant validity and nomological validity

• Convergent validity: extent to which the scale correlates positively with other measures of the same construct

• Discriminant validity: extent to which the scale does not correlate with other conceptually distinct constructs

• Nomological validity: extent to which the scale correlates in theoretically predicted ways with other distinct but related constructs

• Also read Malhotra & Birks, 2007, 358-359 on content (or face) validity and criterion (concurrent & predictive) validity

Generalizability

• Refers to the extent to which you can generalise from your specific observations to beyond your limited study, situation, items used, method of administration, context.....

• Hardly ever possible!!!

Fun time

• Now onto the data (COCB.sav) !!!!!!

• Read my forthcoming JBR article for background on COCB and the scale

• First, SPSS and Cronbach alpha

• Next, Amos and CFA

• Followed by Excel to calculate composite/construct reliability and AVE, as well as to establish discriminant validity

Cronbach alpha (α)

• SPSS (Analyze…Scale…Reliability Analysis)

• α < 0.6 indicates unsatisfactory internal consistency reliability (see Malhotra & Birks, 2007, p.358)

• α > 0.7 indicates satisfactory internal consistency reliability (Nunnally & Bernstein, 1994)

Ref: Nunnally JC & Bernstein IH. (1994) Psychometric Theory. New York: McGraw-Hill.

SPSS output for α

Alpha value for dimension Credibility = 0.894 > 0.7 hence satisfactory

SPSS further output for α

• We note that alpha value for the Credibility dimension would increase in value (from 0.894 to 0.902) if item cred4 is removed.

• However, unless the improvement is dramatic AND there are separate reasons (e.g. similar findings from other studies), we should leave the item as part of the dimension.
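The "alpha if item deleted" idea can be sketched by recomputing alpha with each item left out in turn. This is only an illustration of the logic, not SPSS's own routine; the function names, item labels and data are hypothetical (item5 is deliberately constructed to fit poorly with the other items).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha via the variance-based formula."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_item_deleted(rows, names):
    """Alpha of the remaining items after dropping each item in turn."""
    result = {}
    for j, name in enumerate(names):
        reduced = [r[:j] + r[j + 1:] for r in rows]  # remove column j
        result[name] = cronbach_alpha(reduced)
    return result

# Hypothetical data: 6 respondents x 5 items (illustrative only)
names = ["item1", "item2", "item3", "item4", "item5"]
responses = [
    [4, 5, 4, 4, 3],
    [2, 2, 3, 2, 4],
    [5, 4, 5, 5, 2],
    [3, 3, 3, 4, 3],
    [1, 2, 1, 2, 3],
    [4, 4, 5, 4, 2],
]

full = cronbach_alpha(responses)
deleted = alpha_if_item_deleted(responses, names)
print(f"alpha (all items) = {full:.3f}")
for name, value in deleted.items():
    print(f"alpha if {name} deleted = {value:.3f}")
```

In this made-up case the gain from dropping item5 is large. A gain as small as the one reported for cred4 (0.894 to 0.902) would not, on its own, justify removing the item.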

Limitations for Cronbach alpha

• We should employ multiple measures of reliability: Cronbach alpha, composite/construct reliability (CR) and Average Variance Extracted (AVE)

– Alpha and CR values are often very similar, but AVEs can differ much more from alpha values

– AVEs are also used to assess construct discriminant validity

Composite/Construct Reliability

• CR = $\dfrac{(\sum_i \lambda_i)^2}{(\sum_i \lambda_i)^2 + \sum_i \varepsilon_i}$, where $\lambda_i$ is the standardized loading of indicator $i$ and $\varepsilon_i$ is its measurement error

• AVE = Average Variance Extracted = Variance Extracted = $\dfrac{\sum_i \lambda_i^2}{\sum_i \lambda_i^2 + \sum_i \varepsilon_i}$

• Note: Recommended thresholds: if CR > 0.6 and AVE > 0.5, construct internal consistency is evidenced (Fornell & Larcker, 1981).

Ref: Fornell, Claes and David G. Larcker (1981). “Evaluating Structural Equation Models with Unobservable Variables and Measurement Error,” Journal of Marketing Research, 18(1, February): 39-50.
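A minimal sketch of the CR and AVE calculations follows. It assumes a standardized solution in which each indicator's measurement error equals 1 minus its squared loading; that assumption, the function name and the loadings are illustrative and do not come from the COCB analysis.

```python
def cr_and_ave(loadings):
    """Composite reliability and AVE from standardized loadings.

    Assumes each indicator's measurement error is 1 - loading**2,
    which holds for a standardized solution.
    """
    errors = [1 - l ** 2 for l in loadings]
    sum_l = sum(loadings)
    sum_l_sq = sum(l ** 2 for l in loadings)
    cr = sum_l ** 2 / (sum_l ** 2 + sum(errors))
    ave = sum_l_sq / (sum_l_sq + sum(errors))
    return cr, ave

# Hypothetical standardized loadings for one construct (illustrative only)
loadings = [0.8, 0.7, 0.9, 0.6]

cr, ave = cr_and_ave(loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
print("meets thresholds:", cr > 0.6 and ave > 0.5)
```

Under this assumption the AVE denominator equals the number of indicators, so AVE is simply the mean squared loading.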

Discriminant validity

• Discriminant validity is assessed by comparing the shared variance (squared correlation) between each pair of constructs against the minimum of the AVEs for these two constructs.

• If, for every possible pair of constructs, the shared variance observed is lower than the minimum of their AVEs, then discriminant validity is evidenced (Fornell and Larcker, 1981).
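The pairwise comparison described above can be sketched as follows. The construct names F1 to F3, the AVEs and the inter-construct correlations are invented for illustration, and `fornell_larcker` is a hypothetical helper rather than part of any package.

```python
from itertools import combinations

def fornell_larcker(ave, corr):
    """Check shared variance (r squared) against the smaller AVE of each pair.

    ave:  dict mapping construct name -> AVE
    corr: dict mapping (name_a, name_b), names in sorted order -> correlation
    Returns a dict mapping each pair to True when the check is passed.
    """
    results = {}
    for a, b in combinations(sorted(ave), 2):
        shared_variance = corr[(a, b)] ** 2
        results[(a, b)] = shared_variance < min(ave[a], ave[b])
    return results

# Hypothetical AVEs and inter-construct correlations (illustrative only)
ave = {"F1": 0.575, "F2": 0.62, "F3": 0.51}
corr = {("F1", "F2"): 0.55, ("F1", "F3"): 0.40, ("F2", "F3"): 0.75}

checks = fornell_larcker(ave, corr)
for pair, ok in checks.items():
    print(pair, "discriminant validity evidenced" if ok else "NOT evidenced")
```

Here the F2/F3 pair fails because its shared variance (0.75 squared) exceeds the smaller of the two AVEs.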

Amos (Analysis of Moment Structures)

[Amos path diagram of the CFA measurement model. Recoverable labels: latent factors Comm and Loyalty; indicators comm1, comm2, bene1 to bene3, cred1 to cred4, ave_SSI, ave_POC, ave_Voice, ave_wom, ave_BAoSF, ave_DoRA, ave_Flex and ave_PiFA, each with its own error term.]

• Rectangles = observed variables

• Ellipses = unobserved variables

• loy1; loy2; loy3; comm1; comm2; ….; cred1; ….; bene1; ....; ave_PiFA = SPSS variables

• e1 to e24 = error variances = uniqueness

• Loyalty; Comm; Cred; Bene; COCB = latent factors = unobserved factors

CFA and goodness of fit

• See Hair et al.’s book

• E.g.,

• The CFA resulted in an acceptable overall fit (GFI=.90, CFI=.94, TLI=.92, RMSEA=.068, and χ²=524.64, df=160, p<.001). All indicators load significantly (p<.001) and substantively (standardized coef >.5) onto their respective constructs, thus providing evidence of convergent validity.

References

• Baumgartner H, Homburg C. (1996). “Applications of structural equation modeling in marketing and consumer research: a review,” International Journal of Research in Marketing, 13(2): 139–61.

• Churchill, Gilbert A., Jr. (1979). “A Paradigm for Developing Better Measures of Marketing Constructs,” Journal of Marketing Research, 16(1, February): 64-73.

• Fornell, Claes and David G. Larcker (1981). “Evaluating Structural Equation Models with Unobservable Variables and Measurement Error,” Journal of Marketing Research, 18(1, February): 39-50.

• Hair, Joseph F., Jr., Rolph E. Anderson, Ronald L. Tatham, and William C. Black (1998). Multivariate Data Analysis. 5th ed. Englewood Cliffs, NJ: Prentice Hall.

• Nunnally JC & Bernstein IH. (1994). Psychometric Theory. New York: McGraw-Hill.
