
The SAGE Dictionary of Statistics

a practical resource for students in the social sciences

Duncan Cramer and Dennis Howitt

SAGE Publications
London ● Thousand Oaks ● New Delhi


© Duncan Cramer and Dennis Howitt 2004

First published 2004

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Inquiries concerning reproduction outside those terms should be sent to the publishers.

SAGE Publications Ltd
1 Oliver's Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B-42, Panchsheel Enclave
Post Box 4109
New Delhi 110 017

British Library Cataloguing in Publication data

A catalogue record for this book is available from the British Library

ISBN 0 7619 4137 1
ISBN 0 7619 4138 X (pbk)

Library of Congress Control Number: 2003115348

Typeset by C&M Digitals (P) Ltd. Printed in Great Britain by The Cromwell Press Ltd, Trowbridge, Wiltshire


Contents

Preface
Some Common Statistical Notation
A to Z
Some Useful Sources


To our mothers – it is not their fault that lexicography took its toll.


Preface

Writing a dictionary of statistics is not many people's idea of fun. And it wasn't ours. Can we say that we have changed our minds about this at all? No. Nevertheless, now the reading and writing is over and those heavy books have gone back to the library, we are glad that we wrote it. Otherwise we would have had to buy it. The dictionary provides a valuable resource for students – and anyone else with too little time on their hands to stack their shelves with scores of specialist statistics textbooks.

Writing a dictionary of statistics is one thing – writing a practical dictionary of statistics is another. The entries had to be useful, not merely accurate. Accuracy is not that useful on its own. One aspect of the practicality of this dictionary is in facilitating the learning of statistical techniques and concepts. The dictionary is not intended to stand alone as a textbook – there are plenty of those. We hope that it will be more important than that. Perhaps only the computer is more useful. Learning statistics is a complex business. Inevitably, students at some stage need to supplement their textbook. A trip to the library or the statistics lecturer's office is daunting. Getting a statistics dictionary from the shelf is the lesser evil. And just look at the statistics textbook next to it – you probably outgrew its usefulness when you finished the first year at university.

Few readers, not even ourselves, will ever use all of the entries in this dictionary. That would be a bit like stamp collecting. Nevertheless, all of the important things are here in a compact and accessible form for when they are needed. No doubt there are omissions but even The Collected Works of Shakespeare leaves out Pygmalion! Let us know of any. And we are not so clever that we will not have made mistakes. Let us know if you spot any of these too – modern publishing methods sometimes allow corrections without a major reprint.

Many of the key terms used to describe statistical concepts are included as entries elsewhere. Where we thought it useful we have suggested other related entries that might be of interest, by listing them at the end of the entry under 'See' or 'See also'. In the main body of the entry itself we have not drawn attention to the terms that are covered elsewhere because we thought this could be too distracting to many readers. If you are unfamiliar with a term we suggest you look it up.

Many of the terms described will be found in introductory textbooks on statistics. We suggest that if you want further information on a particular concept you look it up in a textbook that is ready to hand.


There are a large number of introductory statistics texts that adequately discuss these terms and we would not want you to seek out a particular text that we have selected that is not readily available to you. For the less common terms we have recommended one or more sources for additional reading. The authors and year of publication for these sources are given at the end of the entry and full details of the sources are provided at the end of the book. As we have discussed some of these terms in texts that we have written, we have sometimes recommended our own texts!

The key features of the dictionary are:

• Compact and detailed descriptions of key concepts.
• Basic mathematical concepts explained.
• Details of procedures for hand calculations if possible.
• Difficulty level matched to the nature of the entry: very fundamental concepts are the most simply explained; more advanced statistics are given a slightly more sophisticated treatment.
• Practical advice to help guide users through some of the difficulties of the application of statistics.
• Exceptionally wide coverage and varied range of concepts, issues and procedures – wider than any single textbook by far.
• Coverage of relevant research methods.
• Compatible with standard statistical packages.
• Extensive cross-referencing.
• Useful additional reading.

One good thing, we guess, is that since this statistics dictionary would be hard to distinguish from a two-author encyclopaedia of statistics, we will not need to write one ourselves.

Duncan Cramer
Dennis Howitt


Some Common Statistical Notation

Roman letter symbols or abbreviations:

a       constant
df      degrees of freedom
F       F test
log n   natural or Napierian logarithm
M       arithmetic mean
MS      mean square
n or N  number of cases in a sample
p       probability
r       Pearson's correlation coefficient
R       multiple correlation
SD      standard deviation
SS      sum of squares
t       t test

Greek letter symbols:

α (lower case alpha)   Cronbach's alpha reliability, significance level or alpha error
β (lower case beta)    regression coefficient, beta error
γ (lower case gamma)
δ (lower case delta)
η (lower case eta)
κ (lower case kappa)
λ (lower case lambda)
ρ (lower case rho)
τ (lower case tau)
φ (lower case phi)
χ (lower case chi)


Some common mathematical symbols:

Σ   sum of
∞   infinity
=   equal to
<   less than
≤   less than or equal to
>   greater than
≥   greater than or equal to
√   square root


A

a posteriori tests: see post hoc tests

a priori comparisons or tests: where there are three or more means that may be compared (e.g. analysis of variance with three groups), one strategy is to plan the analysis in advance of collecting the data (or examining them). So, in this context, a priori means before the data analysis. (Obviously this would only apply if the researcher was not the data collector; otherwise it is in advance of collecting the data.) This is important because the process of deciding what groups are to be compared should be on the basis of the hypotheses underlying the planning of the research. By definition, this implies that the researcher is generally uninterested in general or trivial aspects of the data which are not the researcher's primary focus. As a consequence, just a few of the possible comparisons need to be made, as these contain the crucial information relevant to the researcher's interests. Table A.1 involves a simple ANOVA design in which there are four conditions – two are drug treatments and two are control conditions. There are two control conditions because in one case the placebo tablet is for drug A and in the other case the placebo tablet is for drug B.

An appropriate a priori comparison strategy in this case would be:

• Mean_a against Mean_b
• Mean_a against Mean_c
• Mean_b against Mean_d

Notice that this is fewer than the maximum number of comparisons that could be made (a total of six). This is because the researcher has ignored issues which perhaps are of little practical concern in terms of evaluating the effectiveness of the different drugs. For example, comparing placebo control A with placebo control B answers questions about the relative effectiveness of the placebo conditions but has no bearing on which drug is the most effective overall.

The a priori approach needs to be compared with perhaps the more typical alternative research scenario – post hoc comparisons. The latter involves an unplanned analysis of the data following their collection. While this may be a perfectly adequate process, it is nevertheless far less clearly linked with the established priorities of the research than a priori comparisons. In post hoc testing, there tends to be an exhaustive examination of all of the possible pairs of means – so in the example in Table A.1 all four means would be compared with each other in pairs. This gives a total of six different comparisons.

In a priori testing, it is not necessary to carry out the overall ANOVA since this merely tests whether there are differences across the various means.

Table A.1 A simple ANOVA design

            Drug A      Drug B      Placebo control A    Placebo control B
            Mean_a =    Mean_b =    Mean_c =             Mean_d =


In these circumstances, failure of some means to differ from the others may produce non-significant findings due to conditions which are of little or no interest to the researcher. In a priori testing, the number of comparisons to be made has been limited to a small number of key comparisons. It is generally accepted that if there are relatively few a priori comparisons to be made, no adjustment is needed for the number of comparisons made. One rule of thumb is that if the comparisons are fewer in total than the degrees of freedom for the main effect minus one, it is perfectly appropriate to compare means without adjustment for the number of comparisons.

Contrasts are examined in a priori testing. This is a system of weighting the means in order to obtain the appropriate mean difference when comparing two means. One mean is weighted (multiplied by) +1 and the other is weighted −1. The other means are weighted 0. The consequence of this is that the two key means are responsible for the mean difference. The other means (those not of interest) become zero and hence cannot influence the mean difference.
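The weighting scheme can be made concrete with a short Python sketch (not from the original entry; the group means are invented for illustration). It computes the contrast comparing Mean_a with Mean_c in the design of Table A.1:

# Hypothetical group means for the four conditions in Table A.1
means = {"drug_a": 7.2, "drug_b": 6.1, "placebo_a": 4.0, "placebo_b": 4.3}

# Contrast weights: +1 and -1 pick out the two means being compared,
# 0 removes the remaining means from the comparison
weights = {"drug_a": 1, "drug_b": 0, "placebo_a": -1, "placebo_b": 0}

contrast = sum(weights[group] * means[group] for group in means)
print(round(contrast, 2))  # 3.2, the Mean_a versus Mean_c difference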

There is an elegance and efficiency in the a priori comparison strategy. However, it does require an advanced level of statistical and research sophistication. Consequently, the more exhaustive procedure of the post hoc test (multiple comparisons test) is more familiar in the research literature. See also: analysis of variance; Bonferroni test; contrast; Dunn's test; Dunnett's C test; Dunnett's T3 test; Dunnett's test; Dunn–Sidak multiple comparison test; omnibus test; post hoc tests

abscissa: this is the horizontal or x axis in a graph. See x axis

absolute deviation: this is the difference between one numerical value and another numerical value. Negative values are ignored as we are simply measuring the distance between the two numbers. Most commonly, absolute deviation in statistics is the difference between a score and the mean (or sometimes median) of the set of scores. Thus, the absolute deviation of a score of 9 from the mean of 5 is 4. The absolute deviation of a score of 3 from the mean of 5 is 2 (Figure A.1). One advantage of the absolute deviation over deviation is that the former totals (and averages) for a set of scores to values other than 0.0 and so gives some indication of the variability of the scores. See also: mean deviation; mean, arithmetic

[Figure A.1 Absolute deviations: the absolute deviation of 9 from the mean of 5 is 4; the absolute deviation of 3 from the mean of 5 is 2]
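A minimal Python sketch of these absolute deviations (illustrative scores; only the 9 and the 3 come from the entry):

scores = [9, 3, 5, 7, 1]
mean = sum(scores) / len(scores)               # 5.0

absolute_deviations = [abs(score - mean) for score in scores]
print(absolute_deviations)                     # [4.0, 2.0, 0.0, 2.0, 4.0]
print(sum(absolute_deviations) / len(scores))  # mean absolute deviation = 2.4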

acquiescence or yea-saying response set or style: this is the tendency to agree or to say 'yes' to a series of questions. This tendency is the opposite of disagreeing or saying 'no' to a set of questions, sometimes called a nay-saying response set. If agreeing or saying 'yes' to a series of questions results in a high score on the variable that those questions are measuring, such as being anxious, then a high score on the questions may indicate either greater anxiety or a tendency to agree. To control or to counteract this tendency, half of the questions may be worded in the opposite or reverse way so that if a person has a tendency to agree the tendency will cancel itself out when the two sets of items are combined.
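One common way of handling this in practice is to reverse-score the oppositely worded items before summing. A hypothetical Python sketch (the items, scale and scores are invented, not taken from the entry):

# Responses on a 1-5 agreement scale; items 2 and 4 are worded in reverse
responses = {"item1": 5, "item2": 4, "item3": 5, "item4": 5}
reverse_keyed = {"item2", "item4"}

def recode(item, value, scale_max=5, scale_min=1):
    # Reverse-keyed items are flipped so that a yea-saying pattern
    # no longer inflates the total score
    return (scale_max + scale_min - value) if item in reverse_keyed else value

total = sum(recode(item, value) for item, value in responses.items())
print(total)  # 5 + 2 + 5 + 1 = 13 rather than 19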

adding: see negative values


addition rule: a simple principle of probability theory is that the probability of either of two different outcomes occurring is the sum of the separate probabilities for those two different events (Figure A.2). So, the probability of a die landing 3 is 1 divided by 6 (i.e. 0.167) and the probability of a die landing 5 is 1 divided by 6 (i.e. 0.167 again). The probability of getting either a 3 or a 5 when tossing a die is the sum of the two separate probabilities (i.e. 0.167 + 0.167 = 0.333). Of course, the probability of getting any of the numbers from 1 to 6 spots is 1.0 (i.e. the sum of six probabilities of 0.167).

[Figure A.2 Demonstrating the addition rule for the simple case of either heads or tails when tossing a coin: probability of head = 0.5, probability of tail = 0.5, so the probability of head or tail is 0.5 + 0.5 = 1]
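The rule translates directly into a couple of lines of Python (purely illustrative; fractions are used so the arithmetic is exact):

from fractions import Fraction

p_single_outcome = Fraction(1, 6)          # probability of any one face of a die

# Probability of a 3 or a 5: add the two separate probabilities
print(p_single_outcome + p_single_outcome)  # 1/3
# All six mutually exclusive outcomes together
print(6 * p_single_outcome)                 # 1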

adjusted means, analysis of covariance: see analysis of covariance

agglomeration schedule: a table that shows which variables or clusters of variables are paired together at different stages of a cluster analysis. See cluster analysis

Cramer (2003)

algebra: in algebra numbers are represented as letters and other symbols when giving equations or formulae. Algebra therefore is the basis of statistical equations. So a typical example is the formula for the mean:

m = ΣX / N

In this m stands for the numerical value of the mean, X is the numerical value of a score, N is the number of scores and Σ is the symbol indicating in this case that all of the scores under consideration should be added together.

One difficulty in statistics is that there is a degree of inconsistency in the use of the symbols for different things. So generally speaking, if a formula is used it is important to indicate what you mean by the letters in a separate key.

algorithm: this is a set of steps which describe the process of doing a particular calculation or solving a problem. It is a common term to use to describe the steps in a computer program to do a particular calculation. See also: heuristic

alpha error: see Type I or alpha error

alpha (α) reliability, Cronbach's: one of a number of measures of the internal consistency of items on questionnaires, tests and other instruments. It is used when all the items on the measure (or some of the items) are intended to measure the same concept (such as a personality trait like neuroticism). When a measure is internally consistent, all of the individual questions or items making up that measure should correlate well with the others.


One traditional way of checking this is split-half reliability, in which the items making up the measure are split into two sets (odd-numbered items versus even-numbered items, the first half of the items compared with the second half). The two separate sets are then summated to give two separate measures of what would appear to be the same concept. For example, the following four items serve to illustrate a short scale intended to measure liking for different foodstuffs:

1  I like bread    Agree  Disagree
2  I like cheese   Agree  Disagree
3  I like butter   Agree  Disagree
4  I like ham      Agree  Disagree

Responses to these four items are given in Table A.2 for six individuals. One split half of the test might be made up of items 1 and 2, and the other split half is made up of items 3 and 4. These sums are given in Table A.3. If the items measure the same thing, then the two split halves should correlate fairly well together. This turns out to be the case since the correlation of the two split halves with each other is 0.5 (although it is not significant with such a small sample size). Another name for this correlation is the split-half reliability.

Table A.2 Preferences for four foodstuffs plus a total for number of preferences

            Q1: bread   Q2: cheese   Q3: butter   Q4: ham   Total
Person 1        0            0            0          0        0
Person 2        1            1            1          0        3
Person 3        1            0            1          1        3
Person 4        1            1            1          1        4
Person 5        0            0            0          1        1
Person 6        0            1            0          0        1

Table A.3 The data from Table A.2 with Q1 and Q2 added, and Q3 and Q4 added

            Half A: bread + cheese items   Half B: butter + ham items   Total
Person 1                 0                              0                 0
Person 2                 2                              1                 3
Person 3                 1                              2                 3
Person 4                 2                              2                 4
Person 5                 0                              1                 1
Person 6                 1                              0                 1

Since there are many ways of splitting the items on a measure, there are numerous split halves for most measuring instruments. One could calculate the odd–even reliability for the same data by summing items 1 and 3 and summing items 2 and 4. These two forms of reliability can give different values. This is inevitable as they are based on different combinations of items.

Conceptually alpha is simply the average of all of the possible split-half reliabilities that could be calculated for any set of data. With a measure consisting of four items, these are items 1 and 2 versus items 3 and 4, items 2 and 3 versus items 1 and 4, and items 1 and 3 versus items 2 and 4. Alpha has a big advantage over split-half reliability. It is not dependent on arbitrary selections of items since it incorporates all possible selections of items.

In practice, the calculation is based on the repeated-measures analysis of variance. The data in Table A.2 could be entered into a repeated-measures one-way analysis of variance. The ANOVA summary table is to be found in Table A.4.

Table A.4 Repeated-measures ANOVA summary table for data in Table A.2

                       Sums of squares   Degrees of freedom   Mean square
Between treatments          0.000                3            0.000 (not needed)
Between people              3.000                5            0.600
Error (residual)            3.000               15            0.200

We then calculate coefficient alpha from the following formula:

alpha = (mean square between people − mean square residual) / mean square between people
      = (0.600 − 0.200) / 0.600
      = 0.400 / 0.600
      = 0.67

Of course, SPSS and similar packages simply give the alpha value. See internal consistency; reliability
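The same value can be reproduced from the data in Table A.2 with a short Python sketch. This is illustrative only and uses the equivalent item-variance form of alpha rather than the ANOVA route described above; it gives the same 0.67 for these data:

# Item responses from Table A.2 (rows = persons, columns = Q1-Q4)
data = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]

def variance(values):
    # Sample variance (divides by n - 1)
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

k = len(data[0])                                   # number of items
item_vars = [variance([row[i] for row in data]) for i in range(k)]
total_var = variance([sum(row) for row in data])   # variance of the total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # 0.67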

Cramer (1998)

alternative hypothesis: see hypothesis; hypothesis testing

AMOS: this is the name of one of the computer programs for carrying out structural equation modelling. AMOS stands for Analysis of Moment Structures. Information about AMOS can be found at the following website: http://www.smallwaters.com/amos/index.html

See structural equation modelling



analysis of covariance (ANCOVA): analysis of covariance is abbreviated as ANCOVA. It is a form of analysis of variance (ANOVA). In the simplest case it is used to determine whether the means of the dependent variable for two or more groups of an independent variable or factor differ significantly when the influence of another variable that is correlated with the dependent variable is controlled. For example, if we wanted to determine whether physical fitness differed according to marital status and we had found that physical fitness was correlated with age, we could carry out an analysis of covariance. Physical fitness is the dependent variable. Marital status is the independent variable or factor. It may consist of the four groups of (1) the never married, (2) the married, (3) the separated and divorced, and (4) the widowed. The variable that is controlled is called the covariate, which in this case is age. There may be more than one covariate. For example, we may also wish to control for socio-economic status if we found it was related to physical fitness. The means may be those of one factor or of the interaction of that factor with other factors. For example, we may be interested in the interaction between marital status and gender.

There is no point in carrying out an analysis of covariance unless the dependent variable is correlated with the covariate. There are two main uses or advantages of analysis of covariance. One is to reduce the amount of unexplained or error variance in the dependent variable, which may make it more likely that the means of the factor differ significantly. The main statistic in the analysis of variance or covariance is the F ratio, which is the variance of a factor (or its interaction) divided by the error or unexplained variance. Because the covariate is correlated with the dependent variable, some of the variance of the dependent variable will be shared with the covariate. If this shared variance is part of the error variance, then the error variance will be smaller when this shared variance is removed or controlled and the F ratio will be larger and so more likely to be statistically significant.

The other main use of analysis of covariance is where the random assignment of cases to treatments in a true experiment has not resulted in the groups having similar means on variables which are known to be correlated with the dependent variable. Suppose, for example, we were interested in the effect of two different programmes on physical fitness, say swimming and walking. We randomly assigned participants to the two treatments in order to ensure that participants in the two treatments were similar. It would be particularly important that the participants in the two groups would be similar in physical fitness before the treatments. If they differed substantially, then those who were fitter may have less room to become more fit because they were already fit. If we found that they differed considerably initially and we found that fitness before the intervention was related to fitness after the intervention, we could control for this initial difference with analysis of covariance. What analysis of covariance does is to make the initial means on fitness exactly the same for the different treatments. In doing this it is necessary to make an adjustment to the means after the intervention. In other words, the adjusted means will differ from the unadjusted ones. The more the initial means differ, the greater the adjustment will be.


Analysis of covariance assumes that the relationship between the dependent variable and the covariate is the same in the different groups. If this relationship varies between the groups it is not appropriate to use analysis of covariance. This assumption is known as homogeneity of regression. Analysis of covariance, like analysis of variance, also assumes that the variances within the groups are similar or homogeneous. This assumption is called homogeneity of variance. See also: analysis of variance; Bryant–Paulson simultaneous test procedure; covariate; multivariate analysis of covariance
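In practice an analysis of covariance is run in a statistics package. As a rough, hedged sketch (not from the original entry), the marital status and fitness example might look like the following in Python's statsmodels library, assuming a data file and column names (fitness, marital_status, age) that are purely hypothetical:

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Assumed: one row per participant with columns 'fitness' (dependent variable),
# 'marital_status' (factor) and 'age' (covariate)
df = pd.read_csv("fitness_study.csv")  # hypothetical file name

# The covariate is entered alongside the factor, so the factor's F test
# is evaluated with age controlled
model = smf.ols("fitness ~ C(marital_status) + age", data=df).fit()
print(anova_lm(model, typ=2))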

Cramer (2003)

analysis of variance (ANOVA): analysis of variance is abbreviated as ANOVA. There are several kinds of analyses of variance. The simplest kind is a one-way analysis of variance. The term 'one-way' means that there is only one factor or independent variable. 'Two-way' indicates that there are two factors, 'three-way' three factors, and so on. An analysis of variance with two or more factors may be called a factorial analysis of variance. On its own, analysis of variance is often used to refer to an analysis where the scores for a group are unrelated to or come from different cases than those of another group. A repeated-measures analysis of variance is one where the scores of one group are related to or are matched or come from the same cases. The same measure is given to the same or a very similar group of cases on more than one occasion and so is repeated. An analysis of variance where some of the scores are from the same or matched cases and others are from different cases is known as a mixed analysis of variance. Analysis of covariance (ANCOVA) is where one or more variables which are correlated with the dependent variable are removed. Multivariate analysis of variance (MANOVA) and covariance (MANCOVA) is where more than one dependent variable is analysed at the same time. Analysis of variance is not normally used to analyse one factor with only two groups, but such an analysis of variance gives the same significance level as an unrelated t test with equal variances or the same number of cases in each group. A repeated-measures analysis of variance with only two groups produces the same significance level as a related t test. The square root of the F ratio is the t ratio.

Analysis of variance has a number of advantages. First, it shows whether the means of three or more groups differ in some way although it does not tell us in which way those means differ. To determine that, it is necessary to compare two means (or combinations of means) at a time. Second, it provides a more sensitive test of a factor where there is more than one factor because the error term may be reduced. Third, it indicates whether there is a significant interaction between two or more factors. Fourth, in analysis of covariance it offers a more sensitive test of a factor by reducing the error term. And fifth, in multivariate analysis of variance it enables two or more dependent variables to be examined at the same time when their effects may not be significant when analysed separately.

The essential statistic of analysis of variance is the F ratio, which was named by Snedecor in honour of Sir Ronald Fisher who developed the test. It is the variance or mean square of an effect divided by the variance or mean square of the error or remaining variance:

F ratio = effect variance / error variance

An effect refers to a factor or an interaction between two or more factors. The larger the F ratio, the more likely it is to be statistically significant. An F ratio will be larger, the bigger are the differences between the means of the groups making up a factor or interaction in relation to the differences within the groups.

The F ratio has two sets of degrees of freedom, one for the effect variance and the other for the error variance. The mean square is a shorthand term for the mean squared deviations. The degrees of freedom for a factor are the number of groups in that factor minus one. If we see that the degrees of freedom for a factor are two, then we know that the factor has three groups.


Traditionally, the results of an analysis of variance were presented in the form of a table. Nowadays research papers are likely to contain a large number of analyses and there is no longer sufficient space to show such a table for each analysis. The results for the analysis of an effect may simply be described as follows: 'The effect was found to be statistically significant, F(2, 12) = 4.72, p = 0.031.' The first value in brackets (2) refers to the degrees of freedom for the effect and the second (12) to those for the error. The value 4.72 is the F ratio. The statistical significance, or the probability of obtaining an F ratio this large with those degrees of freedom, is 0.031. This may be written as p < 0.05. This value may be looked up in the appropriate table which will be found in most statistics texts such as the sources suggested below. The statistical significance of this value is usually provided by statistical software which carries out analysis of variance. Values that the F ratio has to equal or exceed to be significant at the 0.05 level are given in Table A.5 for a selection of degrees of freedom. It is important to remember to include the relevant means for each condition in the report as otherwise the statistics are somewhat meaningless. Omitting to include the relevant means or a table of means is a common error among novices.

If a factor consists of only two groups and the F ratio is significant we know that the means of those two groups differ significantly. If we had good grounds for predicting which of those two means would be bigger, we should divide the significance level of the F ratio by 2 as we are predicting the direction of the difference. In this situation an F ratio with a significance level of 0.10 or less will be significant at the 0.05 level or lower (0.10/2 = 0.05).

When a factor consists of more than two groups, the F ratio does not tell us which of those means differ from each other. For example, if we have three means, we have three possible comparisons: (1) mean 1 and mean 2; (2) mean 1 and mean 3; and (3) mean 2 and mean 3. If we have four means, we have six possible comparisons: (1) mean 1 and mean 2; (2) mean 1 and mean 3; (3) mean 1 and mean 4; (4) mean 2 and mean 3; (5) mean 2 and mean 4; and (6) mean 3 and mean 4. In this situation we need to compare two means at a time to determine if they differ significantly. If we had strong grounds for predicting which means should differ, we could use a one-tailed t test. If the scores were unrelated, we would use the unrelated t test. If the scores were related, we would use the related t test. This kind of test or comparison is called a planned comparison or a priori test because the comparison and the test have been planned before the data have been collected.

If we had not predicted or expected the F ratio to be statistically significant, we should use a post hoc or an a posteriori test to determine which means differ. There are a number of such tests but no clear consensus about which tests are the most appropriate to use. One option is to reduce the two-tailed 0.05 significance level by dividing it by the number of comparisons to obtain the familywise or experimentwise level. For example, the familywise significance level for three comparisons is 0.0167 (0.05/3 = 0.0167). This may be referred to as a Bonferroni adjustment or test. The Scheffé test is suitable for unrelated means which are based on unequal numbers of cases. It is a very conservative test in that means are less likely to differ significantly than with some other tests. Fisher's protected LSD (Least Significant Difference) test is used for unrelated means in an analysis of variance where the means have been adjusted for one or more covariates.

A factorial analysis of variance consisting of two or more factors may be a more sensitive test of a factor than a one-way analysis of variance because the error term in a factorial analysis of variance may be smaller than in a one-way analysis of variance. This is because some of the error or unexplained variance in a one-way analysis of variance may be due to one or more of the factors and their interactions in a factorial analysis of variance.


Table A.5 Critical values of F

df for error          df for effect variance
variance           1      2      3      4      5      ∞
8                5.32   4.46   4.07   3.84   3.69   2.93
12               4.75   3.89   3.49   3.26   3.11   2.30
20               4.35   3.49   3.10   2.87   2.71   1.84
30               4.17   3.32   2.92   2.69   2.53   1.62
40               4.08   3.23   2.84   2.61   2.45   1.51
60               4.00   3.15   2.76   2.53   2.37   1.39
120              3.92   3.07   2.68   2.45   2.29   1.25
∞                3.84   3.00   2.60   2.37   2.21   1.00


There are several ways of calculating the variance in an analysis of variance, which can be done with dummy variables in multiple regression. These methods give the same results in a one-way analysis of variance or a factorial analysis of variance where the number of cases in each group is equal or proportionate. In a two-way factorial analysis where the number of cases in each group is unequal and disproportionate, the results are the same for the interaction but may not be the same for the factors. There is no clear consensus on which method should be used in this situation but it depends on what the aim of the analysis is.

One advantage of a factorial analysis of variance is that it determines whether the interaction between two or more factors is significant. An interaction is where the difference in the means of one factor depends on the conditions in one or more other factors. It is more easily described when the means of the groups making up the interaction are plotted in a graph as shown in Figure A.3.

[Figure A.3 Errors as a function of alcohol and sleep deprivation: mean errors plotted against sleep deprivation (4 hours versus 12 hours), with separate lines for the alcohol and no alcohol conditions; the two lines are not parallel]

The figure represents the mean number of errors made by participants who had been deprived of either 4 or 12 hours of sleep and who had been given either alcohol or no alcohol. The vertical axis of the graph reflects the dependent variable, which is the number of errors made. The horizontal axis depicts one of the independent variables, which is sleep deprivation, while the two types of lines in the graph show the other independent variable, which is alcohol. There may be a significant interaction where these lines are not parallel, as in this case. The difference in the mean number of errors between the 4 hours' and the 12 hours' sleep deprivation conditions was greater for those given alcohol than those not given alcohol. Another way of describing this interaction is to say the difference in the mean number of errors between the alcohol and the no alcohol group is greater for those deprived of 12 hours of sleep than for those deprived of 4 hours of sleep.

The analysis of variance assumes that the variance within each of the groups is equal or homogeneous. There are several tests for determining this. Levene's test is one of these. If the variances are not equal, they may be made more equal by transforming the scores arithmetically, such as by taking their square root or logarithm. See also: Bartlett's test of sphericity; Cochran's C test; Duncan's new multiple range test; factor, in analysis of variance; F ratio; Hochberg GT2 test; mean square; repeated-measures analysis of variance; sum of squares; Type I hierarchical or sequential method; Type II classic experimental method
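To make the F ratio concrete, the following Python sketch (with made-up scores for three groups, not data from the entry) computes the between- and within-groups mean squares by hand and checks the result against scipy:

from scipy import stats

# Three hypothetical groups of scores
groups = [[4, 5, 6, 5], [7, 8, 6, 7], [10, 9, 11, 10]]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups (effect) and within-groups (error) sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)          # effect mean square
ms_within = ss_within / (n_total - k)      # error mean square

f_ratio = ms_between / ms_within
print(f_ratio)                  # 38.0 for these scores
print(stats.f_oneway(*groups))  # the same F ratio plus its p value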

Cramer (1998, 2003)

ANCOVA: see analysis of covariance

ANOVA: see analysis of variance

arithmetic mean: see mean, arithmetic

asymmetry: see symmetry

asymptotic: this describes a curve that approaches a straight line but never meets it. For example, the tails of the curve of a normal distribution approach the baseline but never touch it. They are said to be asymptotic.


attenuation, correcting correlations for: many variables in the social sciences are measured with some degree of error or unreliability. For example, intelligence is not expected to vary substantially from day to day. Yet scores on an intelligence test may vary, suggesting that the test is unreliable. If the measures of two variables are known to be unreliable and those two measures are correlated, the correlation between these two measures will be attenuated or weaker than the correlation between those two variables if they had been measured without any error. The greater the unreliability of the measures, the more the observed correlation will understate the real relationship between those two variables. The correlation between two measures may be corrected for their unreliability if we know the reliability of one or both measures.

The following formula corrects the correlation between two measures when the reliability of those two measures is known:

corrected correlation Rc = correlation between measure 1 and measure 2 / √(measure 1 reliability × measure 2 reliability)

For example, if the correlation of the two measures is 0.40 and their reliability is 0.80 and 0.90 respectively, then the correlation corrected for attenuation is 0.47:

Rc = 0.40 / √(0.80 × 0.90) = 0.40 / √0.72 = 0.40 / 0.85 = 0.47

The corrected correlation is larger than the uncorrected one.

When the reliability of only one of the measures is known, the formula is

Rc = correlation between measure 1 and measure 2 / √(measure 1 or measure 2 reliability)

For example, if we only knew the reliability of the first but not the second measure then the corrected correlation is 0.45:

Rc = 0.40 / √0.80 = 0.40 / 0.89 = 0.45

Typically we are interested in the association or relationship between more than two variables, and the unreliability of the measures of those variables is corrected by using structural equation modelling.
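A minimal Python sketch of the correction; the numbers simply reproduce the worked example above, and the function names are mine rather than the book's:

from math import sqrt

def correct_both(r, reliability_1, reliability_2):
    # Correction when the reliability of both measures is known
    return r / sqrt(reliability_1 * reliability_2)

def correct_one(r, reliability):
    # Correction when the reliability of only one measure is known
    return r / sqrt(reliability)

print(round(correct_both(0.40, 0.80, 0.90), 2))  # 0.47
print(round(correct_one(0.40, 0.80), 2))         # 0.45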

attrition: this is a concept closely related to drop-out rate, the process by which some participants or cases in research are lost over the duration of the study. For example, in a follow-up study not all participants in the earlier stages can be contacted for a number of reasons – they have changed address, they choose no longer to participate, etc.

The major problem with attrition is when particular kinds of cases or participants leave the study in disproportionate numbers to other types of participants. For example, if a study is based on the list of electors then it is likely that members of transient populations will leave and may not be contactable at their listed address more frequently than members of stable populations. So, for example, as people living in rented accommodation are more likely to move address quickly but, perhaps, have different attitudes and opinions to others, then their greater rate of attrition in follow-up studies will affect the research findings.

Perhaps a more problematic situation is an experiment (e.g. a study of the effect of a particular sort of therapy) in which drop-out from treatment may be affected by the nature of the treatment so, possibly, many more people leave the treatment group than the control group over time.

Attrition is an important factor in assessing the value of any research. It is not a matter which should be hidden in the report of the research. See also: refusal rates

average: this is a number representing the usual or typical value in a set of data. It is virtually synonymous with measures of central tendency.


Common averages in statistics are the mean, median and mode. There is no single conception of average and every average contributes a different type of information. For example, the mode is the most common value in the data whereas the mean is the numerical average of the scores and may or may not be the commonest score. There are more averages in statistics than are immediately apparent. For example, the harmonic mean occurs in many statistical calculations such as the standard error of differences, often without being explicitly mentioned as such. See also: geometric mean

In tests of significance, it can be quite important to know what measure of central tendency (if any) is being assessed. Not all statistics compare the arithmetic means or averages. Some non-parametric statistics, for example, make comparisons between medians.
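Python's standard statistics module gives the common averages directly; a short illustrative sketch with made-up scores:

import statistics

scores = [2, 3, 3, 4, 9]

print(statistics.mean(scores))           # arithmetic mean = 4.2
print(statistics.median(scores))         # median = 3
print(statistics.mode(scores))           # mode = 3
print(statistics.harmonic_mean(scores))  # harmonic mean, about 3.27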

averaging correlations: see correlations, averaging

axis: this refers to a straight line, especially in the context of a graph. It constitutes a reference line that provides an indication of the size of the values of the data points. In a graph there is a minimum of two axes – a horizontal and a vertical axis. In statistics, one axis provides the values of the scores (most often the horizontal line) whereas the other axis is commonly an indication of the frequencies (in univariate statistical analyses) or another variable (in bivariate statistical analysis such as a scatterplot).

Generally speaking, an axis will start at zero and increase positively since most data in psychology and the social sciences only take positive values. It is only when we are dealing with extrapolations (e.g. in regression or factor analysis) that negative values come into play.

The following need to be considered:

• Try to label the axes clearly. In Figure A.4 the vertical axis (the one pointing up the page) is clearly labelled as Frequencies. The horizontal axis (the one pointing across the page) is clearly labelled Year.
• The intervals on the scale have to be carefully considered. Too many points on any of the axes and trends in the data can be obscured; too few points on the axes and numbers may be difficult to read.
• Think very carefully about the implications if the axes do not meet at zero on each scale. It may be appropriate to use another intersection point but in some circumstances doing so can be misleading.
• Although axes are usually presented as at right angles to each other, they can be at other angles to indicate that they are correlated. The only common statistical context in which this occurs is oblique rotation in factor analysis.

Axis can also refer to an axis of symmetry – the line which divides the two halves of a symmetrical distribution such as the normal distribution.

[Figure A.4 Illustrating axes: a graph with Frequencies on the vertical axis and Year (1980–2005) on the horizontal axis]


B

bar chart, diagram or graph: describes the frequencies in each category of a nominal (or category) variable. The frequencies are represented by bars of different length proportionate to the frequency. A space should be left between each of the bars to symbolize that it is a bar chart not a histogram. See also: compound bar chart; pie chart

Bartlett's test of sphericity: used in factor analysis to determine whether the correlations between the variables, examined simultaneously, do not differ significantly from zero. Factor analysis is usually conducted when the test is significant, indicating that the correlations do differ from zero. It is also used in multivariate analysis of variance and covariance to determine whether the dependent variables are significantly correlated. If the dependent variables are not significantly correlated, an analysis of variance or covariance should be carried out. The larger the sample size, the more likely it is that this test will be significant. The test gives a chi-square statistic.

Bartlett–Box F test: one of the tests used for determining whether the variances within groups in an analysis of variance are similar or homogeneous, which is one of the assumptions underlying analysis of variance. It is recommended where the number of cases in the groups varies considerably and where no group is smaller than three and most groups are larger than five.

Cramer (1998)

baseline: a measure to assess scores on a variable prior to some intervention or change. It is the starting point before a variable or treatment may have had its influence. Pre-test and pre-test measure are equivalent concepts. The basic sequence of the research would be baseline measurement → treatment → post-treatment measure of same variable.

For example, if a researcher were to study the effectiveness of a dietary programme on weight reduction, the research design might consist of a baseline (or pre-test) measure of weight prior to the introduction of the dietary programme. Following the diet there may be a post-test measure of weight to see whether weight has increased or decreased over the period from before the diet to after the diet.

Without the baseline or pre-test measure, it would not be possible to say whether or not weights had increased or decreased following the diet. With the research design illustrated in Table B.1 we cannot say whether the change was due to the diet or some other factor. A control group that did not diet would be required to assess this.

Table B.1 Illustrating baseline

Person   Baseline/pre-test   Treatment   Post-test
A              45 kg           DIET        42 kg
B              51 kg           DIET        47 kg
C              76 kg           DIET        69 kg
D              58 kg           DIET        52 kg
E              46 kg           DIET        41 kg

Baseline measures are problematic in that the pre-test may sensitize participants in some way about the purpose of the experiment or in some other way affect their behaviour.


Nevertheless, their absence leads to many problems of interpretation even in well-known published research. Consequently they should always be considered as part of the research even if it is decided not to include them. Take the following simple study which is illustrated in Table B.2. Participants in the research have either seen a war film or a romantic film. Their aggressiveness has been measured afterwards. Although there is a difference between the war film and the romantic film conditions in terms of the aggressiveness of participants, it is not clear whether this is the consequence of the effects of the war film increasing aggression or the romantic film reducing aggression – or both things happening. The interpretation would be clearer with a baseline or pre-test measure. See also: pre-test; quasi-experiments

Table B.2 Results of a study of the effects of two films on aggression

War film                    Romantic film
   14                             4
   19                             7
   12                             5
   14                             3
   13                             6
   17                             3
Mean aggression = 14.83     Mean aggression = 4.67

Bayesian inference: an approach to inference based on Bayes's theorem, which was initially proposed by Thomas Bayes. There are two main interpretations of the probability or likelihood of an event occurring such as a coin turning up heads. The first is the relative frequency interpretation, which is the number of times a particular event happens over the number of times it could have happened. If the coin is unbiased, then the probability of heads turning up is about 0.5, so if we toss the coin 10 times, then we expect heads to turn up on 5 of those 10 times or 0.50 (5/10 = 0.50) of those occasions. The other interpretation of probability is a subjective one, in which we may estimate the probability of an event occurring on the basis of our experience of that event. So, for example, on the basis of our experience of coin tossing we may believe that heads are more likely to turn up, say 0.60 of the time. Bayesian inference makes use of both interpretations of probability. However, it is a controversial approach and not widely used in statistics. Part of the reluctance to use it is that the probability of an event (such as the outcome of a study) will also depend on the subjective probability of that outcome, which may vary from person to person. The theorem itself is not controversial.

Howson and Urbach (1989)

Bayes's theorem: in its simplest form, this theorem originally put forward by Thomas Bayes determines the probability or likelihood of an event A given the probability of another event B. Event A may be whether a person is female or male and event B whether they pass or fail a test. Suppose the probability or proportion of females in a class is 0.60 and the probability of being male is 0.40. Suppose furthermore that the probability of passing the test is 0.90 for females and 0.70 for males. Being female may be denoted as A1 and being male A2 and passing the test as B. If we wanted to work out what the probability (Prob) was of a person being female (A1) knowing that they had passed the test (B), we could do this using the following form of Bayes's theorem:

Prob(A1|B) = [Prob(B|A1) × Prob(A1)] / {[Prob(B|A1) × Prob(A1)] + [Prob(B|A2) × Prob(A2)]}

where Prob(B|A1) is the probability of passing given that the person is female (which is 0.90), Prob(A1) is the probability of being female (which is 0.60), Prob(B|A2) is the probability of passing given that the person is male (which is 0.70) and Prob(A2) is the probability of being male (which is 0.40).


Substituting these probabilities into this formula, we see that the probability that someone who passed is female is 0.66:

Prob(A1|B) = (0.90 × 0.60) / [(0.90 × 0.60) + (0.70 × 0.40)] = 0.54 / (0.54 + 0.28) = 0.54 / 0.82 = 0.66

Our ability to predict whether a person is female has increased from 0.60 to 0.66 when we have additional information about whether or not they had passed the test. See also: Bayesian inference
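The worked example translates directly into a few lines of Python (illustrative only; the function name is mine):

def bayes_posterior(prior_a1, prior_a2, likelihood_b_given_a1, likelihood_b_given_a2):
    # Prob(A1 | B) from Bayes's theorem
    numerator = likelihood_b_given_a1 * prior_a1
    denominator = numerator + likelihood_b_given_a2 * prior_a2
    return numerator / denominator

# Probability of being female given a pass: priors 0.60/0.40, pass rates 0.90/0.70
print(round(bayes_posterior(0.60, 0.40, 0.90, 0.70), 2))  # 0.66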

Novick and Jackson (1974)

beta (β) or beta weight: see standardized partial regression coefficient

beta (β) error: see Type II or beta error

between-groups or subjects design: compares different groups of cases (participants or subjects). They are among the commonest sorts of research design. Because different groups of individuals are compared, there is little control over a multiplicity of possibly influential variables other than to the extent they can be controlled by randomization. Between-subjects designs can be contrasted with within-subjects designs. See mixed design

between-groups variance or mean square (MS): part of the variance in the dependent variable in an analysis of variance which is attributed to an independent variable or factor. The mean square is a short form for referring to the mean squared deviations. It is calculated by dividing the sum of squares (SS), which is short for the sum of squared deviations, by the between-groups degrees of freedom. The between-groups degrees of freedom are the number of groups minus one. The sum of squares is calculated by subtracting the mean of each group from the overall or grand mean, squaring this difference, multiplying it by the number of cases within the group and summing this product for all the groups. The between-groups variance or mean square is divided by the error variance or mean square to form the F ratio, which is the main statistic of the analysis of variance. The larger the between-groups variance is in relation to the error variance, the bigger the F ratio will be and the more likely it is to be statistically significant.

between-judges variance: used in the calculation of Ebel's intraclass correlation, which is worked out in the same way as the between-groups variance with the judges representing different groups or conditions. To calculate it, the between-judges sum of squares is worked out and then divided by the between-judges degrees of freedom, which are the number of judges minus one. The sum of squares is calculated by subtracting the mean of each judge from the overall or grand mean of all the judges, squaring each difference, multiplying it by the number of cases for that judge and summing this product for all the judges.

between-subjects variance: used in the calculation of a repeated-measures analysis of variance and Ebel's intraclass correlation. It is the between-subjects sum of squares divided by the between-subjects degrees of freedom. The between-subjects degrees of freedom are the number of subjects or cases minus one. The between-subjects sum of squares is calculated by subtracting the mean for each subject from the overall or grand mean for all the subjects, squaring this difference, multiplying it by the number of conditions or judges and adding these products together. The greater the sum of squares or variance, the more the scores vary between subjects.

bias: occurs when a statistic based on a sample systematically misestimates the equivalent characteristic (parameter) of the population from which the samples were drawn.


For example, if an infinite number of repeated samples produced too low an estimate of the population mean then the statistic would be a biased estimate of the parameter. An illustration of this is tossing a coin. This is assumed generally to be a 'fair' process as each of the outcomes heads or tails is equally likely. In other words, the population of coin tosses has 50% heads and 50% tails. If the coin has been tampered with in some way, in the long run repeated coin tosses produce a distribution which favours, say, heads.

One of the most common biases in statistics is where the following formula for standard deviation is used to estimate the population standard deviation:

standard deviation = √[Σ(X − X̄)² / N]

While this defines standard deviation, unfortunately it consistently underestimates the standard deviation of the population from which the sample came. So for this purpose it is a biased estimate. It is easy to incorporate a small correction which eliminates the bias in estimating from the sample to the population:

unbiased estimate of population standard deviation = √[Σ(X − X̄)² / (N − 1)]

It is important to recognize that there is a difference between a biased sampling method and an unrepresentative sample, for example. A biased sampling method will result in a systematic difference between samples in the long run and the population from which the samples were drawn. An unrepresentative sample is simply one which fails to reflect the characteristics of the population. This can occur using an unbiased sampling method just as it can be the result of using a biased sampling method. See also: estimated standard deviation
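The two formulas differ only in the divisor, which in numpy is controlled by the ddof argument. An illustrative sketch with made-up scores (not part of the original entry):

import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Divides by N: describes the sample, but underestimates the population value
print(np.std(scores, ddof=0))  # 2.0

# Divides by N - 1: the corrected estimate of the population standard deviation
print(np.std(scores, ddof=1))  # about 2.14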

biased sample: is produced by methods which ensure that the samples are generally systematically different from the characteristics of the population from which they are drawn. It is really a product of the method by which the sample is drawn rather than the actual characteristics of any individual sample. Generally speaking, properly randomly drawn samples from a population are the only way of eliminating bias. Telephone interviews are a common method of obtaining samples. A sample of telephone numbers is selected at random from a telephone directory. Unfortunately, although the sample drawn may be a random (unbiased) sample of people on that telephone list, it is likely to be a biased sample of the general population since it excludes individuals who are ex-directory or who do not have a telephone.

A sample may provide a poor estimate of the population characteristics but, nevertheless, need not be biased. This is because the notion of bias is about systematically being incorrect over the long run rather than about a single poor estimate.

bi-directional relationship: a causal relationship between two variables in which both variables are thought to affect each other.

bi-lateral relationship: see bi-directional relationship

bimodal: data which have two equally common modes. Table B.3 is a frequency table which gives the distribution of the scores 1 to 8. It can be seen that the score 2 and the score 6 both have the maximum frequency of 16. Since the most frequent score is also known as the mode, two values exist for the mode: 2 and 6. Thus, this is a bimodal distribution. See also: multimodal

When a bimodal distribution is plotted graphically, Figure B.1 illustrates its appearance. Quite simply, two points of the histogram are the highest. These, since the data are the same as for Table B.3, are for the values 2 and 6.


Bimodal distributions can occur in all types of data including nominal categories (category or categorical data) as well as numerical scores as in this example. If the data are nominal categories, the two modes are the names (i.e. values) of the two categories.

Table B.3 Bimodal distribution

Score    Frequency      %      Valid %   Cumulative %
1.00          7         7.1       7.1         7.1
2.00         16        16.2      16.2        23.2
3.00         14        14.1      14.1        37.4
4.00         12        12.1      12.1        49.5
5.00         13        13.1      13.1        62.6
6.00         16        16.2      16.2        78.8
7.00         12        12.1      12.1        90.9
8.00          9         9.1       9.1       100.0
Total        99       100.0     100.0

[Figure B.1 Example of a bimodal distribution: a histogram of the scores in Table B.3, with the highest bars (frequency 16) at the values 2 and 6]
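Finding the modes of a distribution like Table B.3 is straightforward in Python; the sketch below (illustrative, not from the book) rebuilds the 99 scores from the frequencies and picks out the two modes:

from collections import Counter

# Rebuild the 99 scores from the frequencies in Table B.3
frequencies = {1: 7, 2: 16, 3: 14, 4: 12, 5: 13, 6: 16, 7: 12, 8: 9}
scores = [score for score, freq in frequencies.items() for _ in range(freq)]

counts = Counter(scores)
highest = max(counts.values())
modes = sorted(score for score, freq in counts.items() if freq == highest)
print(modes)  # [2, 6] - two modes, so the distribution is bimodal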

binomial distribution: describes the probability of an event or outcome occurring, such as a person passing or failing or being a woman or a man, on a number of independent occasions or trials when the event has the same probability of occurring on each occasion. The binomial theorem can be used to calculate these probabilities.

binomial theorem: deals with situations in which we are assessing the probability of getting particular outcomes when there are just two values. These may be heads versus tails, males versus females, success versus failure, correct versus incorrect, and so forth. To apply the theorem we need to know the proportions of each of the alternatives in the population (though this may, of course, be derived theoretically such as when tossing a coin). P is the proportion in one category and Q is the proportion in the other category. In the practical application of statistics (e.g. as in the sign test), the two values are often equally likely or assumed to be equally likely just as in the case of the toss of a coin. There are tables of the binomial distribution available in statistics textbooks, especially older ones. However, binomials can be calculated.

In order to calculate the likelihood of get-ting 9 heads out of 10 tosses of a coin, P � 0.5and Q � 0.5. N is the number of coin tosses(10). X is the number of events in one cate-gory (9) and Y is the number of events in theother category (1).

The formula for the probability of getting Xobjects out of N (i.e. X plus Y) in one category is

N! is the symbol for a factorial. The factor-ial of 10 is 10 � 9 � 8 � 7 � 6 � 5 � 4 � 3 �2 � 1 � 3,628,800. Factorials are easily calcu-lated on a scientific calculator although theytend to produce huge numbers which can beoff-putting and difficult for those of us notused to working with exponentials. So, sub-stituting values in the above,

This is the basic calculation. Remember thatthis gives the probability of 9 heads and 1 tail.More usually researchers will be interestedin the probability, say, of 9 or more heads. Inthis case, the calculation would be done for9 heads exactly as above but then a similarcalculation for 10 heads out of 10. These twoprobabilities would then be added together


Table B.3 Bimodal distribution

              Frequency      %      Valid %    Cumulative %
Valid  1.00        7        7.1       7.1           7.1
       2.00       16       16.2      16.2          23.2
       3.00       14       14.1      14.1          37.4
       4.00       12       12.1      12.1          49.5
       5.00       13       13.1      13.1          62.6
       6.00       16       16.2      16.2          78.6
       7.00       12       12.1      12.1          90.9
       8.00        9        9.1       9.1         100.0
Total             99      100.0     100.0

Figure B.1 Example of a bimodal distribution (a histogram of the data in Table B.3: count on the vertical axis plotted against the score values 1.00 to 8.00, with the two highest bars at 2.00 and 6.00)

binomial probability = [N! / (X! × Y!)] × P^X × Q^Y

binomial probability = [10! / (9! × 1!)] × 0.5⁹ × 0.5¹
                     = 10 × 0.001953125 × 0.5
                     = 0.0098 (to four decimal places)


to give the probability of 9 or more heads in 10 tosses.

There is also the multinomial theorem, which deals with the distribution of more than two categories.

Generally speaking, the binomial theorem is rare in practice for most students and practitioners. There are many simple alternatives which can be substituted in virtually any application. Therefore, for example, the sign test can be used to assess whether the distribution of two alternative categories is equal or not. Alternatively, the single-sample chi-square distribution would allow any number of categories to be compared in terms of their frequency.

The binomial distribution does not require equal probabilities of outcomes. Nevertheless, the outcomes need to be exhaustive so that the separate probabilities for the different events sum to 1.00. This means, for example, that the outcomes being considered can be unequal as in the case of the likelihood of twins. Imagine that the likelihood of any birth yielding twins is 0.04 (i.e. 4 chances in 100). The probability of a non-twin birth is therefore 0.96. These values could be entered as the probabilities of P and Q in the binomial formula to work out the probability that, say, 13 out of 20 sequential births at a hospital turn out to be twins.
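The calculation is easy to reproduce. The following sketch in Python (the function name binomial_probability is ours, for illustration only) uses the standard library to compute the coin and twins examples discussed above:

    from math import comb

    def binomial_probability(x, n, p):
        # probability of exactly x events of one kind in n independent trials,
        # each with probability p: [n!/(x!(n - x)!)] * p**x * (1 - p)**(n - x)
        return comb(n, x) * p**x * (1 - p)**(n - x)

    # exactly 9 heads in 10 tosses of a fair coin
    print(binomial_probability(9, 10, 0.5))                                   # about 0.0098

    # 9 or more heads: add the probability of exactly 9 to that of exactly 10
    print(binomial_probability(9, 10, 0.5) + binomial_probability(10, 10, 0.5))

    # the twins example: 13 twin births out of 20 when P = 0.04 and Q = 0.96
    print(binomial_probability(13, 20, 0.04))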

bivariate: involving the simultaneous analysis of two variables. Two-way chi-square, correlation, unrelated t test and ANOVA are among the inferential statistics which involve two variables. Scattergrams, compound histograms, etc., are basic descriptive methods involving a bivariate approach. Bivariate analysis involves the exploration of interrelationships between variables and, hence, possible influences of one variable on another. Conceptually, it is a fairly straightforward progression from bivariate analysis to multivariate analysis.

bivariate regression: see simple or bivariate regression

blocks: see randomization

blocking: see matching

BMDP: an abbreviation for Bio-Medical Data Package which is one of several widely used statistical packages for manipulating and analysing data. Information about BMDP can be found at the following website: http://www.statsol.ie/bmdp/bmdp.htm

Bonferroni adjustment: see analysis of variance; Bonferroni test; Dunn's test

Bonferroni test: also known as Dunn's test, it is one test for controlling the probability of making a Type I error in which two groups are assumed to differ significantly when they do not differ. The conventional level for determining whether two groups differ is the 0.05 or 5% level. At this level the probability of two groups differing by chance when they do not differ is 1 out of 20 or 5 out of 100. However, the more groups we compare, the more likely it is that two groups will differ by chance. To control for this, we may reduce the significance level by dividing the conventional significance level of 0.05 by the number of comparisons we want to make. So, if we want to compare six groups, we would divide the 0.05 level by 6 to give us a level of 0.008 (0.05/6 = 0.008). At this more conservative level, it is much less likely that we will assume that two groups differ when they do not differ. However, we are more likely to be making a Type II error in which we assume that there is no difference between two groups when there is a difference.
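A minimal sketch of this adjustment in Python (the p values are invented purely for illustration):

    # Bonferroni adjustment: divide the conventional 0.05 level by the
    # number of comparisons being made
    alpha = 0.05
    number_of_comparisons = 6            # e.g. comparing six groups
    adjusted_level = alpha / number_of_comparisons
    print(round(adjusted_level, 3))      # 0.008

    # a comparison is treated as significant only if its p value falls
    # below the adjusted level
    p_values = [0.020, 0.004, 0.950]     # invented p values
    print([p < adjusted_level for p in p_values])   # [False, True, False]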

This test has generally been recommended as an a priori test for planned comparisons even though it is a more conservative test than some post hoc tests for unplanned comparisons. It is listed as a post hoc test in SPSS.


It can be used for equal and unequal group sizes where the variances are equal. The formula for this test is the same as that for the unrelated t test where the variances are equal, as given below.

Where the variances are unequal, it is recommended that the Games–Howell procedure be used. This involves calculating a critical difference for every pair of means being compared which uses the studentized range statistic.

Howell (2002)

bootstrapping: bootstrapping statistics literally take the distribution of the obtained data in order to generate a sampling distribution of the particular statistic in question. The crucial feature or essence of bootstrapping methods is that the obtained sample data are, conceptually speaking at least, reproduced an infinite number of times to give an infinitely large sample. Given this, it becomes possible to sample from the 'bootstrapped population' and obtain outcomes which differ from the original sample. So, for example, imagine the following sample of 10 scores obtained by a researcher:

5, 10, 6, 3, 9, 2, 5, 6, 7, 11

There is only one sample of 10 scores possible from this set of 10 scores – the original sample (i.e. the 10 scores above). However, if we endlessly repeated the string as we do in bootstrapping then we would get

5, 10, 6, 3, 9, 2, 5, 6, 7, 11, 5, 10, 6, 3, 9, 2, 5, 6, 7, 11, 5, 10, 6, 3, 9, 2, 5, 6, 7, 11, . . . , 5, 10, 6, 3, 9, 2, 5, 6, 7, 11, etc.

With this bootstrapped population, it is possible to draw random samples of 10 scores but get a wide variety of samples many of which differ from the original sample. This is simply because there is a variety of scores from which to choose now.

So long as the original sample is selected with care to be representative of the wider situation, it has been shown that bootstrapped populations are not bad population estimates despite the nature of their origins.

The difficulty with bootstrapping statistics is the computation of the sampling distribution because of the sheer number of samples and calculations involved. Computer programs are increasingly available to do bootstrapping calculations though these have not yet appeared in the most popular computer packages for statistical analysis. The Web provides fairly up-to-date information on this.

The most familiar statistics used today had their origins in pre-computer times when methods had to be adopted which were capable of hand calculation. Perhaps bootstrapping methods (and the related procedures of resampling) would be the norm had high-speed computers been available at the birth of statistical analysis. See also: resampling techniques
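The resampling idea can be sketched in a few lines of Python. This is only an illustration of the general procedure (applied here to the mean of the 10 scores above), not a full bootstrap analysis:

    import random

    sample = [5, 10, 6, 3, 9, 2, 5, 6, 7, 11]    # the original sample of 10 scores

    # sampling from the endlessly repeated string is equivalent to sampling
    # from the original scores with replacement
    random.seed(1)                               # fixed seed so the sketch is reproducible
    bootstrap_means = []
    for _ in range(10000):
        resample = [random.choice(sample) for _ in range(len(sample))]
        bootstrap_means.append(sum(resample) / len(resample))

    # the spread of these bootstrap means approximates the sampling
    # distribution of the mean for samples of this size
    bootstrap_means.sort()
    print(bootstrap_means[250], bootstrap_means[9750])   # rough 95% limits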

box plot: a form of statistical diagram to represent the distribution of scores on a variable. It consists (in one orientation) of a horizontal numerical scale to represent the values of the scores. Then there is a vertical line to mark the lowest value of a score and another vertical line to mark the highest value of a score in the data (Figure B.2). In the middle there is a box to indicate the 25th to the 50th percentile (or median) and an adjacent one indicating the 50th to the 75th percentile (Figure B.3).

Thus the lowest score is 5, the highest score is 16, the median score (50th percentile) is 11, and the 75th percentile is about 13.

From such a diagram, not only are these values to an experienced eye an indication of the variation of the scores, but also the


Bonferroni test formula (the unrelated t test where the variances are equal):

(group 1 mean − group 2 mean) / √[(group 1 variance / group 1 n) + (group 2 variance / group 2 n)]


Figure B.2 Illustration of box plot (drawn against a numerical scale running from 5 to 17)


symmetry of the distribution may be assessed. In some disciplines box plots are extremely common whereas in others they are somewhat rare. The more concrete the variables being displayed, the more useful a box plot is. So in economics and sociology when variables such as income are being tabulated, the box plot has clear and obvious meaning. The more abstract the concept and the less linear the scale of measurement, the less useful is the box plot.
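The values marked on a box plot are easily obtained. A small sketch using Python's standard statistics module (the scores are invented for illustration and span the 5 to 16 range used in Figures B.2 and B.3):

    import statistics

    scores = [5, 7, 8, 9, 10, 11, 11, 12, 13, 14, 16]   # invented scores

    # quantiles() with n=4 returns the quartiles: 25th, 50th and 75th percentiles
    q1, median, q3 = statistics.quantiles(scores, n=4)

    # the five values a box plot displays: lowest score, 25th percentile,
    # median, 75th percentile and highest score
    print(min(scores), q1, median, q3, max(scores))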

Box's M test: one test used to determine whether the variance/covariance matrices of two or more dependent variables in a multivariate analysis of variance or covariance are similar or homogeneous across the groups, which is one of the assumptions underlying this analysis. If this test is significant, it may be possible to reduce the variances by transforming the scores by taking their square root or natural logarithm.

brackets (): commonly used in statistical equations. They indicate that their contents should be calculated first. Take the following equation:

A = B(C + D)

The brackets mean that C and D should be added together before multiplying by B; a worked example is given below.

Bryant–Paulson simultaneous test procedure: a post hoc or multiple comparison test which is used to determine which of three or more adjusted means differ from one another when the F ratio in an analysis of covariance is significant. The formula for this test varies according to the number of covariates and whether cases have been assigned to treatments at random or not.

The first formula given below is used for a non-randomized study with one covariate, where the subscripts 1 and 2 denote the two groups being compared and n is the sample size of the group. The error term must be computed separately for each comparison.

For a randomized study with one covariate we need to use the second formula given below. The error term is not computed separately for each comparison. Where the group sizes are unequal, the harmonic mean of the sample size is used; for two groups the harmonic mean is defined in the final formula below.

Stevens (1996)


Figure B.3 Interpretation of the components of a box plot (the same scale from 5 to 17, with markers for the lowest score, the 25th percentile, the median, the 75th percentile and the highest score)

The worked example for the brackets entry above:

A = 2(3 + 4) = 2(7) = 2 × 7 = 14

Bryant–Paulson test, non-randomized study with one covariate:

(adjusted mean1 − adjusted mean2) / √{adjusted error mean square × [(2/n) + (covariate mean1 − covariate mean2)² / covariate error sum of squares]}

Bryant–Paulson test, randomized study with one covariate:

(adjusted mean1 − adjusted mean2) / √{(2 × adjusted error mean square / n) × [1 + (covariate between-groups mean square / covariate error sum of squares)]}

where n is the number of cases in a group, and for two groups of unequal size

harmonic mean = (2 × n1 × n2) / (n1 + n2)


canonical correlation: normal correlation involves the correlation between one variable and another. Multiple correlation involves the correlation between a set of variables and a single variable. Canonical correlation involves the correlation between one set of X variables and another set of Y variables. However, these variables are not those as actually recorded in the data but abstract variables (like factors in factor analysis) known as latent variables (variables underlying a set of variables). There may be several latent variables in any set of variables just as there may be several factors in factor analysis. This is true for the X variables and the Y variables. Hence, in canonical correlation there may be a number of coefficients – one for each possible pair of a latent root of the X variables and a latent root of the Y variables.

Canonical correlation is a rare technique in modern published research. See also: Hotelling's trace criterion; Roy's gcr; Wilks's lambda

carryover or asymmetrical transfer effect: may occur in a within-subjects or repeated-measures design in which the effect of a prior condition or treatment 'carries over' onto a subsequent condition. For example, we may be interested in the effect of watching violence on aggression. We conduct a within-subjects design in which participants are shown a violent and a non-violent scene in random order, with half the participants seeing the violent scene first and the other half seeing it second. If the effect of watching violence is to make participants more aggressive, then participants may behave more aggressively after viewing the non-violent scene. This will have the effect of reducing the difference in aggression between the two conditions. One way of controlling for this effect is to increase the interval between one condition and another.

case: a more general term than participant or subject for the individuals taking part in a study. It can apply to non-humans and inanimate objects so is preferred for some disciplines. See also: sample

categorical (category) variable: also known as qualitative, nominal or category variables. A variable measured in terms of the possession of qualities and not in terms of quantities. Categorical variables contain a minimum of two different categories (or values) and the categories have no underlying ordering of quantity. Thus, colour could be considered a categorical variable and, say, the categories blue, green and red chosen to be the measured categories. However, brightness such as sunny, bright, dull and dark would seem not to be a categorical variable since the named categories reflect an underlying dimension of degrees of brightness which would make it a score (or quantitative variable).



Categorical variables are analysed using generally distinct techniques such as chi-square, binomial, multinomial, logistic regression, and log–linear analysis. See also: qualitative research; quantitative research

Cattell's scree test: see scree test, Cattell's

causal: see quasi-experiments

causal effect: see effect

causal modelling: see partial correlation; path analysis

causal relationship: is one in which one variable is hypothesized or has been shown to affect another variable. The variable thought to affect the other variable may be called an independent variable or cause while the variable thought to be affected may be known as the dependent variable or effect. The dependent variable is assumed to 'depend' on the independent variable, which is considered to be 'independent' of the dependent variable. The independent variable must occur before the dependent variable. However, a variable which precedes another variable is not necessarily a cause of that other variable. Both variables may be the result of another variable.

To demonstrate that one variable causes or influences another variable, we have to be able to manipulate the independent or causal variable and to hold all other variables constant. If the dependent variable varies as a function of the independent variable, we may be more confident that the independent variable affects the dependent variable. For example, if we think that noise decreases performance, we will manipulate noise by varying its level or intensity and observe the effect this has on performance. If performance decreases as a function of noise, we may be more certain that noise influences performance.

In the socio-behavioural sciences it may be difficult to be sure that we have only manipulated the independent variable. We may have inadvertently manipulated one or more other variables such as the kind of noise we played. It may also be difficult to control all other variables. In practice, we may try to control the other variables that we think might affect performance, such as illumination. We may overlook other variables which also affect performance, such as time of day or week. One factor which may affect performance is the myriad ways in which people or animals differ. For example, performance may be affected by how much experience people have of similar tasks, their eyesight, how tired or anxious they are, and so on. The main way of controlling for these kinds of individual differences is to assign cases randomly to the different conditions in a between-subjects design or to different orders in a within-subjects design. With very small numbers of cases in each condition, random assignment may not result in the cases being similar across the conditions. A way to determine whether random assignment may have produced cases who are comparable across conditions is to test them on the dependent variable before the intervention, which is known as a pre-test. In our example, this dependent variable is performance.

It is possible that the variable we assume to be the dependent variable may also affect the variable we considered to be the independent variable. For example, watching violence may cause people to be more aggressive but aggressive people may also be inclined to watch more violence. In this case we have a causal relationship which has been variously referred to as bi-directional, bi-lateral, two-way, reciprocal or non-recursive. A causal relationship in which one variable affects but is not affected by another variable is variously known as a uni-directional, uni-lateral, one-way, non-reciprocal or recursive one.

If we simply measure two variables at the same time as in a cross-sectional survey or


study, we cannot determine which variable affects the other. Furthermore, if we measure the two variables on two or more occasions as in a panel or longitudinal study, we also cannot determine if one variable affects the other because both variables may be affected by other variables. In such studies, it is more accurate and appropriate simply to refer to an association or relationship between variables unless one is postulating a causal relationship. See also: path analysis

ceiling effect: occurs when scores on a variable are approaching the maximum they can be. Thus, there may be bunching of values close to the upper point. The introduction of a new variable cannot do a great deal to elevate the scores any further since they are virtually as high as they can go. Failure to recognize the possibility that there is a ceiling effect may lead to the mistaken conclusion that the independent variable has no effect. There are several reasons for ceiling effects which go well beyond statistical issues into more general methodological matters. For example, if the researcher wished to know whether eating carrots improved eyesight, it would probably be unwise to use a sample of ace rifle marksmen and women. The reason is that their eyesight is likely to be as good as it can get (they would not be exceptional at shooting if it were not) so the diet of extra carrots is unlikely to improve matters. With a different sample such as a sample of steam railway enthusiasts, the ceiling effect may not occur. Similarly, if a test of intelligence is too difficult, then improvement may be impossible in the majority of people. So ceiling effects are a complex of matters and their avoidance a matter of careful evaluation of a range of issues. See also: floor effect

cell: a subcategory in a cross-tabulation or contingency table. A cell may refer to just single values of a nominal, category or categorical variable. However, cells can also be formed by the intersection of two categories of the two (or more) independent nominal variables. Thus, a 2 × 3 cross-tabulation or contingency table has six cells. Similarly, a 2 × 2 × 2 ANOVA has a total of eight cells. The two-way contingency table in Table C.1 illustrates the notion of a cell. One box or cell has been filled in as grey. This cell consists of the cases which are in sample X and fall into category B of the other independent variable. That is, a cell consists of cases which are defined by the vertical column and the horizontal row it is in.

According to the type of variable, the contents of the cells will be frequencies (e.g. for chi-square) or scores (e.g. for analysis of variance).

central limit theorem: a description of the sampling distribution of means of samples taken from a population. It is an important tool in inferential statistics which enables certain conclusions to be drawn about the characteristics of samples compared with the population. The theorem makes a number of important statements about the distribution of an infinite number of samples drawn at random from a population. These to some extent may be grasped intuitively though it may be helpful to carry out an empirical investigation of the assumptions of the theory:

1 The mean of an infinite number of random sample means drawn from the population is identical to the mean of the population. Of course, the means of individual samples may depart from the mean of the population.


Table C.1 A contingency table with a single cell highlighted (the highlighted cell is the one for Sample X and Category B)

                                 Independent variable 1
                            Category A   Category B   Category C
Independent    Sample X
variable 2     Sample Y


2 The standard deviation of the distribution of sample means drawn from the population is inversely proportional to the square root of the sample size of the sample means in question. In other words, if the standard deviation of the scores in the population is symbolized by σ then the standard deviation of the sample means is σ/√N where N is the size of the sample in question. The standard deviation of sample means is known as the standard error of sample means.

3 Even if the population is not normally distributed, the distribution of means of samples drawn at random from the population will tend towards being normally distributed. The larger the sample size involved, the greater the tendency towards the distribution of sample means being normal.

Part of the practical significance of this is that samples drawn at random from a population tend to reflect the characteristics of that population. The larger the sample size, the more likely it is to reflect the characteristics of the population if it is drawn at random. With larger sample sizes, statistical techniques based on the normal distribution will fit the theoretical assumptions of the technique increasingly well. Even if the population is not normally distributed, the tendency of the sampling distribution of means towards normality means that parametric statistics may be appropriate despite limitations in the data.

With means of small-sized samples, the distribution of the sample means tends to be flatter than that of the normal distribution so we typically employ the t distribution rather than the normal distribution.

The central limit theorem allows researchers to use small samples knowing that they reflect population characteristics fairly well. If samples showed no such meaningful and systematic trends, statistical inference from samples would be impossible. See also: sampling distribution
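These statements are easy to check empirically. A brief simulation sketch in Python (the uniform population and the sample size of 25 are arbitrary choices for illustration):

    import random
    import statistics

    random.seed(1)
    # a decidedly non-normal population: uniform scores between 0 and 100
    population = [random.uniform(0, 100) for _ in range(100000)]

    sample_size = 25
    sample_means = [statistics.mean(random.sample(population, sample_size))
                    for _ in range(5000)]

    # the mean of the sample means should be close to the population mean
    print(statistics.mean(population), statistics.mean(sample_means))

    # and their standard deviation close to sigma divided by the square root of N
    print(statistics.pstdev(population) / sample_size ** 0.5,
          statistics.stdev(sample_means))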

central tendency, measure of: any measure or index which describes the central value in a distribution of values. The three most common measures of central tendency are the mean, the median and the mode. These three indices are the same when the distribution is unimodal and symmetrical. See also: average

characteristic root, value or number: another term for eigenvalue. See eigenvalue, in factor analysis

chi-square or chi-squared (χ²): symbolized by the Greek letter χ and sometimes called Pearson's chi-square after the person who developed it. It is used with frequency or categorical data as a measure of goodness of fit where there is one variable and as a measure of independence where there are two variables. It compares the observed frequencies with the frequencies expected by chance or according to a particular distribution across all the categories of one variable or all the combinations of categories of two variables. The categories or combination of categories may be represented as cells in a table. So if a variable has three categories there will be three cells. The greater the difference between the observed and the expected frequencies, the greater chi-square will be and the more likely it is that the observed frequencies will differ significantly.

Differences between observed and expected frequencies are squared so that chi-square is always positive because squaring negative values turns them into positive ones (e.g. (−2)² = 4). Furthermore, this squared difference is expressed as a function of the expected frequency for that cell. This means that larger differences, which should by chance result from larger expected frequencies, do not have an undue influence on the value of chi-square. When chi-square is used as a measure of goodness of fit, the smaller chi-square is, the better the fit of the observed frequencies to the expected ones. A chi-square of zero indicates a perfect fit. When chi-square is used as a measure of independence, the greater the value of chi-square is the more likely it is that the two variables are related and not independent.


The more categories there are, the bigger chi-square will be. Consequently, the statistical significance of chi-square takes account of the number of categories in the analysis. Essentially, the more categories there are, the bigger chi-square has to be to be statistically significant at a particular level such as the 0.05 or 5% level. The number of categories is expressed in degrees of freedom (df). For chi-square with only one variable, the degrees of freedom are the number of categories minus one. So, if there are three categories, there are 2 degrees of freedom (3 − 1 = 2). For chi-square with two variables, the degrees of freedom are the number of categories in one variable minus one multiplied by the number of categories in the other variable minus one. So, if there are three categories in one variable and four in the other, the degrees of freedom are 6 [(3 − 1) × (4 − 1) = 6].

With 1 degree of freedom it is necessary to have a minimum expected frequency of five in each cell to apply chi-square. With more than 1 degree of freedom, there should be a minimum expected frequency of one in each cell and an expected minimum frequency of five in 80% or more of the cells. Where these requirements are not satisfied it may be possible to meet them by omitting one or more categories and/or combining two or more categories with fewer than the minimum expected frequencies.

Where there is 1 degree of freedom, if we know what the direction of the results is for one of the cells, we also know what the direction of the results is for the other cell where there is only one variable and for one of the other cells where there are two variables with two categories. For example, if there are only two categories or cells, if the observed frequency in one cell is greater than that expected by chance, the observed frequency in the other cell must be less than that expected by chance. Similarly, if there are two variables with two categories each, and if the observed frequency is greater than the expected frequency in one of the cells, the observed frequency must be less than the expected frequency in one of the other cells. If we had strong grounds for predicting the direction of the results before the data were analysed, we could test the statistical significance of the results at the one-tailed level. Where there is more than 1 degree of freedom, we cannot tell which observed frequencies in one cell are significantly different from those in another cell without doing a separate chi-square analysis of the frequencies for those cells.

We will use the following example to illustrate the calculation and interpretation of chi-square. Suppose we wanted to find whether women and men differed in their support for the death penalty. We asked 110 women and 90 men their views and found that 20 of the women and 30 of the men agreed with the death penalty. The frequency of women and men agreeing, disagreeing and not knowing are shown in the 2 × 3 contingency table in Table C.2.

The number of women expected to support the death penalty is the proportion of people agreeing with the death penalty which is expressed as a function of the number of women. So the proportion of people supporting the death penalty is 50 out of 200 or 0.25 (50/200 = 0.25), which as a function of the number of women is 27.50 (0.25 × 110 = 27.50). The calculation of the expected frequency can be expressed more generally in the formula for the expected frequency given below.

For women supporting the death penalty the row total is 110 and the column total is 50. The grand total is 200. Thus the expected frequency is 27.50 (110 × 50/200 = 27.50).

Chi-square is the sum of the squared differences between the observed and expected frequency divided by the expected frequency for each of the cells, as shown in the second formula below.


Table C.2 Support for the death penalty in women and men

           Yes     No     Don't know
Women       20     70         20
Men         30     50         10

expected frequency = (row total × column total) / grand total

chi-square = Σ [(observed frequency − expected frequency)² / expected frequency]


This is summed across cells. The value for women supporting the death penalty is 2.05 (the calculation is shown below).

The sum of the values for all six cells is 6.74 (2.05 + 0.24 + 0.74 + 2.50 + 0.30 + 0.91 = 6.74). Hence chi-square is 6.74.

The degrees of freedom are 2 [(2 − 1) × (3 − 1) = 2]. With 2 degrees of freedom chi-square has to be 5.99 or larger to be statistically significant at the two-tailed 0.05 level, which it is.

Although fewer women support the death penalty than expected by chance, we do not know if this is statistically significant because the chi-square examines all three answers together. If we wanted to determine whether fewer women supported the death penalty than men we could just examine the two cells in the first column. The chi-square for this analysis is 2.00 which is not statistically significant. Alternatively, we could carry out a 2 × 2 chi-square. We could compare those agreeing with either those disagreeing or those disagreeing and not knowing. Both these chi-squares are statistically significant, indicating that fewer women than men support the death penalty.

The value that chi-square has to reach or exceed to be statistically significant at the two-tailed 0.05 level is shown in Table C.3 for up to 12 degrees of freedom. See also: contingency coefficient; expected frequencies; Fisher (exact probability) test; log–linear analysis; partitioning; Yates's correction

Cramer (1998)
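The worked example can be reproduced in a few lines of plain Python (no statistics package is assumed):

    # observed frequencies from Table C.2 (rows: women, men; columns: yes, no, don't know)
    observed = [[20, 70, 20],
                [30, 50, 10]]

    row_totals = [sum(row) for row in observed]          # 110 and 90
    col_totals = [sum(col) for col in zip(*observed)]    # 50, 120 and 30
    grand_total = sum(row_totals)                        # 200

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # expected frequency = row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (obs - expected) ** 2 / expected

    # about 6.73 (the 6.74 in the text comes from summing the rounded cell values);
    # compare with the 5.99 critical value for 2 degrees of freedom in Table C.3
    print(round(chi_square, 2))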

classic experimental method, in analysis of variance: see Type II, classic experimental or least squares method in analysis of variance

cluster analysis: a set of techniques for sorting variables, individuals, and the like, into groups on the basis of their similarity to each other. These groupings are known as clusters. Really it is about classifying things on the basis of having similar patterns of characteristics. For example, when we speak of families of plants (e.g. cactus family, rose family, and so forth) we are talking of clusters of plants which are similar to each other. Cluster analysis appears to be less widely used than factor analysis, which does a very similar task. One advantage of cluster analysis is that it is less tied to the correlation coefficient than factor analysis is. For example, cluster analysis sometimes uses similarity or matching scores. Such a score is based on the number of characteristics that, say, a case has in common with another case.

Usually, depending on the method of clustering, the clusters are hierarchical. That is, there are clusters within clusters or, if one prefers, clusters of clusters. Some methods of clustering (divisive methods) start with one all-embracing cluster and then break this into smaller clusters. Agglomerative methods of clustering usually start with as many clusters as there are cases (i.e. each case begins as a cluster) and then the cases are brought together to form bigger and bigger clusters. There is no single set of clusters which always applies – the clusters are dependent on what ways of assessing similarity and dissimilarity are used. Clusters are groups of things which have more in common with each other than they do with other clusters.

There are a number of ways of assessing how closely related the entities being entered into a cluster analysis are. This may be referred to as their similarity or the proximity. This is often expressed in terms of correlation coefficients but these only indicate high covariation, which is different from precise


For women supporting the death penalty:

(20 − 27.5)² / 27.5 = (−7.5)² / 27.5 = 56.25 / 27.5 = 2.05

Table C.3 The 0.05 probability two-tailed critical values of chi-square

df      χ²        df      χ²
1       3.84       7     14.07
2       5.99       8     15.51
3       7.82       9     16.92
4       9.49      10     18.31
5      11.07      11     19.68
6      12.59      12     21.03


matching. One way of assessing matching would be to use squared Euclidean distances between the entities. To do this, one simply calculates the difference between values, squares each difference, and then sums the squared differences. This is illustrated in Table C.4 for the similarity of facial features for just two of a larger number of individuals entered into a cluster analysis. Obviously if we did the same for, say, 20 people, we would end up with a 20 × 20 matrix indicating the amount of similarity between every possible pair of individuals – that is, a proximity matrix. The pair of individuals in Table C.4 are not very similar in terms of their facial features.

Clusters are formed in various ways in various methods of cluster analysis. One way of starting a cluster is simply to identify the pair of entities which are closest or most similar to each other. That is, one would choose from the correlation matrix or the proximity matrix the pair of entities which are most similar. They would be the highest correlating pair if one were using a correlation matrix or the pair with the lowest sum of squared Euclidean distances between them in the proximity matrix. This pair of entities would form the nucleus of the first cluster. One can then look through the matrix for the pair of entities which have the next highest level of similarity. If this is a completely new pair of entities, then we have a brand-new cluster beginning. However, if one member of this pair is in the cluster first formed, then a new cluster is not formed but the additional entity is added to the first cluster, making it a three-entity cluster at this stage.

According to the form of clustering, a refined version of this may continue until all of the entities are joined together in a single grand cluster. In some other methods, clusters are discrete in the sense that only entities which have their closest similarity with another entity which is already in the cluster can be included. Mostly, the first option is adopted which essentially is hierarchical clustering. That is to say, hierarchical clustering allows for the fact that entities have varying degrees of similarity to each other. Depending on the level of similarity required, clusters may be very small or large. The consequence of this is that this sort of cluster analysis results in clusters within clusters – that is, entities are conceived as having different levels of similarity. See also: agglomeration schedule; dendrogram; hierarchical agglomerative clustering

Cramer (2003)
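The proximity calculation is straightforward to carry out. A sketch in plain Python using the facial-feature ratings shown in Table C.4 below:

    # ratings of six facial features for two people (from Table C.4)
    person_1 = [4, 3, 4, 1, 5, 4]
    person_2 = [4, 1, 2, 4, 4, 1]

    # difference each pair of values, square the differences and sum them
    squared_euclidean = sum((a - b) ** 2 for a, b in zip(person_1, person_2))
    print(squared_euclidean)   # 27, i.e. the pair are not very similar

    # repeating this for every possible pair of cases gives the proximity
    # matrix from which clusters are then formed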

cluster sample: cluster sampling employs only limited portions of the population. This may be for a number of reasons – there may not be available a list which effectively defines the population. For example, if an education researcher wished to study 11 year old students, it is unlikely that a list of all 11 year old students would be available. Consequently, the researcher may opt for approaching a number of schools each of which might be expected to have a list of its 11 year old students. Each school would be a cluster.

In populations spread over a substantial geographical area, random sampling is enormously expensive since random sampling maximizes the amount of travel and consequent


Table C.4 Squared Euclidean distances and the sum of the squared Euclidean distances for facial features

Feature                    Person 1   Person 2   Difference   Difference²
Nose size                      4          4           0            0
Bushiness of eyebrows          3          1           2            4
Fatness of lips                4          2           2            4
Hairiness                      1          4          −3            9
Eye spacing                    5          4           1            1
Ear size                       4          1          −3            9
                                                          Sum =    27


expense involved. So it is fairly common to employ cluster samples in which the larger geographical area is subdivided into representative clusters or sub-areas. Thus, large towns, small towns and rural areas might be identified as the clusters. In this way, characteristics of the stratified sample may be built in as well as gaining the advantages of reduced geographical dispersion of participants or cases. Much research is only possible because of the use of a limited number of clusters in this way.

In terms of statistical analysis, cluster sampling techniques may affect the conceptual basis of the underlying statistical theory as they cannot be regarded as random samples. Hence, survey researchers sometimes use alternative statistical techniques from the ones common in disciplines such as psychology and related fields.

clustered bar chart, diagram or graph: see compound bar chart

Cochran's C test: one test for determining whether the variance in two or more groups is similar or homogeneous, which is an assumption that underlies the use of analysis of variance. The largest group variance is divided by the sum of the variances of all the groups. The statistical significance of this value may be looked up in a table of critical values for this test.

Cramer (1998)

Cochran's Q test: used to determine whether the frequencies of a dichotomous variable differ significantly for more than two related samples or groups.

Cramer (1998)

coefficient of alienation: indicates the amount of variation that two variables do not have in common. If there is a perfect correlation between two variables then the coefficient of alienation is zero. If there is no correlation between two variables then the coefficient of alienation is one. To calculate the coefficient of alienation, we use the first formula given below, where r² is the squared correlation coefficient between the two variables. So if we know that the correlation between age and intelligence is −0.2, then the second calculation below gives a coefficient of alienation of 0.96.

In a sense, then, it is the opposite of the coefficient of determination which assesses the amount of variance that two variables have in common.

coefficient of determination: an index of the amount of variation that two variables have in common. It is simply the square of the correlation coefficient (r²) between the two variables. Thus, if the correlation between two variables is 0.4, then the coefficient of determination is 0.4² = 0.16.

The coefficient of determination is a clearer indication of the relationship between two variables than the correlation coefficient. For example, the difference between a correlation coefficient of 0.5 and one of 1.0 is not easy for newcomers to statistics to appreciate. However, converted to the corresponding coefficients of determination of 0.25 and 1.00, it is clear that a correlation of 1.00 (i.e. coefficient of determination = 1.0) is four times the magnitude of one of 0.5 (coefficient of determination = 0.25) in terms of the amount of variance explained.

Table C.5 gives the relationship between the Pearson correlation (or point biserial


coefficient of alienation = 1 − r²

coefficient of alienation = 1 − (−0.2)² = 1 − 0.04 = 0.96

coefficient of determination = r²


or phi coefficients for that matter) and the coefficient of determination. The percentage of shared variance is also given. This table should help understand the meaning of the correlation coefficient values. See coefficient of alienation
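A short sketch of these conversions in Python, using the 0.4 correlation from the example above (the coefficient of alienation is included for comparison):

    r = 0.4
    determination = r ** 2                 # coefficient of determination
    alienation = 1 - r ** 2                # coefficient of alienation, as defined above
    shared_variance = determination * 100  # percentage of shared variance

    # prints 0.16, 0.84 and 16 - matching the 0.4 column of Table C.5
    print(round(determination, 2), round(alienation, 2), round(shared_variance))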

coefficient of variation: it would seem intuitive to suggest that samples with big mean scores of, say, 100 are likely to have larger variation around the mean than samples with smaller means such as 5. In order to indicate relative variability, adjusting the variability of samples for the size of their means, we can calculate the coefficient of variation. This is merely the standard deviation of the sample divided by the mean score. (Standard deviation itself is an index of variation, being merely the square root of variance.) This allows comparison of variation between samples with large means and small means. Essentially, it scales down (or possibly up) all standard deviations as a ratio of a single unit on the measurement scale.

Thus, if a sample mean is 39.0 and its standard deviation is 5.3, we can calculate the coefficient of variation as shown in the formula below.

Despite its apparent usefulness, the coefficient of variation is more common in some disciplines than others.
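A one-line sketch of the calculation for the example above:

    # coefficient of variation = standard deviation / mean
    mean = 39.0
    standard_deviation = 5.3
    print(round(standard_deviation / mean, 2))   # 0.14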

Cohen's d: one index of effect size used in meta-analysis and elsewhere. Compared with using Pearson's correlation for this purpose, it lacks intuitive appeal. The two are readily converted to each other. See meta-analysis

cohort: a group of people who share the same or similar experience during the same period of time such as being born or married during a particular period. This period may vary in duration.

cohort analysis: usually the analysis of some characteristic from one or more cohorts at two or more points in time. For example, we may be interested in how those in a particular age group vote in two consecutive elections. The individuals in a cohort need not be the same at the different points in time. A study in which the same individuals are measured on two or more occasions is usually referred to as a panel or prospective study.

Glenn (1977)

cohort design: a design in which groups of individuals pass through an institution such as a school but experience different events such as whether or not they have been exposed to a particular course. The groups have not been randomly assigned to whether or not they experience the particular event so it is not possible to determine whether any difference between the groups experiencing the event and those not experiencing the event is due to the event itself.

Cook and Campbell (1979)


Table C.5 The relationship between correlation, coefficient of determination and percentage of shared variance

Correlation                    1.0    0.9    0.8    0.7    0.6    0.5    0.4    0.3    0.2    0.1    0.0
Coefficient of determination   1.00   0.81   0.64   0.49   0.36   0.25   0.16   0.09   0.04   0.01   0.00
Shared variance                100%   81%    64%    49%    36%    25%    16%    9%     4%     1%     0%

coefficient of variation = standard deviation / mean = 5.3 / 39.0 = 0.14


cohort effect: may be present if the behaviour of one cohort differs from that of another. Suppose, for example, we asked people their voting intentions in the year 1990 and found that 70% of those born in 1950 said they would vote compared with 50% of those born in 1960, as shown in Table C.6. We may consider that this difference may reflect a cohort effect. However, this effect may also be due to an age difference in that those born in 1960 will be younger than those born in 1950. Those born in 1960 will be aged 30 in the year 1990 while those born in 1950 will be aged 40. So this difference may be more an age effect than a cohort effect.

To determine whether this difference is an age effect, we would have to compare the voting intentions of the two cohorts at the same age. This would mean finding out the voting intention of these two groups at two other periods or times of measurement. These times could be the year 1980 for those born in 1950, who would then be aged 30, and the year 2000 for those born in 1960, who would then be 40. Suppose, of those born in 1950 and asked in 1980, we found that 60% said they would vote compared with 60% of those born in 1960 and asked in 2000, as shown in Table C.6. This would suggest that there might also be an age effect in that older people may be more inclined to vote than younger people. However, this age effect could also be a time of measurement or period effect in that there was an increase in people's intention to vote over this period.

If we compare people of the same age for the two times we have information on them, we see that there appears to be a decrease in their voting intentions. For those aged 30, 60% intended to vote in 1980 compared with 50% in 1990. For those aged 40, 70% intended to vote in 1990 compared with 60% in 2000. However, this difference could be more a cohort effect than a period effect.

Menard (1991)

collinearity: a feature of the data which makes the interpretation of analyses such as multiple regression sometimes difficult. In multiple regression, a number of predictor (or independent) variables are linearly combined to estimate the criterion (or dependent) variable. In collinearity, some of the predictor or independent variables correlate extremely highly with each other. Because of the way in which multiple regression operates, this means that some variables which actually predict the dependent variable do not appear in the regression equation, but other predictor variables which appear very similar have a lot of impact on the regression equation. Table C.7 has a simple example of a correlation matrix which may have a collinearity problem. The correlations between the independent variables are the major focus. Areas where collinearity may have an effect have been highlighted. These are independent variables which have fairly high correlations with each other. In the example, the correlation matrix indicates that independent variable 1 correlates at 0.7 with independent variable 4. Both have got (relatively) fairly high correlations with the dependent variable of 0.4 and 0.3. Thus, both are fairly good predictors of the dependent variable. If one but not the other appears as the significant predictor in multiple regression, the researcher should take care not simply to take the interpretation offered by the computer output of the multiple regression as adequate. Another solution to collinearity problems is to combine the highly intercorrelated variables into a single variable which is then used in the analysis. The fact that they are highly intercorrelated means that they are measuring much the same thing. The best way of combining variables is to convert each to a z score and sum the z scores to give a total z score.

It is possible to deal with collinearity in a number of ways. The important thing is that

Table C.6 Percentage intending to vote for different cohorts and periods, with age in brackets

                               Year of measurement (period)
                               1980        1990        2000
Year of birth    1950         60 (30)     70 (40)
(cohort)         1960                     50 (30)     60 (40)


Table C.7 Examples of high intercorrelations which may make collinearity a problem

                          Independent   Independent   Independent   Independent   Dependent
                          variable 2    variable 3    variable 4    variable 5    variable
Independent variable 1        0.3           0.2           0.7           0.1          0.4
Independent variable 2                      0.8           0.3           0.2          0.2
Independent variable 3                                    0.2           0.1          0.2
Independent variable 4                                                  0.3          0.3
Independent variable 5                                                               0.0

the simple correlation matrix between the independent variables and the dependent variable is informative about which variables ought to relate to the dependent variable. Since collinearity problems tend to arise when the predictor variables correlate highly (i.e. are measuring the same thing as each other), then it may be wise to combine different measures of the same thing, so eliminating their collinearity. It might also be possible to carry out a factor analysis of the independent variables to find a set of factors among the independent variables which can then be put into the regression analysis.
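The kind of inspection described above can be sketched as follows in Python. The correlations are the illustrative values from Table C.7, and the 0.6 cut-off is an arbitrary choice for the sketch:

    # intercorrelations among the five predictors in Table C.7
    correlations = {
        ('IV1', 'IV2'): 0.3, ('IV1', 'IV3'): 0.2, ('IV1', 'IV4'): 0.7, ('IV1', 'IV5'): 0.1,
        ('IV2', 'IV3'): 0.8, ('IV2', 'IV4'): 0.3, ('IV2', 'IV5'): 0.2,
        ('IV3', 'IV4'): 0.2, ('IV3', 'IV5'): 0.1,
        ('IV4', 'IV5'): 0.3,
    }

    # flag the pairs of predictors that correlate highly with each other -
    # the pairs most likely to cause collinearity problems
    threshold = 0.6
    for pair, r in correlations.items():
        if abs(r) >= threshold:
            print(pair, r)          # ('IV1', 'IV4') 0.7 and ('IV2', 'IV3') 0.8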

combination: in probability theory, a set of events which are not structured by order of occurrence is called a combination. Combinations are different from permutations, which involve order. So if we get a die and toss it six times, we might get three 2s, two 4s and a 5. So the combination is 2, 2, 2, 4, 4, 5. Combinations are less varied than permutations since there is no time sequence order dimension in combinations. Remember that, for permutations, order is important. The following two permutations (and many others) are possible from this combination:

4, 2, 4, 5, 2, 2

or

5, 2, 4, 2, 4, 2

See also: permutation

combining variables: one good and fairly simple way to combine two or more variables to give total scores for each case is to turn each score on a variable into a z score and sum those scores. This means that each score is placed on the same unit of measurement or standardized.
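A minimal sketch of this in Python (the two measures are invented and deliberately on different scales):

    import statistics

    measure_a = [10, 12, 9, 15, 14]         # invented scores for five cases
    measure_b = [101, 130, 95, 150, 124]    # a second invented measure on a different scale

    def z_scores(values):
        # standardize: subtract the mean and divide by the standard deviation
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return [(v - mean) / sd for v in values]

    # combine the two measures by summing each case's z scores
    combined = [a + b for a, b in zip(z_scores(measure_a), z_scores(measure_b))]
    print([round(score, 2) for score in combined])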

common variance: the variation which two (or more) variables share. It is very different from error variance, which is variation in the scores and which is not measured or controlled by the research method in a particular study. One may then conceptually describe error variance in terms of the Venn diagram (Figure C.1). Each circle represents a different variable and where they overlap is the common variance or variance they share. The non-overlapping parts represent the error variance. It has to be stressed that the common and error variances are as much a consequence of the study in question and are not really simply a characteristic of the variables in question.

An example which might help is to imagine people's weights as estimated by themselves as one variable and their weights as estimated by another person as being the other variable. Both measures will assess weight up to a point but not completely accurately. The extent to which the estimates agree across a sample between the two is a measure of the common or shared variance; the extent of the disagreement or inaccuracy is the error variance.

communality, in factor analysis: the total amount of variance a variable is estimated to share with all other variables in a factor




analysis. The central issue is to do with the fact that a correlation matrix normally has 1.0 through one of the diagonals indicating that the correlation of a variable with itself is always 1.0. In factor analysis, we are normally trying to understand the pattern of association of variables with other variables, not the correlation of a variable with itself. In some circumstances, having 1.0 in the diagonal results in the factor analysis being dominated by the correlations of individual variables with themselves. If the intercorrelations of variables in the factor analysis are relatively small (say correlations which are generally of the order 0.1 to 0.4), then it is desirable to estimate the communality using the methods employed by principal axis factoring. This is because the correlations in the diagonal (i.e. the 1.0s) swamp the intercorrelations of the variables with the other variables. Communalities are essentially estimated in factor analysis by initially substituting the best correlation a variable has with another variable in place of the correlation of the variable with itself. An iterative process is then employed to refine the communality estimate. The communality is the equivalent of squaring the factor loadings of a variable across the factors and summing them. There will be one communality estimate per variable. In principal component analysis communality is always 1.00, though the term communality is not fully appropriate in that context. In principal axis factoring it is almost invariably less than 1.00 and usually substantially less. To understand communality better, it is useful to compare the results of a principal components analysis with those of a principal axis analysis of the same data. In the former, you are much more likely to find factors which are heavily loaded on just one variable. The differences between these two forms of factor analysis are small when the correlation matrix consists of high intercorrelations.

compound bar chart: a form of bar chart which allows the introduction of a second variable to the analysis. For example, one could use a bar chart to indicate the numbers of 18 year olds going to university to study one of four disciplines (Figure C.2). This would be a simple bar chart. In this context, a compound bar chart might involve the use of a second variable, gender. Each bar of the simple bar chart would be differentiated into a section indicating the proportion of males and another section indicating the proportion of females. In this way, it would be possible to see pictorially whether there is a gender difference in terms of the four types of university course chosen. In a clustered bar chart, the bars for the two genders are placed side by side rather than stacked into a single column so that it is possible to see the proportions of females going to university, males going to university, females not going to university, and males not going to university more directly.

Compound bar charts work well only when there are small numbers of values for each of the variables. With many values the charts become too complex and trends more difficult to discern.

compound frequency distributions: see frequency distribution

computational formulae: statistical concepts are largely defined by their formulae. Concepts such as standard deviation are difficult to understand or explain otherwise.


Figure C.1 A Venn diagram illustrating the difference between common variance and error variance (two overlapping circles, Variable A and Variable B; the area of overlap is the common or shared variance and the non-overlapping areas are the error variance)


Such formulae are not always the most easily calculated by hand or using a calculator. So more easily calculated versions of the formulae have been derived from the basic defining formulae. These are known as computational formulae. They have the disadvantage of obscuring what the basic concept is since their purpose is to minimize arithmetic labour. Consequently, they are probably best disregarded by those trying to learn statistical concepts as such, especially if it is intended to do all calculations using a standard statistical package on a computer.

concurrent validity: the extent to which a measure is related to another measure which it is expected to be related to and which is measured at the same time. For example, a measure of intelligence will be expected to be related to a measure of educational achievement which is assessed at the same time. Individuals with higher intelligence will be expected to show higher educational achievement. See also: predictive validity

condition: in experiments, a condition refers to the particular circumstances a particular group of cases or participants experience. Conditions are the various alternative treatments for the independent variables. If the experiment has only an experimental group and a control group, there are two conditions (which are the same as the different levels of treatment for the independent variable). Where the study is more complex, the number of conditions goes up commensurately. Thus, in a 2 × 2 × 2 analysis of variance there would be eight different conditions – two for each variable giving 2 × 2 × 2 = 8. Sometimes the different conditions for a particular independent variable are mentioned. Thus, in a study of the effects of a drug, the treatments would perhaps be 5 mg, 3 mg, 1 mg and 0 mg. See also: levels of treatment

conditional probability: the probability of any event occurring can be affected by another event. For example, the likelihood of being knocked over by a car on any day might be 0.0002. However, what if one decides to spend the entire day in bed? In these circumstances, the probability of being knocked down by a car would probably go down to 0.00000 – or smaller. In other words, a probability can be substantially affected by consideration of the particular antecedent conditions applying. So the chances of choking to death are probably greater when eating bony fish than when eating tomato soup. Hence, it is misleading to report general probabilities of a particular event occurring if factors (conditions) which greatly change the likelihood are present.


confidence interval: an approach to assessing the information in samples which gives the interval for a statistic covering the most likely samples drawn from the population (estimated from the samples).

Basically, the sample characteristics are used as an estimate of the population characteristics (Figure C.3). The theoretical distribution of samples from this estimated population is then calculated. The 95% confidence interval refers to the interval covering the statistic (e.g. the mean) for the 95% most likely samples drawn from the population (Figure C.4). Sometimes the confidence interval given is for the 99% most likely samples. The steps are:

• Usually the population value is estimated to be the same as the sample statistic. This does not have to be the case, however, and any value could be used as the population value.

• The variability of the available sample(s) is used to estimate the variability of the population.

• This estimated population variability is then used to calculate the variability of a particular statistic in the population. This is known as the standard error.

• This value of the standard error is used in conjunction with the sample size(s) to calculate the interval of the middle 95% of samples drawn from that estimated population.

The concept of confidence intervals can be applied to virtually any statistic.


Figure C.3 Flowchart for confidence intervals: the sample(s) is the only available information; the characteristics of the population are estimated (inferred) fairly directly from the sample information; the mean of the population is estimated to be the sample value, or perhaps some other value such as 0.00 determined by the null hypothesis; the spread (standard deviation) of the sample is used to estimate the standard deviation of the population, which in turn is used to estimate the standard error of the statistic in question; finally, the confidence interval is calculated from the standard error multiplied by the value of t or z which includes the middle 95% or 99% of samples (for a large sample t = z, so 1.96 × the standard error gives the distance from the population mean to the extreme 5% of samples)


However, it is probably best understood initially in terms of the confidence interval of the mean – probably the most common statistic, though any statistic will have a confidence interval.

Confidence intervals are often promoted as an alternative to point estimates (conventional significance testing). This is to overstate the case since they are both based on the same hypothesis testing approach. Confidence intervals are set values (95%, 99%). Consequently they may not be as informative as the exact values available for point estimates using computers. (The equivalent drawback is true of point estimates when critical values are employed in hand calculations.)

Confidence intervals are also claimed to allow greater understanding of the true importance of trends in the data. This is basically because the confidence interval is expressed in terms of a range on the original scale of measurement (i.e. the measurement scale on which the calculations were based). Again this is somewhat overstated. Unless the confidence interval refers to something which is of concrete value in assessing the magnitude of the trend in the data, then there is not that much gain. For example, if we know that the 95% confidence interval for height is 1.41 to 1.81 metres then this refers to something that is readily understood. If we know that the confidence interval for a measure of dogmatism is 22 to 24 then this is less evidently meaningful. See confidence interval of the mean; margin of error; standard error of the regression coefficient

confidence interval of the mean: indicates the likely variability in samples given the known characteristics of the data. Technically, it gives the spread of the 95% (or sometimes 99%) most likely sample means drawn from the population estimated from the known characteristics of the research sample. If one knows the likely spread of sample means then it is possible to interpret the importance and implications of the means of the data from the sample(s) researched in the study. One problem with the concept of confidence interval is that it is difficult to put into everyday language without distorting its mathematical reality. When the confidence interval is wide, our sample is likely to be a poor guide to the population value.

An alternative way of looking at confidence intervals is to define them as the range of sample means that are not significantly different statistically from a particular population mean. By definition, then, the 95% confidence interval is the range of the middle 95% of sample means: 47.5% above the mean and 47.5% below the mean. Obviously it is assumed that the distribution is symmetrical. This is illustrated in Figure C.5.

Most commonly in the single-sample study, the mean of the population is estimated as being the same as the mean of the known sample from that population. When comparing two sample means (e.g. as with the t test) the population mean would be set according to the null hypothesis as being zero (because if the null hypothesis is true then the difference between the two samples will be zero). In this case, if the confidence interval does not include zero then the hypothesis is preferred to the null hypothesis.

Consider a public opinion survey in which a sample, say, of 1000 people have been asked their age.


Figure C.4 Illustrating a feature of the confidence interval: in the estimated population distribution, the 95% confidence interval is the central part of the distribution in which 95% of sample means (or other statistics) lie, centred on the population mean under consideration – often zero


Provided that the sample is a random sample from the population, then the characteristics of the sample are generally our best estimate of the characteristics of that population in the absence of any other information. However, given that we know that random samples can and do depart from the population characteristics, we need to be aware that the sample probably reflects the population characteristics less than precisely. The confidence interval of the mean stipulates the range of values of sample means likely from the population. This, of course, is to base our estimate on the characteristics of the sample about which we have information.

Say, for example, the mean age of a sample of 1000 individuals is known (M = 34.5) and the standard deviation of age is 3.6; then the confidence intervals are relatively easily calculated. Assume we want to know the 95% confidence interval of age. We can estimate the population mean age at 34.5 (our best estimate from the sample information). We then use the estimated standard deviation of the sample to calculate the estimated standard error of samples of a particular size (1000 in this case). The formula for the estimated standard error is merely the estimated standard deviation divided by the square root of the sample size. So the standard error in this case is 3.6/√1000 = 3.6/31.62 = 0.11.

For large samples, 1.96 standard errors above and below the population mean cut off the middle 95% of the distribution. (This is because the distribution is the same as the z distribution.) For smaller sample sizes, this figure increases somewhat but can be obtained from the t distribution. So if the standard error is 0.11 then the 95% confidence interval is a distance of

0.11 × 1.96 = 0.22

from the mean. The estimated mean age is 34.5 so the 95% confidence interval is 34.5 minus 0.22 to 34.5 plus 0.22. That is, the 95% confidence interval is 34.28 to 34.72 years.

This indicates that the researcher should be confident that the mean age of the population from which the sample came is close to 34.5, as the vast majority of the likely samples have means very close to this. If the confidence interval were 25.0 to 44.5 years then there would obviously be considerable uncertainty as to the typical age in the population.
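The worked example above can be reproduced in a few lines of Python; this is only a sketch and assumes the scipy library is available for the t distribution.

```python
# A minimal sketch reproducing the worked example above: mean age 34.5,
# estimated standard deviation 3.6, sample size 1000.
import math
from scipy import stats

mean, sd, n = 34.5, 3.6, 1000

standard_error = sd / math.sqrt(n)       # 3.6 / 31.62, about 0.11
t_crit = stats.t.ppf(0.975, df=n - 1)    # about 1.96 for a large sample
margin = t_crit * standard_error         # about 0.22

print(f"95% CI: {mean - margin:.2f} to {mean + margin:.2f}")  # roughly 34.28 to 34.72
```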

Normally, however, research samples are much smaller.


Figure C.5 Illustrating the 95% confidence interval for the mean: the distribution of sample means is estimated from the standard deviation of the known sample(s), from which the standard error can be calculated; the 95% confidence interval is the central part of this distribution in which 95% of sample means (or other statistics) lie, with 47.5% of sample means on either side of the mean


So instead of using the z value of 1.96 we need to use the appropriate t value, which is distributed according to the degrees of freedom (usually the sample size minus one). Tables of the t distribution tell us the number of standard errors required to cover the middle 95% or middle 99% of sample means. Table C.8 gives the 95% and the 99% values of the t distribution for various degrees of freedom.

Table C.8 Example of values of t to calculate the 95% and 99% confidence intervals

        N = 6    N = 10   N = 15   N = 30   N = 50   N = 100   N = 1000
95%     2.57     2.26     2.13     2.04     2.01     1.98      1.96
99%     4.03     3.25     2.98     2.74     2.68     2.63      2.58

Broadly speaking, the confidence interval is the opposite of the critical value in significance testing. In significance testing, we identify whether the sample is sufficiently different from the population in order to place it in the extreme 5% of sample means. In confidence intervals, we estimate the middle 95% of sample means which are not significantly different from the population mean.

There is not just one confidence interval but a whole range of them depending on what value of confidence (or non-significance) is chosen and the sample size.

confirmatory factor analysis: a form of factor analysis concerned with testing hypotheses about or models of the data. Because it involves assessing how well theoretical notions about the underlying structure of data actually fit the data, confirmatory factor analysis ideally builds on developed theoretical notions within a field. For example, if it is believed that loneliness is composed of two separate components – actual lack of social contact and emotional feelings of not being close to others – then we have a two-component model of loneliness. The question is how well this model accounts for data on loneliness. So we would expect to be able to identify which items on a loneliness questionnaire are measuring these different aspects of loneliness. Then using confirmatory factor analysis, we could test how well this two-component model fits the data. A relatively poor fit of the model to the data might indicate that we need to consider that a third component may be involved – perhaps something as simple as geographical isolation. This new model could be assessed against the data in order to see if it achieves a better fit. Alternatively, one might explore the possibility that there is just one component of loneliness, not two.

Perhaps because of the stringent intellectual demands of theory and model development, much confirmatory factor analysis is merely the confirmation or otherwise of the factor structure of data as assessed using exploratory factor analysis. If the factors established in exploratory factor analysis can be shown to fit a data set well then the factor structure is confirmed.

Although there are alternative approaches, typically confirmatory factor analysis is carried out with structural equation modelling software such as LISREL or EQS. There are various indices for determining the degree of fit of a model to the data. See also: structural equation modelling

Cramer (2003)

confounding variable: a factor which may affect the dependent and independent variables and confuse or confound the apparent nature of the relationship between the independent and dependent variables. In most studies, researchers aim to control for these confounding variables. In any study there is potentially an infinite variety of possible confounding variables. See partial correlation; semi-partial correlation coefficient

constant: a value that is the same in a particular context.


Usually, it is a term (component) of a statistical formula which maintains a fixed value for all cases in the analysis. In a regression equation, the intercept and the regression coefficients are constants for a particular set of cases and variables in so far as they remain the same. The values of the variables, however, are not constant as they will vary. See also: criterion variable; predictor variable

construct validity: the extent to which a measure assesses the construct that it is intended or supposed to measure. For example, we would expect a measure of the construct of anxiety at a particular moment to vary appropriately according to how anxious you actually were. So you would expect this measure to show greater anxiety before an exam than a few hours after it. If a measure does not appear to show construct validity, the measure may be invalid or our ideas about the construct may be wrong or both.

contingency coefficient: a sort of correlation coefficient applied to cross-tabulation tables. For such cross-tabulation tables involving just two variables (e.g. Table C.9), the chi-square test can be used to compare samples in order to assess whether the distributions of frequencies in the samples are different. But it is equally an indication of how related the two variables are. Table C.9 indicates how the same contingency table can be reformulated in alternative ways. As can be seen, what is conceived as a sample is not crucial. If there is a zero or non-significant value of chi-square, then the samples are not different in terms of the distributions in the different categories. If the value of chi-square is significant, this is an indication that the samples differ in their distributions of the categories. This also means that there is a correlation between the two variables. See also: correlation

Another way of putting this is that there is a correlation between the variable 'sample' and another variable 'category'. In other words, there is not a great difference between a test of differences and a correlation. Chi-square is often referred to as a test of association. The problem with using chi-square as a test of association or correlation is that its numerical values differ greatly from the −1 through 0 to +1 range characteristic of, say, the Pearson correlation coefficient. Pearson offered a simple formula by which a chi-square value could be converted into a correlation coefficient known as the contingency coefficient. This is simply

contingency coefficient = √(χ²/(χ² + N))

N is the total number of frequencies in the contingency table and χ² is the value of chi-square for the table. The significance of the contingency coefficient is the same as the constituent chi-square value.

The contingency coefficient simply does not match the Pearson correlation coefficient as it cannot take negative values and its upper bound is always less than 1.00.

Cramer (1998)
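A short Python sketch of the contingency coefficient for the frequencies in Table C.9, assuming the numpy and scipy libraries are available to supply the chi-square value.

```python
# A sketch of the contingency coefficient calculated from the frequencies in
# Table C.9, using scipy to obtain the chi-square value first.
import math
import numpy as np
from scipy.stats import chi2_contingency

# Rows: males, females; columns: high jumpers, long jumpers, hurdlers.
table = np.array([[27, 33, 15],
                  [12, 43, 14]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()

# contingency coefficient = sqrt(chi-square / (chi-square + N))
contingency_coefficient = math.sqrt(chi2 / (chi2 + n))
print(round(contingency_coefficient, 3), round(p, 3))
```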

contingency or cross-tabulation table: usually shows the frequency of cases according to two or more categorical variables.


Table C.9 A contingency table expressed in three different forms

              High jumpers    Long jumpers    Hurdlers
Males               27              33            15
Females             12              43            14

              Sample 1        Sample 2        Sample 3
Males               27              33            15
Females             12              43            14

              High jumpers    Long jumpers    Hurdlers
Sample 1            27              33            15
Sample 2            12              43            14


Examples of such tables are to be found in Tables C.2 and C.9. A table displaying two variables is called a two-way table, three variables three-way, and so on. It is so called because the categories of one variable are contingent on or tabulated across the categories of one or more other variables. This table may be described according to the number of categories in each variable. For example, a 2 × 2 table consists of two variables which each have two categories. A 2 × 3 table has one variable with two categories and another variable with three categories. See also: Cramer's V; log–linear analysis; marginal totals

continuity correction: see Yates's correction

continuous data: scores on a variable that could be measured in minutely incremental amounts if we had the means of measuring them. Essentially it is a variable which has many different values. For example, height is continuous in that it can take an infinite number of different values if we have a sufficiently accurate way of measuring it. Other data cannot take many different values and may just take a number of discrete values. For example, sex only has two values – male and female.

contrast: a term used to describe the planned (a priori) comparisons between specific pairs of means in analysis of variance. See also: a priori tests; analysis of variance; post hoc tests; trend analysis in analysis of variance

control condition or group: a comparison condition which is as similar as possible to the principal condition or treatment under study in the research – except that the two conditions differ in terms of the key variable. In most disciplines, the control conditions consist of a group of participants who are compared with the treatment group. Without the control group, there is no obvious or unambiguous way of interpreting the data obtained. In experimental research, equality between the treatment group and control group is usually achieved by randomization.

For example, in a study of the relative effectiveness of Gestalt therapy and psychoanalysis, the researchers may randomly assign some participants to the Gestalt therapy condition and others to the psychoanalysis condition. Any comparisons based on these data alone simply compare the outcomes of the two types of therapy. This actually does not tell us if either is better than no treatment at all. Typically, control groups receive no treatment. It is the outcomes in the treatment group compared with the control group which facilitate interpretation of the value of treatment. Compared with the control group, the treatment may be better, worse or make no difference.

In research with human participants, control groups have their problems. For one thing it is difficult to know whether to put members of the control group through no procedure at all. The difficulty is that they do not receive the attention they would if they were, say, brought to the laboratory and closely studied. Similarly, if the study was of the effects of violent videos, what should be the control condition? No video? A video about rock climbing? Or what? Each of the listed alternatives has different implications. See also: quasi-experiments; treatment group

controlling: usually refers to eliminating the influence of covariates or third variables. Our research may suggest that there is a relationship between variable A and variable B. However, it may be that another variable(s) is also correlated with variable A and variable B. Because of this, it is possible that the relationship between variable A and variable B is the consequence of variable C.


Controlling involves eliminating the influence of variable C from the correlation of variable A and variable B. This may be achieved through the use of the partial correlation coefficient though there are other ways of controlling for variable C. Without controlling, it is not possible to assess what the strength of the relationship of variable A and variable B would be without any influence of variable C. Controlling sometimes refers to the use of a control group. See also: control condition or group; partial correlation

convenience sample or sampling: a sample or method of sampling in which cases are selected because of the convenience of accessing them and not because they are thought to be representative of the population. Unless some form of representative or random sampling has been employed, most samples are of this nature.

convergent validity: the extent to which a measure is related to other measures which have been designed to assess the same construct. For example, a measure of anxiety would be expected to be related to other measures of anxiety.

coordinates: the position a case has on two or more axes is its coordinates. This is much as geographical location may be defined by the coordinates of latitude and longitude. In statistics, an example of coordinates would be the position of a point on a scatterplot: the values on the two axes allow the precise position of an individual case on the two variables to be pinpointed.

correlated groups or measures design: see within-subjects design

correlated groups or samples: see matching; within-subjects design

correlation or correlation coefficient: an index of the linear or straight line relationship between two variables which can be ordered. Correlations can be either positive or negative. A positive correlation indicates that high scores on one variable go with high scores on the other variable. A negative correlation means that high scores on one variable go with low scores on the other variable. The size of a correlation can vary from a minimum of −1.00 through 0 to a maximum of +1.00. The bigger the size of the correlation, regardless of whether it is positive or negative, the stronger the linear relationship between the two variables. A correlation of 0 or close to 0 means that there is either no relationship or no linear relationship between the two variables. There may, however, be a curvilinear relationship between the two variables. Consequently, when the correlation is 0 or close to 0, it is useful to draw a scatter diagram which is a graph of the relationship between the two variables.

Correlations which differ substantially from 0 are statistically significant. The bigger the correlation, regardless of its sign, the more likely it is to be statistically significant. The bigger the sample, the more likely it is that a correlation will be significant. Very small correlations may be statistically significant provided that the sample is big enough.

There are various kinds of correlations. The most widely used is Pearson's product moment correlation coefficient. This name is usually shortened to Pearson's correlation and is symbolized as r. It can be calculated by multiplying the standardized scores of the two variables to obtain their 'product'. These products are then summed and divided by the number of cases minus one to give the mean population estimate of the products. A product moment is the expected or mean value of a product of two variables. A Pearson's correlation is the same as a standardized regression coefficient. It is used to determine the linear relationship between two variables which are normally distributed. Pearson's correlation can be strongly affected by extreme scores or outliers. Consequently, if the scores are not normally distributed, the scores can be ranked and a Pearson's correlation carried out on these ranked scores.


This type of correlation is known as Spearman's rank order correlation coefficient. This name is usually shortened to Spearman's correlation or rho and is symbolized with the Greek letter ρ. A Pearson's correlation between a dichotomous variable (such as sex) and a normally distributed variable may be called the point–biserial correlation. A Pearson's correlation between two dichotomous variables is called phi.

Squaring a Pearson’s correlation gives thecoefficient of determination. This provides aclearer indication of the meaning of the sizeof a correlation as it gives the proportion ofthe variance that is shared between two vari-ables. For example, a correlation of 0.50means that the proportion of the varianceshared between the two variables is 0.25(0.502 � 0.25). These proportions are usuallyexpressed as a percentage, which in this case,is 25% (0.25 � 100% � 25%). The percentageof shared variance for 11 correlations whicheach differ by 0.10 is shown in Table C.10.

We can see that as the size of the correlation increases, the percentage of shared variance becomes disproportionately larger. Although a correlation of 0.60 is twice as large as a correlation of 0.30, the percentage of variance accounted for is four times as large.

Correlations of 0.80 or above are usually described as being large, strong or high and may be found for the same variable (such as depression) measured on two occasions two weeks apart. Correlations of 0.30 or less are normally spoken of as being small, weak or low and are typically found for different variables (such as depression and social support). Correlations between 0.30 and 0.80 are typically said to be moderate or modest and are usually shown for similar measures (such as marital support and marital satisfaction).

Also shown in this table is the minimum size that the sample needs to be for a Pearson's correlation to be statistically significant at the two-tailed 0.05 level. As the correlations become smaller, the size of the sample that is required to reach this level of statistical significance increases. For example, a very weak correlation of 0.10 will be statistically significant at this level with a sample of 400 or more.

The statistical significance of the correlation can be calculated by converting it into a t value using the following formula (though there are tables readily available or significance may be obtained from computer output):

t = r × √[(N − 2)/(1 − r²)]

The statistical significance of the t value can be looked up in a table of its critical values against the appropriate degrees of freedom, which are the number of cases (N) minus two.
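A small Python sketch of this significance test, assuming scipy is available; the scores are invented, and the hand-calculated t and probability are checked against scipy's built-in test.

```python
# A small sketch of testing a Pearson correlation for significance using the
# formula above, checked against scipy's built-in test (invented scores).
import math
from scipy import stats

x = [2, 4, 5, 7, 9, 10, 12, 15]
y = [1, 3, 6, 6, 8, 11, 11, 14]
n = len(x)

r, p_builtin = stats.pearsonr(x, y)

# t = r * sqrt((N - 2) / (1 - r^2)), with N - 2 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed probability

print(round(r, 3), round(t, 3), round(p_manual, 4), round(p_builtin, 4))
```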

Other types of correlations for two variables which are rank ordered are Kendall's tau a, tau b and tau c, Goodman and Kruskal's gamma and tau, and Somers' d.

Measures of association for categorical variables include the contingency coefficient, Cramér's V and Goodman and Kruskal's lambda. See also: simple or bivariate regression; unstandardized partial regression coefficient

Cramer (1998)

correlation line: a straight line drawn through the points on a scatter diagram of the standardized scores of two variables so that it best describes the linear relationship between these variables.


If the scores were not standardized, this would simply be the regression line.

The vertical axis of the scatter diagram is used to represent the standardized values of one variable and the horizontal axis the standardized values of the other variable. To draw this line we need two points. The mean of a standardized variable is zero. So one point is where the mean of one variable intersects with the mean of the other variable. To work out the value of the other point we multiply the correlation between the two variables by the standardized value of one variable, say that represented by the horizontal axis, to obtain the predicted standardized value of the other variable. Where these two values intersect forms the second point. See also: standardized score; Z score

correlation matrix: a symmetrical table which shows the correlations between a set of variables. The variables are listed in the first row and first column of the table. The diagonal of the table shows the correlation of each variable with itself, which is 1 or 1.00. (In factor analysis, these values in the diagonal may be replaced with communalities.) Because the information in the diagonal is always the same, it may be omitted. The values of the correlations in the lower left-hand triangle of the matrix are the mirror image of those in the upper right-hand triangle. Because of this, the values in the upper right-hand triangle may be omitted.
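As an illustration, a correlation matrix can be produced in one line with the pandas library; the variable names and scores below are invented.

```python
# A small sketch of a correlation matrix produced with pandas (invented data).
import pandas as pd

data = pd.DataFrame({
    "anxiety":    [12, 15, 9, 20, 17, 11],
    "depression": [10, 14, 8, 18, 16, 9],
    "support":    [22, 18, 25, 12, 15, 24],
})

# A symmetrical matrix with 1.00 in the diagonal; the lower left-hand triangle
# mirrors the upper right-hand triangle.
print(data.corr().round(2))
```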

correlations, averaging: when there is more than one study which has reported the Pearson's correlation between the same two variables or two very similar variables, it may be useful to compute a mean correlation for these studies to indicate the general size of this relationship. This procedure may be carried out in a meta-analysis where the results of several studies are summarized. If the size of the samples was exactly the same, we could simply add the correlations together and divide by the number of samples. If the sizes of the samples are not the same but similar, this procedure gives an approximation of the average correlation because the different sample sizes are not taken into account. The more varied the size of the samples, the grosser this approximation is.

The varying size of the samples is taken into account by first converting the correlation into a z correlation. This may be done by looking up the appropriate value in a table or by computing it directly using the following formula, where loge stands for the natural logarithm and r for Pearson's correlation:

zr = 0.5 × loge[(1 + r)/(1 − r)]

This z correlation needs to be weighted by multiplying it by the number of cases in the sample minus three. The weighted z correlations are added together and divided by the sum for each sample of its size minus three. This value is the average z correlation which needs to be converted back into a Pearson's correlation. This can be done by looking up the value in the appropriate table or using the following formula:

r = (2.7182^(2 × zr) − 1)/(2.7182^(2 × zr) + 1)

The calculation of the average Pearson's correlation is given in Table C.11 for three correlations from different-sized samples. Note that Pearson's correlation is much the same as the z correlation for low values, so that when the values of the Pearson's correlation are low a good approximation of the average correlation can be obtained by following the same procedure but using the Pearson's correlation instead of the z correlation. The average z correlation is 0.29 which, converted back into a Pearson's correlation, is 0.28.

Table C.11 Averaging Pearson's correlations

Sample    r       zr      Size (N)    N − 3    zr × (N − 3)
1         0.38    0.40       33         30     0.40 × 30 = 12.00
2         0.29    0.30       53         50     0.30 × 50 = 15.00
3         0.20    0.20       43         40     0.20 × 40 =  8.00

Sum                                    120                  35.00
Mean zr = 35.00/120 = 0.29
Mean r = 0.28

Cramer (1998)
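A short Python sketch of the weighted averaging procedure, reproducing the three correlations and sample sizes of Table C.11; only the standard library is needed.

```python
# A sketch of the weighted averaging procedure, reproducing the three
# correlations and sample sizes shown in Table C.11.
import math

correlations = [(0.38, 33), (0.29, 53), (0.20, 43)]   # (r, sample size)

weighted_sum = 0.0
weight_total = 0
for r, n in correlations:
    z = 0.5 * math.log((1 + r) / (1 - r))   # r to z transformation
    weighted_sum += z * (n - 3)
    weight_total += n - 3

mean_z = weighted_sum / weight_total
# Convert the average z back into a Pearson's correlation.
mean_r = (math.exp(2 * mean_z) - 1) / (math.exp(2 * mean_z) + 1)

print(round(mean_z, 2), round(mean_r, 2))   # 0.29 and 0.28
```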

correlations, testing differences: normally, we test whether a correlation differs significantly from zero. However, there are circumstances in which we might wish to know whether two different correlation coefficients differ from each other. For example, it would be interesting to know whether the correlation between IQ and income for men is different from the correlation between IQ and income for women.


There are various tests to determine whether the values of two correlations differ significantly from each other. Which test to use depends on whether the correlations come from the same or different samples and, if they come from the same sample, whether one of the variables is the same.

The z test determines whether the correlation between the two variables differs between two unrelated samples. For example, the z test is used to see whether the correlation between depression and social support differs between women and men.

The Z2* test assesses whether the correlation between two different variables in the same sample varies significantly. For instance, the Z2* test is applied to determine whether the correlation between depression at two points differs significantly from the correlation between social support at the same two points.

The T2 test ascertains whether the correlations in the same sample which have a variable in common differ significantly. For example, the T2 test is employed to find out whether the correlation between social support and depression varies significantly from that between social support and anxiety.

Cramer (1998)

count: another word for frequency or total number of something, especially common in computer statistical packages. In other words, it is a tally or simply a count of the number of cases in a particular category of the variable(s) under examination. In this context, a count would give the number of people with blue eyes, for example. Generally speaking, counts need statistics which are appropriate to nominal (or category or categorical) data such as chi-square. See also: frequency

When applied to a single individual, the count on a variable may be equated to a score (e.g. the number of correct answers an individual gave to a quiz).

counterbalanced designs: counterbalancing refers to one way of attempting to nullify the effects of the sequence of events. It uses all (or many) of the various possible sequences of events. It is appropriate in related (matched) designs where more than one measure of the same variable is taken at different points in time. Imagine a related designs experiment in which the same group of participants is being studied in the experimental and control conditions. It would be common practice to have half of the participants serve in the experimental condition first and the other half in the control condition first (see Table C.12).

Table C.12 Example of a simple counterbalanced design

           First                      Second
Group 1    Experimental condition     Control condition
Group 2    Control condition          Experimental condition

In this way, all of the possible orders of running the study have been used. This is a very simple case of a group of designs known as Latin Square designs, which allow participants to serve in a number of different conditions while the number of different orders in which the conditions of the study are run is maximized.

There are a number of ways of analysing these data including the following:

• Combine the two experimental conditions (group 1 + group 2) and the two control conditions (group 1 + group 2). Then there is just an experimental and control condition to compare with a related t test or similar technique.



• The four conditions could be compared using two-way analysis of variance (mixed design). Keeping the data in the order of Table C.12, then differences between the experimental and control treatment will be indicated by a significant interaction. Significant main effects would indicate either a general difference between group 1 and group 2, or an order effect if it is between first and second. This approach is probably less satisfactory than the next given the problems of interpreting main effects in ANOVA.

• Alternatively, the table could be rearranged so that the two experimental conditions are in the first column and the two control groups are in the second column. This time, the main effect comparing the experimental condition with the control condition would indicate an effect of condition. An interaction indicates that order and condition are having a combined effect.

These different approaches differ in terms of their directness and statistical sophistication. Each makes slightly different assumptions about the nature of the data. The main difference is that some allow the issue of order effects to be highlighted whereas others ignore order effects, assuming that the counterbalancing has been successful.

Another context in which it is worth considering counterbalancing is when samples of individuals are being measured twice on the same variable. For example, the researcher may be interested in changes of children's IQs over time. To give exactly the same IQ test twice would encourage the claim that practice effects are likely to lead to increases in IQ at the second time of measurement. By using two different versions (forms) of the test this criticism may be reduced. As the two different versions may vary slightly in difficulty, it would be wise to give version A to half of the sample first followed by version B at the second administration, and also give version B first to the other half of the sample followed by version A. Of course, this design will not totally negate the criticism of practice effects. There are more complex designs in which some participants do not receive the first version of the test which may be helpful in dealing with this issue.

In counterbalanced designs, practicalities will sometimes intervene since there may be too many different orders to be practicable. Partial counterbalancing may be the only practical solution. See also: matching

covariance: the variance that is shared between two variables. It is the sum of products divided by the number of cases minus one. The product is the deviation of a score from the mean of that variable multiplied by the deviation of the score from the mean of the other variable for that case. In other words, covariance is very much like variance but is a measure of the variance that two different variables share. The formal formula for the covariance is

covariance [X with Y] = Σ(X − X̄)(Y − Ȳ)/(N − 1)

Cramer (1998)
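A minimal Python sketch of this formula with invented scores; only the standard library is needed.

```python
# A minimal sketch of the covariance formula given above (invented scores).
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 7, 9]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of products of deviations, divided by the number of cases minus one.
sum_of_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
covariance = sum_of_products / (n - 1)

print(covariance)   # 10.0 for these scores
```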

covariance, analysis of: see analysis of covariance

covariate: a variable that is related to (usually) the dependent variable.


Since the covariate is correlated with the other variable, it is very easy for the effects of the covariate to be confused with the effects of the other variable. Consequently, the covariate is usually controlled for statistically where this is a problem. Hence, controlling for the covariate is an important feature of procedures such as analysis of covariance and hierarchical multiple regression. See also: analysis of covariance

Cramer’s V: also known as Cramer’s phi.This is a correlation coefficient applied to acontingency or cross-tabulation table. Assuch, it is similar in application to the contin-gency coefficient. It can be used for any two-variable contingency table. For a 2 � 2contingency table, Cramer’s V gives exactlythe same value as the phi coefficient.

The formula for the statistic is

Cramer's V = √(χ²/(N × (S − 1)))

where χ² is the chi-square value for the table, N is the total number of cases in the table, and S is the smaller of the number of columns or the number of rows.

Table C.13 A contingency table showing the relationship between nationality and favourite leisure activity

              Sport    Social    Craft
Australian      29       16        19
British         14       22        39

Applying the procedure to the data in Table C.13, which has a χ² value of 12.283, a total N of 139, and 2 as the smaller of the number of rows or columns:

Cramer's V = √(12.283/(139 × (2 − 1))) = √(12.283/139) = √0.0884 = 0.297

Treating this as roughly analogous to Pearson's correlation coefficient, the value of Cramer's V suggests a modest correlation between leisure activity and nationality. There is an associated probability level but it is recommended that Cramer's V is calculated using a computer statistical package. See also: correlation

Cramer (1998)
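A short Python sketch of Cramer's V for the data in Table C.13, assuming numpy and scipy are available to supply the chi-square value.

```python
# A sketch of Cramer's V for the data in Table C.13, using scipy for the
# chi-square value.
import math
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Australian, British; columns: sport, social, craft.
table = np.array([[29, 16, 19],
                  [14, 22, 39]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
s = min(table.shape)          # the smaller of the number of rows or columns

cramers_v = math.sqrt(chi2 / (n * (s - 1)))
print(round(cramers_v, 3))    # close to the 0.297 reported above
```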

criterion variable: the variable whose values are being predicted or understood on the basis of information about related variables. So, if a researcher wishes to predict the recidivism of an offender, the criterion variable would be a measure of recidivism. In other words, the criterion variable is the dependent variable of this analysis (Figure C.6). The predictor variable might be use of illegal drugs. See also: logistic regression; regression equation

The criterion variable is also known as the dependent variable, though this is a more general term which does not have the implication of prediction.

Figure C.6 Synonyms for independent and criterion variable: the X variable may be termed the independent variable or predictor variable; the Y variable may be termed the dependent variable, criterion variable or predicted value

critical values: in tests of significance such as the t test, F ratio, etc., certain values of t or F form the lower boundary for a particular significance level. These are the values which are tabulated in tables of significance. So, from tables of the t distribution, with degrees of freedom of 10 and a two-tailed significance of 0.05 (i.e. 5%), the minimum value of t to be significant at this level is 2.228. It clearly simplifies assessing statistical significance if important levels of significance are provided for each size of sample. Conventional statistical tables found in statistics textbooks are really tables of such critical values. However, with the widespread use of powerful computer programs it is more common to have available from the printout the exact statistical significance of the same statistical tests.


On balance, it is probably better in general to report the exact significance rather than just whether the critical value has been exceeded, since exact significance levels can indicate probabilities far more extreme than those covered by available tables of critical values. See also: analysis of variance; chi-square; trend analysis in analysis of variance

Cronbach’s alpha reliability: see alphareliability

cross-lagged correlations: the pattern of correlations between two variables which are both measured at two separate points in time. It is consequently often employed in the analysis of panel studies. Imagine the two variables are the amount of TV viewed and aggressiveness. These could be measured when a child is 8 years old and again when the child is 15 years old. So really we have four variables: aggression age 8, aggression age 15, viewing age 8 and viewing age 15. There are a number of correlation coefficients that could be calculated:

• Aggression age 8 with viewing age 8
• Aggression age 15 with viewing age 15
• Aggression age 8 with aggression age 15
• Viewing age 8 with viewing age 15
• Aggression age 8 with viewing age 15
• Aggression age 15 with viewing age 8

The first two correlations are regular correlation coefficients measured at a single point in time. The third and fourth correlations are lagged correlations since there is a time lag in collecting the first and second measures. The fifth and sixth correlations are cross-lagged correlations. This means that there is a time lag in their collection but also the correlation crosses over to the other variable as well as being lagged.

The point of this is that it may help assess whether or not there is a causal relationship between TV viewing and aggression. The correlation between aggression age 8 and viewing age 8 does not establish a causal link. It could be that TV viewing causes the child to be aggressive but it could equally be that aggressive children just choose to watch more TV. However, what if we found that there was a stronger correlation between aggression at age 15 and viewing at age 8 than there was between aggression at age 8 and viewing at age 8? This would tend to suggest that viewing at age 8 was causing aggression at age 15. By examining the strengths of the various correlations, some researchers claim to be able to gain greater confidence on issues of causality.

cross-lagged panel correlation analysis: a method which tries to ascertain the temporal predominance or order of two variables which are known to be related to one another. For example, we may be interested in knowing what the temporal relationship is between mental health and social support. There are four possible temporal relationships. One relationship is that greater support precedes or leads to greater mental health. Another relationship is that greater mental health precedes or brings about greater support. A third relationship is that both support and mental health affect each other. A fourth relationship is that the apparent relationship between the two variables is spurious and is due to one or more other variables which are related to the two variables.

The two variables need to refer to the same period of time and to be measured at least at two points in time, as depicted in Figure C.7. If the cross-lagged correlation between support at time 1 and mental health at time 2 is significantly more positive than the cross-lagged correlation between mental health at time 1 and support at time 2, this difference implies that greater support precedes or brings about greater mental health. If the cross-lagged correlation between mental health at time 1 and support at time 2 is significantly more positive than the cross-lagged correlation between support at time 1 and mental health at time 2, this difference implies that greater mental health precedes or brings about greater support.

Figure C.7 A two-wave two-variable panel design: support and mental health are each measured at Time 1 and Time 2


If there is no significant difference between the cross-lagged correlations (one or both of which must be significant), the relationship between the two variables is either reciprocal or spurious. These interpretations depend on there being no significant difference between the synchronous correlations at time 1 and 2 and the two auto- or test–retest correlations.

It is generally recommended that a more appropriate method of analysing this kind of design is structural equation modelling.

Kenny (1975); Rogosa (1980)

cross-sectional design or study: a study where measures have been obtained at a cross-section of, or single point in, time. In such studies it is not possible to infer any changes over time or the causal nature of any relationship between the variables. For example, if we had a sample of individuals which varied in age from 18 to 75 and we found that older people were more conservative in their political views, we could not conclude that people become more conservative as they become older, because it could be that the older group underwent experiences which made them more conservative which may not be the case for the younger group.

cross-tabulation or cross-tabulation tables: see contingency table; log–linear analysis; marginal totals

cube or cubed: in statistics used to refer to a number multiplied by itself three times. For example, 2.7 cubed is written as 2.7³ and means 2.7 × 2.7 × 2.7 = 19.683. Another way of putting it is to say 2.7 to the power of 3. It is also 2.7 to the exponent of 3. It is occasionally met with in statistical calculations though not the common ones.

cumulative frequencies: frequencies which accumulate by incorporating earlier values in the range. So in a class of children whose ages range from 5 to 11 years, a cumulative frequency distribution would be 5 year olds; 5 and 6 year olds; 5, 6 and 7 year olds; and so forth. So it answers the question how many children are there up to age 8 years? A frequency table would tell you the number of 8 year olds.

Table C.14 Example of a cumulative frequency table

Age range           Cumulative frequency
5 year olds                  12
5–6 year olds                32
5–7 year olds                46
5–8 year olds                57
5–9 year olds                62
5–10 year olds               71
5–11 year olds               71

From Table C.14 the number of 6 year olds is 20 (i.e. 32 − 12). Also in this table, it should be noted that there are no 11 year olds (since the cumulative frequency does not increase over the final two categories). The cumulative frequency can also be expressed as cumulative percentage frequencies.

It is possible to have a bar chart which accumulates in this fashion. See also: Kolmogorov–Smirnov test for one sample
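A small Python sketch of cumulative frequencies; the single-age frequencies below are inferred from the cumulative values in Table C.14.

```python
# A sketch of cumulative frequencies for the age data in Table C.14.
from itertools import accumulate

ages = [5, 6, 7, 8, 9, 10, 11]
frequencies = [12, 20, 14, 11, 5, 9, 0]      # frequency of each single age

cumulative = list(accumulate(frequencies))   # 12, 32, 46, 57, 62, 71, 71
for age, cum in zip(ages, cumulative):
    label = "5 year olds" if age == 5 else f"5-{age} year olds"
    print(f"{label}: {cum}")
```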

curvilinear relationship: mostly in the social sciences the relationships between two variables are assumed to be linear or straight line in form.


However, relationships can take the form of curved lines – that is, the best fitting line through the points of the scattergram between two variables is a curved shape such as a U-shape or an inverted U-shape. Figure C.8 shows a linear relationship and Figure C.9 shows a curvilinear relationship.

Just as with a linear relationship, the closeness of the points to the curved line is the important factor in determining the size of the correlation: the more the spread, the lower the correlation.

The usually recommended correlation coefficient for curvilinear data is eta, which is related to ANOVA. See also: correlation; eta


Figure C.8 Linear relationship

Figure C.9 Curvilinear relationship


D

data: (plural of datum) the information or facts about something. Statistics employ data which have in some way been expressed in a numerical form as frequencies or scores. One crucial step in the successful application of statistics is to understand and recognize the difference between score data and nominal or categorical data. Score data collect information from each case or participant in the form of a numerical value which indicates the quantity of a characteristic. Nominal data involve the assignment of individuals to categories. The number of individuals placed in each category is known as the frequency. Statistical analysis of nominal data is largely the analysis of these frequencies.

deciles: like percentiles except that they are the cut-off points for the bottom 10%, 20%, 30%, etc., of scores. Hence, the fifth decile (the 50th percentile) is the score which separates the lowest 50% of scores from higher ones. A decile is presented as a specific score, though sometimes it is an estimated value, not an actual value. See also: percentiles
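A small Python sketch of deciles obtained with numpy's percentile function; the scores are invented, and the values returned may be interpolated (estimated) rather than actual scores.

```python
# A small sketch of deciles computed with numpy's percentile function
# (invented scores).
import numpy as np

scores = np.array([3, 7, 8, 12, 13, 15, 18, 21, 22, 25, 27, 30,
                   31, 34, 36, 40, 41, 45, 48, 52])

# The first to ninth deciles are the 10th, 20th, ..., 90th percentiles.
deciles = np.percentile(scores, np.arange(10, 100, 10))
print(deciles)   # the fifth decile is the same as the median
```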

decimal places (number of): decimals are attractive since they add an appearance of precision. Unfortunately, given the nature of much of the data in many disciplines, this precision is spurious or more apparent than real. It is a matter of judgement, but for data collected on whole-number scales, it is usually sufficient to report statistics such as the mean or standard deviation to two decimal places at the most, and one decimal place will suffice in most circumstances. With data collected with a greater degree of accuracy (i.e. involving decimals), the reporting of means and standard deviations to one more decimal place than in the data is again probably sufficient accuracy.

Calculations, of course, should proceed using as many decimals as the means of calculation permits. Computers and calculators can be safely left to work out decimals appropriately. Systematic rounding errors can occur in hand calculations if too few decimal places are used, which can produce misleading outcomes. A minimum of three more decimal places than present in the data would seem a very safe rule of thumb.

decimal places (rounding): see rounding decimal places

decimals: a convenient way of writing fractions or fractions of numbers without using the fraction notation. Fractions are simply parts of whole numbers such as a half, a fifth, a tenth, a seventeenth, a hundredth and a thousandth – or two-fifths or three-fifths and so forth. The simplest way of writing fractions is 2/5 or 11/25, though this is not used generally in statistics.


For many purposes this sort of notation is fine. The difficulty with it is that one cannot directly add fractions together.

Decimals are a notational system for expressing fractions of numbers as tenths, hundredths, thousandths, etc., which greatly facilitates the addition of fractions. In the decimal system, there are whole numbers followed by a dot which is then followed by the fraction:

17.5

In the above number, 17 is the whole number, the dot indicates that a fraction will follow, and the number after the dot indicates the size of the fraction.

The first number after the dot is the number of tenths that there are. In the above example we therefore have five tenths, which is 5/10, which simplifies to 1/2 or a half.

In the following number we have two numbers after the dot or decimal point:

160.25

In this example, the whole number is 160 and the fraction after the decimal point is 25. The first number 2 is the number of tenths. The second number 5 is the number of hundredths. In other words, there are two tenths and five hundredths. Put more simply, the 25 after the decimal place is the number of hundredths, since two tenths (2/10) is the same thing as twenty hundredths (20/100). A decimal of 0.25 corresponds to 25 hundredths, which is a quarter.

The following number has three numbers after the decimal point:

14.623

Therefore we have 14 and 6/10 and 2/100 plus something else. The final 3 refers to the number of thousandths that we have. Thus, in full, 14.623 = 14 and 6/10 and 2/100 and 3/1000. This is actually the same as 14 and 623/1000. The system goes on to the number of ten-thousandths, the number of hundred-thousandths, millionths and beyond.

Figure D.1 illustrates the components of the decimal number 423.7591.

The addition and subtraction of decimals are far simpler than the addition and subtraction of other forms of fractions. Thus, 0.53 + 0.22 can be directly added together to become 0.75. Similarly with subtraction: 0.53 − 0.22 = 0.31. Because we are adding fractions, sometimes the fraction will be bigger than one. So 0.67 + 0.92 = 1.59.

See also: decimal places (number of); rounding decimal places


Figure D.1 The components of the decimal number 423.7591: four hundred, twenty and three before the decimal point, which indicates the start of the fraction; then seven tenths, five hundredths, nine thousandths and one ten-thousandth


degrees of freedom: a difficult concept which has a bearing on many of the basic inferential statistical techniques. Statistical estimates based on samples often contain redundant information which if included may lead to a systematic bias in the estimate. For example, when calculating the variance or standard deviation of a sample in order to estimate the population characteristics, we know both the scores in the sample and the sample mean. The sample mean is used to estimate the population mean. However, this mean is also used in the estimation of the variance or standard deviation of the population. Take the following set of scores:

2, 5, 7, 9

The mean of this sample is 23/4 = 5.75. Imagine that we wish to calculate the variance based on this sample. It is basically the square of each of the deviations from the mean. So we take (2 − 5.75)², (5 − 5.75)², (7 − 5.75)², etc., but a problem emerges when we come to the score of 9. It actually contains no new information. Since we know that the sample mean is 5.75, then knowing the first three scores allows us to know that the final score has to be 9. No other number will fit. So, the final number actually provides no information that has not already been accounted for. In this case there would be three degrees of freedom since the final score is fixed by the mean and the other three scores.

The calculation method for degrees of freedom varies according to what is being estimated and it is not possible to give general rules which easily apply to any calculation. Nevertheless, where degrees of freedom have to be calculated, it is usually a simple matter. Generally speaking, degrees of freedom are only an issue in relation to hand calculation of statistical tests because tables of critical values are often distributed according to the degrees of freedom. Computer calculations rarely if ever require reference to such tables by the user. See also: analysis of variance; chi-square; estimated standard deviation; t distribution; t test for unrelated samples; variance estimate

De Moivre’s distribution: another termfor a normal distribution. See also: normaldistribution

dendrogram: a graphical representation of the results of a cluster analysis in which lines are used to indicate which variables or clusters are paired at which stage of the analysis.

denominator: in a fraction such as 4/7 or 3/17 the denominator is the number at the bottom – that is, 7 in the first example and 17 in the second. The numerator is the number at the top. See also: numerator

dependent groups or samples: seewithin-subjects design

dependent variable: a variable that isassumed to ‘depend’ on, be affected by, orrelated to the value of one or more indepen-dent variables. See also: criterion variable;multiple regression; regression equation

descriptive statistics: a wide variety oftechniques that allow us to describe the gen-eral characteristics of the data we collect. Thecentral tendency (typical score) may beassessed by the mean, median or mode. Theshape or spread of the distribution of scorescan be presented graphically (using his-tograms, for example) and the spread can begiven as the range, standard deviation orvariance. They are key to and essential inknowing the nature of the data collected.They are too easily overlooked by thoseimpressed by the complexity of inferentialstatistics.

Although descriptive statistics seem to bemuch simpler than inferential statistics, they


Although descriptive statistics seem to be much simpler than inferential statistics, they are essential to any data analysis. A common mistake on the part of novices in statistics is to disregard the descriptive analysis in favour of the inferential analysis. The consequence of this is that the trends in the data are not understood or are overlooked, and the results of the inferential statistical analysis appear meaningless. A thorough univariate and bivariate examination of the data is an initial and key step in analysing any data. See also: exploratory data analysis

design: see between-subjects design; cross-sectional design or study; factorial design; longitudinal design or study; non-experimental design or study; true experiment; within-subjects design

determinant, of a matrix: a value associated with a square matrix which is calculated from it. It is denoted by two vertical lines on either side of the letter which represents it, such as |D|. The determinant of a correlation matrix may vary between 0 and 1. When it is 1, all the correlations in the matrix are 0. This is known as an identity matrix in which the main or leading diagonal of the matrix (going from top left to bottom right) consists of ones and the values above and below it of zeros. When the determinant is zero, there is at least one linear dependency in the matrix. This means that one or more columns can be formed from other columns in the matrix. This can be done by either transforming a column (such as multiplying it by a constant) or forming a linear combination (such as subtracting one column from another). Such a matrix may be called ill-conditioned.
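A small numerical sketch of the two extremes in Python with numpy (our own illustration, not from the original entry):

import numpy as np

identity = np.array([[1.0, 0.0],
                     [0.0, 1.0]])     # all correlations are 0
print(np.linalg.det(identity))        # 1.0

singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])     # a perfect linear dependency between the columns
print(np.linalg.det(singular))        # 0.0: an ill-conditioned matrix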

deviation: the size of the difference between a score and the mean (usually) of the set of scores to which it belongs. Deviation is calculated to include the sign – that is, the deviation of a score of 6 from the mean of 4 is 2, whereas the deviation of a score of 2 from the mean of 4 is −2. See also: absolute deviation; difference score

dichotomous variable: a variable having only two values. Examples include sex (male and female being the values), employment status (employed and non-employed being the two values) and religion (Catholic and non-Catholic being the values in the study). There is no assumption that the two values are equally frequent.

One reason for treating dichotomous variables as a special case is that they can be treated as simple score variables. So, if we code male as 1 and female as 2, these are values on the variable femaleness. The bigger the score, the more female the individual is. In this way, a dichotomous variable can be entered into an analysis which otherwise requires scores.

This also explains why there is a close relationship between some research designs that are generally seen as looking at mean differences and those which seek correlations between variables. The t test is commonly applied to designs where there are an experimental group and a control group which are compared on a dependent variable measured as scores. However, if we consider the independent variable here (group) then it is easy to see that it can be coded 1 for the control group and 2 for the experimental group. If that is done, the 1s and 2s (the independent variable) can be correlated with the scores (the dependent variable). This Pearson correlation coefficient gives exactly the same probability value as does the t test applied to the same data.

This is important since it allows any nominal variable to be recoded as a number of dichotomous variables. For example, if participants are given a choice of, say, six geographical locations for their place of birth, it would be possible to recode this single variable as several separate variables. So if Northern England is one of the geographical locations, people could be scored as being from Northern England or not. If another category is South West England then all participants could then be scored as being from South West England or not. These are essentially dummy variables.
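The equivalence mentioned above is easy to check in Python with scipy (a sketch of our own using invented scores; scipy is assumed to be available):

import numpy as np
from scipy import stats

control = np.array([4, 6, 5, 7, 3])            # group coded 1
experimental = np.array([8, 9, 7, 10, 6])      # group coded 2

group = np.array([1] * 5 + [2] * 5)            # the dichotomous variable treated as scores
scores = np.concatenate([control, experimental])

t, p_from_t = stats.ttest_ind(control, experimental)
r, p_from_r = stats.pearsonr(group, scores)
print(p_from_t, p_from_r)                      # the two probability values are the same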


difference score: in related designs in which individuals are measured twice, for example, a difference score is simply the difference between that individual's score on one variable and their score on another variable. Difference scores can take positive or negative values. See also: deviation

directional hypothesis: see hypothesis

directionless tests: (a) a test of significance which is two tailed (see significant); (b) more likely to refer to a test of significance which can only yield absolute values and hence is not capable of testing for direction. Examples of this include chi-square (for bigger than 2 × 2 tables) and the F ratio.

discriminant function analysis: identifies the pattern of variables which differentiates a group of cases from other groups of cases. For example, imagine the research issue is the childhood and family factors which differentiate criminal adolescents from their peers. The obvious strategy is to collect information about, say:

1 The extent of father's nurturant behaviour
2 Family income
3 Mother's IQ

It would be a simple matter to determine whether delinquents differ from their non-delinquent peers by using the unrelated t test or ANOVA on each variable to see whether there is a significant difference on nurturant behaviour between the two groups, then family income, and then mother's IQ. Up to a point this is fine. Unfortunately it does nothing to establish what are the really important factors distinguishing the groups and what is the best pattern for differentiating between the various groups. Discriminant function analysis attempts to rectify these difficulties with the simpler approach.

In discriminant function analysis there is a dependent or criterion variable (the particular group to which an individual actually belongs) and a set of independent or predictor or discriminating variables (things which might differentiate between the groups involved). A discriminant function is a weighted combination of variables which optimizes the ability of the predictors to differentiate between the groups of cases. So a discriminant function might be

discriminant (function) score = constant + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6

Wilks's lambda is used to indicate the contribution of each of the discriminating variables to the separation of the groups. The smaller lambda is, the greater the discriminating power of the predictor variable. The discriminant function is made up by a weighted combination of scores on different variables. The bs in the above formula are the weights, and the xs the scores an individual has on particular measures. So in the above example there are six predictor variables. This is essentially little different from the formula for multiple regression, except we are predicting which group a person with a particular pattern will belong to rather than what score that person will get on the dependent variable. As with multiple regression, regression weights may be expressed in unstandardized or standardized form. When expressed in standardized form, the relative impact of the different variables on the classification of individuals to the different groups can be seen much more clearly from what are known as standardized discriminant coefficients.

If there are more than two groups to differentiate, the number of discriminant functions increases. This is normally the number of groups minus one but can be smaller if the number of independent (predictor or discriminating) variables is less.

The centroid is the average score on the discriminant function of a person who is classified in a particular group. For a two-group discriminant function analysis there are two centroids.


Quite clearly, there also have to be cut-off points which sort out which group the individual should belong to on the basis of the discriminant function. This is midway between the two centroids if the group sizes are equal but weighted towards one if they are not.

There is also the important matter of how well the discriminant function sorts the cases into their known groups. Sometimes this may be referred to as the classification table, confusion matrix or prediction table. It tabulates the known distribution of groups against the predicted distribution of individuals between the groups based on the independent variables. The better the fit between the actual groups and predicted groups, the better the discriminant function.

Discriminant function analysis is easily conducted using a standard computer package such as SPSS. Despite this, its use is surprisingly rare in many disciplines. See also: hierarchical or sequential entry; Hotelling's trace criterion; multivariate normality; Roy's gcr; stepwise entry; Wilks's lambda

Cramer (2003)
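A hedged sketch of how such an analysis might be run in Python with scikit-learn (the data and variable names are invented for illustration; this is not a worked example from the dictionary):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# rows are cases, columns are the discriminating (predictor) variables
X = np.array([[2.0, 15.0, 100.0],
              [3.5, 22.0, 110.0],
              [1.0, 12.0,  95.0],
              [4.0, 30.0, 120.0],
              [3.0, 25.0, 105.0],
              [1.5, 14.0,  98.0]])
y = np.array([0, 1, 0, 1, 1, 0])      # known group membership (the criterion)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                      # the weights (bs) of the discriminant function
print(lda.predict(X))                 # predicted groups, the basis of a classification table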

discriminant validity: the extent to which a measure of one construct is less strongly related to measures of other constructs than to measures of the same one. For example, we would expect measures of anxiety to be less strongly related to measures of intelligence than to other measures of anxiety – that is, if our measure of anxiety is valid.

dispersion, measure of: a measure or index of the spread or dispersion of values in a sample. The three most common measures of dispersion are the range, the variance and the standard deviation.

distribution: the pattern of characteristics (e.g. variable values). That is, the way in which characteristics (values) of a variable are distributed over the sample or population. Hence, the distribution of scores in a sample is merely a tally or graph which shows the pattern of the various values of the score. The distribution of gender, for example, would be merely the number or proportion of males and females in the sample or population. See normal distribution

distribution-free tests: generally used to denote non-parametric statistical techniques. They are distribution free in the sense that no assumptions are made about the distribution of the data in the population from which the samples are drawn. Parametric tests assume normality (normal distribution) and symmetry which may make them inappropriate in extreme cases of non-normality or asymmetry. See non-parametric tests; ranking tests

drop-out rate: see attrition

dummy coding: a method for defining a dummy variable in which membership of a category is coded as 1 and non-membership as 0. It is so called because the zero does not specify, or is dumb to, the membership of the other categories.

Cohen and Cohen (1983)

dummy variable: a common procedure, where data have been collected as a complex category (nominal/categorical/qualitative) variable, is to employ dummy variables in order to be able to analyse the variable using regression and other techniques. Imagine the following categories of a variable measuring occupation:

Construction work
Forestry work
Office work
Medical work
Other


Classed as a single variable, this occupation variable is not readily turned into numerical scores for the obvious reason that it is a nominal variable. The different jobs can be turned into dummy variables by creating several dichotomous or binary variables. So, for example, construction work could be turned into a dummy variable by simply coding 0 if the respondent was not in construction work and 1 if the respondent was in construction work. Forestry work similarly would be coded as 0 if the respondent was not in forestry and 1 if the respondent was in forestry work.

There is no requirement that every possible dummy variable is created from a nominal variable. Once the dummy variables have been coded as 0 or 1 (any other pair of values would do in most circumstances), they can be entered in correlational techniques such as Pearson's correlation coefficient (see phi coefficient and point–biserial correlation coefficient) or especially regression and multiple regression. If entering the dummy variables into multiple regression, one would restrict the number of dummy variables to one less than the number of categories in the category variable. This is to avoid the situation in which variables may be strongly collinear in the regression analysis. Strong collinearity is a problem for the proper interpretation of predictions in multiple regression. A simple example might help explain why we restrict the number of dummy variables to the number of categories minus one. Take the variable gender which has the values male and female. We could turn this into two dummy variables: one for male in which being male is coded 1 and not being male is coded 0, and another dummy variable for female in which being female is coded 1 and not being female is coded 0. There would be a perfect (negative) correlation between the two. Entered into a multiple regression, only one of these dummy variables would emerge as being predictive and the other would not emerge as predictive. This is a consequence of the collinearity.

Dummy variables provide a useful means of entering what is otherwise nominal data into powerful techniques such as multiple regression. Some confidence is needed to manipulate data in this way but dummy variables are simple to produce and the benefits of employing them considerable. See also: analysis of variance; dichotomous variable; effect coding; multiple regression
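In Python, pandas can create the dummy variables described above (a minimal sketch of our own; the occupation labels follow the example in this entry):

import pandas as pd

occupation = pd.Series(['Construction work', 'Forestry work', 'Office work',
                        'Medical work', 'Other', 'Forestry work'])

# drop_first=True keeps one dummy fewer than the number of categories,
# which avoids the collinearity problem described above
dummies = pd.get_dummies(occupation, drop_first=True)
print(dummies)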

Duncan's new multiple range test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. It may be used regardless of whether the overall analysis of variance is significant. It assumes equal variance and is approximate for unequal group sizes. It is a stepwise or sequential test which is similar to the Newman–Keuls method, procedure or test in that the means are first ordered in size. However, it differs in the significance level used. For the Newman–Keuls test the significance level is the same for however many comparisons there are, while for Duncan's test it becomes more lenient the more comparisons there are. Consequently, differences are more likely to be significant for this test.

The significance level is 1 − (1 − α)^(r − 1)

where r is the number of steps the two means are apart in the ordered set of means. For two adjacent means, r is 2, for two means separated by a third mean r is 3, for two means separated by two other means r is 4, and so on. For an α or significance level of 0.05, the significance level for Duncan's test is 0.05 when r is 2 [1 − (1 − 0.05)^(2 − 1) = 1 − 0.95^1 = 1 − 0.95 = 0.05]. It increases to 0.10 when r is 3 [1 − (1 − 0.05)^(3 − 1) = 1 − 0.95^2 = 1 − 0.90 = 0.10], to 0.14 when r is 4 [1 − (1 − 0.05)^(4 − 1) = 1 − 0.95^3 = 1 − 0.86 = 0.14], and so on.

For the example given in the entry for the Newman–Keuls method, the value of the studentized range is 3.26 when r is 2, 3.39 when r is 3, and 3.47 when r is 4. These values can be obtained from a table which is available in some statistics texts such as the source below. Apart from the first value, which is the same as that for the Newman–Keuls method, these values are smaller, which means that the difference between the two means being compared does not have to be as large as it does for the Newman–Keuls method when r is more than 2 to be statistically significant. For the example given in the entry for the Newman–Keuls method, these differences are 5.97 when r is 3 (3.39 × 1.76 = 5.97) and 6.11 when r is 4 (3.47 × 1.76 = 6.11) compared with 7.11 and 7.97 for the Newman–Keuls method respectively.

Kirk (1995)
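A quick numerical check of the formula above in Python (our own illustration):

alpha = 0.05
for r in (2, 3, 4):
    level = 1 - (1 - alpha) ** (r - 1)
    print(r, round(level, 2))     # 2 0.05, 3 0.1, 4 0.14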


Dunn's test: the same as the Bonferroni t test. It is an a priori or multiple comparison test used to determine which of three or more means differ significantly from one another in an analysis of variance. It controls for the probability of making a Type I error by reducing the 0.05 significance level to take account of the number of comparisons made. It can be used with groups of equal or unequal size.

Dunn–Sidak multiple comparison test: an a priori or multiple comparison test used to determine which of three or more means differ significantly from one another in an analysis of variance. It controls for the probability of making a Type I error by reducing the 0.05 significance level to take account of the number of comparisons made. It assumes equal variances but can be used with groups of equal or unequal size. It is a slight modification of Dunn's or the Bonferroni t test. This revision makes it very slightly more powerful than this latter test, which practically may make little difference. For Dunn's test the familywise or experimentwise significance level is calculated by dividing the conventional 0.05 level by the number of comparisons to be made. So with three comparisons this significance level is 0.0167 (0.05/3 = 0.0167). For the Dunn–Sidak test the comparison level is computed with the following formula where C is the number of comparisons:

1 − (1 − 0.05)^(1/C)

For three comparisons this level is 0.0170 [1 − (1 − 0.05)^(1/3) = 1 − 0.95^(1/3) = 1 − 0.9830 = 0.0170]. The difference between these two significance levels is only 0.0003. The difference between the two significance levels becomes progressively smaller the greater the number of comparisons being made.

Kirk (1995)

Dunnett's C test: an a priori or multiple comparison test used to determine whether the mean of a control condition differs from that of two or more experimental conditions in an analysis of variance. It can be used for equal and unequal group sizes where the variances are unequal. This test takes account of the increased probability of making a Type I error the more comparisons that are made.

Toothaker (1991)

Dunnett's T3 test: an a priori or multiple comparison test used to determine whether the mean of a control condition differs from that of two or more experimental conditions in an analysis of variance. It can be used for equal and unequal group sizes where the variances are unequal. This test takes account of the increased probability of making a Type I error the more comparisons that are made. It is a modification of Tamhane's T2 multiple comparison test.

Toothaker (1991)

Dunnett's test: an a priori or multiple comparison test used to determine whether the mean of a control condition differs from that of two or more experimental conditions in an analysis of variance. It can be used for equal and unequal group sizes where the variances are equal. The formula for this test is the same as that for the unrelated t test where the variances are equal:

(group 1 mean − group 2 mean) / √[(group 1 variance/group 1 n) + (group 2 variance/group 2 n)]

However, the probability level of this t test has been reduced to take account of the increased probability of making a Type I error the more comparisons that are made. Dunnett's C or T3 test should be used when the variances are unequal.

Kirk (1995)


E

Ebel's intraclass correlation: see between-judges variance

ecological validity: the extent to which a study can be seen as representing the 'real-life' phenomenon it is designed to investigate. For example, the way couples handle their differences when asked to do so in a laboratory may be different from the way they would do this outside of this setting in their everyday lives. In these circumstances, ecological validity is poor.

effect: the influence of one variable on another variable (though this may be zero). A common phrase is to discuss 'the effect of variable X on variable Y'. The term may be used in the sense of a causal effect in which changes in one variable result directly or cause changes in the other. However, more generally the term is used to refer to the extent of the relationship of one variable with another – that is, simply their mathematical relationship with no implications of social science causality. Experimental effect refers to the influence of the independent variable on the dependent variable in an experiment.

effect coding: a method for coding nominal variables in which the two categories or groups to be identified are coded as 1 and −1 respectively and any other categories or groups as 0. It is so called because it determines the effect of the groups or treatments being compared. It is used in multiple regression.

Cohen and Cohen (1983)

effect size: a term used in meta-analysis and more generally to indicate the relationship between two variables. The normal implication of the term effect size is that it indicates the size of the difference between the means of the conditions or groups on the dependent variable. Such an approach does not readily allow direct comparisons between studies using different measuring instruments and so forth. Consequently effect size is normally reported as a more standardized index such as Cohen's d. Alternatively, and probably much more usefully, effect size can be expressed in terms of a Pearson's correlation coefficient. This is more intuitively understood by researchers with modest statistical knowledge. The point–biserial correlation and eta may be used to calculate correlation coefficients where the independent variable is in the form of a small number of nominal categories. Where both variables are scores, then Pearson's correlation is the appropriate statistic. In this way, correlation coefficients which are comparable with each other may be obtained for the vast majority of studies.


The effect size expressed in this way, since it is a correlation coefficient, gives the proportion of variance explained by the two variables if the coefficient is squared. See coefficient of determination. See also: meta-analysis

eigenvalue, in factor analysis: the amount (not percentage) of variance accounted for by the variables on a factor in a factor analysis. It is the sum of the squared correlations or loadings between each variable and that factor. The magnitude of a factor and its 'significance' are assessed partly from the eigenvalue. See also: Kaiser's (Kaiser–Guttman) criterion; scree test, Cattell's

endogenous variable: a variable in a path analysis which is assumed to be explained by one or more other variables in the analysis. As a consequence it has one or more arrows leading either directly or indirectly to it from other variables.

EQS: (pronounced like the letter X) the name of one of the computer programs for carrying out structural equation modelling. The name seems to be an abbreviation for equation systems. Information about EQS can be found at the following website: http://www.mvsoft.com/index.htm See also: structural equation modelling

error: the variation in scores which the researcher has failed to control or measure in a particular study. Once it is measured as an identifiable variable it ceases to be error. So variations in the scores on a measure which are the consequence of time of day, for example, are error if the researcher does not appreciate that they are due to variations in time of day and does not include time of day as a variable in the study. Error may be due to poor research design or methodology, but it is not a mistake in the conventional sense. The objective of the researcher needs to be to keep error to a minimum as far as possible. Error makes the interpretation of trends in the data difficult since the greater the error, the greater the likely variation due to chance or unknown factors. Many factors lead to increased error – poor measurement techniques such as unclear or ambiguous questions and variations in the instructions given to participants, for instance.

error mean square: the term used in analysis of variance for the quantity by which the effect mean square is divided to obtain the F ratio. It is the error sum of squares divided by the error degrees of freedom. The smaller this term is in relation to the effect mean square, the bigger the F ratio will be and the more likely it is that the means of two or more of the groups will differ significantly.

error sum of squares: the sum of squares in an analysis of variance which remains after the sum of squares for the other effects has been removed. It represents the variance which remains to be explained and so may be referred to as the residual sum of squares. See also: analysis of variance

error variance: see within-groups variance

estimated standard deviation: an estimate of the 'average' amount by which scores vary from the mean of scores. It is the likely standard deviation of the population of scores. It is based on the characteristics of a known sample from that population. The formula for calculating the estimated standard deviation differs from that of the standard deviation in that it features a correction for bias in the estimate.


This simply reduces the sample size (N) by one, which then produces an unbiased estimate. So for the sample of scores 3, 7 and 8, the formula and calculation of estimated standard deviation is

estimated standard deviation = √[Σ(X − X̄)² / (N − 1)]

where X is a score, X̄ is the mean score, and N is the number of scores. Thus

estimated standard deviation = √{[(3 − 6)² + (7 − 6)² + (8 − 6)²] / (3 − 1)}
= √{[(−3)² + 1² + 2²] / 2}
= √[(9 + 1 + 4) / 2]
= √(14/2) = √7 = 2.65

Thus, the value of the estimated standard deviation is 2.65 units. So the average score in the population differs from the mean of 6 by 2.65. The average, however, is not calculated conventionally but this is a helpful way to look at the concept. However, since there are a number of ways of calculating averages in statistics (e.g. harmonic mean, geometric mean) then we should be prepared for the unusual. The estimated standard deviation is basically the square root of the average squared deviation from the mean.

Computer packages often compute the estimated standard deviation but nevertheless and confusingly call it the standard deviation. Since generally the uses of standard deviation are in inferential statistics, we would normally be working with the estimated standard deviation anyway because research invariably involves samples. Hence, the unavailability of standard deviation in some packages is not a problem. As long as the researcher understands this and reports standard deviation appropriately as estimated when it is, few difficulties arise.
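The distinction is easy to see in Python with numpy (a sketch of our own):

import numpy as np

scores = np.array([3, 7, 8])
print(np.std(scores, ddof=1))   # about 2.65: the estimated standard deviation (divides by N - 1)
print(np.std(scores))           # about 2.16: the uncorrected standard deviation (divides by N)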

estimated variance: see variance estimate, estimated population variance or sample variance

estimation: when we cannot know something for certain then sometimes it is useful to estimate its value. In other words, have a good guess or an educated guess as to its value. This is an informed decision really rather than a guess. In statistics the process of estimation is commonly referred to as statistical inference. In this, the most likely population value is calculated or estimated based on the available information from the research sample(s). Without estimation, research on samples could not be generalized beyond the individuals or cases that the research was conducted on. The process of estimation sometimes does employ some complex calculations (as in multiple regression), but for the most part there is a simple relationship between the characteristics of a sample and the estimated characteristics of the population based on this sample.

eta (η): usually described as a correlation coefficient for curvilinear data for which linear correlation coefficients such as Pearson's product moment correlation are not appropriate. This is a little misleading as eta gives exactly the same numerical value as Pearson's correlation when applied to perfectly linear data. However, if the data are not ideally fitted by a straight line then there will be a disparity. In this case, the value of eta will be bigger (never less) than the corresponding value of Pearson's correlation applied to the same data. The greater the disparity between the linear correlation coefficient and the curvilinear correlation coefficient, the less linear is the underlying relationship.

Eta requires data that can be presented as a one-way analysis of variance. This means that there is a dependent variable which takes the form of numerical scores. The independent variable takes one of a number of different categories. These categories may be ordered (i.e. they can take a numerical value and, as such, represent scores on the independent variable). Alternatively, the categories of the independent variable may simply be nominal categories which have no underlying order.


Data suitable for analysis using eta can be found in Table E.1 and Table E.2. They clearly contain the same data. The first table could be directly calculated as a Pearson's correlation coefficient. The correlation coefficient is 0.81. The data in the other table are obviously amenable to a one-way analysis of variance. The value of eta can be calculated from the analysis of variance summary table since it is merely the following:

eta = √(between sum of squares / total sum of squares)

Table E.1 Data with four categories of the independent variable

Independent variable    Dependent variable
1                        3
1                        4
1                        2
1                        5
1                        4
2                        3
2                        6
2                        7
2                        5
3                        7
3                        6
3                        4
3                        8
3                        9
3                        8
4                       12
4                       12
4                       11
4                       10
4                        6

Table E.2 The data (from Table E.1) put in the form of one-way ANOVA

Category of independent variable
Category 1    Category 2    Category 3    Category 4
3             3             7             12
4             6             6             12
2             7             4             11
5             5             8             10
4                           9              6
                            8

Table E.3 Analysis of variance summary table for data in Table E.2

Source of variance    Sum of squares    Degrees of freedom    Mean square
Between groups           118.050               3                 39.350
Within or error           54.750              16
Total                    172.800

eta for the above table = √(118.050/172.800) = √0.683 = 0.83

Table E.3 gives the analysis of variance summary table for the data in Table E.2 as well as the calculation of eta for that table. The analysis of variance is a basic one-way analysis of variance. Not every aspect of the summary table is used to calculate eta – we do not use the within subjects (i.e. error) sum of squares. We find that the value of eta is 0.83 compared with 0.81 for Pearson's correlation. Thus, there is virtually no departure from linearity since the 'curvilinear' eta is only marginally larger than the linear correlation.

The calculation of eta should be more routine than it is in current practice. In many disciplines it is rarely considered. However, its calculation is easy on statistical computer packages with data entered in a form suitable for calculating Pearson's correlation (unrelated ANOVA). It gives an indication of non-linearity when compared with Pearson's correlation. It is important to stress again that although our example deals with a linear independent variable, nevertheless this is not a requirement. So the independent variable may be nominal categories. With nominal data, the test of linearity is obviously inapplicable.

As we interpret the value of eta more or less as a Pearson's correlation coefficient then a correlation of 0.83 indicates a strong relationship between the independent variable and the dependent variable. In light of the fact that the linear relationship is almost as strong, we would hesitate to describe the relationship as curvilinear in this example as the departure from linearity is small. See also: curvilinear relationship
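As a rough check in Python (our own illustration), eta can be computed directly from the sums of squares for the data in Table E.2:

import numpy as np

groups = [np.array([3, 4, 2, 5, 4]),
          np.array([3, 6, 7, 5]),
          np.array([7, 6, 4, 8, 9, 8]),
          np.array([12, 12, 11, 10, 6])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

between_ss = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # 118.05
total_ss = ((all_scores - grand_mean) ** 2).sum()                         # 172.80
print(round(np.sqrt(between_ss / total_ss), 2))                           # eta = 0.83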

Excel: a widely available spreadsheet program for handling data which has good facilities for graphics and offers some basic but useful statistical procedures.

exogenous variable: a variable in a path analysis which is not explained by any other variables in the analysis but which explains one or more other variables. As a consequence, it has one or more arrows leading from it but none leading to it.

expected frequencies: the counts or frequencies expected if the null hypothesis were true (or it is what is based on the statistical model in some forms of analysis). Most commonly met in relation to the chi-square test. If there is just one sample of scores (one-sample or one-way chi-square) the expected frequencies are based either on known distributions or on the probabilities expected by chance under the null hypothesis. In a two-way chi-square, the expected frequencies are based on the null hypothesis of no difference or relationship between samples. The expected frequencies are the distribution obtained by combining the samples proportionately to give a single population estimate. See chi-square for the calculation of expected frequencies.
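In Python, scipy will report the expected frequencies for a two-way table as part of its chi-square routine (a sketch of our own with an invented 2 × 2 table; scipy is assumed to be available):

from scipy.stats import chi2_contingency

observed = [[20, 30],
            [40, 10]]
chi2, p, df, expected = chi2_contingency(observed)
print(expected)     # the frequencies expected under the null hypothesis of no relationship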

experiment: see also: condition; treatment group

experimental conditions: see quasi-experiment

experimental design or study: see between-subjects design; factorial design; longitudinal design or study; non-experimental design or study; true experiment; within-subjects design

experimental effect: see effect

experimental group: see dichotomous variable; treatment group

experimental manipulation: see fixed effects

experimentwise error rate: see analysis of variance; familywise error rate

exploratory data analysis: a philosophy and strategy of research which puts the primary focus of the researcher on using the data as the starting point for understanding the matter under research. This is distinct from the use of data as a resource for checking the adequacy of theory. Classical or conventional approaches to data analysis are driven by a desire to examine fairly limited hypotheses empirically but, as a consequence, may ignore equally important features of the data which require understanding and explanation. In conventional approaches, the focus is on using techniques such as the t test, ANOVA, and so forth, in order to establish the credibility of the model developed by the researcher largely prior to data collection.

In exploratory data analysis, the emphasis is on maximizing the gain from the data by making more manifest the process of describing and analysing the obtained data in their complexity.


Many of the techniques used in exploratory data analysis would be regarded as very basic statistical plots, graphs and tables – simple descriptive statistics. Exploratory data analysis does not require the powerful inferential statistical techniques common to research. The reason is that inferential statistical techniques may be seen as testing the adequacy of hypotheses that have been developed by other methods.

The major functions of exploratory data analysis are as follows:

1 To ensure maximum understanding of the data by revealing their basic structure.

2 To identify the nature of the important variables in the structure of the data.

3 To develop simple but insightful models to account for the data.

In exploratory data analysis, anomalous data such as the existence of outliers are not regarded as a nuisance but something to be explained as part of the model. Deviations from linearity are seen as crucial aspects to be explained rather than a nuisance. See also: descriptive statistics

exploratory factor analysis: refers to a number of techniques used to determine the way in which variables group together, but it is also sometimes applied to see how cases group together. The latter may be called Q analysis, methodology or technique, to distinguish it from the former which may be referred to as R analysis, methodology or technique. Variables which are related or which group together should correlate highly with the same factor and not correlate highly with other factors. If the variables seem to be measuring or sampling aspects of the same underlying construct, they may be aggregated to form a single variable which is a composite of these variables. For example, questions which seek to measure how anxious people are may form a factor. If this is the case, the answers to these questions may be combined to produce a single index or score of anxiety.

In a factor analysis there are as many factors as variables. The first factor explains the greatest amount of the variance that is shared by the variables in that analysis. The second factor explains the next greatest amount of the remaining variance that is shared by the variables. Subsequent factors explain progressively smaller amounts of the shared variance. The amount of variance explained by a factor is called the eigenvalue, which may also be called the characteristic or latent root.

As the aim of factor analysis is to determine whether the variables can be explained in terms of a number of factors which is smaller than the number of variables, there needs to be some criterion to decide the number of factors which explains a substantial amount of the variance. Various criteria have been proposed. One of the most widely used is the Kaiser or Kaiser–Guttman criterion. Factors which have an eigenvalue of more than one are considered to explain a worthwhile amount of variance as this represents the amount of variance a variable would have on average. This criterion may result in too few factors when there are few variables and too many factors when there are many variables, although these numbers have not been specified. It may be considered a minimum criterion below which a factor cannot possibly be statistically significant. An alternative method for determining the number of factors is Cattell's scree test. This test plots the eigenvalues of the factors. The scree starts where there appears to be a sharp break in the size of the eigenvalues. At this point the size of the eigenvalue will tend to be small and appears to decrease in a linear or straight line function for subsequent factors. Factors before the scree tend to decline in substantial steps of variance explained. The number of the factor at the start of the scree is the number of factors that are considered to explain a useful amount of the variance.

Often, these factors are then rotated so that some variables will correlate highly with a factor while others will correlate lowly with it, making it easier to interpret the meaning of the factors. There are various methods of rotation.


One kind of rotation is to ensure that the factors are uncorrelated with or independent of one another. Factors that are unrelated to one another can be visually depicted as being at right angles to one another. Consequently, this form of rotation is often referred to as orthogonal rotation as opposed to oblique rotation in which the factors are allowed to lie at an oblique angle to one another. The correlations of the variables on a factor can be aggregated together to form a score for that factor – these are known as factor scores. The advantage of orthogonal rotation is that as the scores on one factor are unrelated to those on other factors, the information provided by one factor is not similar to the information provided by other factors. In other words, this information is not redundant. This is not the case with oblique rotation where the scores on one factor may be related to scores on other factors. The more highly related the factors are, the more similar the information is for those factors and so the more redundant it is. The advantage of oblique rotation is that it is said to be a more accurate reflection of the correlations of the variables. Factors which are related to one another may be entered into a new factor analysis to form second-order or higher order factors. This is possible because oblique factors are correlated with each other and so can generate a correlation matrix. Orthogonal factors are not correlated so their correlations cannot generate further factors.

The meaning of a factor is usually interpreted in terms of the variables which correlate highly with it. The higher the correlation, regardless of its sign, the more that factor reflects that variable. To aid interpretation of its meaning, a factor should have at least three variables which correlate substantially with it. Variables may correlate highly with two or more factors, which suggests that these variables represent two or more factors. When aggregating variables together to form a single index for a factor, it is preferable to include only those variables which correlate highly with that factor and to exclude variables which correlate lowly with it or correlate highly with one or more other factors. In this way, the index can be seen as being a clearer representation of that factor.

It is usual to report the amount of variance that is explained by orthogonal factors. This is normally done in terms of the percentage of variance accounted for by those factors. It is not customary to describe the amount of variance explained by oblique factors as some of the variance will be shared with other factors. The stronger the oblique factors are correlated, the greater the shared variance will be.

Cramer (2003)
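As a rough illustration in Python (our own sketch with an invented correlation matrix for four variables), the eigenvalues referred to above can be taken straight from a correlation matrix and screened with the Kaiser–Guttman criterion:

import numpy as np

R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.6, 1.0]])

eigenvalues = np.linalg.eigvalsh(R)[::-1]   # largest first: roughly 1.86, 1.44, 0.40, 0.30
print(eigenvalues)
print((eigenvalues > 1).sum())              # 2 factors by the Kaiser-Guttman criterion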

exponent: a symbol or number written above and to the right of another symbol or number to indicate the number of times that a quantity should be multiplied by itself. For example, the exponent 3 in the expression 2³ indicates that the quantity 2 should be multiplied by itself three times, 2 × 2 × 2. The exponent 2 in the expression R² indicates that the quantity R is multiplied by itself twice. The exponent J in the expression C^J indicates that the quantity C should be multiplied by itself the number of times that J represents. See also: logarithm

exponential: see natural or Napierian logarithm

external validity: the extent to which the findings of a study can be applied more generally to other samples, settings and times. If the findings are specific to a contrived research situation, then they are said to lack external validity. See also: randomization


F

F ratio: the effect mean square divided by the error mean square in an analysis of variance:

F = effect mean square / error mean square

It is used to determine whether one or more means or some combination of means differ significantly from each other. The bigger the ratio is, the more likely it is to be statistically significant. Tables for the value that the F ratio has to be or to exceed are available in many statistical texts. Some of these values are shown in the entry on analysis of variance. The significance levels of the values obtained for analysis of variance are usually provided by statistical programs which compute this test.

The F ratio in multiple regression can be expressed in terms of the following formula:

F = (R² change/number of predictors in that change) / [(1 − R²)/(N − number of predictors − 1)]

R² is the squared multiple correlation between the criterion and all the predictors that have been entered into the multiple regression at that stage. Consequently, 1 − R² is the error or remaining variance. R² change is the difference in R² between all the predictors that have been entered into the multiple regression and the predictor or predictors that have been entered into the last step of the multiple regression. In other words, R² change is the variance accounted for by the predictor(s) in the last stage. These two sets of variance are divided by their degrees of freedom. For R² change it is the number of predictors involved in that change. For 1 − R² it is the number of cases in the sample (N) minus the number of predictors that have been entered including those in the last step minus one. See also: between-groups variance

Cramer (1998)

F test for equal variances in unrelated samples: one test for determining whether the variance in two groups of unrelated scores is equal or similar and does not differ significantly. This may be of interest in itself. It needs to be determined if a t test is to be used to see whether the means of those two groups differ because which version of the t test is to be used depends on whether the variances are equal or not. The F test is the larger variance divided by the smaller one:

F test = larger variance / smaller variance

The value that F has to be or to exceed to be statistically significant at the 0.05 level is found in many statistics texts.



This test has two degrees of freedom (df), one for the larger variance and one for the smaller variance. These degrees of freedom are the number of cases in that group minus one. The critical value that F has to be or to exceed for the variances to differ significantly at the 0.05 level is given in Table F.1 for a selection of degrees of freedom.

Table F.1 The 0.05 critical values for the F test

df for smaller                    df for larger variance
variance             10      20      30      40      50      ∞
10                 3.72    3.42    3.31    3.26    3.22    3.08
20                 2.77    2.46    2.35    2.29    2.25    2.09
30                 2.51    2.20    2.07    2.01    1.97    1.79
40                 2.39    1.99    1.94    1.80    1.75    1.64
50                 2.32    1.94    1.87    1.74    1.70    1.55
∞                  2.05    1.71    1.57    1.48    1.43    1.00

This test is suitable for normally distributed data.
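A minimal sketch of the test in Python (our own illustration with invented scores; scipy is assumed, and doubling the upper-tail probability is used here to mirror the two-tailed 0.05 criterion of Table F.1):

import numpy as np
from scipy import stats

group1 = np.array([4, 7, 6, 9, 5, 8])
group2 = np.array([3, 4, 4, 5, 3, 4])

var1, var2 = group1.var(ddof=1), group2.var(ddof=1)
F = max(var1, var2) / min(var1, var2)
df_larger = (len(group1) if var1 >= var2 else len(group2)) - 1
df_smaller = (len(group2) if var1 >= var2 else len(group1)) - 1

p = 2 * stats.f.sf(F, df_larger, df_smaller)   # two-tailed probability for the variance ratio
print(F, p)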

face validity: the extent to which a measure appears to be measuring what it is supposed to be measuring. For example, a measure of self-reported anxiety may have face validity if the items comprising it seem to be concerned with aspects of anxiety.

factor: see analysis of variance; higher order factors; principal axis factoring; scree test, Cattell's

factor analysis: see Bartlett's test of sphericity; confirmatory factor analysis; eigenvalue, in factor analysis; exploratory factor analysis; factor loading; factorial validity; iteration; Kaiser's (Kaiser–Guttman) criterion; oblique rotation; Q analysis; principal components analysis; R analysis; scree test, Cattell's; simple solution; varimax rotation, in factor analysis

factor, in analysis of variance: a variable which consists of a relatively small number of levels or groups which contain the values or scores of a measured variable. This variable is sometimes called an independent variable as this variable is thought to influence the other variable which may be referred to as the dependent variable. There may be more than one factor in such an analysis.

factor, in factor analysis: a composite variable which consists of the loading or correlation between that factor and each variable making up that factor. Factor analysis is used to determine the extent to which a number of related variables can be grouped together into a smaller number of factors which summarize the linear relationship between those variables.

factor loading: a term used in factor analysis. A factor loading is a correlation coefficient (Pearson's product moment) between a variable and a factor (which is really a cluster of variables). Loadings can take positive and negative values between −1 and +1. They are interpreted more or less as a correlation coefficient would be. So if a variable has a factor loading of 0.8 on factor A, then this means that it is strongly correlated with factor A. If a variable has a strong negative correlation of −0.9, then this means that the variable needs to be reversed in order to understand what it is about the variable which relates to factor A. A factor loading of zero approximately means that the variable has no relationship with the factor.

Each variable has a different factor loading on each of the different factors. The pattern of variables which have high loadings on a factor (taken in conjunction with those which do not load on that factor) is the basis for interpreting the meaning of that factor. In other words, if there are similarities between the variables which get high loadings on a particular factor, the nature of the similarity is the starting point for identifying what the factor is.




factor rotation: see oblique and orthogonal rotation

factorial: the factorial of 6 is denoted as 6! and equals 6 × 5 × 4 × 3 × 2 × 1 = 720 while the factorial of 4 is denoted 4! and equals 4 × 3 × 2 × 1 = 24. Thus, a factorial is the number multiplied by all of the whole numbers lower than itself down to one (Table F.2). The factorial of 1 is 1 and the factorial of 0 is also 1. It has statistical applications in calculating the Fisher exact probability test and permutations and combinations. Unfortunately, factorials rapidly exceed the capabilities of calculators and their use can be difficult as a consequence. One solution is to look for terms which can be cancelled, so 6!/5!, which is (6 × 5 × 4 × 3 × 2 × 1)/(5 × 4 × 3 × 2 × 1), simply cancels out to 6.

Table F.2 Factorials

N     0   1   2   3   4    5     6     7      8        9         10          11
N!    1   1   2   6   24   120   720   5040   40,320   362,880   3,628,800   39,916,800

See Fisher (exact probability) test
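Python has a built-in factorial (a trivial sketch of our own):

import math

print(math.factorial(6))                          # 720
print(math.factorial(0))                          # 1
print(math.factorial(6) // math.factorial(5))     # 6: the cancellation described above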

factorial analysis of variance: see analysis of variance

factorial design: a design which investigates the relationship between two or more factors on the scores of one or more other measured variables. There are two main advantages in studying two or more factors together. The first is that error or unexplained variance in the measured variables may be explained by some of the other factors or their interactions, thereby reducing it. This provides a more sensitive test of that factor. The second advantage is that the interaction between the factors can be examined. See also: interaction

factorial validity: sometimes used to refer to whether the variables making up a measure have been shown to group together as a unitary factor through the statistical technique of factor analysis. A measure may be said to be factorially valid or to have factorial validity if the variables comprising it have been demonstrated to cluster together in this way.

Nunnally and Bernstein (1994)

familywise error rate: the probability or significance level for a finding when a family or number of tests or comparisons are being made on the data from the same study. It is also known as the experimentwise error rate. When determining whether a finding is statistically significant, the significance level is usually set at 0.05 or less. If more than one finding is tested on a set of data, the probability of those findings being statistically significant increases the more findings or tests of significance that are made. The following formula can be used for determining the familywise error rate significance level or α (alpha) where the tests are independent:

1 − (1 − α)^(number of comparisons)

For example, if three tests are conducted using the 0.05 alpha level the familywise error rate is about 0.14:

1 − (1 − 0.05)³ = 1 − 0.95³ = 1 − 0.8574 = 0.1426

There are various ways of controlling for this familywise error rate. Some of the tests for doing so are listed under multiple comparison tests. One of the simplest is the Bonferroni test where the 0.05 level is divided by the number of tests being made. So, if three tests are being conducted, the appropriate significance level is 0.0167 (0.05/3 = 0.0167) though this is reported as 0.05. See also: analysis of variance
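A quick numerical check in Python (our own illustration):

alpha = 0.05
comparisons = 3
print(round(1 - (1 - alpha) ** comparisons, 4))   # 0.1426: the familywise error rate
print(round(alpha / comparisons, 4))              # 0.0167: the Bonferroni-adjusted level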

first-order partial correlation: see zero-order correlation




Fisher (exact probability) test: a test of significance (or association) for small contingency tables. Generally known is the 2 × 2 contingency table version of the test but there is also a 2 × 3 contingency table version available. As such, it is an alternative technique to the chi-square in these cases. It can be used in circumstances in which the assumptions of chi-square such as minimal numbers of expected frequencies are not met. It has the further advantage that exact probabilities are calculated even in the hand calculation, though this advantage is eroded by the use of powerful statistical packages which provide exact probabilities for all statistics. The major disadvantage of the Fisher (exact probability) test is that it involves the use of factorials which can rapidly become unwieldy with substantial sample sizes. The calculation of a simple 2 × 2 Fisher test on the data in Table F.3 begins as follows:

Table F.3 Contingency table to illustrate calculation of 2 × 2 Fisher exact test with letter symbols added to ease calculation

                 Men        Women
Employed         a = 6      b = 2        W = a + b = 8
Not employed     c = 1      d = 4        X = c + d = 5
                 Y = a + c = 7    Z = b + d = 6    N = 13

Fisher exact probability = W!X!Y!Z! / (N!a!b!c!d!)

Taking the factorial values from Table F.2, this is

= (8!5!7!6!) / (13!6!2!1!4!)
= (40,320 × 120 × 5040 × 720) / (6,227,020,800 × 720 × 2 × 1 × 24)
= 0.082

This value (0.082) is the probability of getting precisely the outcome to be found in Table F.3 – that is, the data. This is just one component of the calculation. In order to calculate the significance of the Fisher exact test one must also calculate the probability of the more extreme outcomes while maintaining the marginal totals as they are in Table F.3. Table F.4 gives all of the possible tables – the table starts with the cases that are more extreme than the data and finishes with the tables that are more extreme in the opposite direction. Not all values are possible in the top left-hand side cell if the marginal totals are to be retained. So, for example, it is not possible to have 8, 1 or 0 in the top left-hand cells.

Table F.4 All of the possible outcomes of a study which maintain the marginal totals of Table F.3

7 1
0 5
p = 0.005 – this is the only more extreme case than the data in the same direction as the data

6 2
1 4
p = 0.082 – this is the same table as for the data

5 3
2 3
p = 0.326

4 4
3 2
p = 0.408

3 5
4 1
p = 0.163

2 6
5 0
p = 0.016 – this is the only more extreme case in the opposite direction to the data

We can use the probabilities calculated using the formula above to calculate the Fisher exact probabilities. It is simply a matter of adding the appropriate probabilities together:

• One-tailed significance is assessed by adding together the probability for the data themselves (0.082) with any more extreme probabilities in the same direction (there is only one such table – the first one, which has a probability of 0.005). Consequently, in this case, the one-tailed significance is 0.082 + 0.005 = 0.087. This is not statistically significant in our example.

• Two-tailed significance is not simply twice the one-tailed significance. This gives too high a value as the distribution is not symmetrical. One needs to find all of the extreme tables in the opposite direction from the actual data which also have a probability equal to or smaller than that for the data. The probability for the data themselves is 0.082. As can be seen, the only table in the opposite direction with an equal or smaller probability is the final table. This has a probability of 0.016. To obtain the two-tailed probability one simply adds this value to the one-tailed probability value. This is 0.087 + 0.016 = 0.103. This is not statistically significant at the 5% level.

One important thing to remember about the Fisher exact probability test is that the number obtained (in this case 0.082) is the significance level – it is not a chi-square value which has to be looked up in tables. While a significance level of 0.082 is not statistically significant, of course, it does suggest a possible trend. See also: factorial
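For comparison, the same table can be analysed in Python with scipy (a sketch of our own; scipy is assumed to be available):

from scipy.stats import fisher_exact

table = [[6, 2],
         [1, 4]]                      # the data of Table F.3

_, p_two_tailed = fisher_exact(table, alternative='two-sided')
_, p_one_tailed = fisher_exact(table, alternative='greater')
print(round(p_one_tailed, 3))         # 0.086 (the entry's 0.087 comes from rounding each term first)
print(round(p_two_tailed, 3))         # 0.103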

Fisher's LSD (Least Significant Difference) or protected test: a post hoc or multiple comparison test which is used to determine which of three or more means differ from one another when the F ratio in an analysis of variance is significant. It is essentially an unrelated t test in which the significance level has not been adjusted for the number of comparisons being made. As a consequence its use is generally not recommended. The formula for this test is the same as that for the unrelated t test where the variances are equal:

t = (group 1 mean − group 2 mean) / √[(group 1 variance/group 1 n) + (group 2 variance/group 2 n)]

A version of this test is also used to ascertain which of three or more adjusted means differ from one another when the F ratio in an analysis of covariance is significant. This test has the following formula:

t = (adjusted group 1 mean − adjusted group 2 mean) / √{adjusted mean square error × [1/n1 + 1/n2 + (covariate group 1 mean − covariate group 2 mean)²/covariate error sum of squares]}

See analysis of variance
Howell (2002); Huitema (1980)

Fisher's z transformation: the logarithmic transformation of Pearson's correlation coefficient. Its common use is in the z test which compares the size of two correlation coefficients from two unrelated samples. (This is obviously different from the more conventional test of whether a single correlation coefficient differs from zero.)





Without the transformation, the distribution of differences between the two correlation coefficients becomes very skewed and unmanageable.

This transformation may be found by looking up the appropriate value in a table or by computing it directly using the following formula where loge stands for the natural logarithm and r for Pearson’s correlation:

zr = 0.5 × loge[(1 + r)/(1 − r)]

The values of Pearson’s correlation vary from 0 to ±1 as shown in Table F.5 and the values of the z correlation from 0 to about ±3 (though, effectively, infinity).

Table F.5 r to z transformation

r       z        r       z
0.00    0.00     0.60    0.69
0.10    0.10     0.70    0.87
0.20    0.20     0.80    1.10
0.30    0.31     0.90    1.47
0.40    0.42     1.00    3.00
0.50    0.55
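A minimal Python sketch of the transformation (ours, not part of the original entry); the example calls reproduce two of the values in Table F.5:

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a Pearson correlation r."""
    return 0.5 * math.log((1 + r) / (1 - r))

fisher_z(0.5)   # about 0.55
fisher_z(0.9)   # about 1.47
```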

fixed effects: the different levels of an experimental treatment (experimental manipulation) in many studies are merely a small selection from a wide range of possible levels. Generally speaking, the researcher chooses a small number of different levels for the treatment on the basis of some reasoned argument about what is appropriate, practical and convenient. That is, the levels are fixed by the researcher. This is known as the fixed effects model. It merely indicates that the different treatments are fixed by the researchers. The alternative is the random effects model in which the researcher selects the different levels of the treatment used by a random procedure. So, the range of possible treatments has to be identified and the selection made randomly from this range. Fixed effects are by far the most common approach in the various social sciences. As all of the common statistical techniques assume fixed rather than random effects, there is little point in deviating from the fixed effects approach in terms of planning one’s own research. It is difficult to find examples of the use of the random effects model in many disciplines. Nevertheless, advanced textbooks in statistics cover the appropriate procedures. See also: random effects model

floor effect: occurs when scores on the dependent variable are so low that the introduction of an experimental treatment cannot depress the scores of participants any further. This, if it is not recognized, might be taken as a sign that the experimental treatment is ineffective. This need not be the case. For example, if children with educational difficulties are given a multiple-choice spelling test, it is likely that their performances are basically random so that they score at a chance level. For these children, the introduction of time pressure, stress or some other deleterious factor is unlikely to reduce their performance. This is because their performances on the test are at the chance level anyway. It is the opposite of a ceiling effect. The reasons for floor effects are diverse. They can be avoided by careful development of the measuring instruments and by careful review of the appropriate research participants.

frequency: the number of times a particular event or outcome occurs. Thus, the frequency of women in a sample is simply the total number of women in the sample. The frequency of blue-eyed people is the number of people with blue eyes. Generally in statistics we are referring to the number of cases with a particular characteristic. An alternative term, most common in statistical packages, is count. This highlights the fact that the frequency is merely a count of the number of cases in a particular category.

Data from individuals sometimes may be collected in the form of frequencies. This might cause the novice some confusion.




For example, a score may be based on the frequency or number of eye blinks a participant does in a 10-minute period. This is really a score since it indicates the rate of blinking in a 10-minute period for each participant. Those with little knowledge or experience of statistics may find frequencies hard to differentiate from scores. This is because in order to decide whether the number 23 is a score or a frequency one needs to know how the number 23 was arrived at.

Data collected in the form of frequencies in nominal categories (categorical or category data) are analysed using tests such as chi-square. See also: Bayesian inference; count; marginal totals

frequency curve: a graphical frequency distribution applied to a continuous (or near continuous) score. It consists of an axis of frequency and an axis corresponding to the numerical value of the scores (Figure F.1). Since the steps on the value axis are small, the frequency distribution is fitted well by using a curved line rather than a succession of blocks. The term frequency polygon is sometimes met to describe the line that connects the points of a frequency curve. The most famous frequency curve in statistics is the bell-shaped, normal distribution.

Figure F.1 Frequency curve (frequency plotted against score)

frequency distribution: a table or diagram giving the frequencies of values of any given variable. For qualitative variables this is the number of times each of the categories occurs (e.g. depressive, schizophrenic and paranoid) whereas for quantitative variables this is the number of times each different score (or range of scores) occurs. The importance of a frequency distribution is simply that it allows the researcher to see the general characteristics of a particular variable for the cases or participants in the research. The frequency distribution may reveal important characteristics such as asymmetry, normality, spread, outliers, and so forth, in the case of score data. For qualitative data, it may help reveal categories which are very frequent or so infrequent that it becomes meaningless to analyse them further.

Frequency distributions may be presented in a number of different forms: simple frequency distributions which are merely frequencies, percentage frequency distributions in which the frequencies are expressed as a percentage of the total of frequencies for that variable, cumulative frequencies, and even probability curves in which the frequencies are expressed as a proportion of the total (i.e. expressed as a proportion or probability out of one).

While frequency distributions reflect the distribution of a single variable, there are variants on the theme which impose a second or third variable. These are compound frequency distributions (more often compound bar charts or compound histograms).

frequency polygon: see frequency curve

Friedman two-way analysis of variance test: determines if the mean ranks of three or more related samples or groups differ significantly. It is a non-parametric test which is used on ranked data. The test statistic approximates a chi-square distribution where the degrees of freedom are the number of groups minus one.

Cramer (1998)




G

Gabriel simultaneous test procedure: a post hoc or multiple comparison test which is used to determine which of three or more means differ from one another in an analysis of variance. It is used with groups which have equal variances. It is based on the studentized maximum modulus rather than the studentized range. The studentized maximum modulus is the maximum absolute value of the group means which is divided by the standard error of the means.

Kirk (1995)

Games–Howell multiple comparison procedure: a post hoc or multiple comparison test which is used to determine which of three or more means differ from one another when the F ratio in an analysis of variance is significant. It was developed to deal with groups with unequal variances. It can be used with groups of equal or unequal size. It is based on the studentized range statistic. See also: Tamhane’s T2 multiple comparison test

Howell (2002); Kirk (1995)

Gaussian distribution: another term for a normal distribution. See normal distribution

geometric mean: the nth root of the product of n scores. So the geometric mean of 2, 9 and 12 is

³√(2 × 9 × 12) = ³√216 = 6.0

The geometric mean of 3 and 6 is

²√(3 × 6) = ²√18 = 4.24

The geometric mean has no obvious role in basic everyday statistical analysis. It is important, though, since it emphasizes that there are more meanings of the concept of mean than the arithmetic mean or average. See also: average
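A short Python sketch of the definition (ours, not part of the original entry); the standard library function statistics.geometric_mean gives the same results:

```python
import statistics

def geometric_mean(scores):
    """The nth root of the product of n scores."""
    product = 1.0
    for score in scores:
        product *= score
    return product ** (1 / len(scores))

geometric_mean([2, 9, 12])              # 6.0
geometric_mean([3, 6])                  # about 4.24
statistics.geometric_mean([2, 9, 12])   # standard library equivalent
```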

general factor: in factor analysis, a general factor is one on which all of the variables in the analysis load to a significant degree. If there is a general factor, it will inevitably be the largest factor in terms of variance explained (eigenvalue) and consequently the first factor to emerge in the analysis.

Goodman and Kruskal’s gamma (γ): a measure of association for ranked or ordinal data. It can range from −1 to +1 just like a correlation coefficient. It takes no account of the number of ranks for the two variables or cases which have the same or tied ranks. See also: correlation

Cramer (1998); Siegal and Castellan (1988)



Goodman and Kruskal’s lambda (λ): a measure of the proportional increase in accurately predicting the outcome for one categorical variable when we have information about a second categorical variable, assuming that the same prediction is made for all cases in a particular category. For example, we could use this test to determine how much our ability to predict whether students pass or fail a test is affected by knowing their sex.

We can illustrate this test with the data in Table G.1 which show the number of females and males passing or failing a test.

If we had to predict whether a particular student had passed the test disregarding whether they were female or male, our best bet would be to say they had passed as most of the students passed (164 out of 200). If we did this we would be wrong on 36 occasions. How would our ability to predict whether a student had passed be increased by knowing whether they were male or female? If we predicted that the student had passed and we also knew that they were female we would be wrong on 12 occasions, whereas if they were male we would be wrong on 24 occasions. If we knew whether a student was female or male we would be wrong on 36 occasions. This is the same number of errors as we would make without knowing the sex of the student, so the proportional increase knowing the sex of the student is zero. The value of lambda varies from 0 to 1. Zero means that there is no increase in predictiveness whereas one indicates that there is perfect prediction without any errors.

This test is asymmetric in that the proportional increase will depend on which of the two variables we are trying to predict. For this case, lambda is about 0.15 if we reverse the prediction in an attempt to predict the sex of the student on the basis of whether they have passed or failed the test. The test assumes that the same prediction is made for all cases in a particular row or column of the table. For example, we may assume that all females have passed or that all males have failed. Goodman and Kruskal’s tau presumes that the predictions are randomly made on the basis of their proportions in the row and column totals. See also: correlation
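A small Python sketch (ours, not part of the original entry) reproduces both figures from Table G.1 – zero for predicting pass/fail from sex, and about 0.15 for the reverse prediction:

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column category from the row category.
    `table` is a list of rows of cell frequencies, e.g. Table G.1."""
    total = sum(sum(row) for row in table)
    column_totals = [sum(column) for column in zip(*table)]
    errors_without = total - max(column_totals)               # 200 - 164 = 36
    errors_with = sum(sum(row) - max(row) for row in table)   # 12 + 24 = 36
    return (errors_without - errors_with) / errors_without

goodman_kruskal_lambda([[108, 12], [56, 24]])   # 0.0: knowing sex does not help
goodman_kruskal_lambda([[108, 56], [12, 24]])   # 0.15: predicting sex from pass/fail
```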

Cramer (1998); Siegal and Castellan (1988)

Goodman and Kruskal’s tau (τ): a measure of the proportional increase in accurately predicting the outcome of one categorical variable when we have information about a second categorical variable where it is assumed that the predictions are based on their overall proportions.

We can illustrate this test with the data in Table G.1 (under the entry for Goodman and Kruskal’s lambda) where we may be interested in finding out how much our ability to predict whether a person has passed or failed a test is increased by our knowledge of whether they are female or male. If we predicted whether a person had passed on the basis of the proportion of people who had passed disregarding whether they were female or male, we would be correct for 0.82 (164/200 = 0.82) of the 164 people who had passed, which is for 134.48 of them (0.82 × 164 = 134.48). If we did this for the people who had failed, we would be correct for 0.18 (36/200 = 0.18) of the 36 people who had failed, which is for 6.48 of them (0.18 × 36 = 6.48). In other words, we would have guessed incorrectly for 59.04 of the people (200 − 134.48 − 6.48 = 59.04), which is a probability of error of 0.295 (59.04/200 = 0.295).

If we now took into account the sex of the person, we would correctly predict that the person had passed the test

for 0.90 (108/120 = 0.90) of the 108 females who had passed, which is for 97.20 of them (0.90 × 108 = 97.20),

for 0.70 (56/80 = 0.70) of the 56 males who had passed, which is for 39.20 of them (0.70 × 56 = 39.20),


Table G.1 Number of females and males passing or failing a test

                Pass    Fail    Row total
Females         108     12      120
Males           56      24      80
Column total    164     36      200


for 0.10 (12/120 = 0.10) of the 12 females who had failed, which is for 1.20 of them (0.10 × 12 = 1.20), and

for 0.30 (24/80 = 0.30) of the 24 males who had failed, which is for 7.20 of them (0.30 × 24 = 7.20).

In other words, we would have guessed incorrectly for 55.20 of the people (200 − 97.20 − 39.20 − 1.20 − 7.20 = 55.20). Consequently, the probability of error of prediction is 0.276 (55.20/200 = 0.276). The proportional reduction in error in predicting whether people had passed knowing whether they were female or male is 0.064 [(0.295 − 0.276)/0.295 = 0.064].
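The same worked example expressed as a short Python sketch (ours, not part of the original entry); without the intermediate rounding used above it gives approximately 0.065:

```python
def goodman_kruskal_tau(table):
    """Tau for predicting the column category from the row category,
    with predictions made in proportion to the relevant totals."""
    total = sum(sum(row) for row in table)
    column_totals = [sum(column) for column in zip(*table)]
    # expected errors when predicting from the overall column proportions alone
    errors_without = total - sum(count ** 2 / total for count in column_totals)
    # expected errors when predicting from the column proportions within each row
    errors_with = total - sum(cell ** 2 / sum(row) for row in table for cell in row)
    return (errors_without - errors_with) / errors_without

goodman_kruskal_tau([[108, 12], [56, 24]])   # about 0.065 (0.064 with rounding)
```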

Cramer (1998)

goodness-of-fit test: gives an indication of the closeness of the relationship between the data actually obtained empirically and the theoretical distribution of the data based on the null hypothesis or a model of the data. These can be seen in the chi-square test and log–linear analysis respectively. In chi-square the empirically observed data and the expected data under the null hypothesis are compared. The smaller the value of chi-square the better is the goodness of fit to the theoretical model based on the null hypothesis. In log–linear analysis, chi-square is used to test the closeness of the empirically observed data and a complex model based on interactions and main effects. So a goodness-of-fit test assesses the correspondence between the actual data and a theoretical account of that data that can be specified numerically. See also: chi-square

grand: as in grand mean and grand total. This is a somewhat old-fashioned term from when the analysis of variance, for example, would be routinely hand calculated. It refers to the overall characteristics of the data – not the characteristics of individual cells or components of the analysis.

grand mean: see between-groups variance

graph: a diagram which illustrates the relationship between variables, usually two variables. Scattergrams are a typical example of a graph in statistics.

grouped data: usually refers to the process by which a range of values are combined together especially to make trends in the data more apparent. So, for example, weights in the range 40–49 kg could be combined into one group, weights in the range 50–59 kg into the second group, weights in the range 60–69 kg into the third group, and so forth. There is a loss of information by doing so since a person who weighs 68 kg is placed in the same group as a person who weighs 62 kg despite the fact that their weights do differ. Grouping is most appropriate for graphical presentations of data. The purpose of grouping is basically to clarify trends in the distribution of data. Too many data points with fairly small samples can help disguise what the major features of the data are. Grouping smooths out some of these difficulties.

Grouping is sometimes used in the collection of data. For example, one commonly sees age being assessed in terms of age bands when participants are asked their age group, such as 20–29 years, 30–39 years, etc. There is no statistical reason for collecting data in this form. Indeed, there is an obvious case for collecting information such as age as actual age. Modern computer analyses of data can recode information in non-grouped form readily into groups if this is required.

There are special formulae for the rapid calculation of statistics such as the mean from grouped data. These had advantages in the days of hand calculation but they have less to offer nowadays.


H

harmonic mean: the harmonic mean of 3 and 6 is

2 / (1/3 + 1/6) = 2 / (0.333 + 0.167) = 2/0.5 = 4

The harmonic mean of 3, 6 and 7 is

3 / (1/3 + 1/6 + 1/7) = 3 / (0.333 + 0.167 + 0.143) = 3/0.643 = 4.67

In other words, the harmonic mean is the number of scores, divided by the sum of 1/X for each score.

The harmonic mean is rarely directly calculated as such in statistics. It is part of the calculation of the unrelated t test, for example. See also: average
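A brief Python sketch of the definition (ours, not part of the original entry); the standard library function statistics.harmonic_mean gives the same results:

```python
import statistics

def harmonic_mean(scores):
    """The number of scores divided by the sum of 1/X for each score."""
    return len(scores) / sum(1 / score for score in scores)

harmonic_mean([3, 6])                 # 4.0
harmonic_mean([3, 6, 7])              # about 4.67
statistics.harmonic_mean([3, 6, 7])   # standard library equivalent
```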

Hartley’s test or Fmax: one test used to determine whether the variances of three or more groups are similar (i.e. homogeneous) when the number of cases in each group is the same or very similar. It is the largest variance divided by the smallest variance. The degrees of freedom for this F ratio are the number of groups for the numerator. For the denominator they are the number of cases in a group minus one or, where the group sizes differ slightly, the number of cases in the most numerous group minus one.

heterogeneity: the state of being of incomparable magnitudes. It is particularly applied to variance. Heterogeneity of variance is a phrase that indicates that two variances are very dissimilar from each other. This means in practice that, since they are so very different, it is misleading to try to combine them to give a better estimate of the population variance, for example. The usual criterion is that the two variances need to be significantly different from each other using such tests as the F ratio, Levene’s test or the Bartlett–Box F test.

heteroscedasticity: most notably an issue in connection with correlation and regression which uses the least squares method of fitting the best fitting straight line between the data points. The variation of the data points of the scattergram vertically can differ at different points of the scattergram. For some points on the horizontal of the scattergram there may be a large amount of variation of the points, for other points on the horizontal of the scattergram there may be little variation. Such variation at different points tends to make the fit of the best fitting straight line rather poor or less than ideal (though not consistently so in any direction). Tests of heteroscedasticity are available so that appropriate corrections can be made.

Heteroscedasticity is shown in Figure H.1. This is a scattergram of the relationship between two variables X and Y. If we take the first data point along the horizontal axis and look at the four scores indicated above that point, we can see that these data points vary in their score on the vertical Y axis.



This variation is relatively small (2 units on the vertical axis). They all are at position 1 on the horizontal axis. Look at the scores at data point 5 on the horizontal axis. Obviously all of these points are 6 on the horizontal axis but they vary from 2 to 10 on the vertical axis. Obviously the variances at positions 1 and 5 on the horizontal axis are very different from each other. This is heteroscedasticity. See also: multiple regression; regression equation

Figure H.1 Illustrating heteroscedasticity: a scattergram of Y against X in which the scores at one point on the X axis have a much bigger spread of values on the Y axis than those at another

heuristic: this term has several different meanings. It may be used as an adjective or a noun. A general meaning is finding out or discovering. Another meaning is to find out through experience. A third meaning is a process or method which leads to a solution. See also: algorithm

hierarchical agglomerative clustering: one of the more widely used methods of cluster analysis in which clusters of variables are formed in a series or hierarchy of stages. Initially there are as many clusters as variables. At the first stage, the two variables that are closest are grouped together to form one cluster. At the second stage, either a third variable is added or agglomerated to the first cluster containing the two variables or two other variables are grouped together to form a new cluster, whichever is closest. At the third stage, two variables may be grouped together, a third variable may be added to an existing group of variables or two groups may be combined. So, at each stage only one new cluster is formed according to the variables, the variable and cluster or the clusters that are closest together. At the final stage all the variables are grouped into a single cluster.

Cramer (2003)

hierarchical or sequential entry: a method in which predictors are entered in an order or sequence which is determined by the analyst in statistical techniques such as logistic regression, multiple regression, discriminant function analysis and log–linear analysis.

hierarchical method in analysis of variance: see Type I, hierarchical or sequential method in analysis of variance

hierarchical multiple regression: see covariate; multiple regression




higher order factors: these emerge from a factor analysis of the correlations between the first or first-order factors of a factor analysis of variables. This is only possible when the factors are rotated obliquely (i.e. allowed to correlate). So long as the factors in a higher order factor analysis are rotated obliquely, further higher order factor analyses may be carried out. Except where the analysis involves numerous variables, it is unlikely that more than one higher order factor analysis will be feasible. See also: exploratory factor analysis

higher order interactions: involve relationships between more than two factors or variables. For example, the interaction between the three variables of gender, social class and age would constitute a higher order interaction.

histogram: used to present the distribution of scores in diagram form. They use rectangular bars to illustrate the frequency of a particular score or range of scores. Because scores are based on numerical order, the bars of a histogram will touch (unless there is zero frequency for a particular bar). Unlike a bar chart, the bars cannot be readily rearranged in terms of order without radically altering the interpretation of the data. A typical histogram is to be found in Figure H.2. This diagram is interpreted generally by looking at the numerical frequency scale along the vertical axis if there is one or simply the heights of the histogram bars.

There is an important point to bear in mind if one is using scores in short ranges such as ages 20–29, 30–39, 40–49 years, etc. These ranges should be identical in width otherwise there is a complication. For example, if the researcher classified the above data as 20–29 years, 30–39 years, 40–49 years and 50–79 years, then one should examine the area of the bar and not its height as the width of the bar should be greater if the age range is greater. However, this is commonly not done by researchers.

Although histograms are common in the media in general and are used by non-statisticians, it is important not to regard them as having little or no place in sophisticated statistical analyses. Quite the contrary: time spent examining univariate data (and bivariate data) using histograms, compound bar charts, and the like, may reveal unexpected features of the data. For example, it may be that scores on a variable appear to be in two distinct groups on a histogram. This sort of information would not be apparent from the most complex of statistical analyses without using these simple graphical techniques.

Figure H.2 A typical histogram: frequency (vertical axis) plotted against score (horizontal axis)

Hochberg GT2 test: a post hoc or multiple comparison test which is used to determine which of three or more means differ from one another in an analysis of variance. It is used with groups which have equal variances. It is based on the studentized maximum modulus rather than the studentized range. The studentized maximum modulus is the maximum absolute value of the group means which is divided by the standard error of the means.

Toothaker (1991)

homogeneity: the state of being uniform or similar especially in terms of extent when applied to statistics.




Hence, homogeneity of variance means that the variances of two or more sets of scores are highly similar (or not statistically dissimilar). Homogeneity of regression similarly means that the regression coefficients between two variables are similar for different samples.

Some statistical techniques (e.g. the t test, the analysis of variance) are based on the assumption that the variances of the samples are homogeneous. There are tests for homogeneity of variances such as the F ratio test. If it is shown that the variances of two samples of scores are not homogeneous, then it is inappropriate to use the standard version of the unrelated t test. Another version is required which does not combine the two variances to estimate the population variance. This alternative version is available in a few textbooks and in a statistical package such as SPSS.

homogeneity of regression: the regression coefficients being similar or homogeneous in an analysis of covariance or multivariate analysis of covariance for the different categories or groups of an independent variable. The results of such an analysis are difficult to interpret if the regression coefficients are not homogeneous because this means that the relationship between the covariate and the dependent variable is not the same across the different groups. One of the assumptions of this type of analysis is that the regression coefficients need to be homogeneous. See also: analysis of covariance

homogeneity of variance: the variances of two or more groups of scores being similar or homogeneous. Analysis of variance assumes that the variances of the groups are similar. These variances are grouped together to form the error mean square or variance estimate. If the variance of one or more of the groups is considerably larger than that of the others, the larger variance will increase the size of the error variance estimate which will reduce the chance of the F ratio being significant. The means of the groups with the smaller variances may differ from each other but this difference is hidden by the inclusion of the groups with the larger variances in the error variance estimate. If the variances are dissimilar or heterogeneous, they may be made more similar by transforming the scores through procedures such as taking their square root. If there are only two groups, a t test may be used in which the variances are treated separately. See also: analysis of variance

Hotelling’s trace criterion: a test used in multivariate statistical procedures such as canonical correlation, discriminant function analysis and multivariate analysis of variance to determine whether the means of the groups differ on a discriminant function or characteristic root. See also: Wilks’s lambda

Tabachnick and Fidell (2001)

hypergeometric distribution: a probability distribution which is based on sampling without replacement where once a particular outcome has been selected it cannot be selected again.

hypothesis: a supposition or suggestion about the possible nature of the facts. It is generally regarded as the starting point for further investigation. Nevertheless, some research is purely exploratory without any formally expressed hypotheses. In such studies data are explored in order to formulate ideas about the nature of relationships.

It should be noted that the researcher’s hypothesis and the statistical hypothesis are not necessarily the same thing. Statistical hypotheses are made up of the null hypothesis of no relationship between two variables and the alternative hypothesis of a relationship between the two variables. Failure of the null hypothesis merely indicates that it is more plausible that the relationship between the two variables is not the result of chance than it is the result of chance.


Hence, the researcher’s hypothesis may be preferred in these circumstances but there may be other equally plausible hypotheses which cannot be explained on the basis of chance either.

At the very least a hypothesis expresses a relationship between two variables such as ‘Social support will be negatively related to depression’ or ‘Those with high support will be less depressed than those with low support.’ It is usual to express the direction of the expected relationship between the variables as illustrated here because the researcher typically expects the variables to be related in a particular way. Hypotheses may be non-directional in the sense that a relationship between the variables is expected but the direction of that relationship is not specified. An example of such a hypothesis is ‘Social support will be related to depression.’ In some areas of research it may not be possible to specify the direction of the hypothesis. For example, people who are depressed may be more likely to seek and to receive support or they may be those individuals who have not received sufficient support. In such circumstances a non-directional hypothesis may be proposed.

With a true experimental design it is preferable to indicate the causal direction of the relationship between the two variables such as ‘Greater social support will lead to less depression’ if at all possible. See exploratory data analysis; hypothesis testing

hypothesis testing: generally speaking, this is a misnomer since much of what is described as hypothesis testing is really null-hypothesis testing. Essentially in null-hypothesis testing, the plausibility of the idea that there is zero difference or zero relationship between two measures is examined. If there is a reasonable likelihood that the trends in the obtained data could reflect a population in which there is no difference or no relationship, then the hypothesis of a relationship or a difference is rejected. That is, the findings are plausibly explained by the null hypothesis which suggests that chance factors satisfactorily account for the trends in the obtained data.

Research which deals with small samples (i.e. virtually all modern research) suffers from the risk of erroneous interpretations of any apparent trends in the data. This is because samples are intrinsically variable. Hypothesis testing generally refers to the Neyman–Pearson strategy for reducing the risk of faulty interpretations of data. Although this approach is generally the basis of much statistics teaching, it has always been a controversial area largely because it is taken to establish more than it actually does. In the Neyman–Pearson approach, the alternative hypothesis of a relationship between two variables is rigorously distinguished from the null hypothesis (which states that there is no relationship between the same two variables). The reason for concentrating on the null hypothesis is that by doing so some fairly simple inferences can be made about the population which the null hypothesis defines. So according to the null hypothesis, the population distribution should show no relationship between the two variables (by definition). Usually this means that their correlation is 0.00 or there is 0.00 difference between the sample means. It is assumed, however, that other information taken from the sample such as the standard deviation of the scores or any other statistic adequately reflects the characteristics of the population.

Hypothesis testing then works out the likely distribution of the characteristics of all samples taken from this theoretical population defined by the null hypothesis and aspects of the known sample(s) studied in the research. If the actual sample is very different from the population as defined by the null hypothesis then it is unlikely to come from the population as defined by the null hypothesis. (The conventional criterion is that samples which come in the extreme 5% are regarded as supporting the hypothesis; the middle 95% of samples support the null hypothesis.) If a sample does not come from the population as defined by the null hypothesis, then the sample must come from a population in which the null hypothesis is false. Hence, the more likely it is that the alternative hypothesis is true.


In contrast, if the sample(s) studied in the research is typical of the vast majority of samples which would be obtained if the null hypothesis were true, then the more the actually obtained sample leads us to prefer the null hypothesis.

One difficulty with this lies in the way that the system imposes a rigid choice between the hypothesis and the null hypothesis based solely on a simple test of the likelihood that the data obtained could be explained away as a chance variation if the null hypothesis were true. Few decisions in real life would be made on the basis of such a limited number of considerations and the same is true in terms of research activity. The nature of scientific and intellectual endeavour is that ideas and findings are scrutinized and checked. Hence, the test of the null hypothesis is merely a stage in the process and quite a crude one at that – just answering the question of how well the findings fit the possibility that chance factors alone might be responsible. Of course, there are any number of other reasons why a hypothesis or null hypothesis may need consideration even after statistical significance testing has indicated a preference between the hypothesis and null hypothesis. For example, inadequate methodology including weak measurement may be responsible. See also: significant


I

icicle plot: one way of graphically presenting the results of a cluster analysis in which the variables to be clustered are placed horizontally with each variable separated by a column. The number of clusters is presented vertically, starting with the final cluster and ending with the first cluster. A symbol, such as X, is used to indicate the presence of a variable as well as the link between variables and clusters in the columns between the variables.

identification: the extent to which there is sufficient information to estimate the parameters of a model in structural equation modelling. In a just-identified model there is just enough information to identify or estimate all the parameters of the model. Such models always provide a perfect fit to the data. In an under-identified model there is not sufficient information to estimate all the parameters of the model. In an over-identified model there is more than enough information to identify all the parameters of the model and the fit of different over-identified models may be compared.

ill-conditioned matrix: see determinant

independent-samples t test: see t test for unrelated samples

independent variable: a variable thought to influence or affect another variable. This other variable is sometimes referred to as a dependent variable as its values are thought to depend on the corresponding value of the independent variable. See also: analysis of variance; between-groups variance; dependent variable; multiple regression; regression equation

inferential statistics: that branch of statistics which deals with generalization from samples to the population of values. It involves significance testing. The other main sort of statistics is descriptive statistics which involve the tabulation and organization of data in order to demonstrate their main characteristics.

infinity: a quantity larger than any finite number – however large a number is, it can always be exceeded by adding one. It is usually symbolized as ∞. The concept is of little practical importance in routine statistical analyses.

integer: a whole number such as 1, 50 or 209. Fractions and decimals, then, are not integers.


interaction: a situation in which the influence of two or more variables does not operate in a simple additive pattern. This occurs when the relationship between two variables changes markedly when the values of another variable(s) are taken into account. For example, the relationship between gender and depression may vary according to marital status. Never married women may be less depressed than never married men whereas married women may be more depressed than married men. In other words, whether women are more depressed than men depends on the third variable of marital status. It is not possible to account for depression on the basis of the separate influences of gender and marital status, in this case. Particular combinations of the values of the variable gender and of the variable marital status tend to produce especially high or especially low depression levels. So, for example, women are more depressed than men when they are married and less depressed when they have never been married. In this case there is an interaction between gender and marital status on depression.

The presence or absence of an interaction may be more readily understood if the relevant statistic for the different groups is displayed in a graph such as that in Figure I.1. In this figure marital status is represented as two points on the horizontal axis while gender is represented by two types of lines. Which variable is represented by the horizontal axis and which variable is represented by the lines does not matter. The vertical axis is used to represent the relevant statistic. In this case it is mean depression score. There is no interaction when the two lines in the graph are more or less parallel to one another. There is an interaction when the two lines are not parallel as is the case here. Whether this interaction is statistically significant depends on the appropriate statistical test being applied, which in this case would be a 2 × 2 analysis of variance. See also: analysis of variance; factorial design; log–linear analysis; main effect; multiple regression

Figure I.1 A graph showing an interaction: mean depression (low to high) for women and men plotted against marital status (never married and married)

intercept: in a simple or bivariate regression, the intercept is the point at which the regression line intercepts or cuts across the vertical axis when the value of the horizontal axis is zero. The vertical or y axis represents the criterion variable while the horizontal or x axis represents the predictor variable. The cut point is usually symbolized as a in the regression equation.

More generally, the intercept is the parameter in a regression equation which is the expected or predicted value of the criterion variable when all the predictor variables are zero. It is often referred to as the constant in a regression analysis. See also: slope; unstandardized partial regression coefficient

interjudge (coder, rater) reliability: the extent to which two or more judges, coders or raters are consistent or similar in the way they make judgements about identical information. These measures include Cohen’s kappa coefficient for categorical variables and Ebel’s intraclass correlation for non-categorical variables. For example, does teacher A rate the same children as intelligent as teacher B does? See also: reliability

internal consistency: the extent to which the items making up a scale are related to one another. The most widely used test is Cronbach’s alpha reliability or α. Essentially, it should vary from 0 to 1. Measures with an alpha of 0.75 or more are considered to be internally consistent. The more similar the items, the higher alpha is likely to be.




Scales with more items are also more likely to have higher alpha values. An alpha with a negative value may be obtained if items have not been appropriately recoded. For example, if high scores are being used to indicate greater anxiety but there are some items which have been worded so that high scores represent low anxiety, then these items need to be recoded. See also: alpha reliability, Cronbach’s
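The entry does not give a formula, but alpha is commonly computed as k/(k − 1) × (1 − sum of item variances / variance of total scores). A small Python sketch of that standard form (ours, with hypothetical data):

```python
def cronbachs_alpha(items):
    """Cronbach's alpha from item variances and the variance of total scores.
    `items` is a list of items, each a list of scores (one per respondent)."""
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(variance(item) for item in items) / variance(totals))

# three hypothetical yes/no items answered by five respondents
cronbachs_alpha([[1, 0, 1, 1, 0],
                 [1, 0, 1, 0, 0],
                 [1, 1, 1, 1, 0]])   # about 0.79
```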

internal validity: the extent to which a research design allows us to infer that a relationship between two variables is a causal one or that the absence of a relationship indicates the lack of a causal relationship. The implication is that the interpretation is valid in the context of the research situation but may simply not apply to non-research settings. That is, the findings have no external validity. See also: randomization

interquartile range: the range of the middle 50% of scores in a distribution. It is obtained by dividing the scores, in order, into four distinct quarters with equal numbers of scores in each quarter (Figure I.2). If the largest and smallest quarters are discarded, the interquartile range is the range of the remaining 50% of the scores. See also: quartiles; range

Figure I.2 Interquartile range: the lowest 25% of scores, the middle 50% (or middle two quarters) of scores, and the highest 25% of scores
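A brief Python sketch (ours, not part of the original entry); note that packages differ slightly in the convention used to locate the quartile cut points:

```python
import statistics

scores = [2, 4, 5, 6, 7, 8, 9, 11, 12, 15, 18, 20]   # hypothetical scores
quartile_1, median, quartile_3 = statistics.quantiles(scores, n=4)
interquartile_range = quartile_3 - quartile_1
```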

interval or equal interval scale or level of measurement: one of several different types of measurement scale – nominal, ordinal and ratio are the others. An interval scale is a measure in which the adjacent intervals between the points of the scale are of equal extent and where the measure does not have an absolute zero point. It is thought that psychological characteristics such as intelligence or anxiety do not have an absolute zero point in the sense that no individual can be described as having no intelligence or no anxiety. A measure of such a construct may consist of a number of items. For example, a measure of anxiety may consist of 10 statements or questions which can be answered in terms of, say, ‘Yes’ or ‘No’. If these two responses are coded as 0 or 1, the minimum score on this measure is 0 and the maximum 10. The intervals between the adjacent possible scores of this measure are of the same size, namely 1. So, the size of the interval between a score of 2 and a score of 3 is the same as that between a score of 3 and a score of 4. Let us assume that higher scores indicate higher anxiety. Because a score of 0 does not represent the absence of anxiety, we cannot say that a person who has a score of 8 is twice as anxious as someone with a score of 4. Measures which have an absolute zero point and equal intervals, such as age, are known as ratio levels of measurement or scales because their scores can be expressed as ratios. For example, someone aged 8 is twice as old as someone aged 4, the ratio being 8:4. It has been suggested that interval and ratio scales, unlike ordinal scales, can be analysed with parametric statistics. Some authors argue that in practice it is neither necessary nor possible to distinguish between ordinal, interval and ratio scales of measurement in many disciplines. Instead they propose a dichotomy between nominal measurement and score measurement. See also: measurement; ratio level of measurement; score




intervening variable: a variable thought to explain the relationship between two other variables in the sense that it is caused by one of them and causes the other. For example, there may be a relationship between gender and pay in that women are generally less well paid than men. Gender itself does not immediately affect payment but is likely to influence it through a series of intermediate or intervening variables such as personal, familial, institutional and societal expectations and practices. For instance, women may be less likely to be promoted which in turn leads to lower pay. Promotion in this case would be an intervening variable as it is thought to explain at least the relationship between gender and pay. Intervening variables are also known as mediating variables.

intraclass correlation, Ebel’s: estimates the reliability of the ratings or scores of three or more judges. There are four different forms of this correlation which can be described and calculated with analysis of variance.

If the ratings of the judges are averaged for each case, the appropriate measure is the interjudge reliability of all the judges which can be determined as follows:

(between-subjects variance − error variance) / between-subjects variance

If only some of the cases were rated by all the judges while other cases were rated by one of the other judges, the appropriate measure is the interjudge reliability of an individual judge which is calculated as follows:

(between-subjects variance − error variance) / {between-subjects variance + [error variance × (number of judges − 1)]}

The above two measures do not determine if a similar rating is given by the judges but only if the judges rank the cases in a similar way. For example, judge A may rate case A as 3 and case B as 6 while judge B may rate case A as 1 and case B as 5. While the rank order of the two judges is the same in that both judges rate case B as higher than case A, none of their ratings are the same. Measures which take account of the similarity of the ratings as well as the similarity of their rank order may be known as interjudge agreement and include a measure of between-judges variance. There are two forms of this measure which correspond to the two forms of interjudge reliability.

If the ratings of the judges are averaged for each case, the appropriate measure is the interjudge agreement of all the judges which can be determined as follows:

(between-subjects variance − error variance − between-judges variance) / between-subjects variance

If only some of the cases were rated by all the judges while other cases were rated by one of the other judges, the appropriate measure is the interjudge agreement of an individual judge which is calculated as follows:

(between-subjects variance − error variance − between-judges variance) / {between-subjects variance + [(error variance + between-judges variance) × (number of judges − 1)]}

Tinsley and Weiss (1975)

iteration: not all calculations can be fully computed but can only be approximated to give values that can be entered into the equations to make a better estimate. Iteration is the process of doing calculations which make better and better approximations to the optimal answer. Usually in iteration processes, the steps are repeated until there is little change in the estimated value over successive iterations. Where appropriate, computer programs allow the researcher to stipulate the minimum improvement over successive iterations at which the analysis ceases. Iteration is a common feature of factor analysis and log–linear analysis as prime examples.

Generally speaking, because of the computational labour involved, iterative processes are rarely calculated by hand in statistical analysis.




J

joint probability: the probability of having two distinct characteristics. So one can speak of the joint probability of being male and rich, for example.

just-identified model: a term used in structural equation modelling to describe a model in which there is just sufficient information to estimate all the parameters of the model. This model will provide a perfect fit to the data and will have no degrees of freedom. See also: identification


K

Kaiser’s or Kaiser–Guttman criterion: a criterion used in factor analysis to determine the number of factors or components for consideration and possible rotation. It may be regarded as a minimal test of statistical significance of the factor. The criterion is that factors with eigenvalues of greater than 1.00 should be retained or selected for rotation. An eigenvalue is the amount of variance that is explained or accounted for by a factor. The maximum amount of variance that a variable in a factor analysis can have is 1.00. So, the criterion of 1.00 means that the factors selected will explain the variance equal to that of at least one variable on average. It has been suggested that this criterion may select too many factors when there are many variables and too few factors when there are few variables. Consequently, other criteria have been proposed for determining the optimum number of factors to be selected, such as Cattell’s scree test. See also: exploratory factor analysis

kappa (κ) coefficient, Cohen’s: an index of the agreement between two judges in categorizing information. It can be extended to apply to more than two judges. The proportion of agreement between two judges is assessed while taking into account the proportion of agreement that may simply occur by chance. It can be expressed in terms of the following formula:

(observed proportion of agreement − chance-expected proportion of agreement) / (1 − chance-expected proportion of agreement)

which can be re-expressed in frequencies:

(observed frequency of agreement − chance-expected frequency of agreement) / (number of cases − chance-expected frequency of agreement)

Kappa can vary from −1 to 1. A kappa of 0 indicates agreement equal to chance levels. A negative kappa indicates a less than chance agreement and a positive kappa a greater than chance agreement. A kappa of 0.70 or more is usually considered to be an acceptable level of agreement or reliability.
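A one-line Python sketch of the proportions version of the formula (ours, with hypothetical values):

```python
def cohens_kappa(observed_agreement, chance_agreement):
    """Cohen's kappa from the observed and chance-expected proportions of agreement."""
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)

# hypothetical example: judges agree on 80% of cases, 50% agreement expected by chance
cohens_kappa(0.80, 0.50)   # 0.60 - above chance, but below the usual 0.70 benchmark
```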

Siegal and Castellan (1988)

Kendall’s partial rank correlation coefficient: a measure of partial association for ordinal variables in which one or more other ordinal variables may be partialled out or controlled. Its calculation is similar to that of partial correlation except that Kendall’s rank correlation tau b is used instead of Pearson’s correlation coefficient in the formula.

Kendall’s rank correlation or tau (τ): a measure of the linear association between two ordinal variables.



A negative value means that lower ranks on one variable are associated with higher ranks on the other variable. A positive value means that higher ranks on one variable go together with higher ranks on the other variable. A zero or close to zero value means that there is no linear association between the two variables.

There are three forms of this measure called tau a, tau b and tau c.

Tau a should be used when there are no ties or tied ranks. It can vary from −1 to 1. It can be calculated with the following formula:

(number of concordant pairs − number of discordant pairs) / total number of pairs

A concordant pair is one in which the first case is ranked higher than the second case, while a discordant pair is the reverse, in which the first case is ranked lower than the second case.
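A short Python sketch of tau a (ours, not part of the original entry), counting concordant and discordant pairs directly; the ranks are hypothetical:

```python
from itertools import combinations

def kendall_tau_a(ranks_x, ranks_y):
    """Tau a: (concordant pairs - discordant pairs) / total pairs, assuming no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(ranks_x, ranks_y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(ranks_x) * (len(ranks_x) - 1) / 2
    return (concordant - discordant) / total_pairs

kendall_tau_a([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])   # 0.8
```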

Tau b should be used when there are ties or tied ranks. It can vary from −1 to 1 if the table of ranks is square and if none of the row and column totals are zero. It can be calculated with the following formula:

(number of concordant pairs − number of discordant pairs) / √[(total number of pairs − T1) × (total number of pairs − T2)]

where T1 and T2 are the numbers of tied ranks for the two variables.

Tau c should be used when the table of ranks is rectangular rather than square as the value of tau c can come closer to −1 or 1. It can be worked out with the following formula:

[(number of concordant pairs − number of discordant pairs) × 2 × S] / [number of cases² × (S − 1)]

where S is the number of columns or rows, whichever is the smaller. See also: correlation

Siegal and Castellan (1988)

Kolmogorov–Smirnov test for one sample: a non-parametric test for determining whether the distribution of scores on an ordinal variable differs significantly from some theoretical distribution for that variable. For example, we could compare the distribution of the outcome of throwing a die 60 times with the theoretical expectation that it will land with 1 up 10 times, with 2 up 10 times, and so forth, if it is an unbiased die. The largest absolute difference between the cumulative frequency of the observed and the expected frequency of a value is used to determine whether the observed and expected distributions differ significantly.
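A small Python sketch of the test statistic for the die example (ours; the observed frequencies are hypothetical, and the significance of D would still have to be assessed against tables or a large-sample formula):

```python
def ks_one_sample_d(observed, expected):
    """D statistic: largest absolute difference between the cumulative
    observed and cumulative expected proportions."""
    total_observed, total_expected = sum(observed), sum(expected)
    d_max = cumulative_observed = cumulative_expected = 0.0
    for obs, exp in zip(observed, expected):
        cumulative_observed += obs / total_observed
        cumulative_expected += exp / total_expected
        d_max = max(d_max, abs(cumulative_observed - cumulative_expected))
    return d_max

# 60 throws of a die compared with the uniform expectation of 10 per face
ks_one_sample_d([8, 9, 11, 15, 10, 7], [10] * 6)   # 0.05
```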

Siegal and Castellan (1988)

Kolmogorov–Smirnov test for two samples: a non-parametric test which determines whether the distributions of scores on an ordinal variable differ significantly for two unrelated samples. The largest absolute difference between the cumulative frequency of the two samples for a particular score or value is used to determine whether the two distributions differ significantly.

Siegal and Castellan (1988)

Kruskal–Wallis one-way analysis of variance or H test: a non-parametric test used to determine whether the mean ranked scores for three or more unrelated samples differ significantly. The scores for all the samples are ranked together. If there is little difference between the sets of scores, their mean ranks should be similar. The statistic for this test is chi-square which can be corrected for the number of ties or tied scores.

Siegal and Castellan (1988)

kurtosis: a mathematically defined term which reflects the characteristics of the tails (extremes) of a frequency distribution.




These tails may range from the elongated to the relatively stubby (Figure K.1). The benchmark standard is the perfect normal distribution which has zero kurtosis by definition. Such a distribution is described as mesokurtic. In contrast, a shallower distribution with relatively elongated tails is known as a leptokurtic distribution. A steeper distribution with shorter, stubbier tails is described as platykurtic.

The mathematical formula for kurtosis is rarely calculated in many disciplines although discussion of the concept of kurtosis itself is relatively common. See also: moment; standard error of kurtosis


Figure K.1 Main types of kurtosis: the normal (mesokurtic) curve, a leptokurtic curve and a platykurtic curve


L

large-sample formulae: when sample sizes are small to moderate, there are readily available tables giving the critical values of many non-parametric or distribution-free tests of significance. These tables allow quick and easy significance testing especially as many of these statistics are easy to compute by hand. Unfortunately, because of the demands of space and practicality, most of these tables have a fairly low maximum sample size which reduces their usefulness in all circumstances. In circumstances where the tabulated values are insufficiently large, large-sample formulae may be employed which can be used to assess significance. Frequently, but not always, these large-sample formulae give values which approximate that of the z distribution. One disadvantage of the large-sample formulae is the need to rank large numbers of scores in some instances. This is a cumbersome procedure and likely to encourage mistakes when calculating by hand. An alternative approach in these circumstances would be to use the parametric equivalent of the non-parametric test since most of the limitations on the use of parametric tests become unimportant with large sample sizes.

latent root: another term for eigenvalue. See also: eigenvalue, in factor analysis

latent variable: the term used in structural equation modelling to refer to a variable which either has had its unreliability adjusted for or is based on the variance shared by two or more other measures and thought to represent it (which is similar to adjusting for unreliability). Many theoretical constructs or variables cannot be directly measured. For example, intelligence is a theoretical construct which is measured by one’s performance on various tasks which are thought to involve it. However, performance on these tasks may involve variables other than intelligence, such as one’s experience or familiarity with those tasks. Consequently, the measures used to assess a theoretical variable may be a partial manifestation of that variable. Hence a distinction is made between a manifest variable and a latent variable. In a path diagram the latent variable may be signified by a circle or ellipse, while a manifest variable is depicted by a square or rectangle. The relationship between a latent variable and a manifest variable is presented by an arrow going from the latent variable to the manifest variable as the latent variable is thought partly to show or manifest itself in the way in which it is measured. See also: canonical correlation; manifest variable

Latin Square: a display of rows and columns (i.e. a matrix or array) in which each symbol occurs just once in each column and each row. (They are sometimes loosely referred to as 'magic squares', although strictly a magic square is a different arrangement, one in which rows, columns and diagonals all sum to the same total.) An example of a Latin Square with three different elements is shown in Table L.1.



For our purposes, C1, C2 and C3 refer to different conditions of an experiment, though this is not a requirement of Latin Squares.

The statistical application of Latin Squares is in correlated or repeated-measures designs. It is obvious from Table L.1 that the three symbols, C1, C2 and C3, are equally frequent and that with three symbols one needs three different orders so that each condition appears once in each position of the sequence. So for a study with three experimental conditions, Table L.1 could be reformulated as Table L.2.

As can be seen, only three participants are required to run the entire study with each condition appearing once in each ordinal position across participants. Of course, six participants would allow the basic design to be run twice, and nine participants would allow three complete versions of the design to be run.

The advantage of the Latin Square is that it is a counterbalanced design: simple order effects tend to cancel out because each condition occurs equally often, and in equal numbers, in each position of the sequence. Unfortunately, this advantage can only be capitalized upon in certain circumstances. These are circumstances in which it is feasible to run individuals in a number of experimental conditions, where the logistics of handling a variety of conditions are not too complicated, where serving in more than one experimental condition does not produce a somewhat unconvincing or even ludicrous scenario for the participants, and so forth.

It is also fairly obvious that the more conditions there are in an experimental design, the more complex the Latin Square will be. Hence, it is best considered only when the number of conditions is small. The Latin Square design effectively is a repeated-measures ANOVA design. The statistical analysis of such designs can be straightforward so long as it is assumed that the counterbalancing of orders is successful. If this is assumed, then the Latin Square design simplifies to a related analysis of variance comparing each of the conditions involved in the study. See also: counterbalanced designs; order effect; within-subjects design
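
A minimal sketch (assuming Python) of how a cyclic Latin Square of this kind can be generated for any small number of conditions; placing condition (i − j) mod n in row i and column j reproduces the pattern of Table L.1 for n = 3:

```python
def latin_square(conditions):
    """Return a cyclic Latin Square: each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(i - j) % n] for j in range(n)] for i in range(n)]

for row in latin_square(["C1", "C2", "C3"]):
    print(" ".join(row))
# C1 C3 C2
# C2 C1 C3
# C3 C2 C1
```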

least significant difference (LSD) test: see Fisher's LSD (Least Significant Difference) or protected test

least squares method in analysis of variance: see Type II, classic experimental or least squares method in analysis of variance

leptokurtic: a frequency distribution which has relatively heavy, elongated tails (i.e. more extreme values) compared with the normal distribution, typically accompanied by a sharper peak; its kurtosis is positive. A good example of a leptokurtic frequency distribution is the t distribution


Table L.1 A simple Latin Square

    C1  C3  C2
    C2  C1  C3
    C3  C2  C1

Table L.2 An experimental design based on the Latin Square

                            Participant 1   Participant 2   Participant 3

    Condition run first          C1              C3              C2
    Condition run second         C2              C1              C3
    Condition run third          C3              C2              C1


and especially for small numbers of degrees of freedom. Although this distribution converges on the normal distribution (i.e. the z distribution) for very large numbers of degrees of freedom, in general it has heavier tails than the normal distribution and so is leptokurtic. See also: kurtosis

levels of treatment: the different conditions or groups that make up an independent variable in a research design and its analysis. A factor or independent variable which has two conditions may be described as having two levels of treatment. It is a term normally applied in the context of research using experimental designs.

Levene’s test for equality of relatedvariances: a test of whether the variances oftwo or more related samples are equal (i.e.homogeneous or similar). This is a require-ment for statistics which combine the vari-ances from several samples to give a ‘betterestimate’ of population characteristics. Theassumption that these variances do not differis an assumption of the repeated-measuresanalysis of variance, for example, which isused to determine whether the means of twoor more related samples differ. Of course, thetest may be used in any circumstances whereone wishes to test whether variances differ.For the Levene test, the absolute differencebetween the score in a group and the meanscore for that group is calculated for all thescores. A single-factor repeated-measuresanalysis of variance is carried out on theseabsolute difference scores. If the F ratio forthis single factor is significant, this meansthat two or more of the variances differ sig-nificantly from each other. If the assumptionthat the variances are equal needs to be met,it may be possible to do this by transformingthe scores using, for example, their squareroot or logarithm. This is largely a matter oftrial and error until a transformation is foundwhich results in equal variances for the setsof transformed scores. See also: analysis ofvariance; heterogeneity

Levene’s test for equality of unrelatedvariances: one applies this test when deal-ing with unrelated data from two or more dif-ferent samples. It tests whether two or moresamples of scores have significantly unequal(heterogeneous or dissimilar) variances.Usually this assumption of equality isrequired in order to combine two or morevariance estimates to form a ‘better’ estimateof the population characteristics. The mostcommon application of the Levene test is inthe unrelated analysis of variance where thevariances of the different samples should notbe significantly different. However, it isequally appropriate to test the hypothesisthat two variances differ using this test inmore general circumstances. For the Levenetest, the absolute difference between the scorein a group and the mean score for that groupis calculated for all the scores. A single-factoranalysis of variance is carried out on theseabsolute difference scores. If the F ratio forthis single factor is significant, this meansthat two or more of the variances differ sig-nificantly from each other. If the assumptionthat the variances are equal needs to be met,it may be possible to achieve equal variancesby transforming all of the scores using, forexample, their square root or logarithm. Thisis normally a matter of trying out a variety oftransformations until the desired equality ofvariances is achieved. There is no guaranteethat such a transformation will be found.

likelihood ratio chi-square: a version of chi-square which utilizes natural logarithms. It is different in some respects from the more familiar form, which should be known in full as Pearson chi-square (the latter named after Karl Pearson, who developed it). It is useful where the components of an analysis are to be separated out, since this form of chi-square allows accurate addition and subtraction of components. Because of its reliance on natural logarithms, likelihood ratio chi-square is a little more difficult to compute. Apart from its role in log–linear analysis, there is nothing to be gained by using likelihood ratio chi-square in general. With calculations based on large frequencies, the two differ very little numerically anyway.


The formula for the likelihood ratio chi-square involves the observed and expected frequencies which are common to the regular chi-square test. It also involves natural logarithms, which may be obtained from tables, scientific calculators or statistical software:

    likelihood ratio chi-square = 2 × Σ [observed frequency × natural logarithm of (observed frequency ÷ expected frequency)]

It is difficult to imagine the circumstances in which this will be calculated by hand. In its commonest application in statistics (log–linear analysis), the computations are too repetitive and complex to calculate without the aid of a computer.
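
A minimal sketch (assuming Python with NumPy) of the calculation for a set of observed and expected cell frequencies; the frequencies shown are arbitrary illustrative values:

```python
import numpy as np

def likelihood_ratio_chi_square(observed, expected):
    """2 * sum of observed * ln(observed / expected) over all cells."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return 2 * np.sum(observed * np.log(observed / expected))

print(likelihood_ratio_chi_square([30, 10, 25, 35], [25, 25, 25, 25]))
```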

Likert scaling: the name given to the method of constructing measurement scales involving a series of statements to which the respondent is asked to respond on an ordered scale such as

We should throw away the key for convicted sex offenders.

Strongly disagree    Disagree    Neither    Agree    Strongly agree

These response alternatives from strongly disagree to strongly agree are then scored 1 to 5, 1 to 7, or 1 to 9 according to the number of points. The numerical scale is, in fact, arbitrary but seems to work well in practice, and it is conventional to score the responses using the above system. Likert scales are extremely familiar and have the big advantage that the questions can simply be summed to obtain a score. (Though it is probably better to transform the scores to standard scores in order to do so.)

The introduction of Likert scaling simplified an earlier type of scale developed by Thurstone. The latter was cumbersome as questions/items had to be developed which covered the whole range of possible attitudes using subtly worded questions/items.

line of best fit: see regression line

linear association or relationship: see curvilinear relationship

LISREL: an abbreviation for linear structural relationships. It is one of several computer programs for carrying out structural equation modelling. Information about LISREL can be found at the following website:
http://www.ssicentral.com/lisrel/mainlis.htm

See also: structural equation modelling

listwise deletion: a keyword used in some statistical packages such as SPSS to confine an analysis to those cases for which there are no missing values on any of the variables listed for that analysis. When this is done where there are some missing values, the number of cases will be the same for all the analyses. Suppose, for example, we want to correlate the three variables of job satisfaction, income and age in a sample of 60 people, and that 3 cases have missing data for income and a different set of 4 cases have missing data for age. All 7 cases will be omitted from the analysis. So when using the listwise procedure all the correlations will be based on 53 cases, as this number of cases has no missing data on any of the three variables. An alternative procedure of analysis is to provide correlations for those cases which do not have missing values for those variables. The keyword for this procedure is pairwise deletion in a statistical package such as SPSS. In this case, the correlations will be based on different numbers of cases. The correlation between job satisfaction and income will be based on 57 cases, that between job satisfaction and age on 56 cases and that between income and age on 53 cases. See also: missing values; pairwise deletion
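
A minimal sketch (assuming Python with pandas) of the difference between the two approaches; the variable names mirror the example above and the values are invented for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "job_satisfaction": [4, 3, 5, 2, 4, 1],
    "income":           [30, np.nan, 42, 25, np.nan, 28],
    "age":              [34, 45, np.nan, 29, 51, 40],
})

listwise = df.dropna().corr()   # only cases complete on every variable
pairwise = df.corr()            # each correlation uses the cases available for that pair

print(listwise)
print(pairwise)
```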




loading: see exploratory factor analysis; factor, in factor analysis

log likelihood: determines whether the predictor variables included in a model provide a good fit to the data. It is usually multiplied by −2 so that it takes the approximate form of a chi-square distribution. A perfect fit is indicated by 0, while bigger values signify progressively poorer fits. Because the −2 log likelihood will be bigger the larger the sample, the size of the sample is controlled for by subtracting the −2 log likelihood value of a model containing the predictors from the −2 log likelihood value of a model not containing any predictors.

The log likelihood is the sum, over cases, of the logged probabilities associated with the predicted and actual outcome for each case. It can be calculated with the following formula:

    log likelihood = Σ {(outcome × log of predicted probability) + [(1 − outcome) × log of (1 − predicted probability)]}

See also: logistic regression
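
A minimal sketch (assuming Python with NumPy) of the formula above, for a binary outcome coded 0/1 and predicted probabilities from some model; the numbers are invented for illustration:

```python
import numpy as np

outcome = np.array([1, 0, 1, 1, 0])              # actual outcomes (0/1)
predicted = np.array([0.8, 0.3, 0.6, 0.9, 0.2])  # the model's predicted probabilities

log_likelihood = np.sum(outcome * np.log(predicted)
                        + (1 - outcome) * np.log(1 - predicted))
print(log_likelihood)        # 0 would indicate a perfect fit after multiplying by -2
print(-2 * log_likelihood)   # the -2 log likelihood reported by most packages
```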

log–linear analysis: conceptually, this may be regarded as an extension of the Pearson chi-square test to cross-tabulation or contingency tables involving three or more variables. It is recommended that the chi-square test be studied in detail prior to attempting the more complex log–linear analysis. Like chi-square, log–linear analysis involves nominal or category (or categorical) data (i.e. frequency counts). However, log–linear analysis can be used for score data if the scores are subdivided into a small number of divisions (e.g. 1–5, 6–10, 11–15, etc.).

Table L.3 contains a three-variable cross-tabulation table for purposes of illustration. This cannot be analysed properly using chi-square because the table has too many dimensions, though researchers in the past commonly would carry out numerous smaller two-variable chi-squares derived from this sort of table. Unfortunately, such an approach cannot fully analyse the data no matter how thoroughly it is applied. There is no simple way of calculating the expected frequencies for three or more variables. In log–linear analysis, the objective is to find the model which best fits the empirical data. The fit of the model to the empirical data is assessed by using chi-square (usually likelihood ratio chi-square). This version of the chi-square formula has the major advantage that the values of chi-square it provides may be added or subtracted directly without introducing error.

Significant values of chi-square mean that the data depart from the expectations of the model. This indicates that the model has a poor fit to the data. A non-significant value of chi-square means that the model and the data fit very well. In log–linear analysis, what this means is that the cross-tabulated frequencies are compared with the expected frequencies based on the selected model (the selected combination of main effects and interactions). The closer the actual frequencies are to the expected (modelled) frequencies, the better the model is.

The models created in log–linear analysis are based on the following components:

1 The overall mean frequency (sometimes known as the constant). This would be the equal-frequencies model. Obviously if all of the individual cell means are the same as the overall mean, then we would need to go no further in the analysis since the equal-frequencies model applies.

2 The main effects (i.e. the extent to which the frequencies on any variable depart from equality). A gender main effect would simply mean that there are different


Table L.3 A three-variable contingency table

                      Convicted of crime      Never convicted of crime
                      Male      Female        Male      Female

Rural upbringing       29          9           79          65
Urban upbringing       50         17          140         212


numbers of males and females in the analysis. As such, in much research main effects are of little interest, though in other contexts the main effects are important. For example, if it were found that there was a main effect of sex in a study of an anti-abortion pressure group, this might be a finding of interest as it would imply either more men in the group or more women in the group.

3 Two-way interactions – combinations of two variables (obtained by collapsing the categories or cells for all other variables) which show larger or smaller frequencies than can be explained by the influence of the main effects operating separately.

4 Higher order interactions (what they are depends on the number of variables under consideration). In this example, the highest order interaction would be gender × upbringing × conviction.

5 The saturated model – this is simply the sum of all the above possible components. It is always a perfect fit to the data by definition since it contains all of the components of the table. In log–linear analysis, the strategy is often to start with the saturated model and then remove the lower levels (the interactions and then the main effects) in turn to see whether removing the component actually makes a difference to the fit of the model. For example, if taking away all of the highest order interactions makes no difference to the data's fit to the model, then the interactions can be safely dropped from the model as they are adding nothing to the fit. Generally the strategy is to drop a whole level at a time to see if it makes a difference. So three-way interactions would be eliminated as a whole before going on to see which ones made the difference if a difference was found.

Log–linear analysis involves iterative methods of calculation which in practice are only possible using computers, since they involve numerous approximations towards the answer until the approximation improves only minutely. The difficulty is that the interactions cannot be calculated directly. One aspect of the calculation which can be understood without too much mathematical knowledge is the expected frequencies for different components of a log–linear model. These are displayed in computer printouts of log–linear analysis.

Interaction in log–linear analysis is often compared with interactions in ANOVA. The key difference that needs to be considered, though, is that in log–linear analysis the frequencies in the various cells are being predicted by the model (or pattern of independent variables), whereas in ANOVA it is the means within the cells on the dependent variable which are being predicted by the model. So in log–linear analysis, an interaction between sex and type of accommodation (i.e. owned versus rented) simply indicates that, say, there are more males living in rented accommodation than could be accounted for simply on the basis of the proportion of people in rented accommodation and the number of males in the study. See also: hierarchical or sequential entry; iteration; logarithm; natural or Napierian logarithm; stepwise entry

Cramer (2003)
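
In practice, log–linear models are fitted by software. One standard route (a sketch, assuming Python with pandas and statsmodels; the variable names follow Table L.3) is to treat the cell counts as a Poisson generalized linear model and examine the deviance (a likelihood ratio chi-square against the saturated model) of models containing particular terms:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Cell frequencies from Table L.3
cells = pd.DataFrame({
    "upbringing": ["rural", "rural", "rural", "rural", "urban", "urban", "urban", "urban"],
    "convicted":  ["yes", "yes", "no", "no", "yes", "yes", "no", "no"],
    "gender":     ["male", "female", "male", "female", "male", "female", "male", "female"],
    "count":      [29, 9, 79, 65, 50, 17, 140, 212],
})

# Model with main effects and two-way interactions only (the three-way term dropped)
model = smf.glm("count ~ (upbringing + convicted + gender) ** 2",
                data=cells, family=sm.families.Poisson()).fit()

# Residual deviance: the likelihood ratio chi-square of this model against the saturated model
print(model.deviance, model.df_resid)
```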

log of the odds: see logit

logarithm: to understand logarithms, one needs also to understand what an exponent is, since a logarithm is basically an exponent. Probably the most familiar exponents are the squares in numbers like 2², 3², 4², etc. In other words, the superscript 2 is an exponent, as obviously a cube would also be, as in 2³. The exponent is an instruction to multiply (raise) the base number by itself a number of times: 2² means 2 × 2 = 4, and 2³ means 2 × 2 × 2 = 8.

The number which is raised by the exponent is called the base. So 2 is the base in 2², 2³ or 2⁴, while 5 is the base in 5², 5³ or 5⁴. The logarithm of any number is given by a simple formula in which the base number is represented by the symbol b, the logarithm may be represented by the symbol e (for exponent), and the number under consideration is given the symbol x:

    x = bᵉ


So a logarithm of a number is simply the exponent of a given base (which is of one's own choosing, such as 10, 2, 5, etc.) which gives that number. One regularly used base for logarithms is 10. So the logarithm for the base 10 is the value of the exponent (e) for 10 which equals our chosen number. Let's suppose that we want the logarithm of the number 100 for the base 10. What we are actually seeking is the exponent of 10 which gives the number 100:

    10ᵉ = 100

In other words, the logarithm of 100 is 2 simply because 10 to the power of 2 (i.e. 10² or 10 × 10) equals 100. Actually the logarithm would look more like other logarithms if we write it out in full as 2.000.

Logarithm tables (especially to the base 10) are readily available, although their commonest use – as a simple, speedy means of multiplying numbers by adding together their logarithms – has rather declined with the advent of electronic calculators and computers. However, students will come across logarithms in statistics in two forms:

1 In log–linear analysis, which involves natural logarithms in the computation of the likelihood ratio chi-square. The base of natural logarithms is 2.718 (to three decimal places).

2 In transformations of scores, especially when the unmodified scores tend to violate the assumptions of parametric tests. Logarithms are used in this situation because they radically compress the scale. So, the logarithm to the base 10 of the number 100 is 2.000, the logarithm of 1000 is 3.000 and the logarithm of 10,000 is 4.000. In other words, although 1000 is 10 times as large as the number 100, the logarithmic value of 1000 is 3.000 compared with 2.000 for the logarithm of 100. That is, by using logarithms it is possible to make large scores proportionally much smaller. So basically, by putting scores on a logarithmic scale, the extreme scores tend to get compacted to be relatively closer to what were much smaller scores on the untransformed measure. Logarithms to any base could be used if they produce a distribution of the data appropriate for the statistical analysis in question. For example, logarithms could be used in some circumstances in order to make a distribution of scores more symmetrical. See also: Fisher's z transformation; natural or Napierian logarithm; transformations
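
A minimal sketch (assuming Python's standard library) of the logarithms mentioned above:

```python
import math

print(math.log10(100))   # 2.0: the base-10 logarithm of 100
print(math.log10(1000))  # 3.0: only one unit larger, although 1000 is 10 times 100
print(math.log(100))     # 4.605...: the natural logarithm (base 2.718...)
print(math.e)            # 2.718281828..., the base of natural logarithms
print(10 ** 2)           # 100: raising the base to the logarithm recovers the number
```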

logarithmic scale: because the logarithm of numbers increases relatively slowly compared with the numbers it represents, logarithms can be used to adjust or modify scores so that they better meet the requirements of particular statistical techniques. Thus, expressed as one form of logarithm (to the base 10), the numbers 1 to 10 can be expressed as logarithms as shown in Table L.4.

In other words, the difference between 1 and 10 on a numerical scale is 9, whereas it is 1.00 on a logarithmic scale.

By using such a logarithmic transformation, it is sometimes possible to make the characteristics of one's data fit the assumptions of the statistical technique better. For example, it may be possible to equate the variances of two sets of scores using the logarithmic transformation. It is not a very common procedure among modern researchers, though it is fairly readily implemented in modern computer packages such as SPSS.

logarithmic transformation: see transformations


Table L.4 Logarithms to the base 10

Number Logarithm

 1       0.00
 2       0.30
 3       0.48
 4       0.60
 5       0.70
 6       0.78
 7       0.85
 8       0.90
 9       0.95
10       1.00


logged odds: see logit

logistic (and logit) regression: a technique (logit regression is very similar to but not identical with logistic regression) used to determine which predictor variables are most strongly and significantly associated with the probability of a particular category of the criterion variable occurring. The criterion variable can be a dichotomous or binary variable comprising only two categories, or it can be a polychotomous, polytomous or multinomial variable having three or more categories. The term binary or binomial may be used to describe a logistic regression having a binary criterion (dependent variable) and the term multinomial to describe one having a criterion with more than two categories. As in multiple regression, the predictors can be entered in a single step, in a hierarchical or sequential fashion or according to some other analytic criterion.

Because events either occur or do not occur, logistic regression assumes that the relationship between the predictors and the criterion is S-shaped or sigmoidal rather than linear. This means that a change in a predictor will have the biggest effect at the mid-point of 0.5 of the probability of a category occurring, which is where the probability of the category occurring is the same as the probability of it not occurring. It will have least effect towards the probabilities of 0 and 1 of the category occurring. This change is expressed as a logit, or the natural logarithm of the odds.

The unstandardized logistic regression coefficient is the change in the logit of the category for every change of 1 unit in the predictor variable. For example, an unstandardized logistic regression coefficient of 0.693 means that for every change of 1 unit in the predictor there is a change of 0.693 in the logit of the category. To convert the logit into the estimated odds ratio, we raise or exponentiate the value of 2.718 to the power of the logit, which in this case gives an odds ratio of 2.00 (2.718^0.693 = 2.00). Thus, the odds change by 2.00 for every change of 1 unit in the predictor. In other words, we multiply the odds by 2.00 for every change of 1 unit in the predictor. The operation of raising a constant such as 2.718 to a particular power such as the unstandardized logistic regression coefficient is called exponentiation and may be written as exp (symbol for the unstandardized logistic regression coefficient).

It is easiest to demonstrate the meaning of a change in the odds with one predictor containing only two categories. Suppose we were interested in determining the association between smoking and cancer for the data shown in Table L.5, where the criterion also consists of only two categories.

The odds of a smoker having cancer are 2.00 (40/20 = 2.00) while the odds of a non-smoker having cancer are 0.33 (10/30 = 0.33). The change in the odds of smokers and non-smokers having cancer is about 6.00, which is the odds ratio (2.00/0.33 = 6.06). In other words, smokers are six times more likely to get cancer than non-smokers. An odds ratio is a measure of association between two variables. A ratio of 1 means that there is no association between the two variables, a ratio of less than 1 a negative or inverse relationship and a ratio of more than 1 a positive or direct relationship.

With a binary predictor and a multinomial criterion the change in the odds is expressed in terms of each of the categories and the last category. So, for the data in Table L.6, the odds of being diagnosed as anxious are compared with the odds of being diagnosed as normal, while the odds of being diagnosed as depressed are also compared with the odds of being diagnosed as normal. The odds of being diagnosed as anxious rather than as normal for men are 0.40 (20/50 = 0.40) and for women are 0.25 (10/40 = 0.25). So, the odds ratio is 1.60 (0.40/0.25 = 1.60). Men are 1.6 times more likely to be diagnosed as anxious than women. The odds of being diagnosed as depressed rather than as normal for men are 0.20 (10/50 = 0.20) and for women


Table L.5 A binary criterion and predictor

                Cancer    No cancer

Smokers           40          20
Non-smokers       10          30


are 0.75 (30/40 = 0.75). The odds ratio is about 0.27 (0.20/0.75 = 0.267). It is easier to understand this odds ratio if it is expressed the other way around. That is, women are about 3.75 times more likely to be diagnosed as depressed than men (0.75/0.20 = 3.75).

The extent to which the predictors in a model provide a good fit to the data can be tested by the log likelihood, which is usually multiplied by −2 so that it takes the form of a chi-square distribution. The difference in the −2 log likelihood between the model with the predictor and without it can be tested for significance with the chi-square test with 1 degree of freedom. The smaller this difference is, the more likely it is that there is no difference between the two models and so the less likely it is that adding the predictor improves the fit to the data. See also: hierarchical or sequential entry; stepwise entry

Cramer (2003); Pampel (2000)
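
A minimal sketch (assuming Python) of the odds and odds-ratio arithmetic for Table L.5, together with the exponentiation of a logistic regression coefficient:

```python
import math

# Table L.5: smokers (40 cancer, 20 no cancer), non-smokers (10 cancer, 30 no cancer)
odds_smokers = 40 / 20        # 2.00
odds_non_smokers = 10 / 30    # 0.33
odds_ratio = odds_smokers / odds_non_smokers
print(odds_ratio)             # about 6.06: smokers' odds are roughly six times greater

# Exponentiating an unstandardized logistic regression coefficient gives the odds ratio
coefficient = 0.693
print(math.exp(coefficient))  # about 2.00: the odds double per unit change in the predictor
```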

logit, logged odds or log of the odds: the natural logarithm of the odds. It reflects the probability of an event occurring. It varies from about −9.21, which corresponds to a 0.0001 probability that an event will occur, to about 9.21, which corresponds to a 0.9999 probability that an event will occur. It also indicates the odds of an event occurring. A negative logit means that the odds are against the event occurring, a positive logit that the odds favour the event occurring and a zero logit that the odds are even. A logit of about −9.21 means that the odds of the event occurring are about 0.0001 (or very low). A logit of 0 means that the odds of the event occurring are 1, or even. A logit of about 9.21 means that the odds of the event occurring are about 9999 (or very high).

Logits are turned into probabilities using a two-stage process. Stage 1 is to convert the logit into odds using whatever tables or calculators one has which relate natural logarithms to their numerical values. Such resources may not be easy to find. Stage 2 is to turn the odds into a probability using the formula below:

    probability of an event = 2.718^logit / (1 + 2.718^logit) = odds / (1 + odds)

Fortunately, there is rarely any reason to do this in practice as the information is readily available from computer output. See logistic regression; natural logarithm
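
The two stages amount to a couple of lines of code (a sketch, assuming Python's standard library):

```python
import math

logit = 0.693
odds = math.exp(logit)            # stage 1: logit to odds (about 2.0)
probability = odds / (1 + odds)   # stage 2: odds to probability (about 0.67)
print(odds, probability)
```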

longitudinal design or study: tests the same individuals on two or more occasions or waves; such a study is also known as a panel study. One advantage of this design is that it enables the researcher to determine whether a variable changes from one occasion to another for that group of individuals.

Baltes and Nesselroade (1979)

LSD test: see Fisher’s LSD (LeastSignificant Difference) or protected test


Table L.6 A multinomial criterion and binary predictor

          Anxious    Depressed    Normal

Men          20          10          50
Women        10          30          40



main effect: the influence of a variable acting on its own or independently. It is to be contrasted with an interaction effect, which is the conjoint influence of two or more variables which produces greater effects on the data than the main effects would in a summative fashion. See also: analysis of variance; within-groups variance

MANCOVA: see multivariate analysis of covariance

manifest variable: a term used in structural equation modelling to describe a variable where the measure of that variable represents that variable. For example, an IQ score may be used as a direct measure of the theoretical construct of intelligence. However, a measure, like an IQ score, is often not a perfect representation of the theoretical construct as it may not be totally reliable and may assess other variables as well. For example, an IQ score may also measure knowledge in particular areas. Consequently, a manifest variable is distinguished from a latent variable, which may take some account of the unreliability of the manifest variable or which may be based on the variance shared by two or more manifest variables such as separate measures of intelligence or subcomponents of an IQ test. In a path diagram, the manifest variable may be signified by a square or rectangle, while a latent variable is depicted by a circle or ellipse. The relationship between a latent variable and a manifest variable is presented by an arrow going from the latent variable to the manifest variable, as the latent variable is thought partly to show or manifest itself in the way in which it is measured.

manipulated variable: a variable which has been deliberately manipulated or varied to determine its effect on one or more other variables. For example, if we are interested in the effects of alcohol on performance, we may manipulate or vary the amount of alcohol consumed by different groups of participants in a between-subjects design or on different occasions in a within-subjects design. If all other factors are held constant apart from the manipulated variable and if we find that there are statistical differences in the measured effects of that manipulated variable, we can be more certain that the different effects were due to the manipulation of that variable.

Mann–Whitney U test: a non-parametric test used to determine whether scores from two unrelated samples differ significantly from one another. It tests the number of times scores from one sample are ranked higher than scores from the other sample when the scores for both samples



have been ranked as a single combined sample. If the two sets of scores are similar, the number of times this happens should be similar for the two groups. If the samples are 20 or less, the statistical significance of the smaller U value is used. If the samples are greater than 20, the U value is converted into a z value. The absolute value of z has to be 1.96 or more to be statistically significant at the 0.05 two-tailed level, or 1.65 or more at the 0.05 one-tailed level.

Howitt and Cramer (2000)
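
A minimal sketch (assuming Python with SciPy; the scores are invented for illustration):

```python
from scipy import stats

sample_a = [12, 15, 9, 20, 17, 14]
sample_b = [22, 25, 19, 30, 28, 16]

u_statistic, p_value = stats.mannwhitneyu(sample_a, sample_b, alternative='two-sided')
print(u_statistic, p_value)  # a small p-value indicates that the two samples differ
```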

MANOVA: see multivariate analysis of variance

margin of error: the range of means (or any other statistic) which are reasonably likely possibilities for the population value. This is normally expressed as the confidence interval.

marginal totals: the total frequency of cases in the rows or columns of a contingency or cross-tabulation table.

matched-samples t test: see t test for related samples

matching: usually the process of selecting sets (pairs, trios, etc.) of participants or cases to be similar in essential respects. In research, one of the major problems is the variation between people or cases over which the researcher has no control. Potentially, all sorts of variation between people and cases may affect the outcome of research, for example gender, age, social status, IQ and many other factors. There is the possibility that these factors are unequally distributed between the conditions of the independent variable and so spuriously appear to be influencing scores on the dependent variable. For example, if one measured examination achievement in two different types of schools, educational achievement would be the dependent variable and the different types of schools different values of the independent variable. It is known that girls are more successful educationally. Hence, if there were more girls in one type of school than the other, then we would expect that the type of school with the most girls would tend to appear to be the most educationally successful. This is because of the disproportionate number of girls, not because one type of school is superior educationally, though it might be.

Matching actually only makes a difference to the outcome if one matches on variables which are correlated with the dependent variable. The matching variable also needs to be related to the independent variable in the sense that there would have to be a disproportionate number in one condition compared with the other. Sometimes it is known from previous empirical research that there are variables which correlate with both the independent variable and the dependent variable. More commonly, the researcher may simply feel that such matching is important on a number of possibly confounding variables.

The process of matching involves identifying a group of participants about which one has some information. Imagine that it is decided that gender (male versus female) and social class (working-class versus middle-class parents) differences might affect the outcome of your study, which consists of two conditions. Matching would entail the following steps:

• Finding from the list two participants who are both male and working class.

• One member of the pair will be in one condition of the independent variable, the other will be in the other condition of the independent variable. If this is an experiment, they will be allocated to the conditions randomly.

• The process would normally be repeated for the other combinations. That is, one selects a pair of two female working-class participants, then a pair of two male


middle-class participants, and then a pair of two female middle-class participants. Then more matched pairs could be selected as necessary.

Of course, matching in this way is time consuming and may not always be practicable or desirable. Because it reduces the uncontrolled variation between groups in a study, it potentially has the advantage of allowing the use of smaller sample sizes. So where there are potentially few appropriate participants available, matching may be a desirable procedure.

In an ideal study, participants in the different conditions ought to be matched in terms of as many characteristics as possible or practicable. One way of achieving this is to use pairs of identical twins since these are alike in a large number of biological, social and psychological respects. They are genetically identical and brought up in similar environments, for example. Another possibility is to use participants as their own controls. That is, use the same participants in both (or more) conditions of the study if practicable. If counterbalancing is applied, this will often effectively match on a wide range of variables. The terms correlated subjects design and related subjects design are virtually synonymous with matched subjects design.

Matching can be positively recommended in circumstances in which there are positive time, economic or other practical advantages to be gained by using relatively small samples. While it can be used to control for the influence of unwanted variables on the data, it can present its own problems. In particular, participants may have to be rejected simply because they do not match others or because the researcher has already found enough of that matched set of individuals. Furthermore, matching has lost some of its usefulness with the advent of increasingly powerful methods of statistical control aided by modern computers. These allow many variables to be controlled for, whereas most forms of matching (except twins and own-controls) can cope with only a small number of matching variables before they become unwieldy and unmanageable.

matrix: a rectangle of numbers or symbols that represent numbers. It consists of one or more rows and one or more columns. It is usual to refer to the rows first. So, a 2 × 3 matrix consists of two rows and three columns. The numbers or symbols in a matrix are called elements. The position of an element is represented by a symbol and two numbers in subscripts. The first number refers to the row and the second to the column in which the element is. So, a₂,₃ refers to the element in the second row and third column. The position of an element may be referred to more generally by its symbol and the two subscripts, i and j, where i refers to the row and j to the column.

matrix algebra: rules for transforming matrices, such as adding or subtracting them. It is used in calculating more complicated statistical procedures such as factor analysis, multiple regression and structural equation modelling. For example, the following two matrices, A and B, may be added together to form a third matrix, C:

    A + B = C

    [2 4]   [1 3]   [3 7]
    [3 5] + [5 2] = [8 7]
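
The same addition in code (a sketch, assuming Python with NumPy), where corresponding elements are simply summed:

```python
import numpy as np

A = np.array([[2, 4],
              [3, 5]])
B = np.array([[1, 3],
              [5, 2]])

C = A + B   # element-by-element addition
print(C)    # [[3 7]
            #  [8 7]]
```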

Mauchly’s test of sphericity: like manytests, the analysis of variance makes certainassumptions about the data used. Violationsof these assumptions tend to affect the valueof the test adversely. One assumption is thatthe variances of each of the cells should bemore or less equal (exactly equal is a practicalimpossibility). In repeated-measures designs,it is also necessary that the covariances of thedifferences between each condition are equal.That is, subtract condition A from conditionB, condition A from condition C etc., and cal-culate the covariance of these differencescores until all possibilities are exhausted.The covariances of all of the differencesbetween conditions should be equal. If not,then adjustments need to be made to the

MAUCHLY’S TEST OF SPHERICITY 97



analysis, such as finding a test which does not rely on these assumptions (e.g. a multivariate analysis of variance is not based on these assumptions, and there are non-parametric versions of the related ANOVA). Alternatively, there are adjustments which apply directly to the ANOVA calculation, as indicated below.

Put another way, Mauchly's test of sphericity determines whether the variance–covariance matrix in a repeated-measures analysis of variance is spherical or circular, which means that it is a scalar multiple of an identity matrix. An identity matrix has diagonal elements of 1 and off-diagonal elements of 0. If the assumption of sphericity or circularity is not met, the F ratio is more likely to be statistically significant than it should be. There are several adjustments which correct for this bias by changing the degrees of freedom for the F ratio – that is, making the ANOVA less significant. Examples of such adjustments include the Greenhouse–Geisser correction and the Huynh–Feldt correction.

maximum likelihood estimation, method or principle: one criterion for estimating a parameter of a population of values, such as the mean value. It is the value that is the most likely given the data from a sample and certain assumptions about the distribution of those values. This method is widely used in structural equation modelling. To give a very simple example of this principle, suppose that we wanted to find out what the probability or likelihood was of a coin turning up heads. We had two hypotheses. The first was that the coin was unbiased and so would turn up heads 0.50 of the time. The second was that the coin was biased towards heads and would turn up heads 0.60 of the time. Suppose that we tossed a coin three times and it landed heads, tails and heads in that order. As the tosses are independent, the joint probability of these three events is the product of their individual probabilities. So the joint probability of these three events under the first hypothesis of the coin being unbiased is 0.125 (0.5 × 0.5 × 0.5 = 0.125). The joint probability of these three events under the second hypothesis of the coin being biased towards heads is 0.144 (0.6 × 0.4 × 0.6 = 0.144). As the probability of the outcome for the biased coin is greater than that for the unbiased coin, the maximum likelihood estimate is 0.60. If we had no hypotheses about the probability of the coin turning up heads, then the maximum likelihood estimate would be the observed proportion of heads, which is 0.67 (2/3 = 0.67).

Hays (1994)
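
A minimal sketch (assuming Python with NumPy) of the same idea: evaluate the likelihood of the observed sequence heads, tails, heads over a grid of candidate probabilities and pick the value that makes the data most likely:

```python
import numpy as np

outcomes = np.array([1, 0, 1])            # heads = 1, tails = 0
candidates = np.linspace(0.01, 0.99, 99)  # candidate values for P(heads)

# Likelihood of the observed sequence for each candidate probability
likelihoods = np.array([np.prod(np.where(outcomes == 1, p, 1 - p)) for p in candidates])

best = candidates[np.argmax(likelihoods)]
print(best)  # about 0.67, the observed proportion of heads
```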

McNemar test for the significance of changes: a simple means of analysing simple related nominal designs in which participants are classified into one of two categories on two successive occasions. Typically, it is used to assess whether there has been change over time. As such, one measure can be described as the 'before' measure (which is divided into two categories) and the second measure is the 'after' measure. Consider the readership of two newspapers, The London Globe and The London Tribune. A sample of readers of these newspapers is collected. Imagine that there is then a promotional campaign to sell The London Tribune. The question is whether the campaign is effective. One research strategy would be to study the sample again after the campaign. One would expect that some readers of The London Globe prior to the campaign will change to reading The London Tribune after the campaign. Some readers of The London Tribune before the campaign will change to reading The London Globe after the campaign. Of course, there will be Tribune readers who continue to be Tribune readers after the campaign. Similarly, there will be Globe readers prior to the campaign who continue to be Globe readers after the campaign. If the campaign works, we would expect more readers to shift to the Tribune after the campaign than shift to the Globe after the campaign (see Table M.1).

In the test, those who stick with the same newspaper are discarded. The focus of interest is on those who switch newspapers – that is, the 6 who were Globe readers but read the Tribune after the campaign, and the 22 who read the Tribune before the campaign but


switched to the Globe after the campaign. If the campaign had no effect, we would expect the switchers to be equally represented for the Tribune and the Globe. The more unequal the numbers switching to the Tribune and the Globe, the more likely it is that the campaign is having an effect. Since we expect half to switch in each direction if the null hypothesis is true, we know the expected frequencies to enter into a one-sample chi-square test. The data for the analysis together with the expected frequencies are given in Table M.2.

These observed and expected frequencies may then be entered into the one-sample chi-square formula:

    chi-square = Σ [(observed − expected)² / expected]
               = (22 − 14)²/14 + (6 − 14)²/14
               = 64/14 + 64/14
               = 4.571 + 4.571 = 9.14

With 1 degree of freedom (always, for this test) this is statistically significant at the 0.01 level. Hence, the changers in newspaper readership are predominantly from the Tribune to the Globe. This suggests that the campaign is counterproductive, as the Tribune is losing readers.
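
A minimal sketch (assuming Python with SciPy) of the calculation for the two kinds of changers in Table M.2:

```python
from scipy import stats

tribune_to_globe = 22
globe_to_tribune = 6
expected = (tribune_to_globe + globe_to_tribune) / 2   # 14 expected in each direction

chi_square = ((tribune_to_globe - expected) ** 2 / expected
              + (globe_to_tribune - expected) ** 2 / expected)
p_value = stats.chi2.sf(chi_square, df=1)
print(chi_square, p_value)   # about 9.14, p < 0.01
```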

mean (M): see average; mean (M), arithmetic

mean (M), arithmetic: the sum of the scores divided by the number of scores. So the mean of 2 and 4 is their sum of 6 divided by the 2 scores, which gives a mean of 3.00. The mean is the central point in a set of scores in that the sum of absolute deviations (i.e. ignoring the negative signs) of scores above the mean (4 − 3.00 = 1.00) is equal to the sum of absolute deviations of scores below the mean (2 − 3.00 = −1.00). The mean is sometimes called the arithmetic mean to distinguish it from other forms of mean such as the harmonic mean or geometric mean. Because it is the most common form of mean it is usually simply called the mean. It is sometimes abbreviated as M, with or without a bar over it.

mean deviation: the mean of the absolute deviations of the scores from their mean. For example, for the scores 2 and 4, whose mean is 3.00, the absolute deviations are 1.00 (4 − 3.00 = 1.00) and 1.00 (2 − 3.00 = −1.00, taken as 1.00), giving a sum of 2.00. Dividing this sum by the number of deviations, which is 2, gives a mean deviation of 1.00 (2.00/2 = 1.00). The mean deviation is not very widely used in statistics simply because the related concept of standard deviation is far more useful in practice.


Table M.1 Pre-test–post-test design

                                After campaign
                                Tribune    Globe

Before campaign   Tribune          19         22
                  Globe             6         19

Table M.2 Table M.1 recast in terms of changing

                                  Frequency of    Expected frequency (half of all who
                                  changers        change newspapers, in either direction)

From the Tribune to the Globe          22                         14
From the Globe to the Tribune           6                         14



mean score: the arithmetic mean of a set of scores. See also: coefficient of variation

mean square (MS): a measure of the estimated population variance in analysis of variance. It is used to form the F ratio, which determines whether two variance estimates differ significantly. The mean square is the sum of squares (or squared deviations) divided by the degrees of freedom. See also: between-groups variance

measurement: the act of classification of observations is central to research. There are basically two approaches to measurement. One is categorizing observations, such as when plants are categorized as being of a particular species rather than another species. Similar classification processes occur, for example, when people are classified according to their gender or when people with abnormal behaviour patterns are classified as schizophrenics, manic-depressives, and so forth. Typically this sort of measurement is described as qualitative (identifying the defining characteristics or qualities of something) or nominal (naming the defining characteristics of something) or categorical or category (putting into categories). The second approach is closer to the common-sense meaning of the word measurement. That is, putting a numerical value on an observation. Giving a mass in kilograms or a distance in metres are good examples of such numerical measurements. Many other disciplines measure numerically in this way. This is known as quantitative measurement, which means that individual observations are given a quantitative or numerical value as part of the measurement process. That is, the numerical value indicates the amount of something that an observation possesses.

Not every number assigned to something indicates the quantity of a characteristic it possesses. For example, take the number 763214. If this were the distance in kilometres between city A and city B then it clearly represents an amount of distance. However, if it were a telephone number the exact significance of the numerical value is obscure. For example, the letter sequence GFCBAD could be the equivalent of the number and have no implications of quantity at all.

Quantitative measurements are often subdivided into three types:

1 Ordinal or rank measurement: this is where a series of observations are placed in order of magnitude. In other words, the rank orders 1st, 2nd, 3rd, 4th, 5th, 6th, etc., are applied. This indicates the relative position in terms of magnitude.

2 Interval measurement: this assumes that the numbers applied indicate the size of the difference between observations. So if one observation is scored 5 and another is scored 7, there is a fixed interval of 2 units between them. Something that is 7 centimetres long is 2 centimetres longer than something which is 5 centimetres long, for example.

3 Ratio measurement: this is similar in many ways to interval measurement. The big difference is that in ratio measurement it should be meaningful to say things like X is twice as tall as Y, or A is a quarter of the size of B. That is, ratios are meaningful.

This is a common classification but it raises enormous difficulties for most disciplines beyond the natural sciences such as physics and chemistry. This is because it is not easy to identify just how measurements such as IQ or social educational status relate to the numbers given. IQ is measured on a numerical scale with 100 being the midpoint. The question is whether someone with an IQ of 120 is twice as intelligent as a person with an IQ of 60. There is no clear answer to this, and no amount of consideration has ever definitively established that it is.

There is an alternative argument – that is to say, that the numbers are numbers and can be dealt with accordingly, irrespective of how precisely those numbers relate to the thing being measured. That is, deal with the numbers just as one would any other numbers – add them, divide them, and so forth.


Both of these extremes have their adherents. Why does it matter? The main reason is that different statistical tests assume different things about the data. Some assume that the data can only be ranked, some assume that the scores can be added, and so forth. Without assessing the characteristics of the data, it is not possible to select an appropriate test.

median: the score in the 'middle' of a set of scores when these are ranked from the smallest to the largest. Half of the scores are above it, half below – more or less. Thus, for the following (odd number of) scores, to calculate the median we simply rearrange the scores in order of size and select the score in the centre of the distribution:

19, 17, 23, 29, 12

Rearranged these read

12, 17, 19, 23, 29

Since 19 is the score in the middle (an equal distance from the two ends), 19 is the median.

When there is an even number of scores, it becomes necessary to estimate the median as no score is precisely in the middle. Take the following six scores:

19, 17, 23, 29, 12, 14

Rearranged in order these become

12, 14, 17, 19, 23, 29

Quite clearly the median is somewhere between 17 and 19. The simplest estimate of the median is the average of these two scores, which is 18.

There are other ways of calculating the median in these circumstances, but they appear to be obsolete. Look at the following ordered sequence of scores:

17, 17, 17, 19, 23, 29

The above method gives the median as 18 again. However, it may seem that the median should be closer to 17 than to 19 in this case. See also: average
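
A minimal sketch (assuming Python's standard library) of the two cases above:

```python
import statistics

print(statistics.median([19, 17, 23, 29, 12]))      # 19: the middle score of five
print(statistics.median([19, 17, 23, 29, 12, 14]))  # 18.0: the average of the two middle scores
```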

median test,Mood’s: a non-parametric testused to determine whether the number ofscores which fall either side of their commonmedian differs for two or more unrelatedsamples. The chi-square test is used to deter-mine whether the number of scores differssignificantly from that expected by chance.

mediating variable: see intervening variable

mesokurtic: see kurtosis

meta-analysis: involves the consolidation of data and findings from a variety of research studies on a topic. Primarily it involves the combination of the findings of these various studies in order to estimate the general or overall effect of one variable on another. The secondary function, and probably the more illuminating, is to seek patterns in the findings of the various studies in order to understand the characteristics of studies which tend to show strong relationships between the variables in question and those which tend to show weak or no relationships. Meta-analysis does not always work, which may itself reveal something about the state of research in a particular area. Meta-analysis is a systematic review of studies using relatively simple statistical techniques. It can be contrasted with the more traditional review of the literature in which researchers subjectively attempted to synthesize the research using criteria which were, and are, difficult to define or identify. Part of the gains of using meta-analysis are to do with the systematic definition of a research area and the meticulous preparation of the database of studies.

It is important to be able to define fairly precisely the domain of interest of the



meta-analysis. The effectiveness of a particular type of treatment programme, or the effects of the amount of delay between witnessing an event and being interviewed by the police on the quality of testimony, might be seen as typical examples. An important early stage is defining the domain of interest for the review precisely. While this is clearly a matter for the judgement of the researcher, the criteria selected clarify important aspects of the research area. Generally speaking, the researcher will explore a range of sources of studies in order to generate as full a list of relevant research as possible. Unpublished research is not omitted on the basis of being of poor quality; instead it is sought out because its findings may be out of line with accepted wisdom in a field. For example, non-significant findings may not achieve publication.

The notion of effect size is central to meta-analysis. Effect size is literally what it says – the size of the effect (influence) of the independent variable on the dependent variable. This is expressed in terms which refer to the amount of variance which the two variables share compared with the total variation. It is not expressed in terms of the magnitude of the difference between scores, as this will vary widely according to the measures used and the circumstances of the measurement. Pearson's correlation coefficient is one common measure of effect size since the squared correlation is the amount of variation that the two variables have in common. Another common measure is Cohen's d, though this has no obvious advantages over the correlation coefficient and is much less familiar. There is a simple relationship between the two and one can readily be converted to the other (Howitt and Cramer, 2004).

Generally speaking, research reports do not use effect size measures, so it is necessary to calculate (estimate) them from the data of a study (which are often unavailable) or from the reported statistics. There are a number of simple formulae which help do this. For example, if a study reports t tests a conversion is possible since there is a simple relationship between the t test and the correlation coefficient between the two variables. The independent variable has two conditions – the experimental group and the control group – which can be coded 1 and 2 respectively. Once this is done, the point–biserial correlation coefficient may be applied simply by calculating Pearson's correlation coefficient between these 1s and 2s and the scores on the dependent variable. For other statistics there are other procedures which may be applied to the same effect. Recently researchers have shown a greater tendency to provide effect size statistics in their reports. Even if they do not, it is surprisingly easy to provide usable estimates of effect size from very minimal statistical analyses.

Once the effect sizes have been calculated in an appropriate form, these various estimates of effect size can be combined to indicate an overall effect size. This is slightly more complex than a simple average of the effect sizes but conceptually is best regarded as such. The size of each study is normally taken into account. One formula for the average effect size is simply to convert each of the effect sizes into the Fisher z transformation of Pearson's correlation. The average of these values is easily calculated. This average Fisher z transformation value is then converted back to a correlation coefficient. The table of this transformation is readily available in a number of texts. See also: correlations, averaging; effect size

Howitt and Cramer (2004); Hunter and Schmidt (2004)
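
A minimal sketch (assuming Python with NumPy) of the averaging step described above, using a simple unweighted mean of the Fisher z values (in practice the z values are often weighted by sample size); the correlations are invented for illustration:

```python
import numpy as np

effect_sizes = np.array([0.30, 0.45, 0.10, 0.25])  # correlations from four studies

z_values = np.arctanh(effect_sizes)  # Fisher z transformation of each correlation
mean_z = z_values.mean()
overall_r = np.tanh(mean_z)          # convert the average z back to a correlation
print(overall_r)
```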

Minitab: this software was originally written in 1972 at Penn State University to teach students introductory statistics. It is one of several widely used statistical packages for manipulating and analysing data. Information about Minitab can be found at the following website: http://www.minitab.com/

missing values or data: are the result of participants in the research failing to provide the researcher with data on each variable in the analysis. If it is reasonable to assume that these are merely random omissions, the analysis is generally straightforward. Unfortunately, the case for this assumption is not strong. Missing values may occur for any number of reasons: the participant has inadvertently failed to answer a particular question, the interviewer has not noted down the participant's answer, part of the data is unreadable, and so forth. These leave the possibility that the data are not available for non-random reasons – for example, the question is unclear so the participant does not answer, or the interviewer finds a particular question boring and has a tendency to overlook it. Quite clearly the researcher should anticipate these difficulties to some extent and allow the reasons for the lack of an answer to a question to be recorded. For example, the interviewer records that the participant could not answer, would not answer, and so forth. Generally speaking, the fewer the missing values the fewer the difficulties that arise as a consequence. It is difficult to give figures for acceptable numbers of missing values and much depends on the circumstances of the research. For example, in a survey of voting intentions missing information about the participants' political affiliation may significantly influence predictions.

Computer programs for the analysis of statistical data often include a missing values option. A missing value is a particular value of a variable which indicates missing information. For example, for a variable like age a missing value code of 999 could be specified to indicate that the participant's age is not known. The computer may then be instructed to ignore that participant totally – essentially to delete them from the analysis of the data. This is often called listwise deletion of missing values since the individual is deleted from the list of participants for all intents and purposes. The alternative is to select pairwise deletion of missing values. This amounts to an instruction to the computer to omit that participant only from those analyses of the data involving the variable for which they supplied no information. Thus, the participant would be omitted from the calculation of the mean age of the sample (because they are coded 999 on that variable) but included in the analysis of the variable gender (because information about their gender is available). Some statistical packages (SPSS) allow the user to stipulate the value that should be inserted for a missing score. One approach is to use the average of scores on that variable.

Another approach to calculating actual values for missing scores is to examine the correlates of the missing data and use these correlates to estimate what the score would have been had it not been missing. See also: listwise deletion; sample size
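The options described above can be sketched with the pandas library as follows; the 999 code, the variable names and the data are assumptions made purely for illustration.

import numpy as np
import pandas as pd

df = pd.DataFrame({'age':    [23, 999, 31, 27],        # 999 is the missing value code
                   'gender': ['f', 'm', 'f', 'm']})

df['age'] = df['age'].replace(999, np.nan)   # mark the missing value code as missing

listwise = df.dropna()                                                # listwise deletion: drop the whole case
mean_substituted = df.assign(age=df['age'].fillna(df['age'].mean()))  # substitute the mean of the available scores

print(listwise)
print(mean_substituted)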

mixed analysis of variance or ANOVA: an analysis of variance which contains at least one between-subjects factor and one within-subjects or repeated-measures factor.

mixed design: a research design which contains both within-subjects and between-subjects elements. For example, Table M.3 summarizes research in which a group of men and a group of women are studied before and after taking a course on counselling. The dependent variable is a measure of empathy.

Table M.3 An example of a mixed design

           Before course    After course
Men        Mean = 21.2      Mean = 28.3
Women      Mean = 24.3      Mean = 25.2

The within-subjects aspect of this design is that the same group of individuals have been measured twice on empathy – once before the course and once after. The between-subjects aspect of the design is the comparison between men and women – with different groups of participants. See also: analysis of variance

mode: one of the common measures of the typical value in the data for a variable. The mode is simply the value of the most frequently occurring score or category in one's data. It is not a frequency but a value. So if in the data there are 12 electricians, 30 nurses and 4 clerks the modal occupation is nurses. If there were seven 5s, nine 6s, fourteen 7s, three 8s and two 9s, the mode of the scores is 7. The mode (unlike the mean and median) can be applied to category (categorical or nominal) data as well as score data. See also: average; bimodal; frequency; multimodal
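A minimal Python sketch using the frequencies in the entry (collections.Counter works for categories as well as scores):

from collections import Counter

scores = [5]*7 + [6]*9 + [7]*14 + [8]*3 + [9]*2
occupations = ['electrician']*12 + ['nurse']*30 + ['clerk']*4

print(Counter(scores).most_common(1))       # [(7, 14)] - the modal score is 7
print(Counter(occupations).most_common(1))  # [('nurse', 30)] - the modal category is nurses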

model: a simplified representation of something. In statistics, a model is a way of accounting for the data in question. Typically, then, a statistical model is a mathematical description of how several variables explain the data in question. A good model fits (explains) the data as closely as possible without becoming so complicated that it begins to obscure understanding.

moderating or moderator effect or variable, moderated relationship: where the relationship between two variables differs for the values of one or more other variables. It represents an interaction between the variables. For example, the relationship between gender and depression may differ according to marital status. Married women may be more depressed than married men while never married women may be less depressed than never married men. In this case, the relationship between gender and depression is moderated by marital status. Marital status is the moderating variable. See also: multiple regression

moment: the mean or expected value of the power of the deviation of each value in a distribution from some given value, which is usually the mean. For example, the first moment about the arithmetic mean is 0:

m1 = [sum of (each value − mean value)¹] / number of values = 0

If we raise the deviation of each value to the power of 1, we have the mean deviation, which is 0.

The second moment about the arithmetic mean is the variance:

m2 = [sum of (each value − mean value)²] / number of values = variance

The third moment about the arithmetic mean is skewness:

m3 = [sum of (each value − mean value)³] / number of values = skewness

The fourth moment about the arithmetic mean is kurtosis:

m4 = [sum of (each value − mean value)⁴] / number of values = kurtosis
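A minimal Python sketch of the four moments about the mean for a small, invented set of scores; note that the skewness and kurtosis indices usually reported divide the third and fourth moments by powers of the standard deviation.

import numpy as np

scores = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])   # illustrative data
deviations = scores - scores.mean()

m1 = np.mean(deviations ** 1)   # first moment: always 0
m2 = np.mean(deviations ** 2)   # second moment: the (population) variance
m3 = np.mean(deviations ** 3)   # third moment: the basis of skewness
m4 = np.mean(deviations ** 4)   # fourth moment: the basis of kurtosis

print(m1, m2, m3, m4)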

Monte Carlo methods: means of calculating, among other things, the probability of outcomes based on a random process. So any statistical test which is based on calculating the probability of a variety of outcomes consequent on randomly allocating a set of scores is a Monte Carlo method. See bootstrapping

multicollinearity: see collinearity

multimodal: a distribution of scores having more than two modes. See bimodal

multiple coefficient of determination or R2: the square of the multiple correlation. It is commonly used in multiple regression to represent the proportion of variance in a criterion or dependent variable that is shared with or explained by two or more predictor or independent variables.

multiple comparison tests or procedures: are used to determine which of three or more means, or which combinations of them, differ significantly from each other. If there are good grounds for predicting which means differ from each other, an a priori or planned test is used. Post hoc, a posteriori or unplanned tests are used to find out which other means differ significantly from each other. A number of a priori and post hoc tests have been developed. Particularly with respect to the post hoc tests, there is controversy surrounding which test should be used. See a priori comparison or tests; Bonferroni test; Bryant–Paulson simultaneous test procedure; Duncan's new multiple range test; Dunn's test; Dunn–Sidak multiple comparison test; Dunnett's C test; Dunnett's T3 test; Dunnett's test; familywise error rate; Fisher's LSD (Least Significant Difference) or protected test; Gabriel's simultaneous test procedure; Games–Howell multiple comparison procedure; Hochberg GT2 test; Newman–Keuls method, procedure or test; post hoc, a posteriori or unplanned tests; Ryan or Ryan–Einot–Gabriel–Welsch F (REGWF) multiple comparison test; Ryan or Ryan–Einot–Gabriel–Welsch Q (REGWQ) multiple comparison test; Scheffé test; studentized range statistic q; Tamhane's T2 multiple comparison test; trend analysis in analysis of variance; Tukey a or HSD (Honestly Significant Difference) test; Tukey b or WSD (Wholly Significant Difference) test; Tukey–Kramer test; Waller–Duncan t test

multiple correlation or R: a term used in multiple regression. In multiple regression a criterion (or dependent) variable is predicted using a multiplicity of predictor variables. For example, in order to predict IQ, the predictor variables might be social class, educational achievement and gender. For every individual, using these predictors, it is possible to make a prediction of the most likely value of their IQ. Quite simply, the multiple correlation is the correlation between the actual IQ scores of the individuals in the sample and the scores predicted for them by applying the multiple regression equation with these three predictors.

As with all correlations, the larger the numerical value, the greater the correlation. See also: canonical correlation; F ratio; multiple regression

multiple regression: a method designed to analyse the linear relationship between a quantitative criterion or dependent variable and two or more (i.e. multiple) predictors or independent variables. The predictors may be qualitative or quantitative. If the predictors are qualitative, they are turned into a set of dichotomous variables known as dummy variables, the number of which is always one less than the number of categories making up the qualitative variable.

There are two main general uses of this test. One use is to determine the strength and the direction of the linear association between the criterion and a predictor controlling for the association of the predictors with each other and the criterion. The strength of the association is expressed in terms of the size of either the standardized or the unstandardized partial regression coefficient, which is symbolized by the small Greek letter β or its capital equivalent B (both called beta), although which letter is used to signify which coefficient is not consistent. The direction of the association is indicated by the sign of the regression coefficient as it is with correlation coefficients. No sign means that the association is positive, with higher scores on the predictor being associated with higher scores on the criterion. A negative sign shows that the association is negative, with higher scores on the predictor being associated with lower scores on the criterion. The term partial is often omitted when referring to partial regression coefficients. The standardized regression coefficient is standardized so that it can vary from −1.00 to 1.00 whereas the unstandardized coefficient can be greater than ±1.00. One advantage of the standardized coefficient is that it enables the size of the association between the criterion and the predictors to be compared on the same scale as this scale has been standardized to ±1.00. Unstandardized coefficients enable the value of the criterion to be predicted if we know the values of the predictors.


Another use of multiple regression is to determine how much of the variance in the criterion is accounted for by particular predictors. This is usually expressed in terms of the change in the squared multiple correlation (R2), where it corresponds to the proportion of variance explained. This proportion can be re-expressed as a percentage by multiplying the proportion by 100.

Multiple regression also determines whether the size of the regression coefficients and of the variance explained is greater than that expected by chance, in other words whether it is statistically significant. Statistical significance depends on the size of the sample. The bigger the sample, the more likely that small values will be statistically significant.

Multiple regression can be used to conduct simple path analysis and to determine whether there is an interaction between two or more predictors and the criterion and whether a predictor acts as a moderating variable on the relationship between a predictor and the criterion. It is employed to carry out analysis of variance where the number of cases in each cell is not equal or proportionate.

There are three main ways in which predictors can be entered into a multiple regression. One way is to enter all the predictors of interest at the same time in a single step. This is sometimes referred to as standard multiple regression. The size of the standardized partial regression coefficients indicates the size of the unique association between the criterion and a predictor controlling for the association of all the other predictors that are related to the criterion. This method is useful for ascertaining which variables are most strongly related to the criterion taking into account their association with the other predictors.
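A minimal sketch of this single-step (standard) method using the statsmodels library; the variable names and the simulated data are assumptions made purely for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
social_class = rng.normal(size=n)
achievement = rng.normal(size=n)
iq = 100 + 5*achievement + 2*social_class + rng.normal(scale=10, size=n)   # criterion

X = sm.add_constant(np.column_stack([social_class, achievement]))   # all predictors entered in one step
model = sm.OLS(iq, X).fit()

print(model.params)      # unstandardized (partial) regression coefficients
print(model.rsquared)    # multiple coefficient of determination, R squared
print(model.pvalues)     # statistical significance of each coefficient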

A second method is to use some statistical criteria for determining the entry of predictors one at a time. One widely used method is often referred to as stepwise multiple regression, in which the predictor with the most significant F ratio is considered for entry. This is the predictor which has the highest correlation with the criterion. If the probability of this F ratio is higher than a particular criterion, which is usually the 0.05 level, the analysis stops. If the probability meets this criterion, the predictor is entered into the first step of the multiple regression.

The next predictor that is considered for entry into the multiple regression is the predictor whose F ratio has the next smallest probability value. If this probability value meets the criterion, the predictor is entered into the second step of the multiple regression. This is the predictor that has the highest squared semi-partial or part correlation with the criterion taking into account the first predictor. The F ratio for the first predictor in the second step of the multiple regression is then examined to see if it meets the criterion for removal. If the probability level is used as the criterion for removal, it may be set at a more lenient level (say, the 0.10 level) than the criterion for entry. If this predictor does not meet this criterion, it is removed from the regression equation. If it meets the criterion it is retained. The procedure stops when no more predictors are entered into or removed from the regression equation. In many analyses, predictors are generally entered in terms of the proportion of the variance they account for in the criterion, starting off with those that explain the greatest proportion of the variance and ending with those that explain the least. Where a predictor acts as a suppressor variable on the relationship between another predictor and the criterion, this may not happen.

When interpreting the results of a multiple regression which uses statistical criteria for entering predictors, it is important to remember that if two predictors are related to one another and to the criterion, only one of the predictors may be entered into the multiple regression, although both may be related to a very similar degree to the criterion. Consequently, it is important to check the extent to which this may be occurring when looking at the results of such an analysis.

A third method of multiple regression is called hierarchical or sequential multiple regression, in which the order and the number of predictors entered into each step of the multiple regression are determined by the researcher. For example, we may wish to group or to block predictors together in terms of similar characteristics such as socio-demographic variables (e.g. age, gender, socio-economic status), personality variables (e.g. extroversion, neuroticism) and attitudinal variables and to enter these blocks in a particular order such as the order presented here. In this case, the proportion of variance explained by the second block of personality variables is that which has not already been explained by the first block of socio-demographic variables. In effect, when we are using this method we are controlling or partialling out the blocks of variables that have been previously entered.

Hierarchical multiple regression is used to determine whether the relationship between a predictor and a criterion is moderated by another predictor called a moderator or moderating variable. The predictor and the moderator are entered in the first step and the interaction, which is the product of the predictor and the moderator, is entered into the second step. If the interaction accounts for a significant proportion of the variance in the criterion, the relationship between the predictor and the criterion is moderated by the other variable. The nature of this moderated relationship can be examined by using the moderator variable to divide the sample into two groups and examining the relationship between the predictor and the criterion in these two groups.
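A minimal sketch of this moderator test using statsmodels; the data are simulated and all names are assumptions. The interaction term is added in a second step and the change in R2 is tested with an F test.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
predictor = rng.normal(size=n)
moderator = rng.normal(size=n)
criterion = predictor + 0.5*moderator + 0.4*predictor*moderator + rng.normal(size=n)

step1 = sm.add_constant(np.column_stack([predictor, moderator]))
step2 = sm.add_constant(np.column_stack([predictor, moderator, predictor*moderator]))

model1 = sm.OLS(criterion, step1).fit()
model2 = sm.OLS(criterion, step2).fit()

print(model2.rsquared - model1.rsquared)   # R squared change due to the interaction
print(model2.compare_f_test(model1))       # F test of that change: (F, p value, df difference)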

The particular step of a statistical or hierarchical method of multiple regression is, in effect, a standard multiple regression of the predictors in that step. In other words, the partial regression coefficients represent the association between each predictor and the criterion controlling for their association with all the other predictors on that step. The regression coefficient of a predictor is likely to vary when other predictors are entered or removed.

Multiple regression, like simple regression and Pearson's product moment correlation, is based on the assumption that the values of the variables are normally distributed and that the relationship between a predictor and the criterion is homoscedastic, in that the variance of one variable is similar for all values of the other variable. Homoscedasticity is the opposite of heteroscedasticity, where the variance of one variable is not the same for all values of the other variable. See also: dummy coding; hierarchical or sequential entry; logistic regression; multiple coefficient of determination; multiple regression: predictor variable; standardized partial regression coefficient; stepwise entry; unstandardized partial regression coefficient

Cramer (2003); Pedhazur (1982)

multiplication rule: a basic principle of probability theory. It really concerns what happens in a sequence of random events such as throwing dice. What is the probability of throwing a 6 on two successive throws? Since there are six possible outcomes (1, 2, 3, 4, 5 and 6 spots) for the first throw, the probability of obtaining a 6 is 1/6 = 0.167. Once we have thrown a 6, the probability of the next throw being a 6 is governed by the same probability. The multiplication rule is basically that the likelihood of getting two 6s in a row is 0.167 × 0.167 = 0.028. Put another way, there is one chance in six of getting 6 spots on the first toss and one chance in six of getting another 6 spots on the second toss. That is, 1 divided by (6 × 6), or 1 divided by 36 = 0.028.
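The arithmetic can be checked with a short simulation in Python (which is itself a simple example of a Monte Carlo method); the number of trials is arbitrary.

import random

trials = 100_000
double_sixes = sum(random.randint(1, 6) == 6 and random.randint(1, 6) == 6
                   for _ in range(trials))

print(1/6 * 1/6)               # 0.0277... from the multiplication rule
print(double_sixes / trials)   # close to 0.028 in a typical run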

multiply: increasing a number by a number of times. So 3 × 2 is 3 increased two times – that is, 3 + 3 = 6. See also: negative values

multistage sampling: a method of sampling which proceeds in two or more stages. This term is often used with cluster sampling in which an attempt is made to reduce the geographical area covered. For example, if we were to sample people in a city, the first stage might be to select the electoral wards in that city. The next step might be to select streets in those wards. The third stage might be to select households in those streets. The fourth and final stage might be to select people in those households. This would be a four-stage cluster sample.
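The first two stages of such a design might be sketched as follows in Python; the ward and street names are invented purely for illustration.

import random

random.seed(0)
wards = {'North': ['High St', 'Mill Rd', 'Park Ave'],
         'South': ['Bridge St', 'Station Rd'],
         'East':  ['Church Ln', 'Market Sq', 'Elm Grove']}

sampled_wards = random.sample(list(wards), 2)                              # stage 1: select wards
sampled_streets = {w: random.sample(wards[w], 1) for w in sampled_wards}   # stage 2: select streets

print(sampled_streets)   # later stages would sample households within streets, then people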


multivariate analysis: an analysis involving three or more variables at the same time.

multivariate analysis of covariance or MANCOVA: an analysis of covariance (ANCOVA) which is carried out on two or more dependent variables at the same time and where the dependent variables are related to each other. One advantage of this method is that the dependent variables analysed together may be significant whereas the dependent variables analysed separately may not be significant. When the combined or multivariate effect of the dependent variables is significant, it is useful to know which of the single or univariate effects of the dependent variables are significant.

multivariate analysis of variance or MANOVA: an analysis of variance (ANOVA) which is conducted on two or more dependent variables simultaneously and where the dependent variables are interrelated. One advantage of this procedure is that the dependent variables analysed together may be significant whereas the dependent variables analysed on their own may not be significant. When the combined or multivariate effect of the dependent variables is significant, it is usual to determine which of the single or univariate effects are also significant. See also: analysis of variance; Hotelling's trace criterion; multivariate normality; Roy's gcr; Wilks's lambda

multivariate normality: the assumption that each variable and all linear combinations of the variables in an analysis are normally distributed. As there are potentially a very large number of linear combinations, this assumption is not easy to test. When the data are grouped, as in multivariate analysis of variance, the sampling distribution of the means of the dependent variables in each of the cells has to be normally distributed as well as the variables' linear combinations. With relatively large sample sizes, the central limit theorem states that the sampling distribution of means will be normally distributed. If there is multivariate normality, the sampling distribution of means will be normally distributed. When the data are ungrouped, as in discriminant function analysis and structural equation modelling, if there is multivariate normality each variable will be normally distributed and the relationship of pairs of variables will be linear and homoscedastic.

Tabachnick and Fidell (2001)

N

natural or Napierian logarithm: of a particular number (such as 2) this is the exponential or power of the base number 2.718 which gives rise to that particular number. For example, the natural logarithm of 2 is 0.69 as the base number of 2.718 raised to the power of 0.69 is 2.00. As the base number is a constant, it is sometimes symbolized as e (after Euler's number). The natural logarithm may be abbreviated as log e or ln. Natural logarithms are used in statistical tests such as log–linear analysis. They may be calculated from tables, scientific electronic calculators or statistical software such as SPSS. Few researchers ever have to use them personally. See also: correlations, averaging; Fisher's z transformation; logarithm; logit; odds

negative: a value less than zero. See also: absolute deviation

negative correlation: an inverted relationship between two variables (Figure N.1). Thus, as the scores on variable A increase the scores on variable B decrease. This means that people with high scores on variable A will tend to get the lower scores on variable B. Those with lower scores on variable A tend to get higher scores on variable B. Sometimes confusion can arise if the researcher is not clear what a high score on each variable indicates. This is especially so when variables have not been clearly named or where they consist of a list of questions which vary in terms of what agreement and disagreement imply.

Figure N.1 A negative correlation between two variables (a scattergram with IQ and income as the axes)

negative values: a value less than zero. Negative values are fairly common in statistical calculations simply because differences between scores or values above or below the mean are used. For this reason, students are more likely to use negative values in calculations than when collecting their data. Little data in the social science disciplines have negative values in the form in which they are collected. Negative values are like being in debt – they are amounts you owe other people. Positive values are like having money to spend. The basic mathematical operations that involve positive and negative signs and which are at all common in statistical calculations are as follows.

Adding
If one adds two positive numbers one ends up with a positive outcome:

5 + 2 = 7

If one adds two negative numbers, one ends up with a negative number (you owe money to two people, not just one):

−4 + −7 = −11

If one adds a negative number to a positive number then one ends up with either a positive or a negative number depending on circumstances. For example, if you have 5 euros in your pocket and you owe 2 euros to someone else, you would have 3 euros when you paid that person back:

5 + −2 = 3

but if you owed someone 7 euros and only paid them back 5 euros you would still owe 2 euros:

5 + −7 = −2

Subtracting
Taking away minus numbers is effectively the same as adding them. So, 5 − 3 = 2. Hence, 5 − −3 = 8, since the formula tells us to remove the −3 from the 5. The implication of this is that the 5 was obtained by giving away 3 to another person. If that person gives us that 3 back (minuses the minus 3) then we get 8. Another example:

−8 − −3 = −5

Multiplying
A negative number multiplied by a positive number gives a negative number:

−4 × 3 means −4 + −4 + −4 = −12

A negative number multiplied by a negative number yields a positive number:

−4 × −3 = 12

negatively skewed: see skewness

nested model: a model which is similar to the one it has been derived from except that one or more of the parameters of the original model have been removed or restricted to zero. It is a simplification of the original model. Because the original model has more parameters (i.e. it has more features to explain the data) it might be expected to provide a better fit to the data. The statistical fit of a nested model may be compared with that of the original model. If the statistical fit of the nested model to the data is significantly poorer than that of the original model, the original model gives a better fit to the data. That is, the extra feature of the original model is of value. If, however, there is no significant difference in the fit of the two models, the nested model provides a simpler and more parsimonious model for the data. In that sense, it would be the preferred model since the removed or restricted features of the model seem not to contribute anything to the power of the model to explain the data. Nested models are often compared in structural equation modelling. See structural equation modelling

Newman–Keuls method, procedure or test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. It may be used regardless of whether the analysis of variance is significant. It assumes equal variance and is approximate for unequal group sizes. It is a stepwise or sequential test which is similar to Duncan's new multiple range test in that the means are first ordered in size. However, it differs in the significance level used. For the Newman–Keuls test the significance level is the same no matter how many comparisons there are, while for Duncan's test the significance level becomes more lenient the more comparisons there are. Consequently, differences are less likely to be significant for this test.

The use of this test will be illustrated with the data in Table N.1, which shows the individual and mean scores for four unrelated groups. The means for the four groups are 6, 14, 12 and 2 respectively.

Table N.1 Individual scores and mean scores for four unrelated groups

        Group 1    Group 2    Group 3    Group 4
        6          10         10         0
        2          15         12         2
        10         17         14         4
Sum     18         42         36         6
Mean    6          14         12         2

A one-way analysis of variance indicates that there is an effect which is significant at less than the 0.01 level.

To conduct a Newman–Keuls test the groups are first ordered according to the size of their means, starting with the smallest and ending with the largest, as shown in the second row and column of Table N.2. Group 4 is listed first, followed by groups 1, 3 and 2.

The absolute differences between the means of the four groups are then entered into the table. So the absolute difference between groups 4 and 1 is 4 (2 − 6 = −4), between groups 4 and 3 it is 10 (2 − 12 = −10), and so on.

The value that this difference has to exceed is symbolized as W. It is based on the number of means that separate the two means being compared. This number is symbolized by r and includes the two means being compared. So the minimum number that r can be is 2. The maximum number it can be is the number of groups, which in this case is 4. As r becomes smaller, so does W. In other words, the more means that separate the two means being compared, the bigger the difference between those means has to be in order to be statistically significant.

The formula for calculating this critical difference is the value of the studentized range multiplied by the square root of the error mean square divided by the number of cases in each group:

W = studentized range × √(error mean square / number of cases per group)

The error mean square is obtained from the analysis of variance and is 9.25. The number of cases in each group is 3. So the value of the studentized range needs to be multiplied by 1.76 [√(9.25/3) = √3.083 = 1.76]. This value is the same for all the comparisons.

The value of the studentized range varies according to the value of r (and the degrees of freedom for the error mean square, but this will be the same for all the comparisons being made). This value can be obtained from a table which is available in some statistics texts such as the source below. The degrees of freedom for the error mean square for this example are 8. This is the number of groups subtracted from the number of cases (12 − 4 = 8). For these degrees of freedom and the 0.05 significance level, the value of the studentized range is 3.26 when r is 2, 4.04 when r is 3, and 4.53 when r is 4.

Consequently, W is 7.97 when r is 4 (4.53 × 1.76 = 7.97), 7.11 when r is 3 (4.04 × 1.76 = 7.11) and 5.74 when r is 2 (3.26 × 1.76 = 5.74). These values have been inserted in Table N.2.

Table N.2 Newman–Keuls test

                    Group 4    Group 1    Group 3    Group 2
Group    Mean       2          6          12         14         W       r
4        2          –          4          10*        12*        7.97    4
1        6                     –          6*         8*         7.11    3
3        12                               –          2          5.74    2
2        14                                          –

* difference significant at the 0.05 level
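The W values in Table N.2 can be reproduced with a few lines of Python; the studentized range values are the tabled ones quoted above, and the error mean square of 9.25 and group size of 3 come from the worked example.

import math

error_mean_square = 9.25
group_n = 3
studentized_range = {2: 3.26, 3: 4.04, 4: 4.53}   # tabled values for 8 df at the 0.05 level

for r, q in studentized_range.items():
    W = q * math.sqrt(error_mean_square / group_n)
    print(r, round(W, 2))
# prints 5.72, 7.09 and 7.95; the 5.74, 7.11 and 7.97 in the text come from
# rounding the multiplier 1.7559 to 1.76 before multiplying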

We now start with the third row of Table N.2 and the biggest difference between the four groups, which is 12. As 12 is bigger than the value of W when r is 4 (7.97), the difference in the means between groups 4 and 2 is statistically significant. That this difference is significant is indicated by an asterisk in Table N.2. If this difference were not statistically significant, the analysis would stop here.

As it is significant, we proceed to the next biggest difference in this row, which is 10. As r is 3 for this comparison, we need to compare this difference with the value of W when r is 3 (7.11), which is in the next row. As 10 is bigger than a W of 7.11, the difference between groups 4 and 3 is also significant. If this difference had not been significant, we would ignore all comparisons to the left of this comparison in this row and those below and move to the biggest difference in the next row.

As this difference is significant, we examine the last difference in this row, which is 4. As r is 2 for this comparison, we need to compare this difference with the value of W when r is 2 (5.74), which is in the next row. As 4 is smaller than a W of 5.74, this difference is not significant.

We now follow the same procedure with the next row. The biggest difference is 8. As r is 3 for this comparison, we compare this difference with the value of W when r is 3, which is 7.11. As 8 is larger than a W of 7.11, the difference in the means between groups 1 and 2 is statistically significant. The next biggest difference is 6, which is bigger than a W of 5.74 when r is 2.

In the last row the difference of 2 is not bigger than a W of 5.74 when r is 2. Consequently, this difference is not statistically significant.

We could arrange these means into two homogeneous subsets where the means in a subset would not differ significantly from each other but where means in one subset would differ significantly from those in the other. We could indicate these two subsets by underlining the means which did not differ as follows:

2  6     12  14

(2 and 6 form one subset; 12 and 14 form the other.) See also: Duncan's new multiple range test; Ryan F and Q tests; Scheffé test; trend analysis in analysis of variance

Kirk (1995)

nominal data: see nominal level of measurement; score

nominal level of measurement, scale or variable: a measure where numbers are used simply to name or nominate the different categories but do not represent any order or difference in quantity. For example, we may use numbers to refer to the country of birth of an individual. The numbers that we assign to a country are arbitrary and have no other meaning than labels. For example, we may assign 1 to the United States and 2 to the United Kingdom but we could assign any two numbers to label these two countries. The only statistical operation we can carry out on nominal variables is a frequency analysis where we compare the frequency of cases in different categories.

non-directional or directionless hypothesis: see hypothesis

non-parametric tests: any of a large number of inferential techniques in statistics which do not involve assessing the characteristics of the population from characteristics of the sample. These often involve ranking procedures, or may involve re-randomization and other procedures. They do not involve the use of standard error estimates characteristic of parametric statistics.

The argument for using non-parametric statistics is largely in terms of the inapplicability of parametric statistics to much data. Parametric statistics assume a normal distribution which is bell shaped. Data that do not meet this criterion may not be effectively analysed by some statistics. For example, the distribution may be markedly asymmetrical, which violates the assumptions made when developing the parametric test. The parametric test may then be inappropriate. On the other hand the non-parametric test may equally be inappropriate for such data. Furthermore, violations of the assumptions of the test may matter little except in the most extreme cases. The other traditional argument for using non-parametric tests is that they use ordinal (rankable) data and do not require interval or ratio levels of measurement. This is an area of debate as some statisticians argue that it is the properties of the numbers which are important and not some abstract scale of measurement which is deemed to underlie the numbers. Both arguments have their adherents. The general impression is that modern researchers err towards using parametric analyses as often as possible, resorting to non-parametric tests only when the distribution of scores is exceptionally skewed. This allows them to use some of the most powerful statistical techniques.

Once non-parametric tests had ease of calculation on their side. This no longer applies with the ready availability of statistical computer packages. See also: distribution-free tests; Kruskal–Wallis one-way analysis of variance; Mann–Whitney U-test; median test, Mood's; ranking tests; skewness

non-random sample: see convenience sample; quota sampling

non-reciprocal relationship: see uni-directional relationship

non-recursive relationship: see bi-directional relationship

normal curve or distribution: a bell-shaped curve or distribution, like that shown in Figure N.2. It is a theoretical distribution which shows the frequency or probability of all the possible values that a continuous variable can take. The horizontal axis of the distribution represents all possible values of the variable while the vertical axis represents the frequency or probability of those values. The distribution can be described in terms of its mean and its standard deviation or variance. The exact form of the distribution will depend on the values of the mean and the standard deviation. Its shape will become flatter, the bigger the variance is. The tails of the distribution never touch the horizontal axis, indicating that these extreme values may occur but are very unlikely to do so. Most of the scores will fall within three standard deviations of the mean. About 68.26% of the scores will fall within one standard deviation of the mean, 95.44% within two standard deviations and 99.73% within three standard deviations.

If the distribution of a variable approximates normality, we can determine what proportion or percentage of scores lies between any two values of that variable, or what the probability is of obtaining a score between those two values, by converting the original scores into standard scores and determining their z values. See also: Gaussian distribution; parametric tests

Figure N.2 The normal distribution curve
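A minimal sketch of such a calculation using the scipy library; the mean, standard deviation and cut-off values in the second part are invented for illustration.

from scipy.stats import norm

# Proportion of scores within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(norm.cdf(k) - norm.cdf(-k), 4))   # 0.6827, 0.9545, 0.9973

# Probability of a score between 90 and 120 for a variable with mean 100 and SD 15
low_z, high_z = (90 - 100) / 15, (120 - 100) / 15
print(round(norm.cdf(high_z) - norm.cdf(low_z), 3))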

null hypothesis: in statistical inference the null hypothesis is actually the hypothesis which is tested. It is often presented as being the nullification of the researcher's hypothesis but really is the claim that the independent variable has no influence on the dependent variable or that there is no correlation between the two variables. The null hypothesis assumes that the samples come from the same population or that the two variables come from a population where there is zero correlation between the two variables. Thus, the simplest ways of writing the null hypothesis are 'There is no relationship between variable A and variable B', or 'The independent variable has no influence on the dependent variable.' Unfortunately, by presenting the adequacy of the null hypothesis as being in competition with the adequacy of the alternative hypothesis, the impression is created that the disconfirmation of the null hypothesis means that the hypothesis as presented by the researcher is true. This is misleading since it merely suggests that the basic stumbling block to the hypothesis can be dismissed – that is, that chance variations are responsible for any correlations or differences that are obtained. See also: confidence interval; hypothesis testing; sign test; significant

number of cases: see count

numerator: the number at the top of a fraction such as 4/7. The numerator in this case is 4. The denominator is the bottom half of the fraction and so equals 7. See also: denominator

numerical scores: see score

O

oblique factor: see oblique rotation

oblique rotation: a form of rotating factors (see orthogonal rotation) in factor analysis. In oblique rotation, the factors are allowed to correlate with each other; hence the axes can cease to be at right angles. There are various methods of oblique rotation, of which Promax is a typical and well-known example. One important consequence of oblique rotation is that since the factors may correlate, it is possible to calculate a correlation matrix of the correlations of the oblique factors with each other. This secondary correlation matrix can then itself be factor analysed, yielding what are known as second-order factors or higher order factors. See also: axis; exploratory factor analysis

odds: the ratio of the probability of an event occurring to the probability of it not occurring. It can also be expressed as the ratio of the frequency of an event occurring to the frequency of other events occurring. Suppose, for example, that six out of nine students pass an exam. The probability of a student passing the exam is about 0.67 (6/9 = 0.667). The probability of a student failing the exam is about 0.33 (3/9 = 0.33, or 1 − 0.67 = 0.33). Consequently, the odds of a student passing the exam are about 2 (0.667/0.333 = 2.00). This is the same as the number of students passing the exam divided by the number of students failing it (6/3 = 2). See also: logit

odds ratio: an indicator of relative probabilities when outcomes are dependent (conditional) on another factor. For example, what is the probability of mental illness in a person who comes from a family in which at least one member has been in a mental hospital compared with one in which no members have been in a mental hospital? The imaginary data in Table O.1 might be considered. See also: logistic regression

Table O.1 Mental illness and family background

                                    No mental illness    Mental illness
                                    in family            in family
Person suffers mental illness       30                   41
Person does not suffer
mental illness                      180                  27

The probability of a person suffering mental illness if there is no mental illness in the family is 30/(30 + 180) = 0.143. The probability of a person suffering mental illness if there is mental illness in the family is 41/68 = 0.603.

The odds ratio for suffering mental illness is obtained simply by dividing the larger probability by the smaller probability. In our example, the odds ratio is 0.603/0.143 = 4.2. In other words, one is over four times more likely to suffer mental illness if there is mental illness in one's family than if there is not.

The odds ratio is a fairly illuminating way of presenting relative probabilities. It is a feature of logistic regression.

omnibus test: a test to determine whether three or more means differ significantly from one another without being able to specify in advance of collecting the data which of those means differ significantly. It is the F ratio in an analysis of variance which determines whether the between-groups variance is significantly greater than the within-groups or error variance. If there are good grounds for predicting which means differ from each other, an a priori or planned test should be carried out to see whether these differences are found. If differences are not expected, post hoc or unplanned tests should be conducted to find out which means or combination of means differ from each other.

one-sample t test: see t test for one sample

one-tailed level or test of statistical significance: see chi-square; significant

one-way relationship: see uni-directional relationship

operationalization: a procedure for measuring or manipulating a variable. There is often more than one way of measuring or manipulating a variable. For example, the variable of anxiety may be assessed through self-report, ratings by observers, physiological indices, projective methods, behavioural tests, and so on. It may also be manipulated in various ways such as subjecting people to frightening experiences or asking them to imagine being in frightening situations. While it is very difficult to provide conceptual definitions of variables (e.g. just what is intelligence?), it is more practical to describe the steps by which a variable is measured or manipulated in a particular study.

order effect: the order in which a participant is required to undertake two or more tasks or treatments may affect how they respond. For example, participants may become less interested the more tasks they carry out, or they may become more experienced in what they have to do. When requiring participants to conduct more than one task, it may be important to determine whether the order in which they carry out the tasks has any effect on their performance. If the number of tasks is very small, it is possible to have each task carried out in all possible orders. For example, with three tasks, A, B and C, there are six different orders: ABC, ACB, BAC, BCA, CAB and CBA. This is known as counterbalancing and it assumes that by varying the order in such a systematic fashion, order effects are systematically cancelled out. If there are a large number of tasks, it may be useful to select a few of these either at random or on some theoretical basis to determine if order effects exist. Assessing order effects essentially requires that comparisons are made according to the order of carrying out the tasks, not according to type of task.
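A minimal Python sketch of generating every order of three tasks for counterbalancing:

from itertools import permutations

tasks = ['A', 'B', 'C']
orders = [''.join(p) for p in permutations(tasks)]

print(orders)        # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(orders))   # 6 possible orders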

ordinal level of measurement, scale or variable: this is data which are collected in the form of scores. Its distinguishing feature is that these scores only allow the data to be put into ranks (rank order). Rank order is based on ordinal numbers which refer to the 1st, 2nd, 3rd, 4th, 5th, etc., scores. These represent the smallest score to the largest one but it is not known how much the 3rd and 5th scores actually differ from each other in any meaningful sense. The ordinal scale of measurement is the basis for many non-parametric tests of significance. Researchers rarely actually collect data that are in the form of ranks. Ordinal measurement refers to the issue of the nature of the measurement scale that underlies the scores. Some statisticians argue that the scores collected in the social sciences and some other disciplines are incapable of telling us anything other than the relative order of individuals. In practice, one can worry too much about such conceptual matters. What is of importance is to be aware that non-parametric tests, some of which are based on ranked data, can be very useful when one's data are inadequate for other forms of statistical tests. See also: chi-square; Kendall's rank correlation; Kolmogorov–Smirnov test for one sample; measurement; partial gamma; score

ordinary least squares (OLS) regression: see multiple regression

ordinate: the vertical or y axis on a graph. It usually represents the quantity of the dependent variable. See also: y-axis

orthogonal: indicates that right angles are involved. Essentially it means in statistics that two variables (or variants of variables such as factors) are unrelated to each other. As such, this means that they can be represented in physical space as two lines (axes) at 90° to each other. See orthogonal rotation

orthogonal factors: see orthogonal rotation

orthogonal rotation: the process by which axes for pairs of variables which are uncorrelated (at right angles) are displaced around a centre point from their original position while retaining the right angle between them. This occurs in factor analysis when the set of factors emerging out of the mathematical calculations is considered unsatisfactory for some reason. It is somewhat controversial, but it should be recognized that the first set of factors to emerge in factor analysis is intended to maximize the amount of variation extracted (or explained). However, there is no reason why this should be the best outcome in terms of interpretation of the factors. So many researchers rotate the first axes or factors. Originally rotation was carried out graphically by simply plotting the axes and factors on graph paper, taking pairs of factors at a time. These axes are moved around their intersection (i.e. the axes are spun around). Only the axes move. Hence, the rotated factors have different loadings on the variables than did the original factors. There are various ways of carrying out rotation and various criteria for doing so. The net outcome is that a new set of factors (and new factor loadings) is calculated. See Figure O.1(a) and (b), which illustrate an unrotated factor solution for two factors and the consequence of rotation respectively. The clearest consequence is the change in factor loadings. Varimax is probably the best known orthogonal rotation method. See also: exploratory factor analysis; varimax rotation, in factor analysis

Figure O.1 Unrotated and rotated factors: (a) unrotated, (b) rotated

outliers: cases which are unusually extreme and which would have an undue and misleading influence on the interpretation of data in a naïve analysis of the data. They are generally a relatively minor problem in a non-parametric analysis (because the scores are 'compacted' by turning them into ranks). In parametric analyses they can be much more problematic. They are particularly important in bivariate analyses because of the extra weight placed on extreme values in calculating covariance. Figure O.2(a) shows a scattergram for a poor or near zero correlation between two variables – variable A and variable B. Any attempt to fit a best fitting straight line is difficult because there is no apparent trend in the data as the correlation is zero. Figure O.2(b) shows the same data but another case has been included. This shows the characteristics of an outlier – it is high on the two variables (though equally an outlier may have low scores on the two variables). A mechanical application of statistical techniques (correlation and best fitting straight line) would lead to an apparent correlation between variables A and B and a line of best fit as shown in Figure O.2(b). However, we know that the correlation and significance are the result of the outlier since without the outlier there is no such trend. So a correct interpretation of the data has to resist the temptation to interpret the data in Figure O.2(b) as reflecting a positive trend. Unfortunately, unless a researcher examines the scattergram, there is no automatic check for outliers. Some authors (e.g. Howitt and Cramer, 2004) recommend running the parametric analysis (using Pearson's correlation coefficient) in parallel with a non-parametric analysis (e.g. using Spearman's correlation coefficient – a ranking test). If both analyses produce the same broad findings and conclusions then there is no problem created by outliers. If the non-parametric analysis yields substantially different findings and conclusions to the parametric analysis then the influence of outliers should be suspected. An alternative procedure is to delete the suspected outliers to see whether this makes a difference. If the complete data produce significant findings and the partial analysis produces very different non-significant (or very much less significant) findings then the influence of outliers is likely. See also: transformations

Figure O.2 The effects of outliers: (a) without outlier, (b) with outlier
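A minimal sketch of the parallel-analysis check described above, using scipy and a small invented data set with one extreme case added:

from scipy.stats import pearsonr, spearmanr

# Eight cases with essentially no relationship between variables A and B
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [5, 2, 6, 1, 4, 7, 2, 5]

a_out = a + [25]   # the same data with one outlier added
b_out = b + [25]

print(pearsonr(a, b)[0], spearmanr(a, b)[0])            # both close to zero
print(pearsonr(a_out, b_out)[0], spearmanr(a_out, b_out)[0])
# the Pearson coefficient is inflated far more by the outlier than the Spearman coefficient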

over-identified model: a term used in structural equation modelling to describe a model for which there is more information than necessary to identify or estimate the parameters of the model. See also: identification

P

paired-samples t test: see t test for related samples

pairing: see matching

pairwise deletion: a keyword used in some statistical packages such as SPSS to restrict an analysis to those cases for which there are no missing values on the variable or variables for that analysis. When this is done where there are some missing values, the number of cases may not be the same for all the analyses. Suppose, for example, we want to correlate the three variables of job satisfaction, income and age in a sample of 60 people and that three cases have missing data for income and a different set of four cases have missing data for age. The correlation between job satisfaction and income will be based on 57 cases, that between job satisfaction and age on 56 cases and that between income and age on 53 cases. An alternative procedure is to provide correlations only for those cases which do not have missing values on any of those variables, in which case the number of cases will be the same for all the analyses. The keyword for this procedure is listwise deletion in a statistical package such as SPSS. Using this procedure with this example, all the correlations will be based on 53 cases. See also: listwise deletion
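A minimal sketch of the difference using pandas (the data are invented): the corr() method drops missing values pairwise, while applying dropna() first gives the listwise result.

import numpy as np
import pandas as pd

df = pd.DataFrame({'satisfaction': [3, 4, 2, 5, 4, 1],
                   'income':       [20, np.nan, 25, 40, 31, np.nan],
                   'age':          [30, 42, np.nan, 50, 38, 27]})

print(df.corr())            # pairwise deletion: each correlation uses all available pairs of values
print(df.dropna().corr())   # listwise deletion: only complete cases are used throughout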

panel design or study: see cohort analysis; longitudinal design or study

parameter: a numerical characteristic of a population or a model. For example, the mean and standard deviation are two parameters of a population. The regression coefficient in a model is a parameter. The term statistic is used for the equivalent characteristics of samples rather than populations.

parametric statistics: see parametric tests; ratio level of measurement

parametric tests: tests of significance which assume that the distribution of the population values has a particular shape, which is usually a normal distribution. For example, the t test, which is based on estimating the variance of the population, assumes that the population values are normally distributed.

part correlation: see multiple correlation; semi-partial correlation coefficient


partial correlation: the correlation between two variables (a predictor and criterion or independent and dependent variable) adjusted for a third (control) variable. Partialling is a way of controlling for the effects of other variables. It is used as a simple form of causal modelling. For example, it is believed that adult criminality is determined by adolescent delinquency. Imagine the correlation between the two measures is 0.5. This appears to support the model. However, it is possible that a better explanation of the relationship is that both of these are the consequences of another (or third) variable. People who live their lives in families that are prone to crime may themselves be encouraged in their criminal ways both as adolescents and as adults. What would be the correlation between adult and adolescent criminality if family criminality could be controlled (i.e. variation in family criminality eliminated)? There are a number of ways of doing this.

One simple way is to separate family criminality into different levels (high and low will do for illustration). If family criminality is not involved then there should be a correlation of 0.5 between adolescent delinquency and adult criminality for the high-criminality families and a similar correlation size for the low-criminality families. If the two correlations become zero or near zero, this is evidence that criminality in the family is responsible for the correlation between adolescent delinquency and adult criminality.

A more common method to achieve more or less the same end is the use of partial correlation. This can be used to control for a number of control variables (five or six is probably the practical limit). This is complex without a computer package but simple enough if we only wish to control for a single variable at a time. The formula is

rxy.c = [rxy − (rxc × ryc)] / [√(1 − rxc²) × √(1 − ryc²)]

where rxy.c is the correlation between x and y with variable c controlled, rxy is the correlation between x and y, and rxc and ryc are the correlations of variables x and y with the third variable c.

In our example we know that adult criminality (variable x) correlates with adolescent delinquency (r = 0.5). Imagine that adult criminality correlates 0.4 with family criminality and adolescent delinquency correlates 0.6 with family criminality. This could be presented in a diagram such as Figure P.1. There are double arrows between the various points because it is not known what, if anything, is the cause of what. Some points can be eliminated logically. For example, adult crime could not cause adolescent delinquency because a cause must precede an effect.

Figure P.1 Correlations between three variables: adolescent delinquency and adult criminality 0.5, adult criminality and family crime 0.4, adolescent delinquency and family crime 0.6

Entering these values into the above formula:

rxy.c = [0.5 − (0.6 × 0.4)] / [√(1 − 0.6²) × √(1 − 0.4²)]
      = (0.5 − 0.24) / [√(1 − 0.36) × √(1 − 0.16)]
      = 0.26 / (√0.64 × √0.84)
      = 0.26 / (0.800 × 0.917)
      = 0.26 / 0.734
      = 0.35

As can be seen, the correlation has becomesmaller as a consequence of partialling (i.e.controlling for family crime). Neverthelessthe remaining correlation of 0.35 still indicatesa relationship between adolescent delinquency

THE SAGE DICTIONARY OF STATISTICS120

Correlationbetween xand y withvariable ccontrolled

Correlationbetween xand y

Correlationsof variables xand y withthe thirdvariable c

rxy.c �rxy � (rxc � ryc)

�1 � r2xc

�1 � r2yc

Adolescentdelinquency

Adultcriminality

Familycrime

0.5

0.40.6

Figure P.1 Correlations between three variables


and adult crime. Consequently the net effect of controlling for the third variable in this example leaves a smaller relationship but one which still needs explaining. Table P.1 gives some examples of the possible outcomes of controlling for third variables.

Partial correlation is a common technique in statistics and is a component of some important advanced techniques in addition. Because of the complexity of the calculation (i.e. the large number of steps) it is recommended that computer packages are used when it is intended to control for more than two variables. Zero-order partial correlation is another name for the correlation coefficient between two variables, first-order partial correlation occurs when one variable is controlled for, second-order partial correlation involves controlling for two variables, and so forth. The higher order partial correlations are denoted by extending the notation to include more control variables. Consequently rxy.cde means the partial correlation between variables x and y controlling (partialling out) for variables c, d and e.

Partial correlation is a useful technique. However, many of its functions are possibly better handled using, for example, multiple regression. This allows for more subtlety in generating models and incorporating control variables. See also: Kendall's partial rank correlation coefficient; multiple regression
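As an illustrative sketch only, the first-order formula given above can be computed directly in Python; the function name is chosen here for convenience and simply reproduces the worked example.

    from math import sqrt

    def partial_correlation(r_xy, r_xc, r_yc):
        # First-order partial correlation of x and y controlling for c,
        # using the formula given in the entry above
        return (r_xy - r_xc * r_yc) / (sqrt(1 - r_xc ** 2) * sqrt(1 - r_yc ** 2))

    # Worked example: adult criminality (x), adolescent delinquency (y), family crime (c)
    print(round(partial_correlation(r_xy=0.5, r_xc=0.4, r_yc=0.6), 2))  # approximately 0.35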

partial gamma: a measure of partial association between two ordinal variables controlling for the influence of a third ordinal variable. It is the number of concordant pairs minus the number of discordant pairs for the two variables summed across the different levels of the third variable, divided by the number of concordant and discordant pairs summed across the different levels of the third variable. A concordant pair is one in which the case ranked higher on the first variable is also ranked higher on the second variable, while a discordant pair is the reverse, in which the case ranked higher on the first variable is ranked lower on the second variable.

Cramer (1998); Siegel and Castellan (1988)
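As a minimal sketch of the definition just given (the data and function name here are invented purely for illustration, and tied pairs are simply ignored, as in the usual gamma statistic):

    def partial_gamma(x, y, control):
        # Count concordant and discordant pairs within each level of the
        # control variable, then combine them across levels
        concordant = discordant = 0
        for level in set(control):
            idx = [i for i, c in enumerate(control) if c == level]
            for a in range(len(idx)):
                for b in range(a + 1, len(idx)):
                    i, j = idx[a], idx[b]
                    prod = (x[i] - x[j]) * (y[i] - y[j])
                    if prod > 0:
                        concordant += 1    # the pair is ordered the same way on both variables
                    elif prod < 0:
                        discordant += 1    # the pair is ordered in opposite ways
        return (concordant - discordant) / (concordant + discordant)

    # Hypothetical ordinal data (1 = low, 3 = high) with a two-level control variable
    x = [1, 2, 3, 3, 1, 2, 2, 3]
    y = [1, 2, 2, 3, 2, 1, 3, 3]
    control = [1, 1, 1, 1, 2, 2, 2, 2]
    print(partial_gamma(x, y, control))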

participant: the modern term for subject. The term participant more accurately reflects


Table P.1 Some possible changes as a result of partialling and their implications

Outcome 1: initial correlation rxy = 0.5; after partialling for c, rxy.c = 0.5 (no change).
Description: there is no significant change following partialling.
Conclusion: partialling makes no difference, so c is neither the cause of nor an intervening variable in the relationship between x and y.

Outcome 2: initial correlation rxy = 0.5; after partialling for c, rxy.c = 0.0.
Description: the correlation declines significantly or becomes 0.
Conclusion: c could be an antecedent* (cause) of the relationship between x and y, or it could be an intervening variable in the relationship.

Outcome 3: initial correlation rxy = 0.0; after partialling for c, rxy.c = 0.5.
Description: the correlation changes from a low value to a high value after partialling.
Conclusion: c is a suppressor variable in the relationship between x and y, i.e. it hides the relationship, which is revealed when the effects of c are removed.

* For c to be an antecedent or cause of the relationship between x and y, one needs to ask questions about the temporal order and logical connection between the two. For example, the number of children a person has cannot be a cause of that person's parents' occupation. The time sequence is wrong in this instance.


the active role of the participant in the research process. The term subject implies subjugation and obedience. Unfortunately, many terms in statistics use the term subject (e.g. related-subjects, between-subjects, etc.) which makes it difficult to replace the term subject.

partitioning: another word for dividing something into parts. It is often used in analysis of variance where the total sum of squares is partitioned or divided into its component parts or sources such as the between- or within-groups sum of squares.

partitioning chi-square: denotes the process by which a large cross-tabulation/contingency table is broken down into smaller tables. This helps the researcher to assess just exactly where differences are occurring. Large contingency tables such as 2 × 3 or 4 × 2 are difficult to analyse since differences may occur in just sections of the table. Other parts of the table may show no trends at all. Table P.2 gives a 3 × 3 contingency table. It can probably be seen that sample 1 and sample 3 are virtually indistinguishable. In both cases apples are favourites with bananas second. Sample 2 seems very different. Bananas are by far the favourite in sample 2 with apples hardly selected by anyone. Through all of the samples, grapes show similar rates of choice. So it would not really be correct to suggest that all of the samples differ from each other (even if chi-square applied to these data were significant) since differences occur in only parts of the table.

One solution to identifying precisely where in the table a difference occurs is to partition the contingency table into a number of smaller tables. Each of these smaller tables is then interpretable since it will produce either significant or non-significant results. Table P.3 shows one smaller table that the data could be partitioned into. Here we have a 2 × 2 table that is not ambiguous as to interpretation since sample 1 and sample 2 are different in respect of what fruits they like.

Such partitioning does only what common sense tells us to do when examining something as complex as Table P.2. We simply look at a smaller part of the whole. A 2 × 2 table is the smallest unit for analysing two variables at a time. The chi-square based on this 2 × 2 table gives its significance level.

The main difficulties in employing partitioning lie in the quantity of partitioned tables that may be produced and the consequent question of what the correct significance level should be for each one. One solution is to use the Bonferroni inequality, which gives the maximum number of outcomes that would be expected by chance with the increased testing rate. In effect this means taking the exact significance levels of the 2 × 2 chi-squares and multiplying by the number of times the original contingency table was partitioned.

Partitioning may help reduce erroneous interpretations of large chi-square or contingency tables. Typical among the errors is the assumption that significant values of chi-square for large tables indicate that all groups or samples are different from each other. The differences may only occur in parts of the data.
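As an illustrative sketch, the 2 × 2 partition shown in Table P.3 can be tested and then adjusted in the way just described; the number of partitions (taken here as 3) is assumed purely for the illustration.

    from scipy.stats import chi2_contingency

    # 2 x 2 partition from Table P.3: favourite fruit (apple, banana) by sample (1, 2)
    partition = [[35, 4],
                 [22, 54]]

    chi2, p, dof, expected = chi2_contingency(partition)

    # Bonferroni-style adjustment: multiply the exact significance level by the
    # number of partitioned tables examined (assumed to be 3 for illustration)
    n_partitions = 3
    adjusted_p = min(p * n_partitions, 1.0)

    print(f"chi-square = {chi2:.2f}, p = {p:.4f}, Bonferroni-adjusted p = {adjusted_p:.4f}")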

path analysis: an analysis in which three or more variables are ordered by the researcher in terms of their presumed causal relationships.


Table P.2 Data on favourite fruits in three samples

Favourite fruit    Sample 1    Sample 2    Sample 3
Apple                    35           4          32
Banana                   22          54          23
Grape                    12          16          13

Table P.3 A small table taken from Table P.2 to illustrate one partition of the data

Favourite fruit    Sample 1    Sample 2
Apple                    35           4
Banana                   22          54


The causal relationship between the variables is more readily understood when presented in terms of a path diagram, in which arrows are used to indicate the direction and existence of a relationship between variables. Variables may be ordered in more than one way. Each arrangement of variables is called a model. The size and sign of these relationships are estimated. The simplest way of estimating these relationships is with correlation and multiple regression. However, these methods do not take account of the fact that the reliability of the measurement of the variables may differ. The relationship for a measure with lower reliability will be less strong than for one with higher reliability. Consequently, the relationships for less reliable measures may be underestimated. Structural equation modelling takes account of measurement error and so is a more appropriate method for estimating these relationships although it is more complicated. It also provides an index of the extent to which the relationships assumed to hold fit the data. A variable where no account is taken of its reliability is known as a manifest variable and may be represented by a square or rectangle in a path diagram. A variable where its reliability is taken into consideration is known as a latent variable and may be represented by a circle or ellipse. See also: endogenous variable; exogenous variable; structural equation modelling

Cramer (2003)

path diagram: shows the presumed causal relationships between three or more variables. An example of a path diagram is presented in Figure P.2. The variables are usually ordered from left to right in terms of their causal sequence, a uni-directional causal relationship being indicated by a straight, single right-pointing arrow. So variables to the left of other variables are assumed to influence variables to their right. For example, variables a and b influence variable c which in turn influences variable e. There is no causal relationship between variable b and variable e or between variable b and variable f because there is no arrow between them. There is an indirect relationship between b and e which is

mediated through c. There is a direct and an indirect relationship between a and e. The indirect relationship is also mediated through c. A reciprocal or bi-directional relationship between two variables is indicated by two arrows between them, such as those between e and f indicating that e influences f and f influences e. A curved line with an arrow at either end, such as that between a and b, shows that two variables are related to one another but there is no causal relationship between them. Although not shown here, a variable enclosed in a square or rectangle is a manifest variable and one enclosed in a circle or ellipse is a latent variable. See also: manifest variable; path analysis

Cramer (2003)

Pearson chi-square: see chi-square; likelihood ratio chi-square

Pearson's correlation coefficient: see correlation; eta; outliers; phi coefficient; point–biserial correlation coefficient; T2 test; z test

Pearson's partial correlation: see partial correlation

Pearson's product moment correlation coefficient: see correlation

Figure P.2 An example of a path diagram involving variables a, b, c, d, e and f


percentage: the proportion or fraction of something expressed as being out of 100 – literally 'for every hundred'. So another way of saying 65% is to say sixty-five out of every hundred or 65/100. This may be also expressed as 65 per cent. The way of calculating percentages is the fraction × 100. Thus, if a sample has 25 females out of a total of 60 cases, then this expressed as a percentage is 25/60 × 100 = 0.417 × 100 = 41.7%.

percentage probability: probabilities are normally expressed as being out of one event – for example, one toss of a coin or die. Thus, the probability of tossing a head with one flip of a coin is 0.5 since there are two equally likely outcomes – head or tail. Percentage probabilities are merely probabilities expressed out of 100 events. Thus, to calculate a percentage probability, one merely takes the probability and multiplies it by 100. So the percentage probability of a head on tossing a coin is 0.5 × 100 = 50%. See also: probability

percentiles: if a set of scores is put in order from the smallest to the largest, the 10th percentile is the score which cuts off the bottom 10% of scores (the smallest scores). Similarly, the 30th percentile is the score which cuts off the bottom 30% of scores. The 100th percentile is, of course, the highest score by this reasoning. The 50th percentile is also known as the median. Percentiles are therefore a way of expressing a score in terms of its relative standing to other scores. For example, in itself a score of 15 means little. Being told that a score of 15 is at the 85th percentile suggests that it is a high score and that most scores are lower.

There are a number of different ways of estimating what the percentile should be when there is no score that is actually the percentile. These different methods may give slightly different values. The following method has the advantage of simplicity. We will refer to the following list of 15 scores in the estimation:

1, 3, 4, 4, 6, 7, 8, 9, 14, 15, 16, 16, 17, 19, 22

1 Order the scores from smallest to largest (this has already been done for the list above).

2 Count the number of scores (N). This is 15.

3 To find the percentile (p) calculate p × (N + 1). If we want the 60th percentile, the value of p is 0.6 (i.e. the numerical value of the percentile required divided by 100).

4 So with 15 scores, the calculation gives us 0.6 × (15 + 1) = 0.6 × 16 = 9.6.

5 The estimate of the 60th percentile is found by splitting the 9.6 into an integer (9) and a fraction (0.6).

6 We use the integer value to find the score corresponding to the integer (the Ith score). In this case the integer is 9 and so we look at the 9th score in the ordered sequence of scores. This is the score of 14.

7 We then find the numerical difference between the 9th score and the next higher score (the score at position I + 1) in the list of scores. The value of this score is 15. So the difference we calculate is the difference between 14 and 15 in this case. Thus, the difference is 1.

8 We then multiply this difference by the 0.6 that we obtained in step 5 above. This gives us 1 × 0.6 = 0.6.

9 We then add this to the value of the 9th score (i.e. 14 + 0.6) to give us the estimated value of the percentile. So, this estimated value is 14.6.

10 Quite clearly the 60th percentile had to lie somewhere between the scores of 14 and 15. However, this estimate places it at 14.6 rather than 14.5 which would be the value obtained from simply splitting (averaging) the scores.

Percentiles are a helpful means of presenting an individual's score though they have limited utility and show only a part of the total picture. More information is conveyed by a frequency distribution, for example. Percentiles are a popular way of presenting normative data on tests and measurements which are designed to present information about individuals compared with the general


population or some other group. Quartiles and deciles are related concepts. See also: box plot
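Purely as an illustrative sketch, the estimation method described in the numbered steps above can be written as a short Python function (the function name is chosen here for convenience and no bounds checking is included):

    def percentile_estimate(scores, percentile):
        # Order the scores, compute p * (N + 1), then interpolate between
        # the Ith score and the next higher score using the fractional part
        ordered = sorted(scores)
        n = len(ordered)
        p = percentile / 100
        position = p * (n + 1)          # e.g. 0.6 * 16 = 9.6 for the 60th percentile
        integer = int(position)         # 9
        fraction = position - integer   # 0.6
        lower = ordered[integer - 1]    # the 9th score (14 in the example)
        upper = ordered[integer]        # the next higher score (15 in the example)
        return lower + fraction * (upper - lower)

    scores = [1, 3, 4, 4, 6, 7, 8, 9, 14, 15, 16, 16, 17, 19, 22]
    print(round(percentile_estimate(scores, 60), 1))  # 14.6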

perfect correlation: a correlation coefficient of 1.00 or −1.00 in which all the points of the scattergram for the two variables lie exactly on the best fitting straight line through the points. If a perfect correlation is obtained in a computer analysis, it is probably worthwhile checking the corresponding scattergram (and the other descriptive statistics pertinent to the correlation) as it may be suspiciously high and the result of, say, some sort of coding error. A perfect correlation coefficient is an unrealistically high ideal in most disciplines given the general limited adequacy of measurement techniques. Hence if one occurs in real data then there is good reason to explore why this should be the case. Figure P.3 gives the scattergram for a perfect positive correlation – the slope would be downwards from the left to the right for a perfect negative correlation.

permutation: a sequence of events in probability theory. For example, if the sequence of events were the toss of a coin, then one permutation of events might be T, T, T, T, T, H, H,

H. This is a distinct permutation from the series T, H, T, T, H, T, H, T despite the fact that both series contain exactly three heads and five tails each. (If we ignore sequence, then the different possibilities are known as combinations.) Permutations of outcomes such as taking a name from a hat containing initially six names can be calculated by multiplying the possibilities at each selection. So if we take out a name three times the number of different permutations is 6 × 5 × 4 = 120. This is because for the first selection there are six different names to choose from, then at the second selection there are five different names (we have already taken one name out) and at the third selection there are four different names. Had we replaced the name into the hat then the permutations would be 6 × 6 × 6. See also: combination

Permutations are rarely considered in routine statistical analyses since they have more to do with mathematical probability theory than with descriptive and inferential statistics which form the basis of statistical analysis in the social sciences. See also: combination; sampling with replacement
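As a brief illustrative aside, the counts used in the example above can be reproduced with Python's standard library:

    import math

    # Ordered selections (permutations) of 3 names from 6 without replacement
    print(math.perm(6, 3))   # 120, i.e. 6 * 5 * 4

    # If order is ignored, the selections are combinations instead
    print(math.comb(6, 3))   # 20

    # Selections with replacement: 6 possibilities at each of the 3 draws
    print(6 ** 3)            # 216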

phi coefficient: a variant of Pearson's correlation applied to two variables each of which takes one of two values which may be coded

Figure P.3 Illustrating a perfect correlation coefficient: in a perfect correlation each of the data points falls exactly on the best fitting straight line


1 or 2 (or any other pair of values for that matter). As such, its most common applications are in correlating two items on a questionnaire where the answers are coded yes or no (or as some other binary alternatives), and in circumstances in which a 2 × 2 chi-square would alternatively be employed. While there is a distinct formula for the phi coefficient, it is simply a special case of Pearson's correlation with no advantage in an era of computer statistical packages. In cases where it needs to be computed by hand, simply applying Pearson's correlation formula to the variables coded as above gives exactly the same value. Its advantage was that it made the calculation of the phi coefficient a little more speedy, which saved considerable time working by hand on large questionnaires, for example. See also: correlation; Cramer's V; dummy coding
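As an illustration only, the point that phi is simply Pearson's correlation applied to binary-coded variables can be checked directly; the two items below are hypothetical.

    import numpy as np

    # Two hypothetical yes/no questionnaire items coded 1 = yes, 2 = no
    item_a = [1, 1, 2, 2, 1, 2, 1, 2, 2, 1]
    item_b = [1, 2, 2, 2, 1, 2, 1, 1, 2, 1]

    # Pearson's correlation between the two coded items is the phi coefficient
    phi = np.corrcoef(item_a, item_b)[0, 1]
    print(round(phi, 2))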

pictogram: essentially a form of bar chart in which the rectangular bars are replaced by a pictorial representation appropriate to the measures being summarized. For example, the frequencies of males and females in a sample could be represented by male and female figures of heights which correspond to the frequencies. This is illustrated in Figure P.4. Alternatively, the figures could be piled above each other to give an equivalent, but different, representation.

Sometimes a pictogram is used to replace a bar chart, though attempts to do this are likely to be less successful. A frequent difficulty with pictograms lies in their lack of correspondence to the bars in a bar chart except in terms of their height. Classically, in a bar chart the area of a bar actually relates directly to the number of cases. Hence, typically the bars are each of the same width. This correspondence is lost with any other shape. This can be seen in Figure P.4 where the male figure is not just taller than the female figure, but wider too. The net effect is that the male figure is not about twice the height of the female, but four times the area. Hence, unless the reader concentrates solely on height in relation to the frequency scale a misleading impression may be created. Nevertheless, a pictogram is visually appealing, communicates

the nature of the information fairly well, and appeals to non-statistically minded audiences. See also: bar chart

pie chart: a common way of presenting frequencies of categories of a variable. The complete data for the variable are represented by a circle and the frequencies of the subcategories of the variable are given as if they were slices of the total pie (circle). Figure P.5 illustrates the approach although there are many variations on the theme. The size of the slice is determined by the proportion of the total that the category represents. For example, if 130 university students are classified as science, engineering or arts students and 60 of the students are arts students, they would be represented by a slice which was 60/130 of the total circle. Since a circle encompasses 360°, the angle for this particular slice would be 60/130 × 360° = 0.462 × 360° = 166°.

Pie diagrams are best reserved, in general, for audio-visual presentations when their visual impact is maximized. They are much less appropriate in research reports and the like where they take up rather a lot of space for the information that they communicate. Pie diagrams are most effective when there are few slices. Very small categories may be combined as 'others' to facilitate the clarity of the presentation. See also: bar chart; frequency distribution

Pillai's criterion: a test used in multivariate statistical procedures such as canonical

Figure P.4 Pictogram indicating relative frequencies of males and females


correlation, discriminant function analysis and multivariate analysis of variance to determine whether the means of the groups differ on a discriminant function or characteristic root. This test is said to be more robust than Wilks's lambda, Hotelling's trace criterion and Roy's gcr criterion when the assumption of the homogeneity of the variance–covariance matrix is violated. This assumption is more likely to be violated with small and unequal sample sizes. See also: Wilks's lambda

Tabachnick and Fidell (2001)

planned comparison or test: see analysis of variance

platykurtic: see kurtosis

point–biserial correlation coefficient: a variant of Pearson's correlation applied in circumstances where one variable has two alternative discrete values (which may be coded 1 and 2 or any other two different numerical values) and the other variable has a (more or less) continuous distribution of scores. This is illustrated in Table P.4. The discrete variable may be a nominal variable so long as there are only two categories. Hence a

variable such as gender (male and female) is suitable to use as the discrete variable. See also: dummy coding

Pearson's correlation (which is also known as the point–biserial correlation) between the numbers in the second and third columns of Table P.4 is −0.70. The minus sign is purely arbitrary, a consequence of the arbitrary value for females being lower than that for males: the females typically have the higher scores on linguistic aptitude. See also: correlation

point estimates: a single figure estimate rather than a range. See also: confidence interval

Pie diagrams may also include the frequency in each slice or percentage frequencies. This diagram is clear because of the small number of slices. Slices may be 'exploded' out for emphasis.

Figure P.5 Types of cases taken on by psychological counsellors (slices: emotional, relationship, work related, other problem)

Table P.4 Illustrating how a binary nominal variable may be recoded as a score variable to enable a correlation coefficient to be calculated

Gender                Recode gender              Linguistic aptitude
(discrete variable)   (1 for female, 2 for male) (continuous variable)
Female                1                          49
Female                1                          17
Male                  2                          15
Male                  2                          21
Male                  2                          19
Female                1                          63
Female                1                          70
Male                  2                          29


pooled variances: combined or averaged variances. Variances are pooled to provide a better population estimate in the t test for unrelated samples where the variances are equal but the number of cases is unequal in the two groups. The formula for pooling variances is as follows, where the size of the sample is indicated by n and the two samples are referred to by the two subscripts 1 and 2:

√( ( [variance1 × (n1 − 1)] + [variance2 × (n2 − 1)] ) / (n1 + n2 − 2) × (1/n1 + 1/n2) )

See also: variance
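As a sketch only, the pooled formula above translates directly into Python; the function name and the example variances and sample sizes are illustrative.

    from math import sqrt

    def pooled_standard_error(variance1, n1, variance2, n2):
        # Pool the two sample variances, weighting each by its degrees of freedom
        pooled = (variance1 * (n1 - 1) + variance2 * (n2 - 1)) / (n1 + n2 - 2)
        # Combine with the sample sizes as in the formula above
        return sqrt(pooled * (1 / n1 + 1 / n2))

    # Hypothetical example: equal variances but unequal sample sizes
    print(round(pooled_standard_error(variance1=4.0, n1=10, variance2=4.0, n2=20), 3))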

population: a deceptively simple concept which actually means two quite distinct things in research and statistics respectively. Hence, it can cause a great deal of confusion. In research, population means much the same as in everyday language – a population is all of a particular type of individual. This may be limited by geographical location or one or more other characteristics. So populations would include all children attending schools in London, or all gay people in the country, and so forth. The nature of a population changes according to the nature and purpose of the research.

In statistics, this sort of concept of population would be somewhat cumbersome. Consequently, the term is applied to a variable rather than an identifiable group of individuals or cases. So a population in statistics really refers to a population of scores on a particular variable. This could be the IQs of the geographical grouping which is the population of the British Isles in everyday terms. But it can be much more abstract than that. Variables often do not exist separate from the process of measuring them, so populations in statistics can be difficult to define. That is, until a researcher invents, say, an attitude scale and gives it to a sample, the variable which the scale measures simply does not exist. Each time a researcher tries to measure something new with a sample of people, a new population is created. Nevertheless, despite the intangible nature of some populations in statistics, the concept of population is essential since it describes the wider set from which the research's sample is regarded as being drawn. See also: convenience sample; estimated standard deviation; parameter; sample; significant

population standard deviation: see standard deviation

population variance: see variance or population variance

positivism: a major aspect of the philosophy of science. It has as its basic principle that knowledge should be obtained from observation as a means of assessing the value of our notions about the nature of things. It then is very different from non-scientific thought such as theism, which holds that knowledge resides in religion and religious teachings, and metaphysics, which holds that knowledge emerges from philosophical thought about issues. It has its origins as a formal philosophy of science in the work of Auguste Comte in the nineteenth century. Some researchers virtually equate statistical analysis with positivism though there is little in statistics itself (rather than the way it has been applied) which justifies this. Possibly the major difficulty that positivism has presented is that it has historically been associated with the idea of universalism, which is the view that the object of science is to develop universal laws of nature which apply anywhere – such that the principles of gravity should apply throughout the universe. Unfortunately, universalism when applied to human nature does not work very well. Human beings operate in the context of a cultural system such that cultural factors need to be taken into account when studying society and human activity. To some extent, then, social science needs to be culturally specific: in other words, not universal. Classic examples



of universalism are abundant in psychology, for example, when the researcher seeks to discover 'the laws of human behaviour'. Laboratory experiments encourage such an enterprise since they divorce the participants from the context in which they usually operate. In a nutshell, the subject statistics does not equate to positivism although it has traditionally been a common feature of positivistic research.

Owusu-Bempah and Howitt (2000)

post hoc, a posteriori or unplanned tests or multiple comparison tests: statistical tests applied after the data have been collected without any prior consideration of what comparisons are essential to the research questions. They are comparisons done after the facts (data) have been collected and where overall the analysis of variance has indicated that the null hypothesis that there is no difference between the means is not sustainable (i.e. where the F ratio is statistically significant). The point is that statistical significance merely means that some but not necessarily all of the treatment means are unequal. The question is which ones are unequal? The post hoc tests allow comparisons between pairs of treatment means in order to indicate which of the pairs of means are statistically significantly different from each other.

The risk is, of course, simply dredging through the data in order to find significant findings without making any allowance for the fact that the more significance tests one does the more likely one is to obtain significance due to chance rather than 'true' effects. Hence, to do repeated t tests, for example between all possible pairs of means, is frowned upon. One can employ the Bonferroni correction, which basically says that if one does three t tests, for example using a 5% significance level, this is in the worst instance like really testing at the 15% level of significance. That is, 15% is the highest value the true significance level could be, so this is an extremely conservative estimate.

So all post hoc tests make some correction for the number of comparisons involved. These adjustments tend to be rather large and

are described as being conservative (i.e. tending to support the null hypothesis of no difference). There is a variety of post hoc tests which is rather bewildering. A lack of consensus about where and when to apply the different measures is another limitation. Some have limitations such as being inapplicable where the groups are of different size (e.g. Tukey's HSD test) and others are regarded as being too biased towards detecting no differences (e.g. the Scheffé test). A reasonable recommendation, given that few will calculate the values of post hoc tests without the use of a computer, is to do a range of post hoc tests. Where they all indicate the same conclusions then clearly there is no problem. If they indicate very different conclusions for a particular set of data, then the reasons and importance of this have to be assessed. However, in this way the problem has been identified as existing. See also: analysis of variance; Bonferroni test; Duncan's new multiple range test; Fisher's LSD test; Gabriel's simultaneous test procedure; Games–Howell multiple comparison procedure; Hochberg GT2 test; multiple comparison tests; Newman–Keuls method; Ryan F and Q tests; Scheffé test; Tamhane's T2 multiple comparison test; Tukeya and Tukeyb tests; Tukey–Kramer test; Waller–Duncan t test

post-test: the measurement made immediately after the experimental treatment or the control for the experimental treatment has been made. See pre-test

power: the number of times that a quantity or number is multiplied by itself. It is usually written as an exponent, which is a number or symbol placed above and to the right of that quantity. For example, the exponent 2 in the expression 3² indicates that the quantity 3 is raised or multiplied to the second power or the power of 2, which is 3 × 3. The exponent 3 in the expression 3³ indicates that the quantity 3 is raised to the third power or the power of 3, which is 3 × 3 × 3. See exponent


power of a test: the probability of a statistical test finding a relationship between two variables when there is such a relationship. The maximum power a test can have is 1 and the minimum 0, with 0.80 indicating an acceptable level of power. The power of a test is generally greater for one- than two-tailed hypotheses, for parametric than non-parametric tests, for less stringent (e.g. 0.05) than more stringent (e.g. 0.001) significance levels, and for larger samples.
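Purely as an illustrative sketch, power can be estimated by simulation: generate many pairs of samples from populations where a difference genuinely exists and count how often the test detects it. The sample size and effect size below are arbitrary choices made for the illustration.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)

    n_simulations = 2000
    n_per_group = 30
    effect_size = 0.5        # true difference between the population means (in SD units)
    significant = 0

    for _ in range(n_simulations):
        group1 = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        group2 = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
        t, p = ttest_ind(group1, group2)
        if p < 0.05:
            significant += 1

    # Proportion of simulations in which the real difference was detected = estimated power
    print(significant / n_simulations)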

predictive validity: the extent to which a variable predicts or is related to another variable which is measured subsequently. It may be distinguished from concurrent validity in which the two variables are measured at the same or similar time.

predictor variable: often used to refer to the variables in a multiple regression which are employed to predict the values of the criterion or criterion variable. This criterion variable may be measured at the same time as some or all of the predictor variables or it may be measured subsequently to them. This term is also used more generally to refer to variables that are employed to predict a variable which is measured subsequently. See also: log likelihood; logistic regression; regression equation

pre-test: a measurement stage preceding the administration of the experimental treatment. It provides a baseline measurement against which change due to the experimental treatment can be assessed. Without a pre-test, it is not possible to know whether scores have increased, stayed the same or reduced. It also shows whether the means of the groups are similar prior to the subsequent measurement. If the pre-test means differ significantly and if the pre-test is correlated with the post-test, these pre-test differences need to be taken into account when examining the post-test differences. The recommended statistical test for doing this is analysis of covariance.

Research designs involving pre-tests are not without their problems. For one thing, the pre-test may sensitize the participants and affect the degree of influence of the experimental treatment (Figure P.6). For example, if the study is about changing attitudes, forewarning participants by giving them a pre-test measure of their attitudes may make them realize that their susceptibility to influence is being assessed. Consequently, they may try their hardest not to change their attitude under the experimental treatment. Hence, sometimes a pre-test design also includes additional groups which are not pre-tested to see whether the pre-test may have had an influence. See also: baseline; quasi-experiments

pre-test–post-test design: involves a pre-test and a post-test. See pre-test

principal axis factoring: a form of factor analysis in which only the variance shared between the variables is analysed. Variance which is unique to a variable or is error is not analysed. The shared variance or communality can vary from a minimum of 0 to a maximum of 1. It is generally less than 1.

principal components analysis: a form of factor analysis in which all the variance of the variables is analysed. The communality of each variable is 1.

probability: the mathematical chance or likelihood that a particular outcome will

Figure P.6 The pre-test design: pre-test, followed by the experimental treatment (or merely the passage of time), followed by the post-test


occur. It is expressed in terms of the ratio of a particular outcome to all possible outcomes. Thus, the probability of a coin landing heads is 1 divided by 2 (the possible outcomes are heads or tails). Probability is expressed per event so will vary between 0 and 1. Probabilities are largely expressed as decimals such as 0.5 or 0.67. Sometimes probabilities are expressed as percentage probabilities, which is simply the probability (which is out of a single event) being re-expressed out of 100 events (i.e. the percentage probability is the probability × 100).

The commonest use of probabilities is in significance testing. While it is useful to know the basic ideas of probability, many researchers carry out thorough and appropriate analyses of their data without using other than a few very basic notions about probability. Hence, the complexities of mathematical probability theory have little bearing on the practice of statistical analysis of psychological and social science data.

probability level: see significant

probability sampling: a form of sampling in which each case or element in the population has a known probability of being included in the sample. In the simplest situation of simple random sampling each case has the same probability of being sampled. In stratified random sampling each case may not have the same probability of being included as the population is divided into different strata or groups which may not have the same probability of being chosen. Probability sampling may be distinguished from non-probability sampling in which the probability of a case being selected is unknown. Non-probability sampling is often euphemistically described as convenience sampling.

probability theory: see Bayesian inference; Bayes's theorem; odds; percentage probability; probability

proband: see subject

proportion: the frequency of cases in a category divided by the total number of cases. It varies from a minimum of 0 to a maximum of 1. For example, if there are 8 men and 12 women in a sample, the proportion of men is 0.40 (8/20 = 0.40). A proportion may be converted into a percentage by multiplying the proportion by 100. A proportion of 0.40 represents 40% (0.40 × 100 = 40).

prospective design or study: see cohort analysis; longitudinal design or study

pseudo-random number or tables: see random number tables


Q

Q analysis, methodology or technique: the usual strategy in factor analysis is to look for patterns in the data supplied by the participants. For example, a questionnaire may be factor analysed to see what groupings of questions tend to be responded to similarly. An alternative to this involves looking for patterns in the participants. That is, to form the factors on the basis of which people show similar patterns in their replies to the questions compared with others in the sample. In its simplest form, this involves producing a correlation matrix in which the different participants are the variables and their answers to the questions are the equivalent of different cases in normal factor analysis. In other words, the data are entered such that the rows become the columns and the columns become the rows compared with normal factor analysis. This will then generate a correlation matrix of correlations between people rather than one of correlations between variables. The factors are then interpreted in terms of the individuals who load highly on a factor (rather than variables which load highly on a factor).

It is possible to carry out Q analysis on any data which have been analysed using factor analysis by simply using the facility of computer packages such as SPSS to transpose the data matrix (spreadsheet). This option essentially moves the matrix through 90° so that the rows become the columns and vice versa. The transposed matrix can then be factor analysed yielding Q factors. See also: exploratory factor analysis
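A minimal illustrative sketch of the transposition idea: correlating the transposed data matrix gives correlations between people rather than between variables. The tiny data matrix below is invented for the example.

    import numpy as np

    # Hypothetical data matrix: 4 participants (rows) answering 5 questions (columns)
    data = np.array([[3, 4, 2, 5, 1],
                     [2, 4, 3, 5, 2],
                     [5, 1, 4, 2, 5],
                     [4, 2, 5, 1, 4]])

    # Normal (R) analysis: correlations between the 5 questions
    r_matrix = np.corrcoef(data, rowvar=False)    # 5 x 5

    # Q analysis: transpose first, so the correlations are between the 4 people
    q_matrix = np.corrcoef(data.T, rowvar=False)  # 4 x 4

    print(r_matrix.shape, q_matrix.shape)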

qualitative research: generally research where there is little or no attempt to summarize the data and/or describe the relationships found using numbers. It is difficult to discuss qualitative research briefly since it varies enormously in its style, scope, methods and theoretical underpinnings. Some researchers regard it as a prelude to quantitative research in which the researchers familiarize themselves with the matter of their research with the intention of developing much more structured ways of collecting data. Other researchers regard qualitative methods as a means of correcting what they see as the fundamental errors in quantification. Qualitative research is largely concerned with the development of coding categories to describe fully the data in question. In grounded theory, for example, the researcher employs data analysis methods which essentially require numerous revisions of codings of textual data in order to force a close fit between the data and the analytic scheme being developed.

In terms of statistical analysis, in many ways the distinction between qualitative and quantitative data analysis is misleading. Research which analyses data in terms of coding categories is potentially amenable to the use of statistical techniques for nominal (category or categorical) data. This form of data has sometimes been referred to as qualitative data in the statistical literature. In this sense, qualitative researchers may not always be right to reject statistics as part of the analysis of their data.

There are lessons for quantitative researchers to learn from qualitative research. Perhaps



the most important of these is the importance of closely matching the analysis to the detail of the data. It is interesting to note that exploratory data analysis stresses this sort of detailed fit in quantitative data analysis.

qualitative variables: see categorical (category) variable; frequency distribution

quantitative measurement: see measurement

quantitative research: generally research where there is some attempt to summarize the data and/or describe the relationships found using numbers. Statistical analysis is certain to accompany such data collection.

quantitative variables: see categorical variable; frequency distribution

quartile deviation or semi-interquartile range: a measure of dispersion (spread of data) which is sometimes used instead of the interquartile range. It is the interquartile range divided by 2.

quartiles: scores which cut off the bottom 25%, 50% and 75% of scores in a sequence of scores ordered from the smallest to the largest. They are known as the first, second and third quartiles. They refer to specific values of scores. The second quartile is also known as the median. They are calculated in the same way as the 25th percentile, the 50th percentile and the 75th percentile would be (see percentiles). The interquartile range is the range of scores from the 25th percentile to the 75th percentile or the 1st quartile (lower quartile) to the 3rd quartile (upper quartile). See also: interquartile range; percentile

quasi-experiments: research designs which may be regarded as close to proper or true experiments. In many areas of research, it is not possible to allocate participants randomly to experimental and control conditions in order to study the influence of the independent variable. That is, true experiments are not practical or feasible or the researcher may be unsympathetic to the use of invasive experimental techniques when studying real-world phenomena. Quasi-experiments are a number of research designs which by various means attempt to improve the researcher's ability to identify causal influences compared with a correlational study. There may be ethical difficulties in random assignment (e.g. in a study of the effects of medical treatment, to withhold treatment from a control group may result in deaths or other serious consequences for the control group). There may be practical difficulties. For example, in a study of the effects of management style in different organizations, companies may be unwilling to take the financial risk associated with changing their management style in accordance with the desires of researchers. Some random allocations are also simply impossible – people cannot be assigned to different genders randomly. Laboratory-based research is the most amenable in general to randomized experiments, whereas field-based disciplines may find such designs inappropriate.

Quasi-experiments attempt to emulate the strengths of randomized experiments by the following means:

1 By obtaining pre-test (baseline) measurements on the dependent variable, it is possible to adjust statistically for pre-test differences between the groups in the study. In other words, there are experimental and control groups but participants could not be randomly assigned so are allocated to the groups on some other


basis. Sometimes the research will employ pre-existing groups such as when, say, students at two different types of school are compared by school type.

2 By observing the effects of interventions over a series of temporal points (time series), it is possible to assess whether change occurs following the intervention. For example, if a researcher was studying the effects of the introduction of incentives on office morale, the morale of office workers could be assessed monthly for a year. At some stage (perhaps at random), the management would introduce the incentive scheme. Later it could be removed. And perhaps later still reintroduced. One would expect that if the incentive scheme was effective, morale would increase when it is introduced, decrease when it is taken away, and increase on its reintroduction.

Variations are possible and, for example, the time-series feature may be combined with a control time series which does not have the intervention. Since quasi-experimental design is a term which covers a large proportion of research designs which are not experiments, it is a major aspect of research design. Of course, there is a lot of research which does not set out to address issues of causality.

questionnaire: a list of written questions or statements which is given to a person to

complete (self-completion questionnaire) or may be read to the participant by a researcher or interviewer who records their response. Questionnaires vary greatly in their degree of structure. Perhaps the majority of questionnaires structure the replies by supplying a range of alternative responses from which the participant chooses.

quota sampling: a non-random sample of a population which has been divided into various categories of respondent (e.g. males and females, employed and unemployed). The purpose of quota sampling is to ensure that certain categories of person are included in sufficient proportions in the data. For each of the categories a certain number or quota of cases is specified and obtained. This ensures that there are sufficient cases in each of these categories. (Random sampling, in theory, may result in inadequate samples for some purposes. Random sampling from a population containing equal numbers of males and females may result in a sample with few males.) For example, we may want to have a certain number of gay and lesbian people in the sample. Samples obtained in this way are normally not random samples and are obtained through a range of practical means. Consequently, we do not know how representative these cases are of their categories.


R

R analysis, methodology or technique: R equates to regular factor analysis in which the correlations between variables are factored and the factors account for clusters of variables measuring similar things. This is different from the Q technique in which the objective is to identify groups of cases which are highly similar to each other. In other words, R analysis groups variables, Q analysis groups cases.

random allocation: see randomization

random assignment: see quasi-experiments; randomization

random effects model: a term used in experimental design and the analysis of variance to denote the selection of levels of the experimental treatment at random. It requires selection from the full possible range of levels. It is relatively uncommon to find it employed in practice but it is of conceptual importance. The difficulties include that the full range of treatments may not be known (especially if the independent variable is essentially nominal categories rather than particular values of a score). More generally, researchers tend to use a fixed effects model in which the researcher selects which levels of treatment to use on the basis of some reasoned decisions (which do not involve random selection). The type of model has a bearing on ANOVA calculations in particular. Generally speaking, the random effects model is rarely employed in branches of many social science disciplines simply because of the practical and conceptual difficulties of identifying different levels of the independent variable from which to sample randomly. See also: fixed effects; levels of treatment

random number tables: sometimes known as pseudo-random number tables. These were particularly useful in the past when researchers did not have ready access to computers capable of generating random sequences. As the name implies, these are tables of random sequences of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. Typically they consist of many pages of numbers. Usually the sequences are listed in columns with, say, every six digits separated from the next by a space. For example,

901832 143912 861543 672315 531250
167321 784321 422649 216975 356217
636298 157823 916421 436126 752318
264843 652365 132863 254731 534174

To use the table, typically the researcher would choose a starting point (with eyes shut using a pin, for example) and a pre-specified interval. So if the starting point selected was



the 8 in the second group of the second row above and numbers in the range 01 to 90 were needed for selection, then the first number would be 84. Assume that the interval selected was seven digits. The researcher would then ignore the next seven digits (4321 422) and so the next chosen number would be 64 and the one after that 35. If a selection were repeated by chance, then the researcher would simply ignore the repetition and proceed to the next number instead. So long as the rules applied by the researcher are clear and consistently applied then these tables are adequate. The user should not be tempted to depart from the procedure chosen during the course of selection in any circumstances otherwise the randomness of the process is compromised. The tables are sometimes known as pseudo-random numbers partly because they were generated using mathematical formulae that give the appearance of randomness but at some level actually show complex patterns. True random numbers would show no patterns at all no matter how hard they are sought. For practical purposes, the problem of pseudo-randomness does not undermine the usefulness of random number tables for the purposes of most researchers in the social sciences since the patterns identified are very broad. See also: randomization

random sampling: see simple random sampling

random sampling with replacement: see sampling with replacement

random selection: see randomization

random variable: one meaning of this term is a variable whose values have not been chosen. For example, age is a random variable if we simply select people regardless of what age they are. The opposite of a random variable is a fixed variable for which particular values have been chosen. For example, we may select people of particular ages or who lie within particular age ranges. Another meaning of the term is that it is a variable with a specified probability distribution.

randomization: also known as random allocation or assignment. It is used in experiments as a means of 'fairly' allocating participants to the various conditions of the study. Random allocation methods are those in which every participant or case has an equal likelihood of being placed in any of the various conditions. For example, if there are two different conditions in a study (an experimental condition and a control condition), then randomization could simply be achieved by the toss of a coin for each participant. Heads could indicate the experimental condition, tails the control condition. Other methods would include drawing slips from a container, the toss of dice, random number tables and computer-generated random number tables. For example, if there were four conditions in a study, the different conditions could be designated A, B, C and D, equal numbers of identical slips of paper labelled A, B, C or D placed in a container, and the slips drawn out one by one to indicate the particular condition of the study for a particular participant.

Randomization can also be used to decide the order of the conditions in which a particular individual will take part in repeated-measures designs in which the individual serves in more than one condition.

Randomization (random allocation) needs to be carefully distinguished from random selection. Random selection is the process of selecting samples from a population of potential participants. Random allocation is the process of allocating individuals to the conditions of an experiment. Few experiments in the social sciences employ random selection whereas the vast majority use random allocation. Failure to select participants randomly from the population affects the external validity of the research since it becomes impossible


to specify exactly what the population in question is and, hence, who the findings of the research apply to. The failure to use randomization (random allocation) affects the internal validity of the study – that is, it is impossible to know with any degree of certainty why the different experimental conditions produced different outcomes (or even why they produced the same outcome). This is because randomization (random allocation) results in each condition having participants who are similar to each other on all variables other than the experimental manipulation. Because each participant is equally likely to be allocated to any of the conditions, it is not possible to have systematic biases favouring a particular condition in the long run. In other words, randomization helps maximize the probability that the obtained differences between the different experimental conditions are due to the factors which the researcher systematically varied (i.e. the experimental manipulation).

Randomization can proceed in blocks. The difficulty with, for example, random allocation by the toss of a coin is that it is perfectly possible that very few individuals are allocated to the control condition or that there is a long run of individuals assigned to the control condition before any get assigned to the experimental condition. A block is merely a subset of individuals in the study who are treated as a unit in order to reduce the possible biases which simple randomization cannot prevent. For example, if one has a study with an experimental condition (A) and a control condition (B), then one could operate with blocks of two participants. These are allocated at the same time – one at random to the experimental group and the other consequently to the control group. In this way, long runs are impossible as would be unequal numbers in the experimental and control groups. There is a difficulty with this approach – that is, it does nothing to prevent the possibility that the first person of the two is more often allocated to the experimental condition than the control condition. This is possible in random sampling in the short term. This matters if the list of participants is structured in some way – for example, if the participants consisted of the head of a household followed by their spouse on the list, which

could result in more employed people being in the experimental condition. Consequently, the researcher might consider using a system in which runs are eliminated by running participants in fours with the first two assigned randomly to AB and the next two to BA.
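As an illustrative sketch only, block randomization of the kind just described can be written in a few lines; the block size of two conditions is taken from the example above and the function name is chosen for convenience.

    import random

    random.seed(42)

    def block_randomize(n_participants, conditions=("A", "B")):
        # Allocate participants in blocks: within each block every condition
        # appears once, in a randomly shuffled order, so long runs and unequal
        # group sizes cannot occur
        allocation = []
        while len(allocation) < n_participants:
            block = list(conditions)
            random.shuffle(block)
            allocation.extend(block)
        return allocation[:n_participants]

    print(block_randomize(10))  # e.g. ['B', 'A', 'A', 'B', ...]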

Failure to understand the importance of and careful procedures required for randomization can lead to bad practices. For example, it is not unknown for researchers to speak of randomization when they really mean a fairly haphazard allocation system. A researcher who allocates alternate participants to the experimental and control conditions is not randomly allocating. Furthermore, ideally the randomization should be conducted by a disinterested party since it is possible that the randomization procedure is ignored or disregarded for seemingly good reasons which nevertheless bias the outcome. For example, a researcher may be tempted to put an individual in the control condition if they believe that the participant may be stressed or distressed by the experimental procedure.

Randomization may be compromised in other ways. In particular, the different conditions of an experiment may produce differential rates of refusal or drop-out. A study of the effects of psychotherapy may have a treated group and a non-treated group. The non-treated group may have more drop-outs simply because members of this group are arrested more by the police and are lost to the researchers due to imprisonment. This, of course, potentially might affect the outcome of the research. For this reason, it is important to keep notes on rates of refusal and attrition (loss from the study).

Finally, some statistical tests are based on randomized allocations of scores to conditions. That is, the statistic is based on the pattern of outcomes that emerge in the long run if one takes the actual data but then randomly allocates the data back to the various conditions such as the experimental and control group. In other words, the probability of the outcome obtained in the research is assessed against the distribution of outcomes which would apply if the distribution of scores were random. Terms like resampling, bootstrapping and permutation methods describe different means of achieving statistical


tests based on randomization. See also: control group

randomized block design: see matching

range: the size of the difference between the largest and the smallest score in a set of scores. Thus, if the highest value obtained is 18 and the lowest value is 11 then the range is 18 − 11 = 7. There is also the interquartile range which is the range of the middle 50% of scores.

One common misunderstanding is to present the range as the largest value to the smallest value. This is not the range – the difference numerically between the extreme values constitutes the range. See also: dispersion, measure of

rank measurement: see measurement

rank order: see ordinal level of measurement

ranking tests: a term often used to denote non-parametric statistical tests in general but more appropriate for those techniques in which scores or differences are converted into ranks prior to the application of the statistical formulae. It should really refer to tests where ranking is carried out. For example, there are versions of the t test which utilize ranks whereas the t test is usually regarded as operating directly with scores. See distribution-free tests

ratio: a measure which shows the relative size of two numbers. It can be expressed as two numbers separated by a colon, as one

number divided by another or as the result of that division. For example, the ratio of 12 to 8 can be expressed as 12:8, 12/8 or 1.5. See also: score

ratio level of measurement, scale or variable: a measure in which the adjacent intervals between the points of the scale are of equal extent and where the measure has an absolute zero point. For example, the measure or scale of age in years has adjacent intervals of equal extent. The extent of the interval between age 7 and 8 is the same as that between age 8 and 9, namely 1 year. It also has an absolute zero point in that the lowest possible age is 0. It is called a ratio measure because ages can be expressed as ratios. Age 10 is twice as old as age 5 (10:5 = 2:1). It is frequently suggested that ratio and interval levels of measurement, unlike ordinal ones, should be analysed with parametric statistics. See also: measurement

reciprocal relationship: see bi-directional relationship

recursive relationship: see uni-directional relationship

refusal rates: participation in research is essentially a voluntary action for virtually all studies actively involving people. Participants may refuse to take part in the research study at all or refuse to take part in aspects of the research (e.g. they may be happy to answer any question but the one to do with their age). Researchers should keep a record of the persons approached and as much information as possible about their characteristics. Rates of refusal should be presented and any comparisons possible between participants and refusers identified. Some forms of research have notoriously


high refusal rates, such as telephone surveys and questionnaires sent out by post. Response rates as low as 15% or 20% would not be unusual.

Refusal rates are a problem because of the (unknown) likelihood that they are different for different groupings of the sample. Research is particularly vulnerable to refusal rates if its purpose is to obtain estimates of population parameters from a sample. For example, if intentions to vote are being assessed then refusals may bias the estimate in one direction or another. High refusal rates are less problematic when one is not trying to obtain precise population estimates. Experiments, for example, may be less affected than some other forms of research such as surveys.

Refusal rates are an issue in interpreting the outcomes of any research and should be presented when reporting findings. See also: attrition; missing values

regression: see heteroscedasticity; intercept; multiple regression; regression equation; simple or bivariate regression; standard error of the regression coefficient; stepwise entry; variance of estimate

regression coefficient: see standardized or unstandardized partial regression coefficient or weight

regression equation: expresses the linear relationship between a criterion or dependent variable (symbolized as Y) and one or more predictor or independent variables (symbolized as X if there is one predictor and as X1, X2 to Xk where there is more than one predictor).

Y can express either the actual value of the criterion or its predicted value. If Y is the predicted value, the regression equation takes the following form:

Y = a + β1X1 + β2X2 + . . . + βkXk

If Y is the actual value, the regression equation includes an error term (or residual) which is symbolized as e. This is the difference between the predicted and actual value:

Y = a + β1X1 + β2X2 + . . . + βkXk + e

X represents the actual values of a case. If we substitute these values in the regression equation together with the other values, we can work out the value of Y. β is the regression coefficient if there is one predictor and the partial regression coefficient if there is more than one predictor. It can represent either the standardized or the unstandardized regression coefficient. The regression coefficient is the amount of change that takes place in Y for a specified amount of change in X. To calculate this change for a particular case we multiply the actual value of that predictor for that case by the regression coefficient for that predictor. a is the intercept, which is the value of Y when the value of the predictor or predictors is zero.
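
As a small illustration of the predicted-value form of the equation, the following Python sketch evaluates Y = a + β1X1 + β2X2 with invented coefficient values (the numbers are purely hypothetical):

def predict_y(intercept, coefficients, predictor_values):
    # Predicted Y = a + b1*X1 + b2*X2 + ... using unstandardized coefficients.
    return intercept + sum(b * x for b, x in zip(coefficients, predictor_values))

# Hypothetical equation: Y = 1.5 + 0.8*X1 + 0.3*X2, evaluated for X1 = 10, X2 = 20.
print(predict_y(1.5, [0.8, 0.3], [10, 20]))   # 1.5 + 8.0 + 6.0 = 15.5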

regression line: a straight line drawn through the points on a scatter diagram of the values of the criterion and predictor variables so that it best describes the linear relationship between these variables. If these values are standardized scores the regression line is the same as the correlation line. This line is sometimes called the line of best fit in that this straight line comes closest to all the points in the scattergram.

This line can be drawn from the values of the regression equation which in its simplest form is

predicted value = intercept + (regression coefficient × predictor value)

The vertical axis of the scatter diagram is used to represent the values of the criterion and the horizontal axis the values of the predictor. The intercept is the point on the vertical axis which is the predicted value of the criterion when the value of the predictor is 0. This predicted value will be 0 when the standardized regression coefficient is used. To draw a


straight line we need only two points. The intercept provides one point. The other point is provided by working out the predicted value for one of the other values of the predictor.

related designs: also known as related samples designs, correlated designs and correlated samples designs. This is a crucial concept in planning research and its associated statistical analysis. A related design is one in which there is a relationship between two variables. The commonest form of this design is where two measures are taken on a single sample at two different points in time. This is essentially a test–retest design in which changes over time are being assessed. There are obvious advantages to this design in that it involves fewer participants. However, its big advantage is that it potentially helps the researcher to gain some control over measurement error. All measures can be regarded as consisting of a component which reflects the variable in question and another component which is measurement error. For example, when asked a simple question like age, there will be some error in the replies as some people will round up, some round down, some will have forgotten their precise age, some might tell a lie, and so forth. In other words, their answers will not always be their true age because of the difficulties of measurement error.

The common circumstances in which related designs are used are:

• ones in which individuals serve as their own controls;

• where change over time in a sample is assessed;

• where twins are used as members of pairs, one of which serves in the experimental condition and the other serves in the control condition;

• where participants are matched according to characteristics which might be related to the dependent variable.

See also: counterbalanced designs; matching

related t test: see analysis of variance; t test for related samples

relationship: see causal relationship; correlation; curvilinear relationship; uni-directional relationship

reliability: a rather abstract concept since it is dependent on the concept of ‘true’ score. It is the ratio of the variation of the ‘true’ scores when measuring a variable to the total variation on that measure. In statistical theory, the ‘true’ score cannot be assessed with absolute precision because of factors, known as ‘error’, which give the score as measured. In other words, a score is made up of a true score plus an error score. Error is the result of any number of factors which can affect a score – chance fluctuations due to factors such as time of day, mood, ambient noise, and so forth. The influence of error factors is variable from time of measurement to time of measurement. They are unpredictable and inconsistent. As a consequence, the correlation between two measures will be less than perfect.

More concretely, reliability is assessed in a number of different ways such as test–retest reliability, split-half reliability, alpha reliability and interjudge (coder, rater) reliability. Since these are distinctive and different techniques for assessing reliability, they should not be confused with the concept itself. In other words, they will provide different estimates of the reliability of a measure.

Reliability is not really an invariant characteristic of a measure since it is dependent on the sample in question and the circumstances of the measurement as well as the particular means of assessing reliability employed.

Not all good measures need to be reliable in every sense of the term. So variables which are inherently unstable and change from occasion to occasion do not need good test–retest reliability to be good measures. Mood is a good example of such an unstable variable. See also: attenuation, correcting correlations for; Spearman–Brown formula


repeated-measures analysis of variance or ANOVA: an analysis of variance which is based on the same or similar (matched) cases and where cases are tested more than once on one or more factors. For example, each case or a matched case may take part in each of three different conditions.

repeated-measures design: see analysis of variance; carryover or asymmetrical transfer effect; randomization; within-groups variance; within-subjects design

representative sample: a sample which is representative of the population from which it is drawn. For example, if 55% of the population is female, then a similar percentage of the sample should be female. Some form of random sampling should be used to ensure that the cases are representative of those in the population.

resampling techniques: involve the use of the scores obtained in the study to produce a ‘sampling’ distribution of the possible outcomes of the study based on that data. Take the simple example of a study comparing two groups (A and B) on the variable C. The data might look as in Table R.1.

If the null hypothesis were true (that there is no difference between group A and group B), then the distribution of scores between group A and group B is just haphazard. If this is so, then what we can do is to collect together the scores for group A and group B and then randomly allocate each score to either group A or group B. This will produce two samples which can be compared. Repeating the process will produce increasing numbers of pairs of samples each of which are easily compared. So, based on the data above, it is possible to produce a sampling difference between all of the possible combinations of scores in the table. This, in effect, is like any other sampling distribution

except that it is rigidly limited by the scores in the data. They are the same scores but resampling assigns them to group A and group B differently.

There are a number of statistical techniques which employ such procedures. Their major difficulty is that they require computer software capable of generating the random sampling distribution. See also: bootstrapping
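
A minimal sketch of such a randomization (permutation) test in Python, using invented scores rather than the values in Table R.1; it repeatedly reallocates the pooled scores to two groups of the original sizes and counts how often the reshuffled difference in means is at least as large as the observed one:

import random

def permutation_test(a, b, n_resamples=10000, seed=0):
    # Approximate two-tailed p value for the difference between two group means.
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        if abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b)) >= observed:
            extreme += 1
    return extreme / n_resamples

# Hypothetical scores for groups A and B:
print(permutation_test([4, 2, 5, 3, 2], [6, 9, 7, 6, 8]))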

residual: the difference between the actual value of a criterion or dependent variable and its predicted value. Larger differences or residuals imply that the predictions are less accurate. The concept is commonly used to indicate the disparity between the data and the statistical model for that data. It has the big advantage of simplicity. Careful scrutiny of the residuals will help a researcher identify where the model or statistic is especially poor at predicting the data.

The variance of the residual (scores) is quite simply the residual variance.

response rate: the proportion or percentage of cases who take part in a study or in different stages of a study. People or organizations may not be contactable or may not agree to participate in a study when approached to do so. If the sample is supposed to be a representative one or if it is to be tested on more than one occasion, a low response rate will increase the chances of the sample being less representative.


Table R.1 Possible data for resampling

Group A    Group B
   4          6
   2          9
   5          7
   3          6
   2          3
   1          8
   4          9
   3          6
   2          7
   4


response sets: see acquiescence or yea-saying response set or style

robust: a robust test is one which will find a relationship to be statistically significant when there is such a relationship even when the assumptions underlying the test about the distribution of the data are not met.

rotation of factors: see exploratory factor analysis; simple solution; varimax rotation, in factor analysis

rounding decimal places: the process by which long strings of decimals are shortened. To avoid systematic biases in reporting findings (albeit very small ones in most terms), there are rules in terms of how decimals are shortened. These involve checking the decimal place after the final decimal place to be reported. If that ‘extra’ decimal place has a numerical value between 0 and 4 then we simply report the shortened number by deleting the extra decimals. Thus

17.632 = 17.63 (to two decimal places)
103.790 = 103.79 (to two decimal places)
12.5731 = 12.573 (to three decimal places)

However, if the ‘additional’ decimal place is between 5 and 9 then the last figure reported in the decimal is increased by 1. For example,

17.639 is reported as 17.64 (to two decimal places)
103.797 is reported as 103.80 (to two decimal places)
12.5738 is reported as 12.574 (to three decimal places)

In general it is bad practice not to adopt this scheme. While much of the time it will not matter much, failure to do so leaves the researcher open to criticism – for example, when reporting exact significance levels at the margins of statistical significance.
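
For what it is worth, Python's built-in round() uses a 'round half to even' rule rather than the '5 to 9 rounds up' rule described here; the decimal module can reproduce the rule in this entry (a sketch, with the helper name invented):

from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    # Shorten to the requested number of decimal places, rounding a trailing
    # 5-9 upwards as described in the entry.
    quantum = Decimal(1).scaleb(-places)      # e.g. 0.01 for two places
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up(17.639, 2))    # 17.64
print(round_half_up(103.797, 2))   # 103.80
print(round_half_up(12.5731, 3))   # 12.573
# By contrast, round(0.5), round(1.5) and round(2.5) give 0, 2 and 2.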

rounding errors: many fractions of numbers cannot be handled with absolute precision in decimal form. No matter the number of decimal places, the figure will be slightly imprecise. Rounding errors occur when a calculation produces a slightly inaccurate value simply because of the use of too few decimal places, though some calculations are especially prone to such problems. Rounding errors can be seen, for example, when calculating percentages of cases. By rounding each percentage properly using the appropriate rules, sometimes the sum of the percentages will differ from 100%. Some researchers will draw attention to the fact that their frequencies do not sum to 100% because of rounding errors.

Rounding errors occasionally occur in computer statistical packages, giving what might be regarded as a slightly incorrect outcome. Mostly users will be unaware of these and rarely is it of material importance.

Roy’s gcr or greatest characteristic root criterion: a test used in multivariate statistical procedures such as canonical correlation, discriminant function analysis and multivariate analysis of variance to determine whether the means of the groups differ on a discriminant function or characteristic root. As the name implies, it only measures differences on the greatest or the first canonical root or discriminant function. See also: Wilks’s lambda

Tabachnick and Fidell (2001)

r to z transformation: see Fisher’s z transformation

run: an uninterrupted series of one or more identical events in a sequence of events which is followed and preceded by different events or no events. For example, the following sequence of heads (H) and tails (T), HHTHH, contains a run of two heads, followed by a run of one tail and a run of two heads. The


total number of runs in the sequence of events indicates whether the sequence of events is likely to be random. An unusually small number of runs would indicate non-randomness. For example, the following sequence of heads and tails, HHHHTTTT, consists of two runs and suggests that each event is grouped together and not random.
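
Counting runs is straightforward to do by hand or in code; the following Python sketch (helper name invented) counts maximal stretches of identical adjacent events:

def count_runs(sequence):
    # A run ends whenever the next event differs from the previous one.
    runs = 0
    previous = object()          # sentinel that matches nothing
    for event in sequence:
        if event != previous:
            runs += 1
            previous = event
    return runs

print(count_runs("HHTHH"))      # 3 runs: HH, T, HH
print(count_runs("HHHHTTTT"))   # 2 runs, suggesting non-randomness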

Ryan or Ryan–Einot–Gabriel–Welsch F (REGWF) multiple comparison test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. It may be used regardless of whether the analysis of variance is significant. It assumes equal variance and is approximate

for unequal group sizes. It is a stepwise or sequential test which is a modification of the Newman–Keuls method, procedure or test. It is based on the F distribution rather than the Q or studentized range distribution. This test is generally less powerful than the Newman–Keuls method in that differences are less likely to be statistically significant.

Toothaker (1991)

Ryan or Ryan–Einot–Gabriel–Welsch Q (REGWQ) multiple comparison test: this is like the Ryan or Ryan–Einot–Gabriel–Welsch F (REGWF) multiple comparison test except that it is based on the Q rather than the F distribution.

Toothaker (1991)


sample: a set of cases drawn or selected from a larger set or population of cases, usually with the aim of estimating characteristics of the larger set or population. For example, we may be interested in finding out what the relationship is between childhood sexual abuse and subsequent psychiatric disorder. Because it would not be possible or practical to investigate this relationship in the population, we take a sample of the population and study this relationship in the sample. From this sample we can determine to what extent this relationship is likely to be found in the population from which the sample was drawn. There are various ways of sampling a population. See also: convenience sample

sample size: the number of cases or individuals in the sample studied. Usually represented by the symbol N. Sometimes because of missing data the sample size for parts of the statistical analysis is reduced. In computer analysis, care is needed to check on final sample sizes because of this. In inferential statistics, sample size is important as the test distribution often varies with sample size (or the degrees of freedom which are sometimes based on an adjustment to sample size).

sample size (minimum): the minimum size of sample needed to run a study is a complex matter to assess. Nevertheless it is a

frequently asked question. It depends on a range of factors. The most important is the researcher’s expectations of the likely size of the trend in the data. The greater the trend, the smaller the appropriate sample size. Experienced researchers in a particular field are likely to be able to estimate an appropriate sample size based on experience and convention in the particular field of study. In terms of estimating the minimum sample size to use, the most practical advice is to use a sample size which has proven effective in other research using similar measures, samples and methods to the proposed study.

Other researchers may have an idea of the minimum effect or relationship that would be of practical or theoretical interest to them. If one requires a fairly large effect to make the outcome of the research of value, one could use a commensurately smaller sample size.

Small sample sizes with a large trend in the data are adequate to establish statistical significance. Unfortunately, small sample sizes lack intuitive credibility in the minds of other researchers, the public and research users and are best avoided unless the scarcity of appropriate participants for the study in question makes it impracticable to obtain a larger sample. Sample sizes should never be so small that statistical significance is not properly testable. These minimal sample sizes can be estimated by examining tables of significance for particular statistical techniques.

Very large samples, apart from the obvious time and cost disadvantages, have an obvious drawback. That is, very minimal trends


in the data may prove to be statistically significant with very large samples. For example, with a sample size of 1000 a Pearson correlation coefficient of 0.062 is significant at the 5% level of significance. It only explains (accounts for) four-thousandths of the variation. Such a small correlation is virtually negligible, or negligible for most purposes. The equally difficult issue of what size of correlation is worthy of the researcher’s attention becomes more central in these circumstances.

The first concern of the researcher should be to maximize the effectiveness of the measures and procedures used. Poor measures (ones which are internally unreliable) of a variable will need larger sample sizes to produce statistically significant findings, all other things being equal. The maximum value of correlations, for example, between two unreliable measures (as all measures are to a degree in practice) will be limited by the reliabilities of the two measures involved. This can be corrected for if a measure of the reliability of each measure is available. Care in the wording and piloting of questionnaires is appropriate in an attempt to improve reliability of the measures. Poor, unstandardized procedures for the conduct of the study will increase the uncontrolled variability of responses in the study. Hence, in some fields of research rigorous and meticulous procedures are adopted for running experiments and other types of study.

The following are recommended in the absence of previous research. Carry out a small pilot study using the procedures and materials to be implemented in the full study. Assess, for the size of effect found, what the minimum sample size is that would be required for significance. For example, for a given t value or F value, look up in tables what N or df would be required for significance and plan the final study size based on these. While one could extend the sample size if findings are near significance, often the analysis of findings proceeds after the study is terminated and restarting is difficult. Furthermore, it is essentially a non-random process to collect more data in the hope that eventually statistical significance is obtained.
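
One way to carry out the 'look up what N would be required' step is sketched below for a pilot correlation, assuming SciPy is available; it uses the t formula for a correlation-sized effect, t = r√[(N − 2)/(1 − r²)], given later under simple or bivariate regression, and searches for the smallest N whose two-tailed p falls below 0.05:

from scipy import stats

def minimum_n_for_r(r, alpha=0.05, max_n=100000):
    # Smallest sample size at which a correlation of size r would reach
    # two-tailed significance at the chosen alpha level.
    for n in range(4, max_n):
        t = abs(r) * ((n - 2) / (1 - r ** 2)) ** 0.5
        if 2 * stats.t.sf(t, df=n - 2) < alpha:
            return n
    return None

print(minimum_n_for_r(0.3))     # roughly 44 cases for a pilot r of 0.3
print(minimum_n_for_r(0.062))   # around a thousand cases for r = 0.062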

In summary, the question of appropriate sample size is a difficult one for a number of

reasons which are not conventionally discussed in statistics textbooks. These include:

1 the purpose of the research
2 the costs of poor statistical decisions
3 the next stage in the research process – is this a pilot study, for example?

sample standard deviation: see estimated standard deviation

sample variance: see variance estimate

sampling: see cluster sample; convenience sample or sampling; multistage sampling; probability sampling; quota sampling; representative sample; simple random sampling; snowball sampling; stratified random sample; systematic sample

sampling distribution: the characteristics of the distribution of the means (etc.) of numerous samples drawn at random from a population. A sampling distribution is obtained by taking repeated random samples of a particular size from a population. Some characteristic of the sample (usually its mean but any other characteristic could be chosen) can then be studied. The distribution of the means of samples, for example, could then be plotted on, say, a histogram. This distribution in the histogram illustrates the sampling distribution of the mean. Different sample sizes will produce different distributions (see standard error of the mean). Different population distributions will also produce different sampling distributions. The concept of sampling distribution is largely of theoretical importance in terms of the needs of practitioners in the social sciences. It is most usually associated with testing the null hypothesis.
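
The idea is easy to simulate. The following Python sketch (function name invented) draws repeated random samples from a made-up population and collects their means; plotting the result as a histogram would show the sampling distribution of the mean:

import random
import statistics

def sampling_distribution_of_mean(population, sample_size, n_samples=5000, seed=1):
    # Collect the mean of many random samples of the same size.
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

population = list(range(1, 101))                      # an artificial population of scores
means = sampling_distribution_of_mean(population, sample_size=10)
print(round(statistics.mean(means), 1), round(statistics.stdev(means), 1))
# The sample means centre on the population mean; their spread shrinks as sample size grows.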


sampling error: the variability of samples from the characteristics of the population from which they came. For example, the means of samples will tend to vary from the mean of the population from which they are taken. This variability from the population mean is known as the sampling error. It might be better termed sampling variability or sampling uncertainty since it is not a mistake in the usual sense but a feature of random sampling from a population.

sampling frame: in surveys, the sampling frame is the list of cases from which the sample is selected. Easily obtained sampling frames would include telephone directories and lists of electors. These have obvious problems in terms of non-representativeness – for example, telephone directories only list people with telephones who are responsible for paying the bill. It is extremely expensive to draw up a sampling frame where none is available – hence the willingness of researchers to use less than optimum sources. See quota sampling; sampling with replacement; simple random sampling; stratified random sample

sampling with replacement: means that once something has been selected randomly, it is replaced into the population and, consequently, may be selected at random again. Curiously, although virtually all statistical calculations are based on sampling with replacement, the practice is not to replace into the population. The reasons for this are obvious. There is little point in giving a person a questionnaire to fill in twice or three times simply because they have been selected two or three times at random. Hence, we do not replace.

While this is obviously theoretically unsatisfactory, in practice it does not matter too much since with fairly large sampling frames the chances of being selected twice or more times are too small to make any noticeable difference to the outcome of a study.
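
In Python the distinction is the difference between random.sample() (without replacement) and random.choices() (with replacement); a small sketch with an invented sampling frame:

import random

frame = list(range(1, 1001))                       # a hypothetical sampling frame of 1000 cases

without_replacement = random.sample(frame, 100)    # no case can be drawn twice
with_replacement = random.choices(frame, k=100)    # the same case may be drawn again

print(len(set(without_replacement)), len(set(with_replacement)))
# The first is always 100; the second is usually slightly fewer because of repeats.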

SAS: an abbreviation for Statistical Analysis System. It is one of several widely used statistical packages for manipulating and analysing data. Information about SAS can be found at the following website: http://www.sas.com/products/index.html

scale: generally a measure of a variable which consists of one or more items where the score for that scale reflects increasing quantities of that variable. For example, an anxiety scale may consist of a number of questions or statements which are designed to indicate how anxious the respondent is. Higher scores on this scale may indicate higher levels of anxiety than lower scores.

scatter diagram, scattergram or scatterplot: a graph or diagram which plots the position of cases in terms of their values on two quantitative variables and which shows the way the two variables are related to one another, as shown in Figure S.1. The vertical or Y axis represents the values of one variable which in a regression analysis is the criterion or dependent variable. The horizontal or X axis represents the values of the other variable which in a regression analysis is the predictor or independent variable. Each point on the diagram indicates the two values on those variables that are shown by one or more cases. For example, the point towards the bottom left-hand corner in Figure S.1 represents an X value of 1 and a Y value of 2.

The pattern of the scatter of the points indicates the strength and the direction of the relationship between the two variables. Values on the two axes are normally arranged so that values increase upwards on the vertical scale and rightwards on the horizontal scale as shown in the figure. The closer the points are to a line that can be drawn through them, the stronger the relationship is. In the case of a linear relationship between the two variables, the line is a straight one which is called a regression or correlation line depending on what the statistic is. In the case of a curvilinear


relationship the line is a curved one. If there does not appear to be a clear linear or curvilinear relationship between the two variables, there is no relationship between the variables. A straight line sloping from lower left to upper right, as illustrated in the figure, indicates a positive or direct linear relationship between the two variables. A straight line running from upper left to lower right indicates a negative or inverse linear relationship between the two variables. See also: correlation; perfect correlation

Scheffé test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. Equal variances are assumed but it is suitable for unequal group sizes. Generally, it is regarded as one of the most conservative post hoc tests. The test can compare any contrast between means and not just comparisons between pairs of means. It is based on the F statistic which is weighted according to the number of groups minus one. For example, the 0.05 critical value of F is about 4.07 for a one-way analysis of variance comprising four groups with three cases in each group. For the Scheffé test this critical value is increased to 12.21 [4.07 × (4 − 1) = 12.21]. In other words, a comparison has to have an F value of 12.21 or greater to be statistically significant at the 0.05 level.

The following formula is used to calculate the F value for the comparison:

F = [(group 1 weight × group 1 mean) + (group 2 weight × group 2 mean) + . . .]² / {error mean square × [(group 1 weight²/group 1 n) + (group 2 weight²/group 2 n) + . . .]}

The numerator of the formula represents the comparison. If we take the case where there are four groups and we wanted to compare the mean of groups 1 and 2, the weights for this comparison could be

1   −1   0   0

Suppose the means for the four groups are 6, 14, 12 and 2 respectively, as shown in the example for the entry under the Newman–Keuls method. As multiplying a number by 0 gives 0, the comparison reduces to subtracting the group 2 mean from the group 1 mean and squaring the difference. The difference is −8, which when squared gives a value of 64:

[(1 × 6) + (−1 × 14) + (0 × 12) + (0 × 2)]² = [6 + (−14) + 0 + 0]² = (−8)² = 64

The denominator represents the weighted error mean square. The error mean square for this example is 9.25. As dividing 0 by a number gives 0, we are weighting the error mean square by groups 1 and 2 only:

9.25 × [(1²/3) + ((−1)²/3) + (0²/3) + (0²/3)] = 9.25 × (1 + 1 + 0 + 0)/3 = 9.25 × 2/3 = 9.25 × 0.67 = 6.20

The F value for this comparison is the squared difference of 64 divided by the weighted error mean square of 6.20, which gives 10.32 (64/6.20 = 10.32). As this F value is smaller than the critical value of 12.21, these two means do not differ significantly. See also: analysis of variance
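
The worked arithmetic can be reproduced with a short Python sketch (the function is illustrative, not a standard library routine); note that keeping full precision for the weighted error mean square (about 6.17 rather than the rounded 6.20) gives an F of about 10.38:

def scheffe_contrast_f(means, weights, ns, error_mean_square):
    # F for a single contrast: (sum of weight * mean) squared, divided by the
    # error mean square weighted by the sum of weight squared / group n.
    numerator = sum(w * m for w, m in zip(weights, means)) ** 2
    denominator = error_mean_square * sum(w ** 2 / n for w, n in zip(weights, ns))
    return numerator / denominator

f = scheffe_contrast_f([6, 14, 12, 2], [1, -1, 0, 0], [3, 3, 3, 3], 9.25)
print(round(f, 2))    # about 10.38, still well below the Scheffé critical value of 12.21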

Kirk (1995)

Figure S.1 An example of a scatter diagram (a scatterplot of Y against X, both axes running from 0 to 10)

score: a numerical value which indicates the quantity or relative quantity of a variable/construct. In other words, it is normally the data collected from an individual case on a particular variable or measure. Scores include ratio, interval and ordinal measurement. The alternative to scores is nominal data (or categorical or category or qualitative data).

In statistics, there is a conceptual distinction drawn between two components – the ‘true’ score and the ‘error’ score. A score = ‘true’ score + ‘error’ score. These two components can, at best, only be estimated. The true score is really what the score would be if it were possible to eliminate all forms of measurement error. If one were measuring the weight of an individual, measurement error would include inconsistencies because of inconsistencies of the scales, operator recording errors, variations in weight according to the time of day the measure was taken, and other factors.

Error scores and true scores will not correlate by definition. If they did correlate, then the two would be partially indistinguishable. The greater the proportion of true score in a score, the better in terms of the measure’s ability to measure something substantial.

Often the ‘true’ score is estimated by using the ‘average’ of several different measures of a variable just as one might take the average of several measures of length as the best measure of the length of something. Thus, in some research designs (related designs) the measurement error is reduced simply because the measure has been taken more than once. The term measurement error is a little misleading since it really applies to aspects of the measurement which the researcher does not understand or over which the researcher has no control.

It is also important to note that measurement error is uncorrelated with the score and the true score. As such, consistent biases in measurement (e.g. a tape measure which consistently overestimates) are not reflecting measurement error in the statistical sense. These errors would correlate highly with the correct measures as measured by an accurate rule. See also: categorical (category) variable; error

scree test, Cattell’s: one test for determining the number of factors to be retained for rotation in a factor analysis. The eigenvalue or the amount of variance accounted for by each factor is plotted against the number of the factor on a graph like that shown in Figure S.2. The vertical axis represents the eigenvalue while the horizontal axis shows the number of the factors in the order they are extracted, which is in terms of decreasing size of their eigenvalue. So, the first factor has the greatest eigenvalue, the second factor the next largest eigenvalue, and so on. Scree is a geological term to describe the debris that accumulates at the bottom of a slope and that obscures it. The factors to be retained are those represented by the slope while the factors to be discarded are those represented by the scree. In other words, the factors to be kept are the number of the factor just before the scree begins. In Figure S.2 the scree appears to begin at factor number 3, so the first two of the 13 factors should be retained. The point at which the scree begins may be determined by drawing a straight line through or very close to the points represented by the scree as shown in Figure S.2. The factors above the first line represent the number of factors to be retained. See also: exploratory factor analysis

Cattell (1966)

Figure S.2 An example of a scree test (eigenvalues plotted against factor number for 13 factors)

second-order factors: factors obtained in factor analysis from factor analysing the results of an oblique factor rotation which allows the factors to correlate. The correlation matrix of the factors can then be factor analysed to give new factors which are factors of factors. Hence, they are termed second-order factors. Care should be taken to recognize the difference as some researchers and theorists report second-order factors that seem materially different from the factors of other researchers. That is, because they are a different order of factor analysis. See also: exploratory factor analysis; oblique rotation; orthogonal rotation

second-order interactions: an interaction between three independent variables or factors (e.g. A × B × C). Where there are three independent variables, there will also be three first-order interactions which are interactions between the independent variables taken two at a time (A × B, A × C and B × C). Obviously, there cannot be an interaction for a single variable.

semi-interquartile range: see quartile deviation

semi-partial correlation coefficient: similar to the partial correlation coefficient. However, the effect of the third variable is only removed from the dependent variable. The independent variable would not be adjusted. Normally, the influence of the third variable is removed from both the independent and the dependent variable in partial correlation. Semi-partial correlation coefficients are rarely presented in reports of statistical analyses but they are an important component of multiple regression analyses.

What is the point of semi-partial correlation? Imagine we find that there is a relationship between intelligence and income of 0.50. Intelligence is our independent variable and income our dependent variable for the present purpose. We then find out that social class is correlated with income at a level of 0.30.

In other words, some of the variation in income is due to social class. If we remove the variation due to social class from the variable income, we have left income without the influence of social class. So the correlation of intelligence with income adjusted for social class is the semi-partial correlation. But there is also a correlation of social class with our independent variable intelligence – say that this is the higher the intelligence, the higher the social class. Partial correlation would take off this shared variation between intelligence and social class from the intelligence scores. By not doing this, semi-partial correlation leaves the variation of intelligence associated with social class still in the intelligence scores. Hence, we end up with a semi-partial correlation in which intelligence is exactly the variable it was when we measured it and unadjusted in any way. However, income has been adjusted for social class and is different from the original measure of income.

The computation of the partial correlation coefficient in part involves adjusting the correlation coefficient by taking away the variation due to the correlation of each of the variables with the control variable. For every first-order partial correlation there are two semi-partial correlations depending on which of the variables x or y is being regarded as the dependent or criterion variable.

This is the partial correlation coefficient:

rxy.c = [rxy − (rxc × ryc)] / [√(1 − r²xc) × √(1 − r²yc)]

The following are the two semi-partial correlation coefficients:

ry(x.c) = [rxy − (rxc × ryc)] / √(1 − r²xc)

rx(y.c) = [rxy − (rxc × ryc)] / √(1 − r²yc)

Cramer Chapter-S.qxd 4/22/04 5:16 PM Page 149

Page 161: Dictionary of statistics

The difference between the formulae for partial and semi-partial correlation is obvious – the bottom of the semi-partial correlation formula is shorter. This is because no adjustment is being made for the variance shared between the control and predictor variables in semi-partial correlation. This in effect means that the semi-partial correlation coefficient is the correlation between the predictor and criterion variables with the correlation of each with the control variable removed. The amount of variation in the criterion variable is adjusted for by its correlation with the control variable. Since this adjustment is only applied to the criterion variable, the adjustment is smaller than for the partial correlation in virtually all cases. Thus, the partial correlation is almost invariably larger than (or possibly equal to) the semi-partial correlation. See also: multiple regression
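
The two formulae are easy to compare numerically. The sketch below uses the 0.50 and 0.30 correlations from the illustration above and an invented 0.40 correlation between intelligence and social class (the entry does not give this value):

def partial_r(rxy, rxc, ryc):
    # Correlation of x and y with the control variable c removed from both.
    return (rxy - rxc * ryc) / (((1 - rxc ** 2) ** 0.5) * ((1 - ryc ** 2) ** 0.5))

def semipartial_r(rxy, rxc, ryc):
    # Correlation of x and y with c removed from the criterion y only.
    return (rxy - rxc * ryc) / ((1 - ryc ** 2) ** 0.5)

rxy, rxc, ryc = 0.50, 0.40, 0.30      # the 0.40 is hypothetical
print(round(partial_r(rxy, rxc, ryc), 3))       # 0.435
print(round(semipartial_r(rxy, rxc, ryc), 3))   # 0.398, smaller as the entry predicts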

sequential method in analysis of variance: see Type I, hierarchical or sequential method in analysis of variance

sequential multiple regression: see multiple regression

Sidak multiple comparison test: see Dunn–Sidak multiple comparison test

sign test: a very simple test for differences in related data. It is called a sign test because the difference between each matched pair of scores is converted simply into the sign of the difference (zero differences are ignored). The smaller of the numbers of signs (i.e. the smaller of the number of pluses or the number of minuses) is identified. The probability is then assessed from tables or calculated using the binomial expansion. Statistical packages also do the calculation.

Despite its simplicity, generally the loss of information from the scores (the size of the differences being ignored) makes it a poor choice in anything other than the most exceptional circumstances.

The calculation of the sign test is usually presented in terms of a related design as in Table S.1. In addition, the difference between the two related conditions is presented in the third column and the sign of the difference is given in the final column.

We then count the number of positive signs in the final column (1) and the number of negative signs (8). The smaller of these is then used to check in a table of significance. From such a table we can see that our value for the sign test is statistically significant at the 5% level with a two-tailed test. That is, there is a significant difference between conditions A and B.

Table S.1 Illustrating data for the sign test plus calculating the sign of the difference

Condition A    Condition B    Difference    Sign of difference
    18             22             −4             −
    12             15             −3             −
    15             18             −3             −
    29             29              0             0 (zero differences are dropped from analysis)
    22             24             −2             −
    11             10             +1             +
    18             26             −8             −
    19             27             −8             −
    21             24             −3             −
    24             27             −3             −

The basic rationale of the test is that if the null hypothesis is true then the differences between the conditions should be positive and negative in equal numbers. The greater the disparity from this equal distribution of pluses and minuses, the less likely is the null hypothesis to be true; hence, the hypothesis is more supported.
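
The tabled probability can be obtained directly from the binomial expansion with p = 0.5; a Python sketch using the counts above (1 plus, 8 minuses):

from math import comb

def sign_test_p(n_plus, n_minus):
    # Two-tailed probability of a split this uneven or more so
    # when pluses and minuses are equally likely.
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

print(round(sign_test_p(1, 8), 3))   # 0.039, significant at the 5% level, two-tailed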

significance level or testing: see significant

significance testing: see probability; significant

significant: implies that it is not plausible that the research findings are due to chance. Hence, the null hypothesis (of no correlation or no difference) is rejected in favour of the (alternative) hypothesis. Significance testing is only a small part of assessing the implications of a particular study. Failure to reach significance may result in the researcher re-examining the methods employed, especially where the sample appears of a sufficient size. Furthermore, where the hypothesis under test is deemed important for some theoretical or applied reason the researcher may be more highly motivated to re-examine the research methods employed. If significance is obtained, then the researcher may feel in a position to examine the implications and alternatives to the hypothesis more fully.

The level at which the null hypothesis is rejected is usually set as 5 or fewer times out of 100. This means that such a difference or relationship is likely to occur by chance 5 or fewer times out of 100. This level is generally described as the proportion 0.05 and sometimes as the percentage 5%. The 0.05 probability level was historically an arbitrary choice but has been acceptable as a reasonable choice in most circumstances. If there is a reason to vary this level, it is acceptable to do so. So in circumstances where there might be very serious adverse consequences if the wrong decision were made about the hypothesis, then the significance level could be made more stringent at, say, 1%. For a pilot study involving small numbers, it might be reasonable to set significance at the 10% level since we know that whatever tentative

conclusions we draw they will be subjected to a further test.

Traditionally the critical values of a statistical test were presented in tables for specific probability levels such as 0.05, 0.01 and 0.001. Using these tables, it is possible to determine whether the critical value of the statistical test was equal to or lower than one of these less frequent probability levels and to report this probability level if this was the case, for example as 0.01 rather than 0.05. Nowadays, most statistical analyses are carried out using statistical packages which give exact probability levels such as 0.314 or 0.026. It is common to convert these exact probability levels into the traditional ones and to report the latter. Findings which have a probability of greater than 0.05 are often simply described as being non-significant, which is abbreviated as ns. However, it could be argued that it is more informative to present the exact significance levels rather than the traditional cut-off points.

Significance levels should also be described in terms of whether they concern only one tail of the direction of the results or the two tails. If there are strong grounds for predicting the direction of the results, the one-tailed significance level should be used. If the result was not predicted or if there were no strong grounds for predicting the result, the two-tailed level should be used. The significance level of 0.05 in a two-tailed test of significance is shared equally between the two tails of the distribution of the results. So the probability of finding a result in one direction (say, a positive difference or correlation) is 0.025 (which is half the 0.05 level) while the probability of finding a difference in the other direction (say, a negative difference or correlation) is also 0.025. In a one-tailed test, the 0.05 level is confined to the predicted tail. Consequently, the probability of a particular result being significant is half as great for a two-tailed as for a one-tailed test of significance. See also: confidence interval; hypothesis testing; probability

significantly different: see confidence interval; significant


simple or bivariate regression: describes the size and direction of a linear association between one quantitative criterion or dependent variable and one quantitative predictor or independent variable. The direction of the association is indicated by the sign of the regression coefficient in the same way as with a correlation coefficient. The lack of a sign denotes a positive association in which higher scores on the predictor are associated with higher scores on the criterion. A negative sign shows a negative or inverse association in which higher scores on the predictor go with lower scores on the criterion.

The size of the linear association can be expressed in terms of a standardized or an unstandardized regression coefficient. The standardized regression coefficient is the same as a correlation coefficient. It can vary from −1.00 to 1.00. The unstandardized regression coefficient can be bigger than this as it is based on the original rather than the standardized scores of the predictor. The unstandardized regression coefficient is used to predict the scores of the criterion. The bigger the coefficient, the stronger the association between the criterion and the predictor. Strength of the association is more difficult to gauge from the unstandardized regression coefficient as it depends on the scale used by the predictor. For the same-sized association, a scale made up of more points will produce a bigger unstandardized regression coefficient than a scale comprised of fewer points. These two regression coefficients are signified by the small Greek letter β or its capital equivalent B (both called beta), although which letter is used to represent which coefficient is not consistent.

These two coefficients can be calculated from each other if the standard deviation of the criterion and the predictor are known. A standardized regression coefficient can be converted into its unstandardized coefficient by multiplying the standardized regression coefficient by the standard deviation (SD) of the criterion and dividing it by the standard deviation of the predictor:

unstandardized regression coefficient = standardized regression coefficient × (criterion SD/predictor SD)

An unstandardized regression coefficient can be converted into its standardized coefficient by multiplying the unstandardized regression coefficient by the standard deviation of the predictor and dividing it by the standard deviation of the criterion:

standardized regression coefficient = unstandardized regression coefficient × (predictor SD/criterion SD)

The statistical significance of these two coefficients is the same and can be expressed as a t value. One formula for calculating the t value for the standardized regression coefficient is

t = standardized regression coefficient × √[(number of cases − 2)/(1 − squared standardized regression coefficient)]

A formula for working out the t value for the unstandardized regression coefficient is

t = unstandardized regression coefficient/standard error of the unstandardized regression coefficient
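
A small Python sketch of these conversions and of the t value for the standardized coefficient, using invented values (beta = 0.4, criterion SD = 10, predictor SD = 2, 30 cases):

def unstandardized_b(beta, criterion_sd, predictor_sd):
    return beta * criterion_sd / predictor_sd

def standardized_beta(b, criterion_sd, predictor_sd):
    return b * predictor_sd / criterion_sd

def t_for_beta(beta, n_cases):
    # t = beta * sqrt((N - 2) / (1 - beta squared))
    return beta * ((n_cases - 2) / (1 - beta ** 2)) ** 0.5

b = unstandardized_b(0.4, 10, 2)
print(b)                                 # 2.0
print(standardized_beta(b, 10, 2))       # back to 0.4
print(round(t_for_beta(0.4, 30), 2))     # 2.31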

simple random sampling: in random sampling every case in the population has an equal likelihood of being selected. Suppose, for example, that we want a sample of 100 people from a population of 1000 people. First, we draw up a sampling frame in which every member of the population is listed and numbered from 1 to 1000. We then need to select 100 people from this list at random. Random number tables could be used to do this. Alternatively, slips for the 1000 different numbers could be put into a big hat and 100 slips taken out to indicate the final sample. See also: probability sampling; systematic sample

simple solution: a term used in factor analysis to indicate the end point of rotation of factors. The initial factors in factor analysis are the product of a purely mathematical process which usually has as its aim the maximization of the amount of variance accounted for by each of the factors in turn. What this means in practice is that the factors should have as many variables as possible with reasonably high factor loadings. Unfortunately, such a mathematically based procedure does not always lead to factors which are readily interpretable. A simple solution is a set of criteria proposed by the psychologist L.L. Thurstone to facilitate the interpretation of the factors. The factors are rotated to a simple structure which has the key feature that the factor loadings should ideally be either large or small in numerical value and not of a middle value. In this way, the researcher has factors which can be defined more readily by the large loadings and small loadings without there being a confusing array of middle-size loadings which obscure meaning. See also: exploratory factor analysis; oblique rotation; orthogonal rotation

single-sample t test: see t test for one sample

skewed: see skewness

skewness: the degree of asymmetry in a frequency distribution. A perfectly symmetrical frequency distribution has no skewness. The tail of the distribution may be longer either to the left or to the right. If the tail is longer to the left then the distribution is negatively skewed (Figure S.3); if it is longer to the right then it is positively skewed. If the distribution of a variable is strongly skewed, it may be more appropriate to use a non-parametric test of statistical significance. A statistical formula is available for describing skewness and it is offered as an option in some statistical packages. See also: moment; standard error of skewness; transformations
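
The usual moment-based formula divides the mean cubed deviation from the mean by the cubed standard deviation; a self-contained Python sketch (some packages apply a small-sample correction, so their values may differ slightly):

def skewness(scores):
    # Negative values indicate a longer left tail, positive a longer right tail.
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5
    return sum((x - mean) ** 3 for x in scores) / (n * sd ** 3)

print(round(skewness([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 12]), 2))   # positive: long right tail
print(round(skewness([1, 1, 2, 5, 6, 6, 7, 7, 7, 8]), 2))       # negative: long left tail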

slope: the slope of a scattergram or regression analysis indicates the ‘angle’ or orientation of the best fitting straight line through the set of points. It is not really correct to describe it as an angle since it is merely the number of units that the line rises up the vertical axis of the scattergram for every unit of movement along the horizontal axis. So if the slope is 2.0 this means that for every 1 unit along the horizontal axis, the slope rises 2 units up the vertical axis (see Figure S.4).

The slope of the regression line only partially defines the position of the line. In addition one needs a constant or intercept point which is the position at which the regression line cuts the vertical or y axis. See also: multiple regression; regression line

Figure S.3 Illustrating negative skewness (a frequency distribution with a longer left tail)

Figure S.4 The slope of a scattergram (the slope equals the length of the vertical rise divided by the length of the corresponding horizontal run)

Somers’ d: a measure of association between two ordinal variables which takes account of ties or tied pairs of scores. It provides an asymmetric as well as a symmetric measure. The formula for the asymmetric measure for the first variable is

d = (C − D)/(C + D + T1)

where C is the number of concordant pairs, D the number of discordant pairs and T1 the number of tied pairs for the first variable. A concordant pair is one in which the case ranked higher on one variable is also ranked higher on the other, a discordant pair is one in which the case ranked higher on one variable is ranked lower on the other, and a tied pair is one in which the two cases have the same rank. The formula for the asymmetric measure for the second variable is the same as the first except that T2, the number of tied pairs for the second variable, is used. The formula for the symmetric measure is

d = (C − D)/[(C + D + T1 + C + D + T2)/2]

See also: correlation
Siegal and Castellan (1988)
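
A sketch of the calculation for the version which predicts the second variable from the first, counting concordant, discordant and tied pairs directly (conventions differ over exactly which ties enter the denominator, so treat this as illustrative):

from itertools import combinations

def somers_d(xs, ys):
    # d predicting y from x: (C - D) / (C + D + Ty), where Ty counts pairs
    # tied on y but not on x; pairs tied on x are ignored.
    c = d = t_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        if y1 == y2:
            t_y += 1
        elif (x1 - x2) * (y1 - y2) > 0:
            c += 1
        else:
            d += 1
    return (c - d) / (c + d + t_y)

print(somers_d([1, 2, 3, 4, 5], [1, 2, 2, 4, 5]))   # 0.9 for these small ordinal rankings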

small expected frequencies: a term used in chi-square especially to denote expected frequencies of under five. If too many cells (greater than about 20%) have expected frequencies of this size, then chi-square ceases to be an effective test of significance. See chi-square

snowball sampling: a method of obtaining a sample in which the researcher asks a participant if they know of other people who might be willing to take part in the study, who are then approached and asked the same

question. In other words, the sample consists of people who have been proposed by other people in the sample. This technique is particularly useful when trying to obtain people with a particular characteristic or experience which may be unusual and who are likely to know one another. For example, we may use this technique to obtain a sample of divorced individuals.

Spearman’s rank order correlation or rho (ρ): see correlation

Spearman–Brown formula: indicates the increase in the reliability of a test following lengthening by adding more items. The formula is

rkk = k(r11)/[1 + (k − 1)r11]

where rkk is the reliability of the test lengthened by a factor of k, k is the increase in length and r11 is the original reliability.

Thus, if the reliability of the original test is 0.6 and it is proposed to double the length of the test, the reliability is likely to be

rkk = 2(0.6)/[1 + (2 − 1)0.6] = 1.2/1.6 = 0.75

Thus, having doubled the length of the test, the reliability increases from 0.6 to 0.75. Whether this increase is worth the costs of lengthening the scale so much depends on a range of factors such as the deterrent effect this would have on participants.
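
The formula is a one-liner in Python; the first call reproduces the worked example, the second shows the diminishing return from tripling the length:

def spearman_brown(r11, k):
    # Reliability of a test lengthened by a factor of k from reliability r11.
    return (k * r11) / (1 + (k - 1) * r11)

print(round(spearman_brown(0.6, 2), 2))   # 0.75, as in the worked example
print(round(spearman_brown(0.6, 3), 2))   # 0.82 for a test three times as long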

specific variance: the variance of a variable that is not shared with other variables and does not represent random error. The variance of a variable may be thought of as made up of three components:

1 common variance – that is, variance that it shares with another identified variable;

2 error variance – that is, variance which has not been identified in terms of its nature or source;

3 specific variance – that is, variance on a variable which is measured only by that variable and no other.

These notions are largely conceptual rather than practical and obviously what is error variance in one circumstance is common variance in another. Their big advantage comes when understanding the components of a correlation matrix and the perfect self-correlations of 1.000 found in the diagonals of correlation matrices.

split-half reliability: a measure of the internal consistency of a scale to measure a variable. It consists of scoring one half of the items on the scale and then scoring separately the other half of the items. The correlation coefficient between the two separate halves is essentially the split-half reliability, though typically a statistical adjustment is made for the fact that the scale has been effectively halved. Split-half reliability is sometimes assessed as odd–even reliability in which the scores based on the odd-numbered items are correlated with the scores based on the even-numbered items.

Split-half reliability is a measure of the inter-nal consistency of the items of a scale ratherthan a measure of consistency over time(test–retest reliability). See also: alpha reliability,Cronbach’s; internal consistency; reliability
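A minimal sketch of the odd–even version, with the halved-scale adjustment made by the Spearman–Brown formula for a doubling in length; the small item matrix is invented purely for illustration.

    import numpy as np

    # Hypothetical data: rows are respondents, columns are items on a scale
    items = np.array([[3, 4, 2, 5, 4, 3],
                      [1, 2, 1, 2, 2, 1],
                      [4, 5, 4, 4, 5, 5],
                      [2, 3, 2, 3, 2, 2],
                      [5, 4, 5, 5, 4, 5]])

    odd = items[:, 0::2].sum(axis=1)    # total based on the odd-numbered items
    even = items[:, 1::2].sum(axis=1)   # total based on the even-numbered items

    r_halves = np.corrcoef(odd, even)[0, 1]       # correlation between the two halves
    split_half = (2 * r_halves) / (1 + r_halves)  # adjusted to full-length reliability
    print(round(r_halves, 3), round(split_half, 3))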

SPSS: an abbreviation for Statistical Package for the Social Sciences. It is one of several widely used statistical packages for manipulating and analysing data. Information about SPSS can be found at the following website: http://www.spss.com/

spurious relationship: a relationship between two variables which is no longer apparent when one or more variables are statistically controlled and do not appear to be acting as intervening or mediating variables. For example, there may be a positive relationship between the size of the police force and the amount of crime in that there may be a bigger police force where there is more crime. This relationship, however, may be due entirely to the size of the population in that areas with more people are likely to have both more police and more crime. If this is the case the relationship between the size of the police force and the amount of crime would be a spurious one.

square root: of a number is another number which when squared gives the first number. Thus, the square root of 9 is 3 since 3 multiplied by itself equals 9. The sign √ is an instruction to find the square root of what follows. It is sometimes written as ²√ and as the exponent 1/2. Square roots are not easy to calculate by hand but are a regular feature of even the most basic hand-held calculators.

squared or squaring: multiplying a number by itself. It is normally written as 2², 3² or 3.2², indicating 2 squared, 3 squared and 3.2 squared respectively. This is simply another way of writing 2 × 2, 3 × 3 and 3.2 × 3.2. Another way of saying the same thing is to say two to the power of two, three to the power of two, and three point two to the power of two. It is very commonly used in statistics.

squared Euclidean distance: a widely used measure of proximity in a cluster analysis. It is simply the sum of the squared differences between the scores on two variables for the cases in a sample. See also: cluster analysis

stacked or component bar chart: see compound bar chart


standard deviation (SD): quite a difficult concept to understand because of its non-intuitive nature. Possibly the easiest while still accurate way to regard it is as the average amount by which scores in a set of scores differ from the mean of that set of scores. In other words, it is similar to the average deviation from the mean. There are a couple of problems. The first is that we are speaking of the average absolute deviation from the mean (otherwise, the average deviation would always be zero). Second, the way in which the average is calculated is not the common way of calculating the average.

The formula for the calculation of the standard deviation is, of course, the clearest definition of what the concept is. The formula for standard deviation is

standard deviation = √[ Σ(score − mean)² / sample size ] = √[ Σ(X − X̄)² / N ]

The formula would indicate the mean deviation apart from the square sign and the square root sign. This is a clue to the fact that standard deviation is merely the average deviation from the mean but calculated in an unusual way.

There is no obvious sense in which the standard deviation is standard since its value depends on the values of the observations on which it is calculated. Consequently, the size of the standard deviation varies markedly from sample to sample.

Standard deviation has a variety of uses in statistical analyses. It is important in the calculation of standard scores (or z scores). It is also appropriate as a measure of variation of scores, although variance is a closely related and equally acceptable way of doing the same.

When the standard deviation of a sample is being used to estimate the standard deviation of the population, the sum of squared deviations is divided by the number of cases minus one: (N − 1). The estimated standard deviation rather than the standard deviation is used in tests of statistical inference. See also: bias; coefficient of variation; dispersion, measure of; estimated standard deviation
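For illustration only (the scores are borrowed from the standardized score entry later in this dictionary), the formula and its N − 1 variant can be computed as follows.

    import numpy as np

    scores = np.array([5, 9, 14, 17, 20])
    sd = np.sqrt(((scores - scores.mean()) ** 2).mean())   # divide by N, as in the formula above
    print(round(sd, 3))                                    # 5.404
    print(round(scores.std(ddof=0), 3))                    # the same result from numpy
    print(round(scores.std(ddof=1), 3))                    # estimated SD, dividing by N - 1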

standard error of the difference in means: an index of the degree to which the difference between two sample means will differ from that between other samples. The bigger this standard error is, the more likely it is that this difference will vary across samples. The smaller the samples, the bigger the standard error is likely to be. The standard error is used to determine whether two means are significantly different from one another and to provide an estimate of the probability that the difference will fall within specified limits of this difference.

The way that the standard error is calculated depends on whether the scores from the two samples are related or unrelated and, if they are unrelated, whether the variances of the scores for the two samples differ from each other. Unrelated samples consist of different cases or participants while related samples consist of the same cases on two occasions or similar or matched cases such as wife and husband.

One formula for the standard error for related means is

√[ Σ(mean difference − difference)² / number of scores ]

The deviation or difference between pairs of related scores is subtracted from the mean difference for all pairs of scores. These differences are squared and added together to form the deviation variance. The deviation variance is divided by the number of scores. The square root of this result is the standard error.

A formula for the standard error for unrelated means with different or unequal variances is

√[ (variance of one sample / size of one sample) + (variance of other sample / size of other sample) ]

The formula for the standard error for unrelated means with similar or equal variances is more complicated.


Consequently, the terms in the formula are abbreviated so that the size of the sample is indicated by n and the two samples are referred to by the two subscripts 1 and 2:

√( {[variance1 × (n1 − 1)] + [variance2 × (n2 − 1)]} / (n1 + n2 − 2) × [(1/n1) + (1/n2)] )

To determine whether the two means differ significantly, a t test is carried out in which the two means are subtracted from each other and divided by the appropriate standard error. The statistical significance of this t value is looked up against the appropriate degrees of freedom. For the related t test this is the number of pairs of cases minus one. For the unrelated t test with equal variances it is the number of cases in both samples minus two. For the unrelated t test with unequal variances the formula for the degrees of freedom is more complicated and can be found in the reference below.

The calculation of the estimated probability of the confidence interval for this difference is the same as that described in the standard error of the mean. See also: sampling distribution

Cramer (1998)

standard error of the estimate: a measure of the extent to which the estimate in a simple or multiple regression varies from sample to sample. The bigger the standard error, the more the estimate varies from one sample to another. The standard error of the estimate is a measure of how accurate the observed score of the criterion is compared with the score predicted by one or more predictor variables. The bigger the difference between the observed and the predicted scores, the bigger the standard error is and the less accurate the estimate is.

The standard error of the estimate is the square root of the variance of estimate. One formula for the standard error is as follows:

√[ sum of squared differences between the criterion's predicted and observed scores / (number of cases − number of predictors − 1) ]

standard error of kurtosis: an index of the extent to which the kurtosis of a distribution of scores varies from sample to sample. The bigger it is, the more likely it is to differ from one sample to another. It is likely to be bigger for smaller samples. The standard error is used to determine whether the kurtosis of a distribution of scores differs significantly from that of a normal curve or distribution.

One formula for this standard error is as follows, where N represents the number of cases in the sample:

√[ (variance of skewness × 4 × (N² − 1)) / ((N − 3) × (N + 5)) ]

To ascertain whether the kurtosis of a distribution of scores differs significantly from that for a normal curve, divide the measure of kurtosis by its standard error. This value is a z score, the significance of which can be looked up in a table of the standard normal distribution. A z value of 1.96 or more is statistically significant at the 95% or 0.05 two-tailed level.

Cramer (1998)

standard error of the mean: a measure of the extent to which the mean of a population is likely to differ from sample to sample. The bigger the standard error of the mean, the more likely the mean is to vary from one sample to another. The standard error of the mean takes account of the size of the sample as the smaller the sample, the more likely it is that the sample mean will differ from the population mean. One formula for the standard error is

√( variance of sample means / number of cases in the sample )

The standard error is the standard deviation of the variance of sample means of a particular size. As we do not know what the variance of the means of samples of that size is, we assume that it is the same as the variance of that sample.


In other words, the standard error is the standard deviation of the variance of a sample taking into account the size of that sample. Consequently, one formula for the standard error is

√( variance of sample / number of cases in the sample )

The standard error of the mean is used to determine whether the sample mean differs significantly from the population mean. The sample mean is subtracted from the population mean and this difference is divided by the standard error of the mean:

t = (population mean − sample mean) / standard error of the mean

This test is known as a one-sample t test.

The standard error is also used to provide an estimate of the probability of the difference between the population and the sample mean falling within a specified interval of values called the confidence interval of the difference. Suppose, for example, that the difference between the population and the sample mean is 1.20, the standard error of the mean is 0.20 and the size of the sample is 50. To work out the 95% probability that this difference will fall within certain limits of a difference of 1.20, we look up the critical value of the 95% or 0.05 two-tailed level of t for 49 degrees of freedom (50 − 1 = 49), which is about 2.010. We multiply this t value by the standard error of the mean to give the interval that this difference can fall on either side of the difference. This interval is 0.402 (2.010 × 0.20 = 0.402). To find the lower limit or boundary of the 95% confidence interval of the difference we subtract 0.402 from 1.20 which gives about 0.80 (1.20 − 0.402 = 0.798). To work out the upper limit we add 0.402 to 1.20 which gives about 1.60 (1.20 + 0.402 = 1.602). In other words, there is a 95% probability that the difference will fall between 0.80 and 1.60. If the standard error was bigger than 0.20, the confidence interval would be bigger.
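The worked example can be reproduced with a few lines of code; this is only an illustrative sketch using the numbers quoted above.

    from scipy import stats

    diff = 1.20    # difference between the population and the sample mean
    se = 0.20      # standard error of the mean
    n = 50         # sample size

    t_crit = stats.t.ppf(0.975, df=n - 1)                 # 0.05 two-tailed critical value of t
    lower, upper = diff - t_crit * se, diff + t_crit * se
    print(round(t_crit, 3), round(lower, 2), round(upper, 2))   # about 2.010, 0.80 and 1.60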

standard error of the regression coefficient: a measure of the extent to which the unstandardized regression coefficient in simple and multiple regression is likely to vary from one sample to another. The larger the standard error, the more likely it is to differ from sample to sample.

The standard error can be used to give an estimate of a particular probability of the regression coefficient falling within certain limits of the coefficient. Suppose, for example, that the unstandardized regression coefficient is 0.50, its standard error is 0.10 and the size of the sample is 100. We can calculate, say, the 95% probability of that coefficient varying within certain limits of 0.50. To do this we look up the two-tailed 5% or 0.05 probability level of the t value for the appropriate degrees of freedom. These are the number of cases minus the number of predictors minus one. So if there is one predictor the degrees of freedom are 98 (100 − 1 − 1 = 98). The 95% two-tailed t value for 98 degrees of freedom is about 1.984. We multiply this t value by the standard error to give the interval that the coefficient can fall on one side of the regression coefficient. This interval is 0.1984 (1.984 × 0.10 = 0.1984). To find the lower limit or boundary of the 95% confidence interval for the regression coefficient we subtract 0.1984 from 0.50 which gives about 0.30 (0.50 − 0.1984 = 0.3016). To work out the upper limit we add 0.1984 to 0.50 which gives about 0.70 (0.50 + 0.1984 = 0.6984). For this example, there is a 95% probability that the unstandardized regression coefficient will fall between 0.30 and 0.70. If the standard error was bigger than 0.10, the confidence interval would be bigger. For example, a standard error of 0.20 for the same example would give an interval of 0.3968 (1.984 × 0.20 = 0.3968), resulting in a lower limit of about 0.10 (0.50 − 0.3968 = 0.1032) and an upper limit of about 0.90 (0.50 + 0.3968 = 0.8968).

The standard error is also used to determine the statistical significance of the regression coefficient. The t test for the regression coefficient is the unstandardized regression coefficient divided by its standard error:

t = unstandardized regression coefficient / standard error of the unstandardized regression coefficient


For this example with a standard error of 0.10, the t value is 5.00 (0.50/0.10 = 5.00). As this value of 5.00 is greater than the two-tailed 0.05 critical value of t, which is about 1.984, we would conclude that the regression coefficient is statistically significant at less than the 0.05 two-tailed level.

One formula for the standard error is as follows:

√[ variance of estimate / (sum of squares of the predictor × (1 − R²)) ]

where R² is the squared multiple correlation between the predictor and the other predictors.

Pedhazur (1982)
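In practice the standard error of a simple regression coefficient is reported by most statistical routines. The sketch below uses simulated data purely for illustration; scipy's linregress returns the slope and its standard error, from which a 95% confidence interval can be built as in the example above.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)              # hypothetical predictor
    y = 0.5 * x + rng.normal(size=100)    # hypothetical criterion

    result = stats.linregress(x, y)       # simple regression
    df = len(x) - 1 - 1                   # cases minus predictors minus one
    t_crit = stats.t.ppf(0.975, df)
    lower = result.slope - t_crit * result.stderr
    upper = result.slope + t_crit * result.stderr
    print(round(result.slope, 3), round(result.stderr, 3), round(lower, 3), round(upper, 3))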

standard error of skewness: a measure of the extent to which the skewness of a distribution of scores is likely to vary from one sample to another. The bigger the standard error, the more likely it is that it will differ from one sample to another. The smaller the sample, the bigger the standard error is likely to be. It is used to determine whether the skewness of a distribution of scores is significantly different from that of a normal curve or distribution.

One formula for the standard error of skewness is

√[ (6 × N × (N − 1)) / ((N − 2) × (N + 1) × (N + 3)) ]

where N represents the number of cases in the sample.

To determine whether skewness differs significantly from the normal distribution, divide the measure of skewness by its standard error. This value is a z score, the significance of which can be looked up in a table of the standard normal distribution. A z value of 1.96 or more is statistically significant at the 95% or 0.05 two-tailed level.

Cramer (1998)
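A brief sketch of this test on an invented, positively skewed set of scores; the data and the use of scipy's bias-corrected skewness are illustrative assumptions rather than part of the original entry.

    import numpy as np
    from scipy import stats

    scores = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 12, 15])   # hypothetical scores
    n = len(scores)

    se_skew = np.sqrt((6 * n * (n - 1)) / ((n - 2) * (n + 1) * (n + 3)))  # formula above
    skew = stats.skew(scores, bias=False)     # sample skewness
    z = skew / se_skew                        # compare with 1.96 for the 0.05 two-tailed level
    print(round(skew, 2), round(se_skew, 2), round(z, 2))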

standard normal or z distribution: a normal distribution with a mean of 0 and a standard deviation of 1. It is a normal distribution of standard scores. See also: standardized score; z score

standardized partial regression coefficient or weight: the statistic in a multiple regression which describes the strength and the direction of the linear association between a predictor and a criterion. It provides a measure of the unique association between that predictor and criterion, controlling for or partialling out any association between that predictor, the other predictors in that step of the multiple regression and the criterion. It is standardized so that its values vary from −1.00 to 1.00. A higher value, ignoring its sign, means that the predictor has a stronger association with the criterion and so has a greater weight in predicting it. Because all the predictors have been standardized in this way, it is possible to compare the relative strengths of their associations or weights with the criterion. The direction of the association is indicated by the sign of the coefficient in the same way as it is with a correlation coefficient. No sign means that the association is positive with high scores on the predictor being associated with high scores on the criterion. A negative sign indicates that the association is negative with high scores on the predictor being associated with low scores on the criterion. A coefficient of 0.50 means that for every standard deviation increase in the value of the predictor there is a standard deviation increase of 0.50 in the criterion.

The unstandardized partial regression coefficient is expressed in the unstandardized scores of the original variables and can be greater than ±1.00.


A predictor which is measured in units with larger values is more likely to have a bigger unstandardized partial regression coefficient than a predictor measured in units of smaller values, making it difficult to compare the relative weights of predictors when they are not measured in terms of the same scale or metric.

A standardized partial regression coefficient can be converted into its unstandardized coefficient by multiplying the standardized partial regression coefficient by the standard deviation (SD) of the criterion and dividing it by the standard deviation of the predictor:

unstandardized partial regression coefficient = standardized partial regression coefficient × (criterion SD / predictor SD)

The statistical significance of the standardized and the unstandardized partial coefficients is the same value and can be expressed in terms of either a t or an F value. The t test for the partial regression coefficient is the unstandardized partial regression coefficient divided by its standard error:

t = unstandardized partial regression coefficient / standard error of the unstandardized partial regression coefficient

standardized regression coefficient or weight: see correlation; regression equation; standardized partial regression coefficient or weight

standardized, standard or z score: a score which has been transformed to the standard scale in statistics. Just as measures of length can be standardized into metres, such as when an imperial measure is turned into a metric measure (i.e. 1 inch becomes 25.4 millimetres), so can scores be converted to the standard measuring scale of statistics. Here the analogy with physical scales breaks down as in statistics scores are standardized by converting into new 'standardized' scores with a mean of 0 and a standard deviation of 1. For any set of scores such as

5, 9, 14, 17 and 20

it is a relatively easy matter to convert them to a sample of scores with a mean of 0. All that is necessary is that the mean of the sample of scores is calculated and then this mean subtracted from each of the scores. The mean of the above five scores is 65/5 = 13.0. Subtract 13.0 from each of the above scores:

5 − 13.0, 9 − 13.0, 14 − 13.0, 17 − 13.0 and 20 − 13.0

This gives the following set of scores in which the mean score is 0.0:

−8, −4, 1, 4 and 7

By doing this, only the mean has been adjusted. The next step would be to ensure that the standard deviation of the scores equals 1.0. The standard deviation of the above scores as they stand can be calculated with the usual formula and is 5.404. If we divide each of the adjusted scores (deviations from the mean) by the standard deviation of 5.404 we get the following set of scores:

−1.480, −0.740, 0.185, 0.740 and 1.295

If we then calculate the standard deviation of these new scores, we find that the new standard deviation is 1.0. Hence, we have transformed a set of scores into a new set of scores with a mean of 0 and a standard deviation of 1. Hence, the new scores correspond to the standard normal distribution by definition since the standard normal distribution has a mean of 0 and a standard deviation of 1.

It follows from this that any distribution of scores may be transformed to the standard normal distribution. This allows us to use a single generic frequency distribution (the standard normal distribution) to describe the frequency distribution of any set of scores once it has been transformed to the standard normal distribution. Table S.2 gives a short version of this distribution.

Thus, if we know what a score is on the standard normal distribution we can use the table of the standard normal distribution to assess where it is relative to others.


So if the score transformed to the standard normal distribution is 1.96, Table S.2 tells us that only 2.5% of scores in the distribution will be bigger than it.

The quicker way of doing all of this is simply to work out the following formula for the score which comes from a set of scores with a known standard deviation:

z score = (score − mean) / standard deviation = (X − X̄) / SD

The z score is a score transformed to its value in the standard normal distribution. So if the z score is calculated, Table S.2 will place that transformed score relative to other scores. It gives the percentage of scores lower than that particular z score.

Generally speaking, the requirement of the bell shape for the frequency distribution is ignored as it makes relatively little difference except in conditions of gross skewness. See also: standard deviation; z score
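Using the five scores from the example above, the transformation can be sketched as follows; this is purely illustrative, and note that numpy's default standard deviation divides by N, matching the entry's calculation.

    import numpy as np

    scores = np.array([5, 9, 14, 17, 20])
    z = (scores - scores.mean()) / scores.std(ddof=0)   # (score - mean) / SD
    print(np.round(z, 3))                               # roughly -1.48, -0.74, 0.185, 0.74, 1.295
    print(round(z.mean(), 3), round(z.std(ddof=0), 3))  # mean 0.0 and standard deviation 1.0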

statistic: a characteristic such as a mean, standard deviation or any other measure which is applied to a sample of data. Applied to a population exactly the same characteristics are described as parameters of the population.

statistical inference: see estimated standard deviation; inferential statistics; null hypothesis

statistical notation: the symbols used in statistical theory are based largely on Greek alphabet letters and the conventional (Latin or Roman) alphabet. Greek alphabet letters are used to denote population characteristics (parameters) and the letters of the conventional alphabet are used to symbolize sample characteristics. Hence, the value of the population mean (i.e. a parameter) is denoted as μ (mu) whereas the value of the sample mean (i.e. the statistic) is denoted as X̄. Table S.3 gives the equivalent symbols for parameters and statistics.

Constants are usually represented by the alphabetical symbols (a, b, c and similar from the beginning of the alphabet).

Scores on the first variable are primarily represented by the symbol X. Scores on a second variable are usually represented by the symbol Y. Scores on a third variable are usually represented by the symbol Z.

However, as scores would usually be tabulated as rows and columns, it is conventional to designate the column in question.


Table S.2 Standard normal distribution (z distribution)

  z    % lower    z    % lower    z    % lower    z    % lower    z    % lower    z    % lower    z    % lower
−4.0     0.00   −2.2     1.39   −1.3     9.68   −0.4    34.46   0.5    69.15   1.4    91.92   2.3    98.93
−3.0     0.13   −2.1     1.79   −1.2    11.51   −0.3    38.21   0.6    72.57   1.5    93.32   2.4    99.18
−2.9     0.19   −2.0     2.28   −1.1    13.57   −0.2    42.07   0.7    75.80   1.6    94.52   2.5    99.38
−2.8     0.26   −1.9     2.87   −1.0    15.87   −0.1    46.02   0.8    78.81   1.7    95.54   2.6    99.53
−2.7     0.35   −1.8     3.59   −0.9    18.41    0.0    50.00   0.9    81.59   1.8    96.41   2.7    99.65
−2.6     0.47   −1.7     4.46   −0.8    21.19    0.1    53.98   1.0    84.13   1.9    97.13   2.8    99.74
−2.5     0.62   −1.6     5.48   −0.7    24.20    0.2    57.93   1.1    86.43   2.0    97.72   2.9    99.81
−2.4     0.82   −1.5     6.68   −0.6    27.43    0.3    61.79   1.2    88.49   2.1    98.21   3.0    99.87
−2.3     1.07   −1.4     8.08   −0.5    30.85    0.4    65.54   1.3    90.32   2.2    98.61   4.0   100.00

Special values: a z of 1.64 has 95% lower; a z of −1.64 has 5% lower; a z of 1.96 has 97.5% lower; and a z of −1.96 has 2.5% lower.


Table S.3 Symbols for population and sample values

Concept                    Symbol for population value (parameter)    Symbol for sample value (statistic)
Mean                       μ (mu)                                      X̄
Standard deviation         σ (sigma)                                   s
Correlation coefficient    ρ (rho)                                     r


So X1 would be the first score in the column, X4 would be the fourth score in the column and Xn would be the final score in the column. The rows are designated by a second subscript if the data are in the form of a two-dimensional table or matrix. X1,5 would be the score in the first column and fifth row. X4,2 would be the score in the fourth column and second row.

statistical package: an integrated set of computer programs enabling a wide range of statistical analyses of data, usually using a single spreadsheet for the data. Typical examples include SPSS, Minitab, etc. Most statistical analyses are now conducted using such a package. See also: Minitab; SAS; SPSS; SYSTAT

statistical power: see power of a test

statistical significance: see hypothesis testing; significant; t distribution

stem and leaf diagram: a form of statistical diagram which effectively enables both the illustration of the broad trends of the data and details of the individual observations. It can be used to represent small samples of scores. There are no foolproof ways of constructing such a diagram effectively apart from trial and error. Take a look at the following diagram:

STEM   LEAF
 07    5 7
 08    2 4 7 8
 09    2 3 4 4 5
 10    1 1 3 4 6 7 8
 11    2 5 6 7
 12    3 6 7
 13    1 4
 14    2 3

Imagine that the diagram represents the IQs of a class of schoolchildren. The first column is the stem, labelled 07, 08, . . . , 14. These are IQ ranges such that 07 represents IQs in the 70s and 13 represents IQs in the 130s. The leaf indicates the precise IQs of all of the children in the sample. So the first row represents a child with an IQ of 75 and a child with an IQ of 77. The last row indicates a child with an IQ of 142 and another with an IQ of 143. Also notice how in general the pattern seems to indicate a roughly normal distribution.

Choosing an appropriate stem is crucial. That is, too many or too few stems make the diagram unhelpful. Furthermore, with too many scores the diagram also becomes unwieldy.
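A small sketch that rebuilds the diagram above from raw scores; the list of IQs is simply read off the example leaves and is illustrative only.

    from collections import defaultdict

    iqs = [75, 77, 82, 84, 87, 88, 92, 93, 94, 94, 95,
           101, 101, 103, 104, 106, 107, 108, 112, 115, 116, 117,
           123, 126, 127, 131, 134, 142, 143]

    leaves = defaultdict(list)
    for iq in sorted(iqs):
        leaves[iq // 10].append(iq % 10)     # stem = tens digit(s), leaf = units digit

    for stem in sorted(leaves):
        print(f"{stem:02d} | {' '.join(str(leaf) for leaf in leaves[stem])}")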

stepwise entry: a method in which predictors are entered according to some statistical criteria in statistical methods such as logistic regression, multiple regression, discriminant function analysis and log–linear analysis. The results of such an analysis may depend on very small differences in predictors meeting those criteria. One predictor may be entered into an analysis rather than another predictor when the differences between the two predictors are very small. Consequently, the results of this method always need to be interpreted with care.

stratified random sample: straightforward random sampling may leave out a particular class of cases. Thus, it is conceivable that a random sample drawn, say, from a list of electors will consist overwhelmingly of women with few men selected despite the fact that the sexes are equally common on the electoral list. Random sampling, theoretically, may lead to a whole range of different outcomes some of which are not representative of the population. One way of dealing with this is to stratify the population in terms of characteristics which can be assessed from the sampling frame. For example, the sampling could be done in such a way that 50% of the sample is male and the other 50% is female.


For example, males may be selected from the list and then randomly sampled, then the process repeated for females. In this way, variations in sample sex distribution are circumvented. See also: cluster sample; probability sampling
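A minimal sketch of the idea, assuming a hypothetical sampling frame in which each person's sex is recorded; the frame, field names and sample sizes are invented for illustration.

    import random

    # Hypothetical sampling frame of 1000 people with sex recorded
    frame = [{"id": i, "sex": "male" if i % 2 == 0 else "female"} for i in range(1000)]

    def stratified_sample(frame, key, per_stratum, seed=1):
        random.seed(seed)
        strata = {}
        for person in frame:
            strata.setdefault(person[key], []).append(person)
        sample = []
        for members in strata.values():
            sample.extend(random.sample(members, per_stratum))   # simple random sample within each stratum
        return sample

    sample = stratified_sample(frame, "sex", per_stratum=50)     # 50 males and 50 females
    print(len(sample))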

structural equation modelling: a sophisticated and complex set of statistical procedures which can be used to carry out confirmatory factor analysis and path analysis on quantitative variables. It enables the statistical fit of models depicting the relationship between variables to be determined. Various statistical packages have been developed to carry it out including AMOS, EQS and LISREL. See also: AMOS; attenuation, correcting correlations for; confirmatory factor analysis; discriminant function analysis; EQS; identification; just-identified model; LISREL; manifest variable; maximum likelihood estimation; over-identified model

Cramer (2003)

Student's t test: see t test

Student Newman–Keuls multiple comparison test: see Newman–Keuls method, procedure or test

studentized range statistic q: used in some multiple comparison tests such as the Tukeya or HSD (Honestly Significant Difference) test which determine which of three or more means differ significantly from one another. It is the difference or range between the smallest and the largest mean divided by the standard error of the range of means, which is the square root of the division of the error mean square by the number of cases in a group:

q = (largest mean − smallest mean) / √(error mean square / number of cases in a group)

It is called 'studentized' after the pseudonym 'Student' – the pen name of William Sealy Gosset, who also developed the t test.

The distribution of this statistic depends on two sets of degrees of freedom, one for the number of means being compared and the other for the error mean square. With four groups of three cases each, the degrees of freedom for the number of groups being compared are 4 and for the error mean square is the number of cases minus the number of groups, which is 8 (12 − 4 = 8). The table for this distribution can be found in some statistics texts such as the one listed below. Values that this statistic has to be or exceed to be significant at the 0.05 level are given in Table S.4 for a selection of degrees of freedom. See also: Duncan's new multiple range test; Newman–Keuls method; Tukeya and Tukeyb tests

Kirk (1995)

subject: a traditional and increasingly archaic word for participant which denotes a person taking part in a study. Participant is a more acceptable term because it denotes the active participation of individuals in the research process as opposed to subject which tends to denote someone who obeys the will of the researcher.


Table S.4 The 0.05 critical values of the studentized range

df for error                     df for number of means
mean square        2       3       4       5       6       7
      8          3.26    4.04    4.53    4.89    5.17    5.40
     12          3.08    3.77    4.20    4.51    4.75    4.95
     20          2.95    3.58    3.96    4.23    4.45    4.62
     30          2.89    3.49    3.84    4.10    4.30    4.46
     40          2.86    3.44    3.79    4.04    4.23    4.39
     60          2.83    3.40    3.74    3.98    4.16    4.31
    120          2.80    3.36    3.69    3.92    4.10    4.24
     ∞           2.77    3.31    3.63    3.86    4.03    4.17


Case is commonly used in statistical packages as they need to be able to deal with other units of interest such as organizations. The term subject cannot totally be discarded because statistics include terms like related subjects design. Proband is another term sometimes used to describe a subject or person. This term is more commonly found in studies examining the influence of heredity. The term is derived from the Latin meaning tested or examined.

The term subject should be avoided except where traditional use makes it inevitable (or at least its avoidance almost impossible).

subject variable: a term which is sometimes used to describe the characteristics of subjects or participants which are not manipulated, such as their gender, age, socio-economic status, abilities and personality characteristics. A distinction may be made between subject variables, which have not been manipulated, and independent variables, which have been manipulated.

subtract: deduct one number from another. Take one number away from the other. See also: negative values

sum of squares (SS) or squared deviations: a statistic in analysis of variance which is used to form the mean square (MS). The mean square is the sum of squares divided by its degrees of freedom. Square is short for squared deviations or differences. What these differences are depends on what term or source of variation is being calculated. For the between-groups source of variation it is the squared difference between the group mean and the overall mean which is then multiplied by the number of cases in that group. The squared differences for each group are then added together or summed. See also: between-groups variance; error sum of squares

suppressed relationship: the apparent absence of a relationship between two variables which is brought about by one or more other variables. The relationship between the two variables becomes apparent when these other variables are statistically controlled. For example, there may be no relationship between how aggressive one is and how much violence one watches on TV. The lack of a relationship may be due to a third factor such as how much TV is watched. People who watch more TV may also as a result see more violence on TV even though they are less aggressive. If we statistically control for the amount of TV watched we may find a positive relationship between being aggressive and watching violence on TV.

suppressor variable: a variable which suppresses or hides the relationship between two other variables. See also: multiple regression; suppressed relationship

survey: a method which generally refers to a sample of people being asked questions on one occasion. Usually the purpose is to obtain descriptive statistics which reflect the population's views. The collection of information at one point in time does not provide a strong test of the temporal or causal relationship between the variables measured.

symmetry: usually refers to the shape of a frequency distribution. A symmetrical frequency distribution is one in which the halves above and below the mean, median or mode are mirror images of one another. They would align perfectly if folded around the centre of the distribution. In other words, symmetry means exactly what it means in standard English. Asymmetry refers to a distribution which is not symmetric or symmetrical.


SYSTAT: one of several widely used statistical packages for manipulating and analysing data. Information about SYSTAT can be found at the following website: http://www.systat.com/

systematic sample: a sample in which the cases in a population are consecutively numbered and cases are selected in terms of their order in the list, such as every 10th number.

Suppose, for example, we need to select 100 cases from a population of 1000 cases which have been numbered from 1 to 1000. Rather than selecting cases using some random procedure as in simple random sampling, we could simply select every 10th case (1000/100 = 10) which would give us a sample of 100 (1000/10 = 100). The initial number we chose could vary anywhere between 1 and 10. If that number was, say, 7, then we would select individuals numbered 7, 17, 27 and so on up to number 997.
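The selection rule is easy to sketch in code (illustrative only; the population here is just the list of case numbers 1 to 1000):

    import random

    population = list(range(1, 1001))     # cases numbered 1 to 1000
    interval = len(population) // 100     # 1000 / 100 = 10
    start = random.randint(1, interval)   # random starting point between 1 and 10

    sample = population[start - 1::interval]    # every 10th case from the starting point
    print(len(sample), sample[:3], sample[-1])  # 100 cases, e.g. 7, 17, 27, ... 997 if start is 7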


T

t distribution: a distribution of t values which varies according to the number of cases in a sample. It becomes increasingly similar to a z or normal distribution the bigger the sample. The smaller the sample, the flatter the distribution becomes. The t distribution is used to determine the critical value that t has to be, or to exceed, in order for it to achieve statistical significance. The larger this value, the more likely it is to be statistically significant. The sign of this value is ignored. For large samples, t has to be about 1.96 or more to be statistically significant at the 95% or 0.05 two-tailed level. The smaller the sample, the larger t has to be to be statistically significant at this level.

The critical value that t has to be, or to be bigger than, to be significant at the 0.05 two-tailed level is shown in Table T.1 for selected samples increasing in size. Sample size is expressed in terms of degrees of freedom (df) rather than number of cases because the way in which the degrees of freedom are calculated varies slightly according to the t test used for determining statistical significance. For samples of 20 or more this critical value is close to 2.00.

Table T.1 The 0.05 two-tailed critical value of t

   df     Critical value
    1         12.706
    5          2.571
   10          2.228
   20          2.086
   50          2.009
  100          1.984
 1000          1.962
    ∞          1.960
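The values in Table T.1 can be reproduced from the t distribution itself; the short sketch below is illustrative rather than part of the original entry.

    from scipy import stats

    # Two-tailed 0.05 critical values of t for the degrees of freedom in Table T.1
    for df in (1, 5, 10, 20, 50, 100, 1000):
        print(df, round(stats.t.ppf(0.975, df), 3))

    print("infinity", round(stats.norm.ppf(0.975), 3))   # the limiting value, 1.96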

t test: generally determines whether two means are significantly different from each other or the mean of a sample is significantly different from that of the population from which it may have been drawn. It is also used to ascertain the statistical significance of correlations and partial correlations, and regression and partial regression coefficients. The exact nature of the test depends on the use to which it is being put. The sign of the t test is ignored since it is arbitrary and depends on what mean is subtracted from the other. The larger the t value of the test, the more likely the test is statistically significant. The test can be two tailed or one tailed. A one-tailed test is used when there are good grounds for predicting the direction or sign of the difference or the association. When there are no such grounds a two-tailed test is applied. The t value has to be higher for a two- than a one-tailed test. So, the two-tailed test is less likely to be statistically significant than the one-tailed test. See also: t test for one sample; t test for related samples; t test for related variances; t test for unrelated samples

t test for one sample: determines whether the mean of a sample differs significantly from that of the population.


It is the difference between the population mean and the sample mean divided by the standard error of the mean:

t = (population mean − sample mean) / standard error of the mean

The larger the t value, disregarding its sign, the more likely the two means will be significantly different. The degrees of freedom for this test are the number of cases minus one.
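A sketch of this test on invented data using scipy (note that scipy subtracts the population mean from the sample mean, but the sign is ignored when judging significance):

    import numpy as np
    from scipy import stats

    scores = np.array([12, 15, 11, 14, 16, 13, 12, 15])   # hypothetical sample
    population_mean = 12

    t, p = stats.ttest_1samp(scores, population_mean)     # two-tailed test, df = 8 - 1 = 7
    print(round(t, 2), round(p, 3))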

t test for related samples: also known as the paired-samples t test. It determines whether the means of two samples that come from the same or similar cases are significantly different from each other. It is the difference between the two means divided by the standard error of the difference in means:

t = (mean of one sample − mean of other sample) / standard error of the difference in means

The larger the t value, ignoring its sign, the more likely the two means differ significantly from each other. The degrees of freedom for this test are the number of pairs of cases minus one.

t test for related variances: determines whether the variances of two samples that come from the same or similar cases are significantly different from each other. The following formula can be used to calculate it, where r is the correlation between the pairs of scores:

t = [(larger variance − smaller variance) × √(number of cases − 2)] / √[(1 − r²) × 4 × larger variance × smaller variance]

The larger the t value, the more likely it is that the variances will differ significantly from each other. The degrees of freedom are the number of cases minus two.

McNemar (1969)

t test for unrelated samples: also known as the independent-samples t test. It is used to determine whether the means of two samples that consist of different or unrelated cases differ significantly from each other. It is the difference between the two means divided by the standard error of the difference in means:

t = (mean of one sample − mean of other sample) / standard error of the difference in means

The larger the t value, ignoring its sign, the more likely the two means differ significantly from each other.

The way in which the standard error is calculated depends on whether the variances are dissimilar and whether the group sizes are unequal. The standard error is the same when the group sizes are equal. It may be different when the group sizes are unequal. The way in which the degrees of freedom are calculated depends on whether the variances are similar. When they are similar, the degrees of freedom for this test are the total number of cases in the two samples minus two. When they are dissimilar, they are calculated according to the following formula, where n refers to the number of cases in a sample and the subscripts 1 and 2 refer to the two samples:

df = [(variance1/n1) + (variance2/n2)]² / { [(variance1/n1)² / (n1 − 1)] + [(variance2/n2)² / (n2 − 1)] }

This formula in many cases will lead to degrees of freedom which are not a whole number but which involve decimal places. When looking up the statistical significance of the t values for these degrees of freedom, it may be preferable to round the degrees of freedom down to the nearest whole number, making the test slightly more conservative. See also: pooled variances

Pedhazur and Schmelkin (1991)
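A sketch on invented data; scipy's independent-samples test covers both the pooled and the unequal-variance (Welch) forms, and the fractional degrees of freedom can be computed directly from the formula above.

    import numpy as np
    from scipy import stats

    group1 = np.array([6, 8, 7, 9, 10, 8])        # hypothetical unrelated samples
    group2 = np.array([4, 5, 7, 3, 6, 5, 4])

    print(stats.ttest_ind(group1, group2))                    # pooled variances
    print(stats.ttest_ind(group1, group2, equal_var=False))   # unequal variances (Welch)

    # Degrees of freedom for the unequal-variance case, following the formula above
    v1, v2 = group1.var(ddof=1), group2.var(ddof=1)
    n1, n2 = len(group1), len(group2)
    df = (v1/n1 + v2/n2) ** 2 / ((v1/n1) ** 2 / (n1 - 1) + (v2/n2) ** 2 / (n2 - 1))
    print(round(df, 2))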

T2 test: determines whether Pearson's correlation between one variable and two others differs significantly when all three variables are measured in the same sample.


For example, the correlation between relationship satisfaction and relationship conflict may be compared with the correlation between relationship satisfaction and relationship support. The statistical significance of the T2 test is the same as the t test except that the degrees of freedom are the number of cases minus three.

Cramer (1998); Steiger (1980)

Tamhane’s T2 multiple comparisontest: a post hoc or multiple comparison testwhich is used to determine which of threeor more means differ from one anotherwhen the F ratio in an analysis of variance issignificant. It was developed to deal withgroups with unequal variances. It can be usedwith groups of equal or unequal size. It isbased on the unrelated t test as modifiedin the Dunn–Sidak multiple comparisontest and degrees of freedom as calculated inthe Games–Howell multiple comparisonprocedure.

Kirk (1995)

test–retest reliability: the correlation coef-ficient between the scores on a test given to asample of participants at time A and theirscores at a later time B. It is then essentially ameasure of the consistency of the scores fromone time to the next. It allows one to assesswhether the scores remain stable over time.Test–retest reliability indicates how likely itis that a person’s score on the test will besimilar relative to the scores of other partici-pants from one administration to the next. Inother words, it does not measure whether theactual scores are similar. For variables whichought to be stable over time (such as, say,intelligence) high test–retest reliability is agood thing. However, there are variableswhich may be fairly unstable over time (e.g.how happy a person feels) and may beexpected to vary from day to day. In thatcase, a good and valid measure might beexpected to be fairly unstable over time andhence have low test–retest reliability. See also:reliability

ties or tied scores: two or more scores which have the same value on a variable. When we rank these scores we need to give all of the tied scores the same rank. This is calculated as being the average rank which would have been allocated had we arbitrarily ranked the tied scores. For example, if we had the five scores of 5, 7, 7, 7 and 9, 5 would be given a rank of 1, 7 a rank of 3 [(2 + 3 + 4)/3 = 3] and 9 a rank of 5.
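The same averaging rule is what scipy's rankdata applies by default, as a quick illustration:

    from scipy.stats import rankdata

    scores = [5, 7, 7, 7, 9]
    print(rankdata(scores))   # [1. 3. 3. 3. 5.] - the three tied 7s share the average rank of 3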

total: the amount obtained when a series of numbers or frequencies are added together. It is also known as the sum.

transformations: these are mathematical procedures or adjustments applied to scores in an attempt to make the distribution of the scores fit requirements. Statistical procedures are often affected by the presence of skewness or outliers. A transformation will often reduce the impact of these. The nature of the transformation will depend on the nature of the problem, and they range from trimming to logarithmic transformations. Table T.2 illustrates some transformations, though none may be ideal for a particular purpose. The transformations for values of 0 can be problematic and it is best to adjust the scores to give a minimum value of 1 before transforming them. Adding a constant to every score does not affect the outcome of the statistical calculations but one needs to readjust means, for example by subtracting the constant when reporting the findings. Variance is unaffected by adding a constant to every score.

There is nothing intrinsically good about carrying out a data transformation, particularly as transformed scores cease to have any of the obvious meaning that they may have had in their original form. They are there to deal with difficult data. As such, they have a place but their use is relatively uncommon in modern practice. See also: logarithm
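The transformations in Table T.2 can be applied in one line each; a minimal numpy sketch:

    import numpy as np

    scores = np.array([1, 5, 10, 50, 100, 500, 1000, 5000, 10000])

    print(np.round(np.log10(scores), 3))   # log 10 transformation
    print(np.round(np.log(scores), 3))     # natural log transformation
    print(np.round(1 / scores, 4))         # reciprocal transformation
    print(np.round(np.sqrt(scores), 3))    # square root transformation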

treatment group or condition: also known as the experimental group/condition.


The condition to which higher levels of the independent variable are applied as opposed to the control condition which receives no increased level of the independent variable. The treatment condition is the direct focus of the investigation and the impact of a particular experimental treatment. There may be several treatment groups each receiving a different amount or type of the treatment variable.

trend analysis in analysis of variance: may be used in analysis of variance to determine the shape of the relationship between the dependent variable and an independent variable which is quantitative in that it represents increasing or decreasing levels of that variable. Examples of such a quantitative independent variable include increasing quantities of a drug such as nicotine or alcohol, increasing levels of a state such as sleep deprivation or increasing intensity of a variable such as noise. If the F ratio of the analysis of variance is statistically significant, we may use trend analysis to find out if the relationship between the dependent and the independent variable is a linear or non-linear one and, if it is non-linear, what kind of non-linear relationship it is.

The shape of the relationship between two variables can be described in terms of a polynomial equation. A first-degree or linear polynomial represents a linear or straight line relationship between two variables where the values of one variable either increase or decrease as the values of the other variable increase. For example, performance may decrease as sleep deprivation increases. A linear trend can be defined by the following equation, where y represents the dependent variable, b the slope of the line, x the independent variable and a a constant:

y = (b1 × x) + a1

A minimum of two groups or levels is necessary to define a linear trend.

A second-order or quadratic relationship represents a curvilinear relationship in which the values of one variable increase (or decrease) and then decrease (or increase) as the values of the other variable increase. For instance, performance may increase as background noise increases and then decrease as it becomes too loud. A quadratic trend can be defined by the following equation:

y = (b2 × x²) + (b1 × x) + a2

A minimum of three groups or levels is necessary to define a quadratic trend.

A third-order or cubic relationship represents a curvilinear relationship in which the values of one variable first increase (or decrease), then decrease (or increase) and then increase again (or decrease). A cubic trend can be defined by the following equation:

y = (b3 × x³) + (b2 × x²) + (b1 × x) + a3

A minimum of four groups or levels is necessary to define a cubic trend.


Table T.2 Some examples of transformations

   Score     Log 10           Natural log      Reciprocal       Square root
             transformation   transformation   transformation   transformation
       1       0.000             0.000            1.000             1.000
       5       0.699             1.609            0.200             2.236
      10       1.000             2.303            0.100             3.162
      50       1.699             3.912            0.020             7.071
     100       2.000             4.605            0.010            10.000
     500       2.699             6.215            0.002            22.361
    1000       3.000             6.908            0.001            31.623
    5000       3.699             8.517            0.0002           70.711
  10,000       4.000             9.210            0.0001          100.000


A fourth-order or quartic relationship represents a curvilinear relationship in which the values of one variable first increase (or decrease), then decrease (or increase), then increase again (or decrease) and then finally decrease again (or increase). A quartic trend can be defined by the following equation:

y = (b4 × x⁴) + (b3 × x³) + (b2 × x²) + (b1 × x) + a4

A minimum of five groups or levels is necessary to define a quartic trend.

These equations can be represented by orthogonal polynomial coefficients which are a special case of contrasts. Where the number of cases in each group is the same and where the values of the independent variable are equally spaced (such as 4, 8, 12 and 16 hours of sleep deprivation), the value of these coefficients can be obtained from a table which is available in some statistics textbooks such as the one listed below. The coefficients for three to five groups are presented in Table T.3. The procedure for calculating orthogonal coefficients for unequal intervals and unequal group sizes is described in the reference listed below. The number of orthogonal polynomial contrasts is always one less than the number of groups. So, if there are three groups, there are two orthogonal polynomial contrasts which represent a linear and a quadratic trend.

Table T.3 Coefficients for orthogonal polynomials

Three groups:
  Linear      −1    0    1
  Quadratic    1   −2    1

Four groups:
  Linear      −3   −1    1    3
  Quadratic    1   −1   −1    1
  Cubic       −1    3   −3    1

Five groups:
  Linear      −2   −1    0    1    2
  Quadratic    2   −1   −2   −1    2
  Cubic       −1    2    0   −2    1
  Quartic      1   −4    6   −4    1

The formula for calculating the F ratio for a trend is as follows:

F = [(group 1 weight × group 1 mean) + (group 2 weight × group 2 mean) + . . .]² / {error mean square × [(group 1 weight²/group 1 n) + (group 2 weight²/group 2 n) + . . .]}

For the example given in the entry for the Newman–Keuls method there are four groups of three cases each. We will assume that the four groups represent an equally spaced variable. The four means are 6, 14, 12 and 2 respectively. The error mean square is 9.25. As there are four groups, there are three orthogonal polynomial contrasts which represent a linear, a quadratic and a cubic trend.

The F ratio for the linear trend is 3.18:

F = [(−3 × 6) + (−1 × 14) + (1 × 12) + (3 × 2)]² / {9.25 × [((−3)²/3) + ((−1)²/3) + (1²/3) + (3²/3)]}
  = (−18 − 14 + 12 + 6)² / [9.25 × (9/3 + 1/3 + 1/3 + 9/3)]
  = (−14)² / (9.25 × 6.67)
  = 196/61.70
  = 3.18

This F ratio has two sets of degrees of freedom, one for the contrast or numerator of the formula and one for the error mean square or denominator of the formula. The degrees of freedom are 1 for the contrast and the number of cases minus the number of groups for the error mean square, which is 8 (12 − 4 = 8). The critical value of F that the F ratio has to be or to exceed to be statistically significant at the 0.05 level with these two degrees of freedom is about 5.32. This value can be found in a table in many statistics texts and is usually given in statistical packages which compute this contrast. As the F ratio of 3.18 is less than the critical value of 5.32, the four means do not represent a linear trend.


The F ratio for the quadratic trend is 26.28:

F = [(1 × 6) + (−1 × 14) + (−1 × 12) + (1 × 2)]² / {9.25 × [(1²/3) + ((−1)²/3) + ((−1)²/3) + (1²/3)]}
  = (6 − 14 − 12 + 2)² / [9.25 × (1/3 + 1/3 + 1/3 + 1/3)]
  = (−18)² / (9.25 × 1.333)
  = 324/12.33
  = 26.28

As the F ratio of 26.28 is greater than the critical value of 5.32, the four means represent a quadratic trend in which the means increase and then decrease.

The F ratio for the cubic trend is 0.06:

F = [(−1 × 6) + (3 × 14) + (−3 × 12) + (1 × 2)]² / {9.25 × [((−1)²/3) + (3²/3) + ((−3)²/3) + (1²/3)]}
  = (−6 + 42 − 36 + 2)² / [9.25 × (1/3 + 9/3 + 9/3 + 1/3)]
  = 2² / (9.25 × 6.67)
  = 4/61.70
  = 0.06

As the F ratio of 0.06 is less than the critical value of 5.32, the four means do not represent a cubic trend.

Kirk (1995)
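The three trend F ratios above can be checked with a short sketch; the means, group size and error mean square are the ones quoted in the example, and everything else is illustrative.

    import numpy as np

    means = np.array([6, 14, 12, 2])   # group means from the example
    n = 3                              # cases per group
    error_ms = 9.25                    # error mean square

    contrasts = {"linear": [-3, -1, 1, 3],
                 "quadratic": [1, -1, -1, 1],
                 "cubic": [-1, 3, -3, 1]}   # orthogonal polynomial coefficients (Table T.3)

    for name, weights in contrasts.items():
        w = np.array(weights)
        f = (w @ means) ** 2 / (error_ms * np.sum(w ** 2 / n))
        print(name, round(f, 2))   # 3.18, 26.27 and 0.06 (the entry rounds the middle value to 26.28)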

trimmed samples: sometimes applied to data distributions which have difficulties associated with the tails, such as small numbers of extreme scores. A trimmed sample is a sample in which a fixed percentage of cases in each tail of the distribution is deleted. So a 10% trimmed sample has 10% of each tail removed. See also: transformations

true experiment: an experiment in which one or more variables have been manipulated, all other variables have been held constant and participants have been randomly assigned either to the different conditions in a between-subjects design or to the different orders of the conditions in a within-subjects design. The manipulated variables are often referred to as independent variables because they are assumed not to be related to any other variables that might affect the measured variables. The measured variables are usually known as dependent variables because they are assumed to depend on the independent ones. One advantage of this design is that if the dependent variable differs significantly between the conditions, then these differences should only be brought about by the manipulation of the variables. This design is seen as being the most appropriate one for determining the causal relationship between two variables. The relationship between the independent and the dependent variable may be reciprocal. To determine this, the previous dependent variable needs to be manipulated and its effect on the previous independent variable needs to be examined. See also: analysis of covariance; quasi-experiments

Cook and Campbell (1979)

true score: the observed score on a measure is thought to consist of a true score and random error. If the true score is assumed not to vary across occasions or people and the error is random, then the random error should cancel out leaving the true score. In other words, the true score is the mean of the scores. If it is assumed that the true score may vary between individuals, the reliability of a measure can be defined as the ratio of true score variance to observed score variance. See also: reliability


Tukeya or HSD (Honestly Significant Difference) test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. It assumes equal variance in the three or more groups and is only approximate for unequal group sizes. This test is like the Newman–Keuls method except that the critical difference that any two means have to be bigger than, to be statistically significant, is set by the total number of means and is not smaller the closer the means are in size. So, if four means are being compared, the difference that any two means have to exceed to be statistically significant is the same for all the means.

The formula for calculating this critical difference is the value of the studentized range multiplied by the square root of the error mean square divided by the number of cases in each group:

W = studentized range × √(error mean square / group n)

The value of the studentized range varies according to the number of means being compared and the degrees of freedom for the error mean square. This value can be obtained from a table which is available in some statistics texts such as the source below.

For the example given in the entry for the Newman–Keuls method there are four groups of three cases each. This means that the degrees of freedom for the error mean square are the number of cases minus the number of groups, which is 8 (12 − 4 = 8). The 0.05 critical value of the studentized range with these two sets of degrees of freedom is 4.53. The error mean square for this example is 9.25.

Consequently, the critical difference that any two means need to exceed to be significantly different is about 7.97:

4.53 × √(9.25/3) = 4.53 × √3.083 = 4.53 × 1.76 = 7.97

The means of the four groups are 6, 14, 12 and 2 respectively. If we look at Table N.1, which shows the differences between the four means, we can see that the absolute differences between groups 2 and 4 (14 − 2 = 12), groups 3 and 4 (12 − 2 = 10) and groups 2 and 1 (14 − 6 = 8) exceed this critical difference of 7.97 and so differ significantly.

Because the Tukeya test does not make the critical difference between two means smaller the closer they are, this test is less likely to find that two means differ significantly than the Newman–Keuls method. For this example, in addition to these three differences being significant, the Newman–Keuls method also finds the absolute difference between groups 3 and 1 (12 − 6 = 6) to be statistically significant as the critical difference for this test is 5.74 and not 7.97.

We could arrange these means into threehomogeneous subsets where the means in asubset would not differ significantly fromeach other but where some of the means inone subset would differ significantly fromthose in another subset. We could indicatethese three subsets by underlining the meanswhich did not differ as follows.

2 6 12 14

Kirk (1995)
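As an illustrative Python sketch, the critical difference above can be reproduced with SciPy; scipy.stats.studentized_range (available from SciPy 1.7) is assumed here, and the means, error mean square and group size are those of the worked example.

```python
# Sketch of the Tukey-a (HSD) critical difference for the worked example.
# Assumes scipy.stats.studentized_range is available (SciPy >= 1.7).
from itertools import combinations
from scipy.stats import studentized_range

means = {1: 6, 2: 14, 3: 12, 4: 2}   # group means from the example
ms_error = 9.25                      # error mean square
n_per_group = 3                      # cases in each group
k = len(means)                       # number of means being compared
df_error = 12 - k                    # total cases minus number of groups = 8

q = studentized_range.ppf(0.95, k, df_error)   # 0.05 studentized range, about 4.53
w = q * (ms_error / n_per_group) ** 0.5        # critical difference, about 7.97

for (g1, m1), (g2, m2) in combinations(means.items(), 2):
    significant = abs(m1 - m2) > w
    print(f"groups {g1} and {g2}: difference {abs(m1 - m2)}, "
          f"{'significant' if significant else 'not significant'}")
```

In practice, a routine such as pairwise_tukeyhsd in statsmodels would normally be used to run the whole procedure from the raw scores.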

Tukey-b or WSD (Wholly Significant Difference) test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. It assumes equal variances in each of the three or more groups and is approximate for unequal group sizes. It is a compromise between the Tukey-a or HSD (Honestly Significant Difference) test and the Newman–Keuls method. Both the Tukey-a and the Newman–Keuls test use the studentized range to determine the critical difference two means have to exceed to be significantly different. The value of this range differs for the two tests. For the Tukey-a test it depends on the total number of means being compared, whereas for the Newman–Keuls test it is smaller the closer the two means are in size in a set of means. In the Tukey-b test, the mean of these two values is taken.

The formula for calculating the critical difference is the value of the studentized range multiplied by the square root of the error mean square divided by the number of cases in each group:

W = studentized range × √(error mean square/group n)

The value of the studentized range varies according to the number of means being compared and the degrees of freedom for the error mean square. This value can be obtained from a table which is available in some statistics texts such as the source below.

For the example given in the entry for the Newman–Keuls test there are four groups of three cases each. This means that the degrees of freedom for the error mean square are the number of cases minus the number of groups, which is 8 (12 − 4 = 8). The 0.05 critical value of the studentized range with these two sets of degrees of freedom is 4.53. For the Tukey-a test, this value is used for all the comparisons. For the Newman–Keuls test, this value is only used for the two means that are furthest apart, which in this case is two means apart. The closer the two means are in size, the smaller this value becomes. So, for two means which are only one mean apart, this value is 4.04, whereas for two adjacent means it is 3.26.

The Tukey-b test uses the mean of these two values, which is 4.53 [(4.53 + 4.53)/2 = 9.06/2 = 4.53] for the two means furthest apart, 4.29 [(4.53 + 4.04)/2 = 8.57/2 = 4.29] for the two means one mean apart and 3.90 [(4.53 + 3.26)/2 = 7.79/2 = 3.90] for the two adjacent means.

To calculate the critical difference we need the error mean square, which is 9.25. The value by which the value of the studentized range needs to be multiplied is the same for all comparisons and is 1.76 (√(9.25/3) = √3.083 = 1.76).

The critical difference for the two means furthest apart is about 7.97 (4.53 × 1.76 = 7.97). For the two means one mean apart it is about 7.55 (4.29 × 1.76 = 7.55), whereas for the two adjacent means it is about 6.86 (3.90 × 1.76 = 6.86).

The means of the four groups, ordered in increasing size, are 2, 6, 12 and 14. If we take the two means furthest apart (2 and 14), we can see that their absolute difference of 12 (2 − 14 = −12) exceeds the critical value of 7.97 and so these two means differ significantly. If we take the two pairs of means one mean apart (2 and 12, and 6 and 14), we can see that both their absolute differences of 10 (2 − 12 = −10) and 8 (6 − 14 = −8) are greater than the critical value of 7.55 and so differ significantly. Finally, if we take the three pairs of adjacent means (2 and 6, 6 and 12, and 12 and 14), we can see that none of their absolute differences of 4 (2 − 6 = −4), 6 (6 − 12 = −6) and 2 (12 − 14 = −2) is greater than the critical value of 6.86 and so they do not differ significantly.

We could arrange these means into three homogeneous subsets where the means in a subset do not differ significantly from each other but where some of the means in one subset differ significantly from those in another subset. Grouping the means which did not differ, these three subsets are:

2 6    6 12    12 14

Howell (2002)
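A brief Python sketch of the averaging step, using the tabled studentized range values quoted above (4.53, 4.04 and 3.26) and the error mean square and group size of the worked example.

```python
# Tukey-b (WSD) critical differences: the studentized range value for each
# comparison is the mean of the HSD value and the Newman-Keuls value.
ms_error = 9.25
n_per_group = 3
multiplier = (ms_error / n_per_group) ** 0.5    # about 1.76

q_hsd = 4.53                                    # used for every comparison by the HSD test
q_newman_keuls = {4: 4.53, 3: 4.04, 2: 3.26}    # depends on how many means span the pair

for span, q_nk in q_newman_keuls.items():
    q_b = (q_hsd + q_nk) / 2
    print(f"range of {span} means: q = {q_b:.2f}, "
          f"critical difference = {q_b * multiplier:.2f}")
```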

Tukey–Kramer test: a post hoc or multiple comparison test which is used to determine whether three or more means differ significantly in an analysis of variance. It assumes equal variance for the three or more groups and is exact for unequal group sizes. It uses the studentized range to determine the critical difference two means have to exceed to be significantly different. It is a modification of the Tukey-a or HSD (Honestly Significant Difference) test.

The formula for calculating this critical difference is the value of the studentized range multiplied by the square root of the error mean square, which takes into account the number of cases in the two groups (n1 and n2) being compared:

W = studentized range × √{[error mean square × (1/n1 + 1/n2)]/2}

The value of the studentized range varies according to the number of means being compared and the degrees of freedom for the error mean square. This value can be obtained from a table which is available in some statistics texts such as the source below.

For the example given in the entry for the Newman–Keuls method there are four groups of three cases each. This means that the degrees of freedom for the error mean square are the number of cases minus the number of groups, which is 8 (12 − 4 = 8). The 0.05 critical value of the studentized range with these two sets of degrees of freedom is 4.53. The error mean square for this example is 9.25.

Consequently, the critical difference that any two means need to exceed to be significantly different is about 7.97:

4.53 × √{[9.25 × (1/3 + 1/3)]/2}
= 4.53 × √[(9.25 × 0.667)/2]
= 4.53 × √(6.17/2) = 4.53 × √3.09
= 4.53 × 1.76 = 7.97

This value is the same as that for the Tukey-a or HSD (Honestly Significant Difference) test when the number of cases in each group is equal.

The means of the four groups are 6, 14, 12 and 2 respectively. If we look at Table N.1, which shows the differences between the four means, we can see that the absolute differences between groups 2 and 4 (14 − 2 = 12), groups 3 and 4 (12 − 2 = 10) and groups 2 and 1 (14 − 6 = 8) exceed this critical difference of 7.97 and so differ significantly.

Kirk (1995)
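An illustrative Python sketch of the Tukey–Kramer critical difference for a pair of groups; q = 4.53 and the error mean square of 9.25 are from the worked example, while the unequal group sizes in the second call are invented.

```python
# Tukey-Kramer critical difference for two groups of sizes n1 and n2.
# With equal ns this reduces to the Tukey-a (HSD) value.
ms_error = 9.25
q = 4.53                     # 0.05 studentized range for 4 means and 8 error df

def critical_difference(n1, n2):
    return q * (ms_error * (1 / n1 + 1 / n2) / 2) ** 0.5

print(critical_difference(3, 3))   # equal group sizes: about 7.97
print(critical_difference(3, 5))   # unequal group sizes give a smaller value
```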

two-tailed level or test of statistical significance: see correlation; directionless tests; significant

two-way relationship: see bi-directional relationship

Type I or alpha error: the probability of assuming that there is a difference or association between two or more variables when there is none. It is usually set at the 0.05 or 5% level. See also: Bonferroni test; Dunn's test; Dunnett's T3 test; Waller–Duncan t test

Type I, hierarchical or sequential method in analysis of variance: a method for determining the F ratio for an analysis of variance with two or more factors with unequal or disproportionate numbers of cases in the cells. In this situation the factors and interactions are likely to be related and so share variance. In this method the factors are entered in an order specified by the researcher.

Cramer (2003)

Type II or beta error: the probability of assuming that there is no difference or association between two or more variables when there is one. This probability is generally unknown. See also: Bonferroni test; Waller–Duncan t test

Type II, classic experimental or least squares method in analysis of variance: a method for determining the F ratio for an analysis of variance with two or more factors with unequal or disproportionate numbers of cases in the cells. In this situation the factors and interactions are likely to be related and so share variance. In this method main effects are adjusted for all other main effects (and covariates) while interactions are adjusted for all other effects apart from interactions of a higher order. For example, none of the variance of a main effect is shared with another main effect.

Cramer (2003)

Type III, regression, unweighted means or unique method in analysis of variance: a method for determining the F ratio for an analysis of variance with two or more factors with unequal or disproportionate numbers of cases in the cells. In this situation the factors and interactions are likely to be related and so share variance. In this method each effect is adjusted for all other effects (including covariates). In other words, the variance explained by that effect is unique to it and is not shared with any other effect.

Cramer (2003)

Type IV method in analysis of variance: a method for determining the F ratio for an analysis of variance with two or more factors with unequal or disproportionate numbers of cases in the cells. In this situation the factors and interactions are likely to be related and so share variance. This method is similar to Type III except that it takes account of cells with no data.
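The Type I, II and III methods correspond to the different ways of partitioning sums of squares offered by most statistical packages. As an illustrative sketch, the statsmodels library in Python can request each of them for an unbalanced two-factor design; the data frame and its columns here are hypothetical, and Type IV (which additionally allows for empty cells) is not covered by this sketch.

```python
# Requesting Type I, II and III sums of squares for an unbalanced two-factor
# design with statsmodels. Sum-to-zero coding (Sum) is used so that the
# Type III results are meaningful.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y": [3, 5, 4, 6, 7, 9, 8, 10, 12, 11],
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a2"],
    "b": ["b1", "b1", "b2", "b2", "b1", "b2", "b2", "b1", "b1", "b2"],
})

model = smf.ols("y ~ C(a, Sum) * C(b, Sum)", data=df).fit()
for ss_type in (1, 2, 3):
    print(f"Type {ss_type} sums of squares")
    print(sm.stats.anova_lm(model, typ=ss_type))
```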

U

under-identified model: a term used in structural equation modelling to describe a model in which there are not sufficient variables to identify or to estimate all the parameters or pathways in the model. See also: identification

unequal sample size: generally speaking, there is little difficulty created by using unequal sample sizes in research, although equal sample sizes should be regarded as an ideal. Some statistical techniques have their optimum effectiveness (power) when sample sizes are equal. Statistically, unequal samples may lead to less precise estimates of population values, but this is not a matter of great concern as the effect is usually small. However, some statistical calculations are much more difficult with unequal sample sizes. This is particularly the case for the analysis of variance. Of course, computer analyses of data are not affected by the issue of computational labour or difficulty. Nevertheless, researchers may come across publications in which calculations were done by hand; these will probably have ensured equal sample sizes for no reason other than to ease the difficulty of the computation. There is little reason to insist on equal sample sizes when carrying out analyses with computers, although it is essential to have equal sample sizes when counterbalancing order, and equal sample sizes remain an ideal to aim for when collecting data for maximum precision in many statistical analyses. See also: analysis of variance

uni-directional relationship: a relationship between two variables in which the direction of the causal relationship between the two variables is thought to be one-way, in that one variable, A, is thought to cause the other variable, B, but the other variable, B, is not thought to cause the first variable, A. A bi-directional relationship is one in which both variables are thought to influence each other.

uni-lateral relationship: see uni-directional relationship

unimodal distribution: a distribution of scores which has a single mode, as opposed to two modes (bimodal) or several modes (multimodal).

unique method in analysis of variance: see Type III, regression, unweighted means or unique method in analysis of variance

univariate: the analysis of single variables without reference to other variables. The commonest univariate statistics are descriptive statistics such as the mean, variance, etc. There are relatively few univariate inferential statistics other than those which compare a single sample against a known population distribution.


Examples would be the one-sample chi-square, the runs test, the one-sample t test, etc.

unplanned or post hoc comparisons: see Bonferroni test; multiple comparison tests; omnibus test; post hoc, a posteriori or unplanned tests

unrelated t test: see t test for unrelated samples

unrepresentative sample: see biased sample

unstandardized partial regression coefficient or weight: an index of the size and direction of the association between a predictor or independent variable and the criterion or dependent variable in a multiple regression in which its association with other predictors and the criterion has been controlled or partialled out. The direction of the association is indicated by the sign of the regression coefficient in the same way as it is with a correlation coefficient. No sign means that the association is a positive one, with high scores on the predictor going with high scores on the criterion. A negative sign shows that the association is a negative one in which high scores on the predictor go with low scores on the criterion.

The size of the regression coefficient indicates how much change there is in the original scores of the criterion for each unit change of the scores in the predictor. For example, if the unstandardized partial regression coefficient was 2.00 between the predictor of years of education received and the criterion of annual income expressed in units of 1000 euros, then we would expect a person's income to increase by 2000 euros (2.00 × 1000 = 2000) for every year of education received.

The size of the unstandardized partial regression coefficient will depend on the units or the scales used to measure the predictor and the criterion. If income was measured in the unit of a single euro rather than 1000 euros, and if the same association held between education and income, then the unstandardized partial regression coefficient would be 2000.00 rather than 2.00.

Unstandardized partial regression coefficients are used to predict what the likely value of the criterion will be (e.g. annual income) when we know the values of the predictors for a particular case (e.g. their years in education, age, gender, and so on). If we are interested in the relative strength of the association between the criterion and two or more predictors, we standardize the scores so that the size of these standardized partial regression coefficients is restricted to vary from −1.00 to 1.00. A higher standardized partial regression coefficient, regardless of its sign, will indicate a stronger association.

An unstandardized partial regression coefficient can be converted into its standardized coefficient by multiplying the unstandardized partial regression coefficient by the standard deviation (SD) of the predictor and dividing it by the standard deviation of the criterion:

standardized partial regression coefficient = unstandardized partial regression coefficient × (predictor SD/criterion SD)

The statistical significance of the standardized and the unstandardized partial coefficients is the same value and can be expressed in terms of either a t or an F value. The t test for the partial regression coefficient is the unstandardized partial regression coefficient divided by its standard error:

t = unstandardized regression coefficient/standard error of the unstandardized regression coefficient

See also: regression equation
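A minimal Python sketch of the two conversions just described; the coefficient, standard deviations and standard error are invented purely for illustration.

```python
# Converting an unstandardized partial regression coefficient to its
# standardized form, and computing its t value. All values are hypothetical.
b = 2.00              # unstandardized partial regression coefficient
sd_predictor = 2.5    # standard deviation of the predictor (hypothetical)
sd_criterion = 10.0   # standard deviation of the criterion (hypothetical)
se_b = 0.40           # standard error of the coefficient (hypothetical)

beta = b * sd_predictor / sd_criterion   # standardized partial regression coefficient
t = b / se_b                             # t test for the coefficient
print(beta, t)                           # 0.5 and 5.0
```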

unweighted means method in analysis of variance: see Type III, regression, unweighted means or unique method in analysis of variance



V

valid per cent: a term used primarily in computer output to denote percentages expressed as a proportion of the total number of cases minus missing cases due to missing values. It is the percentage based on the actual number of cases used by the computer in the calculation. See also: missing values

validity: the extent to which a measure can be shown to measure what it purports or intends to measure. Various kinds of validity have been distinguished. See also: concurrent validity; construct validity; convergent validity; discriminant validity; ecological validity; external validity; face validity; factorial validity; internal validity; predictive validity

variable: a characteristic that consists of two or more categories (such as occupation or nationality) or values (such as age or intelligence score). The opposite of a variable is a constant, which consists of a single value. See also: categorical variable; confounding variable; criterion variable; dependent variable; dichotomous variable; dummy variable; independent variable; intervening variable; operationalization; population; predictor variable; subject variable; suppressor variable

variance or population variance: a measure of the extent to which the values in a population of scores vary or differ from the mean value of that population. The larger the variance is, the more the scores differ on average from the mean. The variance is the sum of the squared difference or deviation between the mean score and each individual score which is divided by the number of scores:

variance = sum of (mean score − each score)²/number of scores

The variance estimate differs from the variance in that the sum is divided by the number of scores minus one. The variance estimate is usually provided by statistical packages such as SPSS and is used in many parametric statistical tests. The difference in size between these two measures of variance is small, and it becomes even smaller the larger the sample.

variance estimate, estimated population variance or sample variance: a measure of the degree to which the values in a sample vary or differ from the mean value of the sample. It is used to estimate the variance of the population of scores from which the sample has been taken. The larger the variance is, the more the scores differ from the mean. The variance estimate is the sum of the squared difference or deviation between the mean score and each individual score which is divided by the number of scores minus one:

variance estimate = sum of (mean score − each score)²/(number of scores − 1)


The variance estimate differs from the variance in that the sum is divided by one less than the number of scores rather than the number of scores. This is done because the variance of a sample is usually less than that of the population from which it is drawn. In other words, dividing the sum by one less than the number of scores provides a less biased estimate of the population variance.

The variance estimate rather than the variance is used in many parametric statistics as these tests are designed to make inferences about the population of scores from which the samples have been drawn. Many statistical packages, such as SPSS, give the variance estimate rather than the variance. See also: dispersion, measure of; Hartley's test; moment; standard error of the difference in means; variance or population variance
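The distinction can be seen directly in NumPy, where the ddof argument controls whether the sum of squared deviations is divided by the number of scores or by one less than that number; the three scores below are purely illustrative.

```python
import numpy as np

scores = np.array([5, 9, 11])
print(np.var(scores, ddof=0))   # variance: divides by N, about 6.22
print(np.var(scores, ddof=1))   # variance estimate: divides by N - 1, about 9.33
```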

variance of estimate: a measure of the variance of the scores around the regression line in a simple and multivariate regression. It can be calculated with the following formula:

sum of squared residuals (the squared differences between the criterion's predicted and observed scores)/(number of cases − number of predictors − 1)

It is also called the mean square residual. The variance of estimate is used to work out the statistical significance of the unstandardized regression or partial regression coefficient. The square root of the variance of estimate is the standard error of the estimate.

Pedhazur (1982)

varimax rotation, in factor analysis: a widely used method of the orthogonal rotation of the initial factors in a factor analysis in which the variance of the loadings of the variables within a factor is maximized. High loadings on the initial factors are made higher on the rotated factor and low loadings on the initial factors are made lower on the rotated factor. Differentiating the variables that load on a factor in this way makes it easier to see which variables most clearly define that factor and to interpret the meaning of that factor. Orthogonal rotation is where the factors are at right angles or unrelated to one another.

Venn diagram: a system of representing the relationship between subsets of information. The totality is represented by a rectangle. Within that rectangle are to be found circles which enclose particular subsets. The circles may not overlap, in which case there is no overlap between the subsets. Alternatively, they may overlap totally or partially. The amount of overlap is the amount of overlap between subsets. So, Figure V.1 shows all Europeans. One circle represents English people and the other citizens of Britain. The larger circle envelops the smaller one because being British includes being English. However, not all British people are English. See also: common variance


Figure V.1 Illustrating Venn diagrams (a larger circle labelled 'British' enclosing a smaller circle labelled 'English')

W

Waller–Duncan t test: a post hoc or multiple comparison test used for determining which of three or more means differ significantly in an analysis of variance. This test is based on the Bayesian t value, which depends on the F ratio for a one-way analysis of variance, its degrees of freedom and a measure of the relative seriousness of making a Type I versus a Type II error. It can be used for groups of equal or unequal size.

weighted mean: the mean of two or more groups which takes account of or weights the size of the groups when the sizes of one or more of the groups differ. When the size of the groups is the same, there is no need to weight the group mean for size and the mean is simply the sum of the means divided by the number of groups. When the sizes of the groups differ, the mean of each group is multiplied by its size to give the total or sum for that group. The sum of each group is added together to give the overall or grand sum. This grand sum is then divided by the total number of cases to give the weighted mean.

Take the means of the three groups in Table W.1. The unweighted mean of the three groups is 6 [(4 + 5 + 9)/3 = 6]. If the three groups were of the same size (say, 10 each), there would be no need to weight them as size is a constant and the mean is 6 [(40 + 50 + 90)/30 = 6]. If the sizes of the groups differ, as they do here, the weighted mean of about 7.14 (500/70 = 7.14) is higher than the unweighted mean of 6 because the largest group has the highest mean.

Wilcoxon matched-pairs signed-ranks test: a non-parametric test used to determine whether the scores from two samples that come from the same or similar cases are significantly different from each other. The differences between pairs of scores are ranked in order of size, ignoring the sign or direction of those differences. The ranks of the differences with the same sign are added together. If there are no differences between the scores of the two samples, the sum of positive ranked differences should be similar to the sum of negative ranked differences. The bigger the difference between the positive and negative ranked differences, the more likely the two sets of scores differ significantly from each other.
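As an illustrative sketch, SciPy's wilcoxon function carries out the test from two sets of related scores; the scores below are invented.

```python
# Wilcoxon matched-pairs signed-ranks test on two related sets of scores.
from scipy.stats import wilcoxon

before = [10, 12, 9, 15, 11, 13, 8, 14]
after = [12, 15, 10, 14, 13, 16, 11, 17]

statistic, p_value = wilcoxon(before, after)   # ranks the signed differences
print(statistic, p_value)
```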

Wilcoxon rank-sum or Wilcoxon–Mann–Whitney W test: see Mann–Whitney U test


Table W.1 Weighted and unweighted mean of three groups

Groups    Means    Size    Sum
1         4        10      40
2         5        20      100
3         9        40      360
Sum       18       70      500
Number    3                70
Mean      6                7.14
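A short NumPy sketch of the same calculation, weighting each group mean in Table W.1 by its group size.

```python
import numpy as np

means = np.array([4, 5, 9])
sizes = np.array([10, 20, 40])

print(means.mean())                        # unweighted mean: 6.0
print(np.average(means, weights=sizes))    # weighted mean: about 7.14
```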


Wilks's lambda or Λ: a test used in multivariate statistical procedures such as canonical correlation, discriminant function analysis and multivariate analysis of variance to determine whether the means of the groups differ on a discriminant function or characteristic root. It varies from 0 to 1. A lambda of 1 indicates that the means of all the groups have the same value and so do not differ. Lambdas close to 0 signify that the means of the groups differ. It can be transformed as a chi-square or an F ratio. It is the most widely used of several such tests, which include Hotelling's trace criterion, Pillai's criterion and Roy's gcr criterion.

When there are only two groups, the F ratios for Wilks's lambda, Hotelling's trace, Pillai's criterion and Roy's gcr criterion are the same. When there are more than two groups, the F ratios for Wilks's lambda, Hotelling's trace and Pillai's criterion may differ slightly. Pillai's criterion is said to be the most robust when the assumption of the homogeneity of the variance–covariance matrix is violated.

Tabachnick and Fidell (2001)

within-groups variance: a term used in statistical tests of significance for related/correlated/repeated-measures designs. In such designs participants (or matched groups of participants) contribute a minimum of two separate measures for the dependent variable. This means that the variance on the dependent variable can be assessed in terms of three sources of variance:

1 That due to the influence of the independent variables (main effects).

2 That due to the idiosyncratic characteristics of the participants. Since they are measured twice, then similarities between scores for the same individuals can be assessed. The variation on the dependent variable attributable to these similarities over the two measures is known as the within-groups variance. So there is a sense in which within-groups variation ought to be considered as variation caused by within-individuals factors.

3 That due to unsystematic, unassessed factors remaining after the two above sources of variation, which is called the error variance or sometimes the residual variance.

Another way of conceptualizing it is that within-groups variance is the variation attributable to the personal characteristics of the individual involved – a sort of mélange of their personality, intellectual and social characteristics which might influence scores on the dependent variable. For example, if the dependent variable were verbal fluency then we would expect intelligence and education to affect these scores in the same way for any individual each time verbal fluency is assessed in the study, over and above the effects of the stipulated independent variables. There is no need to know just what it is that affects verbal fluency in the same way each time just so long as we can estimate the extent to which verbal fluency is affected.

The within-groups variance in uncorrelated or unrelated designs simply cannot be assessed since there is just one measure of the dependent variable from each participant in such designs. So the influence of the within-groups factors cannot be separated from error variance in such designs.

within-subjects design: the simplest and purest example of a within-subjects design is where a single case (participant or subject) is studied on two or more occasions. Changes between two occasions would form the basis of the analysis. More generally, changes in several participants are studied over a period of time. Such designs are also known as related designs, related subjects designs or correlated subjects designs. They have the major advantage that since the same individual is measured on repeated occasions, many factors such as intelligence, class, personality, and so forth are held constant. Such designs contrast with between-subjects designs in which different groups of participants are compared with each other (unrelated designs or uncorrelated designs). See also: between-subjects design; carryover or asymmetrical transfer effect; mixed design

X

x axis: the horizontal axis or abscissa of a graph. See also: abscissa

Y

y axis: the vertical axis or ordinate in a graph. See also: ordinate

Yates's correction: a continuity correction. Small contingency (cross-tabulation) tables do not fit the theoretical and continuous chi-square distribution particularly well. Yates's correction is sometimes applied to 2 × 2 contingency tables to make some allowance for this, and some authorities recommend its universal application to such small tables. Others recommend its use when there are cells with an (observed) frequency of less than five. It is no longer a common practice to adopt it and it is possibly less misleading not to make any adjustment. If the correction has been applied, the researcher should make this clear. The correction is conservative in that the null hypothesis is favoured slightly over the alternative hypothesis.
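As an illustrative sketch, SciPy's chi2_contingency applies Yates's correction to 2 × 2 tables through its correction argument; the frequencies below are invented.

```python
# Chi-square on a 2 x 2 table with and without Yates's continuity correction.
from scipy.stats import chi2_contingency

table = [[10, 4],
         [6, 12]]

chi2_corrected, p_corrected, _, _ = chi2_contingency(table, correction=True)
chi2_uncorrected, p_uncorrected, _, _ = chi2_contingency(table, correction=False)

print(chi2_corrected, p_corrected)       # the corrected value is smaller (more conservative)
print(chi2_uncorrected, p_uncorrected)
```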

yea-saying: see acquiescence

Z

z score: another name for a standardized score. A z score is merely the difference between the score and the mean score of the sample of scores then divided by the standard deviation of the sample of scores:

z score = (score − sample mean)/standard deviation

The computation is straightforward. z scores may have positive or negative values. A positive value indicates that the score is above the sample mean, a negative value indicates the score is below the sample mean. The importance of z scores lies in the fact that the distribution of z has been tabulated and is readily available in tables of the distribution of z. The z distribution is a statistical distribution which indicates the relative frequency of z scores. That is, the table indicates the numbers of z scores which are zero and above, one standard deviation above the mean and above, and so forth. The distribution is symmetrical around the mean. The only substantial assumption is that the distribution is normal in shape. Technically speaking, the z distribution has a mean of 0 and a standard deviation of 1.

To calculate the z score of the score 11 in a sample of three scores consisting of 5, 9 and 11, we first need to calculate the mean of the sample, which is 25/3 = 8.333. The standard deviation of these scores is obtained by squaring each score's deviation, summing them and then finding their average, and then finally taking their square root to give a value for the standard deviation of 2.494:

standard deviation = √[Σ(X − X̄)²/N]
= √{[(5 − 8.333)² + (9 − 8.333)² + (11 − 8.333)²]/3}
= √{[(−3.333)² + 0.667² + 2.667²]/3}
= √[(11.109 + 0.445 + 7.113)/3]
= √(18.667/3)
= √6.222
= 2.494

So the standard score (z score) of the score 11 is

(11 − 8.333)/2.494 = 2.667/2.494 = 1.07

Standardized or z scores have a number of functions in statistics and occur quite commonly. Among their advantages are that:

1 As variables can be expressed on a common unit of measurement, this greatly facilitates the addition of different variables in order to achieve a composite of several variables.


2 As z or standardized scores refer to a distribution of known characteristics, the relative standing of a person's score expressed as a z score is easily calculated. For example, if a person's standardized score is 1.96 this means that the score is in the top 2.5% of scores. This is simply obtained from the table of the distribution of z scores.

See also: correlation; standardized scores
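A short computational sketch of the worked example above; scipy.stats.zscore is assumed to be available and, like the entry, divides by N rather than N − 1.

```python
import numpy as np
from scipy.stats import zscore

scores = np.array([5, 9, 11])
mean = scores.mean()             # about 8.333
sd = scores.std(ddof=0)          # about 2.494, dividing by N as in the entry

print((scores - mean) / sd)      # about [-1.34, 0.27, 1.07]
print(zscore(scores))            # the same values
```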

z test: determines whether Pearson's correlations from two unrelated samples differ significantly. When the correlation in the population is not 0, the sampling distribution of the correlation is not approximately normal. It becomes progressively skewed as the population correlation approaches ±1.00. As a consequence it is difficult to estimate its standard error.

To overcome this problem Pearson's correlation is transformed into a z correlation using the following formula, where loge stands for the natural logarithm and r for Pearson's correlation:

zr = 0.5 × loge[(1 + r)/(1 − r)]

This transformation makes the sampling distribution approximately normal and allows the standard error to be calculated with the formula

1/√(n − 3)

where n is the size of the sample. The z test is simply a ratio of the difference between the two z correlations to the standard error of the two samples:

z = (z1 − z2)/√[1/(n1 − 3) + 1/(n2 − 3)]

z has to be 1.65 or more to be statistically significant at the one-tailed 0.05 level and 1.96 or more at the two-tailed 0.05 level.
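As an illustrative sketch, the test can be computed with NumPy, whose arctanh function is exactly the r to z transformation given above; the correlations and sample sizes are invented.

```python
import numpy as np

r1, n1 = 0.60, 50    # correlation and size of the first sample (hypothetical)
r2, n2 = 0.30, 60    # correlation and size of the second sample (hypothetical)

z1, z2 = np.arctanh(r1), np.arctanh(r2)        # r to z transformation
se = (1 / (n1 - 3) + 1 / (n2 - 3)) ** 0.5      # standard error of the difference
z = (z1 - z2) / se

print(z)   # compare with 1.96 for the two-tailed 0.05 level
```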

Z2* test: determines whether Pearson's correlation between two variables differs significantly from that between two other variables from the same sample. For example, the one-year test–retest or autocorrelation for relationship satisfaction may be compared with that for relationship conflict. The statistical significance of this test is the same as that for the z test.

Cramer (1998); Steiger (1980)

z to r transformation: the change of a z correlation into Pearson's product moment correlation. This can be done by looking up the value in the appropriate table or using the following formula, where 2.7182 is (approximately) the base of natural logarithms, e:

r = [2.7182^(2 × zr) − 1]/[2.7182^(2 × zr) + 1]

The values of the z correlation vary from 0 to about ±3 (though, effectively, infinity), as shown in Table Z.1, and the values of Pearson's correlation from 0 to ±1.

Table Z.1 z to r transformation

z      r       z      r
0.00   0.00    0.80   0.66
0.10   0.10    0.90   0.72
0.20   0.20    1.00   0.76
0.30   0.29    1.50   0.91
0.40   0.38    2.00   0.96
0.50   0.46    2.50   0.99
0.60   0.54    3.00   1.00
0.70   0.60
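A brief sketch showing that the formula above is the hyperbolic tangent, so NumPy's tanh reproduces the values in Table Z.1.

```python
import numpy as np

z_values = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
r_values = (np.e ** (2 * z_values) - 1) / (np.e ** (2 * z_values) + 1)

print(np.round(r_values, 2))            # about [0, 0.46, 0.76, 0.96, 1.00]
print(np.round(np.tanh(z_values), 2))   # the same values
```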

z transformation: see Fisher's z transformation

zero-order correlation: a correlation between two variables in which no other variables are partialled out. A first-order partial correlation is one in which one other variable has been partialled out, a second-order partial correlation one in which two variables have been partialled out, and so on.


zero point: the score of 0 on the measurement scale. This may not be the lowest possible score on the variable. Absolute zero is the lowest point on a scale and the lowest possible measurement. So absolute zero temperature is the lowest temperature that can be reached. This concept is not in general of much practical significance in most social sciences.

Some Useful Sources

Baltes, Paul B. and Nesselroade, John R. (1979) 'History and rationale of longitudinal research', in John R. Nesselroade and Paul B. Baltes (eds), Longitudinal Research in the Study of Behavior and Development. New York: Academic Press. pp. 1–39.

Cattell, Raymond B. (1966) 'The scree test for the number of factors', Multivariate Behavioral Research, 1: 245–276.

Cohen, Jacob and Cohen, Patricia (1983) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum.

Cook, Thomas D. and Campbell, Donald T. (1979) Quasi-Experimentation. Chicago: Rand McNally.

Cramer, Duncan (1998) Fundamental Statistics for Social Research. London: Routledge.

Cramer, Duncan (2003) Advanced Quantitative Data Analysis. Buckingham: Open University Press.

Glenn, Norval D. (1977) Cohort Analysis. Beverly Hills, CA: Sage.

Hays, William L. (1994) Statistics, 5th edn. New York: Harcourt Brace.

Howell, David C. (2002) Statistical Methods for Psychology, 5th edn. Belmont, CA: Duxbury Press.

Howitt, Dennis and Cramer, Duncan (2000) First Steps in Research and Statistics. London: Routledge.

Howitt, Dennis and Cramer, Duncan (2004) An Introduction to Statistics in Psychology: A Complete Guide for Students, Revised 3rd edn. Harlow: Prentice Hall.

Howson, Colin and Urbach, Peter (1989) Scientific Reasoning: The Bayesian Approach. La Salle, IL: Open Court.

Huitema, Bradley E. (1980) The Analysis of Covariance and Alternatives. New York: Wiley.

Hunter, John E. and Schmidt, Frank L. (2004) Methods of Meta-Analysis, 2nd edn. Newbury Park, CA: Sage.

Kenny, David A. (1975) 'Cross-lagged panel correlation: A test for spuriousness', Psychological Bulletin, 82: 887–903.

Kirk, Roger C. (1995) Experimental Design, 3rd edn. Pacific Grove, CA: Brooks/Cole.

McNemar, Quinn (1969) Psychological Statistics, 4th edn. New York: Wiley.

Menard, Scott (1991) Longitudinal Research. Newbury Park, CA: Sage.

Novick, Melvin R. and Jackson, Paul H. (1974) Statistical Methods for Educational and Psychological Research. New York: McGraw-Hill.

Nunnally, Jum C. and Bernstein, Ira H. (1994) Psychometric Theory, 3rd edn. New York: McGraw-Hill.

Owusu-Bempah, Kwame and Howitt, Dennis (2000) Psychology beyond Western Perspectives. Leicester: British Psychological Society.

Pampel, Fred C. (2000) Logistic Regression: A Primer. Newbury Park, CA: Sage.

Pedhazur, Elazar J. (1982) Multiple Regression in Behavioral Research, 2nd edn. Fort Worth, TX: Harcourt Brace Jovanovich.

Pedhazur, Elazar J. and Schmelkin, Liora P. (1991) Measurement, Design and Analysis. Hillsdale, NJ: Lawrence Erlbaum.

Rogosa, David (1980) 'A critique of cross-lagged correlation', Psychological Bulletin, 88: 245–258.

Siegal, Sidney and Castellan, N. John, Jr (1988) Nonparametric Statistics for the Behavioral Sciences, 2nd edn. New York: McGraw-Hill.

Steiger, James H. (1980) 'Tests for comparing elements of a correlation matrix', Psychological Bulletin, 87: 245–251.

Stevens, James (1996) Applied Multivariate Statistics for the Social Sciences, 3rd edn. Hillsdale, NJ: Lawrence Erlbaum.

Tabachnick, Barbara G. and Fidell, Linda S. (2001) Using Multivariate Statistics, 4th edn. Boston: Allyn and Bacon.


Tinsley, Howard E. A. and Weiss, David J. (1975) 'Interrater reliability and agreement of subjective judgments', Journal of Counselling Psychology, 22: 358–376.

Toothaker, Larry E. (1991) Multiple Comparisons for Researchers. Newbury Park, CA: Sage.
