psychometrics lecture 2.ppt


Page 1: Psychometrics Lecture 2.ppt

1

Practice question

What was Stern’s definition of an IQ? In your answer, explain the concept of mental age. What was the major drawback with the measure that Stern proposed?

Page 2: Psychometrics Lecture 2.ppt

2

Points about short answers

• Write in COMPLETE SENTENCES: phrases or notes are insufficient.

• When you are defining a term, avoid using the term itself in its own definition: “A test is reliable when it can be relied upon …”.

• Decide upon the most important points and get those down first. If any time remains, add more detail.

Page 3: Psychometrics Lecture 2.ppt

3

Main points

The three essential ingredients of your answer are as follows:

1. Explanation of MENTAL AGE.

2. STERN’S DEFINITION. Define the IQ verbally, as well as giving the formula.

3. The PROBLEM (psychometric mental age doesn’t increase beyond 15, which is problematic for the measurement of adult intelligence).

Anything else you may be able to add is a luxury. Remember, this question hasn’t asked for a solution to the problem, so you don’t have to give one. You need only show the examiner that you understand WHY there’s a problem.

Page 4: Psychometrics Lecture 2.ppt

4

Mental age

• A person’s MENTAL AGE is the chronological age at which most children can perform at the same level.

• So if a 25-year-old man performs at the levels of typical 9-year-olds, his CHRONOLOGICAL AGE is 25, but his MENTAL AGE is 9.

Page 5: Psychometrics Lecture 2.ppt

5

The Intelligence Quotient (IQ)

• In 1912, the German psychologist Stern proposed the INTELLIGENCE QUOTIENT (IQ), which he defined as the ratio of a person’s mental age to their chronological age, multiplied by 100.

• The formula is: IQ = (mental age / chronological age) × 100.

Page 6: Psychometrics Lecture 2.ppt

6

The problem with mental age

• The problem is that, since mental age does not increase beyond 15 years, a person’s IQ as defined by Stern will progressively diminish with each year that passes, EVEN IF THE PERSON CONTINUES TO PERFORM AT EXACTLY THE SAME LEVEL.
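The decline can be seen in a short worked example. This is a minimal Python sketch, assuming an adult whose measured mental age simply stays at the ceiling of 15 described on the slide; the function name `stern_iq` and the constant `MENTAL_AGE_CEILING` are illustrative names, not part of Stern's proposal.

```python
def stern_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: mental age divided by chronological age, times 100."""
    return 100.0 * mental_age / chronological_age


# Assumption taken from the slide: measured mental age stops rising at 15.
MENTAL_AGE_CEILING = 15

for age in (15, 20, 30, 60):
    mental_age = min(age, MENTAL_AGE_CEILING)
    # Performance is unchanged, yet the ratio falls as the person gets older.
    print(age, round(stern_iq(mental_age, age)))
```

The same person, performing at exactly the same level, moves from an IQ of 100 at age 15 to 50 at age 30, purely because the denominator keeps growing.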

Page 7: Psychometrics Lecture 2.ppt

7

Lecture 2

RELIABILITY

Page 8: Psychometrics Lecture 2.ppt

8

The first intelligence test

• Last week, I described the construction of the first intelligence test by Binet and Simon.

• It’s not easy to construct a psychological test: it took Binet and Simon several years to do it.

• For example, we saw that, if we are to have a really useful test, we need NORMATIVE DATA, or NORMS.

• The norms provide us with the comparison we need to assess the performance of the child we are testing.

• This week, I am going to look more closely at two essential characteristics of a good test.

Page 9: Psychometrics Lecture 2.ppt

9

Two essential qualities

For a test to be useful, it must have two essential qualities:

1. It must be RELIABLE;

2. It must be VALID.

I shall briefly consider the second of these properties first.

Page 10: Psychometrics Lecture 2.ppt

10

Validity

• A test is a MEASURING INSTRUMENT.

• A test is said to be VALID if it measures what it is supposed to measure.

• Applicants for a post in senior management may be given a psychometric test of leadership capability.

• But do a candidate’s responses really indicate his or her suitability for the post?

• This is a question about the VALIDITY of a test.

Page 11: Psychometrics Lecture 2.ppt

11

Validity…

When we say the test is VALID, we mean that a person’s responses to the questions in the test really do tell us something about how that person would perform in a real situation requiring managerial capability.

Page 12: Psychometrics Lecture 2.ppt

12

Are intelligence tests valid?

• Binet assembled items that differentiated between typical children at various ages.

• He was trying to measure general scholastic aptitude.

• But are children who achieve a certain mental age on his test really capable of learning school subjects at a level typical of children of that chronological age?

Page 13: Psychometrics Lecture 2.ppt

13

Age norms: Reproducing a figure from memory

• A 5-year-old can copy a square from memory, but not a diamond or a cylinder.

• An 8-year-old can copy a square and a diamond, but not a cylinder.

• An 11-year-old can copy all three figures.

Page 14: Psychometrics Lecture 2.ppt

14

Validity …

• According to Binet, the child who can draw the cone from memory has a greater scholastic aptitude than a child who cannot.

• But is that true? Is a child’s ‘mental age’, as measured by Binet’s test, really a measure of scholastic aptitude? Is the child who can draw the cone really better at school subjects (such as French, geometry or chemistry) than a child who cannot?

• These are questions about the VALIDITY of a psychological test. Does it measure the hypothetical quality that it is supposed to measure?

Page 15: Psychometrics Lecture 2.ppt

15

Reliability

• If a test is to be VALID, it must, in the first place, be RELIABLE.

• A RELIABLE test is one that gives CONSISTENT RESULTS if taken by the same participants on different occasions or when they are tested by different examiners.

Page 16: Psychometrics Lecture 2.ppt

16

Reliability …

• A reliable test produces CONSISTENT results.

• If John scored at the 70th percentile on the first occasion of testing, he would, if tested on other occasions, score at similar percentile levels.

• His scores when tested on subsequent occasions are indicated by the dashed lines at the 69th, the 68th, the 73rd and the 72nd percentiles.

• He’s always somewhere near the 70th percentile.

• This is a RELIABLE test.

Page 17: Psychometrics Lecture 2.ppt

17

Unreliability …

• An unreliable test gives INCONSISTENT results.

• John scores at the 70th percentile on the first occasion.

• On subsequent occasions, however, he scores at the 20th, the 40th, the 45th, the 75th and the 60th percentiles (not necessarily in that order).

• A test showing this sort of inconsistency is an UNRELIABLE test.

Page 18: Psychometrics Lecture 2.ppt

18

An elastic tape measure

• Suppose you were to try to measure a set of objects with an elastic tape measure and repeat this operation on the same objects on several occasions.

• Each time such a tape measure was used, it would be stretched to a different extent, EVEN WHEN YOU WERE MEASURING THE SAME OBJECT.

• So the dimensions of the same objects would be recorded as having different values on different occasions.

• Our hypothetical tape measure would be useless.

• AN UNRELIABLE TEST IS LIKE AN ELASTIC TAPE MEASURE.

Page 19: Psychometrics Lecture 2.ppt

19

The right distribution

• Implicit in this concept of consistency is the assumption that we have a measure that SPREADS PEOPLE OUT.

• The assumption is that what we are measuring is a VARIABLE. If so, it must have a proper DISTRIBUTION and people’s scores on our test must reflect this.

• A normal distribution is ideal.

Page 20: Psychometrics Lecture 2.ppt

20

Consistency without reliability

• It is insufficient to say that a test is reliable if people get the same scores on different occasions.

• When a test is too difficult, people will score similarly on different occasions (at or near zero), but the scores do not have a satisfactory distribution. This is known as a FLOOR EFFECT.

• The same problem obtains when a test is too easy: this is known as a CEILING EFFECT.

• These are not truly reliable tests, because the scores do not differentiate among those tested.

• THE SCORES MUST HAVE AN APPROPRIATE DISTRIBUTION.
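A floor effect also shows up arithmetically: when everyone gets the same score, there is no variance, so a correlation between two testing occasions cannot even be computed. The following is a minimal Python sketch of that point; the scores are invented for illustration and the helper `pearson_r` is a hand-written function, not a library call.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # the scores do not spread people out at all
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a test so hard that everyone scores 0 on both occasions.
first_occasion = [0, 0, 0, 0, 0]
second_occasion = [0, 0, 0, 0, 0]

print(pearson_r(first_occasion, second_occasion))  # None: consistent, but not reliable
```

The scores are perfectly consistent, yet they say nothing about anyone's relative standing, which is why consistency alone does not amount to reliability.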

Page 21: Psychometrics Lecture 2.ppt

21

Definition of reliability

1. The scores must have a distribution that DIFFERENTIATES among those tested.

2. A test is said to be reliable if, given that the scores display the necessary variability and DISTRIBUTION, individuals retain their relative standing in the distribution from occasion to occasion of testing, and when tested by different administrators. A child should score at similar PERCENTILES from occasion to occasion.

3. A reliable test thus gives CONSISTENT RESULTS.

Page 22: Psychometrics Lecture 2.ppt

22

Relationship between reliability and validity

• Reliability and validity are two DIFFERENT PROPERTIES.

• Reliability, however, is a NECESSARY condition for validity – otherwise you have an elastic tape measure.

• Binet’s age norms attest to the reliability of his intelligence test.

• But there remains the question of whether mental age really reflects scholastic aptitude. That is a question about VALIDITY.

• Reliability is not a SUFFICIENT condition for validity.

• I shall return to validity later on.

Page 23: Psychometrics Lecture 2.ppt

23

Measuring the reliability of a test

• There are several ways of measuring reliability.

• They all make use of the Pearson correlation.

• There are situations in which some are applicable but not others.

• There is no ‘best’ method for all purposes.

Page 24: Psychometrics Lecture 2.ppt

24

Composite or aggregate scores

• An important consideration in the determination of reliability is whether the test leads to a single score or the final ‘score’ is actually an aggregate of scores on several different items.

• The DIGIT SPAN test produces a single score. Each person attempts to reproduce successively larger lists of digits until a maximum is reached.

• An INTELLIGENCE TEST, which contains many items, produces an aggregate score.

• Most personality tests yield aggregate scores.

Page 25: Psychometrics Lecture 2.ppt

25

Personality tests

• Many tests of personality have several subsections, each of which measures a distinct aspect of personality.

• So an overall aggregate score may be a sum of several scores which are themselves aggregates of scores on the items in the various subsections of the test.

• Cattell’s personality test produces scores on 16 subscales, each supposedly measuring one of 16 personality factors.

Page 26: Psychometrics Lecture 2.ppt

26

Short-term or working memory

• Brain damage often results in impairment of various memory functions, both short-term and long-term.

• Short-term retention of verbal and nonverbal material is thought to be served by different systems.

• In Alan Baddeley’s theory of working memory, verbal working memory is served by the PHONOLOGICAL LOOP; non-verbal working memory is served by the VISUO-SPATIAL SKETCHPAD – and perhaps the CENTRAL EXECUTIVE as well.

Page 27: Psychometrics Lecture 2.ppt

27

The digit span and Corsi Blocks tests

• The DIGIT SPAN test is one measure of verbal working memory.

• The CORSI BLOCKS test is a measure of non-verbal working memory. Both tests are widely used in the clinical context, when doctors or psychologists are testing for loss of memory function in brain-damaged patients.

• Until recently, the Corsi test was the more widely used measure of nonverbal working memory.

Page 28: Psychometrics Lecture 2.ppt

28

The Corsi Blocks test

• The tester and the patient sit opposite one another at a table.

• On the table is a board about the size of a chessboard, upon which some wooden cubes are fixed in a haphazard arrangement.

• On the tester’s side, the blocks are numbered, so that they can be touched in predefined sequences.

• The tester taps a sequence of the cubes.

• The patient is asked to tap the same cubes in the same order.

Page 29: Psychometrics Lecture 2.ppt

29

The Corsi span

• The tester taps the blocks in progressively longer sequences, until the patient cannot reproduce the sequence of taps.

• The Corsi span is the longest sequence the patient can reproduce.

• The entire procedure results in a single score, the Corsi span.

Page 30: Psychometrics Lecture 2.ppt

30

The Visual Patterns test

• Arguably, the Corsi Blocks test taps both visual storage and SPATIAL MEMORY, which has a nonvisual element consisting of memories for felt body position.

• The VISUAL PATTERNS TEST (VPT) is intended to tap purely VISUAL nonverbal working memory, excluding the nonvisual spatial element.

Page 31: Psychometrics Lecture 2.ppt

31

Visual patterns

• You can build up increasingly complex patterns by increasing the size of the grid.

• The lower pattern is, of course, much more difficult to reproduce from memory than would be one in a smaller grid.

Page 32: Psychometrics Lecture 2.ppt

32

Obtaining the visual span

• The patient is shown a grid, some of whose squares are blackened.

• After a fixed inspection period, the grid is removed and the patient is asked to reproduce the pattern by marking with pencil crosses the corresponding squares in a blank grid of the same size as the original.

• Increasingly large patterned grids are presented, until the patient can no longer reproduce the exact pattern of the black and white squares.

• The patient’s VISUAL SPAN is the size of the largest pattern he or she is able to reproduce.

Page 33: Psychometrics Lecture 2.ppt

33

Age norms

• Norms are available for both the Corsi Blocks and Visual Patterns tests.

• Both Corsi Blocks and Visual Patterns spans decrease noticeably with age.

• To assess whether someone in their seventies has sustained cognitive impairment, that person’s score must be related to the distribution of scores of people in that age group.

Page 34: Psychometrics Lecture 2.ppt

34

Methods of determining reliability

1. Test-retest.

2. Parallel (or equivalent) forms

3. Split-half.

Page 35: Psychometrics Lecture 2.ppt

35

1. Test-retest reliability

• Give the test to a large number of people.

• Give the test again to the same people.

• You will have a bivariate data set comprising the scores of each person on the two tests.

• Calculate the Pearson correlation r between Score on the FIRST occasion and Score on the SECOND occasion.

• The value of r should be at least .75.
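The steps above can be sketched in a few lines of Python. The scores below are invented for illustration, and `pearson_r` is a hand-written helper rather than a library function; the .75 threshold is the one given on the slide.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of the same six people on two testing occasions.
first_occasion = [12, 15, 9, 20, 17, 11]
second_occasion = [13, 14, 10, 19, 18, 10]

r = pearson_r(first_occasion, second_occasion)
verdict = "meets" if r >= 0.75 else "falls below"
print(f"test-retest r = {r:.2f} ({verdict} the .75 criterion)")
```

Each person contributes one pair of scores, so the two lists must be in the same person order.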

Page 36: Psychometrics Lecture 2.ppt

36

2. Parallel forms.

• Construct two equivalent forms of the same test, Form A and Form B. Ensure that people score at similar levels on the two forms. This has been done with the Visual Patterns test.

• Test each of a large sample of people with both Form A and Form B.

• Let Variable A contain their scores on Form A of the test; Variable B contains their scores on Form B of the test. This is a bivariate data set.

• Calculate the Pearson correlation between A and B. The correlation should be at least .75.

Page 37: Psychometrics Lecture 2.ppt

37

3. Split-half reliability: Requirements

• The test must consist of several questions and yield a composite score.

• The items must be divisible into two EQUIVALENT sub-groups, as by taking the ODD-NUMBERED and EVEN-NUMBERED items.

• There shouldn’t be any systematic difference in the difficulty or nature of the items in the two sub-groups.

Page 38: Psychometrics Lecture 2.ppt

38

The method

• Each person can now be given two totals:

– 1. A total on the odd-numbered items.

– 2. A total on the even-numbered items.

• You will now have a bivariate data set comprising the Odd and Even totals achieved by all the people tested.

• Calculate the Pearson correlation to determine the split-half reliability.

• The value of r should be at least .75.

Page 39: Psychometrics Lecture 2.ppt

39

An example

• Suppose that a test comprises ten items, each item being marked 0 or 1, for a wrong and a right answer, respectively.

• The score a person finally gets on the test is an aggregate of the ones and zeros over all ten items.

• A person’s total score, therefore, can vary from 0 to 10.

Page 40: Psychometrics Lecture 2.ppt

40

Obtaining the odd and even totals

• In the table, the bottom two rows contain the Odd and Even half-totals for three people.

• Fred did best, Joe did worst and Mary’s score is intermediate.

Page 41: Psychometrics Lecture 2.ppt

41

The split-half reliability

• Each of the people tested now has two ‘scores’: an ‘odd’ score and an ‘even’ score.

• We have a bivariate data set and we can calculate a Pearson correlation.

• The value of this correlation is 0.86.

• The SPLIT-HALF measure of reliability is the Pearson correlation between the Odd and Even subtotals.
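The table from the slides did not survive in this transcript, so the following Python sketch uses invented 0/1 responses for six hypothetical people on a ten-item test. It does not reproduce the 0.86 quoted above; it only illustrates the procedure of forming Odd and Even subtotals and correlating them. The helper `pearson_r` is hand-written for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one row per person, one column per item (1 = right, 0 = wrong).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
]

# Items 1, 3, 5, 7, 9 sit at list positions 0, 2, 4, 6, 8; items 2, 4, ... at 1, 3, ...
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

split_half_r = pearson_r(odd_totals, even_totals)
print(odd_totals, even_totals, round(split_half_r, 2))
```

With real data the reliability sample would be far larger than six people; the small table is only there to make the bookkeeping visible.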

Page 42: Psychometrics Lecture 2.ppt

42

The scatterplot

• The scatterplot is indicative of the assumed linear relationship between scores on the odd and even items in the test.

• The split-half reliability is 0.86.

Page 43: Psychometrics Lecture 2.ppt

43

Test-retest: Disadvantages

• On a test of attitudes or prejudice, memory for previous answers would make the test seem more reliable than it really is.

• The shorter the interval between the first and second testing, the stronger the memory effect is likely to be.

• There is therefore uncertainty about how long to make the interval between the two sessions.

• The test-retest method may OVERSTATE a test’s reliability.

Page 44: Psychometrics Lecture 2.ppt

44

Parallel forms method: Advantages

• The reliability sample is tested twice in one session, once with Form A and once with Form B. So there can be no uncertainty about the length of the interval between testing sessions.

• Different items are used for Form A and Form B, greatly reducing the possibility of practice or memory effects.

Page 45: Psychometrics Lecture 2.ppt

45

Parallel forms: Disadvantages

• There could still be some practice effect, because the items in Form A and Form B will be similar.

• If two tests are given during a single session, it would be wise to vary the order of presentation of Forms A and B (i.e. counterbalance the order) among the participants.

• The parallel forms method produces LOWER ESTIMATES of reliability than does the test-retest method.

• This is because you are SAMPLING from a larger pool of possible items; and sampling implies SAMPLING ERROR. The VARIANCE of the scores is increased, which REDUCES the correlation.

Page 46: Psychometrics Lecture 2.ppt

46

The split-half method: Advantages

• You need only test your reliability sample once.

• You don’t need to go to the trouble of constructing Form A and Form B of the test and establishing that they are equivalent.

Page 47: Psychometrics Lecture 2.ppt

47

Split-half reliability: Disadvantages

• The method isn’t workable with tests like the Corsi or the Visual Patterns, which produce a single score.

• Essentially, you are producing two tests, each of which is half as long as the original one. Unfortunately, SHORTER TESTS ARE LESS RELIABLE THAN LONGER TESTS.

• The split-half method produces LOWER ESTIMATES of reliability than either the test-retest or parallel forms methods. In fact the method produces the LOWEST reliability estimates of the three methods I have described.

Page 48: Psychometrics Lecture 2.ppt

48

Effect of the length of a test

• Longer tests are more reliable than shorter ones.

• For example, a vocabulary test with 50 items is more reliable than one with 10 items.

• This is because the words in a test are a SAMPLE from a much larger pool of possible words.

• Sampling entails SAMPLING ERROR or VARIABILITY.

• We have seen that the statistics of small samples vary more than the statistics of large samples. In the same way, a person’s score on a short vocabulary test would show more variation than scores on a longer test.

• This is RANDOM variation and reduces the reliability of the test.

Page 49: Psychometrics Lecture 2.ppt

49

True and random error components

• A person’s vocabulary score consists of a TRUE component (true relative size of vocabulary) and a RANDOM component.

• The random (or ERROR) component is contributed to by the element of luck in the selection of words for the test.

Page 50: Psychometrics Lecture 2.ppt

50

A score’s components

Page 51: Psychometrics Lecture 2.ppt

51

Good variance, bad variance

• We need variance to achieve the right distribution and spread people’s scores.

• But there’s good variance and bad variance.

• Good variance is determined by variation in the true component of people’s scores.

• Bad variance is variation in the random or error component.

Page 52: Psychometrics Lecture 2.ppt

52

Longer tests

• Tests with more items produce scores with relatively greater true components and relatively less error.

• Increase the true component of the total score by HAVING MORE ITEMS IN YOUR TEST.

Page 53: Psychometrics Lecture 2.ppt

53

Low average inter-item correlation

• A test consists of 30 items.

• The average correlation among the various pairs of items is only 0.2.

• The test yields a total score, which is the sum of the participant’s scores on all the items.

• What is the reliability of the test?

Page 54: Psychometrics Lecture 2.ppt

54

The Spearman-Brown formula

• The number of items is 30.

• The average correlation between pairs of items is 0.2.

• Substituting in the formula, we find that the reliability of the TOTAL SCORE on all 30 items is 0.88.
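The formula itself did not survive in this transcript. In its general form, the Spearman-Brown formula gives the reliability of a total over k items as k × r / (1 + (k − 1) × r), where r is the average inter-item correlation. The Python sketch below applies that form to the slide's numbers; the function name `spearman_brown` is illustrative.

```python
def spearman_brown(n_items: int, mean_inter_item_r: float) -> float:
    """Reliability of a total score over n_items, given the mean inter-item correlation."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


# The slide's example: 30 items with an average inter-item correlation of 0.2.
print(round(spearman_brown(30, 0.2), 2))  # 0.88

# The same formula shows why longer tests are more reliable than shorter ones.
for k in (10, 30, 50):
    print(k, round(spearman_brown(k, 0.2), 2))
```

Substituting k = 30 and r = 0.2 gives 6 / 6.8, which rounds to the 0.88 quoted above, even though any single pair of items correlates only weakly.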

Page 55: Psychometrics Lecture 2.ppt

55

Summary

• Reliability, in the technical sense of the term, was defined.

• Three methods of determining reliability were described: (1) the TEST-RETEST method; (2) the PARALLEL FORMS method; (3) the SPLIT-HALF method.

• Each method has its own advantages, disadvantages and applicability.

Page 56: Psychometrics Lecture 2.ppt

56

Summary …

• Reliability is a necessary, but not a sufficient, condition for validity.

• The reliability of intelligence tests, field dependence tests (Rod-and-frame, Embedded Figures) and personality tests (Introversion-Extraversion, Neuroticism-Stability) is very high, often .9 or greater.

• This fact in itself, however, does not demonstrate the VALIDITY of these tests.

• Next week, I shall turn to the ways in which psychometricians attempt to validate their tests.

Page 57: Psychometrics Lecture 2.ppt

57

Short question

• What, in the context of mental testing, is meant by the RELIABILITY and VALIDITY of a test?

• Can a test be valid without being reliable?

• Can a test be reliable without being valid?

• Describe two approaches to the measurement of reliability, explaining the advantages and disadvantages of each.

Page 58: Psychometrics Lecture 2.ppt

58

Practice question

What is a DEVIATION IQ? In your answer, explain how a deviation IQ differs from IQ as defined by Stern. What is the advantage of a deviation IQ?