Validity: Conceptual Issues Furr & Bacharach Chapter 8

Upload: blake-hampton

Post on 13-Dec-2015


Validity: Conceptual Issues

Furr & Bacharach, Chapter 8

Contrasting Reliability & Validity

Both fundamental to a sophisticated understanding of psychometrics

Must have a clear understanding of the relationship between the two

Definitions – notice the differences

Reliability

Degree to which differences in test scores reflect differences among people in their levels of the trait that affects those scores, whatever that trait may be

Quantitative property of the test scores

Validity

Tied to the interpretation of test scores

Tied to theory and the implications of scores

LINK: Validity requires reliability

Stable traits (Intelligence & IQ)

Measured at two points in time, scores should be stable across time (test-retest reliability)

If not, the test cannot be a valid test of IQ

States (Depression & BDI)

If internal consistency is poor, the test can’t be valid
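The two reliability checks named above can be sketched in a few lines. This is a minimal illustration, not a procedure from the chapter: the function names, the IQ scores, and the item responses are all hypothetical, and Cronbach's alpha is used here as one common index of internal consistency.

```python
from math import sqrt
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    k = len(responses[0])                  # number of items
    item_columns = list(zip(*responses))   # regroup the scores by item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical IQ scores for six people tested twice (illustrative only).
iq_time1 = [98, 105, 112, 120, 91, 130]
iq_time2 = [101, 103, 115, 118, 94, 127]
retest_r = pearson_r(iq_time1, iq_time2)

# Hypothetical responses of five people to four depression items scored 0-3.
item_responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
alpha = cronbach_alpha(item_responses)

print(f"test-retest r = {retest_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```

High values on either index only show that the scores are consistent; they say nothing yet about whether the intended interpretation of those scores is valid.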

Reliability does not imply validity

Stable Trait (Autism & AQ)

May have excellent test-retest reliability or good internal consistency, but may not be interpreted in a valid manner

Iowa story

Don’t want to hire people who might abuse clients anymore!!!

Personality tests…

Is there a test that measures the construct?

Does it validly measure abusive personality?

Is there a test that was designed to predict the likelihood that a particular individual will abuse people?

What is validity?

Definition

Implications of the contemporary definition of validity

Validity: Definition

Basic Definition

The degree to which a test measures what it is supposed to measure

Contemporary Definition

“The degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses” of the test

Implications of the contemporary definition

Implication 1

Interpretation and use of test scores

Validity about interpretation & use of test scores

NEO-PI-R Conscientiousness scale – 48 items

High scores reflect an “active process of planning, organizing and carrying out tasks,” and people with high scores on this scale are “purposeful, strong willed, and determined”

NEO-PI-R Conscientiousness Scale

What is the correct question about the scale’s validity or invalidity?

Are the test items valid or invalid?

Are the test scores valid or invalid?

Is the interpretation of the test scores valid or invalid?

Not “are items or scores valid or invalid?”

The question is: Are the authors’ interpretations of the scores valid or invalid?

Are conscientiousness scores validly interpreted in terms of planfulness, organization, and determination?

Proposed use of scores…

Employers may use NEO-PI-R Conscientiousness Scale to screen potential employees

BELIEF: Differentiates potentially better and worse employees?

Predictive power of conscientiousness scale score?

Hammer is a useful tool if you need to drive a nail…

What if you need to saw a piece of wood?

A hammer is not a useful tool irrespective of the need; its usefulness depends on the task

Simplistic & inaccurate to say…

“Conscientiousness scale is valid without regard to the way in which it will be interpreted and used”

Rather (what is accurate)

Scores can be interpreted validly as an indicator of conscientiousness

Scale is not valid as a measure of intelligence or extraversion

Not a valid predictor of successful employment

Compare:

“Scores on the Conscientiousness scale of the NEO-PI-R are validly interpreted as a measure of conscientiousness.”

vs.

“The Conscientiousness scale of the NEO-PI-R is valid.”

Implication 2

Validity is a matter of degree

Strong vs. weak, NOT valid vs. invalid

Select a test if there is strong enough evidence supporting the intended interpretation and use

http://www.wired.com/wired/archive/9.12/aqtest.html

Concern about the Autism Spectrum Quotient…

Marginal internal consistency, so reliability is already of concern

What about validity?

Is it valid to interpret a high score on the test as reflecting a high degree of autism traits?

Interpretation of AQ

Correlations with the Autism Spectrum Quotient

                      Magical     Physical    Perceptual   Social
                      Ideation    Anhedonia   Aberration   Anhedonia
Pearson Correlation   .371**      .231*       .230*        .573**
Sig. (2-tailed)       .000        .013        .014         .000
N                     114         114         114          114

Correlations with the Autism Spectrum Quotient

                      SCID-II     SCID-II      SCID-II    SCL-90     SCL-90
                      Paranoid    Schizotypy   Schizoid   Paranoid   Psychoticism
Pearson Correlation   .399**      .314**       .309**     .255**     .194**
Sig. (2-tailed)       .000        .000         .000       .001       .010
N                     179         179          179        178        178

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).
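As a rough check on how the significance levels in these tables follow from r and N, the sketch below uses Fisher's z transformation with a normal approximation. This is an assumption made for illustration: statistical packages usually report an exact t-based p-value, so the figures here will differ slightly. The function names are hypothetical; the r and N values are taken from the first table.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_p_value(r, n):
    """Approximate two-tailed p-value for H0: rho = 0 (normal approximation)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fisher_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


# (r, N) pairs copied from the first AQ table.
aq_correlations = {
    "Magical Ideation": (0.371, 114),
    "Physical Anhedonia": (0.231, 114),
    "Perceptual Aberration": (0.230, 114),
    "Social Anhedonia": (0.573, 114),
}

for scale, (r, n) in aq_correlations.items():
    low, high = fisher_confidence_interval(r, n)
    p = fisher_p_value(r, n)
    print(f"{scale}: r = {r:.3f}, p ~ {p:.3f}, 95% CI ~ [{low:.2f}, {high:.2f}]")
```

The interval is often more informative than the star: a correlation can be clearly non-zero and still too small, or too large, for the interpretation the test developer intends.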

Regret vs. Autism? (r = .45)

Regret Scale

1. Whenever I make a choice, I’m curious about what would have happened if I had chosen differently.

2. Whenever I make a choice, I try to get information about how the other alternatives would have turned out.

3. If I make a choice and it turns out well, I still feel like somewhat of a failure if I find out another choice would have turned out better.

4. When I think about how I’m doing in life, I often assess opportunities I have passed up.

5. Once I make a decision, I don’t look back.

Maximization Scale

1. When I watch TV, I channel surf, often scanning through the available options even while attempting to watch one program.

2. When I am in the car listening to the radio, I often check other stations to see if something better is playing, even if I’m relatively satisfied with what I’m listening to.

3. I treat relationships like clothing; I expect to try a lot on before I get the perfect fit.

4. No matter how satisfied I am with my job, it’s only right for me to be on the lookout for better opportunities.

5. I often fantasize about living in ways that are quite different from my actual life.

6. I’m a big fan of lists that attempt to rank things (the best movies, the best singers, the best athletes, the best novels, etc.).

7. I often find it difficult to shop for a gift for a friend.

8. When shopping, I have a hard time finding clothing that I really love.

9. Renting videos is really difficult. I’m always struggling to pick the best one.

10. I find that writing is very difficult, even if it’s just a letter to a friend, because it’s so hard to word things just right. I often do several drafts of even simple things.

11. No matter what I do, I have the highest standards for myself.

12. I never settle for second best.

13. Whenever I’m faced with a choice, I try to imagine what all the other possibilities are, even ones that aren’t present at the moment.

AQ

http://www.wired.com/wired/archive/9.12/aqtest.html

What is to be measured?

What are the relative strengths of the alternatives that are available to measure that construct?

Select best measures of specific characteristics to be assessed

Implication 3

Validity of a test’s interpretation is based on evidence and theory

Human resources: “…in her experience, use of NEO-PI-R was useful in selection”

“Personality Color Test”

Based on color psychology (Max Lüscher)

Color preferences reveal something about your personality

Survey of scientific literature finds almost no empirical evidence of validity of color preferences as a measure of personality characteristics

Evidence for “color test”

Less than clear

Site implies validity

Web site:

“Is the test reliable? We leave that to your opinion. We can only say that there are a number of corporations and colleges that use the Lûscher test as part of their hiring/admissions processes. It can be a useful tool for doctors and psychologists as well and is used to get a quick overview of potential issues patients may have in their lives.”

http://colorquiz.com/

“Color Quiz”

Is the test useful as a measure of personality?

Denied employment based on such a test?

Empirical evidence & theoretical underpinnings?

Data from high quality research must be available.

Theory alone is not adequate.

Contemporary view of validity

Although three forms of validity are traditionally distinguished (content, criterion, and construct), the contemporary perspective highlights CONSTRUCT VALIDITY

Standards

Standards for Educational and Psychological Testing - revised (1999)

Co-published by

American Educational Research Association (AERA)

American Psychological Association (APA)

National Council on Measurement in Education (NCME)

Remember

Contemporary perspective highlights CONSTRUCT VALIDITY

Standards outline 5 types of evidence relevant for establishing validity of test interpretations (AERA, APA, NCME, 1999)

Construct Validity

Associations With Other Variables

Internal Structure

Test Content

Response Processes

Consequences of Use

Construct Validity: Test Content

Validity Evidence: Test Content

Match between the actual content of a test and the content that should be included in the test.

Psychological nature of the construct should dictate the appropriate content of the test.

Face Validity

Face validity – the degree to which a measure appears to be related to a specific construct in the judgment of non-experts such as test takers and representatives of the legal system.

LOOKS relevant, and this fact may increase likelihood that the test will be well received by users and takers

Threats to content validity

Construct-irrelevant content – e.g., test includes questions on content not covered in book, lecture, or discussion

Construct under-representation – e.g., test content fails to represent the full scope of the content implied by the construct

Related practical issues – e.g., time, respondent fatigue, respondent attention, etc. – Is the content a fair representation?

Content Validity vs. Face Validity

Content validity is the degree to which the content reflects the full domain of the construct & can only be evaluated by experts who have a deep understanding of the construct

Face validity is the degree to which non-experts perceive the test to be relevant to what they believe is being measured by it

Construct Validity: Internal Structure

Validity Evidence: Internal Structure of the Test

For a test to be validly interpreted as a measure of a particular construct, the actual structure of the test should match the theoretically based structure of the construct

Does the theoretical basis suggest a unidimensional or a multi-dimensional structure?

Internal Structure

Often assessed via examination of factor structure (factor analysis)

Items that are more strongly correlated with each other than with other items form clusters called factors…

Factor analysis should clarify the number of factors within a set of test questions

Example: Self esteem – is the construct uni- or multi-dimensional?

Factor analysis

1. Clarifies number of factors

2. Reveals associations among the factors within a multi-dimensional test

3. Identifies which items are linked to which factors
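The first point, the number of factors, is usually judged from the eigenvalues of the item correlation matrix. The sketch below is not a full factor analysis: it only estimates the largest eigenvalue by power iteration and reports what share of the total variance it carries. The function names and the 4-item correlation matrix are hypothetical, and the all-ones starting vector assumes reverse-keyed items have already been recoded so that item correlations are positive.

```python
from math import sqrt


def first_eigenvalue(corr, iterations=200):
    """Largest eigenvalue of a symmetric correlation matrix (power iteration)."""
    k = len(corr)
    v = [1.0 / sqrt(k)] * k          # starting vector; assumes positive correlations
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector, then renormalize.
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' C v gives the current eigenvalue estimate.
        eigenvalue = sum(
            v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
        )
    return eigenvalue


def first_factor_share(corr):
    """Share of total variance carried by the first eigenvalue.

    The eigenvalues of a correlation matrix sum to the number of items.
    """
    return first_eigenvalue(corr) / len(corr)


# Hypothetical correlation matrix: four items that all correlate .50.
equal_corr = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]

print(f"first eigenvalue = {first_eigenvalue(equal_corr):.2f}")
print(f"share of variance = {first_factor_share(equal_corr):.1%}")
```

One large eigenvalue followed by uniformly small ones is the pattern a scree plot shows for a unidimensional test; several large eigenvalues before the plot levels off would point to a multidimensional structure.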

Rosenberg Self-Esteem Inventory (RSEI; Rosenberg 1989)

1. On the whole, I am satisfied with myself

2. At times, I think I am no good at all

3. I feel that I have a number of good qualities

4. I am able to do things as well as most other people

5. I feel I do not have much to be proud of

6. I certainly feel useless at times

7. I feel that I’m a person of worth, at least on an equal plane with others

8. I wish I could have more respect for myself

9. All in all, I am inclined to feel that I am a failure

10. I take a positive attitude toward myself

RSEI - Scree Plot

Number of factors evident in the plot?

Question: This scree plot provides evidence for what type of structure?

a. Unidimensional

b. Multidimensional

[Scree plot: Eigenvalues (y-axis, 0 to 6) against Number (x-axis, 0 to 9)]

Construct Validity: Response Processes

Validity Evidence: Response Processes

Match between the psychological processes that respondents actually use when completing a measure and the processes that they should use.

Example: “When I say start, raise your finger when you feel 10 s have elapsed.”

Assumption: respondents should use “feel” (it feels like time is up), but they could use another process such as covert counting, copying others, or looking at a second hand on a watch

Response processes

If the response process actually used differs from the one assumed to be used, then the scores may not be interpretable as the test developer intended

Attention to the internal feel of time passing vs. use of some selected process to intentionally mark the passage of time

Construct Validity: Associations With Other Variables

Validity Evidence: Associations With Other Variables

Match between a measure’s actual associations with other measures and the associations that the test should have with the other measures.

Convergent evidence

The degree to which test scores are correlated with tests of related constructs

Discriminant evidence

Degree to which test scores are uncorrelated with tests of unrelated constructs

Example

Hypothesis: Schizophrenia and autism are diametrically opposed constructs

Measure of autism should be uncorrelated with measures of schizophrenia

Correlations with the Autism Spectrum Quotient

                      Magical     Physical    Perceptual   Social
                      Ideation    Anhedonia   Aberration   Anhedonia
Pearson Correlation   .371**      .231*       .230*        .573**
Sig. (2-tailed)       .000        .013        .014         .000
N                     114         114         114          114

Correlations with the Autism Spectrum Quotient

                      SCID-II     SCID-II      SCID-II    SCL-90     SCL-90
                      Paranoid    Schizotypy   Schizoid   Paranoid   Psychoticism
Pearson Correlation   .399**      .314**       .309**     .255**     .194**
Sig. (2-tailed)       .000        .000         .000       .001       .010
N                     179         179          179        178        178

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).

Support for C & B’s theory?

NO: Convergent evidence - the autism measure correlated positively with schizophrenia (sz) measures

Finding: autism (AU) & schizophrenia (SZ) are related constructs? I.e., Crespi & Badcock are wrong

Or: not necessarily. The positive correlations could instead indicate weak validity of the AQ as a measure of the autism construct

Concurrent validity evidence

The degree to which test scores are correlated with other relevant variables that are measured at the same time as the primary test of interest

SAT is a measure of skills needed for academic success?

Compare SAT administered during high school senior year to high school senior-year GPA

Predictive validity evidence

The degree to which test scores are correlated with relevant variables that are measured at a future point in time.

SAT is a measure of skills needed for academic success?

Compare SAT administered during senior year of high school to college freshman-year GPA

Validity Evidence: Consequences of Testing

Social consequences of test are a facet of validity…

Standards for Educational and Psychological Testing

Validity includes “the intended and unintended consequences of test use”

E.g., do a construct and its measurement benefit one group?

Not all agree…

Consequences of a testing program should be considered a facet of the scientific evaluation of the meaning of a test score.

Some feel that this is an intrusion of politics into science…

Can science be separated from personal and social values?

Summary

Conceptual basis for validity

Construct Validity

Associations With Other Variables

Internal Structure

Test Content

Response Processes

Consequences of Use

Validity

Standards for Educational and Psychological Testing (1999)

The degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of a test

Validity

Are decisions based on valid interpretations of test scores?

Educational placement

Access to services

Hiring

Clinical decisions