innovations in situational judgment tests cullen paullin 2013

Presented to:

Minnesota Professionals for Psychology Applied to Work

January 15, 2013

Innovations in Situational Judgment Test Development and Delivery

Michael J. Cullen

Cheryl Paullin

Human Resources Research Organization

HumRRO Overview

Independent non-profit R&D corporation
– Established in 1951
– Headquarters in Alexandria, VA, with offices in Minneapolis, MN; Louisville, KY; and Monterey, CA

Diverse staff: industrial/organizational psychologists, education researchers, management analysts, statisticians, instructional designers, web programmers

Develop custom solutions while maintaining focus on contributing to science and society

90 professional staff, 20 support staff

Client Types:
– Federal civilian and military agencies
– Private sector organizations
– State and local governments
– Professional associations

HumRRO Service Areas

Talent Management

Strategic Human Capital

Measurement & Planning

Employee and Leadership Development

Program Evaluation

Educational Research, Assessment & Accountability

What is a Situational Judgment Test?

Definition:

SJTs present a written or video-based scenario accompanied by a set of alternative courses of action.

Individuals are asked what they “would do” in the situation, what they “should do,” or to rate the effectiveness of different courses of action.

Advantages of SJTs

High face validity and user acceptance due to their contextualized nature

Substantial criterion-related validities
– Meta-analytic r = .26 (McDaniel et al., 2007; 118 coefficients; N = 24,756; corrected for unreliability in the criterion)
– Incremental validity beyond other measures of cognitive ability (depending on focus of SJT)

Smaller subgroup differences than pure “g” measures (Clevenger, Pereira, Wiechmann, Schmitt, & Harvey, 2001)

Opaque scoring keys may make SJTs less fakeable than traditional measures
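The meta-analytic r above is corrected for unreliability in the criterion. A minimal sketch of that kind of correction, assuming the standard correction for attenuation applied to the criterion only (the predictor is left uncorrected). The observed r and criterion reliability in the example are hypothetical, not values from McDaniel et al. (2007).

```python
import math


def correct_for_criterion_unreliability(r_observed: float, r_yy: float) -> float:
    """Disattenuate a validity coefficient for criterion unreliability only.

    r_corrected = r_observed / sqrt(r_yy), where r_yy is the criterion's
    reliability. The predictor's reliability is deliberately not used.
    """
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / math.sqrt(r_yy)


# Hypothetical inputs: observed r = .20, criterion reliability = .60
corrected = correct_for_criterion_unreliability(0.20, 0.60)
print(f"{corrected:.2f}")  # 0.26
```

Correcting only the criterion is a common choice in selection research because the test is used as it actually is, unreliability included.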

Sample Situational Judgment Test Question

You are attending graduate school at a major university. You are not in uniform, but your Armed Service branch is paying your salary and all of your expenses while you are at school. One night, there is a big party that approximately 25 fellow officers attend. At least one half of them are smoking marijuana. You are neither the most senior, nor the most junior, officer in the group. What would you do?

1. Tell the officers who are using marijuana that if they leave the party now, you will forget what you saw.
2. Talk to the senior officer in the group, telling him that if he does not take action, you will.
3. Leave the party.
4. Tell the officers their behavior is unacceptable and will not be tolerated in the future.
5. Warn the officers to stop using marijuana immediately or you will contact the appropriate authorities.
6. Report the activity to local authorities.
7. Explain to each of the officers involved that their conduct is directly contrary to the Uniform Code of Military Justice (UCMJ) and that you have no choice but to report them.
8. Report the names of all the officers who were using marijuana to your CO.

Scoring key values shown on the slide: 4.00, 5.00, 6.50, 5.00, 6.00, 2.00, 5.00, 4.00

Key Questions/Decision Points in Creating SJTs

What Are You Measuring?
• What is the purpose of the SJT (selection, development, training, other)?
• What construct(s) are you attempting to measure?

Test Format
• How many items?
• How many response options per item?
• How will examinees respond (what would you do, what should you do, pick the best and worst options, rank the options, rate the effectiveness of each option)?

Item Writing
• Who should write the scenarios and response options (applicants, new incumbents, experienced incumbents, supervisors, trainers, psychologists)?
• How will the content be generated (critical incident method, targeting specific constructs)?
• How complex should the stem be?

Scoring
• How should the effectiveness of responses be determined (e.g., SMEs, empirical keying, rational approach)?
• How should responses be scored? (e.g., when picking best or worst: 1 point for picking the best option, 1 point each for correctly identifying the best and the worst options, or the mean SME rating assigned to each response; when rating effectiveness: absolute value of the difference between the rating and the key)
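The pick-best and pick-best/worst scoring rules above can be sketched as small functions. This is a minimal sketch, assuming a hypothetical key of SME mean effectiveness ratings on a 1-7 scale; the option labels and values are illustrative, not from an actual test.

```python
from typing import Dict

# Hypothetical key: SME mean effectiveness rating for each response option (1-7 scale).
KEY: Dict[str, float] = {"A": 2.0, "B": 6.5, "C": 4.0, "D": 5.0}


def best_option(sme_key: Dict[str, float]) -> str:
    """Option with the highest SME mean (ties should be resolved when the key is built)."""
    return max(sme_key, key=sme_key.get)


def worst_option(sme_key: Dict[str, float]) -> str:
    """Option with the lowest SME mean."""
    return min(sme_key, key=sme_key.get)


def score_pick_best(choice: str, sme_key: Dict[str, float]) -> int:
    """1 point for picking the SME-keyed best option, else 0."""
    return int(choice == best_option(sme_key))


def score_best_and_worst(best_choice: str, worst_choice: str, sme_key: Dict[str, float]) -> int:
    """1 point for the correct best pick plus 1 point for the correct worst pick (0-2)."""
    return int(best_choice == best_option(sme_key)) + int(worst_choice == worst_option(sme_key))


def score_mean_sme(choice: str, sme_key: Dict[str, float]) -> float:
    """Item score is the SME mean rating of whichever option was chosen."""
    return sme_key[choice]


print(score_pick_best("B", KEY))            # 1
print(score_best_and_worst("D", "A", KEY))  # 1
print(score_mean_sme("D", KEY))             # 5.0
```

The mean-SME rule keeps more information than the 0/1 rules: a candidate who picks a near-best option gets partial credit instead of zero.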

Today’s Talk: Practical Innovations in Three Areas

Refined Attempts to Measure Multiple Constructs
• Key considerations in developing construct-focused SJTs

Response Format and Scoring
• How do you choose the response format/scoring key in selection, development, and other situations?

Animating SJT Items
• What are the key advantages/disadvantages of animating SJT items?

Key Innovation 1

Measuring Multiple Constructs

The Innovative Part

Construct-focused development approach

SJT produces multiple scores

[Figure: Scale scores (y-axis, 1.0 to 7.0) by knowledge, skill, or ability (x-axis): Risk-Taking, Achievement Seeking, Competitiveness, Planning & Goal Setting, Performance Management, Team Building, Problem Solving, Customer Focus, Self-Confidence]

Scale scores must be reliable and valid to use as a basis for high-stakes decisions

SJT Development Approaches

Traditional
Scenarios: Each emulates a realistic work situation
Responses: Vary in effectiveness of handling scenario
– Little explicit consideration of constructs involved
Scoring: Total score
Construct Validity
– Arguably, unnecessary
– Maybe, post-hoc analyses to identify underlying constructs

Construct-focused
Scenarios: Each designed to elicit a specific construct
– Analogous to personality trait scales
Responses: Vary in level of target construct
– And also in effectiveness, as reflected by application of lower & higher levels of target construct
Scoring: Total score and construct scale scores
Construct Validity
– Necessary
– Both a priori and post-hoc
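Construct-focused scoring needs a mapping from each item to its target construct. A minimal sketch, assuming each item has already been scored and assigned to exactly one construct, and assuming scale scores are simple means (a design choice; sums or model-based scores are alternatives). The item names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def construct_scores(
    item_scores: Dict[str, float], item_construct: Dict[str, str]
) -> Tuple[float, Dict[str, float]]:
    """Return (overall score, {construct: scale score}).

    Here the overall score is the mean across all items, and each scale score
    is the mean of the items written to target that construct.
    """
    by_construct: Dict[str, List[float]] = defaultdict(list)
    for item, score in item_scores.items():
        by_construct[item_construct[item]].append(score)
    scale_scores = {c: mean(scores) for c, scores in by_construct.items()}
    overall = mean(list(item_scores.values()))
    return overall, scale_scores


# Hypothetical item scores for one examinee and a hypothetical item-to-construct map
scores = {"q1": 6.0, "q2": 4.0, "q3": 5.0, "q4": 3.0}
targets = {"q1": "Integrity", "q2": "Integrity", "q3": "Problem Solving", "q4": "Problem Solving"}
print(construct_scores(scores, targets))
# (4.5, {'Integrity': 5.0, 'Problem Solving': 4.0})
```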


Construct-Focused Item: Integrity

Scripted Scenario: A woman walks into the test-taker’s office and sits in a chair facing the test-taker. “Because you’re a colleague and friend, I’d like your help. I am looking for a new job, and I’ve been updating my resume and doing informational interviews with external recruiters while I’m here at work. You know my family well enough to understand how impossible it would be for me to do this at home. Unfortunately, I’m behind on my assignment for the project we’re working on. I won’t be able to complete it in time for tomorrow’s status meeting. Will you help cover for me during the meeting, for example, by agreeing with me when I tell the project leader that the assignment turned out to require more work than originally expected? I promise to catch up in time for the next status meeting.”

Responses (rate the effectiveness of each), with true score (1-7) and the construct assigned in retranslation:
– Tell her that she must stop applying for jobs during work time and finish her part of the project on time. True score: 7; Retranslation: Integrity
– Offer to do her work so the project will be finished on time. True score: 1; Retranslation: Problem Solving
– Tell her that you will cover for her, but only this one time. True score: 1; Retranslation: Integrity
– Tell her you won’t lie for her, but that you also won’t tell anyone why her part of the project is late. True score: 4; Retranslation: Integrity

Construct-Focused Item: Practical Intelligence

Scenario: Rex works on a team that has regularly scheduled project meetings every Monday at 9AM, and all members are expected to attend. The team has been charged with developing a critical new system for the company. However, Rex is almost always late to the weekly meetings, and sometimes misses them entirely. As a result, Larry and the other team members often have to cover for Rex when he misses the weekly meetings, and at times they have to complete work that Rex was supposed to have done. What should Larry do?

Responses (rate the effectiveness of each one), with the problem-solving strategy each represents:
– Accept that it's difficult for some people to be on time for meetings, and cover for Rex as needed to make sure the team accomplishes its goals. Strategy: Complying
– Because Jerry, the team leader, tolerates this behavior, go over his head and directly ask Jan, Jerry's boss, what she thinks the best way to handle the situation would be. Strategy: Consulting
– Talk to Rex directly, and point out that his inability to attend team meetings on time is putting an undue burden on the other team members. Strategy: Conferring
– Nothing. If Rex's lateness doesn't bother the team leader, Larry will just have to deal with the fact that some people don't work as hard as others. Strategy: Avoiding
– Send an email to Jerry, the team leader, letting him know Larry's concerns, and making it clear that it's Jerry's responsibility to decide if anything needs to be done about Rex. Strategy: Delegating
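Stemler and Sternberg (2006) describe a profile of typical problem-solving strategies across situations. One simple way to sketch such a profile, assuming each response option is tagged with its strategy as in the item above, and assuming the profile counts which strategy the examinee rated highest in each scenario (an illustrative choice, not necessarily their scoring method). The second scenario and all ratings are hypothetical.

```python
from collections import Counter
from typing import Dict


def strategy_profile(
    ratings: Dict[str, Dict[str, float]],
    strategy_of: Dict[str, Dict[str, str]],
) -> Dict[str, float]:
    """Proportion of scenarios in which each strategy got the examinee's highest rating.

    ratings[scenario][option] is the examinee's effectiveness rating;
    strategy_of[scenario][option] is the strategy that option represents.
    Ties go to whichever option appears first.
    """
    counts: Counter = Counter()
    for scenario, option_ratings in ratings.items():
        top_option = max(option_ratings, key=option_ratings.get)
        counts[strategy_of[scenario][top_option]] += 1
    n = len(ratings)
    return {strategy: count / n for strategy, count in counts.items()}


# Hypothetical data: the Rex scenario plus one other scenario
ratings = {
    "rex": {"a": 2, "b": 3, "c": 6, "d": 1, "e": 4},
    "s2": {"a": 1, "b": 5},
}
strategy_of = {
    "rex": {"a": "Complying", "b": "Consulting", "c": "Conferring", "d": "Avoiding", "e": "Delegating"},
    "s2": {"a": "Conferring", "b": "Avoiding"},
}
print(strategy_profile(ratings, strategy_of))  # {'Conferring': 0.5, 'Avoiding': 0.5}
```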

Can it be Done?

Profile of typical problem-solving strategies across situations (Stemler & Sternberg, 2006)

– Authors claim content validity and discriminant analysis support

– Critics found low scale internal consistency and couldn’t confirm the underlying structure

Construct-based scale scores (HumRRO, client-proprietary)

– Scale reliability

• Adequate for relatively low-stakes situations, e.g., developmental feedback on a self-assessment

• Not strong enough, yet, to use as basis for high-stakes decisions

– Validity

• r = .09 to .27 for construct scale scores (uncorrected)

• r = .26 for SJT total score

– Replicate constructs in post-hoc analyses? Mixed results
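Scale reliability, and the critics' concern about low internal consistency, are commonly assessed with coefficient alpha. A minimal sketch of alpha, assuming a complete persons-by-items matrix with no missing responses; the source does not say which reliability index HumRRO used, and the data below are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a persons x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scale: 3 examinees x 2 items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Alpha rises with the number of items and with inter-item correlation, so a short scale built from a few heterogeneous scenarios will tend to show modest values.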


Key Innovation 2

Response/Scoring Format

Response Instructions: Many Possibilities
• What would you do?
• What should you do?
• Rate effectiveness
• Rank order actions

Scoring Keys: Many Possibilities
• One answer correct, others incorrect
• Assign SME mean effectiveness rating to each response
• Compute absolute distance of candidate’s rating from SME mean rating
• Compare candidate’s rank ordering to SMEs’ rank ordering
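The last two keys need a rating or ranking response rather than a single pick. A minimal sketch of both, assuming effectiveness ratings are summed distances (lower is closer to the key) and assuming Spearman's rho as the rank-agreement index. The source does not specify the index, and this simple formula is only valid when there are no tied ranks. All keys and responses are hypothetical.

```python
from typing import Dict


def distance_score(candidate: Dict[str, float], sme_mean: Dict[str, float]) -> float:
    """Sum of |candidate rating - SME mean rating| over options; lower means closer to the key."""
    return sum(abs(candidate[opt] - sme_mean[opt]) for opt in sme_mean)


def rank_agreement(candidate_rank: Dict[str, int], sme_rank: Dict[str, int]) -> float:
    """Spearman's rho between candidate and SME rank orders (assumes no tied ranks).

    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    """
    n = len(sme_rank)
    d_squared = sum((candidate_rank[opt] - sme_rank[opt]) ** 2 for opt in sme_rank)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical key and ratings for a four-option item
sme_mean = {"A": 3.0, "B": 5.0, "C": 4.0, "D": 5.0}
print(distance_score({"A": 2, "B": 6, "C": 4, "D": 5}, sme_mean))  # 2.0

# Hypothetical SME rank order (1 = most effective) and one candidate's ranking
sme_rank = {"A": 4, "B": 1, "C": 3, "D": 2}
print(rank_agreement({"A": 4, "B": 2, "C": 3, "D": 1}, sme_rank))  # 0.8
```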

Choosing Response Formats/Scoring Key

What is the purpose of the assessment?

What information is obtained using the response/scoring format?

Does the information obtained match the purpose of the assessment?

Setting 1: Assess Integrity in Developmental Setting

Information desired: Candidate’s standing on integrity so that feedback can be provided

Approach: Ask “What would you do?” Responses are rated for level of integrity. The item score is the mean SME rating for the selected answer.

Information obtained: Candidate’s actual standing on integrity. This approach treats the SJT like a personality test.

Problems: 1) Are all realistic responses captured? 2) Will candidates be truthful? 3) It is a lot of work for one item.

You are attending graduate school at a major university. You are not in uniform, but your Armed Service branch is paying your salary and all of your expenses while you are at school. One night, there is a big party that approximately 25 fellow officers attend. At least one half of them are smoking marijuana. You are neither the most senior, nor the most junior, officer in the group. What would you do?

1. Tell the officers who are using marijuana that if they leave the party now, you will forget what you saw.
2. Talk to the senior officer in the group, telling him that if he does not take action, you will.
3. Leave the party.
4. Tell the officers their behavior is unacceptable and will not be tolerated in the future.

SME integrity ratings shown on the slide: 2.00, 6.50, 4.00, 5.00
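Setting 1's scoring rule (item score = mean SME rating of the selected answer) can be sketched as a lookup. This sketch assumes the four ratings listed on the slide belong to options 1-4 in the order shown, which is an assumption; the second item and its key are hypothetical, and averaging across items is one illustrative way to form a feedback score.

```python
from typing import Dict

# Assumption: the slide's ratings (2.00, 6.50, 4.00, 5.00) belong to options 1-4 in order.
PARTY_ITEM_KEY: Dict[int, float] = {1: 2.00, 2: 6.50, 3: 4.00, 4: 5.00}


def would_do_item_score(selected_option: int, key: Dict[int, float]) -> float:
    """'What would you do?' format: item score = SME integrity rating of the chosen option."""
    return key[selected_option]


def integrity_score(selections: Dict[str, int], keys: Dict[str, Dict[int, float]]) -> float:
    """Mean item score across integrity items, e.g., as a basis for developmental feedback."""
    item_scores = [would_do_item_score(choice, keys[item]) for item, choice in selections.items()]
    return sum(item_scores) / len(item_scores)


# A candidate who picks option 2 on the party item and option 1 on a hypothetical second item
keys = {"party": PARTY_ITEM_KEY, "item2": {1: 5.0, 2: 3.0}}
print(integrity_score({"party": 2, "item2": 1}, keys))  # 5.75
```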

Setting 2: Assess Integrity in Selection Settings

Information desired: Candidate’s standing on Integrity so that selection decisions can be made

Approach: Rate the effectiveness of each response. The score for each response is its distance from the SME mean effectiveness rating.

Information obtained: Candidate’s practical judgment ability in integrity-related situations

Problems: 1) Does the question measure Integrity? Candidates can know which answers are effective without having Integrity.










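Same scenario and response options 1-4 as in Setting 1, ending: “...You are neither the most senior, nor the most junior, officer in the group. Rate the effectiveness of each of these possible responses.”

SME mean effectiveness ratings shown on the slide: 3.00, 5.00, 4.00, 5.00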
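Setting 2's distance-based rule can be turned into a "higher is better" item score. A minimal sketch, assuming the slide's values belong to options 1-4 in the order shown and assuming a 1-7 rating scale (both assumptions). Subtracting each distance from the largest possible distance is one convention among several. The score rewards knowing what SMEs consider effective, which is exactly the problem raised for this setting.

```python
from typing import Dict

# Assumptions: slide values (3.00, 5.00, 4.00, 5.00) belong to options 1-4 in order,
# and ratings use a 1-7 scale.
SME_MEAN: Dict[int, float] = {1: 3.00, 2: 5.00, 3: 4.00, 4: 5.00}
SCALE_MIN, SCALE_MAX = 1, 7


def response_scores(ratings: Dict[int, float], key: Dict[int, float]) -> Dict[int, float]:
    """Per-response score = largest possible distance minus actual distance from the SME mean,
    so higher means closer to the SMEs (one possible convention, not the only one)."""
    max_distance = SCALE_MAX - SCALE_MIN
    return {opt: max_distance - abs(ratings[opt] - key[opt]) for opt in key}


def item_score(ratings: Dict[int, float], key: Dict[int, float]) -> float:
    """Item score is the sum of the per-response scores."""
    return sum(response_scores(ratings, key).values())


candidate = {1: 1, 2: 5, 3: 4, 4: 7}
print(response_scores(candidate, SME_MEAN))  # {1: 4.0, 2: 6.0, 3: 6.0, 4: 4.0}
print(item_score(candidate, SME_MEAN))       # 20.0
```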

Setting 3: Assess Integrity in Training Settings

Information desired: The class’s standing on Integrity so that class-level training needs can be assessed

Approach: Ask what the majority of classmates would do. Responses are rated for level of Integrity by SMEs, and the mean is computed. The class item score is the mean of the judgments for the item.

Information obtained: Presumably, classmates’ true standing on Integrity, because social desirability is eliminated

Problems: 1) Are raters mistaken about what their classmates would do? 2) What is the meaning of the class score?










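Same scenario and response options 1-4 as in Setting 1, ending: “...You are neither the most senior, nor the most junior, officer in the group. What would the majority of your classmates do?”

SME integrity ratings shown on the slide: 2.00, 5.00, 4.00, 6.50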
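Setting 3's class-level score can be sketched as an average over students' predictions. This assumes the slide's values belong to options 1-4 in the order shown (an assumption), and the class responses are hypothetical.

```python
from typing import Dict, List

# Assumption: slide values (2.00, 5.00, 4.00, 6.50) belong to options 1-4 in order.
INTEGRITY_KEY: Dict[int, float] = {1: 2.00, 2: 5.00, 3: 4.00, 4: 6.50}


def class_item_score(predicted_choices: List[int], key: Dict[int, float]) -> float:
    """Mean SME integrity rating of the options students predicted the class majority would choose."""
    ratings = [key[choice] for choice in predicted_choices]
    return sum(ratings) / len(ratings)


# Hypothetical class of four students
print(class_item_score([1, 3, 3, 4], INTEGRITY_KEY))  # 4.125
```

The result describes the class as a whole, which is why its meaning (problem 2 above) needs to be settled before the score drives training decisions.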

Choosing Response Instructions

Considerations to Keep in Mind
• Purpose: Development, selection, or training?
• Information obtained: Candidate’s standing on a specific construct, team standing, or judgment ability?
• Does the information obtained match the purpose?
• Does your response format allow you to measure the intended construct? What unintended constructs are you measuring (e.g., cognitive ability)?
• What are the psychometric advantages/disadvantages of the approach? Will you be able to create a reliable measure? Is the test construct valid?
• What about faking?

Key Innovation 3

Multimedia SJTs and Virtual Role Plays

Animated Situational Judgment Test

Each SJT scenario describes a KSA-relevant problem or challenge that requires a solution

Video-based presentation enhances user acceptance and reduces reading load

Each response option describes realistic actions that could be taken to handle the situation

35-45 minutes

Driving Forces Behind Multimedia SJTs

Organizational Credibility
• Organizations can demonstrate they are “ahead of the technological curve,” and multimedia displays appeal to a tech-savvy generation
• The opportunity to present information about organizational values and culture turns the multimedia selection process into a marketing tool

Applicant Reactions
• May enhance perceptions of procedural justice if perceived as providing an “opportunity to perform” (Schleicher, Venkataramani, Morgeson, & Campion, 2006)
• Feedback enhances positive reactions (Anseel & Lievens, 2009)

Predictive Validity
• Holding content constant, video-based SJTs may have stronger predictive validity than traditional written SJT formats (e.g., Lievens & Sackett, 2006)
• Video format may enhance correspondence to the criterion, leading to enhanced validity

Challenges/Considerations in Multimedia SJT Delivery

Ensuring construct validity of assessment
– Base assessment on job analysis
– Create an assessment plan for how constructs will be measured and scored in the assessment

Ensuring freshness of multimedia content
– Content may become dated quickly and is expensive to update

Applicant reactions may be less favorable under certain conditions
– When the job under consideration requires a lot of human interaction, candidates may perceive online multimedia tests to be less job-related than a standard interview (Bauer, Truxillo, Mack, & Costa, 2011)
– Technological difficulties (e.g., bandwidth) associated with multimedia SJTs may decrease perceptions of fairness

Demo: Online SJT

http://www.humrro.org/simdemo.html

Log in: humrrodemo Password: humrrodemo

Taking Measurement a Step Further

Link Scenarios to Create a More Complex Assessment

Items are no longer independent
– Responses to later items may depend, in part, on information from earlier scenarios
– Test-takers may take different paths through the assessment

At some point, it becomes a simulation, performance-based measure, or virtual role play

Several vendors offer these
– Call center simulation
– Pharmaceutical sales
– Retail sales simulation
– Medical diagnosis

Video game-like simulations

Development and scoring are more complex

It is easier to make it look great than to ensure great measurement properties
– Matters most if high-stakes outcomes depend on scores

Scoring
– Different question types are possible, but can you score them accurately?
• How should you assign team members to tasks?
• Which action should be taken first?
• How is the customer feeling at this point?
– Can you produce multiple construct scores from the same assessment?
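Linking scenarios means the next item depends on the previous choice, so test-takers follow different paths. A minimal sketch of that structure, assuming each option carries a hypothetical SME value and a pointer to the next scenario; the scenario ids, options, and values are illustrative only, and summing along the path is just one possible scoring rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Option:
    score: float            # hypothetical SME value for choosing this option
    next_id: Optional[str]  # scenario shown next; None ends this branch


@dataclass
class Scenario:
    options: Dict[str, Option] = field(default_factory=dict)


def run_path(
    scenarios: Dict[str, Scenario], start_id: str, choices: List[str]
) -> Tuple[List[str], float]:
    """Follow one test-taker's choices through linked scenarios.

    Returns the scenario ids visited and the summed score. Because the next
    scenario depends on the previous choice, two test-takers can see different items.
    """
    path: List[str] = []
    total = 0.0
    current: Optional[str] = start_id
    for choice in choices:
        if current is None:
            break  # branch already ended; ignore any extra choices
        path.append(current)
        option = scenarios[current].options[choice]
        total += option.score
        current = option.next_id
    return path, total


# Hypothetical two-step branch: the first choice decides which follow-up scenario appears
scenarios = {
    "call": Scenario({"calm": Option(5.0, "resolve"), "rush": Option(2.0, "escalate")}),
    "resolve": Scenario({"offer": Option(6.0, None)}),
    "escalate": Scenario({"offer": Option(3.0, None)}),
}
print(run_path(scenarios, "call", ["calm", "offer"]))  # (['call', 'resolve'], 11.0)
print(run_path(scenarios, "call", ["rush", "offer"]))  # (['call', 'escalate'], 5.0)
```

Because the second "offer" response is worth different amounts on different branches, an early choice affects the whole path score. This is one reason scoring linked assessments is harder than scoring independent items.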


Demo: Virtual Role Play (Extended SJT)

Multimedia Assessments: Technology Considerations

Test-taker computer requirements
– Audio, graphics
• Standard features are adequate on most modern computers
– Some animation packages can’t play on devices without Flash (e.g., iPad)

Allow test-takers to take assessment on mobile devices?
– Can you prevent it?
– If you know this will happen, think about implications

Adequate bandwidth if multiple test-takers take assessment concurrently
– Multimedia assessments do require more bandwidth than non-animated assessments
– Usually not a big concern
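A rough bandwidth check for concurrent administration is simple arithmetic. A minimal sketch, assuming every concurrent test-taker streams at once and adding a safety margin; the per-session bitrate and the 25% headroom are hypothetical planning values, not figures from the source.

```python
def peak_bandwidth_mbps(concurrent_test_takers: int, stream_kbps: float, headroom: float = 1.25) -> float:
    """Rough worst case: every concurrent test-taker streams at once, plus a safety margin.

    stream_kbps (per-session media bitrate) and headroom are planning assumptions.
    """
    return concurrent_test_takers * stream_kbps * headroom / 1000


# Hypothetical: 30 test-takers in one room, each session streaming at about 800 kbps
print(peak_bandwidth_mbps(30, 800))  # 30.0
```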


References

Anseel, F., & Lievens, F. (2009). The mediating role of feedback acceptance in the relationship between feedback and attitudinal and performance outcomes. International Journal of Selection and Assessment, 17, 362-376.

Bauer, T. N., Truxillo, D. M., Mack, K., & Costa, A. B. (2011). Applicant reactions to technology-based selection. In N. T. Tippins & S. Adler (Eds.), Technology-enhanced assessment of talent (pp. 190-223). San Francisco: Jossey-Bass.

Clevenger, J., Pereira, G. M., Wiechmann, D., Schmitt, N., & Harvey, V. S. (2001). Incremental validity of situational judgment tests. Journal of Applied Psychology, 86, 410-417.

McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L., III (2007). Situational judgment tests, response instructions and validity: A meta-analysis. Personnel Psychology, 60, 63-91. DOI: 10.1111/j.1744-6570.2007.00065.x

McDaniel, M. A., & Whetzel, D. L. (2007). Situational judgment tests. In G. R. Wheaton & D. Whetzel (Eds.), Applied measurement: Industrial psychology in human resource management. New York, NY: Taylor & Francis Group, Lawrence Erlbaum & Associates.

Motowidlo, S. J., & Beier, M. E. (2010). Differentiating specific job knowledge from implicit trait policies in procedural knowledge measured by a situational judgment test. Journal of Applied Psychology, 95, 321-333. DOI: 10.1037/a0017975

Schleicher, D. J., Venkataramani, Y., Morgeson, F. P., & Campion, M. A. (2006). So you didn’t get the job. Now what do you think? Examining opportunity to perform fairness perceptions. Personnel Psychology, 59, 559-590.

Weekley, J. A., & Ployhart, R. E. (2006). Situational judgment tests: Theory, measurement, and application. Mahwah, NJ: Erlbaum.

Westring, A. J. F., Oswald, F. L., Schmitt, N., Drzakowski, S., Imus, A., Kim, B., & Shivpuri, S. (2009). Estimating trait and situational variance in a situational judgment test. Human Performance, 22, 44-63. DOI: 10.1080/08959280802540999
