Innovations in Situational Judgment Tests (Cullen & Paullin, 2013)
DESCRIPTION
Describes factors to consider when developing and delivering situational judgment tests (SJTs), and innovations in linking scenario-based items to create branching simulations.
TRANSCRIPT
Presented to:
Minnesota Professionals for Psychology Applied to Work
January 15, 2013
Innovations in Situational Judgment Test
Development and Delivery
Michael J. Cullen
Cheryl Paullin
Human Resources Research Organization
HumRRO Overview
Independent non-profit R&D corporation – Established in 1951
– Headquarters in Alexandria, VA, with offices in Minneapolis, MN; Louisville, KY; Monterey, CA
Diverse staff: industrial/organizational psychologists, education researchers, management analysts, statisticians, instructional designers, web programmers
Develop custom solutions while maintaining focus on contributing to science and society
90 professional staff, 20 support staff
Client Types:
– Federal civilian and military agencies
– Private sector organizations
– State and local governments
– Professional associations
HumRRO Service Areas
Talent Management
Strategic Human Capital Measurement & Planning
Employee and Leadership Development
Program Evaluation
Educational Research, Assessment & Accountability
What is a Situational Judgment Test?
Definition:
SJTs present a written or video-based scenario that is accompanied by a set of alternative courses of action. Individuals are asked what they “would do” or what they “should do” in the situation, or to rate the effectiveness of different courses of action.
Advantages of SJTs
High face validity and user acceptance due to contextualized nature
Substantial criterion-related validities
– Meta-analytic r = .26 (McDaniel et al., 2007; 118 coefficients; N = 24,756; corrected for unreliability in the criterion)
– Incremental validity beyond measures of cognitive ability (depending on the focus of the SJT)
Smaller subgroup differences than pure “g” measures (Clevenger, Pereira, Wiechmann, Schmitt, & Harvey, 2001)
Opaque scoring keys may make SJTs less fakeable than traditional measures
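The meta-analytic estimate above is corrected for unreliability in the criterion only. The usual correction divides the observed validity by the square root of the criterion reliability. A minimal sketch of that formula follows; the function name and the observed r and reliability in the usage line are hypothetical illustrations, not figures from McDaniel et al. (2007).

```python
import math


def correct_for_criterion_unreliability(r_observed: float, r_yy: float) -> float:
    """Disattenuate a validity coefficient for criterion unreliability only.

    r_observed: observed predictor-criterion correlation
    r_yy: reliability of the criterion measure, 0 < r_yy <= 1
    """
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / math.sqrt(r_yy)


# Hypothetical inputs: observed r = .20, criterion reliability = .60
print(round(correct_for_criterion_unreliability(0.20, 0.60), 3))  # 0.258
```

Because only the criterion is corrected, the result still reflects whatever unreliability exists in the SJT itself.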
Sample Situational Judgment Test Question
You are attending graduate school at a major university. You are not in uniform, but
your Armed Service branch is paying your salary and all of your expenses while
you are at school. One night, there is a big party that approximately 25 fellow
officers attend. At least one half of them are smoking marijuana. You are neither
the most senior, nor the most junior, officer in the group. What would you do?
1. Tell the officers who are using marijuana that if they leave the party now, you will forget what you saw.
2. Talk to the senior officer in the group, telling him that if he does not take action, you will.
3. Leave the party.
4. Tell the officers their behavior is unacceptable and will not be tolerated in the future.
5. Warn the officers to stop using marijuana immediately or you will contact the appropriate authorities.
6. Report the activity to local authorities.
7. Explain to each of the officers involved that their conduct is directly contrary to the Uniform Code of Military Justice (UCMJ) and that you have no choice but to report them.
8. Report the names of all the officers who were using marijuana to your CO.
SME ratings shown on the slide: 4.00, 5.00, 6.50, 5.00, 6.00, 2.00, 5.00, 4.00
Key Questions/Decision Points in Creating SJTs
What Are You Measuring?
• What is the purpose of the SJT (selection, development, training, other)?
• What construct(s) are you attempting to measure?
Test Format
• How many items?
• How many response options per item?
• How will examinees respond (what would you do, what should you do, pick the best and worst responses, rank responses, rate the effectiveness of responses)?
Item Writing
• Who should write the scenarios and response options (applicants, new incumbents, experienced incumbents, supervisors, trainers, psychologists)?
• How will the content be generated (critical incident method, targeting specific constructs)?
• How complex should the stem be?
Scoring
• How should the effectiveness of responses be determined (e.g., SMEs, empirical keying, rational approach)?
• How should responses be scored (e.g., when picking best or worst: 1 point for picking the best, 1 point each for correct best and worst picks, or the mean SME rating of each selected response; when rating effectiveness: the absolute value of the difference between the examinee's rating and the key)?
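The scoring options listed above can be sketched in a few lines. This assumes a hypothetical item with four response options (A to D) and an SME key of mean effectiveness ratings on a 1-7 scale; the key values and all function names are illustrative, not taken from any operational test.

```python
from typing import Dict

# Hypothetical SME key: mean effectiveness rating (1-7 scale) for each response option
SME_KEY: Dict[str, float] = {"A": 2.0, "B": 6.5, "C": 4.0, "D": 5.0}


def keyed_best(sme: Dict[str, float]) -> str:
    """Option with the highest SME mean rating (first one wins on ties)."""
    return max(sme, key=sme.get)


def keyed_worst(sme: Dict[str, float]) -> str:
    """Option with the lowest SME mean rating (first one wins on ties)."""
    return min(sme, key=sme.get)


def score_pick_best(pick: str, sme: Dict[str, float]) -> int:
    """1 point if the examinee picked the keyed best option, else 0."""
    return int(pick == keyed_best(sme))


def score_best_worst(best: str, worst: str, sme: Dict[str, float]) -> int:
    """1 point each for correctly identifying the keyed best and keyed worst options."""
    return int(best == keyed_best(sme)) + int(worst == keyed_worst(sme))


def score_mean_sme(pick: str, sme: Dict[str, float]) -> float:
    """Item score is the SME mean rating of whichever option the examinee selected."""
    return sme[pick]


def score_rating_distance(ratings: Dict[str, float], sme: Dict[str, float]) -> float:
    """Sum over options of |examinee rating - SME mean rating|; lower is closer to the key."""
    return sum(abs(ratings[opt] - sme[opt]) for opt in sme)


if __name__ == "__main__":
    print(score_pick_best("B", SME_KEY))        # 1
    print(score_best_worst("B", "C", SME_KEY))  # 1 (best correct; keyed worst is A)
    print(score_mean_sme("D", SME_KEY))         # 5.0
    print(score_rating_distance({"A": 3, "B": 6, "C": 4, "D": 7}, SME_KEY))  # 3.5
```

The distance score runs in the opposite direction from the others (smaller is better), so in practice it would likely be reversed or rescaled before being combined with other item scores.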
Today’s Talk: Practical Innovations in Three Areas
Refined Attempts to Measure Multiple Constructs
• Key considerations in developing construct-focused SJTs
Response Format and Scoring
• How do you choose the response format/scoring key in selection, development, and other situations?
Animating SJT Items
• What are the key advantages/disadvantages of animating SJT items?
Key Innovation 1
Measuring Multiple Constructs
The Innovative Part
Construct-focused development approach
SJT produces multiple scores
Figure: Profile of scale scores (y-axis: Score, 1.0 to 7.0) by knowledge, skill, or ability (x-axis): Risk-Taking, Achievement Seeking, Competitiveness, Planning & Goal Setting, Performance Management, Team Building, Problem Solving, Customer Focus, Self-Confidence
Scale scores must be reliable and valid to be used as a basis for high-stakes decisions
SJT Development Approaches
Traditional
Scenarios: Each emulates a realistic work situation
Responses: Vary in effectiveness of handling the scenario
– Little explicit consideration of the constructs involved
Scoring: Total score
Construct Validity
– Arguably unnecessary
– Perhaps post-hoc analyses to identify underlying constructs
Construct-focused
Scenarios: Each designed to elicit a specific construct
– Analogous to personality trait scales
Responses: Vary in level of the target construct
– And also in effectiveness, as reflected by application of lower and higher levels of the target construct
Scoring: Total score and construct scale scores
Construct Validity
– Necessary
– Both a priori and post-hoc
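Under the construct-focused approach, each item is tied to one target construct, so the same responses yield both a total score and construct scale scores. A minimal sketch, assuming a hypothetical item-to-construct blueprint and simple averaging of item scores within each scale; the item IDs, construct assignments, and the averaging rule are illustrative assumptions, not the presenters' scoring procedure.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical blueprint: each item is written to elicit one target construct
ITEM_CONSTRUCT: Dict[str, str] = {
    "item1": "Integrity",
    "item2": "Integrity",
    "item3": "Problem Solving",
    "item4": "Problem Solving",
}


def score_construct_focused(item_scores: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the SJT total score and the mean item score for each construct scale."""
    by_construct: Dict[str, List[float]] = defaultdict(list)
    for item, score in item_scores.items():
        by_construct[ITEM_CONSTRUCT[item]].append(score)
    total = sum(item_scores.values())
    scales = {construct: mean(scores) for construct, scores in by_construct.items()}
    return total, scales


total, scales = score_construct_focused({"item1": 7.0, "item2": 4.0, "item3": 5.0, "item4": 3.0})
print(total, scales)  # 19.0 {'Integrity': 5.5, 'Problem Solving': 4.0}
```

With only a handful of items per construct, each scale score rests on very little information, which is one reason the scale reliability discussed later is the limiting factor.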
Construct-Focused Item: Integrity
Scripted Scenario: A female colleague walks into the test-taker’s office and sits in the chair facing the test-taker. “Because you’re a colleague and friend, I’d like your help. I am looking for a new job, and I’ve been updating my resume and doing informational interviews with external recruiters while I’m here at work. You know my family well enough to understand how impossible it would be for me to do this at home. Unfortunately, I’m behind on my assignment for the project we’re working on. I won’t be able to complete it in time for tomorrow’s status meeting. Will you help cover for me during the meeting, for example, by agreeing with me when I tell the project leader that the assignment turned out to require more work than originally expected? I promise to catch up in time for the next status meeting.”
Responses (rate the effectiveness of each), with true score (1-7) and retranslation category:
• Tell her that she must stop applying for jobs during work time and finish her part of the project on time. (7; Integrity)
• Offer to do her work so the project will be finished on time. (1; Problem Solving)
• Tell her that you will cover for her, but only this one time. (1; Integrity)
• Tell her you won’t lie for her, but that you also won’t tell anyone why her part of the project is late. (4; Integrity)
Construct-Focused Item: Practical Intelligence
Scenario: Rex works on a team that has regularly scheduled project meetings every Monday at 9 AM, and all members are expected to attend. The team has been charged with developing a critical new system for the company. However, Rex is almost always late to the weekly meetings, and sometimes misses them entirely. As a result, Larry and the other team members often have to cover for Rex when he misses the weekly meetings, and at times they have to complete work that Rex was supposed to have done. What should Larry do?
Responses (rate the effectiveness of each one), with problem-solving strategy:
• Accept that it's difficult for some people to be on time for meetings, and cover for Rex as needed to make sure the team accomplishes its goals (Complying)
• Because Jerry, the team leader, tolerates this behavior, go over his head and directly ask Jan, Jerry's boss, what she thinks the best way to handle the situation would be (Consulting)
• Talk to Rex directly, and point out that his inability to attend team meetings on time is putting an undue burden on the other team members (Conferring)
• Do nothing; if Rex's lateness doesn't bother the team leader, Larry will just have to deal with the fact that some people don't work as hard as others (Avoiding)
• Send an email to Jerry, the team leader, letting him know Larry's concerns, and making it clear that it's Jerry's responsibility to decide if anything needs to be done about Rex (Delegating)
Can it be Done?
Profile of typical problem-solving strategies across situations (Stemler & Sternberg, 2006)
– Authors claim content validity and discriminant analysis support
– Critics found low scale internal consistency and couldn’t confirm the underlying structure
Construct-based scale scores (HumRRO, client-proprietary)
– Scale reliability
• Adequate for relatively low-stakes situations, e.g., developmental feedback on a self-assessment
• Not strong enough, yet, to use as basis for high-stakes decisions
– Validity
• Construct scale scores: r = .09 to .27 (uncorrected)
• SJT total score: r = .26
– Replicate constructs in post-hoc analyses? Mixed results
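Scale reliability, and the critics' complaint of low internal consistency, is commonly indexed with Cronbach's alpha. The source does not say which reliability index was used, so the sketch below is an illustration of the standard alpha formula only, assuming each construct scale is given as a list of item-score vectors over the same examinees.

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(item_scores: List[List[float]]) -> float:
    """Cronbach's alpha for one scale.

    item_scores[i][p] is examinee p's score on item i of the scale.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(item_scores[0])
    totals = [sum(item[p] for item in item_scores) for p in range(n)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    total_var = pvariance(totals)  # zero total variance makes alpha undefined
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical two-item scale, four examinees
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # 0.75
```

Because alpha rises with the number of items, short construct scales carved out of a single SJT tend to show modest values, consistent with the "adequate for low stakes, not yet for high stakes" conclusion above.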
Key Innovation 2
Response/Scoring Format
Response Instructions
• What would you do?
• What should you do?
• Rate effectiveness
• Rank order actions
Many Possibilities
Scoring Keys
• One answer correct, others incorrect
• Assign the SME mean effectiveness rating to each response
• Compute the absolute distance of the candidate's rating from the SME mean rating
• Compare the candidate's rank ordering to the SMEs' rank ordering
Many Possibilities
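One way to compare a candidate's rank ordering with the SMEs' rank ordering is Spearman's rank correlation. The slide does not specify an agreement index, so this is a sketch under stated assumptions: untied ranks (1 = most effective) and hypothetical option labels.

```python
from typing import Dict


def spearman_rank_agreement(candidate_rank: Dict[str, int], sme_rank: Dict[str, int]) -> float:
    """Spearman's rho between a candidate's ranking and the SME ranking.

    Assumes ranks 1..n with no ties, so rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    n = len(sme_rank)
    if n < 2 or set(candidate_rank) != set(sme_rank):
        raise ValueError("rankings must cover the same options (at least two)")
    d_squared = sum((candidate_rank[opt] - sme_rank[opt]) ** 2 for opt in sme_rank)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Candidate swaps the SMEs' top two options
print(spearman_rank_agreement({"A": 1, "B": 2, "C": 3, "D": 4},
                              {"A": 2, "B": 1, "C": 3, "D": 4}))  # 0.8
```

A rank-based index ignores how far apart the SMEs placed adjacent options; a swap of two nearly equivalent options costs as much as a swap of clearly different ones.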
Choosing Response Formats/Scoring Key
What is the purpose of the assessment?
What information is obtained using the response/scoring format?
Does the information obtained match the purpose of the assessment?
Setting 1: Assess Integrity in Developmental Setting
Information desired: Candidate’s standing on integrity so that feedback can be provided
Approach: Ask what the candidate would do. SMEs rate each response for its level of integrity; the item score is the mean SME rating for the selected answer.
Information obtained: Candidate’s actual standing on integrity. This approach treats the SJT like a personality test.
Problems: 1) Are all realistic responses captured? 2) Will candidates be truthful? 3) It is a lot of work for one item.
You are attending graduate school at a major university. You are not in uniform, but your Armed Service branch is paying your salary and all of your expenses while you are at school. One night, there is a big party that approximately 25 fellow officers attend. At least one half of them are smoking marijuana. You are neither the most senior, nor the most junior, officer in the group. What would you do?
1. Tell the officers who are using marijuana that if they leave the party now, you will forget what you saw.
2. Talk to the senior officer in the group, telling him that if he does not take action, you will.
3. Leave the party.
4. Tell the officers their behavior is unacceptable and will not be tolerated in the future.
SME integrity ratings shown on the slide: 2.00, 6.50, 4.00, 5.00
Setting 2: Assess Integrity in Selection Settings
Information desired: Candidate’s standing on Integrity so that selection decisions can be made
Approach: Rate the effectiveness of each response. The score for each response is its distance from the SME mean effectiveness rating.
Information obtained: Candidate’s practical judgment ability in integrity-related situations
Problem: Does the question measure Integrity? A candidate can know which answers are effective without having Integrity.
You are attending graduate school at a major university. You are not in uniform, but your Armed Service branch is paying your salary and all of your expenses while you are at school. One night, there is a big party that approximately 25 fellow officers attend. At least one half of them are smoking marijuana. You are neither the most senior, nor the most junior, officer in the group. Rate the effectiveness of each of these possible responses.
1. Tell the officers who are using marijuana that if they leave the party now, you will forget what you saw.
2. Talk to the senior officer in the group, telling him that if he does not take action, you will.
3. Leave the party.
4. Tell the officers their behavior is unacceptable and will not be tolerated in the future.
SME mean effectiveness ratings shown on the slide: 3.00, 5.00, 4.00, 5.00
Setting 3: Assess Integrity in Training Settings
Information desired: The class’s standing on Integrity so that class-level training needs can be assessed.
Approach: Ask what the majority of classmates would do. SMEs rate responses for level of Integrity, and the mean rating is computed. The class item score is the mean of these judgments for the item.
Information obtained: Presumably, classmates’ true standing on Integrity, because social desirability bias is eliminated
Problems: 1) Are respondents mistaken about what their classmates would do? 2) What is the meaning of the class score?
You are attending graduate school at a major university. You are not in uniform, but your Armed Service branch is paying your salary and all of your expenses while you are at school. One night, there is a big party that approximately 25 fellow officers attend. At least one half of them are smoking marijuana. You are neither the most senior, nor the most junior, officer in the group. What would the majority of your classmates do?
1. Tell the officers who are using marijuana that if they leave the party now, you will forget what you saw.
2. Talk to the senior officer in the group, telling him that if he does not take action, you will.
3. Leave the party.
4. Tell the officers their behavior is unacceptable and will not be tolerated in the future.
SME integrity ratings shown on the slide: 2.00, 5.00, 4.00, 6.50
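The class-level scoring in Setting 3 amounts to averaging the SME integrity ratings of the options the class selected. A minimal sketch, assuming each classmate's selection is recorded as an option label and using a hypothetical key (the labels and values below are illustrative, not the slide's ratings):

```python
from statistics import mean
from typing import Dict, List


def class_item_score(selections: List[str], integrity_key: Dict[str, float]) -> float:
    """Mean SME integrity rating of the options classmates selected for one item."""
    if not selections:
        raise ValueError("no selections for this item")
    return mean(integrity_key[option] for option in selections)


# Hypothetical SME integrity key and four classmates' answers
key = {"A": 2.0, "B": 6.0, "C": 4.0, "D": 5.0}
print(class_item_score(["B", "D", "D", "C"], key))  # 5.0
```

The result describes the class, not any individual, which is exactly why its interpretation is listed as an open problem above.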
Choosing Response Instructions
Considerations to Keep in Mind
• Purpose: Development, selection, or training?
• Information obtained: Candidate’s standing on a specific construct, team standing, or judgment ability?
• Does the information obtained match the purpose?
• Does your response format allow you to measure the intended construct? What unintended constructs are you measuring (e.g., cognitive ability)?
• What are the psychometric advantages/disadvantages of the approach? Will you be able to create a reliable measure? Is the test construct valid? What about faking?
Key Innovation 3
Multimedia SJTs and Virtual Role Plays
Animated Situational Judgment Test
Each SJT scenario describes a KSA-relevant problem or challenge that requires a solution
Each response option describes realistic actions that could be taken to handle the situation
Video-based presentation enhances user acceptance and reduces reading load
35-45 minutes
Driving Forces Behind Multimedia SJTs
Organizational Credibility
• Organizations can demonstrate they are “ahead of the technological curve,” and these multimedia displays appeal to a tech-savvy generation
• Opportunity to present information about organizational values and culture turns multimedia selection process into a marketing tool
Applicant Reactions
• May enhance perceptions of procedural justice if perceived as providing “opportunity to perform” (Schleicher, Venkataramani, Morgeson, & Campion, 2006)
• Feedback enhances positive reactions (Anseel & Lievens, 2009)
Predictive Validity
• Holding content constant, video-based SJTs may have stronger predictive validity than traditional written SJT formats (e.g., Lievens & Sackett, 2006)
• Video format may enhance correspondence to criterion, leading to enhanced validity
Challenges/Considerations in Multimedia SJT Delivery
Ensuring construct validity of the assessment
– Base the assessment on a job analysis
– Create an assessment plan for how constructs will be measured and scored in the assessment
Ensuring freshness of multimedia content
– Content can quickly become dated and may be expensive to update
Applicant reactions may be less favorable under certain conditions
– When the job under consideration requires a lot of human interaction, candidates may perceive online multimedia tests to be less job-related than a standard interview (Bauer, Truxillo, Mack, & Costa, 2011)
– Technological difficulties (e.g., bandwidth) associated with multimedia SJTs may decrease perceptions of fairness
Demo: Online SJT
http://www.humrro.org/simdemo.html
Log in: humrrodemo Password: humrrodemo
Taking Measurement a Step Further
Link Scenarios to Create a More Complex Assessment
Items are no longer independent
– Responses to later items may depend, in part, on information from earlier scenarios
– Test-takers may take different paths through the assessment
At some point, the assessment becomes a simulation, performance-based measure, or virtual role play
Several vendors offer these
– Call center simulation
– Pharmaceutical sales
– Retail sales simulation
– Medical diagnosis
Video game-like simulations
Development and scoring are more complex
It is easier to make an assessment look great than to ensure great measurement properties
– Matters most if high-stakes outcomes depend on scores
Scoring
– Different question types are possible, but can you score them accurately?
• How should you assign team members to tasks?
• Which action should be taken first?
• How is the customer feeling at this point?
– Can you produce multiple construct scores from the same assessment?
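Linking scenarios means each response option can route the test-taker to a different next scenario. A minimal sketch of that structure, assuming a hypothetical three-scenario call center exchange in which each option carries an SME-keyed score and a pointer to the next scenario; the scenario content, scores, and names are illustrative only, not any vendor's design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Option:
    text: str
    score: float                  # hypothetical SME-keyed effectiveness of this option
    next_scenario: Optional[str]  # None ends the simulation


@dataclass
class Scenario:
    prompt: str
    options: Dict[str, Option] = field(default_factory=dict)


def run_path(scenarios: Dict[str, Scenario], start: str,
             choices: List[str]) -> Tuple[List[str], float]:
    """Follow one test-taker's choices through linked scenarios.

    Returns the scenario IDs visited and the summed option scores.
    """
    visited: List[str] = []
    total = 0.0
    current: Optional[str] = start
    for choice in choices:
        if current is None:  # the path ended before the choices ran out
            break
        visited.append(current)
        option = scenarios[current].options[choice]
        total += option.score
        current = option.next_scenario
    return visited, total


scenarios = {
    "greet": Scenario("An upset customer calls about a billing error.", {
        "a": Option("Apologize and ask for details", 6.0, "resolve"),
        "b": Option("Transfer the call immediately", 2.0, "escalated")}),
    "resolve": Scenario("The customer explains the error.", {
        "a": Option("Correct the bill and confirm", 7.0, None)}),
    "escalated": Scenario("The customer calls back, angrier.", {
        "a": Option("Apologize and fix the bill", 4.0, None)}),
}
print(run_path(scenarios, "greet", ["a", "a"]))  # (['greet', 'resolve'], 13.0)
print(run_path(scenarios, "greet", ["b", "a"]))  # (['greet', 'escalated'], 6.0)
```

Because test-takers see different scenarios, a simple sum of option scores is not directly comparable across paths; that is one concrete form of the scoring complexity noted above.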
Demo: Virtual Role Play (Extended SJT)
Multimedia Assessments: Technology Considerations
Test-taker computer requirements
– Audio, graphics
• Standard features are adequate on most modern computers
– Some animation packages can’t play on devices without Flash (e.g., iPad)
Allow test-takers to take the assessment on mobile devices?
– Can you prevent it?
– If you know this will happen, think about the implications
Adequate bandwidth if multiple test-takers take the assessment concurrently
– Multimedia assessments do require more bandwidth than non-animated assessments
– Usually not a big concern
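A rough worst-case bandwidth check for concurrent administration is simple arithmetic: number of simultaneous test-takers times the per-stream bitrate, plus some headroom. The sketch below assumes a hypothetical average stream bitrate and a hypothetical 25% headroom factor; neither figure comes from the source, and real requirements depend on the media encoding used.

```python
def peak_bandwidth_mbps(concurrent_test_takers: int, stream_mbps: float,
                        headroom: float = 1.25) -> float:
    """Aggregate bandwidth (Mbps) if every test-taker streams at the same time.

    stream_mbps: assumed average bitrate of one multimedia stream
    headroom: assumed safety multiplier for protocol overhead and bursts
    """
    if concurrent_test_takers < 0 or stream_mbps < 0 or headroom < 1:
        raise ValueError("invalid inputs")
    return concurrent_test_takers * stream_mbps * headroom


# Hypothetical lab: 40 test-takers, 1.5 Mbps per stream
print(peak_bandwidth_mbps(40, 1.5))  # 75.0
```

This is a ceiling: test-takers reading response options or answering are not streaming, so actual concurrent load is usually lower, consistent with bandwidth usually not being a big concern.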
References
Anseel, F., & Lievens, F. (2009). The mediating role of feedback acceptance in the relationship between feedback and attitudinal and performance outcomes. International Journal of Selection and Assessment, 17, 362-376.
Bauer, T.N., Truxillo, D.M., Mack, K., & Costa, A.B. (2011). Applicant reactions to technology-based selection. In N.T. Tippins & S. Adler (Eds.), Technology-enhanced assessment of talent (pp. 190-223). San Francisco, CA: Jossey-Bass.
Clevenger, J., Pereira, G.M., Wiechmann, D., Schmitt, N., & Harvey, V.S. (2001). Incremental validity of situational judgment tests. Journal of Applied Psychology, 86, 410-417.
McDaniel, M.A., Hartman, N.S., Whetzel, D.L., & Grubb, W.L., III (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63-91. DOI: 10.1111/j.1744-6570.2007.00065.x
McDaniel, M.A., & Whetzel, D.L. (2007). Situational judgment tests. In G.R. Wheaton & D. Whetzel (Eds.), Applied measurement: Industrial psychology in human resource management. New York, NY: Taylor & Francis Group, Lawrence Erlbaum & Associates.
Motowidlo, S.J., & Beier, M.E. (2010). Differentiating specific job knowledge from implicit trait policies in procedural knowledge measured by a situational judgment test. Journal of Applied Psychology, 95, 321-333. DOI: 10.1037/a0017975
Schleicher, D.J., Venkataramani, Y., Morgeson, F.P., & Campion, M.A. (2006). So you didn’t get the job. Now what do you think? Examining opportunity to perform fairness perceptions. Personnel Psychology, 59, 559-590.
Weekley, J.A., & Ployhart, R.E. (2006). Situational judgment tests: Theory, measurement, and application. Mahwah, NJ: Erlbaum.
Westring, A.J.F., Oswald, F.L., Schmitt, N., Drzakowski, S., Imus, A., Kim, B., & Shivpuri, S. (2009). Estimating trait and situational variance in a situational judgment test. Human Performance, 22, 44-63. DOI: 10.1080/08959280802540999