
HR AVATAR ASSESSMENT SOLUTION TECHNICAL MANUAL 2019 1

HR Avatar Assessment Solution

Technical Manual

Updated: December 2, 2019

www.hravatar.com 41101 Haybine Lane, Aldie VA 20105

703-688-3981


Contents

Overview
Testing Standards
Solution Summary
Cognitive Work Simulations
    Previous Research
    Content Development
        Attention to Detail
        Analytical Thinking
    Reliability
    Validity
    Fairness
Attitudes, Interests and Motivations Assessment
        Needs Structure
        Innovative and Creative
        Enjoys Problem-Solving
        Competitive
        Seeks Perfection
        Develops Relationships
        Expressive and Outgoing
        Corporate Citizenship
        Exhibits a Positive Work Attitude
        Adaptable
    Reliability
    Validity
    Fairness
Emotional Intelligence Module
        Self-Control
        Self-Awareness
        Empathy
    Reliability
    Fairness
Workplace Competency Assessment
        Coaching and Developing Others
        Exercising Political Savvy
        Guiding, Directing, and Motivating Others
        Resolving Conflicts and Meeting Customer Needs
        Team Building
Behavioral History Survey
        Performance
        Tenure
    Reliability
    Fairness
Knowledge and Skills Tests
    Development Overview
Sales Situation Analysis
    Test Development
    Reliability
    Fairness
Typing Speed and Accuracy
    Reliability
    Fairness
Data Entry
    Reliability
    Fairness
Essay Test
    Descriptives
    Fairness
Solution Scoring
    Construct Validity Evidence
Technical Requirements
Future Research
    Reliability
    Validity
    Norms
References
Appendix A: Summary of the HR Avatar Solutions
Appendix B: Historical Validity Evidence for the Cognitive Scales
Appendix C: Historical Validity Evidence for AIMS
Appendix D: Scoring Rubric for Essays
Appendix E: Directions for Rating Essay Tests
Appendix F: Validity Evidence for HR Avatar Tests
Appendix G: Validity Evidence for HR Avatar Tests

Tables

Table 3. Descriptive Statistics and Reliability Evidence for the Cognitive Workplace Simulation Scales
Table 4. Evaluation of Cognitive Score Differences by Gender
Table 5. Evaluation of Cognitive Score Differences by Age Group
Table 6. Descriptive Statistics and Reliability Evidence for the AIMS Scales
Table 7. Evaluation of AIMs Score Differences by Gender
Table 8. Evaluation of AIMs Score Differences by Ethnicity
Table 9. Evaluation of AIMs Score Differences by Age Group
Table 10. Evaluation of AIMs Score Differences by Race Groups: Asian and White
Table 11. Evaluation of AIMs Score Differences by Race Groups: Black or African American and White
Table 12. Descriptive Statistics and Reliability Evidence for the Emotional Intelligence Module
Table 7. Evaluation of AIMs Score Differences by Gender
Table 8. Evaluation of AIMs Score Differences by Ethnicity
Table 9. Evaluation of AIMs Score Differences by Age Group
Table 11. Evaluation of AIMs Score Differences by Race Groups: Black or African American and White
Table 12. Descriptive Statistics and Reliability Evidence for the Professional Behavioral History Survey
Table 13. Descriptive Statistics and Reliability Evidence for the Entry-Level Behavioral History Survey
Table 14. Evaluation of Biographical History Survey Score Differences by Gender
Table 15. Evaluation of Biographical History Survey Score Differences by Ethnicity
Table 16. Evaluation of Biographical History Survey Score Differences by Age Group
Table 17. Evaluation of Biographical History Survey Score Differences by Race Groups: Asian and White
Table 21. Evaluation of Biographical History Survey Score Differences by Race Groups: Black and White
Table 18. Descriptive Statistics and Reliability Evidence for the Sales Situation Analysis Assessment
Table 19. Evaluation of Sales Situation Analysis Score Differences
Table 20. Descriptive Statistics and Reliability Estimates for the Typing Test
Table 21. Evaluation of Business Typing Differences by Subgroup
Table 21. Evaluation of Academic Typing Differences by Subgroup
Table 20. Descriptive Statistics and Reliability Estimates for the Data Entry Tests
Table 21. Evaluation of 10-key Score Differences by Subgroup
Table 21. Evaluation of Alphanumeric Score Differences by Subgroup
Table 21. Evaluation of Oral Alphanumeric Score Differences by Subgroup
Table 22. Descriptive Statistics for Essay Scores
Table 23. Evaluation of Essay Score Differences by Subgroup – Living in a Big City Prompt
Table 23. Evaluation of Essay Score Differences by Subgroup – Working from Home Prompt
Table 24. Correlations between the AIMs Scales
Table 25. Correlations between the Professional Behavioral History Scales
Table 26. Correlations between the Entry-Level Behavioral History Scales
Table 27. Correlations between HR Avatar AIMs and Behavioral History Scales
Table 28. Correlations between the Two Cognitive Workplace Simulation Scales: Attention to Detail and Analytical Thinking
Table 29. Correlations between the AIMs and the Cognitive Workplace Simulation Scales
Table 30. Correlations between the Behavioral History Survey and the Cognitive Workplace Simulation Scales
Table 31. Compiled Validity Evidence for the Original Content of the Cognitive Work Simulation Scales
Table 32. Study 1 Results: Concurrent Validation Study for Insurance Consultants N=122-136
Table 33. Study 2 Results: Concurrent Validation Study for Inside Sales N=105
Table 34. Study 3 Results: Concurrent Validation Study for an Internet Services Order Processing N=84
Table 35. Study 4 Results: Concurrent Validation Study for an Internet and Cable Sales and Service Position N=72-93
Table 36. Study 5 Results: Concurrent Validation Study Auto Rental Sales Role N=80-92
Table 37. Study 6: Concurrent Validation Study for Paramedics N=85
Table 38. Summary of Relationships between Performance Measures and Original AIMS Scales


Figures

Figure 1. Summary of HR Avatar Solution
Figure 2. Cognitive Work Simulation
Figure 3. Hypothesized Relationships between the AIMs Scales and the Big Five


Overview

The HR Avatar Employment Assessment series was designed as a flexible assessment instrument for helping corporations obtain quality hires by measuring cognitive abilities, biographical data, personality characteristics, and job knowledge related to performance and tenure in the workplace. To cater to differing client needs, HR Avatar offers both individual assessments and complete assessment solutions. By combining assessments and measuring multiple competencies, the assessment solutions allow for a broader evaluation of overall candidate fit with a particular job.

This technical manual contains a summary of testing standards for the development of psychological assessments in the workplace and an overall summary of the development of the assessment solution. There is a section for each assessment within the solution, covering previous research, content development, reliability evidence, validity evidence, and evidence for fairness. The technical manual concludes with a summary of HR Avatar’s research agenda.

Testing Standards

The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014), the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 2003), the Uniform Guidelines on Employee Selection Procedures (Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice, 1978), and Testing and Assessment: An Employer’s Guide to Good Practices (Saad, Carter, Rothenberg, & Israelson, 2000) are the documents that “govern” the development and use of tests in employment settings. The developers of the HR Avatar Employment Assessment series have made, and continue to make, efforts to adhere to the recommendations for test development put forth by these documents.

The evidence presented in this technical manual summarizes all of the theoretical and empirical research done on the tool to date. We anticipate frequent updates as we complete additional research.

The following list summarizes the key test development standards from the documents listed above. These documents include additional standards governing test usage, and we encourage test users to review and adhere to them.

• Define the psychological construct measured by the test

• Document the intended use of the test

• Document how test scores should be interpreted

• Demonstrate validity for the specific use of the assessment and interpretation of scores, including:

    o Previous research on the construct

    o Previous empirical research that demonstrates the extent to which the validity evidence may be generalizable (e.g., meta-analytic studies)

    o Research supporting the link between the content of the assessment and the tasks, knowledge, skills, and abilities required for the job (i.e., content-related validity evidence)

    o Research supporting that a specific response process takes place, if such a process is inferred in the use or interpretation of the test

    o Evidence that the internal structure of the assessment conforms to theoretical expectations

    o Relationships between scores and any outcomes the assessment claims to be able to predict, such as job performance (i.e., criterion-related validity evidence)

    o Evidence demonstrating that the test is statistically related to other measures in ways that are consistent with theoretical expectations (i.e., convergent and divergent validity evidence)

• Document sufficient details related to validity research to allow the reader to independently evaluate the quality of the research

• Document reliability estimates and the Standard Error of Measurement (SEM) for each score, sub-score, and combination of scores

• Use a type of reliability estimate (e.g., internal consistency, test-retest) that is appropriate for the test

• Document sufficient details related to reliability research to allow the reader to independently evaluate the quality of the research

• Design the test in such a way that it allows all test takers an equal opportunity to demonstrate their standing on the construct regardless of subgroup membership (e.g., age, disability status, ethnicity, gender, or race)

• Provide empirical evidence demonstrating that the construct is measured the same way across subgroups

Solution Summary

The HR Avatar Employment Assessment series has been designed for use in identifying or screening out

those job applicants with the lowest potential for success in a given role by assessing job-relevant cognitive

abilities, personality characteristics, behavioral background, and knowledge and skills. There are currently

254 HR Avatar solutions (see Appendix A) in the HR Avatar Employment Assessment series. Each solution

has been designed for specific job roles and contains several components: one of 30 different Cognitive

Work Simulations, one of two different Attitudes, Interests, and Motivations assessment forms (AIMs), an

emotional intelligence module, one of two different Behavioral History Surveys, and job-specific Knowledge

and Skills assessments. The 31 Cognitive Work Simulations represent 31 distinct work contexts and are

discussed in more detail below. The AIMs assessment and the Behavioral History Survey are both available

in two forms. One form is designed for professional positions and the other is designed for entry-level

positions.


Figure 1. Summary of HR Avatar Solution

Cognitive Work Simulation
• Measures 1-3 specific cognitive abilities
• 1 of 30 job family-specific versions

Attitudes, Interests, Motivations
• 10 personality scales
• 2 versions (professional and entry-level)

Emotional Intelligence
• 3 emotional intelligence scales

Behavioral History Survey
• 2 biographical data scales
• 2 versions (professional and entry-level)

Knowledge and Skills Tests
• 1 or more of 55 job-specific knowledge and skills tests
• Sales Situation Analysis, Customer Service, etc.
• Typing Speed/Accuracy, Data Entry, Essay Writing, etc.

The assessments are delivered over the internet on a computer or mobile device. Applicants are sent an email with a link to take the assessment. Once the applicant clicks the link in the email, a page opens in their web browser. The assessment begins with an animated “host” who guides the applicant through the process and provides instructions prior to the start of each assessment. The animated host is used to provide the applicant with a more engaging and positive assessment experience. Most solutions take about 40 minutes to complete.

Prior to implementing the solution for selecting employees, HR Avatar recommends completing a job analysis to ensure that the competencies measured by the HR Avatar assessments are important for success in your organization. Additionally, HR Avatar recommends that the organization conduct a pilot study to examine the relationships between the assessments and relevant job performance criteria and to evaluate the potential for adverse impact. This pilot study should be done prior to using the assessment for screening out applicants.

Cognitive Work Simulations

Previous Research

Individuals with higher cognitive abilities are better equipped to solve complex problems and learn new skills. Cognitive ability is necessary for successful performance on a broad range of tasks, including written documentation, oral communication, identifying and managing details, solving problems, reading and responding to email messages, quickly learning and applying new information, quantitative computations and analyses, and making well-reasoned decisions.

The published research evidence consistently demonstrates a strong, predictive relationship between cognitive ability, or general mental ability, and job performance and training success. In fact, several meta-analyses have been completed, summarizing and combining the results of many research studies examining the validity of cognitive ability (Hunter & Hunter, 1984; Schmidt & Hunter, 1998), and the results generalize across geographical regions (Bertua, Anderson, & Salgado, 2005; Salgado & Anderson, 2003; Salgado, Anderson, Moscoso, Bertua, & De Fruyt, 2003a; Salgado et al., 2003b). The operational validity estimates from these meta-analytic studies tend to be in the .5-.6 range. Though cognitive ability predicts performance at all job levels, the relationship is moderated by job complexity such that it is stronger for more complex jobs (Bertua et al., 2005; Salgado et al., 2003b).

Currently, the most commonly accepted taxonomy for cognitive abilities is the Cattell-Horn-Carroll (CHC) model (Cattell, 1941; Horn, 1985; Carroll, 1993). This hierarchical taxonomy emerged from the merging of two independent lines of research examining the taxonomy and structure of cognitive abilities (McGrew, 2008). The taxonomy posits general cognitive ability at Stratum III, broad abilities at Stratum II, and narrow, more specific abilities at Stratum I. For the prediction of job performance, we are most interested in measuring abilities at the second stratum, as these are deemed to be broad enough to apply to multiple positions but specific enough to yield incremental prediction of job performance. Furthermore, recent research has demonstrated that using specific abilities in selection decisions can reduce adverse impact without decreasing the validity of the selection process (Kehoe, 2002; Wee, Newman, & Joseph, 2014).
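Adverse impact, as referenced above, is often screened with the four-fifths rule of thumb described in the Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is below 80% of the highest group's rate is generally flagged for closer review. The sketch below is only an illustration of that arithmetic. The group labels and counts are hypothetical, and it does not represent HR Avatar's own fairness analyses.

```python
from typing import Dict, Tuple


def selection_rate(selected: int, applicants: int) -> float:
    """Proportion of applicants in a group who passed the screen."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratios(groups: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Divide each group's selection rate by the highest group's rate."""
    rates = {name: selection_rate(sel, n) for name, (sel, n) in groups.items()}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any selections")
    return {name: rate / top for name, rate in rates.items()}


# Hypothetical counts: (number passing the cut score, number of applicants)
example = {"group_a": (48, 80), "group_b": (24, 60)}
ratios = impact_ratios(example)

# Groups below the four-fifths (0.8) threshold warrant a closer look.
flagged = [name for name, ratio in ratios.items() if ratio < 0.8]
```

A ratio below 0.8 is a screening signal rather than a conclusion; small samples in particular call for statistical significance testing as well.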

The broad abilities listed in Stratum II consist of fluid reasoning, comprehension-knowledge, short-term memory, visual processing, auditory processing, long-term storage and retrieval, cognitive processing speed, decision and reaction speed, reading and writing, quantitative knowledge, general (domain-specific) knowledge, tactile abilities, kinesthetic abilities, olfactory abilities, psychomotor abilities, and psychomotor speed (McGrew, 2008). Based on our understanding of these abilities, we believe that fluid reasoning, short-term memory, reading and writing, quantitative knowledge, and general (domain-specific) knowledge would be the most generalizable across jobs. However, general (domain-specific) knowledge was determined to be better measured by the knowledge and skills assessments. HR Avatar therefore designed two scales for each job family: Attention to Detail and Analytical Thinking. The development of these scales is discussed further in the Content Development section. To summarize, Attention to Detail taps short-term memory, whereas Analytical Thinking is more closely related to fluid reasoning and quantitative knowledge. Because both scales require applicants to read information, the reading component of reading and writing is also measured, to some extent, by both scales.

Current meta-analytic research has examined the relationship of these specific abilities with overall job performance and training performance. In a European sample, Salgado et al. (2003b) found that perceptual ability and memory (similar to the competency measured by the Attention to Detail scale) had corrected mean validity coefficients of .52 and .56 with job performance and .25 and .34 with training performance, respectively. Additionally, they found that verbal ability and numerical ability (similar to the competency measured by the Analytical Thinking scale) had corrected mean validity coefficients of .35 and .52 with job performance and .44 and .48 with training performance, respectively. In a UK sample, Bertua et al. (2005) found that perceptual abilities had a corrected mean validity coefficient of .50 with job performance and .50 with training performance. Verbal ability and numerical ability had corrected mean validity coefficients of .39 and .42 with job performance and .49 and .54 with training performance, respectively.
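For readers unfamiliar with "corrected" validity coefficients, one common adjustment divides the observed correlation by the square root of the criterion's reliability, which estimates the correlation with a perfectly reliable performance measure. The sketch below shows only that single step with hypothetical inputs. The cited meta-analyses apply further corrections (for example, for range restriction) that are not reproduced here.

```python
import math


def correct_for_criterion_unreliability(r_observed: float,
                                        criterion_reliability: float) -> float:
    """Disattenuate an observed validity coefficient for criterion unreliability.

    r_corrected = r_observed / sqrt(r_yy), where r_yy is the reliability of
    the job performance (criterion) measure.
    """
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


# Hypothetical inputs chosen only for illustration, not taken from the studies above.
corrected = correct_for_criterion_unreliability(r_observed=0.30,
                                                criterion_reliability=0.52)
```

With these illustrative inputs the corrected coefficient is roughly .42, which shows why corrected estimates are noticeably larger than the raw correlations observed in any single study.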

Content Development

The Cognitive Work Simulations measure two cognitive competencies: Attention to Detail and Analytical Thinking. Attention to Detail most closely represents short-term memory from the CHC model (see above), and Analytical Thinking is a combination of fluid reasoning and quantitative knowledge.

Attention to Detail

This competency is defined as the ability to process information, recall information accurately, and identify appropriate resources for locating specific pieces of information. It is important for data entry, identifying errors in data, processing orders, working with tables, understanding policies and procedures, and working with numerical data such as the type found in financial reports.

Analytical Thinking

This competency is defined as the ability to understand language and numerical relationships. It is characterized by the ability to process complex information, synthesize data, identify and solve problems, and make well-reasoned decisions.

The Cognitive Work Simulation requires candidates to complete various job-related tasks in a virtual environment designed to replicate the workplace. There are 31 versions of the Cognitive Work Simulation, and each is designed to represent scenarios common to a specific job family (listed in Table 3). Candidates are evaluated on how they respond to avatars representing customers and colleagues and on how well they solve problems in various work-related scenarios. The realistic experience is designed to be more engaging for applicants and to provide a better measure of ability. During a simulation, the candidate might be required to read email, listen to voicemail, and perform basic keyboard and screen navigation tasks to solve typical business problems. Note that not all 31 versions were analyzed in this report because adequate response data were not available for some versions.


Figure 2. Cognitive Work Simulation

The items used in the Cognitive Work Simulation were based on an existing library of ability items with a history of predicting job performance in various workplace settings. Both the original items and the new content were developed by examining the results of multiple job analyses. There are between 5 and 25 items for each competency measured by the simulation. Responses to assessment items are scored dichotomously. Some items have more than one correct response; in these cases, an applicant is awarded a point for each correct response selected. It takes approximately 18-25 minutes to complete a simulation.
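The scoring rule described above can be sketched as follows. This is a minimal illustration, not HR Avatar's scoring engine: the item identifiers, option labels, and answer-key structure are hypothetical, and the sketch assumes that incorrect selections carry no penalty, which the manual does not state either way.

```python
from typing import Dict, Iterable, Set


def score_item(selected: Iterable[str], correct: Set[str]) -> int:
    """Award one point for each correct option the applicant selected.

    For a single-answer item this reduces to dichotomous (0/1) scoring.
    Assumption: incorrect selections are not penalized.
    """
    return len(set(selected) & correct)


def score_scale(responses: Dict[str, Iterable[str]],
                key: Dict[str, Set[str]]) -> int:
    """Sum item scores over every item in the answer key (unanswered = 0)."""
    return sum(score_item(responses.get(item_id, ()), correct)
               for item_id, correct in key.items())


# Hypothetical answer key: q2 has two correct responses.
key = {"q1": {"B"}, "q2": {"A", "D"}, "q3": {"C"}}
responses = {"q1": ["B"], "q2": ["A", "C"], "q3": ["A"]}

total = score_scale(responses, key)  # q1 -> 1, q2 -> 1 (only "A" is correct), q3 -> 0
```

Under this rule the maximum scale score equals the total number of correct options across items, which can exceed the number of items when multi-response items are present.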

All Cognitive Simulations

Basic Entry-Level

Face-to-Face Customer Service

Administration

Business Sales

Business/Finance

Entry-Level Administration

Entry-Level Business & Finance

Entry-Level Office

First-Line Supervisor

General Office Workplace

Information Technology


Manager

Customer Service (with Email)

Technician

Retail Sales (Hardware Store)

Teller

Teller with Sales

Customer Service (With Email And Calls)

Collections

Driver

Warehouse

Medical Assistant

Construction*

Flight Attendant*

Fast Food Worker*

Restaurant Worker*

Real Estate Agent*

Hospitality Worker*

Retail Sales (Electronics Store)*

Retail Sales (Fashion Store)*

Retail Sales (Sunglasses Store)*

Train Station Ambassador*

Train Station Operations Manager*

Inside Sales (Call Center Sales)*

* Indicates that this module was not analyzed in this report because sufficient data were not available.

Reliability

To evaluate the properties of the cognitive assessments, as well as the other assessments in the solution, HR Avatar conducted a research study. Participants in the study were live applicants to the job families listed in Table 3. Data were collected from March 2016 to August 2018 on the current versions of each of the assessments. All product testing data, product demonstrations, and retests (the first test was kept) were excluded from the analysis. Additionally, only participants with complete data for each competency were included in the reliability analyses. This analytic strategy ensures that the estimates reported here match, as closely as possible, the operational use of the assessments and the reliability that organizations can expect. Only simulations with sufficient applicant data to calculate reliability estimates are included in the table below.
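The exclusion rules described above could be applied along the following lines. This is a sketch under stated assumptions: the record fields (`applicant_id`, `source`, `taken_at`, `scores`), the source labels, and the order in which the rules are applied are all hypothetical and are not taken from HR Avatar's data pipeline.

```python
from typing import List

# Hypothetical labels for records that are not live applicant data.
EXCLUDED_SOURCES = {"product_test", "demo"}


def build_analysis_sample(records: List[dict], scales: List[str]) -> List[dict]:
    """Keep first attempts by live applicants with complete scale data."""
    seen_applicants = set()
    kept = []
    # ISO-formatted dates sort chronologically as plain strings.
    for rec in sorted(records, key=lambda r: r["taken_at"]):
        if rec["source"] in EXCLUDED_SOURCES:
            continue  # product testing and demonstrations are dropped
        if rec["applicant_id"] in seen_applicants:
            continue  # retest: only the first test is kept
        seen_applicants.add(rec["applicant_id"])
        if any(rec["scores"].get(scale) is None for scale in scales):
            continue  # incomplete data for at least one competency
        kept.append(rec)
    return kept


records = [
    {"applicant_id": 1, "source": "live", "taken_at": "2017-01-05",
     "scores": {"analytical": 4, "detail": 6}},
    {"applicant_id": 1, "source": "live", "taken_at": "2017-02-01",
     "scores": {"analytical": 5, "detail": 7}},   # retest, dropped
    {"applicant_id": 2, "source": "demo", "taken_at": "2017-03-01",
     "scores": {"analytical": 6, "detail": 8}},   # demonstration, dropped
    {"applicant_id": 3, "source": "live", "taken_at": "2016-06-10",
     "scores": {"analytical": 3, "detail": None}},  # incomplete, dropped
]
sample = build_analysis_sample(records, ["analytical", "detail"])
```

Note that in this sketch a retest is dropped even when the first attempt was incomplete; whether the study handled that case the same way is not stated in the manual.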

Table 1 contains the descriptive statistics for each scale and the alpha estimates of internal consistency.

Please note, based on these findings, further analyses were done at the item and distractor level. These

analyses led the developers to make substantial edits to these items and we anticipate that edits will

substantially improve the reliability of the tools. Further research evaluating the effectiveness of these

changes is forthcoming.
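The alpha values reported in this section are estimates of internal consistency, presumably coefficient (Cronbach's) alpha. As a minimal sketch of how such an estimate can be computed from item-level data, the Python function below applies the standard formula. The function name and the small response matrix are illustrative assumptions; they are not drawn from the HR Avatar applicant data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the total (summed) scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 (incorrect/correct) scores: five respondents, three items.
example_responses = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
example_alpha = cronbach_alpha(example_responses)
print(round(example_alpha, 2))
```

Because alpha depends on the number of items as well as on how strongly the items covary, short scales (for example, five or six items) tend to produce lower estimates even when the individual items are sound.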


Table 1. Descriptive Statistics and Reliability Evidence for the Cognitive Workplace Simulation Scales

Columns: Cognitive Simulation; Analytical Thinking (# of Items, N, M, SD, Alpha); Attention to Detail (# of Items, N, M, SD, Alpha)

Basic Entry-Level 19 1201 15.90 2.92 0.81 5 1201 3.78 1.40 0.68


6 251 2.68 2.17 0.84 8 251 5.39 2.01 0.69

Administration 5 1166 2.37 1.22 0.30 11 1163 7.26 2.99 0.75

Business Sales 6 2006 3.01 1.68 0.58 8 2000 3.39 2.04 0.64

Business/Finance 7 3932 4.07 1.59 0.37 6 3918 3.86 1.50 0.50


10 2204 7.82 3.54 0.78 14 2210 1.79 0.46 0.89

Entry-Level Business & Finance 7 314 3.03 1.85 0.66 5 312 3.23 1.46 0.59

Entry-Level Office 25 4991 17.20 4.72 0.81 13 4991 7.04 2.89 0.75

First-Line Supervisor 9 4921 6.41 1.63 0.49 8 4921 5.21 2.04 0.63


7 946 3.17 1.72 0.55 6 975 3.04 1.38 0.39


7 1894 3.94 1.76 0.51 7 1844 4.28 1.77 0.59

Manager 10 2901 5.37 2.19 0.54 5 2901 3.23 1.36 0.49

Remote Customer Service 6 3995 3.07 1.51 0.50 10 3999 5.94 2.68 0.76

Technician 25 263 17.5 4.37 0.79 9 264 4.11 2.98 0.84

Retail Sales 10 85 2.49 2.39 0.76 7 85 3.97 1.86 0.69

Teller 7 611 1.94 0.94 0.58 17 611 11.39 1.72 0.50

Teller with Sales 7 138 2.25 0.99 0.64 17 138 11.27 1.39 0.44

Customer Service (With Email And Calls) 6 3551 2.70 1.38 0.39 11 2264 6.41 2.58 0.69

Collections 8 2614 4.32 1.24 0.33 14 2607 8.84 1.85 0.32

Driver 7 145 2.50 1.62 0.51 5 148 2.90 1.28 0.52

Warehouse 8 138 5.21 2.02 0.73 6 138 4.66 1.47 0.68

Medical Assistant 7 275 3.48 1.48 0.39 14 272 10.88 1.97 0.60

Please note that ALL cognitive scales listed above have been supplemented by additional items using one of

several Cognitive Supplement Modules, described below.


Validity

As mentioned above, the simulations were developed based on an existing library of content. The original

developer has provided the results of historical validation studies done with the original content and these

are provided in Appendix B.

Validation evidence is available in two studies that demonstrate the validity of the assessments. The first is a

study with 130 managers and the second is a study with 64 managers. In the first study, the assessment

demonstrated that it predicts job performance (r=.25; p<.01), using a simple “High” or “Marginal”

categorization of job performance, and a second study demonstrated the assessment predicted performance

on a 100-point administrative performance appraisal, using a Spearman correlation (r=.25; p<.05). We

expect validity evidence to be more robust in future studies with better criterion measures. A brief summary

of the results is presented in Appendix G.

Fairness

In order to evaluate how different demographic subgroups perform on the assessments and to assess the

likelihood of adverse impact, analyses were conducted to compare mean scores of the subgroups. The

results are provided in Tables 2 through 6. Included in the tables are the descriptive statistics for each

subgroup and Cohen’s d. Cohen’s d is an effect size and an indication of how large the mean differences are.

For reference, according to Cohen (1992), effect sizes of .20 are small, .50 are medium, and .80 are large.

Future research will continue to examine subgroup differences as more data become available.
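To make the effect size concrete, the sketch below computes Cohen's d from the summary statistics reported in the fairness tables (n, M, and SD for each group). It assumes the pooled-standard-deviation form of d and the sign convention stated in the table notes (positive values mean the referent group scored higher). The manual does not state which variant of the formula was used, so the function name and formula choice are assumptions.

```python
import math


def cohens_d(n_focal, m_focal, sd_focal, n_ref, m_ref, sd_ref):
    """Cohen's d from summary statistics, using the pooled standard deviation.

    Positive values mean the referent group scored higher.
    """
    pooled_variance = (
        (n_focal - 1) * sd_focal ** 2 + (n_ref - 1) * sd_ref ** 2
    ) / (n_focal + n_ref - 2)
    return (m_ref - m_focal) / math.sqrt(pooled_variance)


# Basic Entry-Level, Attention to Detail, from the gender table:
# female n=718, M=3.64, SD=1.43; male n=264, M=3.85, SD=1.36.
d_example = cohens_d(718, 3.64, 1.43, 264, 3.85, 1.36)
print(round(d_example, 2))  # 0.15, matching the value reported in the table
```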

For gender, the weighted-average Cohen’s d values are -0.003 for Analytical Thinking and 0.073 for

Attention to Detail. Negative Cohen’s d values indicate that females scored higher. Overall, these are very

small effect sizes, indicating that there are little-to-no differences between men and women on the cognitive

scales. In the table below, the larger effect sizes tend to be associated with smaller sample sizes, indicating

these results are likely due to sampling error. Based on these results, it is unlikely there will be significant

adverse impact against either men or women by using the cognitive scales for selection decisions.

For age, the weighted-average Cohen’s d values are -0.023 for Analytical Thinking and -.060 for Attention to

Detail. Negative Cohen’s d values indicate that people 40 and Older scored higher. Overall, these are very

small effect sizes, indicating that there are little-to-no differences between ages on the cognitive scales. In

the table below, the larger effect sizes tend to be associated with smaller sample sizes, indicating these

results are likely due to sampling error. Based on these results, it is unlikely there will be significant adverse

impact against age groups by using the cognitive scales for selection decisions.

For ethnicity, the weighted-average Cohen’s d values are -0.031 for Analytical Thinking and .012 for

Attention to Detail. Negative Cohen’s d values indicate that Hispanic or Latino people scored higher.

Overall, these are very small effect sizes, indicating that there are little-to-no differences between ethnicities


on the cognitive scales. In the table below, the larger effect sizes tend to be associated with smaller sample

sizes, indicating these results are likely due to sampling error. Based on these results, it is unlikely there will

be significant adverse impact against ethnic groups by using the cognitive scales for selection decisions.

For Black-White differences, the weighted-average Cohen’s d values are 0.403 for Analytical Thinking

and .280 for Attention to Detail. Positive Cohen’s d values indicate that White applicants scored

higher. These are small to moderate effect sizes that can lead to adverse impact against Black applicants;

however, these are smaller than values typically seen in the research literature.

For Asian-White differences, the weighted-average Cohen’s d values are 0.312 for Analytical Thinking

and .574 for Attention to Detail. Positive Cohen’s d values indicate that White applicants scored

higher. These are moderate effect sizes that can lead to adverse impact against Asian applicants and are

slightly higher than is seen in the research literature. A likely hypothesis about these differences is a language

barrier; future research will examine how to mitigate these differences.
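The weighted-average d values cited above combine the per-simulation effect sizes so that simulations with more applicants count more heavily. A minimal sketch of one such weighting follows, assuming each d is weighted by the combined sample size of the two groups being compared; the manual does not specify the weights, so this choice and the function name are assumptions. The input rows are a subset of the Analytical Thinking rows in the gender table, so the result is not expected to reproduce the reported overall value.

```python
def weighted_average_d(comparisons):
    """Sample-size-weighted mean of effect sizes.

    comparisons: list of (total_n, d) pairs, one pair per simulation.
    """
    total_n = sum(n for n, _ in comparisons)
    return sum(n * d for n, d in comparisons) / total_n


# (combined n of both groups, reported d) for four Analytical Thinking rows
# of the gender table; illustrative subset only.
gender_rows = [
    (718 + 264, -0.03),    # Basic Entry-Level
    (184 + 749, -0.05),    # Administration
    (965 + 746, -0.03),    # Business Sales
    (1333 + 2111, -0.16),  # Business/Finance
]
print(round(weighted_average_d(gender_rows), 3))
```

Weighting by sample size keeps a large d from a small sample, which is more exposed to sampling error, from dominating the summary.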


Table 2. Evaluation of Cognitive Score Differences by Gender

Cognitive Simulation Scale Female Male

n M SD n M SD d

Basic Entry-Level Analytical Thinking 718 16.01 2.69 264 15.92 2.63 -0.03

Attention to Detail 718 3.64 1.43 264 3.85 1.36 0.15


Analytical Thinking 41 2.34 2.21 184 2.76 2.18 0.19


Administration Analytical Thinking 184 2.46 1.13 749 2.41 1.24 -0.05


Business Sales Analytical Thinking 965 3.05 1.67 746 3.00 1.71 -0.03


Business/Finance Analytical Thinking 1333 4.25 1.60 2111 4.00 1.54 -0.16

Attention to Detail 1333 3.92 1.51 2111 3.85 1.46 -0.05




Entry-Level Business/Finance

Analytical Thinking 166 3.48 1.76 110 2.54 1.88 -0.51


Entry-Level Office Analytical Thinking 381 18.89 4.78 4246 17.12 4.7 0.37


First-Line Supervisor Analytical Thinking 1907 6.45 1.67 2341 6.41 1.58 -0.02








Manager Analytical Thinking 1266 5.58 2.29 1114 5.18 2.12 -0.18





Technician Analytical Thinking 184 16.98 4.55 71 18.69 3.58 0.44


Retail Sales Analytical Thinking N/A

Attention to Detail N/A

Teller Analytical Thinking 144 2.03 0.92 423 1.92 0.94 -0.12


Teller with Sales Analytical Thinking N/A





Collections Analytical Thinking 419 4.80 1.17 2022 4.21 1.22 -0.49


Driver Analytical Thinking N/A



Warehouse Analytical Thinking N/A


Medical Assistant Analytical Thinking N/A


Note. Positive d-values mean the referent group scored higher.


Table 3. Evaluation of Cognitive Score Differences by Age Group

Cognitive Simulation Scale 40 and Older Under 40

n M SD n M SD d






Administration Analytical Thinking 275 2.38 1.20 667 2.57 1.24 0.15


Business Sales Analytical Thinking 283 2.96 1.68 1436 3.32 1.70 0.21




















Manager Analytical Thinking 1046 5.30 2.15 1266 5.56 2.27 0.12





Technician Analytical Thinking 72 17.61 4.33 177 17.44 4.56 -0.04




Teller Analytical Thinking 73 1.95 0.95 504 2.01 0.88 0.07


Teller with Sales Analytical Thinking 48 2.26 0.98 83 2.21 1.03 -0.04





Collections Analytical Thinking 740 4.24 1.25 1680 4.55 1.16 0.26







Medical Assistant Analytical Thinking 53 3.23 1.49 146 3.66 1.45 0.29




Table 4. Evaluation of Cognitive Score Differences by Ethnicity

Cognitive Simulation Scale Hispanic or Latino

Not Hispanic or Latino

n M SD n M SD d
















Analytical Thinking N/A


Entry-Level Office Analytical Thinking 699 17.62 4.75 3033 17.37 4.78 -0.05










Manager Analytical Thinking 206 5.55 2.25 1615 5.04 2.24 -0.22























Medical Assistant Analytical Thinking 94 3.69 1.40 105 3.02 1.53 -0.45




Table 5. Evaluation of White-Black Cognitive Score Differences

Cognitive Simulation Scale Black White

n M SD n M SD d

Basic Entry-Level Analytical Thinking 442 15.45 2.97 319 16.73 2.02 0.49



















First-Line Supervisor Analytical Thinking 41 5.75 1.77 85 6.47 1.56 0.44



Analytical Thinking N/A














Teller Analytical Thinking 186 1.91 0.95 144 1.93 0.94 0.02














Medical Assistant Analytical Thinking 42 3.52 1.58 108 3.53 1.41 0.00




Table 6. Evaluation of White-Asian Cognitive Score Differences

Cognitive Simulation Scale Asian White

n M SD n M SD d

Basic Entry-Level Analytical Thinking 76 15.70 2.47 319 16.73 2.02 0.49



















First-Line Supervisor Analytical Thinking 4032 6.47 1.60 85 6.47 1.56 0.00































Medical Assistant Analytical Thinking N/A



Supplemental Modules

In order to improve the reliability of the cognitive simulations, supplemental modules were introduced to

expand the item pool. These supplemental modules are administered alongside the original simulation for a

seamless candidate experience. In the supplemental modules, a proportion of items from the entire pool

is randomly drawn for inclusion in the assessment. The effect is that candidates take a slightly longer

simulation and the scores from the test are more reliable.

The table below contains information on the supplemental modules. For each cognitive scale, the size of the

supplemental item pool, the number of items, and the estimated reliability using the Spearman-Brown

formula are reported. Note that due to low sample sizes, fairness analyses were not able to be computed.

These will be analyzed when sufficient data becomes available.
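The Spearman-Brown formula projects the reliability of a lengthened test from the reliability of the original. The sketch below assumes the lengthened test consists of the original scale plus the supplemental items given; that reading is an inference rather than something the manual states, although it reproduces several of the reported estimates (for example, Administration Analytical Thinking: 5 original items with alpha 0.30, plus 14 supplemental items, yields 0.62).

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by length_factor.

    length_factor is the ratio of new test length to original test length.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Administration, Analytical Thinking: 5 original items (alpha 0.30)
# plus 14 supplemental items given (assumed to be added to the original 5).
original_items, supplemental_items = 5, 14
factor = (original_items + supplemental_items) / original_items
projected = spearman_brown(0.30, factor)
print(round(projected, 2))  # 0.62
```

The projection assumes the added items are comparable in quality to the original items, so observed reliability on live data may differ from these estimates.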

Table 7. Estimated Reliability for the Supplemental Cognitive Workplace Simulation Scales

Columns: Cognitive Simulation; Analytical Thinking (# of Items in pool, # of Items given, Alpha); Attention to Detail (# of Items in pool, # of Items given, Alpha)

Basic Entry-Level 78 26 0.93

Administration 39 14 0.62 87 29 0.84

Business Sales 78 26 0.88 50 17 0.85


45 15 0.79 81 27 0.78

Manager 48 16 0.75 66 22 0.84

Customer Service 96 24 0.76 56 14 0.84

All cognitive simulations now have a sequential cognitive supplement module in operation to bolster

reliability.


Attitudes, Interests and Motivations Assessment

Previous Research

The Big Five framework for understanding personality is the most widely accepted taxonomy of personality

traits (Goldberg, 1992; 1993; McCrae & Costa, 1997; McCrae & John, 1992). The five factors, or traits, are

Agreeableness, Conscientiousness, Emotional Stability, Extraversion, and Openness. Agreeableness is

characterized as warmth, ability to get along with others, and friendliness. Individuals who are

conscientious are characterized as being reliable, responsible, dependable, thorough, dutiful,

achievement-oriented and competent. Individuals considered Emotionally Stable are more resilient, stable,

and centered. Extraverted individuals are characterized as being sociable, outgoing, and high in energy.

Openness describes individuals who are “creative, flexible, curious, and unconventional” (Judge & Ilies,

2002).

A substantial amount of research, including meta-analytic research, has demonstrated the relationship

between personality and job performance (Barrick & Mount, 1991; Bartram, 2005; Hogan & Holland, 2003;

Hurtz & Donovan, 2000; Mount & Barrick, 1995; Salgado, 1997; Tett, Jackson, & Rothstein, 1991). In a

meta-analysis of 25 years of personality and performance research drawn from 117 studies of managers,

professionals, salespeople, and skilled and semi-skilled workers, researchers concluded that extraversion

predicted success in management and sales; openness to experience predicted training ability; and

conscientiousness correlated with success in sales, management, professional, skilled and semi-skilled

positions (Barrick & Mount, 1991). The findings were similar in the European community (Salgado, 1997).

Personality traits are stable and slow to change (McCrae, Jang, Livesley, Riemann, & Angleitner, 2001). The

stability of personality traits and their relationship to job performance makes them extremely important to

measure before making a hiring decision.

Theory on personality in the workplace has advanced and researchers have demonstrated that validity

coefficients for personality assessments are stronger for those job performance criteria that can be

theoretically linked to a given personality trait. For example, Borman and Motowidlo (1993, 1997) found

support for their hypotheses that personality would have a stronger relationship with contextual rather than

task performance. The researchers defined contextual performance as those areas of performance that are

important to an organization but not explicitly part of the job duties of an individual (e.g., following

organizational rules, giving additional effort when needed, providing support to other organizational

members, etc.). Hogan and Holland (2003) found that the Hogan Personality Inventory, a personality

assessment based on the Big Five, was substantially related to multiple measures of job performance across

a wide variety of positions. The relationships were stronger for personality traits that were theoretically

linked to the job performance measure. Bartram (2005) had similar findings, supporting the notion that

theoretically linked personality constructs and job performance domains had stronger relationships than


personality traits with overall performance. The trait-activation model proposed by Tett and Burnett (2003)

provides a theoretical model allowing for more detailed predictions regarding when a specific personality

trait might be linked to job performance.

Another approach to improving prediction with personality assessment is to measure narrower traits that are

specific facets of the Big Five (Paunonen, Rothstein, & Jackson, 1999; Schneider, Hough, & Dunnette,

1996). Researchers who argue for this approach posit that some facets of a trait may relate to a particular job

performance dimension, but that others may not - or may even be negatively related. Measuring at the broad

level may dilute the relationship between the narrower facet and the job performance dimension. For

example, achievement and reliability are both narrow facets of conscientiousness. One might expect a strong

positive relationship between reliability and retention, but there may not be a relationship between

achievement and retention. Using an assessment that measures the broad trait of conscientiousness may

under-predict retention. Researchers conducted a meta-analysis and demonstrated the superiority of using

narrow facets to predict specific job performance dimensions (Woo, Chernyshenko, Stark, & Conz, 2014).

Furthermore, many personality assessments used for pre-employment selection are actually compound traits

– or constellations of narrow facets (Schneider et al., 1996). These constellations may come from one or

more of the broader personality traits. In fact, instruments used in both the Hogan and Holland (2003) and

Bartram (2005) meta-analyses contain measures of compound traits.

AIMS v3 Content Development

A review of the literature on personality in the workplace allowed us to identify several competencies, or

compound traits, that are related to successful job performance for many jobs. The ten scales, their

definitions, and example items are listed below. Please note, example items are similar to, but do not reflect,

actual items from the assessment.

Needs Structure

Often, following rules and procedures is critical to successful job performance. When employees do not

follow established organizational policies, work processes may be performed incorrectly, work products may

be flawed, customer service levels may suffer, the organization may become a victim to fraudulent wage and

expense billings, and in some cases the organization may be liable for the actions of the employee. The

Needs Structure scale was designed to evaluate an applicant’s tendency to adhere to organizational rules and

procedures and should be important in jobs where rule-abiding behaviors are related to job performance.

Example: I prefer to make extensive plans before I start any project.


Innovative and Creative

For many positions, Innovation and Creativity is critical to successful job performance. Individuals with

high levels of this competency should be more likely to generate new products, develop improved methods

for producing work, and develop solutions to problems.

Example: Some of my suggestions are really eccentric.

Enjoys Problem-Solving

Enjoys Problem-Solving is a competency designed to predict which applicants will be more successful in

roles dealing with data, completing analyses, and conducting research. Individuals who enjoy problem

solving should be more successful in roles that require analytical thinking.

Example: I enjoy learning about how things work.

Competitive

The Competitive competency evaluates the extent to which an individual is likely to do what is necessary to

accomplish their goals – which may be of concern if the individual’s goals are not aligned with the

organization’s goals. Individuals who score highly on this competency are characterized as having concern

for outcomes rather than the feelings of coworkers.

Example: I am not above using people to get my way.

Seeks Perfection

When the quality of work (as opposed to pace) is important for success in a role, the Seeks Perfection

competency should predict performance. Those who score high on this competency are more likely to

double-check their work and have higher rates of accuracy. Additionally, individuals high on this

competency are more likely to be detail-oriented.

Example: My work tends to be faultless.

Develops Relationships

For many roles, working in teams and with others is critical for successful performance. The Develops

Relationships scale was designed to predict which individuals are more likely to develop productive working

relationships and be successful in a team situation.

Example: I have never deliberately said anything that hurt someone's feelings.


Expressive and Outgoing

Individuals who are Expressive and Outgoing are more likely to engage with others. Being Expressive and

Outgoing is also characterized by the ability to influence and persuade others. This competency is important

for success in roles where influence is necessary such as leadership and sales positions.

Example: I tend to take control in most work situations.

Corporate Citizenship

The Corporate Citizenship competency was designed to measure applicants’ tendency to conduct

themselves ethically and responsibly in an organization. Individuals who score highly on this competency are

likely to be honest with managers and colleagues and avoid taking advantage of people and/or situations

just because it suits their self-interest.

Example: I would never use manipulation as a tactic to advance my goals.

Exhibits a Positive Work Attitude

This competency evaluates the tendency for an individual to feel positively toward their job and the

organization they work for. Individuals who score highly on this competency are characterized as feeling

satisfied with their work and are more likely to put in extra effort when needed.

Example: I volunteer for additional work.

Adaptable

The Adaptable competency measures an individual’s tendency to adjust to changes in their work

environment. Individuals who score highly on this competency thrive in fast-paced settings and respond

well to variety.

Example: I think organizational changes are fun and exciting.

The figure below outlines the hypothesized relationships between the AIMs scales and the Big Five traits.

As mentioned above, the AIMs scales are compound traits and measure competencies related to multiple

narrow facets of personality and these facets may be under one or more of the five broader personality

traits. Therefore, an AIMs scale may be expected to relate, albeit moderately, to multiple Big Five traits.


Figure 3. Hypothesized Relationships between the AIMs Scales and the Big Five

The original assessment consisted of 100 Likert-type items. Each item consists of a single statement and

candidates are asked to indicate their level of agreement with the statement. Preliminary data analyses

allowed us to shorten the scales such that the assessment contains a total of 65 items. The AIMs assessment

is available in two forms. The professional form measures all ten competencies. The entry-level form

measures eight competencies as it was determined that Expressive and Outgoing and Innovative and

Creative would be less critical competencies for these roles. The time to take the professional form is

estimated to be between 6 and 7 minutes, and the time to take the entry-level form is estimated to be between 5 and

6 minutes. The estimates were calculated using a sample of 402 individuals.

Reliability

The descriptive statistics and reliability estimates of the AIMs scales were estimated using the data collected

from live applicants in the course of applying for jobs. Analyses were restricted to US samples. As can be

seen from the table below, most AIMs scales have acceptable levels of internal consistency. Please note,

based on these findings, further analyses were done at the item and distractor level. These analyses led the

developers to make substantial edits to these items and we anticipate that edits will substantially improve the

reliability of the tools. Further research evaluating the effectiveness of these changes is forthcoming.

Table 8. Descriptive Statistics and Reliability Evidence for the AIMS v3 Scales

Competency N M SD Alpha

Needs Structure 1999 30.47 3.75 .69

Innovative & Creative 1999 26.94 4.34 .77

Enjoys Problem-Solving 1999 27.94 4.71 .84

Competitive 1999 32.22 6.15 .77

Seeks Perfection 1999 27.95 4.46 .73

Develops Relationships 1999 29.39 3.96 .70

Expressive & Outgoing 1999 19.54 5.52 .69

Corporate Citizenship 1999 47.87 5.68 .81

Exhibits a Positive Work Attitude 1999 30.92 3.85 .74

Adaptable 1999 29.49 3.70 .48

Validity

A Confirmatory Factor Analysis (CFA) was run using R (version 3.1.2) and the “sem” package to evaluate

whether or not the internal structure of the assessment conformed to the hypothesized structure. In other

words, we assessed the extent to which the items loaded onto the particular scale they were assigned to. The

Root Mean Square Error of Approximation (RMSEA), an indicator of model fit, indicates that the model fit was satisfactory,

RMSEA=.0675, χ2 (1970, N=567) =7057.80, p<.001. This analysis provides validation support for the

internal structure of the assessment.
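The reported RMSEA can be checked against the reported chi-square, degrees of freedom, and sample size. The sketch below uses one common form of the formula, the square root of max(chi-square - df, 0) divided by df times (N - 1). Which variant the “sem” package applied is not stated in this manual, so the N - 1 denominator and the function name are assumptions.

```python
import math


def rmsea(chi_square, df, n):
    """Root Mean Square Error of Approximation from a model chi-square.

    Uses the (n - 1) denominator; some software uses n instead.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported for the AIMs CFA: chi-square = 7057.80, df = 1970, N = 567.
aims_rmsea = rmsea(7057.80, 1970, 567)
print(round(aims_rmsea, 4))
```

With these inputs the result agrees with the reported RMSEA of approximately .0675. Substituting N for N - 1 changes the value only in the fourth or fifth decimal place at this sample size.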

As mentioned previously, the content for the AIMs scales was modified and adapted from a longer

assessment that had been used in previous validation research. The developer of the original content

provided the results of several criterion-related validity studies and tables with these results can be found in

Appendix C.









Fairness

In order to evaluate how different demographic subgroups perform on the assessments, mean differences

between protected subgroups were compared (Tables 9-13). There were too few cases (n<30) to run

analyses for the American Indian or Alaska Native and Native Hawaiian or Other Pacific Islander

subgroups. There were some substantive mean differences on the AIMs scales between males and females;

however, all differences were inconsistent and relatively small, with an average standardized mean difference

of 0.029, very slightly favoring males. There were no substantive mean differences based on ethnicity, with

an average standardized mean difference of 0.033, very slightly favoring Not Hispanic or Latino people.

There were a few substantive mean differences with respect to age group, with a small standardized mean

difference favoring individuals under 40 (0.108). There were a few substantive mean differences between the

Asian and White subgroups, notably on the Corporate Citizenship scale. However, these differences might

be reflections of cultural differences; care should be taken to document the business necessity for use of

scales with large group differences. Overall, there was a small standardized mean difference favoring the

White subgroup (0.113). There were some small differences between the Black or African American

subgroup and the White subgroup, with the average standardized mean difference (-0.230) and most scales

favoring the Black or African American subgroup.

Table 9. Evaluation of AIMs v3 Score Differences by Gender

Scale

Male Female n M SD n M SD d

Competitive 462 33.90 6.37 1383 30.42 6.18 -0.56

Corporate Citizenship 462 48.84 5.27 1383 50.63 4.52 0.38

Develops Relationships 462 29.02 4.04 1383 29.54 3.99 0.13

Enjoys Problem Solving 462 29.11 4.71 1383 28.37 4.71 -0.16

Exhibits a Positive Work Attitude 462 31.02 3.79 1383 31.73 3.70 0.19

Expressive and Outgoing 462 19.71 5.24 1383 17.68 4.96 -0.40

Innovative and Creative 462 27.48 4.50 1383 27.11 4.53 -0.08

Needs Structure 462 30.53 3.79 1383 30.81 3.96 0.07

Seeks Perfection 462 28.55 4.57 1383 28.55 4.48 0.00

Adaptable 462 29.11 3.80 1383 29.62 3.65 0.14

Note. Positive d-values indicate females scored higher
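The average standardized mean difference cited in the Fairness text can be reproduced from the d column of the gender table. The short sketch below takes the unweighted mean of the ten reported values; treating the average as a simple unweighted mean is an assumption, and the variable names are illustrative.

```python
from statistics import mean

# d values as reported in the gender table (positive = females scored higher).
gender_d_values = {
    "Competitive": -0.56,
    "Corporate Citizenship": 0.38,
    "Develops Relationships": 0.13,
    "Enjoys Problem Solving": -0.16,
    "Exhibits a Positive Work Attitude": 0.19,
    "Expressive and Outgoing": -0.40,
    "Innovative and Creative": -0.08,
    "Needs Structure": 0.07,
    "Seeks Perfection": 0.00,
    "Adaptable": 0.14,
}
average_gender_d = mean(gender_d_values.values())
print(round(average_gender_d, 3))  # -0.029
```

Under the table's sign convention, the negative average corresponds to males scoring very slightly higher overall, which matches the magnitude of 0.029 given in the text. The individual scales differ in direction, so the near-zero average should not be read as the absence of differences on any single scale.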


Table 10. Evaluation of AIMs v3 Score Differences by Ethnicity

Hispanic or Latino Not Hispanic or Latino

Scale n M SD n M SD d

Competitive 233 30.48 6.20 1327 31.33 6.44 0.13

Corporate Citizenship 233 51.11 4.12 1327 50.06 4.80 -0.22


Enjoys Problem Solving 233 28.34 4.52 1327 28.59 4.76 0.05

Exhibits a Positive Work Attitude 233 31.81 3.66 1327 31.43 3.83 -0.10

Expressive and Outgoing 233 17.27 4.84 1327 18.38 5.14 0.22

Innovative and Creative 233 26.74 4.31 1327 27.23 4.50 0.11

Needs Structure 233 31.03 4.13 1327 30.60 3.93 -0.11


Adaptable 233 29.46 3.82 1327 29.46 3.65 0.00

Note. Positive d-values indicate Not Hispanic or Latino scored higher

Table 11. Evaluation of AIMs v3 Score Differences by Age Group

40 and Older Less than 40


Competitive 623 30.00 6.13 1238 31.95 6.42 0.31






Innovative and Creative 623 26.29 4.37 1238 27.75 4.50 0.33

Needs Structure 623 30.48 3.99 1238 30.91 3.88 0.11


Adaptable 623 29.81 3.51 1238 29.39 3.76 -0.11

Note. Positive d-values indicate people under 40 scored higher


Table 12. Evaluation of AIMs v3 Score Differences by Race Groups: Asian and White

Asian White


Competitive 601 32.32 6.41 637 30.47 6.29 -0.29

Corporate Citizenship 601 46.99 5.77 637 51.66 2.77 1.04

Develops Relationships 601 29.56 4.04 637 29.08 3.84 -0.12


Exhibits a Positive Work Attitude 601 30.75 4.03 637 31.54 3.40 0.21

Expressive and Outgoing 601 20.10 5.50 637 17.66 4.47 -0.49


Needs Structure 601 30.28 3.65 637 30.16 4.15 -0.03


Adaptable 601 27.96 3.50 637 29.96 3.41 0.58

Note. Positive d-values indicate White applicants scored higher

Table 13. Evaluation of AIMs v3 Score Differences by Race Groups: Black or African American and White

Black or African American White


Competitive 503 31.32 6.25 637 30.47 6.29 -0.13


Develops Relationships 503 29.93 4.20 637 29.08 3.84 -0.21

Enjoys Problem Solving 503 29.94 4.68 637 28.45 4.43 -0.33




Needs Structure 503 32.12 3.68 637 30.16 4.15 -0.50

Seeks Perfection 503 29.64 4.57 637 28.48 4.42 -0.26

Adaptable 503 30.73 3.59 637 29.96 3.41 -0.22


AIMS v4 Content Development

A common request from employers who use selection assessments is to have a shortened form so that

candidates for employment spend less time being assessed. This is designed to improve the applicant

experience and make for less administrative overhead. Psychometrically, however, this poses a challenge, as

shortening scales can reduce the assessment’s reliability and validity, thereby reducing its usefulness for

employers.

To address these concerns, HR Avatar created a shortened form of the AIMS assessment that focuses on

only a few scales. These scales were carefully chosen to represent the most predictive and broadly applicable

scales that would work for nearly all positions. The advantage of this strategy is that it shortens the


assessment while at the same time maintaining (or improving) the reliability and validity of the individual

scales. Utilizing a mix of old and new items, the following scales comprised the shortened AIMS assessment:

Integrity

Integrity is one of the most widely predictive personality scales in addition to being an important

psychological construct. Individuals high in integrity do the right thing even when no one is watching and

strive to act with ethical motivations at all times. Individuals low in integrity frequently break the rules. This

scale is designed to predict both overall job performance as well as counter-productive behaviors at work

(e.g., absenteeism, theft).

Example: I frequently ignore the rules at work.

Teamwork

With the nature of work increasingly involving collaboration with others, it is important to have a scale that

distinguishes those who would thrive in such an environment from those who prefer individual work.

Individuals high on the Teamwork scale derive energy from working with others and enjoy tasks involving a

lot of teamwork. Individuals low on the Teamwork scale prefer to work solo and may find extensive

teamwork tiring.

Example: I do my best work when working as a member of a team.

Drive

Conscientiousness is arguably the most well-validated personality trait for the prediction of job performance

(Barrick & Mount, 1991). The Drive scale is designed to measure a subset of conscientiousness, focusing on

the most predictive parts, achievement and dependability (Dudley, Orvis, Lebiecki, & Cortina, 2006).

Individuals high on this scale are driven to achieve as much as possible at work while being detailed.

Individuals low on this scale may not achieve as much, nor are they particularly detailed.

Example: I always try to achieve as much as possible at work.

Empathy and Emotional Self-Control

Empathy and Emotional Self-Control is derived from a model of emotional intelligence (see the next

section) and is included in this version of the AIMS assessment. High scores on this scale indicate people

are able to appropriately regulate their own emotions and be sensitive to the needs of others. Low scores on

this scale indicate people may struggle to understand others and be more expressive in terms of their own

emotions.


Example: I can tell when my coworkers are upset.

Reliability and Fairness

The descriptive statistics and reliability estimates of the AIMs v4 scales were estimated using the data

collected from live applicants in the course of applying for jobs. Analyses were restricted to US samples. As

can be seen from the table below, all these scales have acceptable levels of internal consistency.

Unfortunately, due to a low volume of testing, fairness analyses could not be completed at this time; a

future update will include these analyses once sufficient data are available.
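The internal consistency estimates reported in this manual are Cronbach's alpha coefficients. The manual does not show the computation, so the following is only a minimal sketch of the standard formula; the function name and the sample responses are hypothetical and are not HR Avatar data.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each respondent's total score
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert responses: 5 respondents x 3 items
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
```

Because the same variance estimator is used in the numerator and the denominator, the choice between population and sample variance does not change the result.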

Table 14. Descriptive Statistics and Reliability Evidence for the AIMS v4 Scales

Competency N M SD Alpha

Integrity 271 69 8.9 .78

Teamwork 271 58 5.2 .79

Drive 271 64 5.2 .79

Empathy and Emotional Self-Control 271 61 6.3 .77


Emotional Intelligence Module

Previous Research

Emotional Intelligence (EI) is a psychological construct that broadly refers to a person’s capability to

understand and reason through emotions and their effects on others. There are two primary models of EI:

ability models view EI as a form of intelligence, while trait models view EI as primarily a set of personality

variables. Though substantial work has been done explaining why different models of EI relate to job

performance (e.g., Joseph, Jin, Newman, & O’Boyle, 2015), its utility as a predictor of job performance is

well-established.

Several meta-analyses have established the validity of the relationship between EI and job performance.

Joseph and Newman (2010) found a validity of .47 for trait-based EI and .23 for ability-based EI. O’Boyle,

Humphrey, Pollack, Hawver, and Story (2011) expanded upon this work and found a validity of .28 for

commercially available EI measures built upon the trait model of EI. Additionally, research has consistently

shown that trait-based EI measures provide incremental validity in predicting job performance over

cognitive ability and personality measures (Joseph & Newman, 2010; O’Boyle et al., 2011; Andrei, Siegling,

Aloe, Baldaro, & Petrides, 2016).

EI has also been linked to several other important criteria, including academic performance, with a validity

of .20 for trait models (Perera & DiGiacomo, 2013) and job satisfaction, with a validity of .39 for trait

models (Miao, Humphrey, & Qian, 2017). With a measure that predicts a wide variety of important criteria,

Emotional Intelligence is a critical trait to assess.

Content Development

The Emotional Intelligence module consists of three dimensions intending to represent a trait-approach to

the construct: self-control, self-awareness, and empathy. Questions were developed by PhD-level

Industrial/Organizational psychologists with expertise in emotional intelligence. Likert-type items were

developed targeting the three scales; initial pilot testing settled on the set of items included in this measure.

The three Emotional Intelligence subscales have six items each and are (note that example items are

representative of, but not included on the final scale):

Self-Control

The Self-Control scale asks questions about a person’s ability to regulate their emotions.

Example: I am able to stay calm in tense situations at work.


Self-Awareness

The Self-Awareness scale asks questions about a person’s ability to understand their emotions and the

effects they have on others.

Example: I can see how my moods affect my colleagues.

Empathy

The Empathy scale asks questions about a person’s ability to understand and respond appropriately to the

emotions and feelings of others.

Example: I can usually tell when my colleagues are upset.

Reliability

The descriptive statistics and reliability estimates of the Emotional Intelligence scales were estimated using

the data collected from live applicants in the course of applying for jobs. Analyses were restricted to US

samples. As can be seen from the table below, the overall Emotional Intelligence scale has acceptable levels of

reliability, while the individual subscales are provided for developmental purposes only. Please note, based

on these findings, further analyses were done and will inform future scale revisions.

Table 15. Descriptive Statistics and Reliability Evidence for the Emotional Intelligence Module

Competency N M SD ρxx

Emotional Intelligence Overall 413 78.95 9.42 .77

- Self-Control 413 30.31 3.12 .60

- Self-Awareness 413 26.23 4.65 .63

- Empathy 413 25.31 4.68 .69

Fairness



analyses for most racial groups; only Black/White differences are reported here. There were no substantive

mean differences on the Emotional Intelligence scales between males and females. There were some small

to moderate group differences on the Emotional Intelligence scales for Ethnicity, Age, and Race; care

should be taken when using these scores so as not to create unnecessary adverse impact. Further research

will investigate reasons for these differences.


Table 16. Evaluation of EI Score Differences by Gender

Scale


Emotional Intelligence Overall 56 80.39 9.57 317 78.87 9.43 -0.16

Self-Control 56 30.48 3.03 317 30.37 3.10 -0.04

Self-Awareness 56 27.00 4.49 317 26.10 4.78 -0.19

Empathy 56 25.45 4.67 317 25.37 4.72 -0.02
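The last column of the fairness tables is an effect size d. The manual does not state the formula; the sketch below assumes a standardized mean difference with a pooled standard deviation, computed as the second group's mean minus the first group's mean. Under that assumption it matches the first row of Table 16 to two decimals. The function name is hypothetical.

```python
from math import sqrt


def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Standardized mean difference (second group minus first), pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / sqrt(pooled_var)


# Emotional Intelligence Overall row of Table 16
d_overall = cohens_d(56, 80.39, 9.57, 317, 78.87, 9.43)

# Self-Awareness row of Table 16
d_self_awareness = cohens_d(56, 27.00, 4.49, 317, 26.10, 4.78)
```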


Table 17. Evaluation of EI Score Differences by Ethnicity

Hispanic or Latino    Not Hispanic or Latino


Emotional Intelligence Overall 84 76.61 9.33 272 79.39 9.32 0.30

Self-Control 84 30.38 2.67 272 30.35 3.12 -0.01

Self-Awareness 84 24.86 4.29 272 26.40 4.67 0.34

Empathy 84 24.12 5.07 272 25.58 4.59 0.31


Table 18. Evaluation of EI Score Differences by Age Group



Emotional Intelligence Overall 160 76.76 9.19 232 80.63 9.26 0.42

Self-Control 160 29.99 3.12 232 30.70 2.97 0.23

Self-Awareness 160 25.33 4.84 232 26.87 4.45 0.33

Empathy 160 24.36 4.68 232 25.95 4.67 0.34


Table 19. Evaluation of EI Score Differences by Race Groups: Black or African American and White

Black or African American    White


Emotional Intelligence Overall 152 81.75 10.11 183 77.56 7.87 -0.47

Self-Control 152 31.27 2.99 183 29.66 3.04 -0.53

Self-Awareness 152 27.61 4.91 183 25.43 3.77 -0.50

Empathy 152 25.89 5.06 183 25.34 4.17 -0.12



Workplace Competency Assessment

Previous Research

Sometimes called low-fidelity simulations (e.g., Motowidlo, Dunnette, & Carter, 1990), situational judgment

tests (SJTs) have a long history in personnel selection. These assessments present a job-related situation

followed by several possible responses to that situation. The applicant is then asked to either choose a

response or rate the effectiveness of the responses.

The predictive validity of SJTs has long been established. In the first meta-analysis on the issue, McDaniel,

Morgeson, Finnegan, Campion, and Braverman (2001) found that SJTs predicted overall job performance

quite well, with a validity of .34. This has been confirmed in updated meta-analyses (McDaniel,

Hartman, Whetzel, & Grubb, 2007) and extended to note that SJTs are predictive of multiple facets of job

performance (Christian, Edwards, & Bradley, 2010).

Regarding expected adverse impact, there are some small-to-moderate race and ethnic group differences

that could lead to adverse impact (Whetzel, McDaniel, & Nguyen, 2008). For African-American, Asian, and

Hispanic groups, the differences range from .20 to .40, depending on the type of test used. These

differences could lead to adverse impact against these groups. Roth, Bobko, and Buster (2013) extended

these findings to note that there were smaller differences for SJTs focusing on interpersonal skills rather

than cognitively loaded situations. There were no substantial gender differences found (Whetzel et al., 2008).

Content Development

After a literature review, HR Avatar worked with a team of subject matter experts to define target

competencies predictive of leadership and job performance. From these definitions, a PhD-trained I/O

Psychologist with expertise in SJT test development wrote the items, which were then reviewed with the

team of SMEs. After the items were finalized, the items were pilot tested to rate the effectiveness of each

response option. The final version of the Workplace Competency assessment consists of the five competencies

listed below.

Coaching and Developing Others

Identifies the development needs of others and coaches, mentors, or otherwise helps others to improve

their knowledge or skills. Starts coaching and developing by building a relationship of mutual trust,

working together to decide what to accomplish, setting a goal, making a roadmap for reaching the goal, and giving

feedback along the way. Provides specific behavioral examples when giving feedback on performance issues,

clarifies expectations, and gets a commitment from the employee to act.


Exercising Political Savvy

Understands how to position self and communicate objectives in the context of organizational issues and

other personnel, to maximize outcomes both for one’s group and the organization. Gets people to

cooperate with oneself, socializes ideas and builds bridges to meet others halfway.

Guiding, Directing, and Motivating Others

Provides direction and guidance to subordinates, including setting performance standards and monitoring

performance. Coordinates the work and activities of others. Encourages goal accomplishment. Makes

detailed plans that consider what is most important. Communicates priorities to team members. Holds team

accountable for their work. Provides advice that is reasonable and socially aware.

Resolving Conflicts and Meeting Customer Needs

Handles complaints. Looks for ways to solve problems collectively and agree on next steps. Settles disputes

and resolves grievances and conflicts, or otherwise negotiates with others. Works to understand the views of

both sides of a conflict, ensures relevant information is shared and considered, and helps parties in a conflict

to find common objectives.

Team Building

Engages and participates in activities that support improved team social relations, building mutual trust,

respect, communication, understanding and cooperation among team members. Focuses on providing a

team environment that is conducive to collaboration, fostering innovation and creativity, promoting

increased comfort level and celebration among team members.

Reliability

The descriptive statistics and reliability estimates of the Workplace Competency scales were estimated using

the data collected from live applicants in the course of applying for jobs. Applicants provide a rating of

effectiveness on each scenario, and their score is calculated as the absolute value of the z-score difference

from the true effectiveness; as such, the minimum possible score is zero, and lower scores are better. As can

be seen from the table below, the Workplace Competency scales have acceptable levels of reliability. Please

note, based on these findings, further analyses will be done to inform and improve future scale revisions.
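The distance-based scoring described above can be sketched as follows. The manual does not specify how the z-scores are formed or how item distances are combined, so this example assumes that the applicant's rating and the keyed ("true") effectiveness are standardized with the same reference mean and SD, and that a scale score is the mean distance across response options. All names and values are hypothetical.

```python
def sjt_item_distance(rating, keyed, ref_mean, ref_sd):
    """Absolute z-score difference between a rating and the keyed effectiveness."""
    z_rating = (rating - ref_mean) / ref_sd
    z_keyed = (keyed - ref_mean) / ref_sd
    return abs(z_rating - z_keyed)


def sjt_scale_score(ratings, keyed_values, ref_mean, ref_sd):
    """Mean distance across items (assumed aggregation); 0 is best, lower is better."""
    distances = [
        sjt_item_distance(r, k, ref_mean, ref_sd)
        for r, k in zip(ratings, keyed_values)
    ]
    return sum(distances) / len(distances)


# Hypothetical applicant ratings and keyed values on a 1-5 effectiveness scale
perfect = sjt_scale_score([4, 2, 2], [4, 2, 2], ref_mean=3.0, ref_sd=1.0)
example = sjt_scale_score([5, 2, 4], [4, 2, 2], ref_mean=3.0, ref_sd=1.0)
```

An applicant whose ratings match the keyed values exactly receives a score of zero, which is why lower scores indicate better judgment.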


Table 20. Descriptive Statistics and Reliability Evidence for the Workplace Competency Assessment

Competency N M SD ρxx

Coaching and Developing Others 996 0.827 0.278 .80

Exercising Political Savvy 996 0.823 0.250 .77

Guiding, Directing, and Motivating Others 996 0.816 0.248 .71

Resolving Conflicts and Meeting Customer Needs 996 0.797 0.254 .69

Team Building 996 0.814 0.270 .76

Note. Lower scores are better.

Fairness



analyses for racial groups. Most of the group differences were quite small, suggesting only minimal risk for

adverse impact against the groups investigated. Further research will monitor these differences.

Table 21. Evaluation of Workplace Competency Score Differences by Gender

Scale


Coaching and Developing Others 306 0.806 0.271 233 0.872 0.288 0.24

Exercising Political Savvy 306 0.809 0.244 233 0.841 0.246 0.13


Guiding, Directing, and Motivating Others 306 0.820 0.256 233 0.843 0.243 0.09

Resolving Conflicts and Meeting Customer Needs 306 0.799 0.259 233 0.801 0.264 0.01

Team Building 306 0.812 0.274 233 0.841 0.266 0.10


Table 22. Evaluation of Workplace Competency Score Differences by Ethnicity

Hispanic or Latino    Not Hispanic or Latino





63 0.795 0.247 318 0.850 0.265 0.21


63 0.817 0.245 318 0.825 0.277 0.03

Team Building 63 0.788 0.246 318 0.854 0.278 0.24



Table 23. Evaluation of Workplace Competency Score Differences by Age Group






198 0.783 0.248 334 0.858 0.261 0.29


198 0.784 0.231 334 0.828 0.288 0.16

Team Building 198 0.777 0.248 334 0.861 0.291 0.30


Behavioral History Survey

Previous Research

Past behavior often predicts future behavior. Biographical data, or bio-data, assessments contain items

developed by identifying patterns associated with high productivity and low turnover. Hunter and Hunter

(1984) report the average validity coefficient for bio-data assessments to be .38. Other researchers have

estimated the validity to be .35 for supervisors (Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990) and .53

for managers (Carlson, Scullen, Schmidt, Rothstein, & Erwin, 1999).

For example, Rothstein et al. (1990) identified several bio-data factors that could be applied to many jobs.

These included things like having a pervasive feeling of self-worth and confidence; believing that he or she

works better and faster than others in his or her area of specialization; having been recognized for

accomplishments; being outgoing; being a good communicator; taking clear positions; and, feeling healthy

and satisfied with current life situations. Rothstein screened each biographical item for cross-validity then

meta-analyzed 11,000 first-line supervisors from different organizations, age levels, genders, job experience

levels and tenures. He concluded that in all cases, validity estimates for these factors were generalizable,

stable across time, and did not appear to stem from acquired skills, knowledge or abilities.

McDaniel (1989) evaluated biographical questions about school suspensions, drug use, quitting school, prior

employment experience, grades, club memberships, contacts within the legal system, and socioeconomic

status. The results successfully predicted discharge from the military for problems such as alcohol and drug

use, desertion, imprisonment, and “discreditable incidents.”

Oswald, Schmitt, Kim, Ramsay, and Gillespie (2004) reported statistically significant bio-data correlations

with 12 dimensions of college student performance: knowledge, learning, artistic ability, multicultural


sensitivity, leadership, interpersonal skills, citizenship, health, careers, adaptability, perseverance, and ethics.

Their work showed incremental validity over the traditional use of SAT and ACT with fewer differences

between subgroups than traditional admission measures.

In a similar vein, Kanfer, Crosby, and Brandt (1988) identified correlations between bio-data and tenure, and in

a study of 555 real estate agents, Klimoski and Childs (1986) identified five major bio-data factors associated

with job, personal and career success. They included social orientation, economic stability, work ethic

orientation, educational achievement and interpersonal confidence.

Content Development

The Behavioral History Survey consists of three biographical history areas that generalize across most jobs:

tenure, performance, and unproductive behavior. Questions and forms were developed for both

professional and entry-level positions using a panel of experienced managers. Each bio-data item was

reviewed by an expert management panel and scored using a modified Angoff method. There are a few

versions of each form available because some of the items were relevant for some positions but irrelevant

for others. For example, an item asking about experience with an industry would not be appropriate for a

position that spans multiple industries (e.g. Administrative Assistant). Scores for each competency are

calculated by averaging across items within that competency. The use of averaging, rather than summing

across items, allows HR Avatar to compare similar scales across multiple versions of the assessment. The

entry-level form contains 14-15 items and takes approximately 2 minutes to complete. The professional

form contains 18-20 items and takes approximately 6 minutes to complete. These estimates were derived

from two samples (N=70 and N=371, respectively).
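The reason for averaging rather than summing can be illustrated with a short sketch. The item scores and form lengths below are hypothetical: a sum grows with the number of items on a form, while a mean stays on the item metric and can be compared across versions of different length.

```python
def scale_score(item_scores):
    """Competency score as the mean of its item scores."""
    return sum(item_scores) / len(item_scores)


# Hypothetical item scores for the same competency on two form versions
short_form = [3, 2, 3]        # version with three items
long_form = [3, 2, 3, 3, 2]   # version with five items

short_mean = scale_score(short_form)
long_mean = scale_score(long_form)

# Sums depend on form length, so they are not directly comparable
short_sum = sum(short_form)
long_sum = sum(long_form)
```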

Performance

The Performance scale asks questions related to past performance on the job and should predict multiple

dimensions of job performance.

Example: How many times have you been promoted at work?

Tenure

The Tenure scale asks questions specifically related to tenure on previous jobs. This scale should predict

retention.

Example: What is your longest tenure with an organization?


Reliability

The scales in the Behavioral History Survey are relatively short – some with as few as three items.

Additionally, the scales were not expected to be internally consistent as there is no evidence that the

individual item responses should be highly correlated with one another. Therefore, a test-retest reliability

estimate was determined to be more appropriate than one of internal consistency (e.g., Cronbach’s alpha).

Descriptive statistics and reliability estimates were calculated using data collected via MTurk. This data

collection effort is summarized in the Cognitive Work Simulations section. Because participants were

allowed to take multiple versions of the solution, several participants took the Behavioral History Surveys

more than once. Test-retest reliability estimates were calculated by correlating the scale scores between the

first and second administrations of the form.
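Correlating first and second administrations amounts to a Pearson correlation between paired scale scores. The sketch below assumes that interpretation; the function name and the scores are hypothetical and do not come from the MTurk sample.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scale scores for four participants who took the form twice
first_administration = [2.4, 2.1, 2.8, 2.5]
second_administration = [2.5, 2.0, 2.7, 2.6]

test_retest = pearson_r(first_administration, second_administration)
```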

For the Professional form, the sample was restricted to those individuals who had at least two days between

administrations. The largest number of days between administrations was 25. On average, there were 11

days between administrations (SD=7.49). The descriptive statistics and test-retest reliability estimates are in

Table 24, and all reliability estimates are acceptable.

For the Entry-Level form, the sample was restricted to those individuals who had at least two days, but fewer

than 30 days, between administrations. The largest number of days between administrations was 20. On

average, there were 7 days between administrations (SD=4.76). The descriptive statistics and test-retest

reliability estimates are in Table 25, and all reliability estimates are acceptable.

Please note, there are multiple versions of the Performance scale for both the Professional and the Entry-

Level forms as described above. The descriptive statistics, test-retest reliability estimates and fairness

analyses for this scale are estimated by averaging across various versions of the scales.

Table 24. Descriptive Statistics and Reliability Evidence for the Professional Behavioral History Survey

Competency N M SD ρxx N for ρxx

Performance 4567 2.45 0.24 .81 65

Tenure 4567 2.42 0.37 .71 65

Table 25. Descriptive Statistics and Reliability Evidence for the Entry-Level Behavioral History Survey

Competency N M SD ρxx N for ρxx

Performance 1594 2.25 0.47 .74 151

Tenure 1594 2.50 0.33 .71 151


Fairness

In order to evaluate how different demographic subgroups perform on the assessments, mean comparisons

were computed for each of the subgroups (Tables 26-30). There were too few cases (n<25) to run analyses

for the American Indian or Alaska Native, Black or African American, Native Hawaiian or Other Pacific

Islander, Two or More Races, and Other subgroups. There were significant mean differences on the

Behavioral History scales between males and females. However, all differences were relatively small or they

were in favor of the focal subgroup, females. There were a few significant mean differences based on

ethnicity. Unfortunately, the sample is too small for the Hispanic and Latino subgroup to draw any

conclusions. As data become available, further research will explore these differences at the item and

response level. There were significant mean differences with respect to age group, but all were in favor of

the focal subgroup, 40 and older. There were a few significant mean differences between the Asian and

White subgroups. However, individuals in this sample were from multiple countries and it’s possible that

these differences might be reflections of cultural differences. Additional research is needed to further

evaluate these differences, to calculate country-specific norms, and to evaluate subgroup differences within

country.

Table 26. Evaluation of Biographical History Survey Score Differences by Gender

Male Female

Scale n M SD n M SD d

Professional Performance 1310 2.48 0.23 2945 2.43 0.24 -0.22

Professional Tenure 1310 2.36 0.38 2945 2.45 0.37 0.24

Entry-Level Performance 694 2.23 0.48 742 2.29 0.47 0.14

Entry-Level Tenure 694 2.48 0.34 742 2.54 0.31 0.17

Note. *p<.05

Table 27. Evaluation of Biographical History Survey Score Differences by Ethnicity

Hispanic or Latino    Not Hispanic or Latino


Professional Performance 448 2.44 0.24 3155 2.47 0.24 0.11




Note. *p<.05


Table 28. Evaluation of Biographical History Survey Score Differences by Age Group







Note. *p<.05

Table 29. Evaluation of Biographical History Survey Score Differences by Race Groups: Asian and White

Asian White






Note. *p<.05

Table 30. Evaluation of Biographical History Survey Score Differences by Race Groups: Black and White

Black or African American    White



Professional Tenure 687 2.70 0.25 753 2.63 0.28 -0.26

Entry-Level Performance 257 2.52 0.41 463 2.41 0.38 -0.28

Entry-Level Tenure 257 2.64 0.26 463 2.63 0.29 -0.04

Note. *p<.05

Industry-specific Supplements

To improve the reliability and validity of the biodata scales, some industry-specific items were added. These

additional items were only administered to candidates applying for jobs in specific industries. Data are

currently being collected on these items; reliability, validity, and fairness information will be updated once

sufficient data have been collected.

Knowledge and Skills Tests

Development Overview

Candidates need more than just the right combination of abilities, personality characteristics and

background in order to be successful in a job. Most often, specific knowledge and/or specific skills are also


required. Job knowledge tests measure job-relevant declarative knowledge such as technical information,

standards, and best practices as well as knowledge of specific processes and procedures. Job knowledge

tests serve as an indicator of previous job performance and serve as proximal predictors of future job

performance. Skills assessments evaluate a candidate’s ability to perform a specific task. Candidates who

begin work with a sufficient level of knowledge and/or skill should require less training and be able to

perform better faster. Hunter and Hunter (1984) report the average validity coefficient for job knowledge

tests to be .48. In their meta-analysis, Dye, Reck, and McDaniel (2007) found the average corrected

correlation coefficient for job knowledge tests to be .45 for job performance and .47 for training.

Correlations were even higher for complex jobs and higher to the extent that the job knowledge test was

similar to the job.

HR Avatar has developed dozens of knowledge assessments. See Appendix A for a complete listing. The

knowledge assessments were developed by reviewing multiple resources to identify appropriate items. For

example, the Food Safety Fundamentals assessment was developed by reviewing, among other resources,

the USDA Safe Food Handling Fact Sheets. Each knowledge test contains an item bank and half of the

items are randomly selected to be administered to an applicant. Research on the validity, reliability, and

fairness of these assessments is forthcoming. HR Avatar has three skills assessments: Sales Situation

Analysis, Typing Test, and Essay Test. The development and research on these assessments is described in

more detail below.
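The random selection of half of an item bank for each applicant can be sketched as below. The function name and the bank contents are hypothetical, and rounding down for banks with an odd number of items is an assumption, since the manual does not say how odd-sized banks are handled.

```python
import random


def draw_items(item_bank, rng=None):
    """Randomly select half of the item bank, without replacement."""
    rng = rng or random.Random()
    # Assumption: round down when the bank has an odd number of items
    return rng.sample(item_bank, len(item_bank) // 2)


# Hypothetical bank of 20 item identifiers
bank = [f"item_{i:02d}" for i in range(1, 21)]
administered = draw_items(bank, rng=random.Random(2019))
```

Passing a seeded `random.Random` instance makes a draw reproducible, which can help when auditing which items a given applicant saw.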

Validity

Initial validation evidence is available from a study of 130 managers. In that study, the

assessment predicted job performance (r=.25; p<.01), using a simple

“High” or “Marginal” categorization of job performance. We expect validity evidence to be more robust in

future studies with better criterion measures. A brief summary of the results is presented in Appendix G.

Sales Situation Analysis

Test Development

The Sales Situation Analysis is a scale that is integrated into the Business Sales Cognitive Work Simulation

(described above). In the simulation, the candidate is asked to assist and respond to customers and

colleagues and solve business-related problems. The Sales Situation Analysis scale specifically evaluates a

candidate’s ability to understand a customer’s needs and identify the most appropriate follow-up actions. In

the simulation, the candidate must read an email communication from a customer and then identify the

customer’s primary concern. Next, the candidate must identify which of several action items are most

appropriate for the specific sales situation.


Reliability

The descriptive statistics and reliability estimate for the Sales Situation Analysis were estimated using live

applicant data for sales positions. Table 31 contains the descriptive statistics and reliability estimate for this

scale. The Sales Situation Analysis scale is not hypothesized to be unidimensional, so the Cronbach’s alpha

estimate below is a significant underestimate of the actual reliability of the scale; unfortunately, test-retest

reliability estimates are unavailable at this time. Further research evaluating the effectiveness of these scales

and their revisions is forthcoming.

Table 31. Descriptive Statistics and Reliability Evidence for the Sales Situation Analysis Assessment

# of Items N M SD Alpha

Sales Situation Analysis 6 2004 2.92 1.15 .30

Fairness


between protected subgroups were compared. The results are provided in Table 32. Small-to-zero

differences were observed between genders and between ethnicities. Applicants 40 and older scored slightly

higher on average. Data were also collected for racial groups; however, only Asian and Black or African

American subgroups had sufficient data to report. Some differences between racial groups were observed,

with the White subgroup scoring higher. The Black-White effect size is relatively small; the Asian-White

difference is moderately sized; future research will examine the source of these differences.

Table 32. Evaluation of Sales Situation Analysis Score Differences

n M SD d

Female 746 2.98 1.13

Male 965 2.89 1.18 .08

40 and older 283 3.16 1.21

Less than 40 1436 2.87 1.14 .25

Hispanic or Latino 126 2.93 1.16

Not Hispanic or Latino 1320 2.93 1.14 .00

White 237 3.49 1.10

Asian 1285 2.84 1.12 -.58

Black or African American 48 3.20 1.20 -.26


Typing Speed and Accuracy

Test Development

The typing test consists of three typing tasks. For each task, applicants are asked to type a short passage.

Each typing task is randomly selected from a group of five passages (15 total passages). For each task, the

words per minute are calculated. This score is modified to an accuracy-adjusted words per minute by

calculating and factoring in the rate of typing errors. The typing scores are calculated by averaging the words

per minute and accuracy-adjusted words per minute across the three tasks. There are two forms of this test

depending on the intended usage: Business and Academic.
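The typing scores described above can be sketched as follows. The manual does not give the formulas, so two conventions are assumed here and should not be read as HR Avatar's actual scoring: a "word" is counted as five typed characters, and the accuracy adjustment multiplies words per minute by one minus the error rate. All function names and task data are hypothetical.

```python
def words_per_minute(chars_typed, seconds, chars_per_word=5):
    """Gross words per minute (assumes five characters per word)."""
    return (chars_typed / chars_per_word) / (seconds / 60)


def accuracy_adjusted_wpm(wpm, error_count, chars_typed):
    """Scale WPM by the share of correctly typed characters (assumed adjustment)."""
    error_rate = error_count / chars_typed if chars_typed else 0.0
    return wpm * (1 - error_rate)


def typing_scores(tasks):
    """Average WPM and accuracy-adjusted WPM across tasks.

    tasks: list of (chars_typed, seconds, error_count) tuples, one per task.
    """
    wpms = [words_per_minute(chars, secs) for chars, secs, _ in tasks]
    adjusted = [
        accuracy_adjusted_wpm(wpm, errors, chars)
        for wpm, (chars, _, errors) in zip(wpms, tasks)
    ]
    return sum(wpms) / len(wpms), sum(adjusted) / len(adjusted)


# Three hypothetical typing tasks
mean_wpm, mean_adjusted_wpm = typing_scores(
    [(300, 60, 30), (250, 60, 0), (350, 60, 35)]
)
```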

Reliability

The typing test was piloted via MTurk (N=155). The majority of the sample was from the US (92.9%). The

pilot assessment consisted of five typing tasks and each typing task was randomly selected from a group of

five passages (25 total passages). Five one-way analyses of variance were completed to evaluate differences

in scores across the passages available for each item. No significant differences were found, F(4, 150)=.56,

p=.70, F(4, 150)=.27, p=.89, F(4, 150)=.79, p=.54, F(2, 121)=1.12, p=.33, F(2, 121)=.22, p=.80.
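For readers who want to see how such an F statistic is formed, the following is a minimal sketch of a one-way analysis of variance. It computes only F and its degrees of freedom; the p-value requires the F distribution and is omitted. The function name and the scores are hypothetical. With five passages and 155 participants, the degrees of freedom come out as (4, 150), matching the first three tests reported above.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group size times squared mean deviation
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: squared deviations from each group mean
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three passages
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```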

Descriptive statistics and the alpha estimate of internal consistency were estimated using live applicant data.

The internal consistency of the assessment is well above accepted standards.

Table 33. Descriptive Statistics and Reliability Estimates for the Typing Test

Typing Test N M SD alpha

Business 727 37.08 13.64 .97

Academic 1005 28.57 12.19 .98

Fairness

In order to evaluate how different demographic subgroups perform on the assessment, independent

samples t-tests were performed to compare mean scores of the subgroups (Tables 34 and 35). Although data were

collected on ethnic group, the sample size for the ‘Hispanic or Latino’ subgroup was too small for analyses

(n=11). There were too few cases (n<10) to run analyses for the African American or Black, American

Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, Two or More Races, and Other

subgroups. Significant mean differences were found for gender and between the Asian and White racial

subgroups. Given that this sample consisted of people piloting the assessment but not necessarily taking the

assessment in order to obtain a job, it is possible that the individuals were not attempting to perform their

best. Further research will evaluate norms using an applicant sample.

Table 34. Evaluation of Business Typing Differences by Subgroup


n M SD d

Female 452 36.72 13.37 .17

Male 172 39.05 13.27

40 and Older 131 36.36 14.62 .10

Less than 40 500 37.65 12.69

Hispanic or Latino 58 38.90 12.08 -.04

Not Hispanic or Latino 491 38.39 13.44

Asian 77 34.92 12.02 .66

Black 239 33.08 10.53 .86

White 241 43.57 13.51

Note. Negative d-values mean the referent group scored lower

Table 35. Evaluation of Academic Typing Differences by Subgroup

n M SD d

Female 587 30.55 11.77 -.40

Male 259 25.82 11.96

40 and Older 196 31.48 12.86 -.29

Less than 40 665 27.97 11.69

Hispanic or Latino 79 30.44 11.80 -.10


Asian 437 26.00 10.52 .91

Black 101 27.63 10.00 .72

White 230 36.29 12.73


Data Entry

Test Development

The data entry tests consist of three versions: 10 Key data entry where applicants enter data using a numeric

10-key keyboard, alphanumeric data entry where applicants type what appears on the screen, and oral

alphanumeric data entry where applicants type what they hear. For each test, applicants are asked to type the

data from a prompt, with each task randomly selected from a group of equivalently difficult prompts. For

each task, the keystrokes per hour are calculated. This score is modified to an accuracy-adjusted keystrokes

per hour by calculating and factoring in the rate of data entry errors. The data entry scores are calculated by

averaging the keystrokes per hour and accuracy-adjusted keystrokes per hour across the tasks.


Reliability

Descriptive statistics, reported as keystrokes per hour, and the alpha estimate of internal consistency were

estimated using live applicant data. The internal consistency of the assessment is well above accepted

standards.

Table 36. Descriptive Statistics and Reliability Estimates for the Data Entry Tests

Data Entry Test N M SD # of tasks alpha

10 Key 112 4597.76 1908.68 3 .83

Alphanumeric 104 4921.03 2726.08 10 .96

Oral Alphanumeric 256 3280.97 1075.49 5 .93

Fairness

In order to evaluate how different demographic subgroups perform on the assessment, mean scores were

compared between subgroups (Tables 37-39). Although data were collected on ethnic group, racial group,

and age, there were not enough data to calculate stable differences (N<30). There were small differences

between genders on each of the three test versions. As such, it is unlikely these tests will give rise to adverse

impact. Further research will investigate differences in other subgroups as more data accumulate.

Table 37. Evaluation of 10-key Score Differences by Subgroup

n M SD d

Female 35 4803.78 2155.19 -.14

Male 67 4521.90 1865.07


Table 38. Evaluation of Alphanumeric Score Differences by Subgroup

n M SD d

Female 39 4621.73 2369.35 .16

Male 64 5048.81 2915.12


Table 39. Evaluation of Oral Alphanumeric Score Differences by Subgroup

n M SD d

Female 75 3407.07 1082.38 -.08

Male 45 3380.43 1032.88

Note. *p<.05


Essay Test

Test Development

Written communication is a key skill in many positions. Communicating via email, writing reports, and

creating presentations all require the ability to communicate effectively. The HR Avatar Essay Test is a fast

method for evaluating an applicant’s written communication skills. The HR Avatar Essay Test consists of

one of two writing prompts. The writing prompts were designed to be general enough to provide an

opportunity for anyone to be able to write a short essay. The writing prompts are included below.

1. Describe the pros and cons of working from home.

2. Describe the pros and cons of living in a big city.

Applicants are asked to write a short essay of at least 100 words and are given unlimited time to
do so.

The essays are scored using Discern, an open-source machine learning program. Discern was designed by
edX, a nonprofit organization founded by Harvard and the Massachusetts Institute of Technology (MIT)
(edX, 2015; Markoff, 2013). A YouTube video provides additional information on
how the program works: https://www.youtube.com/watch?v=zFeP678054U (Paruchuri, 2015). The system

produces a score that ranges from 0 to 100. A confidence estimate for the score is also computed. The

confidence estimate can range from 0 to 1. Scores with confidence estimates less than .10 are not considered

valid.

In order to calibrate the program, HR Avatar used MTurk to collect writing samples for the two writing

prompts (N=170 and N=163). The essays were scored on three areas: Grammar, Structure and Content, by

three independent raters. Prior to rating each essay, the raters were provided with scoring rubrics (Appendix

D) and training (Appendix E) for how to score the essays using the rubrics. A total score was calculated for

each essay by aggregating scores on Grammar, Structure, and Content and then averaging across the raters

and linearly transforming the scores to a scale of 0-100. Scores were entered into the program as the

calibration sample for Discern.
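The aggregation described above can be sketched as follows. The rating scale is an assumption: the sketch supposes that each area is rated from 1 to 5 points (hypothetical values; the actual rubric is in Appendix D) and that the linear transformation maps the lowest possible total to 0 and the highest to 100. The function name and input format are likewise illustrative.

```python
from statistics import mean

AREAS = ("Grammar", "Structure", "Content")


def calibration_score(ratings, min_points=1, max_points=5):
    """ratings: one dict per rater mapping each area to that rater's score.

    ASSUMPTION: every area uses the same min_points..max_points scale
    (hypothetical values; the actual rubric is in Appendix D).
    """
    totals = [sum(r[a] for a in AREAS) for r in ratings]    # aggregate per rater
    average = mean(totals)                                  # average across raters
    lowest, highest = len(AREAS) * min_points, len(AREAS) * max_points
    return 100.0 * (average - lowest) / (highest - lowest)  # rescale to 0-100
```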

In order to evaluate the reliability of the ratings provided by the three raters, intra-class correlation
coefficients were calculated using a two-way, mixed effects model (ICC3) (Shrout & Fleiss, 1979). The
ICC3 was chosen because the reliability of the average rating made by the specific raters in this study
was of interest, and it was not necessary to generalize the reliability estimate to the population of
potential raters. Ratings from all three raters were available for 317 essays, and the reliability of the
average aggregate rating was acceptable (ICC(3,3)=.74). There was also a strong relationship between scores
on the two essay prompts, r(152)=.62, p<.01.
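For reference, the ICC(3,k) coefficient defined by Shrout and Fleiss (1979) can be computed from a complete essays-by-raters matrix as sketched below. The function name and the plain-list input format are assumptions for illustration; this is not the code used to produce the estimate reported above.

```python
def icc3k(ratings):
    """ICC(3,k): reliability of the mean of k fixed raters (Shrout & Fleiss, 1979).

    ratings: one row per essay, one column per rater, no missing values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between essays
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-essay mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / bms
```

Because rater main effects are removed from the error term, a rater who is consistently more lenient or severe than the others does not lower this coefficient.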


In early January 2015, additional improvements were made to the Essay scoring. First, the 196 essays that
had been completed since the initial rollout were manually scored by an individual rater and entered into the
Discern program to further calibrate the system. Second, HR Avatar added several safeguards to
the automated scoring engine. Essays are truncated to 800 words. Essays with fewer than 100 words,
more than 25% spelling errors, or more than 25% grammar errors or style errors are
given a score of 0. An additional program was written to search the HR Avatar database for matching essay
content. To detect plagiarism, the system uses a combination of three methods to determine the similarity
between the essay and existing content in the system: the Levenshtein distance (Navarro,
2001), the Jaro distance metric (Jaro, 1989; Jaro, 1995), and the Jaro-Winkler distance metric (Winkler,
1990). If matching content is discovered, the essay is assumed to be plagiarized and the applicant receives
a score of 0.
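The safeguards above can be illustrated with the following sketch. Several details are assumptions because the manual does not specify them: error percentages are taken relative to the word count, only the Levenshtein distance is shown (the Jaro and Jaro-Winkler metrics and the way the three are combined are omitted), and the match threshold of 0.9 is a hypothetical value. All function and parameter names are illustrative.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to 0..1, where 1.0 means identical text."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def essay_to_score(text, spelling_errors, grammar_style_errors,
                   existing_essays, match_threshold=0.9) -> Optional[str]:
    """Return the truncated essay, or None if a safeguard assigns a score of 0.

    ASSUMPTIONS: error rates are relative to the word count, and
    match_threshold is a hypothetical cutoff not given in the manual.
    """
    words = text.split()
    if len(words) < 100:
        return None
    if spelling_errors / len(words) > 0.25 or grammar_style_errors / len(words) > 0.25:
        return None
    truncated = " ".join(words[:800])
    if any(similarity(truncated, old) >= match_threshold for old in existing_essays):
        return None
    return truncated
```

A production system would likely need a faster similarity search than this pairwise comparison, since the character-level distance grows quadratically with essay length.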

Descriptives

HR Avatar has collected essays from applicants applying to numerous jobs. Descriptive statistics for the
machine score and confidence estimate are provided in the table below.

Table 40. Descriptive Statistics for Essay Scores

                                   n     M       SD
Prompt 1 (Working from home)
  Score                            930   55.24   12.12
  Confidence                       930   0.76    0.14
Prompt 2 (Living in a big city)
  Score                            943   49.43   11.03
  Confidence                       943   0.76    0.14

Fairness

In order to evaluate how different demographic groups perform on the assessment, mean scores were
compared between subgroups on the machine score. As can be seen in the tables below, the differences
between protected groups tend to be fairly small. Only comparisons between racial groups showed
moderate to large differences; however, given the relatively small sample size for the referent group, these
results should be interpreted with caution. Future research will further investigate these differences.

Table 41. Evaluation of Essay Score Differences by Subgroup – Living in a Big City Prompt

                            n     M       SD      d
Female                      478   50.35   10.76   -.17
Male                        370   48.42   11.68
40 and Older                100   52.06   11.66   -.26
Less than 40                709   49.17   11.12
Hispanic or Latino          63    48.49   11.23   .14
Asian                       686   49.05   10.21   .91
Black or African American   34    53.47   13.75   .39
White                       62    58.47   11.97

Table 42. Evaluation of Essay Score Differences by Subgroup – Working from Home Prompt

                            n     M       SD      d
Female                      528   56.40   10.24   -.25
Male                        301   53.43   14.42
40 and Older                100   55.98   11.69   -.05
Less than 40                738   55.34   12.02
Hispanic or Latino          51    55.22   15.44   .05
Asian                       658   55.19   12.19   .28
Black or African American   47    55.85   10.57   .24
White                       65    58.57   11.79

Validity









Solution Scoring

In addition to providing scores at the scale level, HR Avatar also provides an overall score as an indication

of overall fit between the candidate and a given job. All competencies are grouped within four broad

categories: Cognitive Abilities, Skills and Knowledge, AIMs, and Behavioral History. Each competency, or

scale, is converted to a z-score. For the Cognitive Abilities, Skills and Knowledge, and Behavioral History

categories, a category score is created by averaging z-scores within each category.

For the AIMs and Cognitive Ability categories, O*NET is used to determine which of the competencies are

relevant for a given job and to determine the appropriate weights. O*NET is an online database that

contains specific information about hundreds of occupations (Peterson, et al., 2001). The information

gathered about each job is categorized using strongly supported theoretical models about behavior in the

workplace. Additionally, the process for gathering the information and documenting it is a collaborative

effort. The O*NET Skills and Abilities Importance ratings are the average importance ratings given by at

least eight occupational analysts (Fleisher & Tsacoumis, 2012a, 2012b). The Occupational Analysts all have

two or more years of work experience, two or more years of graduate level education in a program related to

human resources or workplace psychology, and coursework in research methods and job analysis.

Occupational Analysts are provided with extensive training and detailed information about each job prior to

making ratings including job description, knowledge requirements, task descriptions and work context. The

AIMs and Cognitive scales were mapped onto the O*NET Worker Characteristics and Worker

Requirements. Weights are applied to the scales such that each scale receives a weight that is equivalent to

the proportion of its importance rating within O*NET. When a competency is listed more than once, which

is sometimes the case given that several Worker Characteristics or Worker Requirements might be mapped

to a given competency, the weight given to the competency corresponds to the highest importance rating.
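The weighting rule described above can be sketched as follows. The competency names and importance values in the usage example are hypothetical, as are the function names; the sketch assumes that "proportion of its importance rating" means each rating divided by the sum of the ratings for all relevant competencies.

```python
def competency_weights(mappings):
    """mappings: (competency, O*NET importance rating) pairs; duplicates allowed.

    A competency that appears more than once keeps its highest rating, and
    each weight is that rating's share of the total (ASSUMED interpretation).
    """
    best = {}
    for competency, importance in mappings:
        best[competency] = max(importance, best.get(competency, importance))
    total = sum(best.values())
    return {competency: rating / total for competency, rating in best.items()}


def weighted_category_score(z_scores, weights):
    """Importance-weighted combination of competency z-scores."""
    return sum(weights[c] * z_scores[c] for c in weights)


# Hypothetical ratings: "Analytical Thinking" is mapped twice, so 5.0 is kept.
weights = competency_weights([
    ("Analytical Thinking", 4.0),
    ("Attention to Detail", 3.0),
    ("Analytical Thinking", 5.0),
    ("Adaptability", 2.0),
])
```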

For the knowledge tests, the raw scores are the percent correct. These assessments do not yet have
sufficient data to estimate stable normative parameters. Therefore, a mean of .70 and an SD of .25 are
used to estimate z-scores for applicants.
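As a brief illustration, the provisional parameters translate into z-scores as shown below. The function name is hypothetical, and the raw score is assumed to be expressed as a proportion between 0 and 1.

```python
KNOWLEDGE_MEAN = 0.70   # provisional mean stated in the manual
KNOWLEDGE_SD = 0.25     # provisional SD stated in the manual


def knowledge_z(proportion_correct: float) -> float:
    """z-score for a knowledge test raw score expressed as a proportion (0..1)."""
    return (proportion_correct - KNOWLEDGE_MEAN) / KNOWLEDGE_SD
```

For example, an applicant who answers 95% of the items correctly is one standard deviation above the provisional mean.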

An overall z-score is calculated by computing a weighted average of the competency categories. The

following weights are assigned to each category:

• Cognitive Ability competencies = 1

• Skills/Knowledge competencies = 0.8

• AIMs competencies = 0.7

• Behavioral History competencies = 0.4


The overall z-score is transformed to a Normal Curve Equivalent (NCE) score. NCE scores have a mean of

50 and a standard deviation of 21.06 and maintain their equal-interval properties.
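The overall score computation can be sketched as follows, using the category weights listed above and the NCE definition (mean 50, SD 21.06). Two points are assumptions: the weighted average is taken as the weighted sum divided by the sum of the weights, and categories that a solution does not include are skipped with the remaining weights renormalized. The function names are illustrative.

```python
CATEGORY_WEIGHTS = {
    "Cognitive Ability": 1.0,
    "Skills/Knowledge": 0.8,
    "AIMs": 0.7,
    "Behavioral History": 0.4,
}


def overall_z(category_z):
    """Weighted average of category z-scores.

    ASSUMPTION: categories missing from a solution are skipped and the
    remaining weights are renormalized; the manual does not cover this case.
    """
    used = {c: w for c, w in CATEGORY_WEIGHTS.items() if c in category_z}
    return sum(w * category_z[c] for c, w in used.items()) / sum(used.values())


def to_nce(z: float) -> float:
    """Normal Curve Equivalent: mean 50, SD 21.06."""
    return 50.0 + 21.06 * z
```

With these weights, a one-SD advantage in Cognitive Ability alone raises the overall z-score by 1/2.9, or roughly 7 NCE points.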

Construct Validity Evidence

Tables 43-47 contain the correlations between the scales in the HR Avatar Assessment Solution. Generally

speaking, the relationships conform to what would be expected. For example, within the AIMs assessment,
there is a negative, though weak, relationship between the Competitive scale and the Exhibits a Positive Work
Attitude scale (r = -.067). The Enjoys Problem-Solving and Innovative and Creative scales are highly correlated
(r = .723), which is to be expected as they are both facets of the Openness trait of the Big Five. Correlations
between various assessments can vary depending on the solution, and providing all possible combinations is beyond
the scope of this technical report. Specific analyses and tables are available upon request.


Table 43. Correlations between the AIMs Scales

                                      1. EaPWA  2. CC     3. C      4. E&O    5. I&C    6. SP     7. DR     8. EP-S   9. NS     10. A
1. Exhibits a Positive Work Attitude  1
2. Corporate Citizenship              0.559     1
3. Competitive                        -0.067    -0.195    1
4. Expressive and Outgoing            -0.260    -0.471    0.497     1
5. Innovative and Creative            0.249     0.088     0.407     0.310     1
6. Seeks Perfection                   0.224     0.165     0.456     0.161     0.571     1
7. Develops Relationships             0.252     0.113     0.335     0.290     0.646     0.535     1
8. Enjoys Problem-Solving             0.344     0.229     0.351     0.162     0.723     0.561     0.534     1
9. Needs Structure                    0.313     0.252     0.238     -0.033    0.469     0.576     0.452     0.520     1
10. Adaptability                      0.426     0.401     0.106     -0.062    0.418     0.357     0.400     0.441     0.331     1

Note. N = 1999


Table 44. Correlations between the Professional Behavioral History Scales

              Performance   Tenure
Performance   1
Tenure        .037          1

Note. N = 2063

Table 45. Correlations between the Entry-Level Behavioral History Scales

              Performance   Tenure
Performance   1
Tenure        .263          1

Note. N = 1035

Table 46. Correlations between the Emotional Intelligence Scales

                 Self Control   Self Awareness   Empathy
Self Control     1
Self Awareness   .179           1
Empathy          .081           .686             1

Note. N = 413


Table 47. Correlations between the Workplace Competency Scales

                                                    1. CaDO   2. EPS    3. GDaMO  4. RCaMCN 5. TB
1. Coaching and Developing Others                   1
2. Exercising Political Savvy                       0.684     1
3. Guiding, Directing, and Motivating Others        0.709     0.679     1
4. Resolving Conflicts and Meeting Customer Needs   0.502     0.613     0.554     1
5. Team Building                                    0.711     0.640     0.679     0.530     1

Note. N = 996


Technical Requirements

The HR Avatar solution is designed to be taken on a personal computer or a mobile device

including tablets and mobile phones. A high bandwidth connection is recommended, but not

required. All HR Avatar videos are compressed to less than 500kbps. Lower bitrate versions are

used for mobile devices.

The following web browsers are supported:

• Internet Explorer 6 and above with Flash 9.1.115 or above

• Internet Explorer 9 and above without Flash

• Chrome

• Firefox

• Opera

• Safari

The Flesch-Kincaid Reading Grade Level score is estimated to be 5.8. This indicates that an

applicant must have a reading level similar to that of a 5th or 6th grader in order to comprehend the

text in the assessment.
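For reference, the Flesch-Kincaid Grade Level is computed from the average sentence length and the average number of syllables per word. The sketch below takes the three counts as inputs; how the counts were obtained for the HR Avatar content is not described in this manual, and the function name is illustrative.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from total word, sentence, and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Shorter sentences and shorter words both lower the grade level, which is why plainly worded assessment content scores near the 5th or 6th grade.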

HR Avatar recommends that the applicant take the assessment in a quiet setting that is free from

distractions. This will allow the applicant the best opportunity for demonstrating their skills and

abilities.

Future Research

HR Avatar is committed to providing employers with high quality and legally defensible assessments

for hiring employees. To that end, HR Avatar plans to continue accumulating reliability, validity, and

fairness evidence to support the use of the solutions. The list below contains several items from our

research agenda. Please contact us if you are interested in partnering with HR Avatar on any of
the projects below.

Reliability

• Establish the internal consistency of the knowledge tests.

• Establish the test-retest reliability of the knowledge tests, the Typing Test and the Essay

Test.

• Establish the test-retest reliability of the composite scores for each solution.


Validity

• Conduct studies to establish the content-related validity of the knowledge assessments,

Typing Test and Essay Test.

• Conduct a study to examine the convergent validity evidence for the Cognitive Work

Simulation by comparing scores on the Cognitive Work Simulation and other established

measures of cognitive ability.

• Conduct a study to examine the convergent and divergent validity evidence for the AIMs
assessment by comparing scores on the AIMs assessment with other established measures of
personality – particularly those that measure the Big Five. The hypothesized relationships
can be found in Figure 1.

• Accumulate additional criterion-related validity evidence at the individual assessment level and the
composite level by conducting multiple studies on each solution. Using these studies, conduct
meta-analyses to provide evidence for validity generalization.

Validation evidence is currently available from two studies. The first is a study with 130 managers and
the second is a study with 64 managers. In the first study, the assessment predicted job
performance (r=.25; p<.01), using a simple “High” or “Marginal” categorization of job
performance. In the second study, the assessment predicted performance on a 100-point
administrative performance appraisal, using a Spearman correlation (r=.25; p<.05). We expect
validity evidence to be more robust in future studies with better criterion measures. A brief summary


Fairness

• Conduct a sensitivity review of all assessment content.

• Conduct Differential Item Functioning (DIF) analyses for the items in the assessments to

determine if any of the items behave differently for subgroups as defined by race, ethnicity,

gender, and age group.

• Evaluate mean score differences on each assessment and at the composite level by subgroup.

• Simulate selection ratios for each group at various passing rates to estimate adverse impact

ratios.
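As an indication of what the last item involves, the sketch below estimates an adverse impact ratio analytically rather than by simulation. It assumes normally distributed scores with equal standard deviations in both groups and a focal group mean that is d standard deviations below the referent group mean; these assumptions and the function name are illustrative, not a description of HR Avatar's planned analysis. Under the four-fifths rule of the Uniform Guidelines, a ratio below .80 is generally regarded as evidence of adverse impact.

```python
from statistics import NormalDist


def adverse_impact_ratio(d: float, referent_pass_rate: float) -> float:
    """Selection-rate ratio (focal / referent) for a given cut score.

    ASSUMPTIONS: scores are normal with equal SDs in both groups, and the
    focal group's mean is d standard deviations below the referent group's.
    """
    standard = NormalDist()                            # referent group: mean 0, SD 1
    cut = standard.inv_cdf(1.0 - referent_pass_rate)   # cut score giving that pass rate
    focal_pass_rate = 1.0 - NormalDist(mu=-d).cdf(cut)
    return focal_pass_rate / referent_pass_rate


# Ratios for a hypothetical standardized difference of 0.25 at three passing rates.
for rate in (0.25, 0.50, 0.75):
    print(rate, round(adverse_impact_ratio(0.25, rate), 2))
```

For a fixed group difference, the ratio falls as the passing rate becomes more selective, which is why several passing rates are examined.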

Norms

• Update estimates of global norms and estimate norms at the country and/or region level.

• Compare scores across formats (PCs, tablets, mobile phones).


References

American Educational Research Association, American Psychological Association, National Council

on Measurement in Education. (2014). The Standards for Educational and Psychological Testing.

Washington DC: American Educational Research Association.

Andrei, F., Siegling, A. B., Aloe, A. M., Baldaro, B., & Petrides, K. V. (2016). The incremental

validity of the Trait Emotional Intelligence Questionnaire (TEIQue): A systematic review

and meta-analysis. Journal of Personality Assessment, 98, 261-276.

Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A

meta-analysis. Personnel Psychology, 44, 1-26.

Bartram, D. (2005). The great eight competencies: A criterion-centric approach to validation. Journal

of Applied Psychology, 90, 1185-1203.

Bertua, C., Anderson, N., & Salgado, J. F. (2005). The predictive validity of cognitive ability tests: A

UK meta-analysis. Journal of Occupational and Organizational Psychology, 78, 387-409.

Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of

contextual performance. In N. Schmitt, & W. C. Borman, Personnel Selection in Organizations.

San Francisco: Jossey-Bass.

Borman, W. C., & Motowidlo, S. J. (1997). Task performance and contextual performance: The

meaning for personnel selection research. Human Performance, 10, 99-109.

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: A new source of

inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3-5.

Carlson, K. D., Scullen, S. E., Schmidt, F. L., Rothstein, H., & Erwin, F. (1999). Generalizable

biographical data validity can be achieved without multi-organizational development and

keying. Personnel Psychology, 52, 731-755.

Carroll, J. (1993). Human Cognitive Abilities: A Survey of Factor Analytic Studies. New York: Cambridge

University Press.

Cattell, R. B. (1941). Some theoretical issues in adult intelligence testing. Psychological Bulletin, 38, 592.


Christian, M. S., Edwards, B. D., & Bradley, J. C. (2010). Situational judgment tests: Constructs

assessed and a meta-analysis of their criterion-related validities. Personnel Psychology, 63, 83-

117.

Cohen, J. (1992). A Power Primer. Psychological Bulletin, 112, 155-159.

Dudley, Nicole M., Orvis, Karin A., Lebiecki, Justin E., & Cortina, Jose M. (2006). A meta-analytic

investigation of conscientiousness in the prediction of job performance: Examining the

intercorrelations and incremental validity of narrow traits. Journal of Applied Psychology, 91, 40-

57.

Dye, D. A., Reck, M., & McDaniel, M. A. (2007). The validity of job measures. International Journal of

Selection and Assessment, 1, 153-157.

edX. (2015, February 25). Retrieved from discern: NY Times

Equal Employment Opportunity Commission, C. S. C. U. S. D. L. U., & Equal Employment

Opportunity Commission. (1978). Uniform Guidelines on Employee Selection Procedures. Federal
Register, 43(166), 38295-38309.

Fleisher, M. S., & Tsacoumis, S. (2012). O*NET analyst occupational skills ratings: Procedures update.
(Tech. Rep. No. FR-11-67). Alexandria, VA: Human Resources Research Organization
(HumRRO).

Fleisher, M. S., & Tsacoumis, S. (2012). O*NET analyst occupational abilities ratings: procedure update.

(Tech. Rep. No. FR-11-66). Alexandria, VA: Human Resources Research Organization

(HumRRO).

Goldberg, L. R. (1992). The development of markers for the Big-Five factor structure. Psychological

Assessment, 4, 26-42.

Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26-34.

Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations:

A socioanalytic perspective. Journal of Applied Psychology, 88, 100-112.

Horn, J. (1985). Handbook of Intelligence. New York: Wiley.

Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job

performance. Psychological Bulletin, 96, 72-98.


Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited.

Journal of Applied Psychology, 85, 869-879.

Jaro, M. A. (1989). Advances in record linkage methodology as applied to the 1985 census of Tampa,
Florida. Journal of the American Statistical Association, 84, 414-420.
Jaro, M. A. (1995). Probabilistic linkage of large public health data file. Statistics in Medicine, 14, 491-498.

Johnson, D. R., & Borden, L. A. (2012). Participants at your fingertips: Using Amazon's Mechanical
Turk to increase student-faculty collaborative research. Teaching of Psychology, 39, 245-251.

Joseph, D. L., Jin, J., Newman, D. A., & O'Boyle, E. H. (2015). Why does self-reported emotional

intelligence predict job performance? A meta-analytic investigation of Mixed EI. Journal of

Applied Psychology, 100, 298-342.

Joseph, D. L., & Newman, D. A. (2010). Emotional intelligence: An integrative meta-analysis and

cascading model. Journal of Applied Psychology, 95, 54-78.

Judge, T. A., & Ilies, R. (2002). Relationship of personality to performance motivation: A meta-

analytic review. Journal of Applied Psychology, 87, 797-807.

Kanfer, R., Crosby, J. R., & Brandt, D. M. (1988). Investigating behavioral antecedents of turnover

at three job tenure levels. Journal of Applied Psychology, 73, 331-335.

Kehoe, J. (2002). General mental ability and selection in private sector organizations: A commentary.

Human Performance, 15, 97-106.

Klimoski, R. J., & Childs, A. (1986). Successfully predicting career success: An application of the

biographical inventory. Journal of Applied Psychology, 71, 3-8.

Markoff, J. (2013, April 4). Essay-Grading Software Offers Professors a Break. NY Times. Retrieved

February 25, 2015, from http://www.nytimes.com/2013/04/05/science/new-test-for-

computers-grading-essays-at-college-level.html?_r=0

McCrae, R. R., & Costa, P. T. (1997). Personality trait structure as a human universal. American

Psychologist, 52, 509-516.

McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications.

Journal of Personality, 60, 175-215.


McCrae, R. R., Jang, K. L., Livesley, W. J., Riemann, R., & Angleitner, A. (2001). Sources of

structure: Genetic, environmental, and artifactual influences on the covariation of personality

traits. Journal of Personality, 69, 511-535.

McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L., III (2007). Situational judgment

tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63-91.

McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001).

Use of situational judgment tests to predict job performance: A clarification of the literature.

Journal of Applied Psychology, 86, 730-740.

McGrew, K. (2008). CHC theory and the human cognitive abilities project: Standing on the shoulders

of the giants of psychometric intelligence research. Intelligence, 37, 1-10.

Miao, C., Humphrey, R. H., & Qian, S. (2017). A meta-analysis of emotional intelligence and work

attitudes. Journal of Occupational and Organizational Psychology, 90, 177-202.

Miller, J. D., Gentile, B., Wilson, L., & Campbell, W. K. (2013). Grandiose and vulnerable narcissism

and the DSM-5 Pathological Personality Trait Model. Journal of Personality Assessment, 95, 284-

290.

Minton, E., Gurel-Atay, E., Kahle, L., & Ring, K. (2013). Comparing data collection alternatives:

Amazon mTurk, college students, and secondary analysis. AMA Winter Educators' Conference

Proceedings, 24, (pp. 36-37).

Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The

low-fidelity simulation. Journal of Applied Psychology, 75, 640-647.

Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for

research and practice in human resources management. In K. R. (Eds.), Research in Personnel

and Human Resources Management (Vol. 13) (pp. 153-200). Greenwich, CT: JAI Press.

Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys, 33, 31-

88.

O'Boyle, E. H., Humphrey, R. H., Pollack, J. M., Hawver, T. H., & Story, P. A. (2011). The

relation between emotional intelligence and job performance: A meta-analysis. Journal of

Organizational Behavior, 32, 788-818.


Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A. (2004). Developing a

biodata measure and situational judgment inventory as predictors of college student

performance. Journal of Applied Psychology, 89, 187-207.

Paruchuri, V. (2015, February 25). Retrieved from YouTube:

https://www.youtube.com/watch?v=zFeP678054U

Paunonen, S. V., Rothstein, M. G., & Jackson, D. (1999). Narrow reasoning about the use of broad

personality measures for personnel selection. Journal of Organizational Behavior, 20, 389-405.

Perera, H. N., & DiGiacomo, M. (2013). The relationship of trait emotional intelligence with

academic performance: A meta-analytic review. Learning and Individual Differences, 23, 20-33.

Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., Fleishman, E. A., Levin, K.

Y., . . . Dye, D. M. (2001). Understanding work using the occupational information network

(O*NET): Implications for practice and research. Personnel Psychology, 54, 451-492.

Roth, P. L., Bobko, P., & Buster, M. A. (2013). Situational judgment tests: The influence and
importance of applicant status and targeted constructs on estimates of Black-White

subgroup differences. Journal of Occupational and Organizational Psychology, 86, 394-409.

Rothstein, H. R., Schmidt, F., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data

in employment selection: Can validities be made generalizable? Journal of Applied Psychology,

75, 175-184.

Saad, S., Carter, G. W., Rothenberg, M., & Israelson, E. (2014, March 9). Testing and Assessment: An

Employer's Guide to Good Practices. Retrieved from O*NET:

http://www.onetcenter.org/dl_files/empTestAsse.pdf

Salgado, J. F. (1997). The Five Factor Model of personality and job performance in the European

community. Journal of Applied Psychology, 82, 30-43.

Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., & De Fruyt, F. (2003). International validity

generalization of GMA and cognitive abilities: A European community meta-analysis.

Personnel Psychology, 56, 573-605.

Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., de Fruyt, F., & Rolland, J. P. (2003). A Meta-

Analytic Study of General Mental Ability Validity for Different Occupations in the

European Community. Journal of Applied Psychology, 88, 1068-1081.


Salgado, J., & Anderson, N. (2003). Validity generalization of GMA tests across countries in the

European community. European Journal of Work & Organizational Psychology, 12, 1-17.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel

psychology: Practical and theoretical implications of 85 years of research findings.

Psychological Bulletin, 124, 262-274.

Schneider, R. J., Hough, L. M., & Dunnette, M. D. (1996). Broadsided by broad traits: How to sink

science in five dimensions or less. Journal of Organizational Behavior, 17, 639-655.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass Correlations: Uses in Assessing Rater Reliability.

Psychological Bulletin, 86, 420-428.

Society for Industrial Organizational Psychology. (2003). Principles for the Validation and Use of Personnel

Selection Procedures (4th ed.). Bowling Green, OH: Author.

Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job

performance. Journal of Applied Psychology, 88, 500-517.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job

performance. Personnel Psychology, 44, 703-742.

U.S. Census Bureau. (2015, February 17). American Fact Finder. Retrieved from factfinder.census.gov:

factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_12_1YR

_CP05&prodType=table

Wee, S., Newman, D. A., & Joseph, D. L. (2014). More than g: Selection quality and adverse impact

implications of considering second-stratum cognitive abilities. Journal of Applied Psychology, 99,

547-563.

Whetzel, D. L., McDaniel, M. A., & Nguyen, N. T. (2008). Subgroup differences in situational

judgment test performance: A meta-analysis. Human Performance, 21, 291-309.

Winkler, W. E. (1990). String comparator metrics and enhanced decision rules in the Fellegi-Sunter

Model of record linkage. Proceedings of the Section on Survey Research Methods (pp. 354-359).

American Statistical Association.

Woo, S. E., Chernyshenko, O. S., Stark, S. E., & Conz, G. (2014). Validity of six openness facets in

predicting work behaviors: A meta-analysis. Journal of Personality Assessment, 96, 76-86.


Appendix A: Summary of the HR Avatar Solutions

Columns: Job Title; O*Net SOC; Simulation Module (Cognitive); Essay; AIMS Module; Emotional Intelligence; Biodata Module; Skill Module 1; Skill Module 2

Sales Representative - Wholesale &

Manufacturing

41-4012.00 Business Sales Yes Professional EQ Professional Fundamental Sales Concepts

Customer Service Representative (with

Email)

43-4051.00 Customer Service (with

email)

Yes Professional EQ Professional Core Customer

Service Concepts

Customer Service Representative (with

Email and Calls)

43-4051.00 Customer Service (with email)Calls


Service Concepts

Telemarketer 41-9041.00 Customer Service (with email)Calls

Professional EQ Professional Fundamental Sales Concepts

Computer Applications Software

Developer

15-1132.00 Information Technology

Yes Professional EQ Professional

First-Line Supervisor - Office and

Administrative Support

43-1011.00 First-Line Supervisor

Yes Professional EQ Professional Supervisor Fundamentals

Sales Representative - Technical and

Scientific


Sales Representative - Services


Computer Systems Analyst





Collections Specialist 43-3011.00 Collections Professional EQ Professional

Cashier 41-2011.00 Customer Service (with email)Face

Professional EQ Hourly

Chief Executive 11-1011.00 Manager Yes Professional EQ Professional

Retail Salesperson (Home Goods Store)

41-2031.00 Retail Sales (Hardware

Store)

Professional EQ Hourly Fundamental Sales Concepts

Retail Salesperson (Fashion Store)

41-2031.00 Retail Sales (Fashion)


Retail Salesperson (Electronics Store)

41-2031.00 Retail Sales (Electronics)


Retail Salesperson (Sunglasses Store)

41-2031.00 Retail Sales (Sunglasses)


General Manager 11-1021.00 Manager Yes Professional EQ Professional

Clerk - Bookkeeping, Accounting, and

Auditing

43-3031.00 Admin Assistant

(Entry-Level)

Hourly Hourly Accounting Fundamentals

MS Excel 2019

Simulation

Sales Agent - Insurance

41-3021.00 Business Sales Professional EQ Professional Insurance Fundamentals

Secretary / Administrative

Assistant


Yes Professional EQ Hourly Typing Speed and Accuracy

MS Word 2019

Simulation

Manager - Financial 11-3031.00 Manager Yes Professional EQ Professional Accounting Fundamentals

MS Excel 2019

Simulation

Specialist - Computer User Support


Professional EQ Professional Core Customer

Service Concepts



First-Line Supervisor - Non-Retail Sales



Computer Systems Software Developer


Yes Professional Professional

Analyst - Information Security



Management Analyst 13-1111.00 Business and Finance

Yes Professional EQ Professional Accounting Fundamentals

MS Excel 2019

Simulation

Bank Teller 43-3071.00 Bank Teller Professional EQ Professional Bank Teller Fundamentals

Bank Teller with Sales

43-3071.00 Bank TellerSales

Professional EQ Professional Bank Teller Fundamentals

First-Line Supervisor - Retail Sales


Professional EQ Professional Supervisor Fundamentals

Clerk - General Office


(Entry-Level)

Hourly Hourly

Clerk - Billing and Posting


(Entry-Level)

Hourly Hourly

Manager - Computer and Information

Systems

11-3021.00 Manager Yes Professional EQ Professional

Driver - Sales and Delivery

53-3031.00 Driver Professional EQ Professional

Analyst - Market Research

13-1161.00 Business and Finance

Yes Professional EQ Professional Core Marketing Concepts

MS Excel 2019

Simulation



Manager - Administrative

Services


Computer Programmer



Computer Programmer - Java


Yes Professional Professional Core Java Programming

Computer Programmer - C


Yes Professional Professional Core C Programming

Computer Programmer - C++


Yes Professional Professional Core C++ Programming

Computer Programmer - Web

Developer


Yes Professional Professional Core Web Programming

Computer Programmer - PHP


Yes Professional Professional Core PHP Programming

Computer Programmer -

Javascript


Yes Professional Professional Core Javascript

Programming

Computer Programmer - Actionscript


Yes Professional Professional Core Actionscript

Programming

Computer Programmer - Python


Yes Professional Professional Core Python Programming

Computer Programmer -

ASP .NET Web Pages


Yes Professional Professional Core ASP.NET

Programming (Web Pages)

Computer Programmer - Ruby


Yes Professional Professional Core Ruby



Computer Programmer - Java

EE


Yes Professional Professional Java EE Fundamentals

Manager - Food Service

11-9051.00 Manager Yes Professional EQ Professional Food Safety Fundamentals

Aide - Home Health 31-1011.00 Customer Service (with email)Face

Professional EQ Professional

Network and Computer Systems

Administrator


Yes Professional EQ Professional Computer Networking Concepts

Claims Adjuster, Examiner,

Investigator


(Entry Level)


Manager - Other 11-9199.00 Manager Yes Professional EQ Professional

Manager - Security 11-9199.07 Manager Yes Professional EQ Professional

Manager - Sales 11-2022.00 Manager Yes Professional EQ Professional Fundamental Sales Concepts

First-Line Supervisor - Production /

Operations



Clerk - Insurance Claims / Policy

Processing


(Entry Level)

Professional Professional Insurance Fundamentals

Hospitality Industry Customer Service

Worker

43-4081.00 Hospitality Professional EQ Professional Core Hospitality Concepts

Technician - Medical Records and Health

Information

29-2071.00 General Office (Entry-

Level)

Professional Professional Core Health Administration (U.S.)



Loan Officer 13-2072.00 Business and Finance

Yes Professional EQ Professional Banking Fundamentals

Executive Secretary / Administrative

Assistant


Yes Professional EQ Professional Typing Speed and Accuracy

Accountant / Auditor 13-2011.00 Business and Finance


MS Excel 2019

Simulation

Analyst - Budget 13-2031.00 Business and Finance


Manager - Real Estate and Community Association


Analyst - Financial 13-2051.00 Business and Finance


MS Excel 2019

Simulation

Specialist - Computer Network Support


Professional Professional Computer Networking Concepts

Manager - Architectural and

Engineering


Analyst - General 13-1081.02 General Office

Yes Professional EQ Professional MS Excel 2019

Simulation

Meeting, Convention, and Event Planner

13-1121.00 General Office


Paralegal /Legal Assistant



MS Word 2019

Simulation



Legal Secretary 43-6012.00 Admin Assistant


MS Word 2019

Simulation

Technician - General Maintenance and

Repair

49-9071.00 Technician Professional Professional Mechanical Aptitude

Loan Interviewer 43-4131.00 Business and Finance

Yes Professional EQ Professional Banking Fundamentals

Manager - Marketing 11-2021.00 Manager Yes Professional EQ Professional Core Marketing Concepts

MS Excel 2019

Simulation

Manager - Industrial Production


Restaurant Waiter / Waitress

35-3031.00 Restaurant Hourly EQ Hourly

Sales Agent - Real Estate

41-9022.00 Real Estate Worker

Yes Professional EQ Professional Fundamental Sales Concepts

Clerk - Payroll and Timekeeping


(Entry-Level)

Hourly Hourly Typing Speed and Accuracy

Data Entry Keyers 43-9021.00 Admin Assistant

(Entry-Level)


Aide - Personal Care 39-9021.00 Customer Service (with email)Face

Hourly EQ Hourly

Specialist - Human Resources

13-1071.00 Manager Yes Professional EQ Professional Human Resource

Fundamentals

MS Word 2019

Simulation



Clerk - Counter / Rental

41-2021.00 Customer Service (with email)Face

Hourly EQ Hourly

Compliance Officer 13-1041.00 General Office


Specialist - Regulatory Affairs



Technician - Pharmacy

29-2052.00 Technician Professional Professional

Sales Agent - Securities, Financial

Services


Clerk - Production, Planning, and

Expediting


(Entry-Level)

Hourly Hourly

First-Line Supervisor - Mechanics,

Installers, Repairers



Mechanical Aptitude

Customer Service Representative - With

Sales

41-9041.00 Business Sales Professional EQ Professional Fundamental Sales Concepts

Agent - Purchasing 13-1023.00 Business and Finance


Salesperson - Parts and Accessories

41-2022.00 Retail Sales (Hardware

Store)

Professional EQ Professional Fundamental Sales Concepts

Driver - Heavy and Tractor-Trailer

53-3032.00 Driver Professional Professional

Specialist - Public Relations


Yes Professional EQ Professional Core Marketing Concepts



Clerk - File 43-4071.00 Admin Assistant

(Entry-Level)


Nursing Assistant 31-1014.00 Medical Assistant


Specialist - Training and Development



Instructional Designer


Yes Professional EQ Professional Project Management Fundamentals

Graphic Designer 27-1024.00 General Office

Professional Professional

Art Director 27-1011.00 General Office


Graphic Designer - Web Development


Professional Professional HTML4 and CSS 2

Core HTML5

Manager - Medical and Health Services

(General)

11-9111.00 Manager Yes Professional EQ Professional Core Health Administration (U.S.)

Manager - Medical and Health Services

(US)

11-9111.00 Manager Yes Professional EQ Professional Core Health Administration (U.S.)

Receptionist 43-4171.00 Admin Assistant

(Entry-Level)

Hourly EQ Hourly

Secretary - Medical (General)

43-6013.00 Medical Assistant

Professional EQ Professional Typing Speed and Accuracy

MS Word 2019

Simulation

Secretary - Medical (US)

43-6013.00 Medical Assistant

Professional EQ Professional Typing Speed and Accuracy

Core Health Administration (U.S.)



Recruiter 43-4111.00 General Office

Yes Professional EQ Professional Core Recruiting Knowledge

Childcare Worker 39-9011.00 General Office (Entry-

Level)

Professional EQ Professional Childcare Fundamentals

First-Line Supervisor - Food Preparation /

Serving


Professional EQ Professional Food Safety Fundamentals

Host / Hostess - Restaurant

35-9031.00 Restaurant Hourly EQ Hourly Core Hospitality Concepts

Specialist - Office and Administrative

Support


(Entry-Level)


MS Word 2019

Simulation

Clerk - Order Processing


(Entry-Level)


Driver Transit and Intercity Bus

53-3021.00 Driver Professional EQ Professional

Driver - Light Truck / Delivery

53-3033.00 Driver Professional Professional

Mechanic - Heating, Air Conditioning,

Refrigeration

49-9021.00 Technician Professional Professional HVAC Fundamentals

Mechanical Aptitude

Pharmacist (General) 29-1051.00 General Office

Yes Professional EQ Professional Core Health Administration (U.S.)

Pharmacist (US) 29-1051.00 General Office

Yes Professional EQ Professional Core Health Administration (U.S.)



Trainer - Athletic 39-9031.00 General Office (Entry-

Level)


Attorney 23-1011.00 General Office


Medical Assistant 31-9092.00 Medical Assistant


Administrator - Elementary and

Secondary School



Inspector, Tester, Sorter, Sampler,

Weigher

51-9061.00 Technician Professional Professional Mechanical Aptitude

Counselor - Educational,

Guidance, School, Vocational



Personal Financial Advisor



Hairdresser, Hairstylist,

Cosmetologist

39-5012.00 Basic Entry Level

Hourly EQ Hourly

Social / Human Service Assistant

21-1093.00 General Office (Entry-

Level)

Professional EQ Professional Social Work Fundamentals

Security Guard 33-9032.00 General Office (Entry-

Level)


Medical / Clinical Laboratory

Technologist




Fast Food Worker 35-3021.00 CogFastFoody Hourly EQ Hourly

Clerk - Shipping / Receiving


(Entry-Level)

Hourly Hourly

Teacher - Preschool 25-2011.00 Customer Service (with email)Face

Hourly EQ Hourly Education Delivery

Fundamentals

Dental Assistant 31-9091.00 Basic Entry Level

Hourly EQ Hourly

Engineer - Mechanical


Yes Professional Professional Mechanical Aptitude

Engineer - Other 17-2199.00 General Office


Engineer - Industrial 17-2112.00 General Office


Electrical Engineer 17-2071.00 General Office


Specialist - Radio Frequency

Identification (RFID)

17-2072.01 Technician Yes Professional EQ Professional

Nurse - Registered 29-1141.00 General Office


Production Worker 51-9199.00 General Office (Entry-

Level)

Hourly Hourly Mechanical Aptitude

Operator - Power Plant

51-8013.00 Technician Yes Professional EQ Hourly Mechanical Aptitude

Dental Hygienist 29-2021.00 Basic Entry Level


Machinist 51-4041.00 Basic Entry Level




Radiologic Technologist



Taxi Driver / Chauffeur

53-3041.00 Driver Hourly EQ Hourly

Technician - Emergency Medical

/ Paramedic

29-2041.00 Technician Professional EQ Professional

Operator - Packaging / Filling

Machines

51-9111.00 Technician Hourly Hourly

Mail Carrier 43-5052.00 Customer Service (with email)Face

Hourly EQ Hourly

Social Worker - Child, Family, School

21-1021.00 Customer Service (with email)Face

Hourly EQ Hourly Social Work Fundamentals

Technician - Medical and Clinical Laboratory


Clerk - Stockroom 43-5081.00 Admin Assistant

(Entry-Level)

Hourly Hourly

Technician - Automotive Service

49-3023.00 Technician Hourly Hourly Mechanical Aptitude

Operating Engineer 47-2073.00 General Office


Engineer - Civil 17-2051.00 General Office

Yes Professional Professional Construction Fundamentals

Physical Therapist 29-1123.00 General Office (Entry-

Level)




Correctional Officer 33-3012.00 General Office


Dispatcher - General 43-5032.00 General Office


Maid / Housekeeping

Cleaner


Hourly EQ Hourly

Welder, Cutter, Solderer, Brazer


Bartender 35-3011.00 Restaurant Hourly EQ Hourly

Attendant - Food Services


Laborer - Freight and Warehouse

53-7062.00 Warehouse Hourly Hourly

Recreation Worker 39-9032.00 Basic Entry Level

Hourly EQ Hourly

Enforcement Officer 33-3051.00 General Office


Food Server - Nonrestaurant


Hourly EQ Hourly

Teacher - Elementary School


Yes Professional EQ Professional Education Delivery

Fundamentals

Mechanic - Industrial Machinery


Teacher - Other 25-3099.00 General Office

Professional EQ Professional Education Delivery

Fundamentals

Food Preparation Worker


Hourly EQ Hourly Food Safety Fundamentals



Nurse - Licensed Practical / Vocational



Assembler / Fabricator - Other



Team Assembler 51-2092.00 Basic Entry Level

Hourly EQ Hourly

Laborer - Packing / Packaging

53-7064.00 Warehouse Hourly Hourly

Carpenter 47-2031.00 Construction Hourly Hourly Carpentry Fundamentals

Janitor 37-2011.00 Basic Entry Level

Hourly Hourly

Plumber, Pipefitter, Steamfitter

47-2152.00 Construction Hourly Hourly Plumbing Fundamentals

Mechanical Aptitude

Cook - Restaurant 35-2014.00 Basic Entry Level

Hourly Hourly Food Safety Fundamentals

Helper - Dining Room and Cafeteria


Teacher - Secondary School



Fundamentals

Laborer - Landscaping and Groundskeeping


Hourly Hourly

Laborer - Construction

47-2061.00 Construction Hourly Hourly Construction Fundamentals

Teacher - Middle School



Fundamentals



Cook - Fast Food 35-2011.00 CogFastFoody Hourly Hourly Food Safety Fundamentals

Dishwasher 35-9021.00 Basic Entry Level

Hourly Hourly

Firefighter 33-2011.00 General Office (Entry-

Level)


Cleaner - Vehicles and Equipment


Hourly Hourly

Operator - Industrial Trucks / Tractors

53-7051.00 Driver Hourly Hourly Mechanical Aptitude

First-Line Supervisor - Construction /

Extraction



Construction Fundamentals

Helper - Production 51-9198.00 Basic Entry Level


Attendant - Amusement /

Recreation


Hourly EQ Hourly

Teacher Assistant 25-9041.00 General Office (Entry-

Level)

Professional EQ Professional Education Delivery

Fundamentals

Cook - Short Order 35-2015.00 Basic Entry Level


Electrician 47-2111.00 Technician Hourly Hourly Electrician Fundamentals

Mechanical Aptitude

Teacher - Substitute 25-3098.00 General Office


Fundamentals

Cook - Institution and Cafeteria





Driver - School Bus 53-3022.00 Basic Entry Level

Hourly EQ Hourly

Laborer - Agricultural 45-2092.00 Construction Hourly Hourly

Mechanic - Bus,Truck, Diesel

Engine


Installer / Repairer - Telecommunications

Equipment



Manager - Construction

11-9021.00 Manager Professional EQ Professional Construction Fundamentals

Teacher - Postsecondary



Fundamentals

Coach / Scout 27-2022.00 General Office


Laundry and Dry-Cleaning Worker


Hourly EQ Hourly

Teacher - Special Education



Fundamentals

Assembler - Electrical and

Electronic Equipment


Hourly Hourly Electrician Fundamentals

First-Line Supervisor - Transportation and

Material-Moving



Cost Estimator 13-1051.00 Business and Finance

(Entry Level)

Professional EQ Professional MS Excel 2019

Simulation



Painter - Construction and

Maintenance


Hourly Hourly

Operator - Machine - Metal and Plastic


Teacher - Self-Enrichment Education



Fundamentals

Clerk - Information and Record Clerks


(Entry-Level)


Printing Press Operators

51-5112.00 Technician Hourly Hourly

First-Line Supervisor - Housekeeping and

Janitorial



First-Line Supervisor - Helpers, Laborers, and Material Movers



Trimmer - Meat, Poultry, and Fish



Teacher - Kindergarten



Fundamentals

Baker 51-3011.00 Basic Entry Level

Hourly Hourly

Teacher - Health Specialties -

Postsecondary



Fundamentals



General Project Manager

11-1021.00 Manager Yes Professional EQ Professional Project Management Fundamentals

MS Excel 2019

Simulation

Information Technology Project

Manager


Yes Professional EQ Professional Project Management Fundamentals

MS Excel 2019

Simulation

Information Systems Architect/Engineer



Analyst - Business Intelligence



Simulation

Database Administrator


Yes Professional Professional Relational Database Concepts

Computer Programmer - Web

Developer with jQuery


Yes Professional Professional Core HTML5 JQuery Fundamentals

Manager - Human Resources

11-3121.00 Manager Yes Professional EQ Professional Human Resource

Fundamentals

Account Manager 43-4051.00 General Office


Service Concepts

Computer Programmer - C

Sharp (C#)


Yes Professional Professional Core C Fundamentals

Bank Teller / Universal Banker

43-3071.00 Bank TellerSales

Yes Professional EQ Professional Universal Banker

Fundamentals



Flight Attendant 53-2031.00 Flight Attendant


Airline Pilot, Copilot, or Flight Engineer



Technical Writer 27-3042.00 General Office

Yes Professional EQ Professional Mechanical Aptitude

MS Word 2019

Simulation

Manager - Hospitality

11-9081.00 Manager Yes Professional EQ Professional Core Hospitality Concepts

Basic Cognitive & Behavioral

Assessment - Entry Level Version


Hourly

Fraud Examiner, Investigator, Analyst


(Entry Level)


Specialist - Risk Management



Simulation

Actuary 15-2011.00 Business and Finance


Travel Agent 41-3041.00 General Office

Yes Professional EQ Professional Fundamental Sales Concepts


Appendix B: Historical Validity Evidence for the

Cognitive Scales

Table 48. Compiled Validity Evidence for the Original Content of the Cognitive Work Simulation Scales

Performance Factor (various organizations)

Attention to Detail

Analytical Thinking

Listening Score (Organization A) .37

Performance Rating .50

Listening Score (Organization B) -.43

Listening Score (Organization C) -.35 -.34

Performance Rating (Organization A) .42

Performance Rating (Organization B) .39

Sales/Hour -.26 -.34

Cross Selling .37 .39

Response Quality .33

Average 2nd Contact .21

Schedule Conformance -.09

Performance Rating (Organization C) .39 .16

*Study results provided by the original content developer. All sample sizes are larger than 200, and all reported correlations are significant at p < .05.


Appendix C: Historical Validity Evidence for AIMS

Tables 49-55 contain the results of criterion-related validity studies conducted by the developer of the original AIMS content. Please note that these results are based on the longer, 100-item version of the assessment. Table 55 summarizes the statistically significant relationships (p<.05) between the original AIMS scales and various measures of performance. The table includes results for over 5000 applicants and over 12 different organizations from multiple industries, including Financial Services, Insurance, Hospitality, Market Research, and Pharmaceuticals. Note that all significant correlations are reported. Although some of these correlations are negative, we would not expect all of the scales to be positively related to all job performance dimensions (see above section on previous personality research).

Table 49. Study 1 Results: Concurrent Validation Study for Insurance Consultants N=122-136

Columns: Competency; Performance Appraisal; Average Policy Count; Average QRF Level; Average Idle Time; Average CPH; Service; Average Second Contact

Needs Structure .18 -.15 .18

Innovative & Creative .16 .19

Enjoys Problem-Solving .24

Seeks Perfection

Exhibits a Positive Work Attitude -.19


Table 50. Study 2 Results: Concurrent Validation Study for Inside Sales N=105

Competency Perfectionism Quality

Innovative & Creative .26


Competitive

Seeks Perfection .19

Develops Relationships .28 .16

Expressive & Outgoing .27 .18

Corporate Citizenship -.17


-.18

Adaptable -.17

Table 51. Study 3 Results: Concurrent Validation Study for an Internet Services Order Processing N=84

Competency Quality


Develops Relationships -.27

Competitive -.22

Table 52. Study 4 Results: Concurrent Validation Study for an Internet and Cable Sales and Service Position N=72-93

Competency Performance Quality Attendance

Needs Structure .30 .26


Competitive

Seeks Perfection .21

Develops Relationships .24

Exhibits a Positive Work Attitude .25

Adaptable -.20

Table 53. Study 5 Results: Concurrent Validation Study Auto Rental Sales Role N=80-92

Competency Quality Yield Productivity

Needs Structure .30 .25 .29

Innovative & Creative .32

Seeks Perfection .34 .23

Adaptable .19


Table 54. Study 6: Concurrent Validation Study for Paramedics N=85

Competency Reading Learning

Office Procedures

Assessing Following Procedures

Dealing with

Difficulty

Treating Others

with Respect

On time

Overtime Flexibility Sick

Needs Structure

-.21 -.25

Innovative & Creative

-.26 -.15


-.27

.18 -.26 -.20

Competitive -.22 .20 .24

Seeks Perfection

-.21 .18


.19 -.21

Expressive & Outgoing

-.24 -.29


.24 .23


-.24 -.22

Adaptable .36 .26


Table 55. Summary of Relationships between Performance Measures and Original AIMS Scales

Columns: Performance Measure; Enjoys Problem Solving; Innovative and Creative; Needs Structure; Adaptable; Develops Relationships; Expressive and Outgoing; Competitive; Seeks Perfection; Positive Work Attitude; Corporate Citizenship

Total Score -.15 .20

Conversion Rate

-.15 -.16 .21 -.12

Calls Per Hour -.28 .22 -.17 -.17

Unavailability .18 .34 .14

Positive Attitude

Team Attitude -.14 .13 .14

Service Attitude

Caring Attitude .12

Ownership -.12

Performance -.13 .13

Range .13

Ranking -.12 .23 -.17

Sales -.13 -.15 -.10

Talk Time -.20

Account Weight -.14 -.18 -.14 -.14

Total Score -.49 .58

Conversion Rate

.47

Ranking (A) .20 .19

Ranking (B) .20 .28 .21 .21

Call Management

.26 .28 .27 .26 .19 .15 -.13

Average Hold Time

.15

Weighted Rating

.27

Supervisor Rating of Overall Performance

.23 .15



Supervisor Rating of Skill Acquisition

.34 .29 .25

Supervisor Rating of Summary Performance

-.41 .39

Supervisor Rating of Overall Performance

-.36 .28 .30


Appendix D: Scoring Rubric for Essays

Grammar: Syntax, vocabulary, spelling, sentence structure, natural-sounding, command of English

Content: Conciseness, appropriateness, addressed prompt

Structure: Logic, flow, organization, format

0 - 3

Grammar: Laden with mechanical errors. May include errors in spelling, punctuation use, contractions, prepositions, missing words, verb conjugation and tense, etc. Contains a generally poor command of English. Errors are to the point that the writing is barely understandable, if at all.

Content: Does not address more than half of requirements set up by prompt. Response may be incomplete or extremely off-topic. Response may have been written in a completely inappropriate tone for the intended audience. May be extremely unclear.

Structure: Ideas are so jumbled by illogical organization that they may be hard or impossible to follow. Format is not apparent or completely inappropriate for intended audience. There may be little to no transition between different ideas.

4 - 6

Grammar: There are a few errors in grammar, but they do not severely hinder comprehension of the writing. English/wording may sound somewhat unnatural. Main idea of writing is still understood.

Content: Addresses many or all requirements set up by prompt, but they may not have been thoroughly developed, may be missing information, or may be unclear. Response may be slightly off-topic.

Structure: Organization of ideas is slightly off-putting and confusing, but reader should be able to follow them and the main idea is still communicated. Ideas may lack transition when needed.

7 - 10

Grammar: There may be one or two small errors, but they are very minor and do not affect comprehension of the writing. English sounds natural and flows well.

Content: All requirements set up by the prompt are addressed clearly and well developed. There are no confusing spots. Response is not off-topic or missing information.

Structure: Organization of ideas is logical and easy to follow. Transitions are usually or always apparent where appropriate.


Appendix E: Directions for Rating Essay Tests

Scoring with a Rubric:

1. Read the rubric.

2. Read the prompt.

3. Outline all requirements listed in the prompt.

4. Read the response.

5. Choose an appropriate score for the writing sample in each of the areas listed on the rubric. Be sure to rate each area separately, and do not allow a good or bad score in one area to affect the way you score another area in the same writing sample; a writing sample with many spelling errors may still reflect all the required content.

Example of a writing sample with a perfect score:

Prompt:

Pretend you are an administrative assistant. Your boss wants to have an offsite team meeting to set

goals for you and the rest of the team next year. Write an email asking team members to attend an

all-day meeting on the first Monday of next month. Tell team members to write down their goals for

the year and come prepared to present them to the group. Additionally, ask if anyone wants to

volunteer to plan a team activity.


Response:

Team Members:

Arthur is hosting a mandatory team meeting to set goals for next year. The meeting will take place January 30, 2014 from 1:00 P.M. to 4:00 P.M. at the Ritz Carlton in Vienna, VA in Conference Room B.

Arthur is expecting each of you to attend. Also, he is expecting you to prepare for the meeting by documenting your goals and being prepared to present them to the team.

Arthur is looking for a volunteer to lead a team activity during the meeting. If you would like to volunteer, contact Arthur with your idea by January 15th.

Thank you,

This person scored a 30/30 on their writing sample, on a basis of three factors: grammar, content, and structure.

There are no errors in grammar or spelling. The English sounds natural and the writing flows well. The writer scored a 10/10 for grammar.

The writer answered the question thoughtfully and thoroughly. Despite the fact that the author was not presented with a date, time, or place in the prompt, he/she recognized that, in an actual work setting, these figures would be required in a successful email, and included all necessary information. The response addresses all requirements outlined in the prompt and leaves nothing unclear. 10/10 content.

The sample is structured logically so that it flows without abrupt jumps or changes in idea. It is easy to read and follow. The writer also included a subject line and list of recipients, an appropriate introduction (Team Members:) and conclusion (Thank you,). The sample received a 10/10 in structure, earning it a 30/30 overall.

Example of a poor response to the same prompt:

Response:

We are going have a meeting on the first Monday next month. I want everybody to think of some goals and write them to present to everyone. Does anyone want to plan a team activity?

This person scored a 14/30 on their writing sample based on the same three factors: grammar, content, and structure.


The grammar is understandable and there are no glaring errors; however, there is a missing word ("We are going have") and the wording is not always clear. The writer scored a 7/10 for grammar.

Only a few of the requirements set up by the prompt are addressed here, and when they are, the

ideas are hardly developed or elaborated on. Were this a real email, much would be left to confusion.

This person received a 3/10 for content.

There is little transition between ideas or logical flow in this response. Also, this response is not

structured in an email format. The sample received a 4/10 for structure, giving it a 14/30 overall.
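
The arithmetic behind the two examples can be sketched in a few lines. This is only an illustration of the rubric described in Appendix D, assuming whole-number ratings from 0 to 10 in each area; the names AREAS, band, and total_score are hypothetical and are not part of any HR Avatar software.

```python
# Minimal sketch of rubric arithmetic: three areas, each rated 0-10, summed to 30.
# AREAS, band, and total_score are hypothetical names used for illustration only.
AREAS = ("grammar", "content", "structure")


def band(score):
    """Return the rubric band (as labeled in Appendix D) for a single area score."""
    if not 0 <= score <= 10:
        raise ValueError("each area is rated from 0 to 10")
    if score <= 3:
        return "0 - 3"
    if score <= 6:
        return "4 - 6"
    return "7 - 10"


def total_score(ratings):
    """Sum the three area ratings after checking that each one is present and valid."""
    for area in AREAS:
        if area not in ratings:
            raise ValueError(f"missing rating for {area}")
        band(ratings[area])  # raises if the rating is out of range
    return sum(ratings[area] for area in AREAS)


strong = {"grammar": 10, "content": 10, "structure": 10}
weak = {"grammar": 7, "content": 3, "structure": 4}
print(total_score(strong), total_score(weak))
```

Each area is rated on its own before the total is taken, which mirrors the instruction in step 5 to keep the three ratings independent.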


Appendix F: Validity Evidence for HR Avatar Tests

The HR Avatar Automated Essay Scoring System

January, 2016

Introduction

Written communication is a key skill in many positions. Communicating via email, writing

reports, and creating presentations all require the ability to communicate effectively.

Traditionally, essays written for assessment purposes are scored by human raters using a predefined scoring rubric. However, human scorers are relatively costly and can become fatigued and erratic in high-volume situations.

Luckily, machine learning has advanced to the point where computers can substitute for human

raters reliably. The HR Avatar Essay Test is an implementation of this technique, which results

in lower cost and faster scoring turn-around.

The HR Avatar Essay Test consists of several writing prompts. The writing prompts were

designed to be general enough to provide an opportunity for anyone to be able to write a short

essay. It is easy to add additional prompts for specific situations or for general use.

Applicants are asked to write a short essay with a minimum of 100 words and are given unlimited time to do so. The essays are scored using Discern, an open-source machine learning program. Discern was designed by edX, a nonprofit organization founded by Harvard and the Massachusetts Institute of Technology (MIT) (edX, 2015; Markoff, 2013). The system produces a score that ranges from 0 to 100. A confidence estimate for the score, ranging from 0 to 1, is also computed. Scores with confidence estimates less than .10 are not considered valid.
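
The validity rule above can be expressed as a small check. The sketch below is only an illustration of the stated threshold, not the Discern code; the names MIN_CONFIDENCE and valid_score are hypothetical.

```python
from typing import Optional

# Threshold taken from the text: scores with confidence below .10 are not valid.
MIN_CONFIDENCE = 0.10


def valid_score(score: float, confidence: float) -> Optional[float]:
    """Return the score if its confidence estimate is high enough, otherwise None.

    valid_score is a hypothetical helper; it only encodes the ranges and the
    cutoff described in the text.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")
    return score if confidence >= MIN_CONFIDENCE else None
```

A caller would treat a None result as "no usable machine score" and could, for example, route that essay to a human rater.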

How it works

HR Avatar uses open-source essay scoring software originally published by edX, the organization founded by Harvard and MIT described above.

The software performs regression, producing a score for each submitted essay along a continuous scale. This is different from classification, in which the software would simply attempt to categorize each essay into one or more 'groups' or to rank the essays relative to one another.

Each essay is written according to a predetermined set of instructions typically referred to as the

"Prompt." A typical prompt might be: "In a short essay of 100-400 words, explain whether it's

better to be a planner or to be a dreamer."

All essays are scored by the machine learning algorithm based on a "Training Set" upon which a

regression model has been built. The algorithm essentially analyzes all of the training essays and

produces a best guess at how the human scorers who created the training set would have judged

the new essay.


The application is written in Python, uses several open-source machine learning tools, and is centered on a machine learning library called scikit-learn (http://scikit-learn.org), which in turn uses a number of other open-source mathematical and data manipulation packages.

In order to perform its task, the application converts each essay into a number of different "features." Features are measurable aspects of the essay, such as spelling errors per character or grammar errors per character. In concept, the essay is reduced to an N-dimensional vector containing all of the essay's feature scores. However, some features are complex vectors in and of themselves.

Each feature is measured using specialized software for text analysis. For example, the grammar

errors are determined by looking for good and bad 'ngrams' which are essentially models of

either good or bad grammar.
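
To make the idea of a feature vector concrete, the sketch below computes one of the features named above, spelling errors per character, alongside two simple length features. It is an illustration only: the word list KNOWN_WORDS is a tiny hypothetical stand-in for a real dictionary, and the function names are assumptions rather than the application's actual code.

```python
import re

# Hypothetical stand-in for a real spelling dictionary (assumption for illustration).
KNOWN_WORDS = {"the", "meeting", "is", "on", "monday", "please", "attend"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def spelling_errors_per_character(text, known_words=KNOWN_WORDS):
    """Count words missing from the dictionary, normalized by essay length."""
    if not text:
        return 0.0
    errors = sum(1 for word in tokenize(text) if word not in known_words)
    return errors / len(text)


def feature_vector(text):
    """Reduce an essay to a small numeric vector of measurable aspects."""
    words = tokenize(text)
    return [
        float(len(text)),                     # essay length in characters
        float(len(words)),                    # essay length in words
        spelling_errors_per_character(text),  # spelling errors per character
    ]


sample = "The meeting is on Mondy"
print(feature_vector(sample))
```

Normalizing by length keeps a long essay with a few errors from looking worse than a short essay with the same number of errors.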

Another important feature known as a "bag of words" is also generated. The bag-of-words model

is a simplifying representation used in natural language processing and information retrieval

(IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset)

of its words, disregarding grammar and even word order but keeping multiplicity. This results in

a vector with a length equal to the number of unique words in the largest essay evaluated.
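
A minimal bag-of-words sketch follows, using only the Python standard library. It assumes a vocabulary built from a small set of example texts; the function names and the example essays are hypothetical and are not taken from the scoring application.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Collect every unique word across the texts, in a fixed (sorted) order."""
    return sorted({word for text in texts for word in tokenize(text)})


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs, ignoring word order."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


essays = ["Plan the work and work the plan", "Dream big"]
vocab = build_vocabulary(essays)
vectors = [bag_of_words(essay, vocab) for essay in essays]
print(vocab)
print(vectors)
```

Because only counts are kept, "plan work" and "work plan" produce the same vector, which is the sense in which the model disregards word order but keeps multiplicity.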

It's helpful to understand the Bag of Words approach in terms of how it's used to filter out junk

email.

In Bayesian spam filtering, used by many spam filters, an e-mail message is modeled as an

unordered collection of words selected from one of two probability distributions: one

representing spam and one representing legitimate e-mail ("ham"). Imagine that there are two

literal bags full of words. One bag is filled with words found in spam messages, and the other

bag is filled with words found in legitimate e-mail. While any given word is likely to be found

somewhere in both bags, the "spam" bag will contain spam-related words such as "stock",

"Viagra", and "buy" much more frequently, while the "ham" bag will contain more words related

to the user's friends or workplace.

To classify an e-mail message, the Bayesian spam filter assumes that the message is a pile of

words that has been poured out randomly from one of the two bags, and uses Bayesian

probability to determine which bag it is more likely to be.
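
The two-bags idea can be sketched as a small naive Bayes classifier. This is an illustration of the spam-filtering analogy only, not part of the essay scorer; the training messages are invented, both labels are assumed to appear in the training data, and add-one (Laplace) smoothing is an assumption made so that unseen words do not produce a zero probability.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")


def train(messages):
    """Fill one 'bag' of word counts per label from (label, text) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the bag from which the message's words were more likely drawn."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label in LABELS:
        # Prior: how common this kind of message is overall.
        log_prob = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            log_prob += math.log(likelihood)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


training = [
    ("spam", "buy stock now"),
    ("spam", "buy cheap stock"),
    ("ham", "meeting with friends tomorrow"),
    ("ham", "lunch with the team"),
]
word_counts, label_counts = train(training)
print(classify("buy stock", word_counts, label_counts))
print(classify("lunch with friends", word_counts, label_counts))
```

Working in log probabilities avoids numerical underflow when many small word probabilities are multiplied together.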

Along with the bag of words feature, another feature is generated that represents how 'topical' the

essay is, by using the bag of words vector that was generated.

Once the features are generated, the application builds a model from all training essays and

their accompanying human-generated scores. The model is essentially a catalog of all feature

measurements for all of the training essays, along with their scores. Once created, this model can

then be used to determine where in the score space a new, unscored essay lies, based on its

feature measurements. In addition to score values, error values, which indicate how consistent

the training essay set was, can be calculated. This can provide a confidence value for the final

score.

The software uses a technique called Gradient Boosting Regression to estimate the score for a given essay, based on how the features of the new essay relate to the feature sets of the pre-scored or 'training' essays. This is a well-established machine learning technique that generally performs well on regression problems like essay scoring.
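To make the technique concrete, here is a minimal sketch of gradient boosting for regression. It assumes squared-error loss, one-split regression trees ("stumps"), and a single hypothetical feature (grammar errors per essay). The data, function names, and hyperparameters are invented for illustration; the manual does not specify the library or settings actually used.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean score, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate

def predict(model, x):
    """Sum the base score and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )

def mean_squared_error(model, xs, ys):
    """Average squared difference between predicted and actual scores."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical training data: grammar-error counts and human-assigned scores.
error_counts = [0, 1, 2, 3, 4, 5, 6, 7]
human_scores = [6, 6, 5, 5, 3, 3, 2, 2]
model = fit_gradient_boosting(error_counts, human_scores)
```

Each round corrects a fraction of the remaining error, so the training error shrinks gradually; a production system would use many features, deeper trees, and held-out essays to check that the model generalizes.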

How does it perform?

The best indication of how well the machine learning algorithm works is to measure how well it

predicts the score a human rater would have come up with for any given essay. To do this we can

evaluate the machine-human rater reliability.

Reliability is a critical aspect of any assessment. It describes whether the score is consistent, and

puts an upper limit on the validity of the assessment. We therefore analyzed how closely the

machine scores of the essays agree with the scores assigned by human essay raters.

A total of 1,214 essays (N=1,214) were scored using both human scoring and machine scoring. The correlation between the two sets of ratings was .73, representing a machine-human inter-rater reliability of .73, which indicates that the machine scoring rates the essays similarly to human raters. In the world of testing, a reliability value of .73 is generally considered more than acceptable.
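The machine-human agreement statistic described above is a Pearson correlation between two score lists. A minimal sketch follows; the six score pairs are hypothetical and are not drawn from the 1,214-essay data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)

# Hypothetical scores for the same six essays, for illustration only.
human_scores = [2, 3, 3, 4, 5, 1]
machine_scores = [2.4, 2.9, 3.6, 3.8, 4.5, 1.7]
agreement = pearson_r(human_scores, machine_scores)
```

A value near 1 means the machine orders and spaces the essays much as the human raters do; a value near 0 means the two sets of scores are unrelated.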

Therefore, the machine scoring of the HR Avatar essay test was demonstrated to be a reliable

method for scoring essays that is similar to human ratings, but significantly more efficient,

requiring little or no human time or effort to arrive at an assessment of a large number of

applicants’ writing skills.


Appendix G: Validity Evidence for HR Avatar Tests

Results for the HR Avatar High Potential Solution

February 23, 2016

Validation studies have been conducted to confirm that the HR Avatar High Potential Solution

predicts job performance. Two separate studies are reported below. The first compared performance

on the assessment to a dichotomous outcome measure. In the second study, scores on the

assessment were correlated with a 1-100 performance rating.

Study 1

Three companies were included in the first study of the ability of the HR Avatar assessment to

predict manager job performance. The three companies were Johnson & Johnson, Harte Hanks, and

Manulife. They were identified as leaders in the Philippines in terms of using best practices in

personnel management. Individually, their sample sizes were too small to conduct separate

validation studies, but together the number of participants was high enough to enable confidence in

the results.

Results are reported below. The managers received a job performance rating of either “high” or

“marginal.” The sample included 130 managers. It is likely that the performance measure attenuated

the correlation because it is only two levels, which limits the amount of variance, accuracy, and

consistency it can provide. However, it does provide a measure of performance. Based on the

literature review of the components in the HR Avatar Assessment, we expect validity to be very

strong, approximately .35 - .40 uncorrected, and in the .50 - .60 range when corrected for criterion

unreliability.

Table 1 presents the results of the correlation analysis. The overall score predicted job performance

significantly (r=.25; p<.01). Although we are pleased to see a statistically significant correlation that

approaches .30, we believe this is an underestimate of the actual validity, due to the two-level

performance measure. Among the subcomponents, Attention to Detail was particularly robust in

predicting performance (r=.22; p<.05), as was Adaptable (r=.21; p<.05).

The high/marginal performance rating probably has low reliability. If we assume a low criterion reliability of .40 when correcting for criterion unreliability, the validity increases to .63. We also applied a higher criterion reliability estimate of .60, a value typically used for well-developed multi-level performance ratings. This more conservative correction for attenuation yielded a corrected correlation of .42.
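For readers who want to reproduce the arithmetic, the sketch below applies the textbook correction for attenuation due to criterion unreliability, which divides the observed correlation by the square root of the criterion reliability. It also computes the simple ratio of the observed correlation to the reliability, because the figures reported above (.63 and .42) match that ratio rather than the square-root form. The function and variable names are illustrative.

```python
import math

def correct_for_criterion_unreliability(observed_r, criterion_reliability):
    """Textbook correction: observed r divided by sqrt(criterion reliability)."""
    return observed_r / math.sqrt(criterion_reliability)

observed_r = 0.25  # uncorrected validity reported in Table 1

results = {}
for reliability in (0.40, 0.60):
    square_root_form = correct_for_criterion_unreliability(observed_r, reliability)
    simple_ratio = observed_r / reliability  # reproduces the reported .63 and .42
    results[reliability] = (square_root_form, simple_ratio)
    print(f"r_yy={reliability:.2f}  r/sqrt(r_yy)={square_root_form:.3f}  "
          f"r/r_yy={simple_ratio:.3f}")
```

Under the square-root form, the corrected values are roughly .40 (reliability .40) and .32 (reliability .60), so the choice of formula materially affects the corrected validity.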


Further support for the assessment’s prediction of job performance was provided by t-test analyses, which indicated that the high-performing group had a significantly higher average overall score (M=63.16) on the assessment (p<.01) than the low-to-average (“marginal”) job performance group (M=54.31).

In addition to predicting job performance, various scores on the assessment were related to

Leadership Engagement and Leadership Aspiration, suggesting that the underlying constructs are

related to important attitudes for leadership potential. For example, Needs Structure, Develops

Relationships, and Corporate Citizenship were related to Leadership Engagement. Further, several

test component scores were related to Leadership Aspiration, including Innovative and Creative,

Enjoys Problem Solving, Develops Relationships, and Adaptable.

Table 1: Correlations between Test Components and Job Performance and Job Attitude Measures

Test Score                          Performance       Leadership   Leadership
                                    (High/Marginal)   Engagement   Aspiration
Overall Score                        .25**             .00          .16
Writing                             -.06              -.15          .09
Analytical Thinking                  .17              -.07         -.07
Attention to Detail                  .22*              .01          .09
Adaptable                            .21*              .10          .39**
Develops Relationships               .02               .35**        .50**
Enjoys Problem Solving               .16               .26**        .51**
Expressive and Outgoing             -.01              -.13          .02
Innovative and Creative              .18*              .30**        .67**
Needs Structure                      .02               .35**        .36**
Seeks Perfection                     .13               .24**        .36**
Frontline Management Fundamentals    .03               .03          .16
Corporate Citizenship                .04               .28**        .28**
Exhibits A Positive Work Attitude   -.04               .11          .06
Competitive                         -.01              -.07          .00

Notes. *=p<.05; **=p<.01. N=130.

Study 2

The second study was conducted for an organization called UCPB LEAP. The sample consisted of 64 managers who completed the HR Avatar assessment; their scores were compared to their previous year’s performance appraisal. Because the sample was small, we used a nonparametric form of correlation called Spearman’s rho, which indicates the extent to which the rank order on the test was similar to the rank order on the performance measure. The correlation was rho=.25, which was significant (p<.05). This supports the assertion that managers who scored higher on the assessment achieved higher performance ratings.
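The rank-order correlation used in Study 2 can be sketched as follows: convert each variable to ranks (averaging the ranks of tied values), then compute an ordinary Pearson correlation on the ranks. This is an illustrative sketch with hypothetical function names; it does not reproduce the study data.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)

def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))
```

Because only the rank order matters, any monotonic relationship yields a rho of 1 or -1 even when it is not linear, which makes the statistic less sensitive to outliers and to the scale of the performance ratings.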

Conclusion

Based on the two studies presented above, we can say with confidence that managers who score

higher on the assessment perform better on the job. The uncorrected correlation of .25 between

overall score and the high/marginal performance measure is significant. When corrected for

criterion unreliability using a conservative approach, the correlation becomes .42. The t-test results

support the same conclusion. More research is needed to build on the initial, yet promising results

described above. We expect that the results will demonstrate larger effect sizes and more robust

prediction of job performance when we are able to obtain measures of job performance that have

more variance and subjects are not simply placed into “high” and “marginal” categories. When we

have the time to adjust the scoring and weighting of the overall scores and have better criterion

measures, we believe it will be closer to .35 or more uncorrected, which would then yield a corrected

validity of approximately .60.
