
Page 1: Random and Non-Random Measurement Error in HRM Research: Measuring and Explaining Differences in Management-Employee Representative Responses in WERS2004

Random and Non-Random Measurement Error in HRM Research: Measuring and Explaining Differences in Management-Employee Representative Responses in WERS2004

RICCARDO PECCEI

Department of Management

King’s College London

Email: [email protected]

Tel: 0207-848 4094

The author acknowledges the Department of Trade and Industry, the Economic and Social Research Council, the Advisory, Conciliation and Arbitration Service and the Policy Studies Institute as the originators of the 2004 Workplace Employment Relations Survey data, and the Data Archive at the University of Essex as the distributor of the data. None of these organizations bears any responsibility for the author’s analysis and interpretation of the data.

**Please note that this is work in progress, results are preliminary**

Page 2

Rationale of the Study

1. Unquestionable value of large-scale surveys like WERS2004.

 

2. One potential problem with these surveys, however, is that they typically rely on the responses of a limited number of key informants in each unit to describe organisational properties of interest (e.g. what HR practices are in place).

3. But organisational members’ (e.g. managers’ and employee representatives’) perceptions of organisational properties are often subject to considerable error and distortion. It is not unusual for different respondents to disagree with each other and provide substantially different answers to the same survey items (Payne & Pugh, 1976; Starbuck & Mezias, 1996, 2000).

• Problems of measurement error due to low interrater agreement and reliability of this kind are well documented in the areas of OB and HRM (Gerhart et al., Ostroff & Schmidt, 1993).

• They have also been noted in relation to previous WERS surveys in terms of managers’ and employee representatives’ responses to a range of specific items (Cully & Marginson, 1995; Peccei & Benkhoff, 2001).

Page 3

Rationale of the Study

4. These problems, in turn, raise serious questions about the reliability and validity of measures used in analysis and, therefore, of study results more generally. As a result, they make cumulative research progress much more difficult to achieve.

5. These problems are particularly acute in research based on large-scale surveys using a single-respondent design format, i.e. where information about organisational properties of interest (e.g. HR practices) comes from a single key (commonly management) respondent.

• This is the case, for example, with most WERS-based HRM studies (Ramsay et al., 2000).

• Also true of much of the American survey-based strategic HRM research looking at the link between human resource (HR) practices and firm performance.

• Much of this research uses a single senior management respondent to rate and describe the HR practices of an entire organisation (Huselid, 1995; Delery and Doty, 1996; Huselid et al., 1997).

Page 4

Rationale of the Study

6. Despite the continued reliance of much HRM research on a single-respondent design format, there is substantial evidence to suggest that single respondent measures of HR practices have low reliability, i.e. they contain large amounts of rater-related measurement error (Gerhart et al., 2000; Wright et al., 2001). “Single raters provide unreliable measures of HR practices” (Wright et al., 2001: 899).

• In the HRM literature, measurement error due to raters, assessed in terms of interrater reliability, is commonly assumed to be random.

• Some researchers (e.g. Wright et al., 2001) have considered whether error due to raters in the measurement of HR practices may be patterned, rather than being purely random (whether interrater reliability may be lower, for example, in large than in small organisations).

• But little systematic research has been done in this area.

• As a result, little is known about the factors that might influence interrater reliability and the extent to which the measurement error that is commonly associated with single respondent measures of HR practices is random or non-random in nature.

Page 5

Rationale of the Study

7. This is a key issue since whether the error involved is random or systematic/patterned can make a major difference to the interpretation of observed links between HR practices and various outcomes of interest. In particular, it affects the corrections for attenuation due to measurement error that might be applied to observed regression and correlation coefficients for specific HR practices-outcomes relationships.

Page 6

Aims of the Study

1. Key aim is to contribute to this area of inquiry through a detailed analysis and comparison of management and employee representative responses to a set of 45 matched primary (i.e. non-filtered) questions in a sample of 459 WERS2004 establishments.

2. The specific aims are:

• To identify and describe the extent of interrater reliability, as well as agreement and bias, between management respondents (MR) and employee representative respondents (RR) across the set of matched WERS2004 survey items.

• To develop a general model of interrater reliability/agreement that is then used to explore the extent to which observed measurement error in the WERS2004 data is random or patterned.

• To consider the analysis and survey design implications of the results, both in general terms and in terms of WERS2004 in particular.

Page 7

Conceptualisation of Interrater Reliability, Agreement and Bias

1. There is an extensive statistical, psychometric and psychological literature dealing with issues of validity and reliability of measurement in the social sciences, including methods for the analysis of interrater agreement and reliability.

2. Following Agresti (1992), Uebersax (1992) and Bliese (2000), three main components or dimensions of interrater reliability/agreement can be distinguished:

(a) Interrater Reliability (ICC(1) and ICC(2))

• Interrater reliability refers to the degree of association of raters’ ratings, or the relative consistency of responses among raters (i.e. the extent to which raters’ ratings correlate with each other) (Bliese, 2000).

• When each target (e.g. organisational unit) is rated on a particular item/property by two or more raters (e.g. an MR and an RR), interrater reliability is most commonly assessed by means of two major forms of the intraclass correlation coefficient (ICC): the ICC(1) and the ICC(2) (or what Shrout & Fleiss (1979) refer to as the ICC(1,1) and the ICC(1,k) respectively – where k refers to the number of raters).

Page 8

Conceptualisation of Interrater Reliability, Agreement and Bias

(i) The ICC(1) can be interpreted as a measure of the reliability associated with a single assessment of the group mean.

• ICC(1) is an aggregate level measure based on the ratio of between to within mean squares across sets (pairs) of raters – i.e. it is a function of the extent of both within and between group variation in ratings.

• ICC(1) normally ranges from 0 to 1 and corresponds quite closely to the Pearson correlation coefficient between pairs of raters (0 = low interrater reliability / no association between pairs of raters, 1 = high interrater reliability / strong association between pairs of raters).

(ii) The ICC(2), on the other hand, provides an estimate of the reliability of the group means, i.e. of the reliability of an aggregate group level measure (Bartko, 1976; Bliese, 2000).

• ICC(2) is related to the ICC(1) but is mainly a function of within group variation in ratings and group size. It increases as a function of group size.

• ICC(2) also ranges from 0 to 1 (0 = low reliability of group mean, 1 = high reliability of group mean).
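As an illustration, both ICCs can be computed from one-way ANOVA mean squares. The following is a minimal Python sketch, not the study's actual code; the function name and synthetic usage are assumptions, but the formulas follow the one-way random-effects definitions in Bliese (2000):

```python
import numpy as np

def icc(ratings):
    """One-way random-effects ICC(1) and ICC(2) for a targets x raters
    matrix. Here each row would be one establishment rated by k = 2
    raters (one MR and one RR), as in the WERS2004 nested design."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)
    # Between- and within-target mean squares from a one-way ANOVA.
    ms_between = k * ((target_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - target_means[:, None]) ** 2).sum() / (n * (k - 1))
    # ICC(1): reliability of a single rating of the target.
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    # ICC(2): reliability of the target (group) mean rating.
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2
```

When the two raters in every pair give identical ratings, both coefficients equal 1; as within-pair disagreement grows relative to between-establishment variation, both fall towards 0.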

Page 9

Conceptualisation of Interrater Reliability, Agreement and Bias

(b) Interrater Agreement/Consensus

• Interrater agreement proper refers to the extent to which raters make essentially the same ratings.

• The most commonly used measure of within-group interrater agreement in the OB and HRM literature is James et al.’s (1984, 1993) rwg index.

• Here I use the rwg* index (Lindell et al., 1999), which is a variation of the rwg. For dichotomous measures, the rwg* is equivalent to the rwg, as well as to other unit-level agreement indexes that have been proposed in the literature (e.g. Burke et al.’s (1999) Average Deviation index).

• The rwg*, like the rwg, is an organisationally specific or unit level coefficient that essentially measures the degree of within-group variance in (or absolute level of disagreement between) raters’ scores. Separate rwg* estimates can be calculated for each pair of raters on each separate target/item that is being rated.

• The rwg* ranges from 0 to 1 (0 = high variance / no consensus in ratings, 1 = low variance / high consensus in ratings).

• At the aggregate level of analysis I also use the average raw MR and RR agreement scores across the set of 459 establishments, as well as Yule’s Q.

• For dichotomous items, Yule’s Q provides an aggregate omnibus measure of the degree of MR and RR agreement adjusted for chance. Yule’s Q ranges from -1 to +1 (0 = chance-level agreement, 1 = perfect positive agreement).
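For dichotomous items rated by one MR-RR pair per establishment, these aggregate agreement measures can be sketched as follows (illustrative Python, not the study's code; the rwg* convention used here, i.e. the pair variance with the n denominator against a uniform-null variance of .25, is an assumption under which each unit's rwg* is 1 for an agreeing pair and 0 for a disagreeing one, and the paper's exact formula may differ):

```python
import numpy as np

def agreement_stats(mr, rr):
    """% agreement, per-establishment rwg*, and Yule's Q for matched
    dichotomous (0/1) MR and RR ratings on a single item."""
    mr = np.asarray(mr, dtype=float)
    rr = np.asarray(rr, dtype=float)
    pct_agreement = float((mr == rr).mean())
    # rwg* per pair: 1 - (pair variance / uniform-null variance of .25).
    pair_var = (mr - rr) ** 2 / 4.0
    rwg_star = 1.0 - pair_var / 0.25
    # Yule's Q from the 2x2 MR-by-RR cross-classification.
    a = np.sum((mr == 1) & (rr == 1))  # both report "yes"
    b = np.sum((mr == 1) & (rr == 0))  # MR "yes", RR "no"
    c = np.sum((mr == 0) & (rr == 1))  # MR "no", RR "yes"
    d = np.sum((mr == 0) & (rr == 0))  # both report "no"
    yules_q = float(a * d - b * c) / float(a * d + b * c)
    return pct_agreement, rwg_star, yules_q
```

On this convention the mean rwg* across establishments reduces to the proportion of agreeing MR-RR pairs, while Yule's Q adjusts the same 2x2 table for chance agreement.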

Page 10

Conceptualisation of Interrater Reliability, Agreement and Bias

(c) Interrater Bias

• Rater bias refers to the tendency of given raters to make ratings that are generally higher or lower than those of other raters (Uebersax, 1988).

• This component, therefore, refers not to the absolute level of disagreement between raters (i.e. the absolute difference or variance in raters’ scores) as such, but to the direction of disagreement between them.

• The simplest measure of bias is just the difference in scores between two raters (positive or negative bias).

• Bias measures (i.e. difference scores) can be calculated at either unit or aggregate level. At the aggregate level, t-tests can then be used to assess the degree of difference between the mean scores of the two groups of raters.
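The bias calculation just described can be sketched as follows (an illustrative Python sketch, not the study's code; a positive mean difference would indicate that MRs rate higher than RRs on average):

```python
import numpy as np

def rater_bias(mr, rr):
    """Mean MR - RR difference score across establishments, plus a
    paired t statistic testing whether the mean difference is zero."""
    d = np.asarray(mr, dtype=float) - np.asarray(rr, dtype=float)
    n = d.size
    # Paired t: mean difference over its standard error.
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return d.mean(), t
```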

Page 11

Model of Interrater Agreement/Reliability

1. A number of factors have been suggested to affect the extent to which raters are likely to agree about particular targets/elements that they are rating, such as the contentiousness of the phenomenon involved (Green & James, 2003).

2. The best known model of rater agreement/consensus is that proposed by Kenny (1991) in the social psychological domain (see also Hoyt & Kerns, 1999).

3. In the present study I draw on this model but adapt it and extend it to the HRM domain, linked specifically to WERS2004.

Page 12

Model of Interrater Agreement/Reliability

4. Specifically, MR-RR interrater agreement/reliability in WERS2004 can be expected to be affected by five main factors (see Table 1 for details):

a) The nature of the attributes/items being rated (e.g. objective vs. subjective; HR practices vs. non-HR practices).

b) The nature (complexity, heterogeneity and stability) of the target system/organisation being rated (e.g. size, age and stability of the establishment).

c) The nature of the MR and RR raters, especially in terms of their knowledge and experience of the target system they are rating (e.g. their level of seniority, formal position and length of tenure in the establishment).

d) Relational factors designed to capture common/shared MR and RR experiences (e.g. frequency of contact between MR and RR raters, IiP accreditation).

e) Shared world view and understanding of MR and RR raters (e.g. whether RR are union or non-union representatives, extent of mutual trust between MR and RR).

Page 13

Table 1 – Model of Interrater Agreement/Reliability in WERS2004

Factors and their expected effect on MR & RR interrater agreement/reliability:

Nature of Items
• Subjective or semi-subjective items (0) vs. objective items (1): (+) Positive
• Non-HR practice item (0) vs. HR practice item (1): Indeterminate

Nature of Target/Establishment
• Establishment size (small, < 200 employees = 0) (large, 200+ employees = 1): (-) Negative
• Single/independent establishment (part of larger organisation = 0) (single = 1): (+) Positive
• Age of establishment (> 5 years old = 0) (up to 5 years old = 1): (-) Negative
• Organisational Shock Index (major changes at establishment over last 5 years) (few changes/high stability = 0) (many changes/low stability = 1): (-) Negative

Nature of MR & RR Raters
• MR & RR both senior informants (MR & RR not both senior = 0) (MR & RR both senior = 1): (+) Positive
• MR & RR both full-time on HR/IR (MR & RR not both full-time = 0) (MR & RR both full-time = 1): (+) Positive

Page 14

Table 1 continued – Model of Interrater Agreement/Reliability in WERS2004

Relational Factors
• High frequency of MR & RR contact (MR & RR meet < once a week = 0) (MR & RR meet at least once a week = 1): (+) Positive
• Establishment has attained IiP recognition (No = 0) (Yes = 1): (+) Positive

Shared Understanding
• MR & RR of the same sex (No = 0) (Yes = 1): (+) Positive
• RR is non-union employee representative (No = 0) (Yes = 1): (+) Positive
• MR & RR report high mutual trust (No = 0) (Yes = 1): (+) Positive

MR = Management respondent; RR = Employee representative respondent
Note that the effect of the nature of items on interrater agreement/reliability can only be tested at the aggregate level of analysis. In contrast, the effect of the other factors can, in principle, be tested at both the aggregate and the unit (i.e. establishment) level of analysis.

Page 15

Sample and Data

1. Establishments = 459 establishments for which both MR and RR data were available on a matched set of 45 questions in WERS2004.

2. Respondents = 918 respondents (459 pairs of MR and RR – one MR-RR pair per establishment).

3. Overall Rating Design = nested design (two raters – one MR and one RR – nested within each target/establishment; see Hoyt, 2000).

4. Questions = 45 non-filtered matched questions from the management and employee representative questionnaires in WERS2004. The majority of questions involved a dichotomous (yes/no) response format. A number of the questions, however, used categorical (non-continuous or non-interval) response scales (e.g. whether management at the establishment did not inform, informed, consulted, or negotiated with employee representatives over a range of specific issues). Response categories on all categorical scales were treated as separate dichotomous (yes/no) items, making for a total of 74 matched dichotomous items for use in the main analysis.

5. Total items used in the analysis = 74 matched dichotomous items based on the 45 dichotomous and categorical matched MR and RR questions.
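The recoding of categorical scales into separate yes/no items can be sketched as follows (an illustrative Python/pandas sketch with made-up data; the item name and response labels are hypothetical, not actual WERS2004 codes):

```python
import pandas as pd

# Hypothetical categorical item: how management deals with employee
# representatives on one issue, as reported by one set of respondents.
responses = pd.Series(
    ["negotiates", "informs", "consults", "does not inform", "consults"],
    name="voice_item",  # hypothetical item name
)

# Each response category becomes a separate dichotomous (1/0) item,
# which is how 45 matched questions expand into 74 dichotomous items.
dummies = pd.get_dummies(responses).astype(int)
```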

Page 16

Sample and Data

6. Type of Items Covered in the Analysis

a. HR Practices (47 items)

(i) Presence of formal ‘due process’ procedures (5 items) - (e.g. whether there are formal grievance and disciplinary procedures at the establishment)

(ii) Information-sharing practices (3 items) - (i.e. whether management shares information on a range of strategic issues at the establishment)

(iii) Representative voice practices (39 items) - (i.e. whether management informs, consults or negotiates with employee representatives on a range of 13 issues)

b. Events / HR Outcomes (25 items)

(i) Occurrence of various forms of industrial conflict at the establishment (10 items)

(ii) Threat of various forms of industrial action at the establishment (8 items)

(iii) Occurrence of various types of changes at the workplace (7 items) – (e.g. introduction of new technology)

Page 17

Sample and Data

c. Attitudes (2 items)

(i) Management more or less favourable to union membership at the establishment (2 items)

(see Table H1 in handout for a more detailed listing of items)

7. Dependent Variables (see above)

• Interrater Agreement Measures
a) % MR & RR agreement (aggregate level only)
b) Yule’s Q (aggregate level only)
c) rwg* (aggregate and unit level)

• Interrater Bias
a) MR-RR difference scores (aggregate level only)

• Interrater Reliability
a) ICC(1) (aggregate level only)
b) ICC(2) (aggregate level only)
c) Pearson correlation (aggregate level only)

8. Independent Variables (see Table 1 above)

Page 18

Results

A. Aggregate Level Analysis

1. Descriptives (Table H1 in handout and Tables 2a and 2b)

2. Correlations (Table 3)

3. Corrections for Attenuation (Table 4)

4. Regressions for items (Table 5)

5. Bivariate test of model (Table 6)

B. Unit/Establishment Level of Analysis

1. Multivariate test of model (Table 7)

C. Summary of Aggregate and Unit Level of Analysis

1. Test of model: Summary results (Table 8)

Page 19

Table 2a - Aggregate level analysis: Aggregate measures of interrater agreement and reliability - Mean scores by type of item (N = 74 items across 459 pairs of MR and RR raters in 459 establishments)

Type of Item/Variable                       % MR & RR   Yule's   Bias:   Pearson   ICC(1)   ICC(2)   Mean
                                            Agreement   Q        MR-RR   Corr.                       Rwg*
HRPs: Objective (8 items, A1-A8)               79%       .42      .04     .14       .09      .17     .84
HRPs: Semi-objective (39 items, B1-B39)        66%       .28      .01     .11       .10      .18     .65
Events: Objective (10 items, C1-C10)           96%       .68      .00     .34       .29      .39     .87
Events: Semi-objective (15 items, D1-D15)      73%       .48      .00     .18       .17      .27     .85
Attitudes: Subjective (2 items, E1-E2)         67%       .51      .04     .26       .26      .41     .67
Total HRPs (47 items, A1-B39)                  67%       .30      .02     .12       .10      .18     .69
Total Non-HRPs (27 items, C1-E2)               81%       .56      .00     .24       .22      .32     .84
Grand Total (74 items, A1-E2)                  72%       .40      .01     .16       .14      .23     .74

Bias = mean MR-RR difference; Pearson Corr. = Pearson correlation between MR & RR ratings
MR = Management respondent; RR = Employee representative respondent

Page 20

Table 2b – Aggregate level analysis: Comparison of mean scores by type of item for aggregate measures of interrater agreement and reliability

Comparison (see Table 2a)                           % MR & RR   Yule's   Bias:   Pearson   ICC(1)   ICC(2)   Mean
                                                    Agreement   Q        MR-RR   Corr.                       Rwg*
Difference between means of five types of items        ***       **       ns      ***       **       **      ***
Difference between means of HRP and non-HRP items      ***       ***      ns      ***       ***      **      ***

ns  Difference between means not significant at < .05 level
*   Difference between means significant at < .05 level
**  Difference between means significant at < .01 level
*** Difference between means significant at < .001 level

Page 21

Table 3 – Aggregate level analysis: Correlations between aggregate measures of interrater agreement and reliability (N = 74 items)

Aggregate Level Measure              1        2        3       4        5        6       7
1. % MR & RR Agreement             -----
2. Yule's Q                         .56***  -----
3. Bias: Mean MR-RR                -.33**   -.05     -----
4. Pearson Correlation MR & RR      .41***   .65***   .04    -----
5. ICC(1)                           .42***   .67***   .02     .99***  -----
6. ICC(2)                           .37**    .73***   .03     .97***   .98***  -----
7. Mean Rwg*                        .73***   .34**   -.26*    .19      .09      .06    -----

MR = Management respondent; RR = Employee representative respondent
* p < .05  ** p < .01  *** p < .001

Page 22

Table 4 – Correlations between selected outcomes and HR practice measures based on MR & RR respondents – Observed uncorrected correlations vs. correlations corrected for attenuation (average interrater error/unreliability scores)

Outcome Variables                                         MR HRP Scale       RR HRP Scale
Rate of absence at establishment                          -.11* (-.37***)a   -.12* (-.39***)
Rate of voluntary turnover at establishment               -.12* (-.38***)    -.04 (-.14*)
Establishment comparative financial performance            .02 (.08)         -.01 (-.03)
Establishment comparative productivity                    -.06 (-.22**)       .01 (.02)
Establishment comparative quality of products/services    -.05 (-.18**)       .00 (.00)

a Corrected correlations in parentheses.
MR = Management respondent; RR = Employee representative respondent
* p < .05  ** p < .01  *** p < .001

Assumptions:
(1) Average (interrater) reliability of HRP measure = .10 (see Table 2a)
(2) Average reliability of objective absence and turnover measures = 1.00
(3) Average reliability of financial, productivity and quality performance measures based on MR subjective assessments = .70
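The corrected coefficients in Table 4 follow the classical correction for attenuation, dividing each observed correlation by the square root of the product of the two measures' reliabilities. A minimal Python sketch under the stated assumptions (the function name is illustrative; small discrepancies with the table reflect rounding of the observed coefficients):

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation:
    r_corrected = r_observed / sqrt(reliability_x * reliability_y)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Turnover vs. MR HRP scale: observed -.12, HRP reliability .10,
# objective outcome reliability 1.00 -> corrected to about -.38.
corrected = disattenuate(-0.12, 0.10, 1.00)
```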

Page 23

Table 5 – Aggregate level regression analysis: Effect of type of item on aggregate measures of interrater agreement and reliability (N = 74 items)

                                          % MR & RR   Yule's    Bias:    Pearson    ICC(1)   ICC(2)   Mean
Independent Variables                     Agreement   Q         MR-RR    Corr.                        Rwg*
Non-HRP vs. HRP items (0-1)                -.29**a    -.37**     .09     -.38**     -.38**   -.35***  -.43***
Subjective & Semi-objective vs.
Objective items (0-1)                       .50***     .25*      .07      .26*       .16      .11      .31**
Total R2                                    .41***     .25***    .01      .27***     .20***   .15**    .34***
Total F                                   23.35***   10.81***    .37    12.07***    8.81***  6.46**  18.15***
Adjusted R2                                 .39***     .22***    .00      .24***     .18***   .13**    .32***
(N)                                        (70)       (69)      (74)     (70)       (74)     (74)     (74)

a Standardised beta coefficients
MR = Management respondent; RR = Employee representative respondent
* p < .05  ** p < .01  *** p < .001

Page 24

Table 6 – Aggregate level analysis: Bivariate test of rater agreement model – Comparison of mean correlations between MR & RR ratings for each independent variable by type of item

Each entry shows the mean correlation between MR & RR ratings for establishments in group (0) and group (1), the difference between the two means [(0) - (1)], and the number of items on which the comparison is based. Item Ns: HRP items = 47, non-HRP items = 27, all items = 74.

Sector: (0) Private [establishment N (%) = 216 (47%)]; (1) Public [243 (53%)]
  HRP items:      .15   .09    .06** (44)
  Non-HRP items:  .19   .24   -.05   (22)
  All items:      .16   .14    .02   (66)

Establishment Size: (0) Small [170 (37%)]; (1) Large [289 (63%)]
  HRP items:      .11   .12   -.01   (43)
  Non-HRP items:  .25   .22    .03   (23)
  All items:      .16   .16    .00   (66)

Single Establishment: (0) No (multiple) [421 (92%)]; (1) Yes (single) [38 (8%)]
  HRP items:      .12   .20   -.08*  (41)
  Non-HRP items:  .22   .34   -.12*  (20)
  All items:      .15   .24   -.09** (58)

Establishment Age: (0) Less than 5 years [34 (7%)]; (1) 5 or more years [425 (93%)]
  HRP items:      .06   .12   -.06*  (43)
  Non-HRP items:  .22   .19    .03   (15)
  All items:      .10   .14   -.04   (58)

Organisational Shocks: (0) Low shocks [240 (52%)]; (1) High shocks [219 (48%)]
  HRP items:      .14   .09    .05*  (44)
  Non-HRP items:  .25   .22    .03   (22)
  All items:      .18   .13    .05*  (66)

Both MR & RR are Senior Informants: (0) No [85 (19%)]; (1) Yes [374 (81%)]
  HRP items:      .14   .11    .03   (43)
  Non-HRP items:  .20   .25   -.05   (21)
  All items:      .16   .16    .00   (64)

Page 25

Table 6 Continued – Aggregate level analysis: Bivariate test of rater agreement model – Comparison of mean correlations between MR & RR ratings for each independent variable by type of item

Each entry shows the mean correlation between MR & RR ratings for establishments in group (0) and group (1), the difference between the two means [(0) - (1)], and the number of items on which the comparison is based.

Both MR & RR are Full-time: (0) No [establishment N (%) = 336 (73%)]; (1) Yes [123 (27%)]
  HRP items:      .10   .17   -.07*** (43)
  Non-HRP items:  .23   .19    .04    (23)
  All items:      .14   .18   -.04    (66)

Frequency of MR & RR Contact: (0) Low [208 (45%)]; (1) High [251 (55%)]
  HRP items:      .10   .12   -.02   (43)
  Non-HRP items:  .23   .26   -.03   (22)
  All items:      .14   .17   -.03   (65)

Establishment has IiP Recognition: (0) No [194 (42%)]; (1) Yes [265 (58%)]
  HRP items:      .11   .13   -.01   (44)
  Non-HRP items:  .23   .24   -.01   (24)
  All items:      .15   .17   -.02   (68)

MR & RR are of the Same Sex: (0) No [193 (42%)]; (1) Yes [266 (58%)]
  HRP items:      .12   .12    .00   (44)
  Non-HRP items:  .28   .22    .06   (23)
  All items:      .17   .15    .02   (67)

RR is Non-union Employee Representative: (0) No [430 (94%)]; (1) Yes [29 (6%)]
  HRP items:      .10   .07    .03   (34)
  Non-HRP items:  .17   .36   -.19*  (10)
  All items:      .11   .14   -.03   (44)

MR & RR Report High Mutual Trust: (0) No [330 (72%)]; (1) Yes [129 (28%)]
  HRP items:      .11   .12   -.01   (43)
  Non-HRP items:  .27   .29   -.02   (17)
  All items:      .16   .17   -.01   (60)

MR = Management respondent; RR = Employee representative respondent
* p < .05  ** p < .01  *** p < .001

Page 26

Table 7 – Establishment level analysis: Test of rater agreement model - Regression results for Rwg* by type of item and overall

Dependent variables (columns): Rwg* for (1) objective HRP items, (2) semi-objective HRP items, (3) objective events items, (4) semi-objective events items, (5) subjective attitude items, (6) all HRP items, (7) all non-HRP items, (8) all items.

Independent Variables                    1       2       3       4       5       6       7       8
Public sector                           .06a   -.14**  -.06     .11*    .06    -.13*    .09    -.08
Establishment size                      .15**   .03    -.03     .04    -.02     .07     .02     .07
Single independent establishment       -.16**   .11*   -.10*    .02    -.06     .07    -.04     .04
Establishment less than 5 years old    -.05    -.04     .08     .04     .09    -.05     .09*   -.01
Organisational shocks index             .01    -.08     .04     .12*   -.02    -.07     .10*   -.02
Both MR & RR are senior informants      .01     .02     .09*   -.05     .03     .02     .01     .03
Both MR & RR are full-time on HR        .03     .14**  -.01     .05    -.02     .14**   .03     .14**
High frequency of MR & RR contact       .06    -.01    -.00     .01    -.02     .00     .00     .00
Establishment has IiP recognition       .02     .04     .02     .01    -.05     .04    -.01     .04
MR & RR are of the same sex             .02    -.01    -.08    -.02     .01    -.01    -.04    -.02
RR is a non-union employee rep.        -.02     .09     .07     .08     .13**   .08     .14**   .13**
MR & RR report high mutual trust        .07    -.01     .14**  -.01     .09     .01     .07     .04
Total R2                                .07**   .07**   .07**   .04     .04     .06**   .05*    .06**
Total F                                2.72**  2.72**  2.70**  1.54    1.64    2.46**  1.99*   2.29**
Adjusted R2                             .04**   .04**   .04**   .01     .02     .04*    .03*    .03**
(N)                                    (459)   (459)   (459)   (459)   (459)   (459)   (459)   (459)

a Standardised beta coefficients
MR = Management respondent; RR = Employee representative respondent
* p < .05  ** p < .01  *** p < .001

Page 27

Table 8 – Test of interrater agreement model: Summary of aggregate level bivariate and establishment level multivariate results

Observed effects: columns 1-3 = aggregate level reliability analysis (mean MR & RR correlations for HRP, non-HRP and all items); columns 4-6 = establishment level agreement analysis (Rwg* for HRP, non-HRP and all items).

Independent Variables              Expected Effect        1      2      3      4      5      6
Public Sector                      Control (no predict.)  - **   - ns   + ns   - *    + ns   - ns
Establishment Size                 (-) Negative           + ns   - ns   - ns   + ns   + ns   + ns
Single Establishment               (+) Positive           + *    + *    + **   + ns   - ns   + ns
Establishment < 5 years old        (-) Negative           - *    + ns   - ns   - ns   + *    - ns
Organisational Shock Index         (-) Negative           - *    - ns   - *    - ns   + *    - ns
MR & RR both Senior Informants     (+) Positive           - ns   + ns   + ns   + ns   + ns   + ns
MR & RR both Full-time             (+) Positive           + ***  - ns   - ns   + **   + ns   + **
High Frequency of MR & RR Contact  (+) Positive           + ns   + ns   + ns   + ns   + ns   + ns
Establishment has IiP Recognition  (+) Positive           + ns   + ns   + ns   + ns   - ns   + ns
MR & RR of Same Sex                (+) Positive           - ns   - ns   - ns   - ns   - ns   - ns
RR is Non-union Employee Rep.      (+) Positive           - ns   + *    + ns   + ns   + **   + **
MR & RR Report High Mutual Trust   (+) Positive           + ns   + ns   + ns   + ns   + ns   + ns
Total R2                           NA                     NA     NA     NA     .06**  .05**  .06**

For detailed results see Table 6 and Table 7.
MR = Management respondent; RR = Employee representative respondent
+ = positive observed effect; - = negative observed effect
ns p > .05  * p < .05  ** p < .01  *** p < .001

Page 28

Conclusions

1. Interrater measurement error (i.e. error due to raters) in WERS2004 is not completely random. Rather, it is mildly patterned/predictable (e.g. it is greater for HRP items than for non-HRP items, greater in newer than in older establishments, etc.).

 

2. This patterning, however, is not very marked.

3. Therefore, treating rater error as random is not likely to significantly distort the corrections for attenuation that might be applied to the WERS2004 data (e.g. to estimated correlation and regression coefficients).

 

4. In principle, therefore, it is likely to be acceptable to use average or overall ICC(1) and ICC(2) values to correct for attenuation in WERS2004 data/results.

Page 29

Conclusions

5. But important to note two points in this context:

• First, more research is required to determine the extent of generalisability of the interrater reliability estimates (i.e. ICC(1) and ICC(2) values) obtained in the present study. The key question here is whether rater-related measurement error (and rater-related error structures more generally) is (a) rater-pair specific and/or (b) domain-, sub-domain- and item-specific.

• Second, even if the present interrater reliability estimates turn out to be generalisable, correcting for attenuation in multiple independent variables (i.e. in complex multivariate models using large numbers of predictors and control variables) is likely to prove extremely difficult, if not impossible, to do in practice.

Page 30

Conclusions

6. Implications of Results for:

 

(a) Analysis of WERS2004 data

• Correct for attenuation?

• Do nothing / business as usual?

• Other?

(b) Survey Design

• Increase number of raters/respondents?

• Have mixed designs?

• Other?

• Costs and benefits of different options?