the reliability of selected motion

Upload: arinlun

Post on 02-Jun-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/10/2019 The Reliability of Selected Motion

    1/8

    Manual Therapy 12 (2007) 7279

    Technical and measurement report

    The reliability of selected motion- and pain provocation tests

    for the sacroiliac joint

    Hilde Stendal Robinsona,, Jens Ivar Broxb, Roar Robinsonc, Elisabeth Bjellandc,Stein Solemc, Trym Teljec

    aSection for Health Science, University of Oslo, P.O. Box 1153, Blindern, N-0318 Oslo, NorwaybThe National Hospital, Orthopaedic Department, N-0027 Oslo, Norway

    cHans & Olaf Physiotherapy Clinic, Oslo, Norway

    Received 17 March 2004; received in revised form 3 June 2005; accepted 20 September 2005

    Abstract

    The objective of the study was to assess inter-rater reliability of one palpation and six pain provocation tests for pain of sacroiliac

    origin. The sacroiliac joint (SIJ) is a potential source of low back and pelvic girdle pain. Diagnosis is made primarily by physical

    examination using palpation and pain provocation tests. Previous studies on the reliability of such tests have reported inconclusive

    and conflicting results. Fifty-six women and five men aged 1850 years old were included in the study. Fifteen patients had

    ankylosing spondylitis; 30 women had post partum pelvic girdle pain for more than 6 weeks; and 16 people had no low back or

    pelvic girdle pain. All participants were examined twice on the same day by experienced manual therapists. Percentage agreement

    and kappa statistic were used to evaluate the tests reliability. Results showed percentage agreement and kappa values ranged from

    67% to 97% and 0.43 to 0.84 for the pain provocation tests. For the palpation test the percent agreement was 48% and the kappa

    value was 0.06. Clusters of pain provocation tests were found to have good percentage agreement, and kappa values ranged from

    0.51 to 0.75. In conclusion this study has shown the reliability of the pain provocation tests employed were moderate to good, andfor the palpation test, reliability was poor. Clusters out of three and five pain provocation tests were found to be reliable. The cluster

    of tests should now be validated for assessment of diagnostic power.

    r 2005 Elsevier Ltd. All rights reserved.

    Keywords: Sacroiliac joint; Pelvic girdle pain; Clinical tests; Reliability

    1. Introduction

    The sacroiliac joint (SIJ) as a source of back pain is a

    recurrent subject of controversy (Walker, 1992), but

    several authors state that the SIJ is a potential source forpain in the lumbar spine and buttock area (Potter and

    Rothstein, 1985; Shaw, 1992; Schwarzer et al., 1995;

    Mooney, 1997).

    The prevalence for SIJ dysfunction as a primary

    source of low back pain is reported from 0.4% ( Cyriax,

    1978) to 35% (Schwarzer et al., 1995) to 98% (Shaw,

    1992). This disparity is partly explained by the lack of

    valid criteria that prevalence can be judged by van der

    Wurff et al. (2000a). Although the SIJ is accepted as a

    source of pain, there is no general agreement concerning

    the different diagnostic tests and their reliability andvalidity. The primary diagnosis of sacroiliac pain is

    made by clinical history and physical examination.

    A wide variety of SIJ tests are available for detecting

    dysfunction, but no test seems to be superior to

    another. Many of the tests are influenced by various

    structures in the lower back, the hip joint and the soft

    tissues overlying the SIJ and consequently the tests lose

    their precision (Maigne et al., 1996). Furthermore,

    assessment and interpretation of the tests are often not

    ARTICLE IN PRESS

    www.elsevier.com/locate/math

    1356-689X/$ - see front matterr 2005 Elsevier Ltd. All rights reserved.

    doi:10.1016/j.math.2005.09.004

    Corresponding author. Tel.: +47 22 85 84 21; fax: +4722 85 84 11.

    E-mail address: [email protected] (H.S. Robinson).

    http://www.elsevier.com/locate/mathhttp://localhost/var/www/apps/conversion/tmp/scratch_2/dx.doi.org/10.1016/j.math.2005.09.004mailto:[email protected]:[email protected]://localhost/var/www/apps/conversion/tmp/scratch_2/dx.doi.org/10.1016/j.math.2005.09.004http://www.elsevier.com/locate/math
  • 8/10/2019 The Reliability of Selected Motion

    2/8

    standardized. However, it is necessary for test results to

    be both valid and reliable, since reliability alone is not

    sufficient to support the quality of a diagnostic test (van

    der Wurff et al., 2000b).

    Several studies have assessed inter-examiner reliability

    of tests for SIJ pain and dysfunction. These tests are

    divided into those that assess movement or position bypalpation (palpation tests) and those that stress the

    structure to reproduce the patients symptoms (pain

    provocation tests) (Laslett and Williams, 1994).

    Previous studies have reported that reliability is low

    for palpation tests and from poor to excellent for pain

    provocation tests (Potter and Rothstein, 1985; Laslett

    and Williams, 1994;Strender et al., 1997).

    A single test might not be sufficient for diagnosing SIJ

    pain and some authors have suggested the use of a

    cluster of tests (Haas, 1991; Cibulka and Koldehoff,

    1999; Kokmeyer et al., 2002; Riddle and Freburger,

    2002). Others doubt this will be better because it will be

    possible for two observers to disagree on single tests, yet

    be in agreement on the final conclusion (van der Wurff

    et al., 2000a; Freburger and Riddle, 2001). There must

    be an assumption that all the tests used in a cluster

    should have acceptable reliability.

    Van der Wurff et al. (2000a, b) published a review

    article on reliability studies. They found that the

    majority of studies (six out of 11) reported that the

    reliability of SIJ tests for mobility and pain provocation

    was low. There was a tendency for positive conclusions

    to be inversely proportional to the methodological score

    (van der Wurff et al., 2000a). Their recommendation for

    future studies was that the population should include acontrol group without SIJ dysfunction together with a

    group of patients with presumed SIJ dysfunction (van

    der Wurff et al., 2000b).

    Because of the inconclusive and conflicting results in

    previous studies of tests for SIJ pain, our study aimed to

    assess the inter-examiner reliability for seven commonly

    used clinical tests. We also included tests not previously

    evaluated. We included only one palpation test because

    previous studies have reported their poor reliability. To

    avoid confusion and discrepancies in the testing

    methods, we used standardized operational definitions

    and the tests were performed by experienced phy-

    siotherapists who are specialized in manual therapy

    (MT).

    2. Methods

    After approval by the Committee for Medical

    Research Ethics, Health Region I in Norway, subjects

    for participation were recruited consecutively from

    family physicians, rheumatologists, physiotherapists,

    obstetricians and midwives. The study included 61

    participants; women and men aged 1850 years old.

    The first group of participants were recruited among

    patients with ankylosing spondylitis (AS). They could

    have no ankylosis in the pelvic area and be without

    obvious kyphosis; the second group of patients were

    recruited among women with post partum pelvic girdle

    pain for more than 6 weeks; and the control group were

    healthy subjects with no low back pain, pelvic girdlepain or hip pain for the previous 3 months. The first and

    second groups were selected to increase the chance that

    the origin of their pain could be the SIJ.

    The following data was collected for each participant:

    age, gender, diagnosis, number of childbirths (groups 2

    and 3), duration of symptoms, former back pain, pain

    location using a pain drawing and pain intensity by

    using a visual analogue scale (VAS).

    The physiotherapists in this study, specialized in MT,

    had an average of 5.8 years of work experience after

    completing their MT education.

    2.1. Testing procedure

    Every participant was examined by two physiothera-

    pists, with a break of 1 h between examinations. One of

    the therapists examined all the participants; the second

    therapist was randomly selected from a pool of three

    therapists based on their availability for a given

    participant. The second assessor examined the partici-

    pant first or second according to randomisation. Both

    assessors examined each participant and both performed

    all the included tests. The therapists were blinded for the

    participants diagnosis, their pain history and pain

    drawing. The two assessors were also blinded to theresults of the other. In clinical practice it is important to

    reproduce the patients pain in an examination in order

    to determine the pain origin. In this study we have tried

    to take this into account, by asking the participant not

    only if they felt pain during testing but also if they could

    recognize the pain. During the examination we did not

    ask about pain localisation.

    For the pain provocation tests, the results were

    classified using a specially designed examination form

    recording pain in three categories. They were: concor-

    dant pain (reproduction of the patients pain), discor-

    dant pain (pain different from patients pain) and no

    pain. The results for the mobility test were also classified

    in three categories: (1) movement in right SIJ greater

    than movement in left SIJ; (2) movement in right SIJ less

    than movement in left SIJ and; (3) movement in right

    SIJ equal to movement in left SIJ. After each examina-

    tion the form was completed, placed in an envelope and

    sealed to avoid bias.

    2.2. Tests

    Tests 15 are passive pain provocation tests. Test 6 is

    a palpation test. Test 7 is an active pain provocation

    ARTICLE IN PRESS

    H.S. Robinson et al. / Manual Therapy 12 (2007) 7279 73

  • 8/10/2019 The Reliability of Selected Motion

    3/8

    test. The latter means that the provocation procedure is

    generated by the participants themselves:

    (1) Compression test: The subject lies on their side and

    the examiner stands in front of the subject. Pressure

    is applied into the pelvis by the examiner leaning his

    chest against the uppermost iliac crest. The test isassumed to stretch the posterior sacroiliac ligaments

    or compress the anterior part of the SIJ (Fig. 1).

    (2) Distraction or gapping test: The subject lies supine

    close to the side of the table. The examiner stands at

    the side of the table facing the subject. The examiner

    applies cross-armed pressure to the anterior superior

    iliac spines (ASIS) directed laterally. This procedure

    is assumed to stretch the anterior sacroiliac liga-

    ments.

    (3) Posterior pelvic pain provocation test (P4): The

    subject lies supine on the table. Standing at the side

    of the table, the examiner flexes the ipsilateral leg to

    approximately 901 hip flexion while knee remains

    relaxed. The examiner applies a graded force

    through the long axis of femur, there by causing a

    posterior shearing stress in the SIJ. Excessive

    adduction of the hip is avoided, because the

    combination of flexion and adduction normally is

    uncomfortable or painful to the patient.

    (4) Patrick Faber test: The subject lies supine on the

    table, and the examiner stands next to the subject.

    The examiner brings the subjects ipsilateral hip and

    knee into flexion and puts the heel against the knee

    of the other leg and then fixates the contralateral

    ASIS to ensure the lower back stays in a neutral

    position. The ipsilateral leg is then lowered against

    the table and the examiner applies a light over-

    pressure to the subjects knee. It is assumed thatboth the anterior sacroiliac ligament and the hip

    joint are stressed.

    (5) Bilateral internal rotation of the hip and unilateral

    internal rotation of the hip (Fig. 2): The subject lies

    prone on the table. In both tests the examiner

    internally rotates the femur at the hip joint and when

    reaching the end of the movement it is assumed that

    the SIJ is stressed.

    (6) Test of joint-play (palpation test): The subject lies

    prone. When testing the right SIJ the examiner

    stands by the left side of the table facing the

    subject. The examiners right hand lifts the right

    ilium (the hand under anterior spina iliaca

    superior) and the left index finger palpates the

    movement between the sacrum and ilium

    with the heel of that hand stabilising the sacrum.

    This test is assumed to test the passive range of

    motion (ROM) between the ilium and the sacrum

    (Magee, 1997).

    (7) Drop-test: This is an active pain provocation test.

    The subject performs a standardized motion which

    ARTICLE IN PRESS

    Fig. 1. Compression test.

    H.S. Robinson et al. / Manual Therapy 12 (2007) 727974

  • 8/10/2019 The Reliability of Selected Motion

    4/8

    may provoke the pain. Standing on one foot, the

    subject lifts the heel from the floor and drops down

    on the heel again (Fig. 3).

    Before the study started, the physiotherapists had

    training sessions to make sure that the tests were

    assessed and interpreted in the same manner.

    2.3. Statistical analysis

    Statistical analysis of the data was performed using

    the SPSS statistical package version 11.0. For each SIJ

    test on each side, the percentage agreement between the

    two therapists and the kappa agreement coefficient (k)

    with 95% confidence interval were calculated. The

    kappa coefficient effectively discounts the proportion

    of agreement that is expected by chance. K ranges in

    values between 1 and +1. Positive values signify

    agreement better than chance; 0 denotes agreement no

    better than chance; and negative value signify agreement

    worse than chance (Altman, 1991).

    We also looked at two clusters including the pain

    provocation tests with the best kappa values, and

    calculated percentage agreement and kappa for the

    clusters for each side separately. The following five tests

    were included in cluster 1: the distraction test, P4,

    PatrickFaber, bilateral internal rotation and one-sided

    internal rotation. We calculated a sum score for

    examination 1 and for examination 2 for each side

    separately. Maximum score was 5 on each side (five

    positive tests). Agreement on a case definition was based

    on at least three/four positive tests out of five. Cluster 2

    included 3 tests (P4, PatrickFaber and internal rota-

    tion) and case definition was two positive tests out of

    three. The sum score was calculated as for cluster 1, with

    a maximum score 3.

    3. Results

    Sixty-one people (56 women and five men) with a

    mean age 31.6 years participated in the study. Fifteen

    patients had AS and low back pain. They had no

    ankylosis in the pelvic area and were without obvious

    kyphosis; 30 women had post partum pelvic girdle pain

    for more than 6 weeks; and 16 subjects (the control

    group) did not report low back pain, pelvic girdle pain

    or hip pain for the previous 3 months. Table 1presents

    the demographic data for the participants. The patients

    in the AS group (15) and post partum group (30) had

    experienced their symptoms for a median period of 82

    and 11 months, respectively. All the patients had

    reported pain in the pelvic and low back region on pain

    drawings. All participants were examined twice, and

    there were no dropouts. One participant was excluded

    due to language problems.

    The percentage agreement for each pain provocation

    test was between 67% and 97% and the kappa

    coefficient from moderate to good with k0:4320:84.

    ARTICLE IN PRESS

    Fig. 2. Bilateral internal rotation of the hip joint.

    H.S. Robinson et al. / Manual Therapy 12 (2007) 7279 75

  • 8/10/2019 The Reliability of Selected Motion

    5/8

    The percentage agreement was 48% and the kappa was

    0.06 for the joint-play test.

    When we merged discordant pain and no pain, and

    repeated the analysis, the percentage agreement in-

    creased to between 74% and 97% and kappa also

    increased for all tests to k 0:4820:88. The exceptions

    were compression test right side and the drop test

    (Table 2).

    For the case definition (agreement on at least three

    out of five tests) in cluster 1, it was 7690% agreement

    and kappa was between 0.51 and 0.75. For cluster 2 it

    was 8991% agreement and kappa between 0.69 and0.75 (Table 3). The healthy controls were correctly

    identified by both clusters.

    4. Discussion

    The reliability of the pain provocation tests in this

    study is acceptable. Reliability can be influenced by

    three factors: the participants, the therapists and the

    clinical tests. The participants were recruited from

    patients seen by doctors, physiotherapists and midwives,

    and can be seen as a consecutive sample. The observed

    difference concerning sex and age in the groups is

    considered of little importance. We attempted to have

    only women in the AS group, but we were not able to

    recruit the desired number of women with AS. The

    overall prevalence of AS is reported from 0.1% to 1.4%.

    The ratio between men and women is reported 23:1

    (Gran and Husby, 1993, 1998). According to a recent

    study there is no important clinical or radiological

    gender difference reported (Ryall and Helliwell, 1998).

    The control group has a matched number of male

    participants. The AS patients selected had no ankylosis

    in the pelvic area and they had no obvious kyphosis to

    disturb the blinding of the therapists.While preparing the study we discussed how to

    interpret the participants responses to the tests. Did

    the participant recognize the pain (concordant), or was

    it another pain (discordant)? This is similar to what we

    are doing in clinical practice. We are attempting to

    reproduce the patients actual pain on examination. Still

    the subjective nature of pain may confound the

    ARTICLE IN PRESS

    Table 1

    Demographic data

    Demographic data Ankylosing spondylitis n 15 Post partum pelvic painn 30 No back painn 16

    Age, mean (range) 34.2 (2550) 31.2 (2543) 29.6 (2434)

    Women (%) 12 (80) 30 (100) 14 (87)

    Parity 1 17 10

    2 11 1

    3 2 0

    Age in months, youngest child, mean (range) 6.3 (1.520) 5.7 (2.512)

    Symptoms in earlier pregnancies (%) 6 (20) 1 (6.25)

    Previous back pain (%) 14 (93) 15 (50) 6 (37.5)

    Pain drawing in SIJ area (%) 12 (80) 22 (73) 0 (0)

    Low back/pelvic area on pain drawing (%) 15 (100) 30 (100) 1 (6.25)

    Period of pain in months, mean (range) 82 (6252) 11 (424) 0 (0)

    Mean of VAS in mm (SD) 47 (20.1) 36 (21.4) 0 (0)

    Number of areas on pain drawing, mean (range) 14.6 (638) 7 (118) 1.5 (09)

    Fig. 3. Drop-test.

    H.S. Robinson et al. / Manual Therapy 12 (2007) 727976

  • 8/10/2019 The Reliability of Selected Motion

    6/8

    interpretation of the test results (Saal, 2002). The

    research setting, with its restrictions, made it difficult

    to communicate about the patients pain. This may also

    have influenced the interpretation and thereby the

    reliability.

    The order of the therapists on examination was

    randomized in the present study and therefore did notinfluence the results. None of the pairings of therapists

    did better than the others, but one pair had inferior

    results on two tests. They disagreed more on the

    PatrickFaber test and on the compression test right

    side. These disagreements most likely contributed to a

    lower kappa on these tests compared with the others.

    During the training sessions prior to the study, there was

    no disagreement on performance and interpretation of

    the tests. The most obvious reason for this difference is

    that this couple examined fewer participants than the

    other pairings.

    When discordant pain and no pain were merged into

    no pain, the percentage agreement and kappa increased

    markedly for every test except for the compression test

    right side and the drop test. For the drop test the

    number of patients reporting pain was small compared

    with the other tests. For the compression test on right

    side none of the participants reported discordant pain.

    This most likely explains the difference for the two tests.

    Our findings are in agreement with other studies

    reporting acceptable reliability for the P4 test (Laslett

    and Williams, 1994; Dreyfuss et al., 1996) and we

    confirm the high reliability reported by Ostgaard et al.

    (1994). We also found acceptable reliability for thecompression and distraction test. Previous studies have

    reported divergent results ranging from poor to

    acceptable for these tests. McCombe et al. (1989)

    reported poor reliability for both compression and

    distraction test, most likely because of differences in

    examination technique (McCombe et al., 1989). The side

    differences and the wide confidence intervals reported

    for the compression test and drop test in the present

    study suggest that the estimated kappa values of the best

    side for these tests are not good indicators of reliability.

    Therefore, an alternative interpretation of the variation

    reported is that these tests do not have acceptable

    reliability. Strender et al. (1997) conclude that the

    compression test is not reliable (Strender et al., 1997),

    whileLaslett and Williams (1994)report good reliability

    for both tests, with an agreement of 88% and kbetween

    0.69 and 0.73 (Laslett and Williams, 1994).

    In former studies most researchers have used a

    population of back patients when evaluating SIJ tests

    (Potter and Rothstein, 1985;Herzog et al., 1989;Dreyfuss

    et al., 1994; Laslett and Williams, 1994; Schwarzer et al.,

    1995; Maigne et al., 1996; Laslett, 1997; Broadhurst and

    Bond, 1998; Slipman et al., 1998; Cibulka and Koldehoff,

    1999; Freburger and Riddle, 1999; Levangie, 1999;

    Meijne et al., 1999). Some researchers have studied thetests in pregnant or postpartum women (Ostgaard et al.,

    1994; Wormslev et al., 1994; Kristiansson and Svardsudd,

    1996; Albert et al., 2000; Mens et al., 2001, 2002). In the

    present study we included two patient groups (patients with

    AS and women with post partum pelvic girdle pain)

    together with a healthy control group, to increase the

    chance that SIJ might be the origin for pain. This is in

    accordance with former recommendations for studies on

    SIJ tests (van der Wurff et al., 2000b).

    ARTICLE IN PRESS

    Table 2

    Inter-examiner agreement on SIJ tests when no pain and discordant

    pain is merged

    n 61 % Agreement Kappa 95% CI

    Compression test right side 82 0.48 0.180.78

    Compression test left side 88 0.67 0.430.91Distraction test 82 0.63 0.430.83

    P4 right side 84 0.76 0.480.86

    P4 left side 87 0.74 0.570.91

    PatrickFaber test right side 80 0.60 0.390.81

    PatrickFaber test left side 74 0.48 0.270.69

    Bilateral internal rotation 79 0.56 0.330.79

    Internal rotation right side 90 0.78 0.600.94

    Internal rotation left side 89 0.88 0.751.01

    Drop test right side 97 0.84 0.611.06

    Drop test left side 88 0.47 0.110.83

    Table 3

    Cluster reliability results

    Case definition n 61 n 45 (only patients)

    % Agreement Kappa 95% CI % Agreement Kappa 95% CI

    Cluster 1 right side 3 of 5 80 0.60 0.400.80 76 0.51 0.260.76

    Cluster 1 left side 3 of 5 85 0.69 0.510.87 80 0.60 0.360.84

    Cluster 1 right side 4 of 5 90 0.71 0.490.93 87 0.68 0.440.92

    Cluster 1 left side 4 of 5 90 0.75 0.560.94 87 0.71 0.490.93

    Cluster 2 right side 2 of 3 91 0.71 0.470.95 89 0.69 0.440.94

    Cluster 2 left side 2 of 3 91 0.75 0.540.96 89 0.72 0.490.95

    Cluster 1: Distraction test, P4 left or right, PatrickFaber left or right, bilateral internal rotation, internal rotation left or right. Cluster 2: P4,

    PatrickFaber and internal rotation left or right, respectively.

    H.S. Robinson et al. / Manual Therapy 12 (2007) 7279 77

  • 8/10/2019 The Reliability of Selected Motion

    7/8

    We have chosen to present test results for the left and

    right side separately, in contrast to what other authors

    have done. We have used the statistically recommended

    method because the tests on the left and right sides are

    not independent in an individual patient (Altman, 1991).

    Although some authors have suggested that motion

    and position palpation tests are reliable, (Herzog et al.,1989; Cibulka and Koldehoff, 1999) there is little

    evidence to support this view. In our study kappa was

    negative for the joint play test (0.06). This is in

    agreement with previous studies from Potter and

    Rothstein (1985) and van Deursen et al. (1990)

    demonstrating poor reliability for palpation tests,

    although they did not use the same test (Potter and

    Rothstein, 1985;van Deursen et al., 1990).

    In recent years several authors have recommended a

    multi-test regimen for evaluation of SIJ pain (Haas,

    1991; Cibulka and Koldehoff, 1999; Kokmeyer et al.,

    2002; Riddle and Freburger, 2002). Cibulka and

    Koldehoff (1999)used only palpation tests and classified

    the tests as positive and negative, regardless of what side

    the SIJ dysfunction was (Cibulka and Koldehoff, 1999).

    As previously discussed it is recommended to calculate

    kappa for each side (Altman, 1991). Riddle and

    Freburger (2002) used the same four tests as Cibulka,

    but concluded differently. They found poor reliability

    for the cluster, with a slightly higher kappa on single

    tests (Riddle and Freburger, 2002).

    Palpation is considered an important tool in MT, but

    it seems difficult to gain reliable inter-examiner agree-

    ment. A valid diagnostic test procedure requires reliable

    tests. (Herzog et al., 1989; van der Wurff et al.,2000a, b). Other factors influence validity, but the aim

    of the present study was to evaluate reliability.

    Kokmeyer et al. (2002)used a regimen of five SIJ pain

    provocation tests, and the threshold for a positive

    selection was set at three positive tests. Using this

    definition, only seven out of 78 subjects had positive

    tests identified by both examiners. Weighted kappa was

    0.70 (Kokmeyer et al., 2002). They also calculated

    kappa for each individual test (distraction test, compres-

    sion test, Gaenslen test, Patrick test and Thigh thrust

    (P4) test) and reported kappa between 0.46 and 0.67,

    which is in agreement with our results on single tests.

    Kokmeyer et al. (2002) used a cluster to make a

    diagnosis in each patient (Kokmeyer et al., 2002). In

    the present study the main purpose was to examine the

    reliability for single tests and no diagnoses were made by

    the examiners. When we calculated kappa for a cluster

    of tests, we looked at each side separately and the

    agreement on case definition was based on three positive

    tests out of five in cluster 1. This is in accordance with

    what clinicians do in their daily practice when diagnostic

    decisions are judged from the results of several tests.

    Kappa was good (0.600.75) (Altman, 1991), which is in

    accordance with the results from Kokmeyer et al.

    (2002). The good kappa results should be linked to the

    case definition and the number of cases. When three

    positive tests out of five were required we found

    agreement on 18 and 20 cases for the right and left

    sides, respectively. This is a higher number of cases than

    in former studies, probably because of the selected

    population. We also looked at a case definition of fourpositive tests out of five for cluster 1, and found kappa

    to be good also for this definition (0.680.75) (Altman,

    1991). The difference of clusters 1 and 2 is that the latter

    consists of three one-sided tests only (P4, PatrickFaber

    and internal rotation) and case definition was set on two

    positive tests out of three in cluster 2. Kappa was also

    good for this cluster (0.690.75) (Altman, 1991). We

    found agreement on case definition for 8 and 10 cases on

    right and left sides, respectively, when using cluster 2.

    This is fewer than when using cluster 1. The asympto-

    matic individuals were correctly identified by both

    clusters. The main difference when using the two clusters

    is on the number of cases identified.

    The validity of several tests included some of the tests

    included in this present study have been reported

    previously (Dreyfuss et al., 1996; Slipman et al., 1998).

    There is still no agreement about the gold standard for

    SIJ pain. Anaesthetic blocks have been used, but seem to

    investigate intra-articular sources of pain, thus neglect-

    ing the structures surrounding the SIJ (van der Wurff

    et al., 2000b). The pain provocation tests claim to load

    the whole SIJ complex including the surrounding

    structures. If this is the case such tests should be

    classified as provocation tests for pelvic girdle pain, but

    a gold standard which takes this into consideration isstill not available (van der Wurff et al., 2000b;

    Kokmeyer et al., 2002). Our aim was to assess reliability.

    Results were good for a few individual tests and for a

    medium and a small cluster of tests, but poor for the

    commonly used palpation tests. Future studies should

    assess the validity of individual tests and especially

    clusters of tests. In the present study the examiners were

    physiotherapists experienced in MT. They were working

    in the same setting, attempting to optimize agreement by

    practicing the skills of the required procedures. Thus,

    less agreement is expected in an ordinary clinical setting

    or between various medical specialists and procedures.

    5. Conclusion

    Among experienced therapists, reliability was moder-

    ate to good for all the pain provocation tests and poor

    for the palpation test (joint play). For the compression

    test and the drop test the confidence intervals were wider

    and the side difference larger than for the other tests. In

    clinical practice we usually make conclusions based on

    the results of several tests. The clusters of three and five

    pain provocation tests used in this study showed good

    ARTICLE IN PRESS

    H.S. Robinson et al. / Manual Therapy 12 (2007) 727978

  • 8/10/2019 The Reliability of Selected Motion

    8/8

    reliability, although further studies are needed to assess

    their validity.

    Acknowledgment

    The authors would like to thank Professor NinaVollestad at Section for Health Science, University of

    Oslo for valuable advice on the statistics in this study.

    References

    Altman DG. Practical statistics for medical research. London: Chap-

    man & Hall; 1991.

    Albert H, Godskesen M, Westergaard J. Evaluation of clinical tests

    used in classification procedures in pregnancy-related pelvic joint

    pain. European Spine Journal 2000;9(2):1616.

    Broadhurst NA, Bond MJ. Pain provocation tests for the assessment

    of sacroiliac joint dysfunction [see comments]. Journal of Spinal

    Disorders 1998;11(4):3415.

    Cibulka MT, Koldehoff R. Clinical usefulness of a cluster of sacroiliac

    joint tests in patients with and without low back pain. Journal of

    Orthopaedic and Sports Physical Therapy 1999;29(2):839.

    Cyriax J. Textbook of orthopaedic medicine, 7th ed. London: Cassell

    Ltd; 1978.

    Dreyfuss P, Dryer S, Griffin J, Hoffman J, Walsh N. Positive sacroiliac

    screening tests in asymptomatic adults. Spine 1994;19(10):113843.

    Dreyfuss P, Michaelsen M, Pauza K, McLarty J, Bogduk N. The value

    of medical history and physical examination in diagnosing

    sacroiliac joint pain [see comments]. Spine 1996;21(22):2594602.

    Freburger JK, Riddle DL. Measurement of sacroiliac joint dysfunc-

    tion: a multicenter intertester reliability study. Physical Therapy

    1999;79(12):113441.

    Freburger JK, Riddle DL. Using published evidence to guide the

    examination of the sacroiliac joint region. Physical Therapy2001;81(5):113543.

    Gran JT, Husby G. The epidemiology of ankylosing spondylitis.

    Seminars in Arthritis and Rheumatism 1993;22(5):31934.

    Gran JT, Husby G. Clinical, epidemiologic, and therapeutic aspects of

    ankylosing spondylitis. Current Opinion in Rheumatology

    1998;10(4):2928.

    Haas M. Interexaminer reliability for multiple diagnostic test regi-

    mens. Journal of Manipulative and Physiological Therapeutics

    1991;14(2):95103.

    Herzog W, Read LJ, Conway PJ, Shaw LD, McEwen MC. Reliability

    of motion palpation procedures to detect sacroiliac joint fixations.

    Journal of Manipulative and Physiological Therapeutics

    1989;12(2):8692.

    Kokmeyer DJ, van der Wurff P, Aufdemkampe G, Fickenscher TC.

    The reliability of multitest regimens with sacroiliac pain provoca-tion tests. Journal of Manipulative and Physiological Therapeutics

    2002;25(1):428.

    Kristiansson P, Svardsudd K. Discriminatory power of tests applied in

    back pain during pregnancy. Spine 1996;21(20):233743.

    Laslett M. Pain provocation sacroiliac tests: reliability and prevalence.

    In: Vleeming A, et al., editors. Movement, stability & low back

    pain. The essential role of the pelvis. 1st ed. Edinburgh: Churchill

    Livingstone; 1997. p. 28795.

    Laslett M, Williams M. The reliability of selected pain provocation

    tests for sacroiliac joint pathology. Spine 1994;19(11):12439.

    Levangie PK. Four clinical tests of sacroiliac joint dysfunction: the

    association of test results with innominate torsion among patients

    with and without low back pain. Physical Therapy 1999;79(11):

    104357.

    Magee DJ. Orthopaedic physical assessment. 3rd ed. Philadelphia:

    W.B. Saunders Company; 1997.

    Maigne JY, Aivaliklis A, Pfefer F. Results of sacroiliac joint double

    block and value of sacroiliac pain provocation tests in 54 patients

    with low back pain. Spine 1996;21(16):188992.

    McCombe PF, Fairbank JC, Cockersole BC, Pynsent PB. Volvo

    Award in clinical sciences. Reproducibility of physical signs in low-

    back pain. Spine 1989;14(9):90818.Meijne W, van Neerbos K, Aufdemkampe G, van der WP.

    Intraexaminer and interexaminer reliability of the Gillet test.

    Journal of Manipulative and Physiological Therapeutics

    1999;22(1):49.

    Mens JM, Vleeming A, Snijders CJ, Koes BW, Stam HJ. Reliability

    and validity of the active straight leg raise test in posterior pelvic

    pain since pregnancy. Spine 2001;26(10):116771.

    Mens JM, Vleeming A, Snijders CJ, Ronchetti I, Stam HJ. Reliability

    and validity of hip adduction strength to measure disease severity

    in posterior pelvic pain since pregnancy. Spine 2002;27(15):16749.

    Mooney V. Sacroiliac joint dysfunction. In: Vleeming A, Mooney V,

    Dorman T, Snijders C, Stoeckart R, editors. Movement, stability &

    low back pain. The essential role of the pelvis. 2nd ed. Edinburgh:

    Churchill Livingstone; 1997. p. 3752.

    Ostgaard HC, Zetherstrom G, Roos-Hansson E. The posterior pelvicpain provocation test in pregnant women. European Spine Journal

    1994;3(5):25860.

    Potter NA, Rothstein JM. Intertester reliability for selected clinical

    tests of the sacroiliac joint. Physical Therapy 1985;65(11):16715.

    Riddle DL, Freburger JK. Evaluation of the presence of sacroiliac

    joint region dysfunction using a c ombination of tes ts: a multicenter

    intertester reliability study. Physical Therapy 2002;82(8):77281.

    Ryall NH, Helliwell PS. A critical review of ankylosing spondylitis.

    Critical Reviews in Physical and Rehabilitation Medicine

    1998;10(3):265301.

    Saal JS. General principles of diagnostic testing as related to painful

    lumbar spine disorders: a critical appraisal of current diagnostic

    techniques. Spine 2002;27(22):253845.

    Schwarzer AC, Aprill CN, Bogduk N. The sacroiliac joint in chronic

    low back pain. Spine 1995;20(1):317.

    Shaw JL. The role of the sacroiliac joint as a cause of low back pain

    and dysfunction. In: Vleeming A, Mooney V, Snijders C, Dorman

    T, editors. First interdisciplinary world congress on low back pain

    and its relation to the sacroiliac joint, San Diego, CA, 1992.

    p. 6780.

    Slipman CW, Sterenfeld EB, Chou LH, Herzog R, Vresilovic E. The

    predictive value of provocative sacroiliac joint stress maneuvers in

    the diagnosis of sacroiliac joint syndrome. Archives of Physical

    Medicine and Rehabilitation 1998;79(3):28892.

    Strender LE, Sjoblom A, Sundell K, Ludwig R, Taube A. Inter-

    examiner reliability in physical examination of patients with low

    back pain. Spine 1997;22(7):81420.

    van der Wurff P, Hagmeijer RH, Meyne W. Clinical tests of the

    sacroiliac joint. A systemic methodological review. Part 1:

    reliability. Manual Therapy 2000a;5(1):306.

    van der Wurff P, Meyne W, Hagmeijer RH. Clinical tests of the

    sacroiliac joint. Manual Therapy 2000b;5(2):8996.

    van Deursen LLJM, Patijn AL, Ockhuysen AL, Vortman BJ. The

    Value of some clinical tests of the sacroiliac joint. Journal of

    Manual Medicine 1990;5:969.

    Walker JM. The sacroiliac joint: a critical review. Physical Therapy

    1992;72(12):90316.

    Wormslev M, Juul AM, Marques B, Minck H, Bentzen L, Hansen

    TM. Clinical examination of pelvic insufficiency during pregnancy.

    An evaluation of the interobserver variation, the relation between

    clinical signs and pain and the relation between clinical signs and

    physical disability. Scandinavian Journal of Rheumatology

    1994;23(2):96102.

    ARTICLE IN PRESS

    H.S. Robinson et al. / Manual Therapy 12 (2007) 7279 79