Name of department: Spine Unit, Department of Orthopedic Surgery, Rigshospitalet,

Copenhagen University Hospital

Blegdamsvej 9, 2100, Copenhagen, Denmark

Author: Tanvir Johanning Bari

Title and subtitle: Radiographic parameters in Adult Spinal Deformity

– Reproducibility and prediction of failure

Academic advisors: Principal supervisor

Associate Professor Martin Gehrchen, MD, PhD, Head of Spine Unit

Spine Unit, Department of Orthopedic Surgery, Rigshospitalet,

Copenhagen University Hospital

Primary co-supervisor

Professor Benny Dahl, MD, PhD, DMSc, Director of Spine Program

Department of Orthopedics and Scoliosis Surgery, Texas Children’s

Hospital & Baylor College of Medicine, Houston, TX, USA

Co-supervisor

Associate Professor Rachid Bech-Azeddine, MD, PhD,

Center for Rheumatology and Spine Diseases,

Rigshospitalet Glostrup, Copenhagen University Hospital

Co-supervisor

Dennis Winge Hallager, MD, PhD,

Spine Unit, Department of Orthopedic Surgery, Rigshospitalet,

Copenhagen University Hospital

Assessment committee: Chairman

Professor Jes Bruun Lauritsen, MD, PhD, DMSc,

Department of Orthopedic Surgery, Bispebjerg Hospital,

Copenhagen University Hospital

Assessor representing International research

Associate Professor, Dominique André Rothenfluh, MD, PhD,

Consultant Spinal Surgeon, Senior Research Fellow,

Oxford University Hospitals NHS Foundation Trust

Assessor representing Danish research

Associate Professor Stig Mindedahl Jespersen, MD, PhD,

Department of Orthopedic Surgery, Odense University Hospital

Table of contents

ACKNOWLEDGMENTS
ABBREVIATIONS AND CONCEPTS
SUMMARY
DANSK RESUMÉ (DANISH SUMMARY)
LIST OF PAPERS
INTRODUCTION
BACKGROUND
    Sagittal balance
    Compensating for imbalance
    Treatment
    Complications and Adverse events
    Classification systems
    Reproducibility
OBJECTIVES AND HYPOTHESES
    Study I: Roussouly Classification
    Study II: Spinopelvic parameters
    Study III: GAP-score
METHODS
    Overall study design
    Patient inclusion
    Data collection
    Radiographic measurements
    Reproducibility measurements
    Outcomes
    Statistics
RESULTS
    Study I
    Study II
    Study III
DISCUSSION
    The Roussouly Classification
    Measurement accuracy of sagittal radiographic parameters
    Predicting mechanical failure using the GAP-score
LIMITATIONS
PERSPECTIVES AND FUTURE RESEARCH
CONCLUDING REMARKS
REFERENCES
APPENDICES
    Study I
    Study II
    Study III
    Appendix 4 - Specific statistical considerations

Acknowledgments

Probably the most read part of any thesis or dissertation is this – the acknowledgments, and here are

mine:

I was first drawn to the academic field of spine surgery when I, as a medical student at a lecture in

2010, met Professor Benny Dahl. I was shortly after enrolled as a pregraduate research year student

at the Spine Unit with Benny and Associate Professor Martin Gehrchen as my supervisors. After

graduating medical school in 2014, it was an exceptional privilege to be offered a position as full-

time research fellow by these two men, for which I am very grateful. I owe a tremendous amount of

thanks to Martin and Benny for providing the foundation and structure to support the research

presented in this thesis. I have learned a great deal from both of you, and have come to

understand how you, in a very short time span, have transformed the Spine Unit at Rigshospitalet into

a world-renowned scientific institute.

I am also grateful for the support that I have received from my academic supervisors Associate

Professor Rachid Bech-Azeddine and Dennis Winge Hallager. Dennis has shown me how

transparency, attention to detail and not cutting corners are essential in all academic work. Thank

you for the hard work that paved the road for my research, and for your guidance.

I would also like to thank all my co-authors Lars Valentin Hansen, Niklas Tøndevold and Ture

Karbo for contributing to the studies. A special thanks to Niklas, for keeping me grounded.

I owe a very sincere thank you to my fellow Ph.D. colleagues at the Spine Unit, Sidsel Fruergaard,

Frederik Pitter, Casper Dragsted and Søren Ohrt-Nissen, and from across the hall: Pelle

Petersen, for social, scientific and emotional support. Your company has provided the best

work environment that I could possibly imagine. And of course, to my gym-buddy Peter Polzik, for

listening to my rants and for teaching me sarcasm – bol’shoye spasibo.

Without the support of my friends and family, this thesis would not have been possible. I would also

like to thank Aesculapius, Partur and Uglerne.

A very sincere thank you to my Father, who has endured what I can only imagine, to provide the

conditions in which I now find myself – I will always be grateful for all that you have done and

for being my role model.

And finally, for everything you put up with, for your love, for your endless support and for the love

of our life, Hugo – I dedicate this work to my wife, Emilie.

Abbreviations and concepts

Agreement                 The degree to which repeated measurements are identical, e.g. the measurement precision
AP                        Anteverted pelvis
ASD                       Adult Spinal Deformity
AUC                       Area under the curve
C7                        The 7th cervical vertebra
CI                        Confidence interval
Coronal                   The frontal plane of the spine (anteroposterior view)
DICOM                     Digital Imaging and Communications in Medicine
GAP                       Global Alignment and Proportion: a score to predict postoperative results
HRQOL                     Health-related quality of life
ICC                       Intraclass correlation coefficient, used to measure reliability
IQR                       Interquartile range
Kyphosis                  Convex curvature of the spine, normally seen in the thoracic spine
L5                        The 5th lumbar vertebra
LOA                       Limits of Agreement, used to quantify Agreement in continuous data
Lordosis                  Concave curvature of the spine, normally seen in the cervical and lumbar spine
Mechanical complication   Postoperative instrumentation failure, e.g. rod breakage or proximal junctional failure
OR                        Odds ratio
PACS                      Picture Archiving and Communication System
PI                        Pelvic Incidence: morphologic spinopelvic radiographic parameter (Figure 3)
PJF                       Proximal junctional failure: PJK of more than a specified amount, or PJK combined with e.g. fracture or ligamentous rupture
PJK                       Proximal junctional kyphosis: postoperative kyphosis above the highest level of instrumentation
Plumb line                Vertical line
PT                        Pelvic Tilt: radiographic spinopelvic parameter representing compensation for sagittal imbalance (Figure 3)
Reliability               The ability of a measurement to differentiate between subjects[1], using the ratio between measurement error (within-subject) and between-subject variation
Reproducibility           Umbrella term for agreement and reliability; describes the similarity between repeated measures
Revision surgery          Secondary unplanned surgery after the index procedure
ROC                       Receiver operating characteristic
Sagittal                  The lateral plane of the spine
SD                        Standard deviation
SEM                       Standard error of measurement: a parameter to assess Agreement
SRS                       Scoliosis Research Society
SS                        Sacral slope: a radiographic parameter of the slope of the sacral endplate (Figure 3)
SVA                       Sagittal vertical axis: a radiographic parameter describing the sagittal balance of the spine (Figure 2)
UIV                       The uppermost instrumented vertebra

Summary

Adult Spinal Deformity (ASD) is a common disorder that substantially impairs health-related quality of life and is associated with considerable pain. Although ASD has many causes, iatrogenic events and degenerative aging processes are the most common. Recent decades have seen accelerated technological and surgical advances, and together with a demographic shift in the global population, this has brought growing interest in ASD. Non-surgical treatment is often chosen as the first line of treatment, although the outcome of surgery is considerably favorable. Despite its compelling effect on everyday living and pain, surgical treatment is associated with a considerable risk of complications, including mechanical failure requiring revision surgery. Radiographic parameters measured on whole-spine imaging have become a cornerstone of diagnosis, assessment of severity, preoperative planning, and postoperative evaluation of patients with this disorder. In addition, numerous classification and scoring systems are emerging in attempts to identify at-risk patients and predict complications requiring revision surgery.

The overall aim of this thesis was to advance scientific knowledge of radiographic parameters in patients with ASD, in an effort to ultimately reduce surgery-related risks. The Roussouly Classification system was developed to describe the variation in sagittal spine shapes in the normal population and to serve as a template for surgical targets. In Study I, we assessed the reproducibility of this system when applied by multiple raters and with repeated measurements. A blinded test-retest study was performed in 64 patients referred for ASD assessment. Using Fleiss' kappa, inter-rater reliability was estimated as “moderate” and intra-rater reliability as “substantial”, which we found comparable with previous estimates for the most commonly used ASD classification system, the SRS-Schwab ASD Classification.

Although single radiographic parameters of the sagittal spine are frequently used in ASD evaluation, their measurement error has not been established. It is assumed to be approximately ±5°, although the evidence for this is not well founded. In Study II, we aimed to address this gap in knowledge. In a similar reproducibility study, we estimated the measurement error of four commonly used radiographic parameters. Despite excellent reliability, we found that the measurement error was considerably greater than previously assumed, and that its magnitude varied considerably between the included parameters.

Finally, the Global Alignment and Proportion (GAP) score was recently presented with excellent ability to predict mechanical failure after ASD surgery. In Study III, we aimed to validate this novel score in a consecutive cohort of 149 patients with a minimum of 2 years of follow-up. Despite only minor dissimilarities to the cohort in the original study, we found no statistical association between the GAP score and mechanical complications after surgery for ASD.

In conclusion, we found moderate to substantial reliability of the Roussouly Classification

system; estimated the measurement error of sagittal radiographic parameters to be greater than

previously assumed; and were not able to validate the accuracy of the GAP score, all in patients

with ASD.

Danish summary (Dansk resumé)

Adult Spinal Deformity (ASD) is the international term for spinal deformities in adults. It is a common and painful condition with a substantial negative impact on quality of life. Spinal deformities in adults can arise for various reasons; the most common causes are iatrogenic deformity (resulting from treatment) and degenerative deformity resulting from aging. Recent decades have seen considerable technological and surgical development, and combined with demographic change and the resulting larger number of elderly people, interest in the treatment of spinal deformities has increased. In cases of severe deformity, surgery is preferable to conservative treatment, and most patients will achieve a good result with a substantial reduction in pain and functional impairment. However, surgery is associated with considerable risks, including mechanical complications that often require revision surgery. Radiographic parameters are a crucial cornerstone in the assessment of spinal deformity. They are measured on radiographs of the whole spine taken with the patient standing. They are used for diagnosis, assessment of the degree of deformity, surgical planning, and postoperative evaluation of the effect of surgery. Several classification systems are also currently being developed to identify patients at higher risk of complications and, more generally, to predict complications of surgery.

The overall aim of this PhD thesis was to contribute further knowledge on radiographic measurements in adults with spinal deformities, in order to reduce the risk associated with surgery. Roussouly's classification system was developed to describe the normal variation in spine types in asymptomatic individuals and to be used as a template when planning deformity surgery. In Study I, we examined the reproducibility of this system when several observers performed repeated measurements on the same patients. The study was conducted as a blinded test-retest study of 64 patients referred for spinal deformity. Using Fleiss' kappa, we estimated the reproducibility as “moderate” to “substantial”, comparable with previous estimates for the currently most widely used classification system for spinal deformity.

Although individual radiographic parameters are frequently used in the assessment of spinal deformities, the actual measurement error has not been established. A measurement error of ±5° is generally assumed, but this assumption is not well founded. The aim of Study II was to address this issue. Using an approach similar to that of Study I, we performed a reproducibility study to determine the measurement error of four frequently used radiographic parameters. The measurement error was considerably greater than previously assumed. In addition, the measurement error varied widely among the parameters examined.

A recently published study presented the Global Alignment and Proportion (GAP) score as having an excellent ability to predict mechanical complications after deformity surgery. The aim of Study III was to validate this new score in 149 patients after surgery for spinal deformity, with two years of follow-up. Allowing for minor demographic differences compared with the original patient population, we found no statistical association between the GAP score and mechanical complications of surgery.

Based on these three studies, we conclude: 1) that Roussouly's classification system has “moderate” to “substantial” reproducibility; 2) that the measurement error of radiographic parameters in patients with spinal deformity is considerably greater than previously assumed; and 3) that we could not confirm the ability of the GAP score to predict mechanical complications after deformity surgery.

List of papers

This thesis is based on the following papers:

I. Bari, T. J., Hallager, D. W., Tøndevold, N., Karbo, T., Hansen, L. V., Dahl, B., &

Gehrchen, M. Moderate interrater and substantial intrarater reproducibility of the

Roussouly Classification System in patients with adult spinal deformity. Spine Deformity

2019; 7(2), 312-318.

II. Bari, T. J., Hallager, D. W., Tøndevold, N., Karbo, T., Hansen, L. V., Dahl, B., &

Gehrchen, M. Spinopelvic parameters depending on the angulation of the sacral endplate

are less reproducible than other spinopelvic parameters in adult spinal deformity

patients. Spine Deformity 2019; 7, 771-778.

III. Bari, T. J., Ohrt-Nissen, S., Hansen, L. V., Dahl, B., & Gehrchen, M. Ability of

the global alignment and proportion score to predict mechanical failure following adult

spinal deformity surgery—validation in 149 patients with two-year follow-up. Spine

Deformity 2019; 7(2), 331-337.

Introduction

“Primum non nocere”

Adult Spinal Deformity (ASD) is a complex and heterogeneous spinal disorder involving deformity in the coronal and the sagittal plane[2]. It is now well established that ASD is associated with significant disability and pain [3–5]. Previous studies have estimated the prevalence of ASD in the general population to be between 2% and 23%[6–9], although a prevalence of up to 68% has been suggested in the elderly[5,10]. This pathological condition encompasses a variety of types and magnitudes of deformity[11]. Most patients with ASD have less severe deformities and will often be treated using non-operative modalities such as physical therapy and pain medication. Patients with more severe deformities will often benefit from surgical treatment [12–14], involving correction of the spine using pedicle screw instrumentation. Although beneficial, surgery for ASD carries definite risks, including mechanical complications requiring revision surgery[15–17]. Therefore, increased attention has been directed towards reducing these risks. Certain patient characteristics have been found to correlate with postoperative complications, and radiographic targets for surgery are under further investigation[18–21]. The importance of sagittal balance is well established, and recent interest has been directed towards the development of individualized radiographic goals for surgery[22–24]. The following sections introduce the importance of sagittal spine balance and the nature of sagittal spine deformity, followed by a description of current classification systems and the gap in knowledge that motivates the papers presented in this thesis.

Background

Sagittal balance

In young healthy individuals, the spine is balanced with the

center of mass vertically aligned with the feet. Due to

degenerative changes with increasing age, or other causes

which will be mentioned later, a forward tilt, or kyphosing

event, will alter this balance[25–29]. Dubousset proposed the

“cone of economy” to illustrate the essentials of standing

balance (Figure 1) [30]. Centered at the feet, a 3-dimensional

cone is projected upwards and thus defines the range within

which the body can remain standing with minimum energy

expenditure[31]. As the center of mass sways from the center

of the cone, energy expenditure increases with consequent

muscle fatigue and pain [22,32]. The most common method

of measuring sagittal balance is the Sagittal Vertical Axis

(SVA) measured on long standing radiographs of the spine

and defined as the distance between the C7-plumbline and

the superior posterior corner of S1 (Figure 2). Understanding

sagittal balance requires awareness of the spinopelvic

parameters: Pelvic Incidence (PI), Pelvic Tilt (PT) and Sacral

Slope (SS). The measurement methods of these radiographic

parameters are illustrated in Figure 3. PI is a fixed morphologic parameter specific to the

individual and does not change with age after adulthood[33–36].

Furthermore, the PI is not affected by the patient's body position.

Figure 1. The “cone of economy” proposed by Dubousset. The least amount of energy is spent in a balanced erect position. As the center of mass shifts forward towards the edge of the cone, energy expenditure increases.

Figure 2. Sagittal (lateral) spine. C7 indicates the C7 vertebra. The solid line illustrates the sagittal vertical axis (SVA), measured as the horizontal distance between the superior, posterior corner of the sacral endplate and the plumb line drawn from C7.

Figure 3. Sagittal (lateral) lower spine and pelvis. Methods for measuring the spinopelvic radiographic parameters: sacral slope, pelvic incidence and pelvic tilt.

Compensating for imbalance

Besides natural aging, other processes or events can also cause spinal kyphosing with consequent sagittal imbalance [25,37–42]. Neuromuscular diseases such as Parkinson's disease, degenerative conditions, Scheuermann's disease, and autoimmune processes such as ankylosing spondylitis often cause spinal deformity. Other causes are osteoporosis, vertebral fractures, short segment degeneration, and iatrogenic events such as post-laminectomy syndrome or lumbar fusion in relative kyphosis. Regardless of the reason for sagittal imbalance, mechanisms of self-preservation will attempt to restore balance to enable upright gait and horizontal gaze [30,43–45]. These mechanisms are briefly described here:

Spinal extension

In an overall imbalanced spine, extension in parts above

and below the kyphotic segments is possible in a flexible

spine and with musculature capable of erection, thus

resulting in a reduction of thoracic kyphosis. Effects may

be painful and can lead to a possible spondylolisthesis

under extreme conditions[25].

Pelvic retroversion

Pelvic retroversion involves backward rotation of the pelvis around the femoral heads, with extension of the hips (Figure 4) [46,47]. PT represents the extent of pelvic retroversion, and as PT increases, SS decreases, which in turn enables correction of sagittal imbalance. As PI=PT+SS, individuals with a large PI have a greater capacity for pelvic retroversion, as their PT range is greater[48]. Assuming that the SS can maximally be decreased to 0° (horizontal sacral endplate), an individual with a PI of 70° could potentially retrovert their pelvis to a maximum PT of 70° (70=70+0). This would in turn mean full hip extension, and to further compensate for sagittal imbalance, other mechanisms need to be used.
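The relation PI=PT+SS and the worked example above can be checked numerically. The following is a minimal Python sketch under stated assumptions: the function names are illustrative, the default lower bound of 0° for SS is the simplification used in the text rather than a physiological limit, and the PT value of 25° is a hypothetical input, not a measurement from the thesis.

```python
def sacral_slope(pelvic_incidence: float, pelvic_tilt: float) -> float:
    """Return SS in degrees from the identity PI = PT + SS."""
    return pelvic_incidence - pelvic_tilt


def max_pelvic_tilt(pelvic_incidence: float, min_sacral_slope: float = 0.0) -> float:
    """Return the largest PT (degrees) reachable if SS cannot fall below min_sacral_slope.

    The default of 0 degrees (horizontal sacral endplate) is the simplifying
    assumption from the text, not a clinical threshold.
    """
    return pelvic_incidence - min_sacral_slope


def retroversion_reserve(pelvic_incidence: float, pelvic_tilt: float,
                         min_sacral_slope: float = 0.0) -> float:
    """Return the degrees of further retroversion still available (hypothetical helper)."""
    return max_pelvic_tilt(pelvic_incidence, min_sacral_slope) - pelvic_tilt


if __name__ == "__main__":
    pi = 70.0          # PI from the example in the text
    current_pt = 25.0  # hypothetical current PT, for illustration only
    print("Maximum PT:", max_pelvic_tilt(pi))
    print("Current SS:", sacral_slope(pi, current_pt))
    print("Retroversion reserve:", retroversion_reserve(pi, current_pt))
```

Under the same assumption, an individual with a PI of 40° would reach the limit at a PT of 40°, which illustrates why a larger PI gives a larger range of pelvic retroversion.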

Knee flexion

At maximum retroversion of the pelvis, further correction of sagittal balance will involve flexion of the knees and dorsiflexion of the ankles (Figure 5) [47,49,50]. Both pelvic retroversion and knee flexion increase muscle activity and, more importantly, impair walking[46,51].

Figure 4. Pelvic retroversion can compensate for sagittal imbalance and involves retroversion of the pelvis around the femoral heads.

Figure 5. Flexion of the knees as a last resort of compensation for sagittal imbalance.

Treatment

Identifying patients in need of surgical treatment for ASD is complex, and involves radiographic

imaging (long sagittal and coronal films in standing position with knees fully extended) in

addition to a full clinical assessment[12]. Non-operative treatment is often used initially and

involves physical therapy, training and pain medication [52,53]. Bracing does not effectively

prevent curve progression and is therefore seldom used as a treatment modality of ASD[53,54].

It is now well established that surgical treatment can significantly benefit appropriately selected

patients with ASD compared to non-operative treatment[52,55–61]. However, the often

challenging decision to perform surgery for ASD is accompanied by an equally complex

decision making and surgical planning of type of correction, number of levels for instrumented

fusion and use of spinal osteotomies[20,43,52,62].

Complications and Adverse events

As operative techniques for ASD are constantly evolving, surgeons are able to correct even

more severe deformities. Invasive procedures are often related to a certain risk of complications,

and with procedures as extensive as ASD surgery, this risk is especially considerable. A recent

systematic review by Sciubba et al. revealed an overall complication rate of 55% in 3,299

patients undergoing ASD surgery. Prospective studies have reported even higher rates of

perioperative adverse events [63,64]. Although most of these complications are minor, rarely

requiring further intervention, the risk of mechanical complications, such as pseudoarthrosis with

consequent rod breakage and proximal junctional kyphosis (PJK), is also significant. Previous

studies have reported a risk of revision surgery due to mechanical failure of up to 30% within two

years of surgery [16,65–67]. Therefore, understanding sagittal imbalance, how to correct it and

identifying models to predict mechanical failure are of great importance.

Classification systems

Attempts to classify disease were first made in the 18th century, and classification of

sagittal spine deformity has only been introduced in recent decades[68]. These were developed

to understand sagittal balance, to subcategorize patients based on the degree of imbalance, to

recognize differences in individual spinopelvic morphology and to set targets for surgical

treatment. In addition, classification systems provide effective tools for communication and

comparison in both clinical and academic settings. In a series of studies, Barrey et al. recognized

that sagittal imbalance can be compensated for – to an extent[69–71]. Based on the C7 plumb

line and degree of PT, patients were categorized as balanced, balanced through compensation or


imbalanced. Lafage et al. found that pelvic retroversion was correlated to poor health-related

quality of life (HRQOL)[46]. Legaye et al. suggested a relationship between spinopelvic

parameters and the degree of lumbar lordosis (LL). Boulay et al. later proposed an equation for

predicting LL based on PI and this has since been established and expressed as PI-LL[20,22,72].

These observations led to a series of classifications aiming to classify the adult sagittal spine in

relation to ASD.

The Roussouly Classification

Roussouly et al. recognized that the sagittal shape of the spine presents with some variation

between individuals[73]. The authors established a system to classify this normal variation as

one of four types. Recently, a fifth type was added[74]. Figure 6 illustrates these five types. Type

1 and 2 are characterized by a small PI and consequent small SS. The apex of the LL is located

caudally in Type 1, creating a short lordosis, whilst Type 2 has a slightly more cranial location of

the apex and a higher inflection point, in turn producing a flat lordosis and overall flat spine.

The Type 3 spine has larger PI and SS with subsequent prominent lordosis. Type 4 is also

determined by a high-grade PI in combination with the largest SS (>45°) and most prominent

and longest lordosis. The intermediate type, Type 3 anteverted pelvis (3AP), was later added,

and characterized by a large SS despite a small PI, producing this anteverted pelvis.

Scoliosis Research Society-Schwab (SRS-Schwab) Adult Spinal Deformity Classification

By analyzing multiple radiographic parameters, the SRS-Schwab classification system was

proposed based on the four parameters most correlated to HRQOL measures. Three of these are

measured in the sagittal spine: PI-LL within 10°; SVA less than 4 cm; and PT under 20°[75].
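As a minimal sketch, the three sagittal thresholds named above can be checked as follows. The function name and argument names are hypothetical, and reading "PI-LL within 10°" as an absolute difference of at most 10° is an assumption; the sketch covers only the cut-offs stated in this text, not the full grading of the classification.

```python
def sagittal_thresholds_met(pi: float, ll: float, sva_cm: float, pt: float) -> dict:
    """Check the three sagittal cut-offs quoted in the text (angles in degrees)."""
    return {
        # Assumption: "within 10 degrees" is read as |PI - LL| <= 10.
        "PI-LL": abs(pi - ll) <= 10,
        "SVA": sva_cm < 4,
        "PT": pt < 20,
    }


# Hypothetical patient within all three thresholds.
print(sagittal_thresholds_met(pi=55, ll=50, sva_cm=3.0, pt=15))
# Hypothetical patient outside all three thresholds.
print(sagittal_thresholds_met(pi=60, ll=35, sva_cm=6.0, pt=28))
```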


Figure 6. The Roussouly Classification's five main spine types. Type 1 and 2 are characterized by a small sacral slope

(SS) and small pelvic incidence (PI). Type 3 and 4 have large SS and PI. In Type 3 anteverted pelvis (3-AP), the pelvis is

shifted forward generating a large SS despite a small PI.

Global Alignment and Proportion (GAP) score

Although surgical correction of ASD can be beneficial in select cases, the risk of complications

is considerable [76,77] and up to 30% of patients are at risk of revision surgery due to

mechanical failure [16,65–67]. As the SRS-Schwab system was developed in relation to HRQOL

measures, and not in correlation to the risk of complications, the European Spine Study Group

recently proposed the GAP score[78]. The GAP score comprises four parameters based on pelvic

incidence and overall sagittal balance, in addition to a category for age. The authors developed a

scoring system based on these parameters and validated the score in a second sample of patients

with ASD. They found a substantial positive correlation between increasing GAP score and the

risk of revision surgery due to mechanical failure.

Reproducibility

As classification systems emerge, determining validity and reproducibility is essential. In the

field of deformity surgery, there is a general acceptance of a measurement error of ±5° for

radiographic parameters. However, this is based on studies of the coronal Cobb angle, and a

recent review revealed that this assumed error is not well founded[79]. The following section

will aim to explain the terminology and describe methods for determining reproducibility. In

general, reproducibility is an umbrella-term for reliability and agreement[80].


Reliability

Reliability describes the variability of measuring a parameter in a single subject compared to that

of the entire sample. Therefore, reliability coefficients, such as the Intraclass Correlation

Coefficient (ICC), describe the ability of a measurement method (or instrument) to distinguish

between individuals [80–82]. Consequently, reliability does not describe measurement error.

Common statistical methods for describing reliability are ICC (numeric parameters) and Kappa

(κ, categorical parameters).

Agreement

Agreement describes the measurement error in repeated measurements of a parameter of interest.

Therefore, agreement estimates the precision of any single measurement[80,82], and aids in

differentiating real clinical change from irrelevant fluctuations[83,84]. The standard

error of measurement (SEM) or Bland and Altman's limits of agreement (LOA) are often used

for this purpose when assessing numeric parameters [85]. For categorical variables, crude

percentages of agreement are often used.

Example

Kottner et al. used the example of rating medical students[81]. If all observers rate all students as

“excellent”, the agreement between raters is perfect. The reliability

score, however, would be close to zero, as there is no variability and therefore no way to distinguish

between individuals.
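The contrast between agreement and reliability can be illustrated with a small Python sketch. The ratings below are invented for illustration and are not data from Kottner et al.; the helper names are hypothetical. When every rating is identical, Cohen's kappa is formally undefined (0/0), so the sketch uses nearly constant ratings to show high crude agreement alongside a kappa close to zero.

```python
from collections import Counter


def percent_agreement(a, b):
    """Crude agreement: share of subjects given the same category by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters; returns NaN when chance agreement is 1."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal category frequencies of each rater.
    p_expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n**2
    if p_expected == 1.0:
        return float("nan")  # no variability at all: kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of 10 students: almost everyone is rated "excellent".
rater_a = ["excellent"] * 9 + ["good"]
rater_b = ["good"] + ["excellent"] * 9

print(percent_agreement(rater_a, rater_b))  # 0.8, i.e. high crude agreement
print(cohen_kappa(rater_a, rater_b))        # about -0.11, i.e. no reliability
```

With so little variability between students, chance agreement (0.82) is as high as the observed agreement (0.80), which is why the reliability coefficient collapses even though the raters rarely disagree.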


Objectives and hypotheses

As scientific studies are performed, and before applying the conclusion, results need to be

validated outside the context of the original study[86]. This is referred to as the external validity

and entails the extent to which a study conclusion can be generalized to other situations.

This PhD-thesis aimed to determine the reproducibility of the Roussouly Classification system

and the commonly used radiographic spinopelvic parameters PI, PT and SS in patients with

ASD. In addition, we aimed to validate the recently published GAP-score in a Danish population

of patients undergoing surgical treatment for ASD. To address these issues, we performed a

reproducibility study of radiographic parameters and a validation study of the GAP score.

Study I: Roussouly Classification

To provide estimates of inter- and intra-rater agreement and reliability of the Roussouly

Classification system in patients referred for ASD.

We hypothesized that the reproducibility would be comparable to estimates of the SRS-Schwab

Classification system. As a secondary outcome, we aimed to estimate the distribution of spine

types in this study population.

Study II: Spinopelvic parameters

To provide estimates of inter- and intra-rater agreement and reliability of the commonly used

sagittal radiographic parameters SVA, PI, PT and SS in patients referred for ASD.

We hypothesized that the measurement error would be greater than previously assumed.

Study III: GAP-score

To validate the GAP-score in a single-center cohort of patients undergoing ASD surgery.

We hypothesized that postoperative GAP score and categories would be correlated to

postoperative mechanical failure and revision surgery as reported in the original study[78].


Methods

A detailed description of each study's design is provided in the manuscripts accompanying this

thesis as Appendices. The overall methodological considerations are described below.

Overall study design

The typical ASD patient is diagnosed at a secondary institute following referral by the patient's

general practitioner (family practitioner). Following diagnosis at the secondary institute, patients

are referred to our tertiary institution for evaluation of possible surgical treatment. Our institution

serves a population of approximately 2.5 million citizens.

For Studies I and II, objectives did not concern intervention or estimates of effects. Therefore,

these studies were conducted as cross-sectional reproducibility studies of patients referred to our

institution for ASD assessment. Studies I and II were performed in combination based on the

same patient sample and radiographic measurements. Study III was performed as a retrospective

validation study of a recently published predictor of mechanical failure in patients undergoing

ASD surgery.

Patient inclusion

Study I-II

All patients referred for evaluation of ASD surgery at our tertiary institution were assessed for

inclusion from August 1, 2013 to March 31, 2014. As these studies aimed to assess specific

radiographic measurements, certain inclusion and exclusion criteria had to be met. Inclusion

criteria were ≥18 years of age and availability of sufficient radiographs. This was defined as

long, standing, whole-spine radiographs taken as “fists-on-clavicles” both in the anterior and

lateral plane, and including the C7 to L5 vertebra, sacral endplate and both femoral heads.

Exclusion criteria were: previous instrumentation, vertebral fracture or neuromuscular disease.

Study III

In Study III, all patients undergoing ASD surgery from January 1, 2013 to December 31, 2015

were screened for inclusion. Inclusion criteria were age of 18 years or older and sufficient

radiographs (defined as previously mentioned) taken preoperatively, postoperatively and at

follow-ups 3 months, 1 year and 2 years after surgery. Consequently, patients with less than

2-year follow-up were excluded. The patient sample was not exclusive to index procedures, and

thus, both primary and revision procedures were included.

Data collection

Clinical data, including demographics, were obtained from electronic medical records. For

Studies I and II, this was limited to age and sex, in addition to radiographic parameters which

were measured on radiographic images. In Study III, further data were gathered including patient

characteristics and postoperative information regarding mechanical complications and secondary

unplanned surgeries (revision).

Radiographic measurements

All radiographs were obtained using the Digital Imaging and Communications in Medicine

(DICOM) format in the Radiology Information System and Picture Archiving and

Communications System IMPAX 6® (Agfa Healthcare NV, Mortsel, Belgium). Images were

then transferred to the image management system KEOPS® (SMAIO, Lyon, France). This

system is used for all radiographic measurement of patients with ASD at our institution. After

identifying key radiographic landmarks, including center of femoral heads, sacral endplate and

vertebral bodies from L5-C7, the software then generated a computerized model of the spine.

After manual adjustments of this model by the clinician, the software then calculated the

radiographic parameters including the Roussouly spine type. Again, this is the standard clinical

method for estimating radiographic parameters at our institution. For the GAP score and

category used in study III, the principal investigator calculated each specified radiographic

parameter based on the parameters provided by the KEOPS® software, in accordance with the

original paper[78].

Reproducibility measurements

In Studies I and II, reproducibility was assessed in a test-retest setting by four raters in a blinded

fashion. Detailed description of the methodology is provided in the accompanying manuscripts.

Outcomes

In Studies I and II, the primary outcomes were radiographic measurements, and no further

outcomes were recorded. In study III, the main outcome was defined as revision surgery within 2

years of index surgery due to a mechanical failure, defined as: proximal junctional kyphosis

(PJK), proximal junctional failure, distal junctional failure, rod breakage or “other”. We further


defined PJK as ≥10° increase in kyphosis angle measured between the uppermost instrumented

vertebra (UIV) and the upper endplate of the vertebra 2 levels above UIV (UIV+2)[87]. The

increase in angle was in comparison to the radiograph taken immediately postoperatively.

Proximal junctional failure was defined in accordance with Hart et al. [87] as PJK in combination

with either: fracture of the UIV or UIV+1, instrumentation pullout at UIV or posterior

osteoligamentous disruption. Secondary outcome was mechanical failure within 2 years of

index procedure regardless of subsequent revision surgery.
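The definitions of PJK and proximal junctional failure given above can be written as two small predicates. This is a sketch under the stated definitions only; the function and parameter names are hypothetical, and the angles are assumed to be the UIV to UIV+2 kyphosis angle in degrees.

```python
def is_pjk(postop_angle_deg: float, followup_angle_deg: float,
           threshold_deg: float = 10.0) -> bool:
    """PJK: kyphosis angle (UIV to UIV+2) increased by >= 10 degrees
    compared to the immediate postoperative radiograph."""
    return (followup_angle_deg - postop_angle_deg) >= threshold_deg


def is_pjf(pjk: bool, fracture_uiv_or_uiv1: bool = False,
           pullout_at_uiv: bool = False,
           posterior_disruption: bool = False) -> bool:
    """Proximal junctional failure: PJK combined with at least one
    of the three structural findings listed in the text."""
    return pjk and (fracture_uiv_or_uiv1 or pullout_at_uiv or posterior_disruption)


# Hypothetical patient: 8 degrees postoperatively, 19 degrees at follow-up.
pjk = is_pjk(postop_angle_deg=8, followup_angle_deg=19)
print(pjk)                               # True (increase of 11 degrees)
print(is_pjf(pjk, pullout_at_uiv=True))  # True
```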

Statistics

Demographic data

Distribution of data was assessed using histograms. Data were reported as proportions (%),

means with standard deviations (SD), or medians with interquartile ranges (IQR). The Student's

t-test was used to compare numeric Gaussian distributed data, and the Wilcoxon signed-rank test

for non-Gaussian data. When comparing categorical data, the Pearson χ² test of independence of

trend was used unless the expected frequency was below 5 for 20% of compared groups, in

which case the Fisher exact test was used.
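The rule for choosing between the χ² test and the Fisher exact test can be sketched as follows. The helper names are hypothetical, and treating "below 5 for 20% of compared groups" as "at least 20% of the cells of the contingency table" is an assumption made for this illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_test(table, min_expected=5.0, max_fraction=0.20):
    """Return 'fisher' when too many expected counts are small, else 'chi-square'."""
    cells = [e for row in expected_counts(table) for e in row]
    fraction_small = sum(e < min_expected for e in cells) / len(cells)
    # Assumption: the threshold is inclusive (>= 20% of cells).
    return "fisher" if fraction_small >= max_fraction else "chi-square"


# Hypothetical 2x2 tables of counts.
print(choose_test([[10, 10], [10, 10]]))  # chi-square
print(choose_test([[1, 2], [3, 20]]))     # fisher
```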

Reproducibility: categorical parameters

For assessing reliability, Fleiss' Kappa values were calculated for variables measured by more

than two raters [1,80,88]. When the number of compared raters was two, Cohen's Kappa was

used [1,88,89]. The calculated kappa coefficients were then classified in accordance with Landis

and Koch as slight (≤0.2), fair (>0.2-0.4), moderate (>0.4-0.6), substantial (>0.6-0.8) or almost

perfect reliability (>0.8)[90]. Two-tailed Z-tests were used to compare individual intra-rater

reliability. We used the Bhapkar test of marginal homogeneity to assess differences in

distribution of categories between raters. Agreement was reported using crude percentages.
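As an illustration of the two summaries described above, the following sketch maps a kappa value to the Landis and Koch categories quoted in the text and computes crude agreement as the share of subjects given the same category in every rating. The function names and the example ratings are hypothetical.

```python
def landis_koch(kappa: float) -> str:
    """Classify a kappa coefficient using the cut-offs quoted in the text."""
    if kappa <= 0.2:
        return "slight"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"


def crude_agreement(ratings_per_subject) -> float:
    """Share of subjects for whom all ratings (raters or readings) are identical."""
    identical = sum(len(set(ratings)) == 1 for ratings in ratings_per_subject)
    return identical / len(ratings_per_subject)


print(landis_koch(0.60))  # moderate
print(landis_koch(0.68))  # substantial

# Hypothetical spine types assigned to four patients by four raters.
ratings = [[1, 1, 1, 1], [2, 2, 3, 2], [4, 4, 4, 4], [3, 3, 3, 2]]
print(crude_agreement(ratings))  # 0.5
```

The same `crude_agreement` helper could be applied to pairs of readings by one rater to obtain intra-rater agreement, or to the ratings of all raters to obtain inter-rater agreement.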

Reproducibility: numerical parameters

ICCs were used to assess reliability of numeric parameters. We used ICC(A,1) for both intra- and

interrater reliability[91–93]. ICCs were classified as poor (<0.5), moderate (0.5-0.75), good

(>0.75-0.9) or excellent (>0.9)[91,94]. Agreement was reported using the SEM and 95%

LOA[85,95]. The SEM and 95% LOA were derived from variances calculated using a linear

mixed effects model [83,85,96]. A more detailed description for calculating the SEM and 95%

LOA is presented in Appendix 4.
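The ICC categories and the relation between SEM and the 95% LOA can be sketched as below. The ICC cut-offs are those quoted above. The SEM and LOA formulas are one common formulation (SEM as the square root of the rater and residual variance components, and LOA as ±1.96·√2·SEM), used here as an assumption; the exact derivation used in this thesis is the one given in Appendix 4 and may differ. The variance components in the example are invented.

```python
import math


def classify_icc(icc: float) -> str:
    """Classify an ICC using the cut-offs quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def sem_agreement(var_rater: float, var_residual: float) -> float:
    """Assumed form: SEM from rater and residual variance components."""
    return math.sqrt(var_rater + var_residual)


def loa_95(sem: float):
    """Assumed form: 95% limits for the difference between two measurements."""
    half_width = 1.96 * math.sqrt(2) * sem
    return (-half_width, half_width)


print(classify_icc(0.89))  # good
# Hypothetical variance components (degrees squared) from a mixed model.
sem = sem_agreement(var_rater=1.0, var_residual=3.0)
print(sem)                 # 2.0
print(loa_95(sem))         # roughly (-5.54, 5.54)
```

Note how a SEM of only 2° already corresponds to limits of roughly ±5.5° under this formulation, since the difference between two measurements carries the error of both.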


Association to outcome

In Study III, the area under the curve (AUC) was calculated using receiver operating

characteristics (ROC) curves when assessing the diagnostic accuracy of the GAP score in

predicting postoperative outcome. The diagnostic accuracy was further classified based on the

AUC as: no or low power (0.5-0.7), moderate power (0.7-0.9) or high power (≥0.9). When

assessing GAP categories for association to outcome, the Cochran-Armitage test of trend was

used and reported as odds ratios (OR) with 95% confidence intervals (CI). As a secondary

assessment, both GAP score and categories were analyzed for association to outcome using

logistic regression.
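A minimal sketch of the AUC and its classification follows. It computes the AUC as the probability that a randomly chosen patient with the outcome has a higher score than one without (ties count one half), which is equivalent to the area under the empirical ROC curve. The function names and the scores are hypothetical, and assigning the boundary value 0.7 to "moderate power" is an assumption, since the quoted ranges overlap at that point.

```python
def auc(scores_with_outcome, scores_without_outcome) -> float:
    """AUC as the share of (with, without) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_with_outcome:
        for n in scores_without_outcome:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


def classify_auc(value: float) -> str:
    """Classify diagnostic accuracy using the cut-offs quoted in the text."""
    if value >= 0.9:
        return "high power"
    if value >= 0.7:  # assumption: 0.7 itself counts as moderate
        return "moderate power"
    return "no or low power"


# Hypothetical scores for patients with and without mechanical failure.
example = auc([3, 5], [1, 4])
print(example)                # 0.75
print(classify_auc(example))  # moderate power
print(classify_auc(0.50))     # no or low power
```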


Results

The following section contains a summary of each study.

Study I

Objective

To assess the reproducibility of the Roussouly Classification in patients referred for ASD

evaluation.

Summary of results

Sixty-four patients were included, with a median age of 46 years (IQR 23-63). Thirty percent of patients had an

SVA of more than 5 cm and almost 70% had a PI-LL mismatch. All 64 patients were rated twice

by four spine surgeons with different levels of experience. Interrater reliability was moderate

(κ=0.60) and intrarater reliability was substantial (κ=0.68). There was a tendency towards higher

intrarater reliability with rater experience. In terms of agreement, the same spine type was

assigned in both readings by the same rater in 76% (range: 67%-83%). Interrater agreement,

assessed as the same spine type assigned by all 4 raters, was 47%. Figure 7 illustrates the

distribution of spine types across all readings.

Main finding

I. Moderate interrater and substantial intrarater reliability when using the Roussouly

Classification.


Figure 7. Distribution of Roussouly types across four raters and two readings. The x-axis indicates the rater followed by the

reading, i.e. the first column details rater 1's first reading, the second column rater 1's second reading and so on.


Study II

Objective

The aim of this study was to assess the degree of measurement error of commonly used sagittal

radiographic parameters.

Summary of results

We identified 64 patients referred for ASD treatment. Four spine surgeons measured PI, SS, PT

and SVA in all patients twice, with a 14-day delay between readings. Reliability was assessed

using ICCs and classified as “excellent” (ICC>0.9) for all parameters (Table 1), except interrater

reliability of PI (ICC=0.89, classified as “good”). Agreement was assessed using the 95% LOA,

and the measurement error was found to be greater than previously assumed (Table 2). In

addition, we found that the apparent measurement error was larger for parameters depending on

the slope of the sacral endplate (PI and SS) compared to parameters depending on its position (PT

and SVA).

Main findings

II. The measurement error of spinopelvic parameters was greater than previously assumed.

III. The measurement errors of PI and SS were noticeably larger than that of PT.

IV. Measurement error was considerable despite excellent reliability.


Table 1. Assessment of reliability using intraclass correlation coefficients (ICC). Reliability was classified as “excellent” for

assessed parameters (ICC≥0.90) except for inter-rater reliability of pelvic incidence (0.89).

Table 2. Measurement precision measured as 95% Limits of Agreement (LOA) for commonly used radiographic parameters.

Data indicate the precision of a single measurement compared to the hypothetical true measurement.


Study III

Objective

To validate the GAP-score in a sample outside of the original cohort.

Primary outcome

Postoperative mechanical failure leading to revision surgery within two years of index surgery.

Secondary outcome

Any mechanical failure within two years of index surgery.

Summary of results

We retrospectively included 149 patients undergoing ASD surgery at a single tertiary institution.

All patients were followed for two years after surgery and medical records were used to obtain

events of mechanical failure and revision surgery. Mechanical failure occurred in 51% of cases,

and in 31% of patients, revision surgery was performed due to mechanical failure. The AUC using

ROC curves showed “no or low discriminatory power” of the GAP score in predicting

mechanical failure with or without subsequent revision (AUC = 0.50 and 0.49) (Figure 8). We

found no significant correlation between GAP categories and either of the measured outcomes (p

= 0.28 and 0.58). In addition, secondary analyses using logistic regression were similarly without

significant association.

Main findings

V. GAP score and categories did not predict mechanical failure after ASD surgery.


Figure 8. Receiver operating characteristic (ROC) curves of the Global Alignment and Proportion (GAP) score in relation to either outcome. “Mechanical failure” indicates mechanical failure within two years of index surgery. “Revision surgery” indicates revision surgery due to mechanical failure within two years of index surgery. The diagnostic accuracy of the GAP score was classified as “no or low power” for both outcomes.


Discussion

The Roussouly Classification

The classification system was developed in an attempt to describe the normal variation in sagittal

spine shape[73]. Mechanical complications remain a considerable risk following ASD surgery,

and researchers are attempting to establish methods of reducing this risk, including

individualized radiographic targets[67]. Naturally, part of the solution to this must be found in

the fact that there are normal variations in the shape of human spine, and the Roussouly

Classification may well play a considerable role. To the best of our knowledge, the study

presented in this thesis (Study I) is the first to assess the reproducibility of the Roussouly

Classification. We found moderate inter-rater and substantial intrarater reliability. The only

previously well-established classification system used in ASD patients is the SRS-Schwab

Classification, and results were therefore compared to the reproducibility reported for this system

in three previous studies. Schwab et al. who developed the system also aimed to validate it[75].

In their study, 21 cases were selected to represent the distribution of classification grades.

Radiographs were pre-marked for landmarks and then presented to 9 raters. They reported

substantial inter-rater reliability, almost perfect intra-rater reliability and 88% crude intra-rater

agreement. The inter-rater agreement was not reported. Using pre-marked radiographs and

selecting patients for an even distribution of categories could easily produce artificially high

reliability. A second study was published by Liu et al. of 102 ASD patients[97]. Radiographs

were not pre-marked in their study, although exclusion criteria were not given. They found

substantial inter-rater reliability, almost perfect intra-rater reliability and 42% crude inter-rater

agreement. No intra-rater agreement was given. Neither of these two studies reported trait prevalence

or rater bias. Evenly distributed trait prevalence between categories and large

differences in distribution between raters tend to produce higher reliability[98]. In a third study,

Hallager et al. reported moderate inter-rater and substantial intra-rater reliability in addition to

39% crude inter-rater and 70% crude intra-rater agreement[99]. In their study, radiographs were

not pre-marked, and the selection process of patients was fully disclosed, without any active

selection of patients. Rater bias and trait prevalence were given, making this the study with the

method most comparable to the study presented in this thesis. Their results for reliability and

agreement were similar to those found in our study.


Measurement accuracy of sagittal radiographic parameters

Measurement precision

The aim of study II was to assess the measurement error of commonly used parameters on

sagittal radiographs of ASD patients, and to compare the results to the previously assumed ±5°.

We therefore used the 95% LOA, as opposed to reliability measures, to assess the measurement

error[80–82]. The 95% LOA is to be interpreted as a 95% confidence interval of the difference

between two individual measurements. A degree of error will always be present in measurements

of continuous variables [100]. Accordingly, it will affect the reading of single and repeated

measures [96] which often is of interest when treating patients with ASD. We found 95% LOA

of ±4° to ±13° for parameters measured as angles and approximately ±6 mm for SVA which is

measured as a distance (Table 2). At the time this study was conducted, no previous study had

reported the measurement accuracy of these radiographic parameters in patients with ASD using

agreement parameters. However, previous studies had attempted to describe the reproducibility

through other methods. Aubin et al. used 6 realistic spine prototypes as “gold standard”

compared to radiographic measurements and reported agreement as the standard deviation (SD)

of mean difference[101]. However, using the SD of mean difference fails to take into account a

large part of the error and tends to underestimate the measurement error[85,100]. In a study by

Lafage et al. radiographs were pre-marked for specific landmarks before presenting the images

to the raters[102]. Results can therefore be interpreted as the accuracy of the radiographic

software’s ability to calculate the parameters, based on the markings, and not the accuracy of

radiographic measurement. Finally, several additional studies have assessed the reproducibility

of these spinopelvic parameters; although, none have adhered to the methods for agreement

assessment suggested by Bland and Altman in combination with being performed on patients

with ASD[80,83,85,95,103–105].

Recently, and after submission of our study, Kyrölä et al. published their paper assessing the

reproducibility of the same radiographic parameters[106]. Their study was performed on 49

patients referred due to thoracolumbar complaints. Only 8 patients (16%) were classified as

having a structural deformity (scoliosis, kyphosis, spondylolysis or spondylolisthesis) as opposed

to our study, where all patients were referred for ASD evaluation. They reported the SEM as an

assessment of agreement, as also presented in our study. The SEM is easily compared between

studies as the SEMagreement takes into account the systematic difference between raters (Appendix

4). Furthermore, as SEM was calculated using the same method as in our study, in addition to the

given repeatability coefficient (RC), the 95% LOA can be derived[104]. Table 3 details the


comparison of these estimates and results were similar to our study. Kyrölä et al. found greater

measurement error for most parameters, except for intra-rater agreement of PI and SS.

Table 3. Comparison of agreement assessment using 95% Limits of Agreement (LOA). Results from the study by Kyrölä

et al. were calculated as ± the repeatability coefficient and derived as the mean of both readings for inter-rater agreement

and mean of all three raters for intra-rater agreement. Results were similar.

Importance of sacral endplate slope

In relation to PI and SS, the observed measurement error was profoundly larger than the

previously assumed ±5°. PI and SS are parameters depending on the slope of the sacral endplate,

as opposed to PT, which merely depends on the endplate's position (Figure 3). This difference in

measurement method could explain the observed difference in measurement error when

comparing PI and SS to the measurement error of PT. Pernaa et al. found similar results in

patients following hip arthroplasty[107], as did Kyrölä et al.[106]. When looking more closely at

our own results, we found that in cases with large measurement difference between raters, the

difference was often due to disagreement in selecting the sacral endplate. This yielded up to 30°

difference between measurements of PI and SS while PT was only marginally different. The

sacral endplate can be difficult to distinguish in patients with ASD due to degenerative changes

and deformity. These findings therefore highlight an unrecognized challenge in radiographic

assessment of ASD.


Agreement describes measurement precision

In the reproducibility study of spinopelvic parameters by Kyrölä et al.[106], the authors reported
“excellent” reliability despite considerable measurement error, which is consistent with our
results. This further underlines the importance of choosing the correct statistical method in
reproducibility studies.
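
A minimal numerical sketch may help show why high reliability and wide limits of agreement can coexist. It assumes the simplified definition of reliability as the share of total variance attributable to true differences between patients, and it uses purely hypothetical standard deviations; none of the numbers are results from the studies discussed here.

```python
import math


def icc_from_sd(between_sd, within_sd):
    """Reliability as between-patient variance divided by total variance."""
    return between_sd ** 2 / (between_sd ** 2 + within_sd ** 2)


def loa_half_width(within_sd):
    """Half-width of the 95% LOA for two readings, assuming no systematic bias."""
    return 1.96 * math.sqrt(2.0) * within_sd


within_sd = 4.0  # hypothetical measurement error in degrees, identical in both samples
for between_sd in (5.0, 15.0):  # hypothetical homogeneous vs. heterogeneous sample
    icc = icc_from_sd(between_sd, within_sd)
    print(
        f"between-patient SD {between_sd:4.1f}: ICC = {icc:.2f}, "
        f"95% LOA = ±{loa_half_width(within_sd):.1f} degrees"
    )
```

With the measurement error held fixed, the reliability estimate rises as the sample becomes more heterogeneous, while the limits of agreement stay the same. This is why agreement, rather than reliability, describes measurement precision.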

Predicting mechanical failure using the GAP-score

The seminal work of the European Spine Study Group demonstrated promising results using the
GAP score and category to predict mechanical failure in 74 patients following ASD surgery[78].
Despite multiple statistical analyses, we were not able to validate the initial findings in our
sample of 149 ASD patients. There are numerous possible explanations for this discrepancy.
First and foremost, all medical research is susceptible to type I errors, and a rejection of the
null hypothesis can never be made with complete certainty. In addition, a certain amount of bias
needs to be accounted for, as the authors validated a model which they themselves developed.
Therefore, external validation is necessary before clinical implementation. Accordingly, Jacobs
et al.[108] recently published a study comparing the predictive ability of the GAP score with
that of the SRS-Schwab Classification in 39 patients with ASD. They found that the GAP score
was significantly correlated with postoperative mechanical failure, and that it performed
significantly better than the SRS-Schwab Classification. On closer inspection of the original
study by Yilgor et al., only 222 of 548 eligible patients met the 2-year follow-up criteria,
resulting in an eligibility of merely 40%. Of these 222 patients, 74 comprised the validation
cohort. The surgical outcomes of the remaining 326 (60%) patients are unknown, and the risk of
treatment-by-condition needs to be considered. In comparison, the eligibility in our study was
80% (149 out of 186), whereas Jacobs et al. did not disclose their exclusion process. The sample
presented in our study was almost four times larger than that of Jacobs et al., and the inclusion
period was 3 years in comparison to 11 years. All three studies used the same definition of
mechanical complications. A second common trait was their retrospective design. Two of the
three studies demonstrated a positive correlation between the GAP score and postoperative
mechanical failure. Therefore, the results of our study may well be due to a type II error
(false negative). In search of the true answer to this question, further studies are highly
warranted, preferably in prospective cohorts.
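
The follow-up proportions quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the patient counts stated in the text; the variable names are chosen here for illustration.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Patient counts as quoted in the text.
yilgor_eligible, yilgor_followed = 548, 222
ours_eligible, ours_included = 186, 149
jacobs_included = 39

yilgor_rate = percentage(yilgor_followed, yilgor_eligible)
yilgor_unknown = yilgor_eligible - yilgor_followed
ours_rate = percentage(ours_included, ours_eligible)
size_ratio = ours_included / jacobs_included

print(f"Yilgor et al.: {yilgor_rate:.1f}% with 2-year follow-up, "
      f"{yilgor_unknown} patients with unknown outcome")
print(f"Present study: {ours_rate:.1f}% of eligible patients included")
print(f"Sample size relative to Jacobs et al.: {size_ratio:.1f} times")
```

The rounded figures in the text (40%, 60%, 80% and "almost four times") agree with these values.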

A second explanation for the discrepancy could be a missing aspect, or element, of the GAP
score. Previous studies have demonstrated that mechanical complications following ASD surgery
can be ascribed to a myriad of factors: PT, sex, magnitude of thoracic kyphosis, degree of
correction, number of instrumented vertebrae, ending instrumentation at the sacrum, obesity,
osteoporosis, surgical technique and muscle volume, among others, have been associated with
the risk of failure[67,109–114]. As discussed in previous sections of this thesis, the PT in
particular is a direct estimate of compensation for sagittal imbalance. Jacobs et al.
demonstrated that the Relative Pelvic Version, a subcategory of the GAP score, showed properties
and predictive ability for mechanical failure similar to those of the pelvic tilt modifier used
in the SRS-Schwab Classification. Finally, the samples presented in the original validation study
and in the study presented in this thesis differed in one respect: the percentage of patients
undergoing revision surgery due to mechanical complications was higher in our study, as was the
proportion of patients with a history of previous spine surgery. Regardless of demographic and
outcome discrepancies, a model for accurately predicting mechanical failure should be applicable
to all ASD patients. Whether or not the GAP score is the true solution remains to be answered.

Limitations

There are important limitations to the three presented studies. The inclusion process in all
three studies is susceptible to selection bias. This is most noticeable in studies I and II,
where the large number of excluded patients indicates low external validity. Most exclusions
were ascribed to insufficient radiographs (not including either the femoral heads or C7). In
collaboration with the local radiology department, our institution has since implemented
protocols to reduce this rate, and very few ASD radiographs are insufficient today. In addition,
all radiographic measurements were performed using image management software. This particular
system requires the rater (or surgeon) to identify all anatomical parts of the spine necessary
for calculation of the radiographic parameters. The system then measures the angles and
distances based on the given algorithms for radiographic parameters (Figures 1, 2 and 5). This
method of measuring radiographic parameters therefore excludes a certain aspect of potential
error, i.e. once all landmarks and anatomical parts of the spine have been identified, no further
measurement error can occur (e.g. the rater cannot measure PI in an incorrect manner). This is
sometimes referred to as a “semi-automatic” system, in comparison to imaging systems where the
rater uses a measuring tool to calculate the angle between two lines. The software system used
in these studies is standard of care at our institution and is used in all radiographic
assessment of ASD patients, including preoperative diagnostics, surgical planning and
postoperative evaluation of results. Nonetheless, the external validity can be questioned, as
other institutions may have to account for an additional step with potentially added measurement
error. Previous studies report that the use of dedicated software was related to greater
reliability compared to a standard picture archiving and communication system (PACS)[102,115].
This is particularly relevant for study I. In study II, the potential extra error only adds to
the conclusion: the extent of measurement error may well be even greater when performing manual
radiographic measurements. Furthermore, study II was performed as a post-hoc study. Therefore,
no power calculation was performed and results should be interpreted with caution. Despite the
limitations of this method, our findings nevertheless suggest that radiographic measurements of
ASD patients are less accurate than previously assumed.

Turning to the third paper of this thesis, the aim was to validate the GAP score in an external
patient sample. The score was validated as a whole in addition to the four subcategories.
However, the GAP score comprises five sub-elements, each contributing to the final total score.
These sub-elements were not evaluated individually. As patient characteristics in our study
differed slightly from the original patient sample, further analysis of the sub-elements of the
GAP score might have revealed correlation to the outcomes of interest. In addition, the
retrospective nature of our study allows for selection bias, as does the exclusion of 20% of
eligible patients due to insufficient radiographs or missing 2-year follow-up.

Perspectives and future research

With accelerated advances in surgical techniques and implant technology, combined with a

demographic shift seen in most developed countries[116,117], an increased demand for ASD

surgery is to be expected[5,118,119]. In addition, progress in anesthetic and medical treatment

allows for surgical treatment of patients previously deemed unfit to undergo major

surgery[60,120–124]. A recent Danish nationwide database study showed an increased 2-year

revision risk following ASD surgery from 2006 to 2011, with a slight decrease in recent

years[125]. Although this trend has not been validated in other studies, the risk of complications,

including revision surgery, continues to be significant[12,16,17,64,67,126].

Fixed alignment targets

Targets for surgical treatment have previously been regarded as “one-size-fits-all”, i.e. a
specific set of radiographic parameters is regarded as “normal” and aimed for when performing
surgical correction of deformity. Recent studies have recognized flaws in this assumption, as
the morphologic parameter PI seems to play a significant role in tailoring individual surgical
targets[18,78,127–130]. By focusing attention on the variation in spine shape in normal
individuals, the Roussouly classification was developed. This system may well play an important
role in surgical planning and evaluation of patients with spinal deformity[131], although
further studies assessing the implementation of this system and its effect on outcome are much
needed. Our institution is currently involved in a multicenter study, a retrospective study and
a prospective study investigating this precise effect, and preliminary results were presented at
the Scoliosis Research Society's annual meeting in 2018.

Potential individuals not fitting the model

While a system for individual alignment targets is still to be established, a recent study by
Ferrero et al. found that a proportion of patients with ASD and sagittal imbalance were not able
to compensate through pelvic retroversion[132]. The authors suggested unknown neuromuscular or
hip pathology as potential causes of this phenomenon, and further investigation into this matter
is needed.

The aging spine

Degenerative processes occurring with increasing age are known to affect sagittal balance and
the occurrence of spinal deformity[26,43,133]. However, studies have suggested that partial
sagittal imbalance may be part of normal ageing, and correction of this imbalance may be poorly
received[48,134–136]. The effect of age, and its role in both sagittal balance and correction of
deformity, might not be fully understood and has to be addressed before implementing a model for
radiographic targets of ASD surgery[137].

Enhanced recovery after surgery

The cost of surgery for ASD is noteworthy and, with technological advances,
increasing[138–141]. In addition, potential revision procedures add to this cost, and the
effectiveness of ASD surgery has to be evaluated in light of the ever-increasing overall
expenditure on healthcare worldwide[142]. Reducing costs has become a significant focus, and
implementation of enhanced recovery after surgery (ERAS) principles has resulted in major
improvements in clinical outcomes and reduction in cost in other surgical fields[143–145]. In
the field of spine surgery, most studies have reported ERAS implementation in minimally invasive
surgery, and few clinical studies have reported the effect in major spine surgery such as ASD
surgery[146–152]. Clinical studies evaluating the effect of full implementation of ERAS
principles in ASD surgery are still to be published.

Gait, functional tests and paravertebral muscles

Why certain individuals are able to compensate for progressive spinal malalignment and others
are not remains to be fully understood. Gait and motion analysis could provide important
information in comparison with, and in addition to, static radiographic parameters[153–156].
Physical activity trackers have also been suggested as a means of evaluating outcome[157]. By
equipping patients with activity trackers, real-time activity data may provide important
information in assessing severity of deformity and effect of surgery. A sudden decrease in
activity could be interpreted as a symptom of mechanical or neurologic complications, allowing
earlier intervention than is possible today. Finally, spinal musculature is an essential factor
in keeping an upright posture. The characteristics, type and extent of muscle volume loss and
fat infiltration have been proposed as factors related to skeletal muscle
degeneration[158,159]. Recent studies have suggested that this fat infiltration is different in
spinal erectors compared to other muscle groups, and that it may be related to sagittal spinal
imbalance[160–162].

Concluding remarks

Radiographic measurements are crucial in ASD assessment. In this thesis, we present three

papers on radiographic parameters in patients with ASD. Firstly, the reproducibility of the

Roussouly Classification was estimated. We found moderate to substantial reliability. Secondly,

we assessed single, commonly used radiographic parameters and found greater measurement

error than previously assumed. In addition, parameters depending on the angulation of the sacral

endplate were associated with the greatest measurement error. Finally, the recently published

GAP score was assessed for external validity in predicting mechanical failure after ASD surgery.

We found no correlation between the GAP score and mechanical failure – with or without

subsequent revision surgery.

References

[1] Kottner J, Audige L, Brorson S, et al. Guidelines for Reporting Reliability and Agreement

Studies (GRRAS) were proposed. Int J Nurs Stud 2011;48:661–71.

[2] Good CR, Auerbach JD, O’Leary PT, et al. Adult spine deformity. Curr Rev

Musculoskelet Med 2011;4:159–67.

[3] Pellisé F, Vila-Casademunt A, Ferrer M, et al. Impact on health related quality of life of

adult spinal deformity (ASD) compared with other chronic conditions. Eur Spine J

2014;24:3–11.

[4] Schwab F, Blondel B, Chay E, et al. The comprehensive anatomical spinal osteotomy

classification. Neurosurgery 2015;76:112–20.

[5] Ames CP, Scheer JK, Lafage V, et al. Adult Spinal Deformity: Epidemiology, Health

Impact, Evaluation, and Management. Spine Deform 2016;4:310–22.

[6] Carter OD, Haynes SG. Prevalence rates for scoliosis in US adults: results from the first

National Health and Nutrition Examination Survey. Int J Epidemiol 1987;16:537–44.

[7] Francis RS. Scoliosis screening of 3,000 college-aged women. The Utah Study--phase 2.

Phys Ther 1988;68:1513–6.

[8] Kebaish KM, Neubauer PR, Voros GD, et al. Scoliosis in adults aged forty years and

older: prevalence and relationship to age, race, and gender. Spine (Phila Pa 1976)

2011;36:731–6.

[9] Pérennou D, Marcelli C, Hérisson C, et al. Adult lumbar scoliosis. Epidemiologic aspects

in a low-back pain population. Spine (Phila Pa 1976) 1994;19:123–8.

[10] Schwab F, Dubey A, Gamez L, et al. Adult scoliosis: prevalence, SF-36, and nutritional

parameters in an elderly volunteer population. Spine (Phila Pa 1976) 2005;30:1082–5.

[11] Diebo BG, Shah N V, Boachie-Adjei O, et al. Adult spinal deformity. Lancet

2019;394:160–72.

[12] Youssef JA, Orndorff DO, Patty CA, et al. Current Status of Adult Spinal Deformity.

Glob Spine J 2013;3:051–62.

[13] Passias PG, Jalai CM, Line BG, et al. Patient profiling can identify patients with adult

spinal deformity (ASD) at risk for conversion from nonoperative to surgical treatment:

initial steps to reduce ineffective ASD management. Spine J 2018;18:234–44.

[14] Teles AR, Mattei TA, Righesso O, et al. Effectiveness of Operative and Nonoperative

Care for Adult Spinal Deformity: Systematic Review of the Literature. Glob Spine J

2017;7:170–8.

[15] Yadla S, Maltenfort MG, Ratliff JK, et al. Adult scoliosis surgery outcomes: a systematic

review. Neurosurg Focus 2010;28:E3.

[16] Charosky S, Guigui P, Blamoutier A, et al. Complications and Risk Factors of Primary

Adult Scoliosis Surgery. Spine (Phila Pa 1976) 2012;37:693–700.

[17] Sciubba DM, Yurter A, Smith JS, et al. A Comprehensive Review of Complication Rates

after Surgery for Adult Deformity: A Reference for Informed Consent. Spine Deform

2015;3:575–94.

[18] Legaye J, Duval-Beaupère G, Hecquet J, et al. Pelvic incidence: A fundamental pelvic

parameter for three-dimensional regulation of spinal sagittal curves. Eur Spine J

1998;7:99–103.

[19] Scheer JK, Keefe M, Lafage V, et al. Importance of patient-reported individualized goals

when assessing outcomes for adult spinal deformity (ASD): initial experience with a

Patient Generated Index (PGI). Spine J 2017;17:1397–405.

[20] Schwab F, Patel A, Ungar B, et al. Adult Spinal Deformity—Postoperative Standing

Imbalance. Spine (Phila Pa 1976) 2010;35:2224–31.

[21] Bess S, Protopsaltis TS, Lafage V, et al. Clinical and Radiographic Evaluation of Adult

Spinal Deformity. Clin Spine Surg 2016;29:6–16.

[22] Schwab FJ, Blondel B, Bess S, et al. Radiographical spinopelvic parameters and disability

in the setting of adult spinal deformity: a prospective multicenter analysis. Spine (Phila Pa

1976) 2013;38:E803–12.

[23] Diebo BG, Varghese JJ, Lafage R, et al. Sagittal alignment of the spine: What do you

need to know? Clin Neurol Neurosurg 2015;139:295–301.

[24] Glassman SD, Bridwell K, Dimar JR, et al. The Impact of Positive Sagittal Balance in

Adult Spinal Deformity. Spine (Phila Pa 1976) 2005;30:2024–9.

[25] Barrey C, Roussouly P, Perrin G, et al. Sagittal balance disorders in severe degenerative

spine. Can we identify the compensatory mechanisms? Eur Spine J 2011;20:626–33.

[26] Pollintine P, van Tunen MSLM, Luo J, et al. Time-Dependent Compressive Deformation

of the Ageing Spine. Spine (Phila Pa 1976) 2010;35:386–94.

[27] Ailon T, Smith JS, Shaffrey CI, et al. Degenerative Spinal Deformity. Neurosurgery

2015;77:S75–91.

[28] Grubb SA, Lipscomb HJ, Coonrad RW. Degenerative adult onset scoliosis. Spine (Phila

Pa 1976) 1988;13:241–5.

[29] Ailon T, Shaffrey CI, Lenke LG, et al. Progressive Spinal Kyphosis in the Aging

Population. Neurosurgery 2015;77:S164–72.

[30] Dubousset J. Three-dimensional analysis of the scoliotic deformity. Pediatr Spine Princ

Pract 1994:479–496.

[31] Ames CP, Smith JS, Scheer JK, et al. Impact of spinopelvic alignment on decision making

in deformity surgery in adults: A review. J Neurosurg Spine 2012;16:547–64.

[32] Rajnics P, Templier A, Skalli W, et al. The Association of Sagittal Spinal and Pelvic

Parameters in Asymptomatic Persons and Patients with Isthmic Spondylolisthesis. J

Spinal Disord Tech 2002;15:24–30.

[33] Roussouly P, Pinheiro-Franco JL. Sagittal parameters of the spine: biomechanical

approach. Eur Spine J 2011;20:578–85.

[34] Mac-Thiong J-M, Roussouly P, Berthonnaud É, et al. Age- and sex-related variations in

sagittal sacropelvic morphology and balance in asymptomatic adults. Eur Spine J

2011;20:572–7.

[35] Mac-Thiong J-M, Berthonnaud É, Dimar JR, et al. Sagittal Alignment of the Spine and

Pelvis During Growth. Spine (Phila Pa 1976) 2004;29:1642–7.

[36] Mac-Thiong J-M, Labelle H, Roussouly P. Pediatric sagittal alignment. Eur Spine J

2011;20:586–90.

[37] Wenger DR, Frick SL. Scheuermann kyphosis. Spine (Phila Pa 1976) 1999;24:2630–9.

[38] Cortet B, Houvenagel E, Puisieux F, et al. Spinal curvatures and quality of life in women

with vertebral fractures secondary to osteoporosis. Spine (Phila Pa 1976) 1999;24:1921–5.

[39] Vaccaro AR, Silber JS. Post-traumatic spinal deformity. Spine (Phila Pa 1976)

2001;26:S111–8.

[40] Potter BK, Lenke LG, Kuklo TR. Prevention and management of iatrogenic flatback

deformity. J Bone Joint Surg Am 2004;86-A:1793–808.

[41] Boody BS, Rosenthal BD, Jenkins TJ, et al. Iatrogenic Flatback and Flatback Syndrome:

Evaluation, Management, and Prevention. Clin Spine Surg 2017;30:142–9.

[42] Lu DC, Chou D. Flatback syndrome. Neurosurg Clin N Am 2007;18:289–94.

[43] Roussouly P, Nnadi C. Sagittal plane deformity: an overview of interpretation and

management. Eur Spine J 2010;19:1824–36.

[44] Vidal J. Sagittal deviations of the spine, and trial of classification as a function of the

pelvic balance [in French]. Rev Chir Orthop Reparatrice Appar Mot 1984;70:124–6.

[45] Vidal J, Marnay T. [Morphology and anteroposterior body equilibrium in

spondylolisthesis L5-S1]. Rev Chir Orthop Reparatrice Appar Mot 1983;69:17–28.

[46] Lafage V, Schwab F, Patel A, et al. Pelvic Tilt and Truncal Inclination. Spine (Phila Pa

1976) 2009;34:E599–606.

[47] Lazennec J-Y, Brusson A, Rousseau M-A. Hip–spine relations and sagittal balance

clinical consequences. Eur Spine J 2011;20:686–98.

[48] Vaz G, Roussouly P, Berthonnaud E, et al. Sagittal morphology and equilibrium of pelvis

and spine. Eur Spine J 2002;11:80–7.

[49] Obeid I, Hauger O, Aunoble S, et al. Global analysis of sagittal spinal alignment in major

deformities: correlation between lack of lumbar lordosis and flexion of the knee. Eur

Spine J 2011;20:681–5.

[50] Diebo BG, Oren JH, Challier V, et al. Global sagittal axis: a step toward full-body

assessment of sagittal plane deformity in the human body. J Neurosurg Spine

2016;25:494–9.

[51] Roussouly P, Pinheiro-Franco JL. Biomechanical analysis of the spino-pelvic organization

and adaptation in pathology. Eur Spine J 2011;20:1–10.

[52] Crawford CH, Glassman SD. Fusing Adult Degenerative Deformities of the Lumbar

Spine. Semin Spine Surg 2011;23:222–6.

[53] Russo A, Bransford R, Wagner T, et al. Adult degenerative scoliosis insights, challenges,

and treatment outlook. Curr Orthop Pract 2008;19:357–65.

[54] Silva FE, Lenke LG. Adult degenerative scoliosis: evaluation and management.

Neurosurg Focus 2010;28:E1.

[55] Everett CR, Patel RK. A Systematic Literature Review of Nonsurgical Treatment in Adult

Scoliosis. Spine (Phila Pa 1976) 2007;32:S130–4.

[56] Scheer JK, Hostin R, Robinson C, et al. Operative Management of Adult Spinal

Deformity Results in Significant Increases in QALYs Gained Compared to Nonoperative

Management. Spine (Phila Pa 1976) 2018;43:339–47.

[57] Smith JS, Shaffrey CI, Berven S, et al. Improvement of Back Pain With

Operative and Nonoperative Treatment in Adults With Scoliosis.

Neurosurgery 2009;65:86–94.

[58] Li G, Passias P, Kozanek M, et al. Adult Scoliosis in Patients Over Sixty-Five Years of

Age. Spine (Phila Pa 1976) 2009;34:2165–70.

[59] Smith JS, Shaffrey CI, Berven S, et al. Operative Versus Nonoperative Treatment of Leg

Pain in Adults With Scoliosis. Spine (Phila Pa 1976) 2009;34:1693–8.

[60] Bridwell KH, Glassman S, Horton W, et al. Does Treatment (Nonoperative and

Operative) Improve the Two-Year Quality of Life in Patients With Adult Symptomatic

Lumbar Scoliosis. Spine (Phila Pa 1976) 2009;34:2171–8.

[61] Glassman SD, Carreon LY, Shaffrey CI, et al. The Costs and Benefits of Nonoperative

Management for Adult Scoliosis. Spine (Phila Pa 1976) 2010;35:578–82.

[62] Ames CP, Scheer JK, Lafage V, et al. Adult Spinal Deformity: Epidemiology, Health

Impact, Evaluation, and Management. Spine Deform 2016;4:310–22.

[63] Street JT, Lenehan BJ, DiPaola CP, et al. Morbidity and mortality of major adult spinal

surgery. A prospective cohort analysis of 942 consecutive patients. Spine J 2012;12:22–34.

[64] Karstensen S, Bari T, Gehrchen M, et al. Morbidity and mortality of complex spine

surgery: a prospective cohort study in 679 patients validating the Spine AdVerse Event

Severity (SAVES) system in a European population. Spine J 2016;16:146–53.

[65] Daubs MD, Lenke LG, Cheh G, et al. Adult spinal deformity surgery: Complications and

outcomes in patients over age 60. Spine (Phila Pa 1976) 2007;32:2238–44.

[66] Smith JS, Klineberg E, Lafage V, et al. Prospective multicenter assessment of

perioperative and minimum 2-year postoperative complication rates associated with adult

spinal deformity surgery. J Neurosurg Spine 2016;25:1–14.

[67] Hallager DW, Karstensen S, Bukhari N, et al. Radiographic Predictors for Mechanical

Failure after Adult Spinal Deformity Surgery. Spine (Phila Pa 1976) 2017;42:E855–63.

[68] Gøtzsche P. Rational diagnosis and treatment: Evidence-based clinical decision-making.

John Wiley & Sons; 2008.

[69] Faundez A, Roussouly P, Le Huec JC. Équilibre sagittal pelvi-rachidien et pathologies

lombaires dégénératives. Rev Med Suisse 2011;7:2470–4.

[70] Barrey C, Jund J, Noseda O, et al. Sagittal balance of the pelvis-spine complex and lumbar

degenerative diseases. A comparative study about 85 cases. Eur Spine J 2007;16:1459–67.

[71] Le Huec JC, Charosky S, Barrey C, et al. Sagittal imbalance cascade for simple

degenerative spine and consequences: algorithm of decision for appropriate treatment. Eur

Spine J 2011;20:699–703.

[72] Boulay C, Tardieu C, Hecquet J, et al. Sagittal alignment of spine and pelvis regulated by

pelvic incidence: Standard values and prediction of lordosis. Eur Spine J 2006;15:415–22.

[73] Roussouly P, Gollogly S, Berthonnaud E, et al. Classification of the normal variation in

the sagittal alignment of the human lumbar spine and pelvis in the standing position. Spine

(Phila Pa 1976) 2005;30:346–53.

[74] Laouissat F, Sebaaly A, Gehrchen M, et al. Classification of normal sagittal spine

alignment: refounding the Roussouly classification. Eur Spine J 2017.

[75] Schwab F, Ungar B, Blondel B, et al. Scoliosis Research Society—Schwab Adult Spinal

Deformity Classification. Spine (Phila Pa 1976) 2012;37:1077–82.

[76] Fehlings MG, Kato S, Lenke LG, et al. Incidence and risk factors of postoperative

neurologic decline after complex adult spinal deformity surgery: results of the Scoli-RISK-1

study. Spine J 2018;18:1733–40.

[77] Lenke LG, Fehlings MG, Shaffrey CI, et al. Neurologic Outcomes of Complex Adult

Spinal Deformity Surgery. Spine (Phila Pa 1976) 2016;41:204–12.

[78] Yilgor C, Sogunmez N, Boissiere L, et al. Global Alignment and Proportion (GAP) Score.

J Bone Jt Surg 2017;99:1661–72.

[79] Langensiepen S, Semler O, Sobottke R, et al. Measuring procedures to determine the

Cobb angle in idiopathic scoliosis: A systematic review. Eur Spine J 2013;22:2360–71.

[80] de Vet HCW, Terwee CB, Knol DL, et al. When to use agreement versus reliability

measures. J Clin Epidemiol 2006;59:1033–9.

[81] Kottner J, Streiner DL. The difference between reliability and agreement. J Clin

Epidemiol 2011;64:701–2.

[82] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of

evaluative instruments. J Chronic Dis 1987;40:171–8.

[83] Weir JP. Quantifying Test-Retest Reliability Using the Intraclass Correlation Coefficient

and the SEM. J Strength Cond Res 2005;19:231.

[84] Eliasziw M, Young SL, Woodbury MG, et al. Statistical Methodology for the Concurrent

Assessment of Interrater and Intrarater Reliability: Using Goniometric Measurements as

an Example. Phys Ther 1994;74:777–88.

[85] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat

Methods Med Res 1999;8:135–60.

[86] Mitchell ML, Jolley JM. Research design explained. Cengage Learning; 2012.

[87] Hart RA, McCarthy I, Ames CP, et al. Proximal Junctional Kyphosis and Proximal

Junctional Failure. Neurosurg Clin N Am 2013;24:213–8.

[88] Gisev N, Bell JS, Chen TF. Interrater agreement and interrater reliability: Key concepts,

approaches, and applications. Res Soc Adm Pharm 2013;9:330–8.

[89] Cohen J. Weighted kappa: Nominal scale agreement provision for scaled disagreement or

partial credit. Psychol Bull 1968;70:213–20.

[90] Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data.

Biometrics 1977;33:159.

[91] Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation

Coefficients for Reliability Research. J Chiropr Med 2016;15:155–63.

[92] McGraw KO, Wong SP. “Forming inferences about some intraclass correlations

coefficients”: Correction. Psychol Methods 1996;1:390–390.

[93] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol

Bull 1979;86:420–8.

[94] Portney LG, Watkins MP. Foundations of clinical research: applications to practice. vol.

892. Pearson/Prentice Hall Upper Saddle River, NJ; 2009.

[95] Bland JM, Altman DG. Statistical Methods for Assessing Agreement Between Two Methods

of Clinical Measurement. Lancet 1986;327:307–10.

[96] Hopkins WG. Measures of Reliability in Sports Medicine and Science. Sport Med

2000;30:375–81.

[97] Liu Y, Liu Z, Zhu F, et al. Validation and Reliability Analysis of the New SRS-Schwab

Classification for Adult Spinal Deformity. Spine (Phila Pa 1976) 2013;38:902–8.

[98] Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample

size requirements. Phys Ther 2005;85:257–68.

[99] Nielsen DH, Gehrchen M, Hansen L V., et al. Inter- and Intra-rater Agreement in

Assessment of Adult Spinal Deformity Using the Scoliosis Research Society–Schwab

Classification. Spine Deform 2014;2:40–7.

[100] Atkinson G, Nevill AM. Statistical Methods For Assessing Measurement Error

(Reliability) in Variables Relevant to Sports Medicine. Sport Med 1998;26:217–38.

[101] Aubin C-E, Bellefleur C, Joncas J, et al. Reliability and Accuracy Analysis of a New

Semiautomatic Radiographic Measurement Software in Adult Scoliosis. Spine (Phila Pa

1976) 2011;36:E780–90.

[102] Lafage R, Ferrero E, Henry JK, et al. Validation of a new computer-assisted tool to

measure spino-pelvic parameters. Spine J 2015;15:2493–502.

[103] Popović ZB, Thomas JD. Assessing observer variability: a user’s guide. Cardiovasc Diagn

Ther 2017;7:317–24.

[104] Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of

measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466–75.

[105] Jones M, Dobson A, O’Brian S. A graphical method for assessing agreement with the

mean between multiple observers using continuous measures. Int J Epidemiol

2011;40:1308–13.

[106] Kyrölä KK, Salme J, Tuija J, et al. Intra- and Interrater Reliability of Sagittal Spinopelvic Parameters on Full-Spine Radiographs in Adults With Symptomatic Spinal Disorders. Neurospine 2018;15:175–81.

[107] Pernaa K, Seppänen M, Mäkelä K, et al. Reliability of Sagittal Spinopelvic Alignment

Measurements After Total Hip Arthroplasty. Clin Spine Surg 2017;30:E909–14.

[108] Jacobs E, van Royen BJ, van Kuijk SMJ, et al. Prediction of mechanical complications in

adult spinal deformity surgery—the GAP score versus the Schwab classification. Spine J

2019;19:781–8.

[109] Nicholls FH, Bae J, Theologis AA, et al. Factors Associated with the Development of and

Revision for Proximal Junctional Kyphosis in 440 Consecutive Adult Spinal Deformity

Patients. Spine (Phila Pa 1976) 2017;42:1693–8.

[110] Kim YJ, Bridwell KH, Lenke LG, et al. Pseudarthrosis in adult spinal deformity following

multisegmental instrumentation and arthrodesis. J Bone Jt Surg - Ser A 2006;88:721–8.

[111] Edwards CC, Bridwell KH, Patel A, et al. Long adult deformity fusions to L5 and the

sacrum: A matched cohort analysis. Spine (Phila Pa 1976) 2004;29:1996–2005.

[112] Wang H, Ding W, Ma L, et al. Prevention of Proximal Junctional Kyphosis: Are Polyaxial

Pedicle Screws Superior to Monoaxial Pedicle Screws at the Upper Instrumented

Vertebrae? World Neurosurg 2017;May:405–15.

[113] Gandhi S V, Januszewski J, Bach K, et al. Development of Proximal Junctional Kyphosis

After Minimally Invasive Lateral Anterior Column Realignment for Adult Spinal

Deformity. Neurosurgery 2018;March:Epub ahead of print.

[114] Kim D, Kim J, Kim D, et al. Risk factors of proximal junctional kyphosis after multilevel

fusion surgery: more than 2 years follow-up data 2017;60:174–80.

[115] Gupta M, Henry JK, Schwab F, et al. Dedicated Spine Measurement Software Quantifies

Key Spino-Pelvic Parameters More Reliably Than Traditional Picture Archiving and

Communication Systems Tools. Spine (Phila Pa 1976) 2016;41:E22–7.

[116] Lutz W, Sanderson W, Scherbov S. The coming acceleration of global population ageing.

Nature 2008;451:716–9.

[117] Jacobsen LA, Kent M, Lee M, et al. America's aging population. Popul Bull 2011;66.

[118] Fehlings MG, Tetreault L, Nater A, et al. The Aging of the Global Population.

Neurosurgery 2015;77:S1–5.

[119] Smith JS, Shaffrey CI, Bess S, et al. Recent and Emerging Advances in Spinal Deformity.

Neurosurgery 2017;80:S70–85.

[120] Dykes PC, Samal L, Donahue M, et al. A patient-centered longitudinal care plan: vision

versus reality. J Am Med Informatics Assoc 2014;21:1082–90.


[121] Gillick MR. The Critical Role of Caregivers in Achieving Patient-Centered Care. JAMA

2013;310:575.

[122] Kupfer JM, Bond EU. Patient Satisfaction and Patient-Centered Care. JAMA

2012;308:139.

[123] Passias PG, Poorman GW, Jalai CM, et al. Morbidity of Adult Spinal Deformity Surgery

in Elderly Has Declined Over Time. Spine (Phila Pa 1976) 2017:1.

[124] Smith JS, Lafage V, Shaffrey CI, et al. Outcomes of operative and nonoperative treatment

for adult spinal deformity: A prospective, multicenter, propensity-matched cohort

assessment with minimum 2-year follow-up. Neurosurgery 2016;78:851–61.

[125] Pitter FT, Lindberg-Larsen M, Pedersen AB, et al. Revision Risk After Primary Adult

Spinal Deformity Surgery: A Nationwide Study With Two-Year Follow-up. Spine Deform

2019;7:619-626.e2.

[126] Pichelmann MA, Lenke LG, Bridwell KH, et al. Revision Rates Following Primary Adult

Spinal Deformity Surgery. Spine (Phila Pa 1976) 2010;35:219–26.

[127] Sebaaly A, Riouallon G, Obeid I, et al. Proximal junctional kyphosis in adult scoliosis:

comparison of four radiological predictor models. Eur Spine J 2018;27:613–21.

[128] Scheer JK, Smith JS, Schwab F, et al. Development of a preoperative predictive model for

major complications following adult spinal deformity surgery. J Neurosurg Spine

2017;26:736–43.

[129] Terran J, Schwab F, Shaffrey CI, et al. The SRS-Schwab adult spinal deformity

classification: Assessment and clinical correlations based on a prospective operative and

nonoperative cohort. Neurosurgery 2013;73:559–68.

[130] Rothenfluh DA, Mueller DA, Rothenfluh E, et al. Pelvic incidence-lumbar lordosis

mismatch predisposes to adjacent segment disease after lumbar spinal fusion. Eur Spine J

2015;24:1251–8.

[131] Ohrt-Nissen S, Bari T, Dahl B, et al. Sagittal Alignment After Surgical Treatment of

Adolescent Idiopathic Scoliosis—Application of the Roussouly Classification. Spine

Deform 2018;6:537–44.

[132] Ferrero E, Vira S, Ames CP, et al. Analysis of an unexplored group of sagittal deformity

patients: low pelvic tilt despite positive sagittal malalignment. Eur Spine J 2016;25:3568–

76.

[133] Sebaaly A, Grobost P, Mallam L, et al. Description of the sagittal alignment of the

degenerative human spine. Eur Spine J 2018;27:489–96.

[134] Schwab F, Lafage V, Boyce R, et al. Gravity Line Analysis in Adult Volunteers. Spine (Phila Pa 1976) 2006;31:E959–67.

[135] Mendoza-Lattes S, Ries Z, Gao Y, et al. Natural History of Spinopelvic Alignment Differs

From Symptomatic Deformity of the Spine. Spine (Phila Pa 1976) 2010;35:E792–8.

[136] Sugrue PA, McClendon J, Smith TR, et al. Redefining Global Spinal Balance. Spine

(Phila Pa 1976) 2013;38:484–9.

[137] Lafage R, Schwab F, Challier V, et al. Defining spino-pelvic alignment thresholds: should operative goals in adult spinal deformity surgery account for age? Spine (Phila Pa 1976) 2016;41:62–8.

[138] Raman T, Nayar SK, Liu S, et al. Cost-Effectiveness of Primary and Revision Surgery for

Adult Spinal Deformity. Spine (Phila Pa 1976) 2018;43:791–7.

[139] McCarthy IM, Hostin RA, Ames CP, et al. Total hospital costs of surgical treatment for

adult spinal deformity: An extended follow-up study. Spine J 2014;14:2326–33.

[140] McCarthy IM, Hostin RA, O’Brien MF, et al. Analysis of the direct cost of surgery for

four diagnostic categories of adult spinal deformity. Spine J 2013;13:1843–8.

[141] Yeramaneni S, Robinson C, Hostin R. Impact of spine surgery complications on costs

associated with management of adult spinal deformity. Curr Rev Musculoskelet Med

2016;9:327–32.

[142] Health expenditure per capita, 2018, p. 132–3.

[143] Kehlet H, Wilmore DW. Evidence-Based Surgical Care and the Evolution of Fast-Track

Surgery. Ann Surg 2008;248:189–98.

[144] Kehlet H. Fast-track colorectal surgery. Lancet (London, England) 2008;371:791–3.

[145] Ljungqvist O, Scott M, Fearon KC. Enhanced Recovery After Surgery. JAMA Surg

2017;152:292.

[146] Wang MY, Chang P-Y, Grossman J. Development of an Enhanced Recovery After

Surgery (ERAS) approach for lumbar spinal fusion. J Neurosurg Spine 2017;26:411–8.

[147] Wang MY, Chang HK, Grossman J. Reduced Acute Care Costs With the ERAS®

Minimally Invasive Transforaminal Lumbar Interbody Fusion Compared With

Conventional Minimally Invasive Transforaminal Lumbar Interbody Fusion.

Neurosurgery 2018;83:827–34.

[148] Angus M, Jackson K, Smurthwaite G, et al. The implementation of enhanced recovery

after surgery (ERAS) in complex spinal surgery. J Spine Surg 2019;5:116–23.

[149] Dagal A, Bellabarba C, Bransford R, et al. Enhanced Perioperative Care for Major Spine

Surgery. Spine (Phila Pa 1976) 2019;44:959–66.

[150] Carr DA, Saigal R, Zhang F, et al. Enhanced perioperative care and decreased cost and length of stay after elective major spinal surgery. Neurosurg Focus 2019;46:E5.

[151] Wainwright TW, Immins T, Middleton RG. Enhanced recovery after surgery (ERAS) and

its applicability for major spine surgery. Best Pract Res Clin Anaesthesiol 2016;30:91–

102.

[152] Wainwright TW, Wang MY, Immins T, et al. Enhanced recovery after surgery (ERAS)—

Concepts, components, and application to spine surgery. Semin Spine Surg 2018;30:104–

10.

[153] Arima H, Yamato Y, Hasegawa T, et al. Extensive Corrective Fixation Surgeries for Adult

Spinal Deformity Improve Posture and Lower Extremity Kinematics During Gait. Spine

(Phila Pa 1976) 2017;42:1456–63.

[154] Yagi M, Ohne H, Konomi T, et al. Walking balance and compensatory gait mechanisms

in surgically treated patients with adult spinal deformity. Spine J 2017;17:409–17.

[155] Diebo BG, Shah N V., Pivec R, et al. From Static Spinal Alignment to Dynamic Body

Balance. JBJS Rev 2018;6:e3.

[156] Diebo BG, Challier V, Shah N V., et al. The Dubousset Functional Test is a Novel

Assessment of Physical Function and Balance. Clin Orthop Relat Res 2019:1.

[157] Haglin JM, Godzik J, Mauria R, et al. Continuous Activity Tracking Using a Wrist-

Mounted Device in Adult Spinal Deformity: A Proof of Concept Study. World Neurosurg

2019;122:349–54.

[158] Cruz-Jentoft AJ, Bahat G, Bauer J, et al. Sarcopenia: revised European consensus on

definition and diagnosis. Age Ageing 2019;48:16–31.

[159] Song M-Y, Ruts E, Kim J, et al. Sarcopenia and increased adipose tissue infiltration of

muscle in elderly African American women. Am J Clin Nutr 2004;79:874–80.

[160] Moal B, Bronsard N, Raya JG, et al. Volume and fat infiltration of spino-pelvic

musculature in adults with spinal deformity. World J Orthop 2015;6:727–37.

[161] Hyun S-J, Kim YJ, Rhim S-C. Patients with proximal junctional kyphosis after stopping at

thoracolumbar junction have lower muscularity, fatty degeneration at the thoracolumbar

area. Spine J 2016;16:1095–101.

[162] Banno T, Arima H, Hasegawa T, et al. The Effect of Paravertebral Muscle on the

Maintenance of Upright Posture in Patients With Adult Spinal Deformity. Spine Deform

2019;7:125–31.


Appendices

Study I

Moderate interrater and substantial intrarater reproducibility of the

Roussouly Classification system in patients with Adult Spinal Deformity

Spine Deformity 7 (2019) 312-318

https://doi.org/10.1016/j.jspd.2018.08.009


Moderate Interrater and Substantial Intrarater Reproducibility of the Roussouly Classification System in Patients With Adult Spinal Deformity

Tanvir Johanning Bari, MD (a,*), Dennis Winge Hallager, MD, PhD (a), Niklas Tøndevold, MD (a), Ture Karbo, MD (a), Lars Valentin Hansen, MD (a), Benny Dahl, MD, PhD, DMSc (b), Martin Gehrchen, MD, PhD (a)

(a) Spine Unit, Department of Orthopedic Surgery, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark
(b) Department of Orthopedics and Scoliosis Surgery, Texas Children’s Hospital and Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA

Received 23 December 2017; revised 5 May 2018; accepted 18 August 2018

Abstract

Study Design: Reproducibility study of a classification system.

Objectives: To provide the inter- and intrarater reproducibility of the Roussouly Classification System in a single-center prospective cohort of patients referred for Adult Spinal Deformity.

Summary of Background Data: The Roussouly Classification System was developed to describe the variation in sagittal spine shape in normal individuals. A recent study suggests that patients’ spine types could influence the outcome following spinal surgery. The utility of a classification system depends largely on its reproducibility.

Methods: Sixty-four consecutive patients were included in a blinded test-retest setting using digital radiographs. All ratings were performed by four spine surgeons with different levels of experience. There was a 14-day interval between the two reading sessions. Inter- and intrarater reproducibility was calculated using Fleiss Kappa and crude agreement percentages.

Results: We found moderate interrater (κ = 0.60) and substantial intrarater (κ = 0.68) reproducibility. All 4 raters agreed on the Roussouly type in 47% of the cases. The most experienced rater had significantly higher intrarater reliability than the least experienced rater (κ = 0.78 vs 0.57). The two most experienced raters also had the highest crude agreement percentage (75%); however, they also differed significantly in their distribution of spine types.

Conclusion: The current study presents moderate interrater and substantial intrarater reliability of the Roussouly Classification System. These findings are acceptable and comparable to previous results of reproducibility for a classification system in patients with Adult Spinal Deformity. Additional studies are needed to validate these findings and to further investigate the impact of the classification system on outcome following surgery.

© 2018 Scoliosis Research Society. All rights reserved.

Keywords: Roussouly; Classification; Reproducibility of results; Observer variation; Spine; Adult; Scoliosis; Kyphosis; Sagittal alignment

Author disclosures: TJB (none), DWH (none), NT (none), TK (none), LVH (none), BD (grants from K2M and Medtronic, outside the submitted work), MG (grants from K2M and Medtronic, outside the submitted work).

Ethics approval: The study was approved by the Danish Data Protection Agency and the Danish Patient Safety Authority.

*Corresponding author. Spine Unit, Department of Orthopedic Surgery, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark. Tel.: +45 5057 9448; fax: +45 3545 2165. E-mail address: [email protected] (T.J. Bari).

Introduction

The variation in sagittal spinal shape among healthy young adults has been described by Roussouly et al. [1], who found that the sagittal profile of the spine was related to the orientation of the pelvis. As the sacral slope (SS) increases, so does the lower arc of lordosis as well as the global lordosis. From this observation, four spine types were defined, and two additional types were recently added, resulting in the Roussouly Classification System consisting of 6 types [2]. The five main types are illustrated in Figure 1, and all types are defined in Table 1. Type 1 and Type 2 are characterized by a low-grade SS and low-grade pelvic incidence (PI) and differ from each other in the number of lordotic vertebrae. Type 3 has a high-grade SS and high-grade PI, whereas Type 4 is defined as the type with the largest SS in combination with a high-grade PI. Type 3 anteverted pelvis (Type 3AP) is identified by its high-grade SS in contrast to a low-grade PI. These spine types have been suggested to influence the outcome of spinal surgery [3]. To our knowledge, however, no previous study has investigated the reproducibility of this classification. Because the classification system might be useful as a tool for evaluating ASD and individualizing its treatment, a reproducibility study is necessary to test its usefulness.

The aim of the current study was to provide estimates of the inter- and intrarater reproducibility of the Roussouly Classification System in a population of patients with ASD.

Material and Methods

We performed a reproducibility study on a single-center cohort of patients with ASD. Approval from the local data protection agency and patient safety authority was obtained (jr.nr. 2012-58-0004 and jr.nr: 3-3013-1980/1). All patients referred for evaluation of ASD at our tertiary institution for spine surgery between August 1, 2013, and March 30, 2014, were assessed for inclusion.

The inclusion criteria were as follows: referral for ASD, age >18 years, and sufficient radiographs. Sufficient radiographs were defined as follows: standing images in both the lateral and anteroposterior plane, including both femoral heads and visible vertebrae from C7–S1 on the lateral images. Patients were standing with "fists-on-clavicles" [4]. Patients were excluded if there was a history of previous instrumentation, spinal fracture, or neuromuscular disease.

Radiographs were acquired using the Digital Imaging and Communications in Medicine format from the Radiology Information System and Picture Archiving and Communications System IMPAX 6 (Agfa Healthcare NV, Mortsel, Belgium). A total of 8 sets of each patient’s radiographs were then transferred to the online data management system KEOPS (SMAIO, Lyon, France) as unmarked digital films. Each set of radiographs was coded with a unique identification key, thus blinding the raters from further identification of the individual patients.

Four raters performed all ratings. They consisted of two staff specialists with one and two years of experience in spinal surgery, respectively (Raters 1 and 2), and two senior spine surgeons, each with more than 20 years of experience (Raters 3 and 4). Each rater identified specific points on each radiograph, including the center of the femoral heads, the endplate of S1, and the vertebral bodies from C7–L5. A geometric computerized model was then generated, and the software classified each radiograph based on the measurements according to the Roussouly Classification System [2]. For training purposes, each rater was handed a short guide on how to use the system and assessed 10 unique sets of radiographs. These radiographs were not included in the subsequent readings. There was a 14-day delay between readings, and the order of cases was randomized. Results are presented in accordance with the Guidelines for Reporting Reliability and Agreement Studies [5].

Fig. 1. The Roussouly Classification System. The figure illustrates the five main spine types. To the left are the types with low-grade sacral slope (SS) and low-grade pelvic incidence (PI). The types to the right are defined as having high-grade SS and high-grade PI. Type 3 Anteverted Pelvis (3AP) is seen in between with a high-grade SS and a low-grade PI. With permission from Mr. Philippe Roussouly (CEO SMAIO, Lyon, France).

Table 1
Definition of Roussouly classification types.

| Roussouly type | First criterion | Second criterion |
|---|---|---|
| Type 1 | SS < 35° | <3 lordotic vertebrae |
| Type 2 | SS < 35° | >3 lordotic vertebrae |
| Type 3 | 35° < SS < 45° | |
| Type 3 AP | 35° < SS < 45° | PI < 50° or PT < 5° |
| Type 4 | SS > 45° | |
| Kyphosis | No lumbar lordosis | No lumbar lordosis |

AP, anteverted pelvis; PI, pelvic incidence; PT, pelvic tilt; SS, sacral slope.
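The criteria in Table 1 can be read as a simple decision rule. The following Python function is a minimal sketch of that reading, not the KEOPS implementation, whose internals are not described in this study. The function name, the argument names, and the handling of boundary values (for example SS exactly 35° or exactly 3 lordotic vertebrae) are assumptions made for illustration.

```python
def roussouly_type(ss, pi, pt, lordotic_vertebrae, has_lumbar_lordosis=True):
    """Assign a Roussouly type from the Table 1 criteria (angles in degrees).

    Hypothetical helper: boundary handling is an assumption, because
    Table 1 does not state which type applies at the exact cut-offs.
    """
    if not has_lumbar_lordosis:
        return "Kyphosis"
    if ss < 35:
        # Assumption: exactly 3 lordotic vertebrae is grouped with Type 1.
        return "Type 1" if lordotic_vertebrae <= 3 else "Type 2"
    if ss < 45:
        # Assumption: the anteverted-pelvis criterion is checked before
        # falling back to plain Type 3.
        if pi < 50 or pt < 5:
            return "Type 3 AP"
        return "Type 3"
    return "Type 4"


if __name__ == "__main__":
    # Illustrative values only, not patient data from the study.
    print(roussouly_type(ss=30, pi=40, pt=10, lordotic_vertebrae=5))
```

In the study itself the type was returned by the software after the rater had placed the landmarks, so a rule of this kind only describes the final step from measurements to type.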

Statistics

All statistical analyses were performed using the language and environment for statistical computing, R, version 3.3.3 (R Development Core Team, 2011, Vienna, Austria). The software package kappaSize, version 1.1, was used for power calculation before the study. With the number of raters set to 4, an anticipated Kappa of 0.80, a two-sided 95% confidence interval of 0.60–0.99, and a significance level (α) of 0.05, a minimum of 40 cases was needed for the study. The package irr, version 0.84, was used for calculations of inter- and intrarater reliability. Distribution of data was assessed by histograms, and descriptive data were reported as proportions, mean with standard deviation (SD), or median with interquartile range (IQR). Fleiss Kappa coefficients were calculated for categorical variables because the number of raters was more than two [6-8]. Cohen Kappa was used when comparing any 2 raters [7-9]. In accordance with Landis and Koch [10], the results of reliability (κ) were classified into slight (<0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and almost perfect agreement (>0.81). Individual Cohen intrarater κ coefficients for each rater were compared using a two-tailed Z test. The Bhapkar test for marginal homogeneity was used to test for differences in distribution between any two raters. Results were classified as significant at p < .05. All confidence intervals are provided as 95% confidence intervals.
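To make the two central quantities concrete, the sketch below computes a Fleiss kappa from raw ratings and maps a coefficient onto the Landis and Koch categories quoted above. It is written in Python for illustration only; the study used the R package irr. The function names are hypothetical, and the treatment of values that fall between the published bands (for example 0.605) is an assumption.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of cases, each a list with one label per rater.

    Every case must be rated by the same number of raters. The result is
    undefined (division by zero) if all ratings fall in a single category.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]
    categories = set().union(*counts)

    # Observed agreement: proportion of agreeing rater pairs per case.
    p_i = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_i) / n_cases

    # Chance agreement from the overall proportion of each category.
    p_j = [sum(cnt[cat] for cnt in counts) / (n_cases * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Label a kappa value; cut-offs between bands are an assumption."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Toy ratings (hypothetical): 2 cases, 3 raters.
    toy = [["Type 1", "Type 1", "Type 2"], ["Type 2", "Type 2", "Type 2"]]
    print(fleiss_kappa(toy), landis_koch(0.60), landis_koch(0.68))
```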

Results

A total of 803 patients were screened, of which 253 were referred for ASD and had long-standing radiographs. In accordance with the described criteria, a further 55 patients were excluded because of previous spinal fractures or neuromuscular disease. In 82 cases, both femoral heads or the C7 vertebra were not included or were obstructed from view because of a brace. An additional 36 duplicates were removed and 6 were excluded because of inferior image quality. This left a total of 74 patients for analysis. Of these, 10 cases were used for training purposes, leaving 64 for final ratings.

There were 44 (68.8%) females and 20 (31.2%) males, with a median age of 46 years (IQR = 23–63 years). Within one year of the referral date, 15 patients (23.4%) underwent a surgical procedure. The mean thoracic kyphosis (T1–T12) and lumbar lordosis (L1–S1) were 57.2° (±20.0) and 48.7° (±23.6), respectively. Mean spinopelvic parameters for SS, pelvic incidence (PI), and pelvic tilt (PT) were 35.1° (±14.6), 51.6° (±16.5), and 16.6° (±13.5), respectively. PI-LL mismatch (PI-LL >10° or PI-LL <−10°) was seen in 42 patients (68.8%), with a mean PI-LL of −1.5° (±24.4). The population had a median sagittal vertical axis (SVA) of +1.5 cm (IQR: −1.1 to 5.8), and 19 (29.7%) were sagittally unbalanced, with an SVA of more than +5 cm [11]. The most frequently occurring Roussouly type across all 8 ratings was Type 2. The least frequent type was Kyphosis, which was assigned only three times, all in the same patient (Table 2).

Table 2
Comparison of distribution of Roussouly types.

| Roussouly Classification System | Laouissat et al. [12], % | Current study, % |
|---|---|---|
| Type 1 | 12 | 13 |
| Type 2 | 22 | 35 |
| Type 3 | 16 | 19 |
| Type 3 AP | 30 | 10 |
| Type 4 | 13 | 21 |
| Kyphosis | - | 2 |

AP, anteverted pelvis.

Table 3
Roussouly type inter- and intrarater reliability.

| | Fleiss kappa coefficient (95% confidence interval) | Interpretation of reliability coefficient* | Percentage of agreement† |
|---|---|---|---|
| Interrater | | | |
| Reading 1 | 0.59 (0.49–0.70) | Moderate | 47 |
| Reading 2 | 0.60 (0.50–0.69) | Moderate | 47 |
| Both readings combined | 0.60 (0.53–0.66) | Moderate | 47 |
| Intrarater | | | |
| Rater 1 | 0.57 (0.41–0.72) | Moderate | 67 |
| Rater 2 | 0.68 (0.54–0.82) | Substantial | 75 |
| Rater 3 | 0.78 (0.65–0.90) | Substantial | 83 |
| Rater 4 | 0.70 (0.56–0.85) | Substantial | 78 |
| All raters combined | 0.68 (0.61–0.75) | Substantial | 76 |

* <0.20: slight; 0.21–0.40: fair; 0.41–0.60: moderate; 0.61–0.80: substantial; >0.81: almost perfect.
† Percentage of cases where all 4 raters assigned the same Roussouly Type.

Table 4
Interrater reliability and agreement comparing any two raters using Cohen kappa (95% confidence interval) and crude agreement percentages.

| Rater pair | Cohen kappa (95% confidence interval) | Crude agreement, % (range for readings 1 and 2) |
|---|---|---|
| Rater 1 vs Rater 2 | 0.57 (0.47–0.68) | 67.2 (65.6–68.8) |
| Rater 1 vs Rater 3 | 0.58 (0.48–0.69) | 68.0 (67.2–68.8) |
| Rater 1 vs Rater 4 | 0.55 (0.44–0.66) | 66.4 (64.1–68.8) |
| Rater 2 vs Rater 3 | 0.62 (0.51–0.72) | 70.3 (67.2–73.4) |
| Rater 2 vs Rater 4 | 0.58 (0.48–0.69) | 68.0 (62.5–73.4) |
| Rater 3 vs Rater 4 | 0.67 (0.57–0.77) | 75 (73.4–76.6) |

Fig. 2. Venn diagram of agreement as a total of both readings. Intersections specify the number of cases where raters (1, 2, 3, and 4) agreed on the spine type.

Fig. 3. Distribution of Roussouly type by rater and reading. There was a significant difference in distribution of Roussouly types between Raters 3 and 4 (p = .034 in reading 1 and p = .027 in reading 2).

Fig. 4. Fleiss kappa for each Roussouly type showed similar reliability except for "Kyphosis," for which there was a large difference between inter- and intrarater reliability.

Reproducibility

The mean interrater reliability was moderate (κ = 0.60), and on both readings all 4 raters agreed on the same Roussouly type in 47% of cases. Mean intrarater reliability was substantial (κ = 0.68), ranging from κ = 0.57 to κ = 0.78 between raters (Table 3). On average, raters assigned the same type on both readings in 76% of the cases (range: 67% to 83%).

Rater performance

We observed a significantly higher intrarater reliability coefficient for one of the two more experienced raters (Rater 3) compared to the least experienced rater (Rater 1). No other significant differences in intrarater reliability were found between raters. Comparing interrater reliability between any combination of two raters showed the highest coefficient for the most experienced raters (Raters 3 and 4), who agreed on 75% of cases on average (κ = 0.67) (Table 4 and Fig. 2). This was, however, not significantly different from the two raters (Raters 1 and 2) with the lowest combined interrater reliability (κ = 0.57, p = .09).
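One way to compare two kappa coefficients with a two-tailed Z test is sketched below. The study does not state exactly how the test statistic was computed, so this is an illustration under stated assumptions: the standard error of each coefficient is back-calculated from its reported 95% confidence interval using a normal approximation, and the two estimates are treated as independent. The function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)


def compare_kappas(k1, ci1, k2, ci2):
    """Two-tailed Z test for the difference between two kappa coefficients.

    Assumes independent, approximately normal estimates (an assumption,
    not a description of the method used in the study).
    """
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    z = (k1 - k2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-tailed p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Intrarater coefficients for Rater 3 and Rater 1 as reported in Table 3.
z, p = compare_kappas(0.78, (0.65, 0.90), 0.57, (0.41, 0.72))
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the comparison of Rater 3 with Rater 1 gives a z value slightly above 2 and a p value below .05, which is in line with the significant difference reported above.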

Rater bias

A significant difference in distribution of Roussouly types was found between the classifications by Raters 3 and 4 on both readings (p = .034 and p = .027, respectively) (Fig. 3). No other significant distribution differences were observed between any two raters.

Reliability among classification categories

The most reliable classification categories were Types 1, 2, and 4, which showed substantial interrater reliability. Type 3 and 3AP showed moderate reliability, whereas "Kyphosis" was only fairly reliable. Intrarater reliability followed the same pattern except for "Kyphosis," for which substantial reliability was observed (Fig. 4).

Discussion

The current study found moderate inter- and substantial intrarater reproducibility of the Roussouly Classification System applied to ASD patients.

We observed a spread in intrarater reliability across the 4 raters, ranging from 0.57 to 0.78, and a trend toward correlation with the experience of the surgeon. Furthermore, the two most experienced raters (Raters 3 and 4) had the highest rate of agreement and interrater reliability (Table 3). However, we also observed a significant difference in distribution of Roussouly types between the two raters. This difference in distributions could also explain the higher reliability coefficients beyond rater experience [13].

Comparison to previous studies

Comparing the current study with previous reproducibility studies on ASD classification systems, substantial and almost perfect intra- and interrater reliability was reported by Schwab et al. [14] for the Scoliosis Research Society–Schwab Adult Spinal Deformity classification (SRS-Schwab). However, readings were performed in only 21 preselected cases and on premarked radiographs. Liu et al. [15] had similar results, presenting substantial and almost perfect inter- and intrarater reliability without reporting the selection process for the 102 included cases. Hallager et al. [16] also performed a reproducibility study of the SRS-Schwab classification system. In contrast to the previously mentioned studies, they found moderate interrater and substantial intrarater reliability, comparable to our results. The latter study presented a consecutive selection process. Additionally, radiographs were not marked in advance. We argue that this method of performing a reproducibility study is the most desirable, and is comparable to the current study. However, the current study used a semiautomatic system to classify the spine type, in contrast to the previously mentioned studies.

The kappa statistic is widely used in reproducibility studies, but certain shortcomings must be considered. The statistic is highly sensitive to marginal homogeneity and to the prevalence of each type [12,13]. The more even the distribution of categories, the larger the Kappa coefficient. Paradoxically, the Kappa coefficient also becomes larger the more the distribution differs between raters. Therefore, the selection process in reproducibility studies is crucial to the overall results of Kappa, and selection bias could easily produce artificially better results, as may be the case for the results of the two initially mentioned studies [14,15]. Hallager et al. [16] performed their study under similar conditions to the present study, and this could explain the resemblance in results. Keeping the traits of Kappa in mind, the Roussouly types were clearly unevenly distributed in our population as well as in the original studies of the classifications [1,2]. Furthermore, the distribution between raters was not significantly different except between Raters 3 and 4. Choosing a more evenly distributed population probably would have contributed to a higher degree of reproducibility. However, Roussouly types are most probably not evenly distributed either in the normal population or in patients with ASD. Therefore, we suggest that future studies also consist of randomly or consecutively selected cases and that the selection process is clearly presented.
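The sensitivity of kappa to category prevalence can be illustrated with a small two-rater, two-category example. The counts below are hypothetical and chosen only so that both tables have the same crude agreement (80%); they are not data from this or any cited study.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen kappa for two raters and two categories.

    a: both raters chose category 1; d: both chose category 2;
    b, c: the two kinds of disagreement.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed (crude) agreement
    # Chance agreement from each rater's marginal proportions.
    p_e = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts: identical 80% crude agreement in both tables.
balanced = cohen_kappa_2x2(40, 10, 10, 40)  # categories equally common
skewed = cohen_kappa_2x2(75, 10, 10, 5)     # one category dominates

print(f"balanced: {balanced:.2f}, skewed: {skewed:.2f}")
```

With evenly distributed categories, chance agreement is 0.50 and kappa is 0.60. With a dominant category, chance agreement rises to about 0.75 and kappa falls to roughly 0.22, although the raters agree on the same share of cases.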

Distribution

Type 2 was clearly the predominant type across both readings, including about a third of the patients. Laouissat et al. [2] described the distribution in 296 healthy adults with Type 3 AP as the predominant type (Table 2). We attribute the somewhat different distribution to the difference between healthy adults and ASD patients, because one of the compensating mechanisms for sagittal imbalance is retroversion of the pelvis, which increases the pelvic tilt and, because of the geometric relationship between the pelvic parameters, reduces the SS [17]. It has also been theorized that patients with larger PI and SS should be able to withstand changes in alignment better than those with smaller values [18]. Patients with larger PI and SS would predominantly be categorized as Types 3 and 4, and patients with small PI and SS would mainly be categorized as Types 1 and 2. Following this theory, the latter would be less able to compensate for changes in spinal alignment and therefore more likely to seek treatment.
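The geometric property referred to above is the standard relation between the three pelvic parameters: pelvic incidence is the sum of pelvic tilt and sacral slope. Because PI is a fixed anatomical parameter, any increase in PT from pelvic retroversion must be matched by an equal decrease in SS. As a rough consistency check, the cohort means reported in Results satisfy the relation to within rounding:

```latex
\mathrm{PI} = \mathrm{PT} + \mathrm{SS}
\quad\Rightarrow\quad
\Delta\mathrm{SS} = -\Delta\mathrm{PT} \quad (\mathrm{PI}\ \text{fixed}),
\qquad
16.6^\circ + 35.1^\circ = 51.7^\circ \approx 51.6^\circ
```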

The current study presents a spread in reliability coefficients when comparing each Roussouly type (Fig. 4). Kyphosis, which had the lowest interrater kappa, was also the spine type least represented in the population, as it was only classified on three occasions for one case (Table 2).

Strengths and limitations

The strength of the current study is the consecutive one-center design, which minimizes selection bias. Furthermore, blinding the raters to others’ and previous ratings minimized the risk of estimation bias. There was, however, a high percentage of excluded cases due to a high rate of insufficient radiographs. Since the period of inclusion, this rate has declined considerably, leading to a better utility of the current classification system at our department. Furthermore, the current study used raters with different levels of experience, leading to more generalizable results.

This study was performed in a relatively small sample, which might contribute to sampling bias. Nevertheless, power estimation was performed before the study, and the presented selection process of a consecutive cohort contributes to minimizing the risk of this type of bias.

Furthermore, all ratings in the current study were made on unmarked digital radiographs using a semiautomatic system. Although the reproducibility would be greater using premarked radiographs, we argue that the utility is more favorable, as premarked images would not be used in a clinical setting. The use of a semiautomatic system has previously been investigated in patients with ASD [19] as well as in patients with a wider range of spinal pathologies [20]. In contrast, using a fully manual method of classifying each radiograph could negatively affect the reproducibility, as an extra layer of potential error would be introduced. In the setting of the current study, the spine type was given by the software once the reader had made his measurements. This method of measurement was chosen as we argue that software systems can easily be set up to perform these calculations instead of the surgeon.

The primary objective of a classification system is to ultimately improve the outcome for individual patients. Others have hypothesized that patients with different Roussouly types may have different outcomes following spinal surgery [3]. There is, therefore, still a need for further studies on how patients’ spine type may influence the outcome following surgery. With the present study, we hope to aid future investigations on this matter.

Conclusion

In this study of the Roussouly Classification System, we present moderate interrater and substantial intrarater reproducibility in a single-center cohort of 64 ASD patients. We observed a trend for higher levels of reproducibility among more experienced raters. We conclude the reproducibility of this system to be acceptable and comparable to other ASD classification systems. We suggest that future studies aim to validate these findings as well as address the possible influence of the system on patient outcome following surgery.

References

[1] Roussouly P, Gollogly S, Berthonnaud E, et al. Classification of the normal variation in the sagittal alignment of the human lumbar spine and pelvis in the standing position. Spine (Phila Pa 1976) 2005;30:346–53.
[2] Laouissat F, Sebaaly A, Gehrchen M, et al. Classification of normal sagittal spine alignment: refounding the Roussouly classification. Eur Spine J 2018;27:2002–11.
[3] Laouissat F, Scemama C, Delécrin J. Does the type of sagittal spinal shape influence the clinical results of lumbar disc arthroplasty? Orthop Traumatol Surg Res 2016;102:765–8.
[4] Horton WC, Brown CW, Bridwell KH, et al. Is there an optimal patient stance for obtaining a lateral 36" radiograph? A critical comparison of three techniques. Spine (Phila Pa 1976) 2005;30:427–33.
[5] Kottner J, Audige L, Brorson S, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. Int J Nurs Stud 2011;48:661–71.
[6] Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378–82.
[7] de Vet HCW, Terwee CB, Knol DL, et al. When to use agreement versus reliability measures. J Clin Epidemiol 2006;59:1033–9.
[8] Gisev N, Bell JS, Chen TF. Interrater agreement and interrater reliability: Key concepts, approaches, and applications. Res Soc Adm Pharm 2013;9:330–8.
[9] Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213–20.
[10] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
[11] Schwab F, Patel A, Ungar B, et al. Adult spinal deformity postoperative standing imbalance. Spine (Phila Pa 1976) 2010;35:2224–31.
[12] Gwet K. Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Stat Methods Inter-Reliabil Assess 2002;2:9.
[13] Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther 2005;85:257–68.
[14] Schwab F, Ungar B, Blondel B, et al. Scoliosis Research Society-Schwab adult spinal deformity classification. Spine (Phila Pa 1976) 2012;37:1077–82.


[15] Liu Y, Liu Z, Zhu F, et al. Validation and reliability analysis of the new SRS-Schwab classification for adult spinal deformity. Spine (Phila Pa 1976) 2013;38:902–8.
[16] Nielsen DH, Gehrchen M, Hansen LV, et al. Inter- and intra-rater agreement in assessment of adult spinal deformity using the Scoliosis Research Society-Schwab classification. Spine Deform 2014;2:40–7.
[17] Ames CP, Smith JS, Scheer JK, et al. Impact of spinopelvic alignment on decision making in deformity surgery in adults: a review. J Neurosurg Spine 2012;16:547–64.
[18] Barrey C, Roussouly P, Perrin G, et al. Sagittal balance disorders in severe degenerative spine. Can we identify the compensatory mechanisms? Eur Spine J 2011;20(suppl 5):626–33.
[19] Aubin C-E, Bellefleur C, Joncas J, et al. Reliability and accuracy analysis of a new semiautomatic radiographic measurement software in adult scoliosis. Spine (Phila Pa 1976) 2011;36:E780–90.
[20] Maillot C, Ferrero E, Fort D, et al. Reproducibility and repeatability of a new computerized software for sagittal spinopelvic and scoliosis curvature radiologic measurements: Keops®. Eur Spine J 2015;24:1574–81.


UNIVERSITY OF COPENHAGEN
FACULTY OF HEALTH AND MEDICAL SCIENCES

Study II

Spinopelvic parameters depending on the angulation of the sacral endplate are less reproducible than other Spinopelvic parameters in Adult Spinal Deformity patients

Spine Deformity 7 (2019) pp. 771-778

https://doi.org/10.1016/j.jspd.2018.12.002


Spinopelvic Parameters Depending on the Angulation of the Sacral End Plate Are Less Reproducible Than Other Spinopelvic Parameters in Adult Spinal Deformity Patients

Tanvir Johanning Bari, MD (a,*), Dennis Winge Hallager, MD, PhD (a), Niklas Tøndevold, MD (a), Ture Karbo, MD (a), Lars Valentin Hansen, MD (a), Benny Dahl, MD, PhD, DMSc (b), Martin Gehrchen, MD, PhD (a)

(a) Spine Unit, Department of Orthopedic Surgery, Rigshospitalet, University Hospital of Copenhagen, Copenhagen, Denmark
(b) Department of Orthopedics and Scoliosis Surgery, Texas Children’s Hospital and Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA

Received 30 July 2018; revised 27 November 2018; accepted 3 December 2018

Abstract

Study Design: Reproducibility study.
Objectives: To report the agreement and reliability for commonly used sagittal plane measurements.
Summary of Background Data: Spinopelvic parameters and sagittal vertical axis (SVA) are commonly used parameters for preoperative planning and postoperative evaluation of patients with adult spinal deformity (ASD). Previous reproducibility studies have focused on describing the reliability using intraclass correlation coefficients (ICCs), thus quantifying the methods’ ability to distinguish between individuals. To our knowledge, no previous study in patients with ASD has reported the measurement error in terms of limits of agreement. The current study aimed to report the agreement and reliability for measurements of pelvic incidence (PI), pelvic tilt (PT), sacral slope (SS), and SVA in ASD patients.
Methods: In a consecutive, one-center cohort of 64 patients referred for ASD evaluation, a blinded test-retest study was performed. Reliability was assessed using ICCs, whereas 95% limits of agreement (LOAs) were used to quantify agreement.
Results: We found "excellent" (ICC > 0.9) results in all analyses of reliability except for interrater PI, which was classified as "good" (ICC = 0.89). However, considerable interrater measurement error was observed for parameters depending on the angulation of the sacral end plate (95% LOA of ±11° and ±14° for SS and PI, respectively) compared with ±5° for PT and ±7 mm for SVA, which depend on the location of the sacral end plate. Intrarater agreement was only slightly better.
Conclusion: These are to our knowledge the first estimates of measurement error for sagittal spinopelvic parameters in ASD patients. Despite near excellent ICCs, we found considerable measurement error for parameters depending on the angulation rather than the location of the sacral end plate.
Level of Evidence: Level II.
© 2018 Scoliosis Research Society. All rights reserved.

Keywords: Spine; Adult; Reproducibility; Reliability; Agreement

Introduction

Radiographic measurements of the spine are widely used to assess patients with adult spinal deformity (ASD) and used for preoperative planning as well as a measure of outcome following surgical treatment. It has been established that sagittal plane balance is of more importance than the coronal plane balance [1,2], and the most

Author disclosures: TJB (none), DWH (none), NT (none), TK (none), LVH (none), BD (institutional grants from Medtronic, United States and K2M, outside the submitted work), MG (institutional grants from Medtronic and K2M, outside the submitted work).

Funding sources: none.

IRB approval: 3-3013-2760/1.

*Corresponding author. Spine Unit, Department of Orthopedic Surgery, Rigshospitalet, University Hospital of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark. Tel.: +45 5057 9448; fax: +45 3545 2165.

E-mail address: [email protected] (T.J. Bari).

2212-134X/$ - see front matter © 2018 Scoliosis Research Society. All rights reserved.

https://doi.org/10.1016/j.jspd.2018.12.002

Spine Deformity 7 (2019) 771–778, www.spine-deformity.org


commonly used parameters are the sagittal vertical axis (SVA) and the spinopelvic parameters: sacral slope (SS), pelvic tilt (PT), and pelvic incidence (PI) (Figure 1). Despite the reliance on these radiographic measurements, the inter- and intrarater agreement has not yet been established in ASD patients.

The terms agreement and reliability are often used interchangeably. Agreement, expressed as the standard error of measurement (SEM) and limits of agreement (LOA), provides an absolute index of measurement accuracy [3] and describes the extent of disagreement between measurements performed on the same subject. The 95% LOA describe the difference in measurements exceeded in 5% of pairs of measurements [4] and are, therefore, important thresholds for differentiating actual change in repeated measurements from measurement error [5]. This is in contrast to reliability measures such as the widely used intraclass correlation coefficient (ICC). The ICC relates between-subject variability to total variability, that is, between-subject plus within-subject variability [3,6]. Consequently, reliability is used to evaluate an instrument’s ability to distinguish between subjects [7-10] and not measurement accuracy. As an umbrella term, reproducibility is used to describe both agreement and reliability [8].
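The distinction can be made concrete with a small sketch. It assumes the simplest variance-components view, in which the ICC is the between-subject variance divided by the total variance and the SEM is the square root of the error variance. The numbers are hypothetical and the function names are illustrative only.

```python
import math


def icc_from_variances(var_between, var_error):
    """Reliability: share of total variance due to true differences between subjects."""
    return var_between / (var_between + var_error)


def sem_from_variances(var_error):
    """Agreement: standard error of measurement, in the units of the measurement."""
    return math.sqrt(var_error)


# Hypothetical example: the same measurement error (SD 4 degrees) in two populations.
var_error = 4.0 ** 2
sem = sem_from_variances(var_error)
heterogeneous = icc_from_variances(var_between=15.0 ** 2, var_error=var_error)
homogeneous = icc_from_variances(var_between=5.0 ** 2, var_error=var_error)

print(f"SEM in both populations: {sem:.1f} degrees")
print(f"ICC, heterogeneous population: {heterogeneous:.2f}")
print(f"ICC, homogeneous population: {homogeneous:.2f}")
```

The measurement error is identical in both cases, but the ICC is high only where subjects differ widely, which is why a high ICC does not by itself imply a small measurement error.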

Measurements on digital radiographs have previously been compared with traditional manual measurements with excellent reliability [11]. However, to our knowledge, no previous study has reported the agreement for individual sagittal plane parameters in patients with ASD. The objective of this study was to provide such estimates of measurement error for these parameters. The cohort of patients has been described in a previous study regarding the Roussouly Classification System currently under review for publication. The aim of the current study was to report the agreement and reliability for measurements of spinopelvic parameters in ASD patients.

Materials and Methods

This study was performed as a test-retest study in a consecutive single-center cohort. Relevant approval was obtained from the national patient safety authority (ref. no. 3-3013-2760/1) and local data protection agency (ref. no. RH-2017-135 I-suite no. 05498). All patients referred to the authors’ tertiary outpatient clinic for evaluation of ASD during a 6-month period from August 1, 2013, were considered for inclusion. Only adults (>18 years of age) were included. Images were taken standing with "fist-on-clavicles," including both femoral heads and with all vertebrae from C7 to S1 visible. Patients with previous spinal instrumentation, fracture, or neuromuscular disease and patients with inadequate images were excluded. A total of 803 patients with long standing radiographs were screened for inclusion, of which 253 were adults with no prior history of spinal instrumentation. A further 178 patients met exclusion criteria because of previous fracture or neurologic disease (n = 55), patient already included (n = 36), or images without C7 or both femoral heads (n = 88), leaving 74 patients for inclusion. Ten cases were used for training purposes and thus 64 cases were available for final analysis. Demographics are shown in Table 1. Eight sets of each patient’s radiographs were transferred to the online data image management system KEOPS (SMAIO, Lyon, France). All radiographs were presented to the raters as unmarked digital films, coded with a unique identification key. The raters were four surgeons. Rater 1: staff specialist with one year of experience in spinal surgery. Rater 2: staff specialist with two years of experience. Raters 3 and 4: senior spine surgeons with more than 20 years of experience each.
For each reading session, all four raters were asked to identify certain key points on each radiograph. These included both femoral heads, the end plate of S1, as well as each vertebral body from C7 to L5; following these markings, the system automatically detected the location and shape of vertebral bodies and the raters adjusted these identifications when appropriate. A computerized model then calculated the SVA and spinopelvic parameters. Before the first reading session, 10 cases were rated for

Fig. 1. Graphical description of calculating the three spinopelvic parameters.

Table 1
Demographics (N = 64 cases).

Age, y                44.3 (±20.9)
Sex (male)            44 (68.8%)
Pelvic incidence*     51.3 (±14.5)
Pelvic tilt*          16.5 (±13.1)
Sacral slope*         34.8 (±12.5)
SVA*                  7.0 [−8.2, 23.9]
SVA <5.0 cm*          465 (90.6%)
SVA >5.0 cm*          47 (9.4%)

SVA, sagittal vertical axis.
Data are mean (standard deviation), median [interquartile range], or number (%).
* Radiographic parameters are calculated based on all 512 entries (64 cases rated twice by 4 raters).


training purposes. These cases were not included in the final analysis. Finally, the cases were presented in scrambled order between reading sessions. The raters were given a 14-day deadline to perform each reading with a 14-day delay between the two readings. All four raters were blinded to the results of the ratings as well as patient details.

Statistics

The language and environment for statistical computing, R version 3.4.3, was used for all statistical analyses (R Development Core Team, 2011, Vienna, Austria). Distribution of data was assessed using histograms and reported as proportions (%), means with standard deviations (SDs), or medians with interquartile ranges (IQRs). ICCs were used to describe reliability for numeric variables using the package "irr" version 0.84. A two-way random effects model, single measure, absolute agreement, ICC(A,1), was used for interrater reliability and a two-way mixed effects model, single measure, absolute agreement, ICC(A,1), for intrarater reliability [12-14]. ICCs were classified as poor (<0.50), moderate (0.50-0.75), good (0.76-0.90), and excellent (>0.90) [8]. When describing agreement, the SEM and 95% LOA were calculated. A linear mixed effects model with the software package "lme4" version 1.1-15 was used to calculate the variances. SEM was calculated as SEM agreement [7,15] as described by Popović and Thomas [6]. As the number of raters in the current study was greater than two, the 95% LOA was calculated using the variances derived from the aforementioned model as described by Bland and Altman [4] and further exemplified by Hopkins [16]. p values <.05 were considered significant. A biostatistician was consulted before the study, and aided in the performance of all statistical analysis.
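As a rough illustration of how these quantities relate, the sketch below estimates variance components from a complete subjects-by-raters table using two-way ANOVA mean squares, then derives ICC(A,1), SEM agreement, and the 95% LOA. It is a simplified method-of-moments stand-in for the linear mixed effects model, not the analysis code used in the study. It handles a single reading per rater only, the function name and toy data are hypothetical, and the LOA formula 1.96 × √2 × SEM is the usual Bland-Altman form, assumed here.

```python
import math


def reproducibility(ratings):
    """ICC(A,1), SEM agreement, and 95% LOA from a complete table.

    ratings[i][j] is the measurement of subject i by rater j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA mean squares: subjects, raters, residual.
    ms_subject = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_rater = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_resid = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_resid = ss_resid / ((n - 1) * (k - 1))

    # Variance components (negative estimates are truncated at zero).
    var_subject = max((ms_subject - ms_resid) / k, 0.0)
    var_rater = max((ms_rater - ms_resid) / n, 0.0)
    var_resid = ms_resid

    icc = var_subject / (var_subject + var_rater + var_resid)
    sem = math.sqrt(var_rater + var_resid)  # systematic rater differences count as error
    loa = 1.96 * math.sqrt(2) * sem         # assumed Bland-Altman form
    return icc, sem, loa


# Toy data (hypothetical): rater 2 reads every subject exactly 1 unit higher.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
icc, sem, loa = reproducibility(toy)
print(f"ICC(A,1) = {icc:.3f}, SEM = {sem:.3f}, LOA = +/-{loa:.2f}")
```

In the toy data the only source of error is the constant offset between the two raters, so the SEM reflects that offset alone while the ICC stays below 1 because absolute agreement penalizes systematic differences.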

Results

Reliability

ICC was classified as "excellent" (ICC >0.90) for inter- and intrarater reliability of PI, PT, SS, and SVA except for interrater reliability of PI, which was classified as "good" (ICC 0.89, 95% CI 0.85-0.92) (Tables 2 and 3).

Agreement

The least measurement error and thus the highest degree of agreement was observed for PT with 95% LOA of ±4.9° and ±4.4° compared with SS with 95% LOA of ±10.9° and ±9.3° and PI with 95% LOA of ±13.4° and ±12.0° for inter- and intrarater agreement, respectively. In addition, 95% LOA for SVA was ±6.9 mm and ±5.8 mm,

Table 2
Interrater reliability of spinopelvic parameters.

                    Reading 1           Reading 2           Both readings combined
Pelvic incidence    0.87 (0.80–0.91)    0.91 (0.87–0.94)    0.89 (0.85–0.92)
Pelvic tilt         0.98 (0.97–0.99)    0.98 (0.97–0.99)    0.98 (0.98–0.99)
Sacral slope        0.89 (0.84–0.93)    0.91 (0.87–0.94)    0.90 (0.87–0.93)
SVA                 0.99 (0.99–1.00)    0.99 (0.99–1.00)    0.99 (0.99–0.99)

SVA, sagittal vertical axis.
Data are intraclass correlation coefficient (95% confidence interval).

Table 3
Intrarater reliability of spinopelvic parameters.

                    Rater 1             Rater 2             Rater 3             Rater 4             All raters
Pelvic incidence    0.83 (0.73–0.89)    0.94 (0.85–0.97)    0.94 (0.90–0.96)    0.95 (0.90–0.96)    0.91 (0.89–0.93)
Pelvic tilt         0.96 (0.94–0.98)    0.99 (0.98–1.00)    0.99 (0.99–1.00)    0.99 (0.99–1.00)    0.99 (0.98–0.99)
Sacral slope        0.88 (0.80–0.92)    0.93 (0.86–0.96)    0.95 (0.92–0.97)    0.96 (0.93–0.97)    0.93 (0.93–0.94)
SVA                 0.99 (0.99–1.00)    1.00 (0.99–1.00)    1.00 (0.99–1.00)    1.00 (0.99–1.00)    1.00 (0.99–1.00)

SVA, sagittal vertical axis.
Data are intraclass correlation coefficient (95% confidence interval).

Table 4
Agreement of spinopelvic parameters.

                    Interrater agreement    Intrarater agreement
                    SEM      LOA            SEM      LOA
Pelvic incidence    4.9°     ±13.4°         4.3°     ±12.0°
Pelvic tilt         1.8°     ±4.9°          1.6°     ±4.4°
Sacral slope        3.9°     ±10.9°         3.3°     ±9.3°
SVA (mm)            2.5      ±6.9           2.1      ±5.8

LOA, limits of agreement; SEM, standard error of measurement; SVA, sagittal vertical axis.
Data are degrees or distance (mm).
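
The tabulated limits appear consistent with the usual relation between the SEM and the 95% LOA for the difference between two measurements. This is an inference from the numbers, not a formula stated in the text, and small discrepancies are expected because the tabulated SEM values are rounded. Using interrater PT as a worked example:

```latex
\mathrm{LOA}_{95\%} = \pm 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM}
\quad\Rightarrow\quad
1.96 \cdot \sqrt{2} \cdot 1.8^{\circ} \approx 5.0^{\circ}
\quad (\text{reported: } \pm 4.9^{\circ})
```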


Fig. 2. Bland-Altman plots of intra-rater agreement. Each dot illustrates the difference between any two measurements performed by the same rater. LOA indicates the 95% limits of intra-rater agreement derived from the linear mixed effects model.

Fig. 3. Bland-Altman plots of inter-rater agreement across both readings. Each dot illustrates the difference between each measurement and the mean of all 4 raters. LOA indicates the 95% limits of inter-rater agreement derived from the linear mixed effects model.


respectively (Table 4). Figure 2 shows Bland-Altman plots (BA-plots) of the intrarater agreements. Figure 3 shows modified Bland-Altman plots of the interrater agreements plotting the difference between each measurement and the mean of all four raters according to Jones et al. [17]. Finally, interrater agreement was further analyzed by comparing each pair of raters (Table 5).

Discussion

The current results showed excellent reliability of PT, SS, and SVA, while the reliability of PI was considered good. These results are in accordance with previous studies [11,18-20]. The interrater 95% LOA was ±4.9° for PT, ±10.9° for SS, ±13.4° for PI, and ±6.9 mm for SVA, whereas intrarater LOAs were slightly better. Thus, 19 of 20 pairs of measurements on a single radiograph were closer than these limits, which we argue have to be exceeded for any difference in serial radiographs to be considered more than measurement error.
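In practice, this threshold could be applied as in the following sketch, which flags a change between two serial measurements only when it exceeds the interrater 95% LOA from Table 4. The function name and the example measurements are hypothetical.

```python
# Interrater 95% limits of agreement from Table 4 (degrees, except SVA in mm).
INTERRATER_LOA = {"PT": 4.9, "SS": 10.9, "PI": 13.4, "SVA": 6.9}


def exceeds_measurement_error(parameter, first, second, limits=INTERRATER_LOA):
    """Return True if the change between two measurements is larger than the 95% LOA."""
    return abs(second - first) > limits[parameter]


# Hypothetical serial measurements for one patient: the same 8-degree change
# is beyond the limit for PT but within the limit for PI.
pt_changed = exceeds_measurement_error("PT", 20.0, 28.0)
pi_changed = exceeds_measurement_error("PI", 50.0, 58.0)
print(pt_changed, pi_changed)
```

The same absolute change is therefore interpreted differently depending on the parameter, reflecting the larger measurement error of parameters that depend on the angulation of the sacral end plate.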

We found profound differences in overall measurement error for the three angle parameters (PT vs. PI and SS). The least measurement error was found for PT, which is dependent on the location of the center of the sacral end plate, whereas SS and PI are dependent on the sacral end plate slope (Figure 1). Thus, minor changes in the measured slope of the sacral end plate can vastly influence the magnitude of PI and SS compared with PT. PI and SS yielded the largest absolute differences between any two measurements on the same radiograph (Figure 2). This pattern was also reported by Pernaa et al. [21] in patients following total hip arthroplasty. The vast majority of the differences in our study were due to disagreement in selecting the sacral end plate. When two raters had selected the end plate differently, up to 30° in disagreement was observed for PI and SS, with only a minor difference in PT. This largely explains the discrepancy in LOA between the spinopelvic parameters and thus highlights a seemingly unrecognized challenge in ASD radiographic measurements, as the sacral end plate is often difficult to identify because of degenerative changes, image quality, and concomitant pelvic obliquity. Figures 4 and 5 portray the difference in difficulty of accurately identifying key sagittal radiographic landmarks in patients with and without pelvic obliquity. Furthermore, no pre-markings were undertaken in the current study, resulting in measurement error arguably closer to everyday practice compared with studies using pre-marked images.

Interrater LOA was also calculated for each pair of raters (Table 5). There was a tendency toward better agreement for the more experienced raters (Raters 3 and 4) compared with the two less experienced. However, this difference was not further analyzed for statistical significance and should be interpreted with caution.

Other studies of agreement of sagittal spinopelvic parameters include the study by Aubin et al. [19], carried out using 6 cases of realistic spine prototypes from patients with ASD. A precise coordinate measuring machine was used to calculate a gold standard compared with subsequent measurements of radiographs. Agreement was quantified through SD of mean differences between ratings. However, Bland and Altman suggest that an ANOVA is performed when comparing more than two measurements, as the SD of differences will otherwise be smaller than the actual error because part of the error is removed [4,22]. Lafage et al. [23] validated a new computer-assisted tool to measure spinal parameters. However, all radiographs were pre-marked, thus removing much of the measurement error. Although several reliability studies on spinopelvic parameters have been published, none explicitly concern ASD

Table 5
Interrater agreement for each pair of raters.

                    Rater 2     Rater 3     Rater 4

Pelvic incidence (overall interrater LOA: ±13.4)
Rater 1             ±15.2°      ±14.2°      ±13.6°
Rater 2             -           ±12.8°      ±12.8°
Rater 3             -           -           ±12.0°

Pelvic tilt (overall interrater LOA: ±4.9)
Rater 1             ±5.4°       ±5.4°       ±5.3°
Rater 2             -           ±5.0°       ±3.5°
Rater 3             -           -           ±4.5°

Sacral slope (overall interrater LOA: ±10.9)
Rater 1             ±12.1°      ±11.3°      ±10.6°
Rater 2             -           ±10.5°      ±11.7°
Rater 3             -           -           ±9.0°

Sagittal vertical axis (overall interrater LOA: ±6.9)
Rater 1             ±6.2 mm     ±6.7 mm     ±8.3 mm
Rater 2             -           ±6.1 mm     ±7.8 mm
Rater 3             -           -           ±6.1 mm

LOA, limits of agreement.
Data represent interrater LOA when comparing pairs of raters.


patients and adhere to measurement error calculations suggested by Bland and Altman [3,4,6,8,15,17,22]. Pernaa et al. [21] performed a reproducibility study of radiographic measurements in patients following THA and used BA plots to describe agreement. The 95% LOA values are not given but can be read from the BA plots or calculated from the SDs of mean differences between readings (95% LOA for PT = ±5.9° and ±4.2°, for SS = ±8.4° and ±7.6°, and for PI = ±10.0° and 7.2° for inter- and intrarater agreement, respectively). Kim et al. [18] presented the results of a reproducibility study of the same four radiographic parameters as the current study in patients with various degenerative lumbar spinal disorders. Results were given only as intrarater 95% LOA for manual and computer-assisted measurements (PT: ±1.4° to ±6.0°, SS: ±1.6° to ±6.3°, PI: ±1.0° to ±7.3°, and SVA: ±1.4 to ±8.3 mm). No estimate of interrater agreement was given. Both studies present marginally lower estimates of measurement error compared with the results of the current study, which could be caused by the difference in spine pathology between studies.

Comparing our results to the generally accepted measurement error of ±5° for the coronal Cobb angle, the literature suggests that this assumed error is not well founded. The systematic review by Langensiepen et al. [24] states that the few studies on agreement were mostly not consistent in reporting the same agreement parameter, only reported intrarater agreement, or had implemented vertebral pre-selection for Cobb angle calculations, which as previously mentioned is likely to exclude part of the measurement error. In their review of 11 selected studies, 9 included adolescent or infantile scoliosis patients and 2 did not specify age characteristics. They reported interrater 95% LOA ranging from ±3.6° to ±8.3° and intrarater 95% LOA from ±1.3° to ±4.5°.

Strengths and limitations

In the current study, all cases were enrolled consecutively at a single center; however, many were excluded because of inadequate availability of radiographs, leaving a relatively small sample for inclusion. This is indeed suboptimal and indicates low validity. New protocols have since been introduced to reduce the rate of insufficient radiographs in collaboration with the local radiology department. This has led to a significant reduction in radiographs lacking either C7 or femoral heads. Other centers undergoing similar difficulties are suggested to perform

Fig. 4. Coronal and sagittal radiographs of patient with pelvic obliquity illustrating the challenges in correctly identifying key landmarks.


quality assessment of performed radiographs to assess and implement newer protocols to diminish the rate of insufficient radiographs. Furthermore, the current study utilized a semiautomatic system for all radiographic measures. Although the surgeon had to identify all anatomic landmarks, the system calculated the radiographic parameters. This eliminates the risk of mistakes during calculations. Therefore, measurement error may well be even more prominent when calculating spinopelvic parameters in digital radiographs using "manual" methods.

On the other hand, a fully disclosed selection process is presented, which we argue contributes to lessening the risk of bias. Additionally, the use of mixed effect models to calculate the LOA makes it independent of the specific ICC. This, combined with results of SEM, allows for direct comparison to future studies. Finally, the current study was performed by surgeons with different levels of experience. As radiographic measurements in a clinical setting are often performed by observers of varying levels of training, we argue that this should also be the case when performing a reproducibility study, thus increasing the external validity.

Conclusion

In this reproducibility study of the measurement accuracy of spinopelvic parameters, we found noticeably larger measurement errors in parameters depending on the angulation of the sacral end plate (SS: ±11°; PI: ±14°) compared with parameters merely depending on the location of the sacral end plate (PT: ±5°; SVA: ±7 mm).

This was despite excellent or near excellent ICCs for all parameters. These are to our knowledge the first estimates of agreement according to Bland and Altman for spinopelvic parameters in ASD patients, and we encourage surgeons to account for the considerable measurement error in parameters depending on the angulation of the sacral end plate when comparing repeated measurements.

Fig. 5. Coronal and sagittal radiographs of patient without pelvic obliquity.


References

[1] Glassman SD, Berven S, Bridwell K, Horton W, Dimar JR. Correlation of radiographic parameters and clinical symptoms in adult scoliosis. Spine (Phila Pa 1976) 2005;30:682–8.

[2] Glassman SD, Bridwell K, Dimar JR, Horton W, Berven S, Schwab F. The impact of positive sagittal balance in adult spinal deformity. Spine (Phila Pa 1976) 2005;30:2024–9.

[3] Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 2005;19:231–40.

[4] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60.

[5] Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Phys Ther 1994;74:777–88.

[6] Popović ZB, Thomas JD. Assessing observer variability: a user's guide. Cardiovasc Diagn Ther 2017;7:317–24.

[7] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987;40:171–8.

[8] de Vet HCW, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol 2006;59:1033–9.

[9] Kottner J, Streiner DL. The difference between reliability and agreement. J Clin Epidemiol 2011;64:701–2.

[10] Kottner J, Audige L, Brorson S, Donner A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. Int J Nurs Stud 2011;48:661–71.

[11] Maillot C, Ferrero E, Fort D, et al. Reproducibility and repeatability of a new computerized software for sagittal spinopelvic and scoliosis curvature radiologic measurements: Keops®. Eur Spine J 2015;24:1574–81.

[12] Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016;15:155–63.

[13] McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods 1996;1:30–46.

[14] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull 1979;86:420–8.

[15] Bartlett JW, Frost C. Reliability, repeatability and reproducibility: Analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466–75.

[16] Hopkins WG. Measures of reliability in sports medicine and science. Sports Med 2000;30:375–81.

[17] Jones M, Dobson A, O'Brian S. A graphical method for assessing agreement with the mean between multiple observers using continuous measures. Int J Epidemiol 2011;40:1308–13.

[18] Kim CH, Chung CK, Hong HS, et al. Validation of a simple computerized tool for measuring spinal and pelvic parameters. J Neurosurg Spine 2012;16:154–62.

[19] Aubin C-E, Bellefleur C, Joncas J, et al. Reliability and accuracy analysis of a new semiautomatic radiographic measurement software in adult scoliosis. Spine (Phila Pa 1976) 2011;36:E780–90.

[20] Berthonnaud E, Labelle H, Roussouly P, et al. A variability study of computerized sagittal spinopelvic radiologic measurements of trunk balance. J Spinal Disord Tech 2005;18:66–71.

[21] Pernaa K, Seppänen M, Mäkelä K, Saltychev M. Reliability of sagittal spinopelvic alignment measurements after total hip arthroplasty. Clin Spine Surg 2017;30:E909–14.

[22] Martin Bland J, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;327:307–10.

[23] Lafage R, Ferrero E, Henry JK, et al. Validation of a new computer-assisted tool to measure spino-pelvic parameters. Spine J 2015;15:2493–502.

[24] Langensiepen S, Semler O, Sobottke R, et al. Measuring procedures to determine the Cobb angle in idiopathic scoliosis: A systematic review. Eur Spine J 2013;22:2360–71.


UNIVERSITY OF COPENHAGEN

FACULTY OF HEALTH AND MEDICAL SCIENCES

Study III

Ability of the Global Alignment and Proportion score to predict

mechanical failure following Adult Spinal Deformity surgery

– Validation in 149 patients with two-year follow-up

Spine Deformity 7 (2019) 331-337

https://doi.org/10.1016/j.jspd.2018.08.002


Ability of the Global Alignment and Proportion Score to Predict Mechanical Failure Following Adult Spinal Deformity Surgery: Validation in 149 Patients With Two-Year Follow-up

Tanvir Johanning Bari, MD (a,*), Søren Ohrt-Nissen, MD, PhD (a), Lars Valentin Hansen, MD (a), Benny Dahl, MD, PhD, DMSc (b), Martin Gehrchen, MD, PhD (a)

(a) Spine Unit, Department of Orthopedic Surgery, University Hospital of Copenhagen, Copenhagen 2100, Denmark

(b) Department of Orthopedics and Scoliosis Surgery, Texas Children's Hospital and Baylor College of Medicine, Houston, TX 77030, USA

Received 4 June 2018; revised 27 July 2018; accepted 4 August 2018

Abstract

Study Design: Retrospective analysis of prospectively collected data.

Objectives: To validate the Global Alignment and Proportion (GAP) score in a single-center cohort of adult spinal deformity (ASD) patients.

Summary of Background Data: Surgical treatment for ASD is associated with a high risk of mechanical failure and consequent revision surgery. To improve prediction of mechanical complications, the GAP score was developed with promising results. Development was based on the assumption that not all patients would benefit from the same fixed radiographic targets, as pelvic incidence is an individual, morphological parameter that greatly influences the sagittal curves of the spine.

Methods: In a validation study of the GAP score, patients undergoing ASD surgery with four or more levels of instrumentation were consecutively included at a tertiary spine unit. Patients were followed for a minimum of two years. Pre- and postoperative GAP score and categories were calculated for all patients, and the association with mechanical failure and revision surgery was analyzed.

Results: A total of 149 patients with a mean age of 57.4 years were included. Overall, the rates of mechanical failure and revision surgery were 51% and 35%, respectively. The area under the curve (AUC) using receiver operating characteristic was classified as "no or low discriminatory power" for the GAP score in predicting either outcome (AUC = 0.50 and 0.49, respectively). Similarly, there were no significant associations between GAP categories and the occurrence of mechanical failure or revision surgery when using the Cochran-Armitage test of trend (p = .58 for mechanical failure and p = .28 for revision surgery).

Conclusions: In a consecutive series of surgically treated ASD patients, we found no significant association between postoperative GAP score and mechanical failure or revision surgery. Despite minor limitations in similarities to the original study cohort, further validation studies or adjustments to the original scoring system are proposed.

Level of Evidence: Level II.

© 2018 Scoliosis Research Society. All rights reserved.

Keywords: Adult spinal deformity; Classification; Validation studies; Postoperative complications; Sagittal alignment; Reproducibility of results; Spinal fusion; Radiography

Author disclosures: TJB (none), SO-N (institutional grant from K2M outside the submitted work), LVH (none), BD (grants from K2M and Medtronic, outside the submitted work), MG (grants from K2M and Medtronic, outside the submitted work).

Ethics approval: The study was approved by the Danish Data Protection Agency and the Danish Patient Safety Authority.

*Corresponding author. Department of Orthopedic Surgery, Spine Unit, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark. Tel.: +45 5057 9448; fax: +45 3545 2165.

E-mail address: [email protected] (T.J. Bari).

2212-134X/$ - see front matter © 2018 Scoliosis Research Society. All rights reserved.

Introduction

Adult spinal deformity (ASD) is a complex entity that can significantly impact health-related quality of life [1,2]. Surgical treatment in selected patients has demonstrated beneficial results compared with nonsurgical treatment [3,4], and the most common indications for surgery are pain and disability [5]. However, ASD surgery is associated with a high risk of postoperative complications [6,7], and revision surgery may be necessary, at times as a result of so-called mechanical failure [8-13]. The SRS-Schwab classification was developed to guide surgical decision making and establish targets for surgical results that would minimize postoperative disability [14]. The classification was based on health-related quality of life measures and has not been shown to reliably predict mechanical failure [13].

Yilgor et al. [15] recently developed the Global Alignment and Proportion (GAP) score, which is a predictive model for mechanical complications based on spinopelvic balance. Pelvic incidence (PI) is a fixed morphologic parameter that varies substantially between patients. The morphology of the pelvis greatly influences the sagittal curves of the spine, including the individual variability of the sacral slope (SS) and lordosis curve [16,17]. The GAP score is based on the assumption that not all patients would benefit from the same fixed radiographic targets. The score was developed in a cohort of ASD patients from the European Spine Study Group (ESSG) database. The GAP score comprises 4 radiographic parameters calculated as the difference between measured and ideal angle (Fig. 1). Relative pelvic version is calculated as the difference between the measured angle and an ideal measure based on the individual PI. Relative lumbar lordosis is calculated in an equivalent manner. The authors calculated the ideal angle for each measure using linear regression models of the relationship to PI from data of asymptomatic individuals. The Lordosis Distribution Index characterizes the disposition of the lumbar lordosis, while the relative spinopelvic alignment is an estimate of the global sagittal alignment (Fig. 1). After adding a variable for age, a score between 0 and 13 is given and patients are further subcategorized into three categories (Fig. 2). After developing the model, the authors validated the system in a new cohort of patients using receiver operating characteristics (ROC) and Cochran-Armitage test of trend with promising results. The aim of the current study was to validate the GAP score as a predictor of mechanical failure and revision surgery in a single-center cohort of patients with ASD.
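The "measured minus ideal" construction described above can be sketched as follows. This is a minimal illustration only: the class and function names are hypothetical, and the slope and intercept used in the example are placeholders, not the regression coefficients published with the GAP score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdealModel:
    """Linear model: ideal angle = slope * PI + intercept.

    The coefficients are supplied by the caller; the values used below are
    placeholders for illustration, NOT the published GAP coefficients.
    """
    slope: float
    intercept: float

    def ideal(self, pelvic_incidence: float) -> float:
        return self.slope * pelvic_incidence + self.intercept


def relative_value(measured: float, pelvic_incidence: float, model: IdealModel) -> float:
    """Relative parameter: measured angle minus the PI-based ideal angle."""
    return measured - model.ideal(pelvic_incidence)


# Hypothetical coefficients and angles, chosen only to make the arithmetic easy to follow.
example_model = IdealModel(slope=0.5, intercept=10.0)
rel = relative_value(measured=25.0, pelvic_incidence=50.0, model=example_model)
print(rel)  # ideal = 0.5 * 50 + 10 = 35, so the relative value is 25 - 35
```

Passing the regression model in as a parameter keeps the sketch independent of any particular set of coefficients.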

Methods

All adult patients (>18 years of age) undergoing surgery for ASD in a three-year period from January 1, 2013, were screened for inclusion. A total of 186 patients were identified, and all were treated at a single tertiary spine unit. Patients with less than two years of follow-up (n = 37) were excluded, leaving 149 for final inclusion. Long-standing digital radiographs were available for all patients pre- and postoperatively, as well as at 1- and 2-year follow-up. Demographics were obtained from medical records. Radiographic measurements were performed using the validated online data image management system KEOPS (SMAIO, Lyon, France) [18]. Relative pelvic version, relative lumbar lordosis, Lordosis Distribution Index, and relative spinopelvic alignment were calculated from radiographs in accordance with the original work (Fig. 1). Patients were classified into the three subcategories: proportioned, moderately disproportioned, and severely disproportioned (Fig. 2).

Mechanical failure was defined as proximal junctional kyphosis (PJK), proximal junctional failure, distal junctional failure, rod breakage, or other. Other implant-related complications included screw breakage, loosening, or pullout, or set-screw dislodgement. PJK was defined as a >10° increase in kyphosis angle between the upper instrumented vertebra (UIV) and 2 vertebrae above UIV (UIV+2). Proximal junctional failure was defined as PJK along with one or more of the following: fracture of the UIV or UIV+1, posterior osseoligamentous disruption, or instrumentation pullout at the UIV [19]. Revision surgery was defined as reoperation within two years of index surgery because of mechanical failure. A subset of patients has been described in a previous study [20].

Fig. 1. Methods for calculating the radiographic components of the Global Alignment and Proportion score. Relative pelvic version, relative lumbar lordosis, and relative spinopelvic alignment are calculated based on the measured angle in relation to the ideal angle.
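The junctional definitions given in the Methods can be written as a small rule set. The sketch below is an assumption-laden illustration: the field and function names are hypothetical, and the reference measurement against which the increase in kyphosis is taken is simply called "baseline" because the text does not specify it further.

```python
from dataclasses import dataclass


@dataclass
class JunctionalFindings:
    """Findings at the proximal junction (hypothetical field names)."""
    baseline_kyphosis_deg: float   # UIV to UIV+2 kyphosis angle at baseline
    followup_kyphosis_deg: float   # same angle at follow-up
    fracture_uiv_or_uiv1: bool = False
    posterior_osseoligamentous_disruption: bool = False
    instrumentation_pullout_at_uiv: bool = False


def has_pjk(f: JunctionalFindings) -> bool:
    """PJK: an increase of more than 10 degrees in the UIV to UIV+2 kyphosis angle."""
    return (f.followup_kyphosis_deg - f.baseline_kyphosis_deg) > 10.0


def has_pjf(f: JunctionalFindings) -> bool:
    """Proximal junctional failure: PJK plus at least one structural finding."""
    structural = (
        f.fracture_uiv_or_uiv1
        or f.posterior_osseoligamentous_disruption
        or f.instrumentation_pullout_at_uiv
    )
    return has_pjk(f) and structural


if __name__ == "__main__":
    case = JunctionalFindings(8.0, 22.0, fracture_uiv_or_uiv1=True)
    print(has_pjk(case), has_pjf(case))
```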

Statistics

All statistical analyses were performed using R, version 3.4.3 (R Development Core Team, Vienna, Austria). Distribution of data was assessed using histograms and reported as proportions (%), means with standard deviations (SDs), or medians with interquartile ranges. The Mann-Whitney U test was used when comparing numeric non-Gaussian distributed data. Pearson χ² test of independence or trend across ordinal categories was used when comparing categorical data, and Fisher exact test was used when the expected frequency was below 5 for more than 20% of the compared groups. ROC curves were used to calculate AUC in assessing the GAP score's diagnostic accuracy of predicting mechanical failure or revision surgery [21,22]. Diagnostic accuracy was classified as "no or low discriminatory power" for an AUC of 0.5–0.7, "moderate discriminatory power" for an AUC of 0.7–0.9, and "high discriminatory power" for an AUC >0.9 [23]. Additionally, GAP categories were analyzed for association with the two outcome variables using Cochran-Armitage test of trend and presented as odds ratios (ORs) with 95% CIs [24,25]. Furthermore, GAP score and categories were analyzed for association with revision surgery or mechanical failure using logistic regression models. p values <.05 were considered significant.

Fig. 2. Each component of the Global Alignment and Proportion score includes a value based on the radiographic parameters as depicted in Fig. 1 and an age component. Patients are given a score for each parameter, which are summed into a total score (range 0–13) and further subcategorized. RPV, relative pelvic version; RLL, relative lumbar lordosis; LDI, lordosis distribution index; RSA, relative spinopelvic alignment.
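To make the AUC classification described in the Statistics section concrete, the sketch below computes an AUC as the proportion of (event, non-event) pairs in which the event case has the higher score, with ties counted as one half, and then applies the stated thresholds. It is a Python illustration with hypothetical function names and toy scores, not the R code used in the study. Treating values below 0.5 as "no or low" and assigning exactly 0.7 to "moderate" are assumptions about boundaries the text leaves open.

```python
from typing import Sequence


def auc_from_scores(event_scores: Sequence[float], nonevent_scores: Sequence[float]) -> float:
    """AUC as the fraction of event/non-event pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for e in event_scores:
        for n in nonevent_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(nonevent_scores))


def classify_auc(auc: float) -> str:
    """Map an AUC to the discriminatory-power labels quoted in the text."""
    if auc > 0.9:
        return "high discriminatory power"
    if auc >= 0.7:
        return "moderate discriminatory power"
    return "no or low discriminatory power"


if __name__ == "__main__":
    # Toy scores for illustration only; these are not study data.
    auc = auc_from_scores([3, 5, 7], [2, 5, 6])
    print(round(auc, 3), classify_auc(auc))
```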

Before the study, approvals were obtained from the regional data protection agency and national patient safety authority: journal number 2012-58-0004 and 3-3013-1980/1.

Results

The current study included a consecutive cohort of 149 surgically treated ASD patients. The mean age was 57.4 (±15.9) years, patients were followed for a median of 32 months (range: 24–57), and 88 (59%) had a history of previous spine surgery. Demographics and complications are shown in Table 1. Patients attained a mean GAP score correction of −4.1 (±4.1). In 86 (58%) cases, the surgical procedure included a three-column osteotomy (3CO). Seventy-six (51%) patients had one or more mechanical complications, and 52 (35%) underwent revision surgery because of mechanical complication within two years of the index procedure. There were 4 (2.2%) deaths during the study period in the initial cohort of 186 patients, and none were included in the final cohort.

We found no association between postoperative GAP score and mechanical failure or revision surgery using the Mann-Whitney U test (p = .84 and p = .97, respectively). Similar findings were present for the three subcategories when using the Pearson χ² test (p = .39 and p = .84, respectively) (Fig. 3). Using receiver operating characteristic curves, we found no or low discriminatory power for both mechanical failure (AUC: 0.50, 95% CI: 0.40–0.59) and revision surgery (AUC: 0.49, 95% CI: 0.39–0.59) (Fig. 4). When assessing subcategories in relation to mechanical failure or revision surgery, no significant effect was found in Cochran-Armitage test of trend (p = .58 and p = .28, respectively). Logistic regression showed no significant association between either GAP score or categories and the two outcome variables (Table 2).
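The trend-test results can be sanity-checked with a short sketch. Assuming the usual form of the Cochran-Armitage statistic (a χ² with 1 degree of freedom, equally spaced scores per ordered category), the χ² values reported in Table 2 (1.16 and 0.30) correspond to p values of about .28 and .58. The function names are hypothetical, the per-category counts in the example are toy numbers rather than study data, and this is a Python illustration rather than the R analysis used by the authors.

```python
import math
from typing import Optional, Sequence, Tuple


def chi2_p_value_df1(chi2: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def cochran_armitage(
    events: Sequence[int],
    totals: Sequence[int],
    scores: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (chi2, p) for a linear trend in proportions across ordered groups."""
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores 0, 1, 2, ...
    n_total = sum(totals)
    p_bar = sum(events) / n_total          # pooled event proportion
    # Score-weighted sum of observed minus expected events per group.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_total)
    chi2 = t_stat ** 2 / variance
    return chi2, chi2_p_value_df1(chi2)


if __name__ == "__main__":
    # Chi-square statistics as reported in Table 2.
    print(round(chi2_p_value_df1(1.16), 2), round(chi2_p_value_df1(0.30), 2))
    # Toy counts (NOT study data): events per ordered category out of 10 each.
    print(cochran_armitage(events=[2, 5, 8], totals=[10, 10, 10]))
```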

Discussion

The present study found no significant association between either GAP score or GAP categories and mechanical failure or revision surgery in patients with ASD. Analyses were performed using ROC curves, Cochran-Armitage test of trend, and logistic regression models.

The present study was designed to validate the recently presented classification system in ASD patients. Patients undergoing surgery were consecutively included to establish the current cohort, and no attempts were made to match the current cohort with that in the original study. Our department is a tertiary spine unit, and a majority of referred patients were complex cases, in contrast to patients in the original work, who were included from the ESSG database. In comparing the two cohorts, we found small differences in age (57.4 vs. 50.7 years) and sex distribution (70.5 vs. 80.0% females). Most noticeable was the proportion of patients with a history of previous spine surgery (59% in the current study vs. 30% in the original cohort). Mean postoperative GAP score was 4.8 (range: 0–12) compared with 4.0 (range: 0–13) in the original cohort. Information regarding the extent of 3COs (58% in the current study) and number of instrumented vertebrae was not presented in the original validation cohort. Rates of mechanical failure and revision procedures were slightly higher in the present study, which may be attributed to a high rate of 3CO procedures [6,13,26].

As 37 of the 186 identified ASD patients were lost to follow-up, the risk of selection bias should be considered. Although an inclusion rate of 80% is not ideal, we argue that the single-center, consecutive fashion of inclusion minimizes the risk of this type of bias.

The use of any classification system relies on its ability to propose treatment strategy and accurately predict outcome, ultimately minimizing the risk of postoperative complications. We therefore find the seminal work of Yilgor et al. [15] of considerable relevance, as such a prediction model has not previously been published in patients with ASD. Results were promising, with an AUC in correlation to mechanical failure of 0.92 (standard error [SE]: 0.0034, p < .001, 95% CI: 0.85–0.98). However, the results of the present study showed no significant association between the GAP score and mechanical failure and thus did not confirm the findings of the original study. Given that the current study population differed slightly from the original cohort, caution is warranted regarding the interpretation of these findings. The current cohort reflects the heterogeneity of ASD patients and as such was not designed to replicate the original study but to assess whether the promising results of the GAP score had external validity in a non-selected ASD cohort. The rate of complications in the present study was higher than previously described following ASD surgery [6] and could have influenced the results.

Table 1

Demographics

| Characteristic | Value |
|---|---|
| Age, years, mean (SD) | 57.4 (15.9) |
| Instrumented vertebrae, mean (SD) | 12.0 (3.5) |
| Sex, female | 105 (70.5) |
| Previous spine surgery | 88 (59.1) |
| 3-column osteotomy | 86 (57.7) |
| Follow-up, months, median [IQR] | 32 [24–57] |
| Mechanical failure | 76 (51.0) |
| Proximal junctional kyphosis | 4 (2.7) |
| Proximal junctional failure | 10 (6.7) |
| Distal junctional failure | 1 (0.7) |
| Rod breakage | 57 (38.3) |
| Other mechanical failure | 17 (11.4) |
| Revision surgery | 52 (34.9) |
| Preoperative GAP score, median [IQR] | 10 [6–12] |
| Postoperative GAP score, median [IQR] | 4 [2–7] |
| Preoperative GAP category: Proportioned | 7 (4.7) |
| Preoperative GAP category: Moderately disproportioned | 32 (21.5) |
| Preoperative GAP category: Severely disproportioned | 110 (73.8) |
| Postoperative GAP category: Proportioned | 40 (26.8) |
| Postoperative GAP category: Moderately disproportioned | 64 (43.0) |
| Postoperative GAP category: Severely disproportioned | 45 (30.2) |
| Change in GAP score, mean (SD) | −4.1 (4.1) |

GAP, Global Alignment and Proportion; IQR, interquartile range; SD, standard deviation.

Values are n (%) unless otherwise noted.

The GAP score was developed on the basis of the assumption that PI is an individual morphologic parameter affecting all other spinal and pelvic parameters. The system was therefore established by comparing different measured spinopelvic parameters in relation to calculated target values based on the individual PI and takes into account the distribution of the lumbar lordosis, sagittal balance, magnitude of SS and lumbar lordosis in relation to PI, and age. However, previous studies have suggested that there are multiple factors associated with failure. Nicholls et al. [27] found that pelvic tilt (PT) and thoracic kyphosis (TK) significantly increased the risk of mechanical failure following ASD surgery. The relationship PI = SS + PT is well established, and although both SS and PI are included in the GAP model, PT is known to be a reliable estimator of pelvic retroversion and thereby compensation of sagittal imbalance [28]. Additionally, Nicholls et al. reported that females developed PJK at a higher rate than males. In a competing risk model of mechanical failure, Hallager et al. [13] also reported that the magnitude of thoracic kyphosis significantly influenced the rate of postoperative failure. Furthermore, the study found that larger operative changes, in particular in lumbar lordosis, significantly increased the risk. Further aspects in relation to mechanical failure are the number of instrumented vertebrae and instrumentation ending at the sacrum, which have previously been correlated to risk of mechanical complications [29,30]. Obesity, osteoporosis, severity of ASD, minimally invasive surgery, and muscle volume have also been associated with changed risk of mechanical failure [31-33]. In addition, questions have been raised whether or not the presence of mechanical failure correlates to patient-reported outcome measures [34], which further complicates the clinical significance of these radiographic outcomes.

Fig. 3. Boxplot of Global Alignment and Proportion (GAP) category distribution by mechanical failure and revision surgery.

Fig. 4. Receiver operating characteristic curves illustrating the diagnostic accuracy of the Global Alignment and Proportion (GAP) score in predicting mechanical failure and revision surgery. An area under the curve (AUC) of <0.7 indicates "no or low discriminatory power" of diagnostic accuracy.
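The geometric relationship PI = SS + PT mentioned in the Discussion means that any one of the three angles is implied by the other two. A minimal sketch, with hypothetical function names and example angles chosen only for illustration:

```python
def implied_pelvic_tilt(pelvic_incidence_deg: float, sacral_slope_deg: float) -> float:
    """Pelvic tilt implied by the identity PI = SS + PT."""
    return pelvic_incidence_deg - sacral_slope_deg


def identity_residual(pi_deg: float, ss_deg: float, pt_deg: float) -> float:
    """Difference between measured PI and SS + PT.

    A nonzero residual reflects how far three independently measured
    angles depart from the identity.
    """
    return pi_deg - (ss_deg + pt_deg)


if __name__ == "__main__":
    print(implied_pelvic_tilt(55.0, 35.0))      # PT implied by PI = 55 and SS = 35
    print(identity_residual(55.0, 35.0, 22.0))  # three separately measured angles
```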

To our knowledge, the present study is the first validation study of the GAP score outside of the original work. The results did not confirm the findings of the original study. Identifying patients at risk of mechanical failure and revision surgery is a key task of any spine surgeon. However, ASD patients are characterized by a substantial heterogeneity, and predictive classifications may need to address this variation even further. Due care is therefore suggested in clinical use of the GAP score in its present form until appropriate modifications or future validation studies are presented.

Conclusions

This longitudinal, single-center study on ASD patients found no significant association between postoperative GAP score and mechanical failure or revision surgery. Despite limitations in case similarities, further adjustments to the classification system or future validation studies are suggested before clinical implementation.

References

[1] Pellisé F, Vila-Casademunt A, Ferrer M, et al. Impact on health related quality of life of adult spinal deformity (ASD) compared with other chronic conditions. Eur Spine J 2014;24:3–11.

[2] Baldus C, Bridwell KH, Harrast J, et al. Age-gender matched comparison of SRS instrument scores between adult deformity and normal adults: are all SRS domains disease specific? Spine (Phila Pa 1976) 2008;33:2214–8.

[3] Bridwell KH, Glassman S, Horton W, et al. Does treatment (nonoperative and operative) improve the two-year quality of life in patients with adult symptomatic lumbar scoliosis: a prospective multicenter evidence-based medicine study. Spine (Phila Pa 1976) 2009;34:2171–8.

[4] Smith JS, Shaffrey CI, Glassman SD, et al. Risk-benefit assessment of surgery for adult scoliosis: an analysis based on patient age. Spine (Phila Pa 1976) 2011;36:817–24.

[5] Bess S, Boachie-Adjei O, Burton D, et al. Pain and disability determine treatment modality for older patients with adult scoliosis, while deformity guides treatment for younger patients. Spine (Phila Pa 1976) 2009;34:2186–90.

[6] Sciubba DM, Yurter A, Smith JS, et al. A comprehensive review of complication rates after surgery for adult deformity: a reference for informed consent. Spine Deform 2015;3:575–94.

[7] Karstensen S, Bari T, Gehrchen M, et al. Morbidity and mortality of complex spine surgery: a prospective cohort study in 679 patients validating the Spine AdVerse Event Severity (SAVES) system in a European population. Spine J 2016;16:146–53.

[8] Passias PG, Soroceanu A, Yang S, et al. Predictors of revision surgical procedure excluding wound complications in adult spinal deformity and impact on patient-reported outcomes and satisfaction. J Bone Joint Surg Am 2016;98:536–43.

[9] Howe CR, Agel J, Lee MJ, et al. The morbidity and mortality of fusions from the thoracic spine to the pelvis in the adult population. Spine (Phila Pa 1976) 2011;36:1397–401.

[10] Charosky S, Guigui P, Blamoutier A, et al. Complications and risk factors of primary adult scoliosis surgery: a multicenter study of 306 patients. Spine (Phila Pa 1976) 2012;37:693–700.

[11] Blamoutier A, Guigui P, Charosky S, et al. Surgery of lumbar and thoracolumbar scolioses in adults over 50. Morbidity and survival in a multicenter retrospective cohort of 180 patients with a mean follow-up of 4.5 years. Orthop Traumatol Surg Res 2012;98:528–35.

Table 2

Accuracy of the GAP score as a predictor of revision surgery or mechanical failure

| | Revision surgery | p value | Mechanical failure | p value |
|---|---|---|---|---|
| Receiver operating characteristics, AUC (95% CI) | 0.49 (0.39–0.59) | | 0.50 (0.40–0.59) | |
| Cochran-Armitage test of trend, χ² test | 1.16 | .28 | 0.30 | .58 |
| Univariate logistic regression, OR (95% CI) | | | | |
| GAP score | 1.01 (0.91–1.11) | .28 | 1.00 (0.91–1.10) | |
| GAP categories | | | | |
| Proportioned | Reference | | Reference | |
| Moderately disproportioned | 1.42 (0.76–2.66) | .28 | 1.14 (0.65–2.18) | .28 |
| Severely disproportioned | 0.87 (0.50–1.53) | .63 | 0.94 (0.55–1.60) | .81 |

AUC, area under the curve; CI, confidence interval; GAP, Global Alignment and Proportion; OR, odds ratio.


[12] Inoue S, Khashan M, Fujimori T, Berven SH. Analysis of mechanical

failure associated with reoperation in spinal fusion to the sacrum in

adult spinal deformity. J Orthop Sci 2015:609e16.[13] Hallager DW, Karstensen S, Bukhari N, et al. Radiographic predic-

tors for mechanical failure after adult spinal deformity surgery. Spine

(Phila Pa 1976) 2017;42:E855e63.

[14] Schwab F, Ungar B, Blondel B, et al. Scoliosis Research Societ-

ydSchwab Adult Spinal Deformity Classification. Spine (Phila Pa

1976) 2012;37:1077e82.

[15] Yilgor C, Sogunmez N, Boissiere L, et al. Global Alignment and Pro-

portion (GAP) Score: development and validation of a new method of

analyzing spinopelvic alignment to predict mechanical complications

after adult spinal deformity surgery. J Bone Joint Surg Am 2017;99:

1661e72.

[16] Legaye J, Duval-Beaup�ere G, Hecquet J, Marty C. Pelvic incidence: a

fundamental pelvic parameter for three-dimensional regulation of

spinal sagittal curves. Eur Spine J 1998;7:99e103.

[17] Boulay C, Tardieu C, Hecquet J, et al. Sagittal alignment of spine and

pelvis regulated by pelvic incidence: standard values and prediction

of lordosis. Eur Spine J 2006;15:415e22.

[18] Maillot C, Ferrero E, Fort D, et al. Reproducibility and repeatability

of a new computerized software for sagittal spinopelvic and scoliosis

curvature radiologic measurements: Keops?? Eur Spine J 2015;24:

1574e81.

[19] Hart RA, McCarthy I, Ames CP, et al. Proximal junctional kyphosis

and proximal junctional failure. Neurosurg Clin N Am 2013;24:

213e8.

[20] Hallager DW, Hansen LV, Dragsted CR, et al. A comprehensive anal-

ysis of the SRS-Schwab adult spinal deformity classification and con-

founding variables: a prospective, non-US cross-sectional study in

292 patients. Spine (Phila Pa 1976) 2016;41:E589e97.

[21] Hanley AJ, McNeil JB. The meaning and use of the area under a

receiver operating characteristic (ROC) curve. Radiology 1982;143:

29e36.

[22] Faraggi D, Reiser B. Estimation of the area under the ROC curve.

Stat Med 2002;21:3093e106.

[23] GrzybowskiM,Younger JG. Statisticalmethodology: III.Receiver oper-

ating characteristic (ROC) curves. Acad Emerg Med 1997;4:818e26.

[24] Cochran WG. Some methods for strengthening the common c2 tests.

Biometrics 1954;10:417e51.

[25] Armitage P. Tests for linear trends in proportions and frequencies.

Biometrics 1955;11:375e6.

[26] Smith JS, Shaffrey CI, Klineberg E, et al. Complication rates associ-

ated with 3-column osteotomy in 82 adult spinal deformity patients:

retrospective review of a prospectively collected multicenter consec-

utive series with 2-year follow-up. J Neurosurg Spine 2017;27:1e14.

[27] Nicholls FH, Bae J, Theologis AA, et al. Factors associated with the

development of and revision for proximal junctional kyphosis in 440

consecutive adult spinal deformity patients. Spine (Phila Pa 1976)

2017;42:1693e8.

[28] Lafage V, Schwab F, Patel A, et al. Pelvic tilt and truncal inclination: two key radiographic parameters in the setting of adults with spinal deformity. Spine (Phila Pa 1976) 2009;34:E599–606.

[29] Kim YJ, Bridwell KH, Lenke LG, et al. Pseudarthrosis in adult spinal deformity following multisegmental instrumentation and arthrodesis. J Bone Joint Surg Am 2006;88:721–8.

[30] Edwards CC, Bridwell KH, Patel A, et al. Long adult deformity fusions to L5 and the sacrum: a matched cohort analysis. Spine (Phila Pa 1976) 2004;29:1996–2005.

[31] Wang H, Ding W, Ma L, et al. Prevention of proximal junctional kyphosis: are polyaxial pedicle screws superior to monoaxial pedicle screws at the upper instrumented vertebrae? World Neurosurg 2017;101:405–15.

[32] Shashank VG, Jacob J, Konrad B, et al. Development of proximal junctional kyphosis after minimally invasive lateral anterior column realignment for adult spinal deformity. Neurosurgery 2018;28:1–9.

[33] Kim D, Kim J, Kim D, et al. Risk factors of proximal junctional kyphosis after multilevel fusion surgery: more than 2 years follow-up data. J Korean Neurosurg Soc 2017;60:174–80.

[34] Yasuda T, Hasegawa T, Yamato Y, et al. Proximal junctional kyphosis in adult spinal deformity with long spinal fusion from T9/T10 to the ilium. J Spine Surg 2017;3:204–11.

T.J. Bari et al. / Spine Deformity 7 (2019) 331–337


Appendix 4 - Specific statistical considerations


Specific statistical considerations

Intraclass correlation coefficients (ICC)

Intraclass correlation coefficients (ICCs) were used to describe reliability for numeric variables, using the R package “irr” version 0.84. A two-way random effects model, single measure, absolute agreement (ICC(A,1)) was used for inter-rater reliability, and a two-way mixed effects model, single measure, absolute agreement (ICC(A,1)) for intra-rater reliability [1–3].
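As an illustration only, the single-measure, absolute-agreement ICC can be computed from the mean squares of a two-way ANOVA, following the ICC(A,1) form given by McGraw and Wong [2]. The sketch below is written in Python rather than with the “irr” package used in the study; the function name `icc_a1` and the demonstration ratings are assumptions made for this example and are not study data.

```python
import numpy as np

def icc_a1(ratings):
    """ICC(A,1): two-way model, single measure, absolute agreement.

    ratings: 2-D array-like, rows = subjects (n), columns = raters or sessions (k).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares for subjects (rows), raters (columns) and residual error
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-subject mean square
    msc = ss_cols / (k - 1)             # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative ratings only: 6 subjects scored by 4 raters
demo = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_a1(demo), 2))
```

The random effects and mixed effects variants share this computational formula for the single-measure, absolute-agreement case; they differ in whether the result is generalized to other raters.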

Standard error of measurement (SEM) and 95% Limits of Agreement (LOA)

When describing agreement, the SEM and 95% LOA were calculated. SEM is often calculated as $SEM = SD\sqrt{1 - ICC}$ [4–6], where SD is the standard deviation of scores for all cases. This form is known as SEM consistency. Because different forms of ICC yield different values, the choice of ICC in this equation can substantially influence the size of the SEM, and SEM consistency does not take into account systematic differences between raters. A more precise method is therefore to derive the SEM as the square root of the mean square error term from a repeated-measures analysis of variance (ANOVA) [7,8]; this form is called SEM agreement. Any further mention of SEM refers to the agreement form. SEM was calculated as $SEM_{intra} = \sqrt{\text{rater variance} + \text{residual variance}}$ and $SEM_{inter} = \sqrt{\text{residual variance}}$. A linear mixed effects model, fitted with the software package “lme4” version 1.1-15, was used to calculate the variances.

The repeatability coefficient (RC) devised by Bland and Altman [9] also describes the within-subject variation, as the difference in measurements exceeded by only 5% of pairs of measurements. The relationship between the RC and SEM is $RC = t \times SEM_{diff}$, where t is the t statistic for a given cumulative probability with n degrees of freedom. As $SEM_{diff} = \sqrt{2} \times SEM$, the full equation is $RC = t \times \sqrt{2} \times SEM = 2.77 \times SEM$ for a cumulative probability of 0.975 and n degrees of freedom, where $t \approx 1.96$ for large n [6,8,9]. Bland and Altman further describe the 95% limits of agreement (LOA) as ±RC.

A biostatistician was consulted prior to the study and aided in the performance of all statistical analyses.
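The relationships between SEM, RC and LOA described in this appendix can be sketched numerically. The Python snippet below is a minimal illustration and not the analysis code used in the study: the function names are assumptions, the variance components would in practice come from a fitted mixed effects model, and t is fixed at 1.96, which presumes a large number of degrees of freedom.

```python
import math

T_975 = 1.96  # t statistic for cumulative probability 0.975, large degrees of freedom

def sem_consistency(sd, icc):
    """SEM consistency: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)

def sem_agreement(residual_var, rater_var=0.0):
    """SEM agreement: square root of the summed variance components.

    Pass rater_var only when the rater component is to be included.
    """
    return math.sqrt(rater_var + residual_var)

def repeatability_coefficient(sem, t=T_975):
    """RC = t * sqrt(2) * SEM, roughly 2.77 * SEM when t = 1.96."""
    return t * math.sqrt(2.0) * sem

def limits_of_agreement(sem, t=T_975):
    """95% LOA expressed as (-RC, +RC)."""
    rc = repeatability_coefficient(sem, t)
    return (-rc, rc)

# Hypothetical variance components (degrees squared), for illustration only
sem = sem_agreement(residual_var=3.0, rater_var=1.0)
print(sem)
print(repeatability_coefficient(sem))
print(limits_of_agreement(sem))
```

With smaller samples, the t statistic for the actual degrees of freedom would be larger than 1.96, so the multiplier 2.77 should be read as a large-sample value.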


[1] Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation

Coefficients for Reliability Research. J Chiropr Med 2016;15:155–63.

[2] McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods 1996;1:30–46.

[3] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol

Bull 1979;86:420–8.

[4] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness

of evaluative instruments. J Chronic Dis 1987;40:171–8.

[5] Beaton DE, Bombardier C, Katz JN, et al. A taxonomy for responsiveness. J Clin

Epidemiol 2001;54:1204–17.

[6] Hopkins WG. Measures of Reliability in Sports Medicine and Science. Sport Med

2000;30:375–81.

[7] Stratford PW, Goldsmith CH. Use of the Standard Error as a Reliability Index of

Interest: An Applied Example Using Elbow Flexor Strength Data. Phys Ther

1997;77:745–50.

[8] Weir JP. Quantifying Test-Retest Reliability Using the Intraclass Correlation

Coefficient and the SEM. J Strength Cond Res 2005;19:231.

[9] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat

Methods Med Res 1999;8:135–60.