name of department: spine unit, department of orthopedic ...4 abbreviations and concepts agreement...
TRANSCRIPT
1
Name of department: Spine Unit, Department of Orthopedic Surgery, Rigshospitalet,
Copenhagen University Hospital
Blegdamsvej 9, 2100, Copenhagen, Denmark
Author: Tanvir Johanning Bari
Title and subtitle: Radiographic parameters in Adult Spinal Deformity
– Reproducibility and prediction of failure
Academic advisors: Principal supervisor
Associate professor Martin Gehrchen, MD, PhD, Head of Spine Unit
Spine Unit, Department of Orthopedic Surgery, Rigshospitalet,
Copenhagen University Hospital
Primary co-supervisor
Professor Benny Dahl, MD, PhD, DMSc, Director of Spine Program
Department of Orthopedics and Scoliosis Surgery, Texas Children’s
Hospital & Baylor College of Medicine, Houston, TX, USA
Co-supervisor
Associate Professor Rachid Bech-Azeddine, MD, PhD,
Center for Rheumatology and Spine Diseases,
Rigshospitalet Glostrup, Copenhagen University Hospital
Co-supervisor
Dennis Winge Hallager, MD, PhD,
Spine Unit, Department of Orthopedic Surgery, Rigshospitalet,
Copenhagen University Hospital
Assessment committee: Chairman
Professor Jes Bruun Lauritsen, MD, PhD, DMSc,
Department of Orthopedic Surgery, Bispebjerg Hospital,
Copenhagen University Hospital
Assessor representing International research
Associate Professor, Dominique André Rothenfluh, MD, PhD,
Consultant Spinal Surgeon, Senior Research Fellow,
Oxford University Hospitals NHS Foundation Trust
Assessor representing Danish research
Associate Professor Stig Mindedahl Jespersen, MD, PhD,
Department of Orthopedic Surgery, Odense University Hospital
2
Table of contents
ACKNOWLEDGMENTS ................................................................................................................. 3
ABBREVIATIONS AND CONCEPTS ............................................................................................ 4
SUMMARY ........................................................................................................................................ 6
DANSK RESUMÉ.............................................................................................................................. 8
LIST OF PAPERS ........................................................................................................................... 10
INTRODUCTION ............................................................................................................................ 11
BACKGROUND .............................................................................................................................. 12 Sagittal balance ...................................................................................................................................................... 12 Compensating for imbalance ................................................................................................................................. 12 Treatment............................................................................................................................................................... 15 Complications and Adverse events ........................................................................................................................ 15 Classification systems ........................................................................................................................................... 15 Reproducibility ...................................................................................................................................................... 17
OBJECTIVES AND HYPOTHESES............................................................................................. 19 Study I: Roussouly Classification .......................................................................................................................... 19 Study II: Spinopelvic parameters .......................................................................................................................... 19 Study III: GAP-score ............................................................................................................................................. 19
METHODS ....................................................................................................................................... 20 Overall study design .............................................................................................................................................. 20 Patient inclusion .................................................................................................................................................... 20 Data collection ....................................................................................................................................................... 21 Radiographic measurements .................................................................................................................................. 21 Reproducibility measurements .............................................................................................................................. 21 Outcomes ............................................................................................................................................................... 21 Statistics ................................................................................................................................................................ 22
RESULTS ......................................................................................................................................... 24 Study I ................................................................................................................................................................... 24 Study II .................................................................................................................................................................. 26 Study III ................................................................................................................................................................. 28
DISCUSSION ................................................................................................................................... 30 The Roussouly Classification ................................................................................................................................ 30 Measurement accuracy of sagittal radiographic parameters .................................................................................. 31 Predicting mechanical failure using the GAP-score .............................................................................................. 33
LIMITATIONS ................................................................................................................................ 35
PERSPECTIVES AND FUTURE RESEARCH ........................................................................... 37
CONCLUDING REMARKS .......................................................................................................... 39
REFERENCES ................................................................................................................................. 40
APPENDICES .................................................................................................................................. 51 Study I ................................................................................................................................................................... 51 Study II .................................................................................................................................................................. 59 Study III ................................................................................................................................................................. 68 Appendix 4 - Specific statistical considerations .................................................................................................... 76
3
Acknowledgments
Probably the most read part of any thesis or dissertation is this – the acknowledgments, and here are
mine:
I was first drawn to the academic field of spine surgery when I, as a medical student at a lecture in
2010, met Professor Benny Dahl. I was shortly after enrolled as a pregraduate research year student
at the Spine Unit with Benny and Associate Professor Martin Gehrchen as my supervisors. After
graduating medical school in 2014, it was an exceptional privilege to be offered a position as full-
time research fellow by these two men, for which I am very grateful. I owe a tremendous amount of
thanks to Martin and Benny for providing the foundation and structure to support the research
presented in this thesis. I have learned a great deal from the both of you, and have come to
understand how you, in a very short time span, have transformed the Spine Unit at Rigshospitalet to
a world-known scientific institute.
I am also grateful for the support that I have received from my academic supervisors Associate
Professor Rachid Bech-Azeddine and Dennis Winge Hallager. Dennis has shown me how
transparency, attention to detail and not cutting corners are essential in all academic work. Thank
you for the hard work that paved the road for my research, and for your guidance.
I would also like to thank all my co-authors Lars Valentin Hansen, Niklas Tøndevold and Ture
Karbo for contributing to the studies. A special thanks to Niklas, for keeping me grounded.
I owe a very sincere thank you to my fellow Ph.D. colleagues at the Spine Unit, Sidsel Fruergaard,
Frederik Pitter, Casper Dragsted and Søren Ohrt-Nissen, and from across the hall: Pelle
Petersen, for both social, scientific and emotional support. Your company has provided the best
work environment that I could possibly imagine. And of course, to my gym-buddy Peter Polzik, for
listening to my rants and for teaching me sarcasm – bol’shoye spasibo.
Without the support of my friends and family, this thesis would not haven possible. I would also
like to thank Aesculapius, Partur and Uglerne.
A very sincere thank you to my Father, who has endured what I can only imagine, to provide the
conditions for which I now find myself in – I will always be grateful for all that you have done and
for being my role model.
And finally, for everything you put up with, for your love, for your endless support and for the love
of our life, Hugo – I dedicate this work to my wife, Emilie.
4
Abbreviations and concepts
Agreement The degree to which repeated measurements are identical, e.g. the measurement precision
AP Anteverted pelvis
ASD Adult Spinal Deformity
AUC Area under the curve
C7 The 7th cervical vertebra
CI Confidence interval
Coronal Anterior plane of the spine
DICOM Digital Imaging and Communication in Medicine
GAP Global alignment and Proportion: a score to predict postoperative results
HRQOL Health-related quality of life
ICC Intraclass correlation coefficient, used to measure reliability
IQR Interquartile range
Kyphosis Convex curvature of the spine, normally seen in the thoracic spine
L5 The 5th lumbar vertebra
LOA Limits of Agreement, used to quantify Agreement in continuous data
Lordosis Concave curvature of the spine, normally seen in the cervical and lumbar spine
Mechanical complication Defined as a postoperative instrument failure, eg. rod breakage or proximal junctional failure
OR Odd ratio
PACS Picture Archiving and Communication System
PI Pelvic Incidence: morphologic spinopelvic radiographic parameter (Figure 3)
PJF Proximal junctional failure: PJK more than a specified amount or combined with eg. fracture or ligamentous rupture.
PJK Proximal junctional kyphosis: postoperative kyphosation above the highest level of instrumentation
5
Plumb line vertical line
PT Pelvic Tilt: radiographic spinopelvic parameter representing compensation for sagittal imbalance (Figure 3)
Reliability The ability of a measurement to differentiate between subjects[1], using the ratio between measurement error (within subject) and between subject variation
Reproducibility Umbrella term for agreement and reliability, describes the similarity between repeated measures.
Revision surgery Secondary unplanned surgery after index procedure
ROC Receiver operating characteristics
Sagittal The lateral plane of the spine
SD Standard deviation
SEM Standard error of measurement: a parameter to assess Agreement
SRS Scoliosis Research Society
SS Sacral slope: a radiographic parameter of the slope of the sacral endplate (Figure 3)
SVA Sagittal vertical axis: a radiographic describing the sagittal balance of the spine (Figure 2)
UIV The uppermost instrumented vertebra
6
Summary
Adult Spinal Deformity (ASD) is a common disorder which substantially impacts health-related
quality of life and is associated to considerable pain. Although the reason for ASD can be
manifold, iatrogenic causes and degenerative aging processes are the most common. Recent
decades have seen accelerated technological and surgical advances, and in addition to a
demographic shift of the global population, ASD is attracting growing interest. Non-surgical
treatment is often chosen as first-line of treatment; although, the outcome of surgery is
considerably favorable. Despite compelling effect on every-day living and pain, surgical
treatment is associated with a considerable risk of complications, including mechanical failure
requiring revision surgery. Radiographic parameters on whole-spine imaging have been
established as a crucial cornerstone in diagnosis, assessment of severity, pre-operative planning,
and post-operative evaluation of patients with this disorder. In addition, numerous classification
and scoring systems are emerging in attempts to identify at-risk patients and predict
complications requiring revision surgery.
The overall aim of this thesis was to further scientific knowledge of radiographic parameters in
patients with ASD, in efforts to ultimately reduce the surgery-related risks. The Roussouly
Classification system was developed to describe the variation in sagittal spine shapes in the
normal population, and to serve as templates for the surgical targets. In Study I, this system was
assessed for its reproducibility when classified by multiple raters and with repeated
measurements. A blinded test-re-test study was performed in 64 patients referred for ASD
assessment. Using Fleiss Kappa, inter-rater reliability was estimated as “moderate” and intra-
rater reliability as “substantial”, which we found comparable with previous similar estimates of
the most commonly used ASD classification system—the SRS-Schwab ASD Classification.
Although, single radiographic parameters of the sagittal spine are frequently used in ASD
evaluation, the measurement error has not been established. It is assumed to be approximately
±5°; although, the evidence for this is not well founded. In Study II, we aimed to address this
lack in knowledge. In a similar reproducibility study, we estimated the measurement error of
four commonly used radiographic parameters. Despite excellent reliability, we found that the
measurement error was considerably greater than previously assumed, and that the magnitude of
measurement error varied considerably between the included parameters.
Finally, the Global Alignment and Proportion (GAP) score was recently presented with excellent
predictability of mechanical failure after ASD surgery. In Study III, we aimed to validate this
7
novel score in a consecutive cohort of 149 patients with minimum 2-years follow-up. Despite
only minor dissimilarities to the cohort in the original study, we found no statistical association
between the GAP score and mechanical complications after surgery for ASD.
In conclusion, we found moderate to substantial reliability of the Roussouly Classification
system; estimated the measurement error of sagittal radiographic parameters to be greater than
previously assumed; and were not able to validate the accuracy of the GAP score, all in patients
with ASD.
8
Dansk resumé
Adult Spinal Deformity (ASD) er den internationale betegnelse for rygdeformiter hos voksne, og er
en hyppig og smertefuld tilstand med betydelig negativ indflydelse på livskvaliteten. Rygdeformiter
hos voksne kan opstå af forskellige grunde, og de hyppigste årsager er iatrogen deformitet (som
følge af en behandling), samt degenerativ deformitet som følge af aldring. Der har i de seneste årtier
været en betragtelig teknologisk og kirurgisk udvikling, og kombineret med den demografiske
befolkningsudvikling og deraf flere ældre, ses en øget interesse for behandling af rygdeformiteter. I
tilfælde af svær deformitet er kirurgi at foretrække frem for konservativ behandling, og de fleste
patienter vil opnå et godt resultat med betydelig reduktion af smerter og funktionsnedsættelse.
Imidlertid er kirurgi forbundet med betydelige risici, inklusive mekaniske komplikationer der ofte
kræver fornyet kirurgi. En afgørende hjørnesten i undersøgelse af rygdeformitet er radiologiske
parametre. Disse måles på røntgenbilleder taget af hele rygsøjlen med patienten i stående stilling.
De bruges til diagnostik, vurdering af grad af deformitet, planlægning af kirurgi og post-operativ
vurdering af effekten af kirurgi. Ligeledes sker der for tiden en udvikling af flere
klassifikationssystemer til at identificere patienter i højere risiko for komplikationer, og til generelt
at forudsige komplikationer som følge af kirurgi.
Denne ph.d.-afhandlings overordnede formål var at bibringe yderligere viden indenfor radiologiske
målinger hos voksne med rygdeformiteter, for at reducere den risiko der er forbundet med kirurgi.
Roussouly´s klassifikationssystem blev udviklet med det formål at beskrive den normale variation i
rygtyper hos symptomfri individer, og til at bruges som skabelon ved planlægning forud for
deformitetskirurgi. I Studie I, undersøgte vi reproducerbarheden af dette system når flere
observatører foretog gentagne målinger på de samme patienter. Forsøget udførtes som et blindet
test-re-test studie af 64 patienter henvist pga. rygdeformitet. Ved brug af Fleiss Kappa estimerede vi
reproducerbarheden som ”moderat” til ”betydelig”, sammenligneligt med tidligere skøn af det, for
tiden, mest brugte klassifikationssystem af rygdeformitet.
Selvom enkelte røntgenparametre hyppigt bruges ved vurdering af rygdeformiteter, er den reelle
måleusikkerhed ikke fastslået. Der er en generel antagelse af en måleusikkerhed på ±5°, men
antagelsen er ikke velbegrundet. I Studie II, var formålet at belyse denne problemstilling. Med
lignende fremgangsmåde som i Studie I udførtes et reproducerbarheds-studie, hvor vi ønskede at
fastslå måleusikkerheden af 4 hyppigt brugte radiologiske parametre. Måleusikkerheden var
9
betydeligt større end tidligere antaget. Desuden var der en stor variation i måleusikkerhed blandt de
undersøgte parametre.
Der er for nylig publiceret et studie hvor Global Alignment and Proportion (GAP) skalaen blev
præsenteret med fremragende evne til at forudse mekaniske komplikationer efter deformitetskirurgi.
I Studie III, var formålet at validere denne nye skala hos 149 patienter efter kirurgi for
rygdeformitet, og med to års opfølgning. Med forbehold for mindre demografiske forskelle
sammenlignet med den oprindelige patient population, fandt vi ingen statistisk sammenhæng
mellem GAP score og mekanisk komplikation til kirurgi.
På baggrund af disse tre studier kan vi konkludere: 1) at Roussouly’s klassifikationssystem er
forbundet med ”moderat” til ”betydelig” reproducerbarhed, 2) at der er betydeligt større
måleusikkerhed ved brug at radiologiske parametre hos patienter med rygdeformitet end tidligere
antaget, og 3) at vi ikke kunne bekræfte GAP skalaens evne til at forudsige mekaniske
komplikationer efter deformitetskirurgi.
10
List of papers
This thesis was based on the following papers
I. Bari, T. J., Hallager, D. W., Tøndevold, N., Karbo, T., Hansen, L. V., Dahl, B., &
Gehrchen, M. Moderate interrater and substantial intrarater reproducibility of the
Roussouly Classification System in patients with adult spinal deformity. Spine Deformity
2019; 7(2), 312-318.
II. Bari, T. J., Hallager, D. W., Tøndevold, N., Karbo, T., Hansen, L. V., Dahl, B., &
Gehrchen, M. Spinopelvic parameters depending on the angulation of the sacral endplate
are less reproducible than other spinopelvic parameters in adult spinal deformity
patients. Spine Deformity 2019; 7, 771-778.
III. Bari, T. J., Ohrt-Nissen, S., Hansen, L. V., Dahl, B., & Gehrchen, M. (2019). Ability of
the global alignment and proportion score to predict mechanical failure following adult
spinal deformity surgery—validation in 149 patients with two-year follow-up. Spine
Deformity 2019; 7(2), 331-337.
11
Introduction
“Primum, non necere”
Adult Spinal Deformity (ASD) is a complex and heterogeneous spinal disorder involving
deformity in the coronal and the sagittal plane[2]. It is now well established that ASD is
associated with significant disability and pain [3–5]. Previous studies have estimated the
prevalence of ASD in the general population to be between 2%-23%[6–9], although a prevalence
of up to 68% has been suggested in the elderly[5,10]. This pathological condition encompasses a
variety of types and magnitudes of deformity[11]. The majority of patients with ASD are less
severe cases and these patients will often be treated using non-operative modalities such as
physical therapy and pain-medication. Patients with more severe deformities will often benefit
from surgical treatment [12–14], involving correction of the spine using pedicle screw
instrumentation. Although beneficial, surgery for ASD has definite risks, including mechanical
complications requiring revision surgery[15–17]. Therefore, increased attention has been
directed towards reducing these risks. Certain patient characteristics have been identified to be
correlated to postoperative complications and further investigation is being conducted in
developing radiographic targets for surgery[18–21]. The importance of sagittal balance is well
established and recent interest has been directed towards the development of individualized
radiographic goals for surgery[22–24]. The following sections of this thesis will aim to introduce
the importance of sagittal spine balance and the nature of sagittal spine deformity, followed by a
description of current classification systems and the gap in knowledge that necessitates the
papers presented in this thesis.
12
Background
Sagittal balance
In young healthy individuals, the spine is balanced with the
center of mass vertically aligned with the feet. Due to
degenerative changes with increasing age, or other causes
which will be mentioned later, a forward tilt, or kyphosing
event, will alter this balance[25–29]. Dubousset proposed the
“cone of economy” to illustrate the essentials of standing
balance (Figure 1) [30]. Centered at the feet, a 3-dimensional
cone is projected upwards and thus defines the range within
which the body can remain standing with minimum energy
expenditure[31]. As the center of mass sways from the center
of the cone, energy expenditure increases with consequent
muscle fatigue and pain [22,32]. The most common method
of measuring sagittal balance is the Sagittal Vertical Axis
(SVA) measured on long standing radiographs of the spine
and defined as the distance between the C7-plumbline and
the superior posterior corner of S1 (Figure 2). Understanding
sagittal balance requires awareness of the spinopelvic
parameters: Pelvic Incidence (PI), Pelvic Tilt (PT) and Sacral
Slope (SS). The measurement methods of these radiographic
parameters are illustrated in Figure 3. PI is a fixed,
individually morphologic parameter and does not change with age after adulthood[33–36].
Furthermore, the PI is not affected by the patient´s body position.
Compensating for imbalance
Besides natural aging, other processes or events can also cause spinal kyphosing with consequent
sagittal imbalance [25,37–42]. Neuromuscular diseases such as Parkinson, degenerative
conditions, Scheuermann´s disease, and autoimmune processes such as Ankylosing Spondylitis
often cause spinal deformity. Others causes are osteoporosis, vertebral fractures, short segment
Figure 1. Illustrates the “Cone of
economy proposed by Dubousset. The
least amount of energy is spent in a
balanced erect position. As the center of
mass shifts forward towards the edge of
the cone, energy expenditure increases.
13
degeneration, and iatrogenic events as post laminectomy syndrome or lumbar fusion in relative
Figure 2. Sagittal (lateral) spine. C7 indicate the C7
vertebrate. The solid line illustrates the sagittal vertical
axis (SVA), measured as the horizontal distance between
the superior, posterior corner of the sacral endplate and
the plumb line drawn from C7.
Figure 3. Sagittal (lateral) lower spine and pelvis.
Methods for measuring the spinopelvic radiographic
parameters: sacral slope, pelvic incidence and pelvic tilt.
14
kyphosis. Regardless of the reason for sagittal imbalance,
mechanisms of self- preservation will attempt to restore
balance to enable upright gait and horizontal gaze [30,43–
45]. These mechanisms will briefly be described:
Spinal extension
In an overall imbalanced spine, extension in parts above
and below the kyphotic segments is possible in a flexible
spine and with musculature capable of erection, thus
resulting in a reduction of thoracic kyphosis. Effects may
be painful and can lead to a possible spondylolisthesis
under extreme conditions[25].
Pelvic retroversion
Pelvic retroversion involves retroversion (hip extension)
of the pelvis around the femoral heads (Figure 4) [46,47].
PT represents the extent of pelvic retroversion, and as PT
increases, SS decreases which in term enables correction
of sagittal imbalance. As PI=PT+SS, individuals with a
large PI have greater ability of pelvic retroversion as their
PT range is greater[48]. Assuming that the SS can
maximally be decreased to 0° (horizontal sacral endplate),
an individual with a PI of 70° could potentially retrovert
their pelvis to a maximum of 70° in PT (70=70+0). This
would in term mean full hip extension, and to further
compensate for sagittal imbalance, other mechanisms
need to be used.
Knee flexion
At maximum retroversion of the pelvis, further correction
of sagittal balance will involve flexion of the knees and
dorsiflexion of the ankles (Figure 5) [47,49,50]. Both
pelvic retroversion and knee flexion increases muscle
activity and, more importantly, impairs walking[46,51].
Figure 4. Pelvic retroversion can compensate for
sagittal imbalance and involves retroversion of
the pelvic around the femoral heads.
Figure 5. Flexion of the knees as a last resort of
compensation for sagittal imbalance.
15
Treatment
Identifying patients in need of surgical treatment for ASD is complex, and involves radiographic
imaging (long sagittal and coronal films in standing position with knees fully extended) in
addition to a full clinical assessment[12]. Non-operative treatment is often used initially and
involves physical therapy, training and pain medication [52,53]. Bracing does not effectively
prevent curve progression and is therefore seldom used as a treatment modality of ASD[53,54].
It is now well established that surgical treatment can significantly benefit appropriately selected
patients with ASD compared to non-operative treatment[52,55–61]. However, the often
challenging decision to perform surgery for ASD is accompanied by an equally complex
decision making and surgical planning of type of correction, number of levels for instrumented
fusion and use of spinal osteotomies[20,43,52,62].
Complications and Adverse events
As operative techniques for ASD is in constant evolvement, surgeons are able to correct even
more severe deformities. Invasive procedures are often related to a certain risk of complications,
and with procedures as extensive as ASD surgery, this risk is especially considerable. A recent
systematic review by Sciubba et al. revealed an overall complication rate of 55% in 3,299
patients undergoing ASD surgery. Prospective studies have reported even higher rates of
perioperative adverse events [63,64]. Although most of these complications are minor, rarely
requiring further intervention, the risk of mechanical complications, such as pseudoarthrosis with
consequent rod breakage and proximal junctional kyphosis (PJK), is also significant. Previous
studies have reported the risk of revision surgery due to mechanical failure up to 30% within two
years of surgery [16,65–67]. Therefore, understanding sagittal imbalance, how to correct it and
identifying models to predict mechanical failure is of grave importance.
Classification systems
Attempts to classify disease was first implemented in the 18th
century and classification of
sagittal spine deformity has only been introduced in recent decades[68]. These were developed
to understand sagittal balance, to subcategorize patients based on the degree of imbalance, to
recognize differences in individual spinopelvic morphology and to set targets for surgical
treatment. In addition, classification systems provide effective tools for communication and
comparison in both clinical and academic settings. In a series of studies, Barrey et al. recognized
that sagittal imbalance can be compensated for – to an extent[69–71]. Based on the C7 plumb
line and degree of PT, patients were categorized as balanced, balanced through compensation or
16
imbalanced. Lafage et al. found that pelvic retroversion was correlated to poor health-related
quality of life (HRQOL)[46]. Legaye et al. suggested a relationship between spinopelvic
parameters and the degree of lumbar lordosis (LL). Boulay et al. later proposed an equation for
predicting LL based on PI and this has since been established and expressed as PI-LL[20,22,72].
These observations lead to a series of classifications aiming to classify the adult sagittal spine in
relation to ASD.
The Roussouly Classification
Roussouly et al. recognized that the sagittal shape of the spine presents with some variation
between individuals[73]. The authors established a system to classify this normal variation as
one of four types. Recently, a fifth type was added[74]. Figure 6 illustrates these five types. Type
1 and 2 are characterized by a small PI and consequent small SS. The apex of the LL is located
caudally in Type 1 creating a short lordosis. Whilst Type 2 has a slightly more cranial location of
the apex, and a higher inflection point, in terms producing a flat lordosis and overall flat spine.
The Type 3 spine has larger PI and SS with subsequent prominent lordosis. Type 4 is also
determined by a high-grade PI in combination with the largest SS (>45°) and most prominent
and longest lordosis. The intermediate type, Type 3 anteverted pelvis (3AP), was later added,
and characterized by a large SS despite a small PI, producing this anteverted pelvis.
Scoliosis Research Society-Schwab (SRS-Schwab) Adult Spinal Deformity Classification
By analyzing multiple radiographic parameters, the SRS-Schwab classification system was
proposed based on the four parameters most correlated to HRQOL measures. Three of these are
measured in the sagittal spine: PI-LL within 10°; SVA less than 4 cm; and PT under 20°[75].
17
Figure 6. The Roussouly Classification´s five main spine types. Type 1 and 2 are characterized by a small sacral slope
(SS) and small pelvic incidence (PI). Type 3 and 4 have large SS and PI. In Type 3 anteverted pelvis (3-AP), the pelvis is
shifted forward generating a large SS despite a small PI.
Global Alignment and Proportion (GAP) score
Although surgical correction of ASD can be beneficial in select cases, the risk of complications
is considerable [76,77] and up to 30% of patients are at risk of revision surgery due to
mechanical failure [16,65–67]. As the SRS-Schwab system was developed in relation to HRQOL
measures, and not in correlation to the risk of complications, the European Spine Study Group
recently proposed the GAP score[78]. The GAP score comprises four parameters based on pelvic
incidence and overall sagittal balance, in addition to a category for age. The authors developed a
scoring system based on these parameters and validated the score in a second sample of patients
with ASD. They found a substantial positive correlation between increasing GAP score and the
risk of revision surgery due to mechanical failure.
Reproducibility
As classification systems emerge, determining validity and reproducibility is essential. In the
field of deformity surgery, there is a general acceptance of a measurement error of ±5° for
radiographic parameters. Although, this is based on studies of the coronal Cobb angle, and a
recent review revealed that this assumed error is not well founded[79]. The following section
will aim to explain the terminology and describe methods for determining reproducibility. In
general, reproducibility is an umbrella-term for reliability and agreement[80].
18
Reliability
Reliability describes the variability of measuring a parameter in a single subject compared to that
of the entire sample. Therefore, reliability coefficients, such as the Intraclass Correlation
Coefficient (ICC), describe a measurement method´s (or instrument) ability to distinguish
between individuals [80–82]. Consequently, reliability does not describe measurement error.
Common statistical methods for describing reliability are ICC (numeric parameters) and Kappa
(κ, categorical parameters).
Agreement
Agreement describes the measurement error in repeated measurements of a parameter of interest.
Therefore, agreement estimates the precision of any single measurement[80,82], and aids in
differentiating between real clinical change from irrelevant fluctuations[83,84]. The standard
error of measurement (SEM) or Bland and Altman´s limits of agreement (LOA) are often used
for this purpose when assessing numeric parameters [85]. For categorical variables, crude
percentages of agreement are often used.
Example
Kottner et al. used the example of rating medical students[81]. All observers rated the students as
“excellent” and the agreement score would therefore be perfect between raters. The reliability
score would be close to zero, as there is no variability and therefore no way to distinguish
between individuals.
19
Objectives and hypotheses
As scientific studies are performed, and before applying the conclusion, results need to be
validated outside the context of the original study[86]. This is referred to as the external validity
and entails the extent to which a study conclusion can be generalized to other situations.
This PhD-thesis aimed to determine the reproducibility of the Roussouly Classification system
and the commonly used radiographic spinopelvic parameters PI, PT and SS in patients with
ASD. In addition, we aimed to validate the recently published GAP-score in a Danish population
of patients undergoing surgical treatment for ASD. To address these issues, we performed a
reproducibility study of radiographic parameters and a validation study of the GAP score.
Study I: Roussouly Classification
To provide estimates of inter- and intra-rater agreement and reliability of the Roussouly
Classification system in patients referred for ASD.
We hypothesized that the reproducibility would be comparable to estimates of the SRS-Schwab
Classification system. As a secondary outcome, we aimed to estimate the distribution of spine
types in this study population.
Study II: Spinopelvic parameters
To provide estimates of inter- and intra-rater agreement and reliability of the commonly used
sagittal radiographic parameters SVA, PI, PT and SS in patients referred for ASD.
We hypothesized that the measurement error would be greater than previously assumed.
Study III: GAP-score
To validate the GAP-score in a single-center cohort of patients undergoing ASD surgery.
We hypothesized that postoperative GAP score and categories would be correlated to
postoperative mechanical failure and revision surgery as reported in the original study[78].
20
Methods
A detailed description of each study´s design is provided in the manuscripts accompanying this
thesis as Appendices. The overall methodological considerations are described below.
Overall study design
The typical ASD patient is diagnosed at a secondary institute following referral by the patient´s
general practitioner (family practitioner). Following diagnosis at the secondary institute, patients
are referred to our tertiary institution for evaluation of possible surgical treatment. Our institution
serves a population of approximately 2.5 million citizens.
For Studies I and II, objectives did not concern intervention or estimates of effects. Therefore,
these studies were conducted as cross-sectional reproducibility studies of patients referred to our
institution for ASD assessment. Studies I and II were performed in combination based on the
same patient sample and radiographic measurements. Study III was performed as a retrospective
validation study of a recently published predictor of mechanical failure in patients undergoing
ASD surgery.
Patient inclusion
Study I-II
All patients referred for evaluation of ASD surgery at our tertiary institution were assessed for
inclusion from August 1, 2013 to March 31, 2014. As these studies aimed to assess specific
radiographic measurements, certain inclusion and exclusion criteria had to be met. Inclusion
criteria were ≥18 years of age and availability of sufficient radiographs. This was defined as
long, standing, whole-spine radiographs taken as “fists-on-clavicles” both in the anterior and
lateral plane, and including the C7 to L5 vertebra, sacral endplate and both femoral heads.
Exclusion criteria were: previous instrumentation, vertebral fracture or neuromuscular disease.
Study III
In study II, all patients undergoing ASD surgery from January 1, 2013 to December 31, 2015
were screened for inclusion. Inclusion criteria were age of 18 years or older and sufficient
radiographs (defined as previously mentioned) taken preoperatively, postoperatively and at
follow-ups 3 months, 1 year and 2 years after surgery. Consequently, patients with less than 2-
21
year follow-up were excluded. The patient sample was not exclusive to index procedures, and
thus, both primary and revision procedures were included.
Data collection
Clinical data, including demographics were obtained using electronic medical records. For
Studies I and II, this was limited to age and sex, in addition to radiographic parameters which
were measured on radiographic images. In Study III, further data were gathered including patient
characteristics and postoperative information regarding mechanical complications and secondary
unplanned surgeries (revision).
Radiographic measurements
All radiographs were obtained using the Digital Imaging and Communications in Medicine
(DICOM) format in the Radiology Information System and Picture Archiving and
Communications System IMPAX 6® (Agfa Healthcare NV, Mortsel, Belgium). Images were
then transferred to the image management system KEOPS® (SMAIO, Lyon, France). This
system is used for all radiographic measurement of patients with ASD at our institution. After
identifying key radiographic landmarks, including center of femoral heads, sacral endplate and
vertebral bodies from L5-C7, the software then generated a computerized model of the spine.
After manual adjustments of this model by the clinician, the software then calculates the
radiographic parameters including the Roussouly spine type. Again, this is the standard clinical
method for estimating radiographic parameters at our institution. For the GAP score and
category used in study III, the principal investigator calculated each specified radiographic
parameter based on the parameters provided by the KEOPS® software, in accordance to the
original paper[78].
Reproducibility measurements
In Studies I and II, reproducibility was assessed in a test-re-test setting by four raters in blinded
fashion. Detailed description of the methodology is provided in the accompanying manuscripts.
Outcomes
In Studies I and II, the primary outcomes were radiographic measurements, and no further
outcomes were recorded. In study III, the main outcome was defined as revision surgery within 2
years of index surgery due to a mechanical failure, defined as: proximal junctional kyphosis
(PJK), proximal junctional failure, distal junctional failure, rod breakage or “other”. We further
22
defined PJK as ≥10° increase in kyphosis angle measured between the uppermost instrumented
vertebra (UIV) and the upper endplate of the vertebra 2 levels above UIV (UIV+2)[87]. The
increase in angle was in comparison to the radiograph taken immediately postoperatively.
Proximal junctional failure was defined in accordance to Hart et al. [87] as PJK in combination
with either: fracture of the UIV or UIV+1, instrumentation pullout at UIV or posterior
osteoligamenteous disruption. Secondary outcome was mechanical failure within 2 years of
index procedure regardless of subsequent revision surgery.
Statistics
Demographic data
Distribution of data was assessed using histograms. Data were reported as proportions (%),
means with standard deviations (SD), or medians with interquartile ranges (IQR). The Student´s
t-test was used to compare numeric Gaussian distributed data, and the Wilcoxon signed-rank test
for non-Gaussian data. When comparing categorical data, the Pearson χ2 test of independence of
trend was used unless the expected frequency was below 5 for 20% of compared groups, in
which case the Fisher exact test was used.
Reproducibility: categorical parameters
For assessing reliability, Fleiss Kappa values were calculated for variables measured by more
than two raters [1,80,88]. When the number of compared raters were two, Cohens Kappa was
used [1,88,89]. The calculated kappa coefficients were then classified in accordance to Landis
and Koch as slight (≤0.2), fair (>0.2-0.4), moderate (>0.4-0.6), substantial (>0.6-0.8) or almost
perfect reliability (>0.8)[90]. Two-tailed Z-tests were used to compare individual intra-rater
reliability. We used the Bhapkar test of marginal homogeneity to assess differences in
distribution of categories between raters. Agreement was reported using crude percentages.
Reproducibility: numerical parameters
ICCs were used to assess reliability of numeric parameters. We used ICCA,1 for both intra- and
interrater reliability[91–93]. ICCs were classified as poor (<0.5), moderate (0.5-0.75), good
(>0.75-0.9) or excellent (>0.9)[91,94]. Agreement was reported using the SEM and 95%
LOA[85,95]. The SEM and 95% LOA were derived from variances calculated using a linear
mixed effects model [83,85,96]. A more detailed description for calculating the SEM and 95%
LOA is presented in Appendix 4.
23
Association to outcome
In Study III, the area under the curve (AUC) was calculated using receiver operating
characteristics (ROC) curves when assessing the diagnostic accuracy of the GAP score in
predicting postoperative outcome. The diagnostic accuracy was further classified based on the
AUC as: no or low power (0.5-0.7), moderate power (0.7-0.9) or high power (≥0.9). When
assessing GAP categories for association to outcome, the Cochrane-Armitage test of trend was
used and reported as odd ratios (OR) with 95% confidence intervals (CI). As a secondary
assessment, both GAP score and categories were analyzed for association to outcome using
logistic regression.
24
Results
The following section contains a summary of each study.
Study I
Objective
To assess the reproducibility of the Roussouly Classification in patients referred for ASD
evaluation.
Summary of results
Sixty-four patients were included with a median age of 46 (IQR 23-63), 30% of patients had an
SVA of more than 5 cm and almost 70% had a PI-LL mismatch. All 64 patients were rated twice
by four spine surgeons with different levels of experience. Interrater reliability was moderate
(κ=0.60) and intrarater reliability was substantial (κ=0.68). There was a tendency towards higher
intrarater reliability with rater experience. In terms of agreement, the same spine type was
assigned in both readings by the same rater in 76% (range: 67%-83%). Interrater agreement,
assessed as the same spine type assigned by all 4 raters, was 47%. Figure 7 illustrates the
distribution of spine types across all readings.
Main finding
I. Moderate interrater and substantial intrarater reliability when using the Roussouly
Classification.
25
Figure 7. Distribution of Roussouly types across four raters and two readings. The x-axis indicates the rater followed by the
reading, i.e. the first column details rater 1´s first reading, the second column rater 1´s second reading and so on.
26
Study II
Objective
The aim of this study was to assess the degree of measurement error of commonly used sagittal
radiographic parameters.
Summary of results
We identified 64 patients referred for ASD treatment. Four spine surgeons measured PI, SS, PT
and SVA in all patients twice, with a 14-day delay between readings. Reliability was assessed
using ICCs and classified as “excellent” (ICC<0.9) for all parameters (Table 1). Except interrater
reliability of PI (ICC=0.89, classified as “good”). Agreement was assessed using the 95% LOA,
and the measurement error was found to be greater than previously assumed (Table 2). In
addition, we found that the apparent measurement error was larger for parameters depending on
the slope of sacral endplate (PI and SS) compared to parameters depending of its position (PT
and SVA).
Main findings
II. The measurement error of spinopelvic parameters was greater than previously assumed.
III. The measurement error PI and SS were noticeably larger than that for PT.
IV. Measurement error was considerable despite excellent reliability.
27
Table 1. Assessment of reliability using intraclass correlation coefficients (ICC). Reliability was classified as “excellent” for
assessed parameters (ICC≥0.90) except for inter-rater reliability of pelvic incidence (0.89).
Table 2. Measurement precision measured as 95% Limits of Agreement (LOA) for commonly used radiographic parameters.
Data indicate the precision of a single measurement compared to the hypothetical true measurement.
28
Study III
Objective
To validate the GAP-score in a sample outside of the original cohort.
Primary outcome
Postoperative mechanical failure leading to revision surgery within two years of index surgery.
Secondary outcome
Any mechanical failure within two years of index surgery.
Summary of results
We retrospectively included 149 patients undergoing ASD surgery at a single tertiary institution.
All patients were followed for two years after surgery and medical records were used to obtain
events of mechanical failure and revision surgery. Mechanical failure occurred in 51% of cases,
and in 31% patients, revision surgery was performed due to mechanical failure. The AUC using
ROC curves showed “no or low discriminatory power” of the GAP score in predicting
mechanical failure with or without subsequent revision (AUC = 0.50 and 0.49) (Figure 8). We
found no significant correlation between GAP categories and either of the measured outcomes (p
= 0.28 and 0.58). In addition, secondary analyses using logistic regression was similarly without
significant association.
Main findings
V. GAP score and categories did not predict mechanical failure after ASD surgery.
29
Figure 8. Receiver operating
characteristic (ROC) curves of
the Global Alignment and
Proportion (GAP) score in
relation to either outcome.
“Mechanical failure” indicates
mechanical failure within two
years of index surgery. “Revision
surgery” indicates revision
surgery due to mechanical failure
within two years of index
surgery. The diagnostic accuracy
of the GAP score was classified
as “no or low power” for both
outcomes.
30
Discussion
The Roussouly Classification
The classification system was developed in an attempt to describe the normal variation in sagittal
spine shape[73]. Mechanical complications remain a considerable risk following ASD surgery,
and researchers are attempting to establish methods of reducing this risk, including
individualized radiographic targets[67]. Naturally, part of the solution to this must be found in
the fact that there are normal variations in the shape of human spine, and the Roussouly
Classification may well play a considerable role. To the best of our knowledge, the study
presented in this thesis (Study I) is the first to assess the reproducibility of the Roussouly
Classification. We found moderate inter-rater and substantial intrarater reliability. The only
previously well-established classification system used in ASD patients is the SRS-Schwab
Classification, and results were therefore compared to the reproducibility reported for this system
in three previous studies. Schwab et al. who developed the system also aimed to validate it[75].
In their study, 21 cases were selected to represent the distribution of classification grades.
Radiographs were pre-marked for landmarks and then presented to 9 raters. They reported
substantial inter-rater reliability, almost perfect intra-rater reliability and 88% crude intra-rater
agreement. The inter-rater agreement was not reported. By using pre-marked radiographs and
selecting patients for an even distribution of categories could easily produce artificially high
reliability. A second study was published by Liu et al. of 102 ASD patients[97]. Radiographs
were not pre-marked in their study, although exclusion criteria were not given. They found
substantial inter-rater reliability, almost perfect intra-rater reliability and 42% crude inter-rater
agreement. No intra-rater agreement was given. In neither of these two studies, trait prevalence
or rater-bias was reported. Evenly distributed trait prevalence between categories and large
differences in distribution between raters tends to produce higher reliability[98]. In a third study,
Hallager et al reported moderate inter-rater and substantial intra-rater reliability in addition to
39% crude inter-rater and 70% crude intra-rater agreement[99]. In their study, radiographs were
not pre-marked, and the selection process of patients was fully disclosed, without any active
selection of patients. Rater bias and trait prevalence was given, thus producing the study with a
method most comparable to the study presented in this thesis. Results of reliability and
agreement were similar to the reliability found in our study.
31
Measurement accuracy of sagittal radiographic parameters
Measurement precision
The aim of study II was to assess the measurement error of commonly used parameters on
sagittal radiographs of ASD patients, and to compare the results to the previously assumed ±5°.
We therefore used the 95% LOA, as opposed to reliability measures, to assess the measurement
error[80–82]. The 95% LOA is to be interpreted as a 95% confidence interval of the difference
between two individual measurements. A degree of error will always be present in measurements
of continuous variables [100]. Accordingly, it will affect the reading of single and repeated
measures [96] which often is of interest when treating patients with ASD. We found 95% LOA
of ±4° to ±13° for parameters measured as angles and approximately ±6 mm for SVA which is
measured as a distance (Table 2). At the time this study was conducted, no previous study had
reported the measurement accuracy of these radiographic parameters in patients with ASD using
agreement parameters. However, previous studies had attempted to describe the reproducibility
through other methods. Aubin et al. used 6 realistic spine prototypes as “gold standard”
compared to radiographic measurements and reported agreement as the standard deviation (SD)
of mean difference[101]. However, using the SD of mean difference fails to take into account a
large part of the error and tends to underestimate the measurement error[85,100]. In a study by
Lafage et al. radiographs were pre-marked for specific landmarks before presenting the images
to the raters[102]. Results can therefore be interpreted as the accuracy of the radiographic
software’s ability to calculate the parameters, based on the markings, and not the accuracy of
radiographic measurement. Finally, several additional studies have assessed the reproducibility
of these spinopelvic parameters; although, none have adhered to the methods for agreement
assessment suggested by Bland and Altman in combination with being performed on patients
with ASD[80,83,85,95,103–105].
Recently, and after submission of our study, Kyrölä et al. published their paper assessing the
reproducibility of the same radiographic parameters[106]. Their study was performed on 49
patients referred due to thoracolumbar complaints. Only 8 patients (16%) were classified as
having a structural deformity (scoliosis, kyphosis, spondylolysis or spondylolisthesis) as opposed
to our study, where all patients were referred for ASD evaluation. They reported the SEM as an
assessment of agreement, as also presented in our study. The SEM is easily compared between
studies as the SEMagreement takes into account the systematic difference between raters (Appendix
4). Furthermore, as SEM was calculated using the same method as in our study, in addition to the
given repeatability coefficient (RC), the 95% LOA can be derived[104]. Table 4 details the
32
comparison of these estimates and results were similar to our study. Kyrölä et al. found greater
measurement error for most parameters, except for intra-rater agreement of PI and SS.
Table 3. Comparison of agreement assessment using 95% Limits of Agreement (LOA). Results from the study by Kyrölä
et al. was calculated as ± the repeatability coefficient and derived as the mean of both readings for inter-rater agreement
and mean of all three raters for intra-rater agreement. Results were similar.
Importance of sacral endplate slope
In relation to PI and SS, the observed measurement error was profoundly larger than the
previously assumed ±5°. PI and SS are parameters depending on the slope of the sacral endplate,
as opposed to PT which merely depends on the endplates position (Figure 3). This difference in
measurement method could explain the observed difference in measurement error when
comparing PI and SS to the measurement error of PT. Pernaa et al. found similar results in
patients following hip arthroplasty[107] as did Kyrölä et al[106]. When looking more closely at
our own results, we found that in cases with large measurement difference between raters, the
difference was often due to disagreement in selecting the sacral endplate. This yielded up to 30°
difference between measurements of PI and SS while PT was only marginally different. The
sacral endplate can be difficult to distinguish in patients with ASD due to degenerative changes
and deformity. These findings therefore highlight an unrecognized challenge in radiographic
assessment of ASD.
33
Agreement describes measurement precision
In the reproducibility study of spinopelvic parameters by Kyrölä et al. [106], the authors reported
“excellent” reliability despite considerable measurement error —consistent with our results. This
further underlines the importance of choosing the correct statistical method in reproducibility
studies.
Predicting mechanical failure using the GAP-score
The seminal work of the European Spine Study Group demonstrated promising results using the
GAP score and category to predict mechanical failure in 74 patients following ASD surgery[78].
Despite multiple statistical analyses, we were not able to validate the initial findings in our
sample of 149 ASD patients. There are numerous possible explanations for this discrepancy.
First and foremost, all medical research is susceptible to type I errors, and rejection of the null
hypothesis cannot with 100% certainty be true. In addition, a certain amount of bias need to be
calculated for as the authors validated a model which they themselves developed. Therefore,
external validation is necessary before clinical implementation. Consequently, Jacobs et al. [108]
recently published their study comparing the predictability of the GAP score vs the SRS-Schwab
Classification in 39 patients with ASD. They found that the GAP score was significantly
correlated to postoperative mechanical failure, and significantly better than the SRS-Schwab
classification. At a closer look at the original study by Yilgor et al., out of 548 eligible patients,
only 222 met 2-year follow-up criteria, resulting in an eligibility of merely 40%. Of these 222
patients, 74 comprised the validation cohort. The surgical outcome of the remaining 326 (60%)
patients are unknown, and the risk of treatment-by-condition needs to be considered. In
comparison, the eligibility in our study was 80% (149 out of 186), whereas Jacobs et al. did not
disclose their exclusion process. The sample presented in our study was almost four times larger,
and the inclusion period was 3 years in comparison to 11 years in the study by Jacobs et al. All
three studies used the same definition for mechanical complications. A second common trait was
the retrospective nature. Two out of three of these studies demonstrated a positive correlation
between the GAP score and postoperative mechanical failure. Therefore, the results of our study
may well be due to a type II error (false negative). In search for the true answer to this question,
further studies are highly warranted, preferably in prospective cohorts.
A second explanation for the discrepancy could be explained by a missing aspect, or element, of
the GAP score. Previous studies have demonstrated that mechanical complications following
ASD surgery can be described to a myriad of factors: PT, sex, magnitude of thoracic kyphosis,
34
degree of correction, number of instrumented vertebra, ending instrumentation at the sacrum,
obesity, osteoporosis, surgical technique and muscle volume among other factors have been
association to the risk of failure[67,109–114]. As discussed in previous sections of this thesis, the
PT in particular is a direct estimate of compensation for sagittal imbalance. Jacobs et al.
demonstrated that the Relative Pelvic Version, a subcategory of the GAP-score, showed similar
properties and predictability of mechanical failure as the pelvic tilt modifier used in SRS-Schwab
Classification. Finally, the samples presented in the original validation study and the study
presented in this thesis, differed in an aspect: the percentage of patients undergoing revision
surgery due to mechanical complications were higher in our study, as was the proportion of
patients with a history of previous spine surgery. Regardless of demographic and outcome
discrepancy, a model for accurately predicting mechanical failure should be applicable to all
ASD patients. Whether or not the GAP-score is the true solution remains to be answered.
35
Limitations
There are important limitations to the three presented studies. The inclusion process in all three
studies is susceptible to selection bias, and most noticeable in studies I and II, due to the large
number of excluded patients, indicating low external validity. Most exclusions were ascribed
insufficient radiographs (not including either femoral heads or C7). Our institution has since
implemented protocols to reduce this rate, in collaboration with the local radiology department,
leading to very few ASD radiographs taken insufficiently today. In addition, all radiographic
measurements were performed using image management software. This particular system
requires the rater (or surgeon) to identify all anatomical parts of the spine necessary for
calculations of the radiographic parameters. The system then measures the angles and distances
based on the given algorithms for radiographic parameters (Figures 1, 2 and 5). This method of
measuring radiographic parameters, therefore, excludes a certain aspect of potential error – i.e.
once all landmarks and anatomical parts of the spine has been identified, no further measurement
error can occur (e.g. the rater cannot measure PI in an incorrect manner). This is what is
sometimes referred to as a “semi-automatic” system, in comparison to imaging systems where
the rater uses a measuring tool to calculate the angle between two lines. The software system
used in these studies is standard of care, at our institution, and used in all radiographic
assessment of ASD patients, including preoperative diagnostics, surgical planning and
postoperative evaluation of results. Nonetheless, the external validity can be questioned, as other
institutions may have to calculate for an additional step with potentially added measurement
error. Previous studies report that the use of dedicated software was related to greater reliability
compared to a standard picture archiving and communication (PACS) system[102,115]. This is
particularly relevant for study I. In study II, the potential extra error only adds to the conclusion:
the extent of measurement error may well be even greater when performing manual radiographic
measurements. Keeping with the methodology of study II, the study was performed as a post-hoc
study. Therefore, no power calculation was performed and results should be interpreted with
caution. Despite limitations to this method, our findings nevertheless suggest that radiographic
measurements of ASD patients are less accurate than previously assumed.
Turning to the third paper of this thesis, the aim was to validate the GAP score in an external
patient sample. The score was validated as whole in addition to the four subcategories. However,
the GAP score is comprised of five sub-elements, each contributing to the final total score. These
36
sub-elements were not evaluated individually. As patient characteristics in our study differed
slightly from the original patient sample, further analysis of the sub-elements of the GAP score
might have revealed correlation to the outcomes of interest. In addition, the retrospective nature
of our study, allows for selection bias in addition to exclusion of 20% of eligible patients due to
insufficient radiographs or missing 2-year follow up.
37
Perspectives and future research
With accelerated advances in surgical techniques and implant technology, combined with a
demographic shift seen in most developed countries[116,117], an increased demand for ASD
surgery is to be expected[5,118,119]. Also, the progression in anesthetic and medical treatment
allow for surgical treatment of patients previously deemed unfit to undergo major
surgery[60,120–124]. A recent Danish nationwide database study showed an increased 2-year
revision risk following ASD surgery from 2006 to 2011, with a slight decrease in recent
years[125]. Although this trend has not been validated in other studies, the risk of complications,
including revision surgery, continues to be significant[12,16,17,64,67,126].
Fixed alignment targets
Targets for surgical treatment has previously been regarded as a “one-size-fits-all”, i.e. a specific
set of radiographic parameters are regarded as “normal” and to be aimed for when performing
surgical correction of deformity. Recent studies have recognized flaws in this assumption as the
morphologic parameter PI seem to play a significant role in tailoring individual surgical
targets[18,78,127–130]. By focusing attention to the variation in spine shape in normal
individuals, the Roussouly classification was developed. This system may well play an important
role in surgical planning and evaluation of patients with spinal deformity[131], although further
studies assessing the effect of implementing this system and the effect on outcome is most
needed. Our institution is currently involved in multicenter study, a retrospective and a
prospective study investigating this precise effect, and preliminary results were presented at the
Scoliosis Research Society´s annual meeting in 2018.
Potential individuals not fitting the model
As a system for individual alignment targets is still to be established, a recent study by Ferrero et
al, found that a proportion of patients with ASD and sagittal imbalance were not able to
compensate through pelvic retroversion[132]. The authors suggested unknown neuromuscular or
hip pathology as potential causes to this phenomenon, and further investigation into this matter is
needed.
The aging spine
Degenerative processes occurring with increasing age is known to effect sagittal balance and the
occurrence of spinal deformity[26,43,133] Although, studies have suggested that partial sagittal
38
imbalance may be part of normal ageing, and correction of this imbalance may be poorly
received[48,134–136]. The effect of age, and its role in both sagittal balance and correction of
deformity might not be fully understood, and has to be addressed completely before
implementing a model for radiographic targets of ASD surgery[137].
Enhanced recovery after surgery
The cost of surgery for ASD is noteworthy and, with technological advances, increasing[138–
141]. In addition, potential revision procedures adds to this cost, and the effectiveness of ASD
surgery has to be evaluated with the ever increasing overall expenditure of worldwide
healthcare[142]. Reducing costs has become a significant focus, and implementation of enhanced
recovery after surgery (ERAS) principles has resulted in major improvements in clinical
outcomes and reduction in cost in other surgical fields[143–145]. In the field of spine surgery,
most studies have reported ERAS implementation in minimal invasive surgery, and few clinical
studies have reported the effect in major spine surgery such as ASD surgery[146–152]. Clinical
studies evaluating the effect of full implementation of ERAS principles in ASD surgery is still to
be published.
Gait, functional tests and paravertebral muscles
Why certain individuals are able to compensate for progressive spinal malalignment and others
are not, remain to be fully understood. Gait and motion analysis could provide important
information in comparison, and addition, to static radiographic parameters[153–156]. Physical
activity trackers has also been suggested as means of evaluating outcome[157]. By equipping
patients with activity trackers, real-time data of activeness may provide important information in
assessing severity of deformity and effect of surgery. A sudden decrease in activity could be
interpreted as symptoms of mechanical or neurologic complications, allowing earlier
intervention than possible today. Finally, spinal musculature is an essential factor in keeping an
upright posture. The characteristic, type and extent of muscle volume loss and fat infiltration has
been proposed as factors related to skeletal muscle degeneration[158,159]. Recent studies have
suggested this fat infiltration to be different in spinal erectors compared to other muscle groups,
and that it may be related to sagittal spinal imbalance[160–162].
39
Concluding remarks
Radiographic measurements are crucial in ASD assessment. In this thesis, we present three
papers of radiographic parameters in patients with ASD. Firstly, the reproducibility of the
Roussouly Classification was estimated. We found moderate to substantial reliability. Secondly,
we assessed single, commonly used radiographic parameters and found greater measurement
error than previously assumed. In addition, parameters depending on the angulation of the sacral
endplate were associated with the greatest measurement error. Finally, the recently published
GAP score was assessed for external validity in predicting mechanical failure after ASD surgery.
We found no correlation between the GAP score and mechanical failure – with or without
subsequent revision surgery.
40
References
[1] Kottner J, Audige L, Brorson S, et al. Guidelines for Reporting Reliability and Agreement
Studies (GRRAS) were proposed. Int J Nurs Stud 2011;48:661–71.
[2] Good CR, Auerbach JD, O’Leary PT, et al. Adult spine deformity. Curr Rev
Musculoskelet Med 2011;4:159–67.
[3] Pellisé F, Vila-Casademunt A, Ferrer M, et al. Impact on health related quality of life of
adult spinal deformity (ASD) compared with other chronic conditions. Eur Spine J
2014;24:3–11.
[4] Schwab F, Blondel B, Chay E, et al. The comprehensive anatomical spinal osteotomy
classification. Neurosurgery 2015;76:112–20.
[5] Ames CP, Scheer JK, Lafage V, et al. Adult Spinal Deformity: Epidemiology, Health
Impact, Evaluation, and Management. Spine Deform 2016;4:310–22.
[6] Carter OD, Haynes SG. Prevalence rates for scoliosis in US adults: results from the first
National Health and Nutrition Examination Survey. Int J Epidemiol 1987;16:537–44.
[7] Francis RS. Scoliosis screening of 3,000 college-aged women. The Utah Study--phase 2.
Phys Ther 1988;68:1513–6.
[8] Kebaish KM, Neubauer PR, Voros GD, et al. Scoliosis in adults aged forty years and
older: prevalence and relationship to age, race, and gender. Spine (Phila Pa 1976)
2011;36:731–6.
[9] Pérennou D, Marcelli C, Hérisson C, et al. Adult lumbar scoliosis. Epidemiologic aspects
in a low-back pain population. Spine (Phila Pa 1976) 1994;19:123–8.
[10] Schwab F, Dubey A, Gamez L, et al. Adult scoliosis: prevalence, SF-36, and nutritional
parameters in an elderly volunteer population. Spine (Phila Pa 1976) 2005;30:1082–5.
[11] Diebo BG, Shah N V, Boachie-Adjei O, et al. Adult spinal deformity. Lancet
2019;394:160–72.
[12] Youssef JA, Orndorff DO, Patty CA, et al. Current Status of Adult Spinal Deformity.
Glob Spine J 2013;3:051–62.
[13] Passias PG, Jalai CM, Line BG, et al. Patient profiling can identify patients with adult
spinal deformity (ASD) at risk for conversion from nonoperative to surgical treatment:
initial steps to reduce ineffective ASD management. Spine J 2018;18:234–44.
[14] Teles AR, Mattei TA, Righesso O, et al. Effectiveness of Operative and Nonoperative
Care for Adult Spinal Deformity: Systematic Review of the Literature. Glob Spine J
41
2017;7:170–8.
[15] Yadla S, Maltenfort MG, Ratliff JK, et al. Adult scoliosis surgery outcomes: a systematic
review. Neurosurg Focus 2010;28:E3.
[16] Charosky S, Guigui P, Blamoutier A, et al. Complications and Risk Factors of Primary
Adult Scoliosis Surgery. Spine (Phila Pa 1976) 2012;37:693–700.
[17] Sciubba DM, Yurter A, Smith JS, et al. A Comprehensive Review of Complication Rates
after Surgery for Adult Deformity: A Reference for Informed Consent. Spine Deform
2015;3:575–94.
[18] Legaye J, Duval-Beaupère G, Hecquet J, et al. Pelvic incidence: A fundamental pelvic
parameter for three-dimensional regulation of spinal sagittal curves. Eur Spine J
1998;7:99–103.
[19] Scheer JK, Keefe M, Lafage V, et al. Importance of patient-reported individualized goals
when assessing outcomes for adult spinal deformity (ASD): initial experience with a
Patient Generated Index (PGI). Spine J 2017;17:1397–405.
[20] Schwab F, Patel A, Ungar B, et al. Adult Spinal Deformity—Postoperative Standing
Imbalance. Spine (Phila Pa 1976) 2010;35:2224–31.
[21] Bess S, Protopsaltis TS, Lafage V, et al. Clinical and Radiographic Evaluation of Adult
Spinal Deformity. Clin Spine Surg 2016;29:6–16.
[22] Schwab FJ, Blondel B, Bess S, et al. Radiographical spinopelvic parameters and disability
in the setting of adult spinal deformity: a prospective multicenter analysis. Spine (Phila Pa
1976) 2013;38:E803--12.
[23] Diebo BG, Varghese JJ, Lafage R, et al. Sagittal alignment of the spine: What do you
need to know? Clin Neurol Neurosurg 2015;139:295–301.
[24] Glassman SD, Bridwell K, Dimar JR, et al. The Impact of Positive Sagittal Balance in
Adult Spinal Deformity. Spine (Phila Pa 1976) 2005;30:2024–9.
[25] Barrey C, Roussouly P, Perrin G, et al. Sagittal balance disorders in severe degenerative
spine. Can we identify the compensatory mechanisms? Eur Spine J 2011;20:626–33.
[26] Pollintine P, van Tunen MSLM, Luo J, et al. Time-Dependent Compressive Deformation
of the Ageing Spine. Spine (Phila Pa 1976) 2010;35:386–94.
[27] Ailon T, Smith JS, Shaffrey CI, et al. Degenerative Spinal Deformity. Neurosurgery
2015;77:S75–91.
[28] Grubb SA, Lipscomb HJ, Coonrad RW. Degenerative adult onset scoliosis. Spine (Phila
Pa 1976) 1988;13:241–5.
[29] Ailon T, Shaffrey CI, Lenke LG, et al. Progressive Spinal Kyphosis in the Aging
42
Population. Neurosurgery 2015;77:S164–72.
[30] Dubousset J. Three-dimensional analysis of the scoliotic deformity. Pediatr Spine Princ
Pract 1994:479–496.
[31] Ames CP, Smith JS, Scheer JK, et al. Impact of spinopelvic alignment on decision making
in deformity surgery in adults: A review. J Neurosurg Spine 2012;16:547–64.
[32] Rajnics P, Templier A, Skalli W, et al. The Association of Sagittal Spinal and Pelvic
Parameters in Asymptomatic Persons and Patients with Isthmic Spondylolisthesis. J
Spinal Disord Tech 2002;15:24–30.
[33] Roussouly P, Pinheiro-Franco JL. Sagittal parameters of the spine: biomechanical
approach. Eur Spine J 2011;20:578–85.
[34] Mac-Thiong J-M, Roussouly P, Berthonnaud É, et al. Age- and sex-related variations in
sagittal sacropelvic morphology and balance in asymptomatic adults. Eur Spine J
2011;20:572–7.
[35] Mac-Thiong J-M, Berthonnaud É, Dimar JR, et al. Sagittal Alignment of the Spine and
Pelvis During Growth. Spine (Phila Pa 1976) 2004;29:1642–7.
[36] Mac-Thiong J-M, Labelle H, Roussouly P. Pediatric sagittal alignment. Eur Spine J
2011;20:586–90.
[37] Wenger DR, Frick SL. Scheuermann kyphosis. Spine (Phila Pa 1976) 1999;24:2630–9.
[38] Cortet B, Houvenagel E, Puisieux F, et al. Spinal curvatures and quality of life in women
with vertebral fractures secondary to osteoporosis. Spine (Phila Pa 1976) 1999;24:1921–5.
[39] Vaccaro AR, Silber JS. Post-traumatic spinal deformity. Spine (Phila Pa 1976)
2001;26:S111-8.
[40] Potter BK, Lenke LG, Kuklo TR. Prevention and management of iatrogenic flatback
deformity. J Bone Joint Surg Am 2004;86-A:1793–808.
[41] Boody, Barrett S. MD; Rosenthal, Brett D. MD; Jenkins, Tyler J. MD; Patel, Alpesh A.
MD; Savage, Jason W. MD; Hsu WKM. Iatrogenic Flatback and Flatback Syndrome:
Evaluation, Management, and Prevention. Clin Spine Surg 2017;30:142–149.
[42] Lu DC, Chou D. Flatback syndrome. Neurosurg Clin N Am 2007;18:289–94.
[43] Roussouly P, Nnadi C. Sagittal plane deformity: an overview of interpretation and
management. Eur Spine J 2010;19:1824–36.
[44] Vidal J. Sagittal deviations of the spine, and trial of classification as a function of the
pelvic balance [in French]. Rev Chir Orthop Reparatrice Appar Mot 1984;70:124–6.
[45] Vidal J, Marnay T. [Morphology and anteroposterior body equilibrium in
spondylolisthesis L5-S1]. Rev Chir Orthop Reparatrice Appar Mot 1983;69:17–28.
43
[46] Lafage V, Schwab F, Patel A, et al. Pelvic Tilt and Truncal Inclination. Spine (Phila Pa
1976) 2009;34:E599–606.
[47] Lazennec J-Y, Brusson A, Rousseau M-A. Hip–spine relations and sagittal balance
clinical consequences. Eur Spine J 2011;20:686–98.
[48] Vaz G, Roussouly P, Berthonnaud E, et al. Sagittal morphology and equilibrium of pelvis
and spine. Eur Spine J 2002;11:80–7.
[49] Obeid I, Hauger O, Aunoble S, et al. Global analysis of sagittal spinal alignment in major
deformities: correlation between lack of lumbar lordosis and flexion of the knee. Eur
Spine J 2011;20:681–5.
[50] Diebo BG, Oren JH, Challier V, et al. Global sagittal axis: a step toward full-body
assessment of sagittal plane deformity in the human body. J Neurosurg Spine
2016;25:494–9.
[51] Roussouly P, Pinheiro-Franco JL. Biomechanical analysis of the spino-pelvic organization
and adaptation in pathology. Eur Spine J 2011;20:1–10.
[52] Crawford CH, Glassman SD. Fusing Adult Degenerative Deformities of the Lumbar
Spine. Semin Spine Surg 2011;23:222–6.
[53] Russo A, Bransford R, Wagner T, et al. Adult degenerative scoliosis insights, challenges,
and treatment outlook. Curr Orthop Pract 2008;19:357–65.
[54] Silva FE, Lenke LG. Adult degenerative scoliosis: evaluation and management.
Neurosurg Focus 2010;28:E1.
[55] Everett CR, Patel RK. A Systematic Literature Review of Nonsurgical Treatment in Adult
Scoliosis. Spine (Phila Pa 1976) 2007;32:S130–4.
[56] Scheer JK, Hostin R, Robinson C, et al. Operative Management of Adult Spinal
Deformity Results in Significant Increases in QALYs Gained Compared to Nonoperative
Management. Spine (Phila Pa 1976) 2018;43:339–47.
[57] Smith JS, Shaffrey CI, Berven S, et al. IMPROVEMENT OF BACK PAIN WITH
OPERATIVE AND NONOPERATIVE TREATMENT IN ADULTS WITH SCOLIOSIS.
Neurosurgery 2009;65:86–94.
[58] Li G, Passias P, Kozanek M, et al. Adult Scoliosis in Patients Over Sixty-Five Years of
Age. Spine (Phila Pa 1976) 2009;34:2165–70.
[59] Smith JS, Shaffrey CI, Berven S, et al. Operative Versus Nonoperative Treatment of Leg
Pain in Adults With Scoliosis. Spine (Phila Pa 1976) 2009;34:1693–8.
[60] Bridwell KH, Glassman S, Horton W, et al. Does Treatment (Nonoperative and
Operative) Improve the Two-Year Quality of Life in Patients With Adult Symptomatic
44
Lumbar Scoliosis. Spine (Phila Pa 1976) 2009;34:2171–8.
[61] Glassman SD, Carreon LY, Shaffrey CI, et al. The Costs and Benefits of Nonoperative
Management for Adult Scoliosis. Spine (Phila Pa 1976) 2010;35:578–82.
[62] Ames CP, Scheer JK, Lafage V, et al. Adult Spinal Deformity: Epidemiology, Health
Impact, Evaluation, and Management. Spine Deform 2016;4:310–22.
[63] Street JT, Lenehan BJ, DiPaola CP, et al. Morbidity and mortality of major adult spinal
surgery. A prospective cohort analysis of 942 consecutive patients. Spine J 2012;12:22–
34.
[64] Karstensen S, Bari T, Gehrchen M, et al. Morbidity and mortality of complex spine
surgery: a prospective cohort study in 679 patients validating the Spine AdVerse Event
Severity (SAVES) system in a European population. Spine J 2016;16:146–53.
[65] Daubs MD, Lenke LG, Cheh G, et al. Adult spinal deformity surgery: Complications and
outcomes in patients over age 60. Spine (Phila Pa 1976) 2007;32:2238–44.
[66] Smith JS, Klineberg E, Lafage V, et al. Prospective multicenter assessment of
perioperative and minimum 2-year postoperative complication rates associated with adult
spinal deformity surgery. J Neurosurg Spine 2016;25:1–14.
[67] Hallager DW, Karstensen S, Bukhari N, et al. Radiographic Predictors for Mechanical
Failure after Adult Spinal Deformity Surgery. Spine (Phila Pa 1976) 2017;42:E855–63.
[68] Gøtzsche P. Rational diagnosis and treatment: Evidence-based clinical decision-making.
John Wiley & Sons; 2008.
[69] Faundez A, Roussouly P, Le Huec JC. Équilibre sagittal pelvi-rachidien et pathologies
lombaires dégénératives. Rev Med Suisse 2011;7:2470–4.
[70] Barrey C, Jund J, Noseda O, et al. Sagittal balance of the pelvis-spine complex and lumbar
degenerative diseases. A comparative study about 85 cases. Eur Spine J 2007;16:1459–67.
[71] Le Huec JC, Charosky S, Barrey C, et al. Sagittal imbalance cascade for simple
degenerative spine and consequences: algorithm of decision for appropriate treatment. Eur
Spine J 2011;20:699–703.
[72] Boulay C, Tardieu C, Hecquet J, et al. Sagittal alignment of spine and pelvis regulated by
pelvic incidence: Standard values and prediction of lordosis. Eur Spine J 2006;15:415–22.
[73] Roussouly P, Gollogly S, Berthonnaud E, et al. Classification of the normal variation in
the sagittal alignment of the human lumbar spine and pelvis in the standing position. Spine
(Phila Pa 1976) 2005;30:346–53.
[74] Laouissat F, Sebaaly A, Gehrchen M, et al. Classification of normal sagittal spine
alignment: refounding the Roussouly classification. Eur Spine J 2017.
45
[75] Schwab F, Ungar B, Blondel B, et al. Scoliosis Research Society—Schwab Adult Spinal
Deformity Classification. Spine (Phila Pa 1976) 2012;37:1077–82.
[76] Fehlings MG, Kato S, Lenke LG, et al. Incidence and risk factors of postoperative
neurologic decline after complex adult spinal deformity surgery: results of the Scoli-
RISK-1 study. Spine J 2018;18:1733–40.
[77] Lenke LG, Fehlings MG, Shaffrey CI, et al. Neurologic Outcomes of Complex Adult
Spinal Deformity Surgery. Spine (Phila Pa 1976) 2016;41:204–12.
[78] Yilgor C, Sogunmez N, Boissiere L, et al. Global Alignment and Proportion (GAP) Score.
J Bone Jt Surg 2017;99:1661–72.
[79] Langensiepen S, Semler O, Sobottke R, et al. Measuring procedures to determine the
Cobb angle in idiopathic scoliosis: A systematic review. Eur Spine J 2013;22:2360–71.
[80] de Vet HCW, Terwee CB, Knol DL, et al. When to use agreement versus reliability
measures. J Clin Epidemiol 2006;59:1033–9.
[81] Kottner J, Streiner DL. The difference between reliability and agreement. J Clin
Epidemiol 2011;64:701–2.
[82] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of
evaluative instruments. J Chronic Dis 1987;40:171–8.
[83] Weir JP. Quantifying Test-Retest Reliability Using the Intraclass Correlation Coefficient
and the SEM. J Strength Cond Res 2005;19:231.
[84] Eliasziw M, Young SL, Woodbury MG, et al. Statistical Methodology for the Concurrent
Assessment of Interrater and Intrarater Reliability: Using Goniometric Measurements as
an Example. Phys Ther 1994;74:777–88.
[85] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat
Methods Med Res 1999;8:135–60.
[86] Mitchell ML, Jolley JM. Research design explained. Cengage Learning; 2012.
[87] Hart RA, McCarthy I, Ames CP, et al. Proximal Junctional Kyphosis and Proximal
Junctional Failure. Neurosurg Clin N Am 2013;24:213–8.
[88] Gisev N, Bell JS, Chen TF. Interrater agreement and interrater reliability: Key concepts,
approaches, and applications. Res Soc Adm Pharm 2013;9:330–8.
[89] Cohen J. Weighted kappa: Nominal scale agreement provision for scaled disagreement or
partial credit. Psychol Bull 1968;70:213–20.
[90] Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data.
Biometrics 1977;33:159.
[91] Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation
46
Coefficients for Reliability Research. J Chiropr Med 2016;15:155–63.
[92] McGraw KO, Wong SP. “Forming inferences about some intraclass correlations
coefficients”: Correction. Psychol Methods 1996;1:390–390.
[93] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol
Bull 1979;86:420–8.
[94] Portney LG, Watkins MP. Foundations of clinical research: applications to practice. vol.
892. Pearson/Prentice Hall Upper Saddle River, NJ; 2009.
[95] Martin Bland J, Altman D. STATISTICAL METHODS FOR ASSESSING
AGREEMENT BETWEEN TWO METHODS OF CLINICAL MEASUREMENT.
Lancet 1986;327:307–10.
[96] Hopkins WG. Measures of Reliability in Sports Medicine and Science. Sport Med
2000;30:375–81.
[97] Liu Y, Liu Z, Zhu F, et al. Validation and Reliability Analysis of the New SRS-Schwab
Classification for Adult Spinal Deformity. Spine (Phila Pa 1976) 2013;38:902–8.
[98] Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample
size requirements. Phys Ther 2005;85:257–68.
[99] Nielsen DH, Gehrchen M, Hansen L V., et al. Inter- and Intra-rater Agreement in
Assessment of Adult Spinal Deformity Using the Scoliosis Research Society–Schwab
Classification. Spine Deform 2014;2:40–7.
[100] Atkinson G, Nevill AM. Statistical Methods For Assessing Measurement Error
(Reliability) in Variables Relevant to Sports Medicine. Sport Med 1998;26:217–38.
[101] Aubin C-E, Bellefleur C, Joncas J, et al. Reliability and Accuracy Analysis of a New
Semiautomatic Radiographic Measurement Software in Adult Scoliosis. Spine (Phila Pa
1976) 2011;36:E780–90.
[102] Lafage R, Ferrero E, Henry JK, et al. Validation of a new computer-assisted tool to
measure spino-pelvic parameters. Spine J 2015;15:2493–502.
[103] Popović ZB, Thomas JD. Assessing observer variability: a user’s guide. Cardiovasc Diagn
Ther 2017;7:317–24.
[104] Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466–75.
[105] Jones M, Dobson A, O’Brian S. A graphical method for assessing agreement with the
mean between multiple observers using continuous measures. Int J Epidemiol
2011;40:1308–13.
[106] Kyrölä KK, Salme J, Tuija J, et al. Intra- and Interrater Reliability of Sagittal Spinopelvic
47
Parameters on Full-Spine Radiographs in Adults With Symptomatic Spinal Disorders.
Neurospine 2018;15:175–81.
[107] Pernaa K, Seppänen M, Mäkelä K, et al. Reliability of Sagittal Spinopelvic Alignment
Measurements After Total Hip Arthroplasty. Clin Spine Surg 2017;30:E909–14.
[108] Jacobs E, van Royen BJ, van Kuijk SMJ, et al. Prediction of mechanical complications in
adult spinal deformity surgery—the GAP score versus the Schwab classification. Spine J
2019;19:781–8.
[109] Nicholls FH, Bae J, Theologis AA, et al. Factors Associated with the Development of and
Revision for Proximal Junctional Kyphosis in 440 Consecutive Adult Spinal Deformity
Patients. Spine (Phila Pa 1976) 2017;42:1693–8.
[110] Kim YJ, Bridwell KH, Lenke LG, et al. Pseudarthrosis in adult spinal deformity following
multisegmental instrumentation and arthrodesis. J Bone Jt Surg - Ser A 2006;88:721–8.
[111] Edwards CC, Bridwell KH, Patel A, et al. Long adult deformity fusions to L5 and the
sacrum: A matched cohort analysis. Spine (Phila Pa 1976) 2004;29:1996–2005.
[112] Wang H, Ding W, Ma L, et al. Prevention of Proximal Junctional Kyphosis: Are Polyaxial
Pedicle Screws Superior to Monoaxial Pedicle Screws at the Upper Instrumented
Vertebrae? World Neurosurg 2017;May:405–15.
[113] Gandhi S V, Januszewski J, Bach K, et al. Development of Proximal Junctional Kyphosis
After Minimally Invasive Lateral Anterior Column Realignment for Adult Spinal
Deformity. Neurosurgery 2018;Mars:Epub. ahead of print.
[114] Kim D, Kim J, Kim D, et al. Risk factors of proximal junctional kyphosis after multilevel
fusion surgery: more than 2 years follow-up data 2017;60:174–80.
[115] Gupta M, Henry JK, Schwab F, et al. Dedicated Spine Measurement Software Quantifies
Key Spino-Pelvic Parameters More Reliably Than Traditional Picture Archiving and
Communication Systems Tools. Spine (Phila Pa 1976) 2016;41:E22–7.
[116] Lutz W, Sanderson W, Scherbov S. The coming acceleration of global population ageing.
Nature 2008;451:716–9.
[117] Jacobsen LA, Kent M, Lee M, et al. America´s aging population. Popul Bull 2011;66.
[118] Fehlings MG, Tetreault L, Nater A, et al. The Aging of the Global Population.
Neurosurgery 2015;77:S1–5.
[119] Smith JS, Shaffrey CI, Bess S, et al. Recent and Emerging Advances in Spinal Deformity.
Neurosurgery 2017;80:S70–85.
[120] Dykes PC, Samal L, Donahue M, et al. A patient-centered longitudinal care plan: vision
versus reality. J Am Med Informatics Assoc 2014;21:1082–90.
48
[121] Gillick MR. The Critical Role of Caregivers in Achieving Patient-Centered Care. JAMA
2013;310:575.
[122] Kupfer JM, Bond EU. Patient Satisfaction and Patient-Centered Care. JAMA
2012;308:139.
[123] Passias PG, Poorman GW, Jalai CM, et al. Morbidity of Adult Spinal Deformity Surgery
in Elderly Has Declined Over Time. Spine (Phila Pa 1976) 2017:1.
[124] Smith JS, Lafage V, Shaffrey CI, et al. Outcomes of operative and nonoperative treatment
for adult spinal deformity: A prospective, multicenter, propensity-matched cohort
assessment with minimum 2-year follow-up. Neurosurgery 2016;78:851–61.
[125] Pitter FT, Lindberg-Larsen M, Pedersen AB, et al. Revision Risk After Primary Adult
Spinal Deformity Surgery: A Nationwide Study With Two-Year Follow-up. Spine Deform
2019;7:619-626.e2.
[126] Pichelmann MA, Lenke LG, Bridwell KH, et al. Revision Rates Following Primary Adult
Spinal Deformity Surgery. Spine (Phila Pa 1976) 2010;35:219–26.
[127] Sebaaly A, Riouallon G, Obeid I, et al. Proximal junctional kyphosis in adult scoliosis:
comparison of four radiological predictor models. Eur Spine J 2018;27:613–21.
[128] Scheer JK, Smith JS, Schwab F, et al. Development of a preoperative predictive model for
major complications following adult spinal deformity surgery. J Neurosurg Spine
2017;26:736–43.
[129] Terran J, Schwab F, Shaffrey CI, et al. The SRS-schwab adult spinal deformity
classification: Assessment and clinical correlations based on a prospective operative and
nonoperative cohort. Neurosurgery 2013;73:559–68.
[130] Rothenfluh DA, Mueller DA, Rothenfluh E, et al. Pelvic incidence-lumbar lordosis
mismatch predisposes to adjacent segment disease after lumbar spinal fusion. Eur Spine J
2015;24:1251–8.
[131] Ohrt-Nissen S, Bari T, Dahl B, et al. Sagittal Alignment After Surgical Treatment of
Adolescent Idiopathic Scoliosis—Application of the Roussouly Classification. Spine
Deform 2018;6:537–44.
[132] Ferrero E, Vira S, Ames CP, et al. Analysis of an unexplored group of sagittal deformity
patients: low pelvic tilt despite positive sagittal malalignment. Eur Spine J 2016;25:3568–
76.
[133] Sebaaly A, Grobost P, Mallam L, et al. Description of the sagittal alignment of the
degenerative human spine. Eur Spine J 2018;27:489–96.
[134] Schwab F, Lafage V, Boyce R, et al. Gravity Line Analysis in Adult Volunteers. Spine
49
(Phila Pa 1976) 2006;31:E959–67.
[135] Mendoza-Lattes S, Ries Z, Gao Y, et al. Natural History of Spinopelvic Alignment Differs
From Symptomatic Deformity of the Spine. Spine (Phila Pa 1976) 2010;35:E792–8.
[136] Sugrue PA, McClendon J, Smith TR, et al. Redefining Global Spinal Balance. Spine
(Phila Pa 1976) 2013;38:484–9.
[137] Lafage R, Schwab F, Challier V, et al. Defining spino-pelvic alignment thresholds should
operative goals in adult spinal deformity surgery account for age? Spine (Phila Pa 1976)
2016;41:62–8.
[138] Raman T, Nayar SK, Liu S, et al. Cost-Effectiveness of Primary and Revision Surgery for
Adult Spinal Deformity. Spine (Phila Pa 1976) 2018;43:791–7.
[139] McCarthy IM, Hostin RA, Ames CP, et al. Total hospital costs of surgical treatment for
adult spinal deformity: An extended follow-up study. Spine J 2014;14:2326–33.
[140] McCarthy IM, Hostin RA, O’Brien MF, et al. Analysis of the direct cost of surgery for
four diagnostic categories of adult spinal deformity. Spine J 2013;13:1843–8.
[141] Yeramaneni S, Robinson C, Hostin R. Impact of spine surgery complications on costs
associated with management of adult spinal deformity. Curr Rev Musculoskelet Med
2016;9:327–32.
[142] Health expenditure per capita, 2018, p. 132–3.
[143] Kehlet H, Wilmore DW. Evidence-Based Surgical Care and the Evolution of Fast-Track
Surgery. Ann Surg 2008;248:189–98.
[144] Kehlet H. Fast-track colorectal surgery. Lancet (London, England) 2008;371:791–3.
[145] Ljungqvist O, Scott M, Fearon KC. Enhanced Recovery After Surgery. JAMA Surg
2017;152:292.
[146] Wang MY, Chang P-Y, Grossman J. Development of an Enhanced Recovery After
Surgery (ERAS) approach for lumbar spinal fusion. J Neurosurg Spine 2017;26:411–8.
[147] Wang MY, Chang HK, Grossman J. Reduced Acute Care Costs With the ERAS®
Minimally Invasive Transforaminal Lumbar Interbody Fusion Compared With
Conventional Minimally Invasive Transforaminal Lumbar Interbody Fusion.
Neurosurgery 2018;83:827–34.
[148] Angus M, Jackson K, Smurthwaite G, et al. The implementation of enhanced recovery
after surgery (ERAS) in complex spinal surgery. J Spine Surg 2019;5:116–23.
[149] Dagal A, Bellabarba C, Bransford R, et al. Enhanced Perioperative Care for Major Spine
Surgery. Spine (Phila Pa 1976) 2019;44:959–66.
[150] Carr DA, Saigal R, Zhang F, et al. Enhanced perioperative care and decreased cost and
50
length of stay after elective major spinal surgery. Neurosurg Focus 2019;46:E5.
[151] Wainwright TW, Immins T, Middleton RG. Enhanced recovery after surgery (ERAS) and
its applicability for major spine surgery. Best Pract Res Clin Anaesthesiol 2016;30:91–
102.
[152] Wainwright TW, Wang MY, Immins T, et al. Enhanced recovery after surgery (ERAS)—
Concepts, components, and application to spine surgery. Semin Spine Surg 2018;30:104–
10.
[153] Arima H, Yamato Y, Hasegawa T, et al. Extensive Corrective Fixation Surgeries for Adult
Spinal Deformity Improve Posture and Lower Extremity Kinematics During Gait. Spine
(Phila Pa 1976) 2017;42:1456–63.
[154] Yagi M, Ohne H, Konomi T, et al. Walking balance and compensatory gait mechanisms
in surgically treated patients with adult spinal deformity. Spine J 2017;17:409–17.
[155] Diebo BG, Shah N V., Pivec R, et al. From Static Spinal Alignment to Dynamic Body
Balance. JBJS Rev 2018;6:e3.
[156] Diebo BG, Challier V, Shah N V., et al. The Dubousset Functional Test is a Novel
Assessment of Physical Function and Balance. Clin Orthop Relat Res 2019:1.
[157] Haglin JM, Godzik J, Mauria R, et al. Continuous Activity Tracking Using a Wrist-
Mounted Device in Adult Spinal Deformity: A Proof of Concept Study. World Neurosurg
2019;122:349–54.
[158] Cruz-Jentoft AJ, Bahat G, Bauer J, et al. Sarcopenia: revised European consensus on
definition and diagnosis. Age Ageing 2019;48:16–31.
[159] Song M-Y, Ruts E, Kim J, et al. Sarcopenia and increased adipose tissue infiltration of
muscle in elderly African American women. Am J Clin Nutr 2004;79:874–80.
[160] Moal B, Bronsard N, Raya JG, et al. Volume and fat infiltration of spino-pelvic
musculature in adults with spinal deformity. World J Orthop 2015;6:727–37.
[161] Hyun S-J, Kim YJ, Rhim S-C. Patients with proximal junctional kyphosis after stopping at
thoracolumbar junction have lower muscularity, fatty degeneration at the thoracolumbar
area. Spine J 2016;16:1095–101.
[162] Banno T, Arima H, Hasegawa T, et al. The Effect of Paravertebral Muscle on the
Maintenance of Upright Posture in Patients With Adult Spinal Deformity. Spine Deform
2019;7:125–31.
U N I V E R S I T Y O F C O P E N H A G E N
F A C U L T Y O F H E A L T H A N D M E D I C A L S C I E N C E S
51
Appendices
Study I
Moderate interrater and substantial intrarater reproducibility of the
Roussouly Classification system in patients with Adult Spinal Deformity
Spine Deformity 7 (2019) 312-318
https://doi.org/10.1016/j.jspd.2018.08.009
Moderate Interrater and Substantial Intrarater Reproducibility of theRoussouly Classification System in Patients With Adult Spinal DeformityTanvir Johanning Bari, MDa,*, Dennis Winge Hallager, MD, PhDa, Niklas Tøndevold, MDa,
Ture Karbo, MDa, Lars Valentin Hansen, MDa, Benny Dahl, MD, PhD, DMScb,Martin Gehrchen, MD, PhDa
aSpine Unit, Department of Orthopedic Surgery, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, DenmarkbDepartment of Orthopedics and Scoliosis Surgery, Texas Children’s Hospital and Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
Received 23 December 2017; revised 5 May 2018; accepted 18 August 2018
Abstract
Study Design: Reproducibility study of a classification system.Objectives: To provide the inter- and intrarater reproducibility of the Roussouly Classification System in a single-center prospective cohortof patients referred for Adult Spinal Deformity.Summary of Background Data: The Roussouly Classification System was developed to describe the variation in sagittal spine shape innormal individuals. A recent study suggests that patients’ spine types could influence the outcome following spinal surgery. The utility of aclassification system depends largely on its reproducibility.Methods: Sixty-four consecutive patients were included in a blinded test-retest setting using digital radiographs. All ratings were per-formed by four spine surgeons with different levels of experience. There was a 14-day interval between the two reading sessions. Inter- andintrarater reproducibility was calculated using Fleiss Kappa and crude agreement percentages.Results: We found moderate interrater (k5 0.60) and substantial intrarater (k5 0.68) reproducibility. All 4 raters agreed on the Roussoulytype in 47% of the cases. The most experienced rater had significantly higher intrarater reliability compared to the least experienced rater (k5 0.57 vs 0.78). The two most experienced raters also had the highest crude agreement percentage (75%); however, they also had asignificant difference in distribution of spine types.Conclusion: The current study presents moderate interrater and substantial intrarater reliability of the Roussouly Classification System.These findings are acceptable and comparable to previous results of reproducibility for a classification system in patients with Adult SpinalDeformity. Additional studies are requested to validate these findings as well as to further investigate the impact of the classification systemon outcome following surgery.� 2018 Scoliosis Research Society. All rights reserved.
Keywords: Roussouly; Classification; Reproducibility of results; Observer variation; Spine; Adult; Scoliosis; Kyphosis; Sagittal alignment
Introduction
The variation in sagittal spinal shape among healthyyoung adults has been described by Roussouly et al. [1], whofound that the sagittal profile of the spine was related to the
orientation of the pelvis. As the sacral slope (SS) increases,so does the lower arc of lordosis as well as the globallordosis. Through this observation, four types of spinal typeswere observed, and recently, two additional types wereadded, resulting in the Roussouly Classification Systemconsisting of 6 types [2]. The five main types are illustratedin Figure 1, and all types are defined in Table 1. Type 1 andType 2 are characterized by having a low-grade SS and low-grade pelvic incidence (PI) and differ from each other by thenumber of lordotic vertebrae. Type 3 has a high-grade SSand high-grade PI whereas Type 4 is defined as the type withlargest SS in combination with a high grade PI. Type 3anteverted pelvis (Type 3AP) is identified by its high-grade
Author disclosures: TJB (none), DWH (none), NT (none), TK (none),
LVH (none), BD ( grants from K2M and Medtronic, outside the submitted
work), MG ( grants from K2M and Medtronic, outside the submitted work).
Ethics approval: The study was approved by the Danish Data Protec-
tion Agency and the Danish Patient Safety Authority.
*Corresponding author. Spine Unit, Department of Orthopedic Sur-
gery, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark. Tel.:
þ45 5057 9448; fax: þ45 3545 2165.
E-mail address: [email protected] (T.J. Bari).
2212-134X/$ - see front matter � 2018 Scoliosis Research Society. All rights reserved.
https://doi.org/10.1016/j.jspd.2018.08.009
Spine Deformity 7 (2019) 312e318www.spine-deformity.org
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
SS in contrast to a low-grade PI. These spine types have beensuggested to influence the outcome of spinal surgery [3]. Toour knowledge, however, no previous study has investigatedthe reproducibility of this classification. Because the classi-fication system might be useful as a tool for the evaluation ofand individualizing the treatment for ASD, a reproducibilitystudy is necessary to test its usefulness.
The aim of the current study was to provide estimatesof the inter- and intrarater reproducibility of the Rous-souly Classification System in a population of patientswith ASD.
Material and Methods
We performed a reproducibility study on a single-center cohort of patients with ASD. Approval from thelocal data protection agency and patient safety authoritywas obtained (jr.nr. 2012-58-0004 and jr.nr: 3-3013-1980/1). All patients referred for evaluation of ASD at ourtertiary institution for spine surgery between August 1,2013, and March 30, 2014, were assessed for inclusion.
The inclusion criteria were as follows: referral for ASD,age >18 years, and sufficient radiographs. Sufficient ra-diographs were defined as follows: standing images inboth the lateral and anteroposterior plane, including bothfemoral heads and visible vertebrae from C7eS1 on thelateral images. Patients were standing with ‘‘fists-on-clavicles’’ [4]. Patients were excluded if there was ahistory of previous instrumentation, spinal fracture, orneuromuscular disease. Radiographs were acquired usingthe Digital Imaging and Communications in Medicineformat from the Radiology Information System and Pic-ture Archiving and Communications System IMPAX 6�(Agfa Healthcare NV, Mortsel, Belgium). A total of 8 setsof each patient’s radiographs were then transferred to theonline data management system KEOPS� (SMAIO,Lyon, France) as unmarked digital films. Each set of ra-diographs was coded with a unique identification key,thus blinding the raters from further identification of theindividual patients. Four raters performed all ratings.They consisted of two staff specialists with one and twoyears of experience in spinal surgery (Rater 1 and 2), andtwo senior spine surgeons with more than 20 years ofexperience, respectively (Rater 3 and 4). Each rateridentified specific points on each radiograph includingthe center of femoral heads, endplate of S1, and vertebralbodies from C7eL5. A geometric computerized modelwas then generated, and the software classified eachradiograph based on the measurements according to theRoussouly Classification System [2]. For training pur-poses, each rater was handed a short guide on how to usethe system and assessed 10 unique sets of radiographs.These radiographs were not included in the subsequent
Fig. 1. The Roussouly Classification System. The figure illustrates the five main spine types. To the left are the types with low-grade sacral slope (SS) and
low-grade pelvic incidence (PI). The types to the right are defined as having high-grade SS and high-grade PI. Type 3 Anteverted Pelvis (3AP) is seen in
between with a high-grade SS and a low-grade PI. With permission from Mr. Philippe Roussouly (CEO SMAIO, Lyon, France).
Table 1
Definition of Roussouly classification types.
Roussouly type First criterion Second criterion
Type 1 SS !35 <3 lordotic vertebrates
Type 2 SS !35 O3 lordotic vertebrates
Type 3 35 <SS !45
Type 3 AP 35 <SS !45 PI !50 � or PT !5 �
Type 4 SS >45
Kyphosis No lumbar lordosis No lumbar lordosis
AP, anteverted pelvis; PI, pelvic incidence; PT, pelvic tilt; SS, sacral
slope.
313T.J. Bari et al. / Spine Deformity 7 (2019) 312e318
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
readings. There was a 14-day delay between readings,and the order of cases was randomized. Results are pre-sented in accordance to the Guidelines for ReportingReliability and Agreement Studies [5].
Statistics
All statistical analyses were performed using the languageand environment for statistical computing, R, version 3.3.3(R Development Core Team, 2011, Vienna, Austria). Thesoftware package kappaSize, version 1.1, was used for powercalculation before the study. With the number of raters set to4, anticipated Kappa set to 0.80, 0.60e0.99 as two-sided95% confidence intervals, and significance level (a) 50.05, a minimum of 40 cases were needed for the study. Thepackage irr, version 0.84, was used for calculations of inter-and intrarater reliability. Distribution of data was assessed byhistograms, and descriptive data were reported as pro-portions, mean with standard deviation (SD), or median withinterquartile range (IQR). Fleiss Kappa coefficients werecalculated for categorical variables as the number of raterswere more than two [6-8]. Cohen Kappa was used whencomparing any 2 raters [7-9]. In accordance with Landis andKoch [10], the results of reliability (k) were classified intoslight (<0.20), fair (0.21e0.40), moderate (0.41e0.60),substantial (0.61e0.80), and almost perfect agreement(>0.81). Individual Cohen intrarater k coefficients for eachrater were compared using a two-tailed Z test. The Bhapkartest for marginal homogeneity was used to test for differ-ences in distribution between any two raters. Results wereclassified as significant at p !.05. All confidence intervalsare provided as 95% confidence intervals.
Results
A total of 803 patients were screened, of which 253 werereferred for ASD and had long-standing radiographs. Inaccordance to the described criteria, a further 55 patientswere excluded because of previous spinal fractures orneuromuscular disease. In 82 cases, both femoral heads orthe C7 vertebrate were not included or obstructed fromview because of a brace. An additional 36 doubles wereremoved and 6 were excluded because of inferior imagequality. This left a total of 74 patients for analysis. Of these,10 cases were used for training purposes, leaving 64 for
Table 2
Comparison of distribution of Roussouly types.
Laouissat et al. [12], % Current study, %
Roussouly Classification System
Type 1 12 13
Type 2 22 35
Type 3 16 19
Type 3 AP 30 10
Type 4 13 21
Kyphosis d 2
AP, anteverted pelvis.
Table 3
Roussouly type inter- and intrarater reliability.
Fleiss kappa
coefficients (95%
Confidence intervals)
Interpretation
of reliability
coefficient*
Percentage
of agreementy
Interrater
Reading 1 0.59 (0.49e0.70) Moderate 47
Reading 2 0.60 (0.50e0.69) Moderate 47
Both
readings
combined
0.60 (0.53e0.66) Moderate 47
Intrarater
Rater 1 0.57 (0.41e0.72) Moderate 67
Rater 2 0.68 (0.54e0.82) Substantial 75
Rater 3 0.78 (0.65e0.90) Substantial 83
Rater 4 0.70 (0.56e0.85) Substantial 78
All raters
combined
0.68 (0.61e0.75) Substantial 76
* <0.20: slight; 0.21e0.40: fair; 0.41e0.60: moderate; 0.61e0.80;
substantial; >0.81: almost perfect.y Percentage of cases where all 4 raters assigned the same Roussouly
Type.
Table 4
Interrater reliability and agreement comparing any two raters using Cohen
kappa (95% confidence interval) and crude agreement percentages.
Rater 1 Rater 2 Rater 3 Rater 4
Rater 1 d 0.57*
(0.47e0.68y)0.58* (0.48e0.69y) 0.55* (0.44e0.66y)
67.2%
(65.6e68.8z)68.0% (67.2e68.8z) 66.4% (64.1e68.8z)
Rater 2 d 0.62* (0.51e0.72y) 0.58* (0.48e0.69y)70.3% (67.2e73.4z) 68.0% (62.5e73.4z)
Rater 3 d 0.67* (0.57e0.77y)75% (73.4e76.6z)
* Cohen kappa.y 95% confidence interval.z Range for reading 1 and 2.
Fig. 2. Venn diagram of agreement as a total of both readings. Intersec-
tions specify the number of cases where raters (1, 2, 3, and 4) agreed on
the spine type.
314 T.J. Bari et al. / Spine Deformity 7 (2019) 312e318
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
final ratings. There were 44 (68.8%) females and 20(31.2%) males, with a median age of 46 years (IQR 523e63 years). Within one year of the referral date, 15 pa-tients (23.4%) underwent a surgical procedure. The meanthoracic kyphosis (T1eT12) and lumbar lordosis (L1eS1)were 57.2 (�20.0) and 48.7 (�23.6). Mean spinopelvicparameters for SS, pelvic incidence (PI) and pelvic tilt (PT)
were 35.1 (�14.6), 51.6 (�16.5), and 16.6 (�13.5),respectively. PI-LL mismatch (PI-LL >10� or PI-LL<�10�) was seen in 42 patients (68.8%) with a mean PI-LL of �1.5 (�24.4). The population had a mediansagittal vertical axis (SVA) of þ1.5 cm (IQR: �1.1 to 5.8)and 19 (29.7%) were sagittally unbalanced, with an SVA ofmore than þ5 cm [11]. The most frequently occurring
Fig. 3. Distribution of Roussouly type by rater and reading. There was a significant difference in distribution of Roussouly types between Raters 3 and 4dp
5 .034 in reading 1 and p 5 .027 in reading 2.
Fig. 4. Fleiss kappa for each Roussouly type showed similar reliability except for ‘‘Kyphosis,’’ for which there was a large difference between inter- and
intrarater reliability.
315T.J. Bari et al. / Spine Deformity 7 (2019) 312e318
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
Roussouly type across all 8 ratings was Type 2. The leastfrequent type was Kyphosis, only specified three timesdallin the same patient (Table 2).
Reproducibility
The mean interrater reliability was moderate (k 5 0.60)and on both readings all 4 raters agreed on the sameRoussoulytype in 47% cases. Mean intrarater reliability was substantial(k5 0.68)dranging fromk5 0.57 tok5 0.78 between raters(Table 3). Raters averagely assigned the same type on bothreadings in 76% of the cases (range: 67% to 83%).
Rater performance
We observed a significantly higher intrarater reliabilitycoefficient for one of the two more experienced raters(Rater 3) compared to the least experienced rater (Rater 1).No other significant differences in intrarater reliability werefound for one rater compared to the others. Comparinginterrater reliability between any combination of two ratersshowed the highest coefficient for the most experiencedraters (Rater 3 and 4), who averagely agreed on 75% (k 50.67) of cases (Table 4 and Fig. 2). The difference was,however, not significantly different compared to the tworaters (Rater 1 and 2) with the lowest combined interraterreliability (k 5 0.57, p 5 .09).
Rater bias
A significant difference in distribution of Roussoulytypes was found between the classifications by Raters 3 and4 on both readings (p 5 .034 and p 5 .027, respectively)(Fig. 3). No other significant distribution differences wereobserved between any two raters.
Reliability among classification categories
The most reliable classification categories were Types 1,2, and 4, which showed substantial interrater reliability.Type 3 and 3AP showed moderate reliability, whereas‘‘Kyphosis’’ was only fairly reliable. Intrarater reliabilityfollowed the same pattern except for ‘‘Kyphosis,’’ forwhich substantial reliability was observed (Fig. 4).
Discussion
The current study found moderate inter- and substantialintrarater reproducibility of the Roussouly ClassificationSystem applied to ASD patients.
We observed a spread in intrarater reliability across the 4raters ranging from 0.57 to 0.78 and a trends towards cor-relation to the experience of the surgeon. Furthermore, thetwo most experienced raters (Raters 3 and 4) had thehighest rate of agreement and interrater reliability(Table 3). However, we also observed a significant differ-ence in distribution of Roussouly types between the tworaters. This difference in distributions could also explain
the higher reliability coefficients beyond rater experi-ence [13].
Comparison to previous studies
Comparing the current study with previous reproduc-ibility studies on ASD classification systems, substantial andalmost perfect intra- and interrater reliability was reported bySchwab et al. [14] for the Scoliosis ResearchSocietyeSchwab Adult Spinal Deformity classification(SRS-Schwab). However, readings were performed in only21 preselected cases and on premarked radiographs. Liu et al.[15] had similar results presenting substantial and almostperfect inter- and intrarater reliability without reporting theselection process for the 102 included cases. Hallager et al.[16] also performed a reproducibility study of the SRS-Schwab classification system. They founddin contrary tothe previously mentioned studiesdmoderate interrater andsubstantial intrarater reliability, comparable to our results.The latter study presented a consecutive selection process.Additionally, radiographs were not marked in advance. Weargue that this method of performing a reproducibility studyis the most desirable, and is comparable to the current study.However, the current study used a semiautomatic system toclassify the spine type in contrast to the previouslymentioned studies. The kappa statistics is widely used inreproducibility studies, but certain shortcomings must beconsidered. The statistics is highly sensitive to marginalhomogeneity and trait prevalence of distribution in types[12,13]. The more even the distribution of categories, thelarger the Kappa coefficient. However contradictory, theKappa coefficient also becomes larger the more different thedistribution between raters. Therefore, the selection processin reproducibility studies is crucial to the overall results ofKappa, and selection bias could easily produce artificiallybetter results, as may be the case for the results of the twoinitially mentioned studies [14,15]. Hallager et al. [16] per-formed their study under similar conditions as the presentstudy, and this could explain the resemblance in results.Keeping the traits of Kappa in mind, the Roussouly typeswere clearly unevenly distributed in our population aswell asin the original studies of the classifications [1,2]. Further-more, the distribution between raters was not significantlydifferent except for between Raters 3 and 4. Choosing amoreevenly distributed population probably would have contrib-uted to a higher degree of reproducibility. However, thedistribution of Roussouly types both in the normal popula-tion and in patients with ASD is most probably not evenlydistributed. Therefore, we suggest that future studies alsoconsist of randomly or consecutively selected cases and thatthe selection process is clearly presented.
Distribution
Type 2 was clearly the predominant type across bothreadings, including about a third of the patients. Laouissat
316 T.J. Bari et al. / Spine Deformity 7 (2019) 312e318
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
et al. [2] described the distribution in 296 healthy adultswith Type 3 AP as the predominant type (Table 2). Weattribute the somewhat different distribution to the differ-ence between healthy adults and ASD patients, because oneof the compensating mechanisms to sagittal imbalance isretroversion of the pelvis, which means increasing thepelvic tilt, and as a result of the geometric properties of thepelvic parameters, thus reducing the SS [17]. Also the au-thors theorized that patients with larger PI and SS should beable to withstand changes in alignment better than thosewith smaller values [18]. Patients with larger PI and SSwould predominantly be categorized as Types 3 and 4, andpatients with small PI and SS would mainly be categorizedas Types 1 and 2. Following the earlier-mentioned theory,the latter would be less able to compensate changes inspinal alignment and therefore to a wider extentseek treatment.
The current study presents a spread in reliability co-efficients when comparing each Roussouly Type (Fig. 4).Kyphosis, which had the lowest interrater kappa was alsothe spine type least represented in the population as it wasonly classified on three occasions for one case (Table 2).
Strengths and limitations
The strength of the current study is the consecutive one-center design minimizing selection bias. Furthermore, thenature of blinding the raters to others’ and previous ratingsminimized the risk of estimation bias. There was, however,a high percentage of excluded cases due to a high rate ofinsufficient radiographs. Since the period of inclusion, thisrate has declined considerably, leading to a better utility ofthe current classification system at our department.Furthermore, the current study used raters with differentlevels of experience, leading to more generalizable results.
This study was performed in a relatively small samplesize, which might contribute to sampling bias. Neverthe-less, power estimation was performed before the study andthe presented selection process of a consecutive cohortcontributes to minimize the risk of this type of bias.
Furthermore, all ratings in the current study took placein unmarked digital radiographs using a semi-automaticsystem. Although the reproducibility would be greater inusing premarked radiographs, we argue that the utility ismore favorable as premarked images would not be used in aclinical setting. The use of a semiautomatic system haspreviously been investigated in patients with ASD [19] aswell as in patients with a wider range of spinal pathologies[20]. In contrast, using a fully manual method of classifyingeach radiograph could negatively affect the reproducibilityas an extra layer of potential error would be introduced. Inthe setting of the current study, the spine type was given bythe software once the reader had made his measurements.This method of measurement was chosen as we argue thatsoftware systems can easily be set for these calculationsinstead of being performed by the surgeon.
The primary objective of a classification system is toultimately improve the outcome for individual patients.Others have hypothesized that patients with differentRoussouly types may have different outcomes followingspinal surgery [3]. There is, therefore, still need for furtherstudies on how patients’ spine type may influence theoutcome following surgery. With the present study, we hopeto aid future investigations on this matter.
Conclusion
In this study of the Roussouly Classification System, wepresent moderate interrater and substantial intraraterreproducibility in a single-center cohort of 64 ASD pa-tients. We observed a trend for higher levels of reproduc-ibility among more experienced raters. We conclude thereproducibility of this system to be acceptable and com-parable to other ASD classification systems. We suggestthat future studies aim to validate these findings as well asaddress the possible influence of the system on patientoutcome following surgery.
References
[1] Roussouly P, Gollogly S, Berthonnaud E, et al. Classification of the
normal variation in the sagittal alignment of the human lumbar spine
and pelvis in the standing position. Spine (Phila Pa 1976) 2005;30:
346e53.
[2] Laouissat F, Sebaaly A, Gehrchen M, et al. Classification of normal
sagittal spine alignment: refounding the Roussouly classification. Eur
Spine J 2018;27:2002e11.[3] Laouissat F, Scemama C, Del�ecrin J. Does the type of sagittal spinal
shape influence the clinical results of lumbar disc arthroplasty? Or-
thop Traumatol Surg Res 2016;102:765e8.[4] Horton WC, Brown CW, Bridwell KH, et al. Is there an optimal pa-
tient stance for obtaining a lateral 36" radiograph? A critical com-
parison of three techniques. Spine (Phila Pa 1976) 2005;30:
427e33.
[5] Kottner J, Audige L, Brorson S, et al. Guidelines for Reporting Reli-
ability and Agreement Studies (GRRAS) were proposed. Int J Nurs
Stud 2011;48:661e71.
[6] Fleiss JL. Measuring nominal scale agreement among many raters.
Psychol Bull 1971;76:378e82.
[7] de Vet HCW, Terwee CB, Knol DL, et al. When to use agreement
versus reliability measures. J Clin Epidemiol 2006;59:1033e9.
[8] Gisev N, Bell JS, Chen TF. Interrater agreement and interrater reli-
ability: Key concepts, approaches, and applications. Res Soc Adm
Pharm 2013;9:330e8.
[9] Cohen J. Weighted kappa: nominal scale agreement with provision
for scaled disagreement or partial credit. Psychol Bull 1968;70:
213e20.
[10] Landis JR, Koch GG. The measurement of observer agreement for
categorical data. Biometrics 1977;33:159e74.[11] Schwab F, Patel A, Ungar B, et al. Adult spinal deformity postoper-
ative standing imbalance. Spine (Phila Pa 1976) 2010;35:2224e31.
[12] Gwet K. Inter-rater reliability: dependency on trait prevalence and
marginal homogeneity. Stat Methods Inter-Reliabil Assess 2002;2:9.
[13] Sim J, Wright CC. The kappa statistic in reliability studies: use, inter-
pretation, and sample size requirements. Phys Ther 2005;85:257e68.
[14] Schwab F, Ungar B, Blondel B, et al. Scoliosis research societ-
ydSchwab adult spinal deformity classification. Spine (Phila Pa
1976) 2012;37:1077e82.
317T.J. Bari et al. / Spine Deformity 7 (2019) 312e318
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
[15] Liu Y, Liu Z, Zhu F, et al. Validation and reliability analysis of the
new SRS-Schwab classification for adult spinal deformity. Spine
(Phila Pa 1976) 2013;38:902e8.[16] Nielsen DH, Gehrchen M, Hansen LV, et al. Inter- and intra-rater
agreement in assessment of adult spinal deformity using the Scoli-
osis Research Society-Schwab classification. Spine Deform 2014;2:
40e7.
[17] Ames CP, Smith JS, Scheer JK, et al. Impact of spinopelvic alignment
on decision making in deformity surgery in adults: a review.
J Neurosurg Spine 2012;16:547e64.
[18] Barrey C, Roussouly P, Perrin G, et al. Sagittal balance disorders in
severe degenerative spine. Can we identify the compensatory mech-
anisms? Eur Spine J 2011;20(suppl 5):626e33.[19] Aubin C-E, Bellefleur C, Joncas J, et al. Reliability and accuracy
analysis of a new semiautomatic radiographic measurement software
in adult scoliosis. Spine (Phila Pa 1976) 2011;36:E780e90.
[20] Maillot C, Ferrero E, Fort D, et al. Reproducibility and repeatability
of a new computerized software for sagittal spinopelvic and scoliosis
curvature radiologic measurements: Keops?? Eur Spine J 2015;24:
1574e81.
318 T.J. Bari et al. / Spine Deformity 7 (2019) 312e318
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
U N I V E R S I T Y O F C O P E N H A G E N
F A C U L T Y O F H E A L T H A N D M E D I C A L S C I E N C E S
59
Study II
Spinopelvic parameters depending on the angulation of the sacral
endplate are less reproducible than other Spinopelvic parameters in
Adult Spinal Deformity patients
Spine Deformity 7 (2019) pp. 771-778
https://doi.org/10.1016/j.jspd.2018.12.002
Spinopelvic Parameters Depending on the Angulation of the Sacral EndPlate Are Less Reproducible Than Other Spinopelvic Parameters in Adult
Spinal Deformity PatientsTanvir Johanning Bari, MDa,*, Dennis Winge Hallager, MD, PhDa, Niklas Tøndevold, MDa,
Ture Karbo, MDa, Lars Valentin Hansen, MDa, Benny Dahl, MD, PhD, DMScb,Martin Gehrchen, MD, PhDa
aSpine Unit, Department of Orthopedic Surgery, Rigshospitalet, University Hospital of Copenhagen, Copenhagen, DenmarkbDepartment of Orthopedics and Scoliosis Surgery, Texas Children’s Hospital and Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
Received 30 July 2018; revised 27 November 2018; accepted 3 December 2018
Abstract
Study Design: Reproducibility study.Objectives: To report the agreement and reliability for commonly used sagittal plane measurements.Summary of Background Data: Spinopelvic parameters and sagittal vertical axis (SVA) are commonly used parameters for preoperativeplanning and postoperative evaluation of patients with adult spinal deformity (ASD). Previous reproducibility studies have focused ondescribing the reliability using intraclass correlation coefficients (ICCs), thus quantifying the methods’ ability to distinguish betweenindividuals. To our knowledge, no previous study in patients with ASD has reported the measurement error in terms of limits of agreement.The current study aimed to report the agreement and reliability for measurements of pelvic incidence (PI), pelvic tilt (PT), sacral slope (SS),and SVA in ASD patients.Methods: In a consecutive, one-center cohort of 64 patients referred for ASD evaluation, a blinded test-retest study was performed.Reliability was assessed using ICCs, whereas 95% limits of agreement (LOAs) were used to quantify agreement.Results: We found ‘‘excellent’’ (ICC O 0.9) results in all analyses of reliability except for interrater PI, which was classified as ‘‘good’’(ICC 5 0.89). However, considerable interrater measurement error was observed for parameters depending on the angulation of the sacralend plate (95% LOA of �11� and �14� for SS and PI, respectively) compared with �5� for PT and �7 mm for SVA, which depends on thelocation of the sacral end plate. Intrarater agreement was only slightly better.Conclusion: These are to our knowledge the first estimates of measurement error for sagittal spinopelvic parameters in ASD patients.Despite near excellent ICCs, we found considerable measurement error for parameters depending on the angulation rather than the locationof the sacral end plate.Level of Evidence: Level II.� 2018 Scoliosis Research Society. All rights reserved.
Keywords: Spine; Adult; Reproducibility; Reliability; Agreement
Introduction
Radiographic measurements of the spine are widely usedto assess patients with adult spinal deformity (ASD) and
used for preoperative planning as well as a measure ofoutcome following surgical treatment. It has been estab-lished that sagittal plane balance is of more importancethan the coronal plane balance [1,2], and the most
Author disclosures: TJB (none), DWH (none), NT (none), TK (none),
LVH (none), BD (institutional grants from Medtronic, United States and
K2M, outside the submitted work), MG (institutional grants from Med-
tronic and K2M, outside the submitted work).
Funding sources: none.
IRB approval: 3-3013-2760/1.
*Corresponding author. Spine Unit, Department of Orthopedic Sur-
gery, Rigshospitalet, University Hospital of Copenhagen, Denmark Bleg-
damsvej 9, 2100 Copenhagen, Denmark. Tel.: þ45 5057 9448; fax: þ45
3545 2165.
E-mail address: [email protected] (T.J. Bari).
2212-134X/$ - see front matter � 2018 Scoliosis Research Society. All rights reserved.
https://doi.org/10.1016/j.jspd.2018.12.002
Spine Deformity 7 (2019) 771e778www.spine-deformity.org
commonly used parameters are the sagittal vertical axis(SVA) and the spinopelvic parameters: sacral slope (SS),pelvic tilt (PT), and pelvic incidence (PI) (Figure 1).Despite the reliance on these radiographic measurements,the inter- and intrarater agreement has not yet been estab-lished in ASD patients.
The terms agreement and reliability are often inter-changeably used. Agreement in terms of standard error ofmeasurement (SEM) and limits of agreement (LOA) pro-vide an absolute index of measurement accuracy [3] anddescribe the extent of disagreement between measurementsperformed on the same subject. The 95% LOA describe thedifference in measurements exceeded in 5% of pairs ofmeasurements [4] and are, therefore, important thresholdsto differentiate between actual change in repeated mea-surements from measurement error [5]. This is in contrastto reliability measures such as the widely used intraclasscorrelation coefficient (ICC). ICC represents the ratio ofwithin-subject variability and between-subject variability[3,6]. Consequently, reliability is used to evaluate an in-strument’s ability to distinguish between subjects [7-10]and not of measurement accuracy. As an umbrella term,reproducibility is used to describe both agreement andreliability [8].
Measurements on digital radiographs have previouslybeen compared with traditional manual measurements withexcellent reliability [11]. Although no previous study has,to our knowledge, reported the agreement for individualsagittal plane parameters in patients with ASD. Theobjective of this study was to provide such estimates ofmeasurement error for these parameters. The cohort ofpatients has been described in a previous study regardingthe Roussouly Classification System currently under reviewfor publication. The aim of the current study was to reportthe agreement and reliability for measurements of spino-pelvic parameters in ASD patients.
Materials and Methods
This study was performed as a test-retest study in aconsecutive single-center cohort. Relevant approval wasobtained from the national patient safety authority (ref. no.3-3013-2760/1) and local data protection agency (ref. no.RH-2017-135 I-suite no. 05498). All patients referred to theauthors’ tertiary outpatient clinic for evaluation of ASDduring a 6-month period from August 1, 2013, wereconsidered for inclusion. Only adults (>18 years of age)were includeddimages were taken standing with ‘‘fist-on-clavicles,’’ including both femoral heads and with all ver-tebrates from C7 to S1 visible. Patients with previous spinalinstrumentation, fracture, or neuromuscular disease andpatients with inadequate images were excluded. A total of803 patients with long standing radiographs were screenedfor inclusion, of which 253 were adults with no prior his-tory of spinal instrumentation. A further 178 patients metexclusion criteria because of previous fracture or neuro-logic disease (n 5 55), patient already included (n 5 36),or images without C7 or both femoral heads (n 5 88),leaving 74 patients for inclusion. Ten cases were used fortraining purposes and thus 64 cases were available for finalanalysis. Demographics are shown in Table 1. Eight sets ofeach patient’s radiographs were transferred to the onlinedata image management system KEOPS (SMAIO, Lyon,France). All radiographs were presented to the raters asunmarked digital films, coded with a unique identificationkey. The raters were four surgeons. Rater 1: staff specialistwith one year experience in spinal surgery. Rater 2: staffspecialist with two years of experience. Rater 3 and 4:senior spine surgeons with more than 20 years of experi-ence each. For each reading session, all four raters wereasked to identify certain key points on each radiograph.These included both femoral heads, the end plate of S1, aswell as each vertebral body from C7 to L5; following thesemarkings, the system automatically detected the locationand shape of vertebral bodies and the raters adjusted theseidentifications when appropriate. A computerized modelthen calculated the SVA and spinopelvic parameters.Before the first reading session, 10 cases were rated for
Fig. 1. Graphical description of calculating the three spinopelvic
parameters.
Table 1
Demographics (N 5 64 cases).
Age, y 44.3 (�20.9)
Sex (male) 44 (68.8%)
Pelvic incidence* 51.3 (�14.5)
Pelvic tilt* 16.5 (�13.1)
Sacral slope* 34.8 (�12.5)
SVA* 7.0 [e8.2, 23.9]
SVA !5.0 cm* 465 (90.6%)
SVA >5.0 cm* 47 (9.4%)
SVA, sagittal vertical axis.
Data are mean (standard deviation), median [interquartile range], or
number (%).* Radiographic parameters are calculated based on all 512 entries (64
cases rated twice by 4 raters).
772 T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
training purposes. These cases were not included in thefinal analysis. Finally, the cases were presented in scram-bled order between reading sessions. The raters were givena 14-day deadline to perform each reading with a 14-daydelay between the two readings. All four raters were blin-ded from the results of the ratings as well as patient details.
Statistics
The language and environment for statistical computing,R version 3.4.3, was used for all statistical analyses (RDevelopment Core Team, 2011, Vienna, Austria). Distri-bution of data were assessed using histograms and reportedas proportions (%), means with standard deviations (SDs),or medians with interquartile ranges (IQRs). ICCs wereused to describe reliability for numeric variables using thepackage ‘‘irr’’ version 0.84. Two-way random effectsmodel, single measure, absolute agreement (ICCA,1) wasused for interrater reliability and two-way mixed effects,single measure, absolute agreement (ICCA,1) for intraraterreliability [12-14]. ICCs were classified as poor (<0.50),moderate (0.50-0.75), good (0.76-0.90), and excellent(>0.90) [8]. When describing agreement, the SEM and95% LOA were calculated. A linear mixed effects modelwith the software package ‘‘lme4’’ version 1.1-15 was usedto calculate the variances. SEM was calculated as SEM
agreement [7,15] as described by Popovi�c et al. [6]. As thenumber of raters in the current study was greater than two,the 95% LOA was calculated using the variances derivedfrom the aforementioned model as described by Bland andAltman [4] and further exemplified by Hopkins et al. [16]. pvalues <.05 were considered significant. A biostatisticianwas consulted before the study, and aided in the perfor-mance of all statistical analysis.
Results
Reliability
ICC was classified as ‘‘excellent’’ (ICC O0.90) forinter- and intrarater reliability of PI, PT, SS, and SVAexcept for interrater reliability of PI, which was classifiedas ‘‘good’’ (ICC 0.89, 95% CI 0.85-0.92) (Tables 2 and 3).
Agreement
The least measurement error and thus the highest degreeof agreement was observed for PT with 95% LOA of �4.9�
and �4.4� compared with SS with 95% LOA of �10.9� and�9.3� and PI with 95% LOA of �13.4� and �12.0� forinter- and intrarater agreement, respectively. In addition,95% LOA for SVA was �6.9 mm and �5.8 mm,
Table 2
Interrater reliability of spinopelvic parameters.
Reading 1 Reading 2 Both readings combined
Pelvic incidence 0.87 (0.80e0.91) 0.91 (0.87e0.94) 0.89 (0.85e0.92)
Pelvic tilt 0.98 (0.97e0.99) 0.98 (0.97e0.99) 0.98 (0.98e0.99)
Sacral slope 0.89 (0.84e0.93) 0.91 (0.87e0.94) 0.90 (0.87e0.93)SVA 0.99 (0.99e1.00) 0.99 (0.99e1.00) 0.99 (0.99e0.99)
SVA, sagittal vertical axis.
Data are intraclass correlation coefficient (95% confidence interval).
Table 3
Intrarater reliability of spinopelvic parameters.
Rater 1 Rater 2 Rater 3 Rater 4 All raters
Pelvic incidence 0.83 (0.73e0.89) 0.94 (0.85e0.97) 0.94 (0.90e0.96) 0.95 (0.90e0.96) 0.91 (0.89e0.93)
Pelvic tilt 0.96 (0.94e0.98) 0.99 (0.98e1.00) 0.99 (0.99e1.00) 0.99 (0.99e1.00) 0.99 (0.98e0.99)
Sacral slope 0.88 (0.80e0.92) 0.93 (0.86e0.96) 0.95 (0.92e0.97) 0.96 (0.93e0.97) 0.93 (0.93e0.94)SVA 0.99 (0.99e1.00) 1.00 (0.99e1.00) 1.00 (0.99e1.00) 1.00 (0.99e1.00) 1.00 (0.99e1.00)
SVA, sagittal vertical axis.
Data are intraclass correlation coefficient (95% confidence interval).
Table 4
Agreement of spinopelvic parameters.
Interrater agreement Intrarater agreement
SEM LOA SEM LOA
Pelvic incidence 4.9 � �13.4 � 4.3 � �12.0 �
Pelvic tilt 1.8 � �4.9 � 1.6 � �4.4 �
Sacral slope 3.9 � �10.9 � 3.3 � �9.3 �
SVA (mm) 2.5 �6.9 2.1 �5.8
LOA, limits of agreement; SEM, standard error of measurement; SVA, sagittal vertical axis.
Data are degrees or distance (mm).
773T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
Fig. 2. Bland-Altman plots of intra-rater agreement. Each dot illustrates the difference between any two measurements performed by the same rater. LOA
indicates the 95% limits of intra-rater agreement derived from the linear mixed effects model.
Fig. 3. Bland-Altman plots of inter-rater agreement across both readings. Each dot illustrates the difference between each measurement and the mean of all 4
raters. LOA indicates the 95% limits of inter-rater agreement derived from the linear mixed effects model.
774 T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
respectively (Table 4). Figure 2 shows Bland-Altman plots(BA-plots) of the intrarater agreements. Figure 3 showsmodified Bland-Altman plots of the interrater agreementsplotting the difference between each measurement and themean of all four raters according to Jones et al. [17].Finally, interrater agreement was further analyzed bycomparing each pair of raters (Table 5).
Discussion
The current results showed excellent reliability of PT,SS, and SVA, while the reliability of PI was consideredgood. These results are in accordance with previous studies[11,18-20]. The interrater 95% LOA was �4.9� for PT,�10.9� for SS, �13.4� for PI, and �6.9 mm for SVA,whereas intrarater LOAs were slightly better. Thus, 19 of20 pairs of measurements on a single radiograph werecloser than these limits, which we argue has to be exceededfor any difference in serial radiographs to be consideredmore than measurement error.
We found profound differences in overall measurementerror for the three angle parameters (PT vs. PI and SS). Theleast measurement error was found for PT, which isdependent on the location of the center of the sacral endplate, whereas SS and PI are dependent on the sacral endplate slope (Figure 1). Thus, minor changes in the measuredslope of sacral end plate can vastly influence the magnitudeof PI and SS compared with PT. PI and SS yielded thelargest absolute differences between any two measurementson the same radiograph (Figure 2). This pattern was alsoreported by Pernaa et al. [21] in patients following total hiparthroplasty. The vast majority of the differences in ourstudy were due to disagreement in selecting the sacral endplate. When two raters had selected the end plate differ-ently, up to 30� in disagreement was observed for PI and
SS, with only a minor difference in PT. This largely ex-plains the discrepancy in LOA between the spinopelvicparameters and thus highlights a seemingly unrecognizedchallenge in ASD radiographic measurements as the sacralend plate often is difficult to identify because of degener-ative changes, image quality, and concomitant pelvicobliquity. Figures 4 and 5 portrays the difference in diffi-culty of accurately identifying key sagittal radiographiclandmarks in patients with and without pelvic obliquity.Furthermore, no pre-markings were undertaken in the cur-rent study, resulting in measurement error arguably closerto everyday practice compared with studies using pre-marked images.
Interrater LOAwas also calculated for each pair of raters(Table 5). There was tendency toward better agreement forthe more experienced raters (Rater 3 and 4) compared withthe two less experienced. However, this difference was notfurther analyzed for statistical significance and should beinterpreted with caution.
Other studies of agreement of sagittal spinopelvic pa-rameters include the study by Aubin et al. [19], carried outusing 6 cases of realistic spine prototypes from patientswith ASD. A precise coordinate measuring machine wasused to calculate a gold standard compared with subsequentmeasurements of radiographs. Agreement was quantifiedthrough SD of mean differences between ratings. However,Bland and Altman suggest that an ANOVA is performedwhen comparing more than two measurements as the SD ofdifference will be reported smaller than the actual error as aresult of removal of part of the error [4,22]. Lafage et al.[23] validated a new computer-assisted tool to measurespinal parameters. However, all radiographs were pre-marked, thus removing a lot of the measurement error.Although several reliability studies on spinopelvic param-eters have been published, none explicitly concern ASD
Table 5
Interrater agreement for each pair of raters.
Rater 2 Rater 3 Rater 4
Pelvic incidence (overall interrater LOA: �13.4)
Rater 1 �15.2 � �14.2 � �13.6 �
Rater 2 d �12.8 � �12.8 �
Rater 3 d d �12.0 �
Pelvic tilt (overall interrater LOA: �4.9)
Rater 1 �5.4 � �5.4 � �5.3 �
Rater 2 d �5.0 � �3.5 �
Rater 3 d d �4.5 �
Sacral slope (overall interrater LOA: �10.9)
Rater 1 �12.1 � �11.3 � �10.6 �
Rater 2 �10.5 � �11.7 �
Rater 3 �9.0 �
Sagittal vertical axis (overall interrater LOA: �6.9)
Rater 1 �6.2 mm �6.7 mm �8.3 mm
Rater 2 �6.1 mm �7.8 mm
Rater 3 �6.1 mm
LOA, limits of agreement.
Data represent interrater LOA when comparing pairs of raters.
775T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
patients and adhere to measurement error calculationssuggested by Bland and Altman [3,4,6,8,15,17,22]. Pernaaet al. [21] performed a reproducibility study of radiographicmeasurements in patients following THA and used BAplots to describe agreement. The 95% LOA values are notgiven but can be read from the BA plots or calculated fromthe SDs of mean differences between readings (95% LOAfor PT 5 �5.9� and �4.2�, for SS 5 �8.4� and �7.6�, andfor PI 5 �10.0� and 7.2� for inter- and intrarater agree-ment, respectively). Kim et al. [18] presented the results ofa reproducibility study of the same four radiographic pa-rameters as the current study in patients with variousdegenerative lumbar spinal disorders. Results were givenonly as intrarater 95% LOA for manual and computer-assisted measurements (PT: �1.4� to �6.0�, SS: �1.6� to�6.3�, PI: �1.0� to �7.3�, and SVA: �1.4 to �8.3 mm).No estimate of interrater agreement was given. Both studiespresent marginally lower estimates of measurement errorcompared with the results of the current study, which couldbe caused by the difference in spine pathology be-tween studies.
Comparing our results to the generally accepted mea-surement error of �5� for the coronal Cobb angle, theliterature suggest that this assumed error is not well
founded. The systematic review by Langensiepen et al. [24]state that the few studies on agreement were mostly notconsistent in reporting the same agreement parameter, onlyreported intrarater agreement, or had implemented verte-bral pre-selection for Cobb angle calculations which aspreviously mentioned is likely to exclude part of the mea-surement error. In their review of 11 selected studies, 9included adolescent or infantile scoliosis patients and 2 didnot specify age characteristics. They reported interrater95% LOA ranging from �3.6� to �8.3� and intrarater 95%LOA from �1.3� to �4.5�.
Strengths and limitations
In the current study, all cases were enrolled consecu-tively at a single center; however, many were excludedbecause of inadequate availability of radiographs, leaving arelatively small sample for inclusion. This is indeed sub-optimal and indicates low validity. New protocols havesince been introduced to reduce the rate of insufficient ra-diographs in collaboration with the local radiologydepartment. This has led to a significant reduction in ra-diographs lacking either C7 or femoral heads. Other centersundergoing similar difficulties are suggested to perform
Fig. 4. Coronal and sagittal radiographs of patient with pelvic obliquity illustrating the challenges in correctly identifying key landmarks.
776 T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
quality assessment of performed radiographs to assess andimplement newer protocols to diminish the rate of insuffi-cient radiographs. Furthermore, the current study utilized asemiautomatic system for all radiographic measures.Although the surgeon had to identify all anatomic land-marks, the system calculated the radiographic parameters.This eliminates the risk of mistakes during calculations.Therefore, measurement error may well be even moreprominent when calculating spinopelvic parameters indigital radiographs using ‘‘manual’’ methods.
On the other hand, a fully disclosed selection process ispresented, which we argue contribute to lessening the riskof bias. Additionally, the use of mixed effect models tocalculate the LOA makes it independent of the specificICC. This, combined with results of SEM, allows fordirect comparison to future studies. Finally, the currentstudy was performed by surgeons with different levels ofexperience. As radiographic measurements in a clinicalsetting often is performed by observers of varying levels of
training, we argue that this should also be the case whenperforming a reproducibility study, thus increasing theexternal validity.
Conclusion
In this reproducibility study of the measurement accu-racy of spinopelvic parameters, we found noticeably largermeasurement errors in parameters depending on the angu-lation of the sacral end plate (SS: �11�; PI: �14�)compared with parameters merely depending on the loca-tion of the sacral end plate (PT: �5�; SVA: �7 mm).
This was despite excellent or near excellent ICCs for allparameters. These are to our knowledge the first estimatesof agreement according to Bland and Altman for spino-pelvic parameters in ASD patients, and we encourage sur-geons to account for the considerable measurement error inparameters depending on the angulation of the sacral endplate when comparing repeated measurements.
Fig. 5. Coronal and sagittal radiographs of patient without pelvic obliquity.
777T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
References
[1] Glassman SD, Berven S, Bridwell K, Horton W, Dimar JR. Correla-
tion of radiographic parameters and clinical symptoms in adult scoli-
osis. Spine (Phila Pa 1976) 2005;30:682e8.
[2] Glassman SD, Bridwell K, Dimar JR, Horton W, Berven S, Schwab F.
The impact of positive sagittal balance in adult spinal deformity.
Spine (Phila Pa 1976) 2005;30:2024e9.
[3] Weir JP. Quantifying test-retest reliability using the intraclass correla-
tion coefficient and the SEM. J Strength Cond Res 2005;19:231e40.
[4] Bland JM, Altman DG. Measuring agreement in method comparison
studies. Stat Methods Med Res 1999;8:135e60.[5] Eliasziw M, Young SL, Woodbury MG, Fryday-Field K. Statistical
methodology for the concurrent assessment of interrater and intrarat-
er reliability: using goniometric measurements as an example. Phys
Ther 1994;74:777e88.
[6] Popovi�c ZB, Thomas JD. Assessing observer variability: a user’s
guide. Cardiovasc Diagn Ther 2017;7:317e24.
[7] Guyatt G, Walter S, Norman G. Measuring change over
timedaseessing the usefulness of evaluative instruments. J Chronic
Dis 1987;40:171e8.
[8] de Vet HCW, Terwee CB, Knol DL, Bouter LM. When to use agree-
ment versus reliability measures. J Clin Epidemiol 2006;59:1033e9.[9] Kottner J, Streiner DL. The difference between reliability and agree-
ment. J Clin Epidemiol 2011;64:701e2.
[10] Kottner J, Audige L, Brorson S, Donner A, et al. Guidelines for Re-
porting Reliability and Agreement Studies (GRRAS) were proposed.
Int J Nurs Stud 2011;48:661e71.
[11] Maillot C, Ferrero E, Fort D, et al. Reproducibility and repeatability of a
new computerized software for sagittal spinopelvic and scoliosis curva-
ture radiologic measurements: Keops?? Eur Spine J 2015;24:1574e81.
[12] Koo TK, Li MY. A guideline of selecting and reporting intraclass cor-
relation coefficients for reliability research. J Chiropr Med 2016;15:
155e63.
[13] McGraw KO, Wong SP. Forming inferences about some intraclass
correlations coefficients. Psychol Methods 1996;1:30e46.
[14] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater
reliability. Psychol Bull 1979;86:420e8.
[15] Bartlett JW, Frost C. Reliability, repeatability and reproducibility:
Analysis of measurement errors in continuous variables. Ultrasound
Obstet Gynecol 2008;31:466e75.[16] Hopkins WG. Measures of reliability in sports medicine and science.
Sports Med 2000;30:375e81.
[17] Jones M, Dobson A, O’brian S. A graphical method for assessing
agreement with the mean between multiple observers using contin-
uous measures. Int J Epidemiol 2011;40:1308e13.
[18] Kim CH, Chung CK, Hong HS, et al. Validation of a simple comput-
erized tool for measuring spinal and pelvic parameters. J Neurosurg
Spine 2012;16:154e62.[19] Aubin C-E, Bellefleur C, Joncas J, et al. Reliability and accu-
racy analysis of a new semiautomatic radiographic measurement
software in adult scoliosis. Spine (Phila Pa 1976) 2011;36:
E780e90.
[20] Berthonnaud E, Labelle H, Roussouly P, et al. A variability study of
computerized sagittal spinopelvic radiologic measurements of trunk
balance. J Spinal Disord Tech 2005;18:66e71.[21] Pernaa K, Sepp€anen M, M€akel€a K, Saltychev M. Reliability of
sagittal spinopelvic alignment measurements after total hip arthro-
plasty. Clin Spine Surg 2017;30:E909e14.
[22] Martin Bland J, Altman D. Statistical methods for assessing agree-
ment between two methods of clinical measurement. Lancet
1986;327:307e10.
[23] Lafage R, Ferrero E, Henry JK, et al. Validation of a new computer-
assisted tool to measure spino-pelvic parameters. Spine J 2015;15:
2493e502.
[24] Langensiepen S, Semler O, Sobottke R, et al. Measuring procedures
to determine the Cobb angle in idiopathic scoliosis: A systematic re-
view. Eur Spine J 2013;22:2360e71.
778 T.J. Bari et al. / Spine Deformity 7 (2019) 771e778
U N I V E R S I T Y O F C O P E N H A G E N
F A C U L T Y O F H E A L T H A N D M E D I C A L S C I E N C E S
68
Study III
Ability of the Global Alignment and Proportion score to predict
mechanical failure following Adult Spinal Deformity surgery
– Validation in 149 patients with two-year follow-up
Spine Deformity 7 (2019) 331-337
https://doi.org/10.1016/j.jspd.2018.08.002
Ability of the Global Alignment and Proportion Score to PredictMechanical Failure Following Adult Spinal Deformity
SurgerydValidation in 149 Patients With Two-Year Follow-upTanvir Johanning Bari, MDa,*, Søren Ohrt-Nissen, MD, PhDa, Lars Valentin Hansen, MDa,
Benny Dahl, MD, PhD, DMScb, Martin Gehrchen, MD, PhDa
aSpine Unit, Department of Orthopedic Surgery, University Hospital of Copenhagen, Copenhagen 2100, DenmarkbDepartment of Orthopedics and Scoliosis Surgery, Texas Children’s Hospital and Baylor College of Medicine, Houston, TX 77030, USA
Received 4 June 2018; revised 27 July 2018; accepted 4 August 2018
Abstract
Study Design: Retrospective analysis of prospectively collected data.Objectives: To validate the Global Alignment and Proportion (GAP) score in a single-center cohort of adult spinal deformity (ASD)patients.Summary of Background Data: Surgical treatment for ASD is associated with a high risk of mechanical failure and consequent revisionsurgery. To improve prediction of mechanical complications, the GAP score was developed with promising results. Development was basedon the assumption that not all patients would benefit from the same fixed radiographic targets as pelvic incidence is an individual,morphological parameter that greatly influences the sagittal curves of the spine.Methods: In a validation study of the GAP score, patients undergoing ASD surgery with four or more levels of instrumentation wereconsecutively included at a tertiary spine unit. Patients were followed for a minimum of two years. Pre- and postoperative GAP score andcategories were calculated for all patients, and the association with mechanical failure and revision surgery was analyzed.Results: A total of 149 patients with a mean age of 57.4 years were included. Overall, the rates of mechanical failure and revision surgerywere 51% and 35% respectively. The area under the curve (AUC) using receiver operating characteristic was classified as ‘‘no or lowdiscriminatory power’’ for the GAP score in predicting either outcome (AUC 5 0.50 and 0.49, respectively). Similarly, there were nosignificant associations between GAP categories and the occurrence of mechanical failure or revision surgery when using Cochran-Armitage test of trend (p 5 .28 for mechanical failure and p 5 .58 for revision surgery).Conclusions: In a consecutive series of surgically treated ASD patients, we found no significant association between postoperative GAPscore and mechanical failure or revision surgery. Despite minor limitations in similarities to the original study cohort, further validationstudies or adjustments to the original scoring system are proposed.Level of Evidence: Level II.� 2018 Scoliosis Research Society. All rights reserved.
Keywords: Adult spinal deformity; Classification; Validation studies; Postoperative complications; Sagittal alignment; Reproducibility of results; Spinal
fusion; Radiography
Introduction
Adult spinal deformity (ASD) is a complex entity thatcan significantly impact health-related quality of life [1,2].Surgical treatment in selected patients has demonstratedbeneficial results compared with nonsurgical treatment[3,4], and the most common indications for surgery are painand disability [5]. However, ASD surgery is associated witha high risk of postoperative complications [6,7], and revi-sion surgery may be necessary, at times as a result of so-
Author disclosures: TJB (none), SO-N (institutional grant from K2M
outside the submitted work), LVH (none), BD (grants from K2M and Med-
tronic, outside the submitted work), MG (grants from K2M and Medtronic,
outside the submitted work).
Ethics approval: The study was approved by the Danish Data Protec-
tion Agency and the Danish Patient Safety Authority.
*Corresponding author. Department of Orthopedic Surgery, Spine
Unit, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark. Tel.:
þ45 5057 9448; fax: þ45 3545 2165.
E-mail address: [email protected] (T.J. Bari).
2212-134X/$ - see front matter � 2018 Scoliosis Research Society. All rights reserved.
https://doi.org/10.1016/j.jspd.2018.08.002
Spine Deformity 7 (2019) 331e337www.spine-deformity.org
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
called mechanical failure [8-13]. The SRS-Schwab classi-fication was developed to guide surgical decision makingand establish targets for surgical results that would mini-mize postoperative disability [14]. The classification wasbased on health-related quality of life measures and has notbeen shown to reliably predict mechanical failure [13].Yilgor et al. [15] recently developed the Global Alignmentand Proportion (GAP) score, which is a predictive modelfor mechanical complications based on spinopelvic bal-ance. Pelvic incidence (PI) is a fixed morphologic param-eter that varies substantially between patients. Themorphology of the pelvis greatly influences the sagittalcurves of the spine, including the individual variability ofthe sacral slope (SS) and lordosis curve [16,17]. The GAPscore is based on the assumption that not all patients wouldbenefit from the same fixed radiographic targets. The scorewas developed in a cohort of ASD patients from the Eu-ropean Spine Society Group (ESSG) database. The GAPscore comprises 4 radiographic parameters calculated as thedifference between measured and ideal angle (Fig. 1).Relative pelvic version is calculated as the difference be-tween the measured angle and an ideal measure based onthe individual PI. Relative lumbar lordosis is calculated in
an equivalent manner. The authors calculated the idealangle for each measure using linear regression models ofthe relationship to PI from data of asymptomatic in-dividuals. The Lordosis Distribution Index characterizes thedisposition of the lumbar lordosis while the relative spi-nopelvic alignment is an estimate of the global sagittalalignment (Fig. 1). After adding a variable for age, a scorebetween 0 and 13 is given and patients are further sub-categorized into three categories (Fig. 2). After developingthe model, the authors validated the system in a new cohortof patients using receiver operating characteristics (ROC)and Cochran-Armitage test of trend with promising results.The aim of the current study was to validate the GAP scoreas predictor of mechanical failure and revision surgery in asingle-center cohort of patients with ASD.
Methods
All adult patients (>18 years of age) undergoing surgeryfor ASD in a three-year period from January 1, 2013, werescreened for inclusion. A total of 186 patients were iden-tified, and all were treated at a single tertiary spine unit.Patients with less than two years of follow-up (n 5 37)
Fig. 1. Methods for calculating the radiographic components of the Global Alignment and Proportion score. Relative pelvic version, relative lumbar lordosis,
and relative spinopelvic alignment are calculated based on the measured angle in relation to the ideal angle.
332 T.J. Bari et al. / Spine Deformity 7 (2019) 331e337
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
were excluded, leaving 149 for final inclusion. Long-standing digital radiographs were available for all patientspre- and postoperatively, as well as at 1- and 2-year follow-up. Demographics were obtained from medical records.Radiographic measurements were performed using thevalidated online data image management system KEOPS(SMAIO, Lyon, France) [18]. Relative pelvic version, rela-tive lumbar lordosis, Lordosis Distribution Index, and rela-tive spinopelvic alignment were calculated from radiographsin accordance to the original work (Fig. 1). Patients wereclassified into the three subcategories: proportioned,moderately disproportioned, and severely disproportioned(Fig. 2). Mechanical failure was defined as proximal junc-tional kyphosis (PJK), proximal junctional failure, distaljunctional failure, rod breakage, or other. Other implant-related complications included screw breakage, loosening,or pullout or set-screw dislodgement. PJK was defined as a>10� increase in kyphosis angle between the upper instru-mented vertebra (UIV) and 2 vertebrae above UIV (UIVþ2).Proximal junctional failure was defined as PJK along withone or more of the following: fracture of the UIVor UIVþ1,
posterior osseoligamentous disruption, or instrumentationpullout at the UIV [19]. Revision surgery was defined asreoperation within two years of index surgery because ofmechanical failure. A subset of patients have been describedin a previous study [20].
Statistics
All statistical analyses were performed using R, version3.4.3 (R Development Core Team, Vienna, Austria). Dis-tribution of data was assessed using histograms and re-ported as proportions (%), means with standard deviations(SDs), or medians with interquartile ranges. The Mann-Whitney U test was used when comparing numeric non-Gaussian distributed data. Pearson c2 test of indepen-dence or trend across ordinal categories was used whencomparing categorical data, and Fisher exact test was usedwhen the expected frequency was below 5 for more than20% of the compared groups. ROC curves were used tocalculate AUC in assessing the GAP score’s diagnosticaccuracy of predicting mechanical failure or revision
Fig. 2. Each component of the Global Alignment and Proportion score includes a value based on the radiographic parameters as depicted in Fig. 1 and an age
component. Patients are given a score for each parameter, which are summarized as a total score (range 0e13) and further subcategorized. RPV, relative
pelvic version; RLL, relative lumbar lordosis; LDI, lumbar distribution index; RSA, relative spinopelvic alignment.
333T.J. Bari et al. / Spine Deformity 7 (2019) 331e337
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
surgery [21,22]. Diagnostic accuracy was classified as ‘‘noor low discriminatory power’’ for an AUC of 0.5e0.7,‘‘moderate discriminatory power’’ for an AUC of 0.7e0.9and ‘‘high discriminatory power’’ for an AUC >0.9 [23].Additionally, GAP categories were analyzed for associationto the two outcome variables using Cochrane-Armitage testof trend and presented as odds ratios (ORs) with 95% CIs[24,25]. Furthermore, GAP score and categories wereanalyzed for association with revision surgery or mechan-ical failure using logistic regression models. p values <.05were considered significant.
Before the study, approvals were obtained from theregional data protection agency and national patient safetyauthority: journal number 2012-58-0004 and 3-3013-1980/1.
Results
The current study included a consecutive cohort of 149surgically treated ASD patients. The mean age was 57.4(�15.9) years, patients were followed for a median of 32months (range: 24e57), and 88 (59%) had a history ofprevious spine surgery. Demographics and complicationsare shown in Table 1. Patients attained a mean GAP scorecorrection of �4.1 (�4.1). In 86 (58%) cases, the surgicalprocedure included a three-column osteotomy (3CO).Seventy-six (51%) patients had one or more mechanicalcomplications, and 52 (35%) underwent revision surgerybecause of mechanical complication within two years of theindex procedure. There were 4 (2.2%) deaths during thestudy period in the initial cohort of 181 patients, and nonewere included in the final cohort. We found no associationbetween postoperative GAP score and mechanical failure orrevision surgery using the Mann-Whitney U test (p 5 .84and p 5 .97, respectively). Similar findings were presentfor the three subcategories when using the Pearson c2 test(p 5 .39 and p 5 .84 respectively) (Fig. 3). Using receiveroperating characteristic curves, we found no or lowdiscriminatory power for both mechanical failure (AUC:0.50, 95% CI: 0.40e0.59) and revision surgery (AUC: 0.49,95% CI: 0.39e0.59) (Fig. 4). When assessing subcategoriesin relation to mechanical failure or revision surgery, nosignificant effect was found in Cochran-Armitage test oftrend (p 5 .58 and p 5 .28, respectively). Logisticregression showed no significant association between eitherGAP score or categories and the two outcome vari-ables (Table 2).
Discussion
The present study found no significant association be-tween either GAP score or GAP categories in relation tomechanical failure or revision surgery in patients withASD. Analyses were performed using ROC curves,Cochran-Armitage test of trends, and logistic regres-sion models.
The present study was designed to validate the recentlypresented classification system in ASD patients. Patientsundergoing surgery were consecutively included to estab-lish the current cohort and no attempts were made to matchthe current cohort with that in the original study. Ourdepartment is a tertiary spine unit, and a majority ofreferred patients were complex cases in contrast to patientsin the original work who were included from the ESSGdatabase. In comparing the two cohorts, we found smalldifferences in age (57.4 vs. 50.7 years) and sex distribution(70.5 vs. 80.0% females). Most noticeable was the pro-portion of patients with a history of previous spine surgery(59% in the current study vs. 30% in the original cohort).Mean postoperative GAP score was 4.8 (range: 0e12)compared with 4.0 (range: 0e13) in the original cohort.Information regarding the extent of 3COs (58% in thecurrent study) and number of instrumented vertebrae werenot presented in the original validation cohort. Rates ofmechanical failure and revision procedures were slightlyhigher in the present study, which may be attributed to ahigh rate of 3CO procedures [6,13,26].
As 37 of the 186 identified ASD patients were lost tofollow-up, the risk of selection bias should be considered.Although an inclusion rate of 80% is not ideal, we arguethat the one-center and consecutive fashion of inclusionminimizes the risk of this type of bias.
The use of any classification system relies on its abilityto propose treatment strategy and accurately predictoutcome, ultimately minimizing the risk of postoperative
Table 1
Demographics
Age, years, mean (SD) 57.4 (15.9)
Instrumented vertebrae, mean (SD) 12.0 (3.5)
Sex, female 105 (70.5)
Previous spine surgery 88 (59.1)
3-column osteotomy 86 (57.7)
Follow-up, months, median [IQR] 32 [24e57]Mechanical failure 76 (51.0)
Proximal junctional kyphosis 4 (2.7)
Proximal junctional failure 10 (6.7)
Distal junctional failure 1 (0.7)
Rod breakage 57 (38.3)
Other mechanical failure 17 (11.4)
Revision surgery 52 (34.9)
Preoperative GAP score, median [IQR] 10 [6e12]
Postoperative GAP score, median [IQR] 4 [2-7]
Preoperative GAP category
Proportioned 7 (4.7)
Moderately disproportioned 32 (21.5)
Severely disproportioned 110 (73.8)
Postoperative GAP category
Proportioned 40 (26.8)
Moderately disproportioned 64 (43.0)
Severely disproportioned 45 (30.2)
Change in GAP score, mean (SD) �4.1 (4.1)
GAP, Global Alignment and Proportion; IQR, interquartile range; SD,
standard deviation.
Values are n (%) unless otherwise noted.
334 T.J. Bari et al. / Spine Deformity 7 (2019) 331e337
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
complications. We therefore find the seminal work of Yil-gor et al. [15] of considerable relevance as such a predictionmodel has not previously been published in patients withASD. Results were promising, with an AUC in correlationto mechanical failure of 0.92 (standard error [SE]: 0.0034,p ! .001, 95% CI: 0.85e0.98). However, the results of thepresent study showed no significant association between theGAP score and mechanical failure and thus did not confirmthe findings of the original study. Given that the currentstudy population differed slightly from the original cohort,
caution is warranted regarding the interpretation of thesefindings. The current cohort reflects the heterogeneity ofASD patients and as such was not designed to replicate theoriginal study but to assess whether the promising results ofthe GAP score had external validity in a non-selected ASDcohort. The rate of complications in the present study washigher than previously described following ASD surgery[6] and could have influenced the results.
The GAP score was developed on the basis of theassumption that PI is an individual morphologic parameter
Fig. 3. Boxplot of Global Alignment and Proportion (GAP) category distribution by mechanical failure and revision surgery.
Fig. 4. Receiver operating characteristic curves illustrating the diagnostic accuracy of the Global Alignment and Proportion (GAP) score in predicting me-
chanical failure and revision surgery. An area under the curve (AUC) of !0.7 indicates ‘‘no or low discriminatory power’’ of diagnostic accuracy.
335T.J. Bari et al. / Spine Deformity 7 (2019) 331e337
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
affecting all other spinal and pelvic parameters. The sys-tem was therefore established by comparing differentmeasured spinopelvic parameters in relation to calculatedtarget values based on the individual PI and takes intoaccount the distribution of the lumbar lordosis, sagittalbalance, magnitude of SS and lumbar lordosis in relationto PI, and age. However, previous studies have suggestedthat there are multiple factors associated with failure.Nicholls et al. [27] found that pelvic tilt (PT) and thoracickyphosis (TK) significantly increased the risk of me-chanical failure following ASD surgery. The relationship:PI 5 SSþPT, is well established, and although both SSand PI are included in the GAP model, PT is known to bea reliable estimator of pelvic retroversion and therebycompensation of sagittal imbalance [28]. Additionally,Nicholls et al. reported that females developed PJK in ahigher rate than men. In a competing risk model of me-chanical failure, Hallager et al. [13] also reported that themagnitude of thoracic kyphosis significantly influencedthe rate of postoperative failure. Furthermore, the studyfound that larger operative changes, in particular in lum-bar lordosis, significantly increased the risk. Further as-pects in relation to mechanical failure are the number ofinstrumented vertebrae and instrumentation ending at thesacrum that have previously been correlated to risk ofmechanical complications [29,30]. Obesity, osteoporosis,severity of ASD, minimally invasive surgery, and musclevolume has also been associated with changed risk ofmechanical failure [31-33]. In addition, questions havebeen raised whether or not the presence of mechanicalfailure correlates to patient-reported outcome measures[34], which further complicates the clinical significance ofthese radiographic outcomes.
To our knowledge, the present study is the first valida-tion study of the GAP score outside of the original work.The results did not confirm the findings of the originalstudy. Identifying patients at risk of mechanical failure andrevision surgery is a key task of any spine surgeon. How-ever, ASD patients are characterized by a substantial het-erogeneity, and predictive classifications may need toaddress this variation even further. Due care is thereforesuggested in clinical use of the GAP score in its presentform until appropriate modifications or future validationstudies are presented.
Conclusions
This longitudinal, single-center study on ASD patientsfound no significant association between postoperativeGAP score and mechanical failure or revision surgery.Despite limitations in case similarities, further adjustmentsto the classification system or future validation studies aresuggested before clinical implementation.
References
[1] Pellis�e F, Vila-Casademunt A, Ferrer M, et al. Impact on health
related quality of life of adult spinal deformity (ASD) compared with
other chronic conditions. Eur Spine J 2014;24:3e11.
[2] Baldus C, Bridwell KH, Harrast J, et al. Age-gender matched com-
parison of SRS instrument scores between adult deformity and
normal adults: are all SRS domains disease specific? Spine (Phila
Pa 1976) 2008;33:2214e8.
[3] Bridwell KH, Glassman S, Horton W, et al. Does treatment (nonop-
erative and operative) improve the two-year quality of life in patients
with adult symptomatic lumbar scoliosis: a prospective multicenter
evidence-based medicine study. Spine (Phila Pa 1976) 2009;34:
2171e8.[4] Smith JS, Shaffrey CI, Glassman SD, et al. Risk-benefit assessment
of surgery for adult scoliosis: an analysis based on patient age. Spine
(Phila Pa 1976) 2011;36:817e24.
[5] Bess S, Boachie-Adjei O, Burton D, et al. Pain and disability deter-
mine treatment modality for older patients with adult scoliosis, while
deformity guides treatment for younger patients. Spine (Phila Pa
1976) 2009;34:2186e90.
[6] Sciubba DM, Yurter A, Smith JS, et al. A comprehensive review of
complication rates after surgery for adult deformity: a reference for
informed consent. Spine Deform 2015;3:575e94.
[7] Karstensen S, Bari T, Gehrchen M, et al. Morbidity and mortality of
complex spine surgery: a prospective cohort study in 679 patients
validating the Spine AdVerse Event Severity (SAVES) system in a
European population. Spine J 2016;16:146e53.
[8] Passias PG, Soroceanu A, Yang S, et al. Predictors of revision surgi-
cal procedure excluding wound complications in adult spinal defor-
mity and impact on patient-reported outcomes and satisfaction.
J Bone Joint Surg Am 2016;98:536e43.
[9] Howe CR, Agel J, Lee MJ, et al. The morbidity and mortality of fu-
sions from the thoracic spine to the pelvis in the adult population.
Spine (Phila Pa 1976) 2011;36:1397e401.
[10] Charosky S, Guigui P, Blamoutier A, et al. Complications and risk
factors of primary adult scoliosis surgery: a multicenter study of
306 patients. Spine (Phila Pa 1976) 2012;37:693e700.
[11] Blamoutier A, Guigui P, Charosky S, et al. Surgery of lumbar and
thoracolumbar scolioses in adults over 50. Morbidity and survival
in a multicenter retrospective cohort of 180 patients with a mean
follow-up of 4.5years. Orthop Traumatol Surg Res 2012;98:528e35.
Table 2
Accuracy of the GAP score as a predictor of revision surgery or mechanical failure
Revision surgery p value Mechanical failure p value
Receiver operating characteristics, AUC (95% CI) 0.49 (0.39e0.59) 0.50 (0.40e0.59)
Cochran-Armitage test of trend, c2 test 1.16 .28 0.30 .58
Univariate logistic regression, OR (95% CI)
GAP score 1.01 (0.91e1.11) .28 1.00 (0.91e1.10)
GAP categories
Proportioned Reference Reference
Moderately disproportioned 1.42 (0.76e2.66) .28 1.14 (0.65e2.18) .28
Severely disproportioned 0.87 (0.50e1.53) .63 0.94 (0.55e1.60) .81
AUC, area under the curve; CI, confidence interval; GAP, Global Alignment and Proportion; OR, odds ratio.
336 T.J. Bari et al. / Spine Deformity 7 (2019) 331e337
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
[12] Inoue S, Khashan M, Fujimori T, Berven SH. Analysis of mechanical
failure associated with reoperation in spinal fusion to the sacrum in
adult spinal deformity. J Orthop Sci 2015:609e16.[13] Hallager DW, Karstensen S, Bukhari N, et al. Radiographic predic-
tors for mechanical failure after adult spinal deformity surgery. Spine
(Phila Pa 1976) 2017;42:E855e63.
[14] Schwab F, Ungar B, Blondel B, et al. Scoliosis Research Societ-
ydSchwab Adult Spinal Deformity Classification. Spine (Phila Pa
1976) 2012;37:1077e82.
[15] Yilgor C, Sogunmez N, Boissiere L, et al. Global Alignment and Pro-
portion (GAP) Score: development and validation of a new method of
analyzing spinopelvic alignment to predict mechanical complications
after adult spinal deformity surgery. J Bone Joint Surg Am 2017;99:
1661e72.
[16] Legaye J, Duval-Beaup�ere G, Hecquet J, Marty C. Pelvic incidence: a
fundamental pelvic parameter for three-dimensional regulation of
spinal sagittal curves. Eur Spine J 1998;7:99e103.
[17] Boulay C, Tardieu C, Hecquet J, et al. Sagittal alignment of spine and
pelvis regulated by pelvic incidence: standard values and prediction
of lordosis. Eur Spine J 2006;15:415e22.
[18] Maillot C, Ferrero E, Fort D, et al. Reproducibility and repeatability
of a new computerized software for sagittal spinopelvic and scoliosis
curvature radiologic measurements: Keops?? Eur Spine J 2015;24:
1574e81.
[19] Hart RA, McCarthy I, Ames CP, et al. Proximal junctional kyphosis
and proximal junctional failure. Neurosurg Clin N Am 2013;24:
213e8.
[20] Hallager DW, Hansen LV, Dragsted CR, et al. A comprehensive anal-
ysis of the SRS-Schwab adult spinal deformity classification and con-
founding variables: a prospective, non-US cross-sectional study in
292 patients. Spine (Phila Pa 1976) 2016;41:E589e97.
[21] Hanley AJ, McNeil JB. The meaning and use of the area under a
receiver operating characteristic (ROC) curve. Radiology 1982;143:
29e36.
[22] Faraggi D, Reiser B. Estimation of the area under the ROC curve.
Stat Med 2002;21:3093e106.
[23] GrzybowskiM,Younger JG. Statisticalmethodology: III.Receiver oper-
ating characteristic (ROC) curves. Acad Emerg Med 1997;4:818e26.
[24] Cochran WG. Some methods for strengthening the common c2 tests.
Biometrics 1954;10:417e51.
[25] Armitage P. Tests for linear trends in proportions and frequencies.
Biometrics 1955;11:375e6.
[26] Smith JS, Shaffrey CI, Klineberg E, et al. Complication rates associ-
ated with 3-column osteotomy in 82 adult spinal deformity patients:
retrospective review of a prospectively collected multicenter consec-
utive series with 2-year follow-up. J Neurosurg Spine 2017;27:1e14.
[27] Nicholls FH, Bae J, Theologis AA, et al. Factors associated with the
development of and revision for proximal junctional kyphosis in 440
consecutive adult spinal deformity patients. Spine (Phila Pa 1976)
2017;42:1693e8.
[28] Lafage V, Schwab F, Patel A, et al. Pelvic tilt and truncal inclination:
two key radiographic parameters in the setting of adults with spinal
deformity. Spine (Phila Pa 1976) 2009;34:E599e606.
[29] Kim YJ, Bridwell KH, Lenke LG, et al. Pseudarthrosis in adult spinal
deformity following multisegmental instrumentation and arthrodesis.
J Bone Joint Surg Am 2006;88:721e8.
[30] Edwards CC, Bridwell KH, Patel A, et al. Long adult deformity fu-
sions to L5 and the sacrum: a matched cohort analysis. Spine (Phila
Pa 1976) 2004;29:1996e2005.
[31] Wang H, Ding W, Ma L, et al. Prevention of proximal junctional
kyphosis: are polyaxial pedicle screws superior to monoaxial pedicle
screws at the upper instrumented vertebrae? World Neurosurg
2017;101:405e15.
[32] Shashank VG, Jacob J, Konrad B, et al. Development of proximal
junctional kyphosis after minimally invasive lateral anterior column
realignment for adult spinal deformity. Neurosurgery 2018;28:1e9.
[33] Kim D, Kim J, Kim D, et al. Risk factors of proximal junctional
kyphosis after multilevel fusion surgery: more than 2 years follow-
up data. J Korean Neurosurg Soc 2017;60:174e80.[34] Yasuda T, Hasegawa T, Yamato Y, et al. Proximal junctional kyphosis
in adult spinal deformity with long spinal fusion from T9/T10 to the
ilium. J Spine Surg 2017;3:204e11.
337T.J. Bari et al. / Spine Deformity 7 (2019) 331e337
Downloaded for Anonymous User (n/a) at BS - University of Copenhagen from ClinicalKey.com by Elsevier on February 06, 2019.For personal use only. No other uses without permission. Copyright ©2019. Elsevier Inc. All rights reserved.
U N I V E R S I T Y O F C O P E N H A G E N
F A C U L T Y O F H E A L T H A N D M E D I C A L S C I E N C E S
76
Appendix 4 - Specific statistical considerations
Specific statistical considerations
Intraclass correlation coefficients (ICC)
ICC coefficients were used to describe reliability for numeric variables using the package
“irr” version 0.84. Two-way random effects model, single measure, absolute agreement
(ICCA,1) was used for inter-rater reliability and two way mixed effects, single measure,
absolute agreement (ICCA,1) for intra-rater reliability [1–3].
Standard error of measurement (SEM) and 95% Limits of Agreement (LOA)
When describing agreement the SEM and 95 % LOA were calculated. SEM is often
calculated as 𝑆𝐸𝑀 = 𝑆𝐷 √1 − 𝐼𝐶𝐶 [4–6] where SD is the standard deviation of scores for all
cases. This form is known as SEM consistency. As different forms of ICC derive different
values, the use of this equation can substantially influence the size of SEM, and SEM
consistency does not take in to account the systematic differences between raters. Therefore, a
more precise method of deriving the SEM is using the square root of the mean square error
term by the means of a repeated measure analysis of variance (ANOVA)[7,8] and is called
SEM agreement. Any further mention of SEM will be in the agreement form. SEM was
calculated as 𝑆𝐸𝑀𝑖𝑛𝑡𝑟𝑎 = √𝑟𝑎𝑡𝑒𝑟 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 and 𝑆𝐸𝑀𝑖𝑛𝑡𝑒𝑟 =
√𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 . A linear mixed effects model with the software package “lme4”
version 1.1-15 was used to calculate the variances. The repeatability coefficient (RC) devised
by Bland and Altman[9] also describes the within-subject variation by calculating the
difference in measurements exceeded by only 5% of pairs of measurement. The relationship
between the RC and SEM is 𝑅𝐶 = 𝑡 × 𝑆𝐸𝑀𝑑𝑖𝑓𝑓 where t is the t statistic for a given
cumulative probability with n degrees of freedom. As 𝑆𝐸𝑀𝑑𝑖𝑓𝑓 = √2 × 𝑆𝐸𝑀 the full equation
is 𝑅𝐶 = 𝑡 × √2 × 𝑆𝐸𝑀 = 2.77 × 𝑆𝐸𝑀, for cumulative probability of 0.975 and n degrees
of freedom [6,8,9]. Bland and Altman further describe the 95% limits of agreement (LOA) as
±RC. A biostatistician was consulted prior to the study and aided in the performance of all
statistical analysis.
[1] Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation
Coefficients for Reliability Research. J Chiropr Med 2016;15:155–63.
[2] McGraw KO, Wong SP. Forming inferences about some intraclass correlations
coefficients. Psychol Methods 1996;1:30–46.
[3] Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol
Bull 1979;86:420–8.
[4] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness
of evaluative instruments. J Chronic Dis 1987;40:171–8.
[5] Beaton DE, Bombardier C, Katz JN, et al. A taxonomy for responsiveness. J Clin
Epidemiol 2001;54:1204–17.
[6] Hopkins WG. Measures of Reliability in Sports Medicine and Science. Sport Med
2000;30:375–81.
[7] Stratford PW, Goldsmith CH. Use of the Standard Error as a Reliability Index of
Interest: An Applied Example Using Elbow Flexor Strength Data. Phys Ther
1997;77:745–50.
[8] Weir JP. Quantifying Test-Retest Reliability Using the Intraclass Correlation
Coefficient and the SEM. J Strength Cond Res 2005;19:231.
[9] Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat
Methods Med Res 1999;8:135–60.