automatic landmark estimation for adolescent idiopathic … · 2017-09-10 · automatic landmark...

Automatic Landmark Estimation for Adolescent Idiopathic Scoliosis Assessment Using BoostNet

Hongbo Wu1, Chris Bailey1,3, Parham Rasoulinejad1,3, and Shuo Li1,2(B)

1 Department of Medical Imaging, Western University, London, ON, [email protected]

2 Digital Imaging Group (DIG), London, ON, Canada

3 London Health Sciences Center, London, ON, Canada

Abstract. Adolescent Idiopathic Scoliosis (AIS) presents as an abnormal curvature of the spine in teens. Conventional radiographic assessment of scoliosis is unreliable due to the need for manual intervention from clinicians as well as high variability in images. Current methods for automatic scoliosis assessment are not robust due to reliance on segmentation or feature engineering. We propose a novel framework for automated landmark estimation for AIS assessment by leveraging the strength of our newly designed BoostNet, which creatively integrates the robust feature extraction capabilities of Convolutional Neural Networks (ConvNet) with statistical methodologies to adapt to the variability in X-ray images. In contrast to traditional ConvNets, our BoostNet introduces two novel concepts: (1) a BoostLayer for robust discriminatory feature embedding by removing outlier features, which essentially minimizes the intra-class variance of the feature space and (2) a spinal structured multi-output regression layer for compact modelling of landmark coordinate correlation. The BoostNet architecture estimates required spinal landmarks within a mean squared error (MSE) rate of 0.00068 in 431 crossvalidation images and 0.0046 in 50 test images, demonstrating its potential for robust automated scoliosis assessment in the clinical setting.

Keywords: Boosting · ConvNet · AIS · Scoliosis · Deep Learning · Outlier

1 Introduction

Adolescent Idiopathic Scoliosis (AIS) is an abnormal structural, lateral, rotated curvature of the spine, which arises in children at or around puberty and could potentially lead to reduced quality of life [1]. The estimated incidence of AIS is 2.5% in the general population and only 0.25% of patients will progress to a state where treatment is necessary [2]. Early detection of progression symptoms has potential positive impacts on prognosis by allowing clinicians to provide earlier treatment for limiting disease progression.

However, conventional manual measurement involves heavy intervention from clinicians in identification of required vertebrae structures, which suffers from high inter- and intra-observer variability while being time-intensive. The accuracy of measurement is often affected by many factors such as the selection of vertebrae, the bias of the observer, and image quality. Moreover, variability in measurements can affect diagnosis when assessing scoliosis progression. It is therefore important to provide accurate and robust quantitative measurements of spinal curvature. The current widely adopted standard for making scoliosis diagnosis and treatment decisions is the manual measurement of Cobb angles. These angles are derived from posterior-anterior (back to front) X-rays and measured by selecting the most tilted vertebrae at the top and bottom of the spine with respect to a horizontal line [3]. It is challenging for clinicians to make accurate measurements due to the large anatomical variation and low tissue contrast of X-ray images, which results in large variations between different clinicians. Therefore, computer assistance is necessary for making robust quantitative assessments of scoliosis.

© Springer International Publishing AG 2017. M. Descoteaux et al. (Eds.): MICCAI 2017, Part I, LNCS 10433, pp. 127–135, 2017. DOI: 10.1007/978-3-319-66182-7_15

Segmentation and Filter-Based Method for AIS Assessment. Current computer-aided methods proposed in the literature for the estimation of Cobb angles are not ideal as part of clinical scoliosis assessment. Mathematical models such as Active Contour Model [4], Customized Filter [5] and Charged-Particle Models [6] were used to localize required vertebrae in order to derive the Cobb angle from their slopes. These methods require accurate vertebrae segmentations and feature engineering, which makes them computationally expensive and susceptible to errors caused by variation in X-ray images.
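To make the slope-based derivation of the Cobb angle concrete, the following is a minimal Python sketch under simplifying assumptions. It is not taken from any of the cited methods: the helper names (`tilt_degrees`, `cobb_angle`, `make_endplate`) are hypothetical, each vertebra is reduced to one line segment, and the angle is taken as the difference between the largest and smallest tilt along the spine.

```python
import numpy as np

def tilt_degrees(left, right):
    """Signed tilt of the line from `left` to `right` relative to the horizontal."""
    dx, dy = right[0] - left[0], right[1] - left[1]
    return np.degrees(np.arctan2(dy, dx))

def cobb_angle(endplates):
    """Angle between the two most oppositely tilted endplates.

    `endplates` is a sequence of (left_point, right_point) pairs, one per
    vertebra, ordered from the top of the spine to the bottom.
    Assumption: a single curve, so max tilt minus min tilt is sufficient.
    """
    tilts = np.array([tilt_degrees(l, r) for l, r in endplates])
    return float(tilts.max() - tilts.min())

def make_endplate(angle_deg):
    """Hypothetical helper: a unit-length endplate tilted by `angle_deg`."""
    a = np.radians(angle_deg)
    return (0.0, 0.0), (np.cos(a), np.sin(a))

# Toy example: three vertebrae tilted by +10, 0 and -15 degrees.
example = [make_endplate(10), make_endplate(0), make_endplate(-15)]
angle = cobb_angle(example)  # difference between the +10 and -15 degree tilts
```

Because the angle depends only on the difference between tilts, the sign convention of the image y-axis does not change the result.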

Machine Learning-Based Method for AIS Assessment. Although machine learning algorithms such as Support Vector Regression (SVR) [7], Random Forest Regression (RFR) [8], and Convolutional Neural Networks (ConvNet) [9,10] have been used for various biomedical tasks, their direct application to AIS assessment suffers from the following limitations: (1) the method's robustness and generalizability can be compromised by the presence of outliers (such as human error, imaging artifacts, etc.) in the training data [11], which usually requires a dedicated preprocessing stage and (2) the explicit dependencies between multiple outputs (landmark coordinates) are not taken into account, although they are essential for enhancing discriminative learning with respect to spinal landmark locations. While [12] successfully modified the SVR to incorporate output dependencies for the detection of spinal landmarks, their method still requires suboptimal feature extraction which does not cope with image outliers.

Proposed Method. Our proposed BoostNet achieves fully automatic clinical AIS assessment through direct spinal landmark estimation. The use of landmarks is advantageous to scoliosis assessment because a set of spinal landmarks contains a holistic representation of the spine, which is robust to variations in local image contrast. Therefore, small local deviations in spinal landmark coordinates will not affect the overall quality of the detected spinal structure compared to conventional segmentation-based methods. As shown in Fig. 1, the BoostNet architecture overcomes the limitations of conventional AIS assessment by enhancing the feature space through outlier removal and improving robustness by enforcing spinal structure.

Fig. 1. Architecture of the BoostNet for landmark based AIS assessment. Relevant features are automatically extracted and any outlier features are removed by the BoostLayer. A spinal structured multi-output layer is then applied to the output to capture the correlation between spinal landmarks.

Contribution. In summary, our work contributes in the following aspects:

– The newly proposed BoostNet architecture can automatically and efficiently locate spinal landmarks, which provides a multi-purpose framework for robust quantitative assessment of spinal curvatures.

– The newly proposed BoostLayer endows networks with the ability to efficiently eliminate deleterious effects of outlier features, thereby improving robustness and generalizability.

– The newly proposed spinal structured multi-output layer significantly improves regression accuracy by explicitly enforcing the dependencies between spinal landmarks.


2 Methodology

2.1 Novel BoostNet Architecture

Our novel BoostNet architecture is designed to automatically detect spinal landmarks for comprehensive AIS assessment. Our BoostNet consists of three parts: (1) a series of convolutional layers as feature extractors to automatically learn features from our dataset without the need for expensive and potentially suboptimal hand-crafted features, (2) a newly designed BoostLayer (Sect. 2.1), which removes the impact of deleterious outlier features, and (3) a spinal structured multi-output layer (Sect. 2.1) that acts as a prior to alleviate the impact of a small dataset by capturing essential dependencies between the spinal landmarks.

Fig. 2. Conceptualized diagram of our BoostLayer module. (a) The presence of outliers in the feature space impedes robust feature embedding. (b) The BoostLayer module detects outlier features based on statistical properties. We use an orange dashed line to represent the outlier correction stage of the BoostLayer. For the sake of brevity, we did not include the biases and activation function in the diagram. (c) After correcting outliers, the intra-class feature variance is reduced, allowing for a more robust feature embedding.

BoostLayer. As shown in Fig. 2, the BoostLayer reduces the impact of deleterious outlier features by enhancing the feature space. The sources of outliers in medical images typically include imaging artifacts, local contrast variability, and human errors, which reduce the robustness of predictive models. The BoostLayer algorithm creatively integrates statistical outlier removal methods into ConvNets in order to boost discriminative features and minimize the impact of outliers automatically during training. The BoostLayer improves discriminative learning by minimizing the intra-class variance of the feature space. Within the context of this paper, outlier features are defined as values that deviate from the mean of the feature distribution by more than a predetermined threshold. An overview of the algorithm is shown in Algorithm 1.


Algorithm 1. BoostLayer
 1: Initialization: set $\mu = 0$, $\hat{x} = 0$, $\sigma = 0$, randomly initialize $W$, $b_1$, $b_2$
 2: repeat
 3:   for $k \in \{1, \dots, n\}$ do
 4:     Update $\mu$ using bootstrap sampling.
 5:     Compute the reconstruction $R = f(x \cdot W + b_1) \cdot W^T + b_2$.
 6:     Compute the reconstruction error $\varepsilon = (x - R)^2$.
 7:     Compute the enhanced feature $\hat{x}$ using (1).
 8:     Compute $y = f(\hat{x} \cdot W + b_1)$.
 9:     Update $W$, $b_1$, $b_2$ via backpropagation.
10:   end for
11: until Convergence

The BoostLayer functions by first computing a reconstruction ($R$) of some input feature ($x$): $R = f(x \cdot W + b_1) \cdot W^T + b_2$, where $f$ is the ReLU activation function, $W$ is the layer weights and $W^T$ its transpose, and $b_1$ and $b_2$ are the bias vectors.

The element-wise reconstruction error ($\varepsilon$) can be defined as $\varepsilon = (x - R)^2$. This can alternatively be seen as the variance of a feature with respect to the latent feature distribution. What we want to establish next is a threshold such that any input feature with reconstruction error larger than the threshold is replaced by the mean of the feature in order to minimize intra-feature variance. For our experiments, we assumed a Gaussian distribution for the feature population and used a threshold of 2 standard deviations as the criterion for determining outliers.

In other words, we want to construct an enhanced feature space ($\hat{x}$) such that:

$$\hat{x}_i = \begin{cases} x_i, & \varepsilon_i \le (2\sigma_i)^2 \\ \mu_i, & \varepsilon_i > (2\sigma_i)^2 \end{cases} \tag{1}$$

where $\mu_i$ is the estimated population mean of the $i$th feature derived through sampling and $\sigma_i$ is the feature's sample standard deviation.

Each feature's population mean can be approximated by sampling across each mini-batch during training using

$$\mu = \frac{1}{T \times M} \sum_{k}^{T} \sum_{i}^{M} \bar{x}_i$$

where $M$ is the number of mini-batches per epoch, $T$ is the number of epochs and $\bar{x}$ is the sample mean of a batch. For our experiments, we used a mini-batch size of 100 and trained for 100 epochs.
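The population mean estimate described above can be illustrated with a short NumPy sketch. It assumes the per-batch sample means are simply averaged over all epochs and mini-batches after the fact; the function name `estimate_population_mean` and the toy sizes (3 epochs, 4 mini-batches, 5 features) are hypothetical and chosen only for illustration, and a training loop would more likely update the estimate incrementally.

```python
import numpy as np

def estimate_population_mean(batches_per_epoch):
    """Average the per-batch sample means over all epochs and mini-batches.

    `batches_per_epoch` is a list (epochs) of lists (mini-batches) of arrays
    with shape (batch_size, n_features).
    """
    batch_means = [batch.mean(axis=0)          # sample mean of one mini-batch
                   for epoch in batches_per_epoch
                   for batch in epoch]
    return np.mean(batch_means, axis=0)        # divide by T x M

rng = np.random.default_rng(0)
# Toy setting: T = 3 epochs, M = 4 mini-batches of 100 samples, 5 features.
data = [[rng.normal(loc=2.0, size=(100, 5)) for _ in range(4)] for _ in range(3)]
mu = estimate_population_mean(data)
```

With equally sized mini-batches, this average of batch means coincides with the mean over all sampled feature vectors.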

Finally, we transform the revised input using the layer weights such that $y = f(\hat{x} \cdot W + b_1)$.
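The forward computation of the BoostLayer described in this section can be sketched in NumPy as follows. This is an illustrative reading of the text, not the authors' Keras implementation: the function name `boost_layer_forward` is hypothetical, the mean and standard deviation are passed in as precomputed vectors, and the backpropagation step of Algorithm 1 is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def boost_layer_forward(x, W, b1, b2, mu, sigma):
    """One forward pass of the BoostLayer as described in the text.

    x:         (batch, d) input features
    W:         (d, h) layer weights; W.T is reused for the reconstruction
    b1, b2:    (h,) and (d,) bias vectors
    mu, sigma: (d,) estimated per-feature mean and standard deviation
    """
    R = relu(x @ W + b1) @ W.T + b2        # reconstruction of the input
    eps = (x - R) ** 2                     # element-wise reconstruction error
    is_outlier = eps > (2.0 * sigma) ** 2  # 2-standard-deviation threshold
    x_hat = np.where(is_outlier, mu, x)    # replace outliers by the feature mean
    y = relu(x_hat @ W + b1)               # transform the enhanced features
    return y, x_hat, is_outlier

# Toy example with identity weights: the negative feature cannot be
# reconstructed through the ReLU, so its error exceeds the threshold.
x = np.array([[1.0, -5.0, 2.0]])
W = np.eye(3)
b1 = np.zeros(3)
b2 = np.zeros(3)
mu = np.full(3, 0.5)
sigma = np.ones(3)
y, x_hat, is_outlier = boost_layer_forward(x, W, b1, b2, mu, sigma)
```

In the toy example only the second feature is flagged and replaced by its mean, while the other two pass through unchanged.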

Spinal Structured Multi-output Layer. The Spinal Structured Multi-Output Layer acts as a structural prior to our output landmarks, which alleviates the impact of small datasets while improving the regression accuracy. As shown in Fig. 1, the layer captures the dependency information between the spinal landmarks in the form of a Dependency Matrix (DM) S. We define S as a spinal structured DM for the output landmarks, in which adjacent spinal landmarks are represented by 1 while distant landmarks are represented by 0. For instance, since vertebrae T1 and T3 are not directly connected, we assign their dependency value as S[1, 3] = S[3, 1] = 0 while T1 and T2 are connected so their dependency was set to S[1, 2] = S[2, 1] = 1 and so on. The spinal structured multi-output layer f(ai) is defined as:

$$f(a_i) = \begin{cases} a_i \cdot S_i, & a_i > 0 \\ 0, & a_i \le 0 \end{cases} \tag{2}$$

where $a_i = x_i \cdot W_i + b_i$, $S_i$ is the landmark dependency matrix, $W_i$ the weights, and $b_i$ the bias of landmark coordinate $i$.
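As an illustration of the dependency matrix and of one possible reading of Eq. (2), the NumPy sketch below builds a chain-structured S for 17 vertebrae and sums the contributions $a_i \cdot S_i$ of all active pre-activations. Several choices here are assumptions not stated in the text: the matrix is built at the vertebra level rather than per landmark coordinate, the diagonal is set to 1 so that each entry depends on itself, $S_i$ is read as the $i$th row of S, and the function names are hypothetical. Indices in the code are 0-based, so S[0, 1] corresponds to the T1/T2 pair.

```python
import numpy as np

def chain_dependency_matrix(n, include_self=True):
    """Dependency matrix with 1 for adjacent entries of a chain and 0 elsewhere."""
    S = np.zeros((n, n))
    idx = np.arange(n - 1)
    S[idx, idx + 1] = 1.0                      # neighbour below
    S[idx + 1, idx] = 1.0                      # neighbour above (symmetric)
    if include_self:
        S[np.arange(n), np.arange(n)] = 1.0    # assumption: self-dependency
    return S

def structured_output(a, S):
    """Sum of f(a_i) = a_i * S_i over all active (a_i > 0) pre-activations."""
    return np.maximum(a, 0.0) @ S

S = chain_dependency_matrix(17)   # 17 thoracic and lumbar vertebrae
a = np.zeros(17)
a[0] = 1.0                        # active pre-activation for the first vertebra
a[5] = -2.0                       # inactive pre-activation, contributes nothing
out = structured_output(a, S)
```

In this toy case the single active pre-activation is spread to itself and its direct neighbour only, which is the structural coupling the dependency matrix is meant to express.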

2.2 Training Algorithm

We trained the BoostNet using mini-batch stochastic gradient descent optimization with Nesterov momentum of 0.9 and a starting learning rate of 0.01. The learning rate was adaptively halved based on validation error during training in order to tune the parameters to a local minimum. We trained the model over 1000 epochs and used Early Stopping to prevent over-fitting. During training, the loss function

$$L(X, Y, \theta) = \sum_{i}^{c} (Y_i - F(X))^2 + \lambda \sum_{i}^{k} |\theta_i|$$

is minimized, where $c$ is the number of classes, $Y$ is the ground truth landmark coordinates, $F(X)$ is the predicted landmark coordinates, and $\theta$ is the set of model parameters. The model and training algorithm were implemented in Python 2.7 using the Keras Deep Learning library [13].
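The loss above combines a sum of squared errors with an L1 penalty on the parameters. A minimal NumPy sketch of that computation follows; the function name `boostnet_loss`, the toy values and the regularization weight `lam=0.01` are hypothetical, since the text does not report the value of $\lambda$.

```python
import numpy as np

def boostnet_loss(Y, Y_pred, params, lam):
    """Sum of squared errors plus an L1 penalty on the parameters."""
    data_term = np.sum((Y - Y_pred) ** 2)                    # squared-error term
    l1_term = lam * sum(np.sum(np.abs(p)) for p in params)   # L1 regularization
    return data_term + l1_term

# Toy example with three target coordinates and two parameter arrays.
Y = np.array([0.2, 0.4, 0.6])
Y_pred = np.array([0.1, 0.4, 0.8])
params = [np.array([[1.0, -2.0]]), np.array([0.5])]
loss = boostnet_loss(Y, Y_pred, params, lam=0.01)
```

Here the squared-error term is 0.05 and the penalty is 0.01 × 3.5 = 0.035, giving a total of 0.085.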

2.3 Dataset

Our dataset consists of 481 spinal anterior-posterior X-ray images provided by local clinicians. All the images used for training and testing show signs of scoliosis to varying extent. Since the cervical vertebrae (vertebrae of the neck) are seldom involved in spinal deformity [14], we selected 17 vertebrae composed of the thoracic and lumbar spine for spinal shape characterization. Each vertebra is located by four landmarks, one at each of its four corners, resulting in 68 points per spinal image. These landmarks were manually annotated by the authors based on visual cues. During training, the landmarks were scaled based on the original image dimensions such that their values lie in the range 0–1 depending on where the landmark lies with respect to the original image (e.g. [0.5, 0.5] is the exact centre of the image). We then divided our data into 431 training/validation images (Trainset) and 50 test images (Testset) such that no patient is placed in both sets. We then trained and validated our model on the Trainset and tested the trained model on the Testset.
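The landmark scaling described above can be sketched as follows. The function names and the 500 × 1200 image size are hypothetical and used only for illustration; the sketch assumes landmarks are stored as (x, y) pixel coordinates.

```python
import numpy as np

def normalize_landmarks(points_px, width, height):
    """Scale pixel coordinates (x, y) to the [0, 1] range of the image."""
    size = np.array([width, height], dtype=float)
    return np.asarray(points_px, dtype=float) / size

def denormalize_landmarks(points_norm, width, height):
    """Map normalized coordinates back to pixel coordinates."""
    size = np.array([width, height], dtype=float)
    return np.asarray(points_norm, dtype=float) * size

# 17 vertebrae x 4 corners = 68 landmarks, i.e. 136 regression targets.
n_landmarks = 17 * 4

# The centre pixel of a 500 x 1200 image maps to [0.5, 0.5].
centre = normalize_landmarks([[250, 600]], width=500, height=1200)
restored = denormalize_landmarks(centre, width=500, height=1200)
```

Normalizing in this way makes the regression targets independent of image resolution, which is why the reported MSE is expressed as a fraction of the image size.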

Data Augmentation. Since ConvNets like our BoostNet typically require large amounts of training data, we augmented our data in order to teach our network the various invariance properties in our dataset. The types of augmentation used include: (a) adding Gaussian noise directly to our image in order to simulate inherent noise and (b) randomly adjusting the landmark coordinates based on a Gaussian distribution in order to simulate variability during data labelling.
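The two augmentations can be sketched in NumPy as below. The noise levels `noise_std` and `jitter_std` are hypothetical, since the text does not report them, and clipping the jittered landmarks back to the [0, 1] range is an added assumption to keep the targets valid.

```python
import numpy as np

def augment(image, landmarks, rng, noise_std=0.01, jitter_std=0.002):
    """Return a noisy copy of the image and jittered normalized landmarks."""
    # (a) additive Gaussian noise on the image intensities
    noisy_image = image + rng.normal(0.0, noise_std, size=image.shape)
    # (b) Gaussian jitter on the landmark coordinates
    jittered = landmarks + rng.normal(0.0, jitter_std, size=landmarks.shape)
    # Assumption: keep normalized coordinates inside the image.
    return noisy_image, np.clip(jittered, 0.0, 1.0)

rng = np.random.default_rng(0)
image = np.zeros((8, 4))              # toy image
landmarks = np.full((68, 2), 0.5)     # 68 normalized (x, y) landmarks
noisy_image, noisy_landmarks = augment(image, landmarks, rng)
```

Calling `augment` repeatedly with the same generator yields a different perturbed copy each time, so a single annotated image can contribute many training samples.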


3 Results

The BoostNet achieved superior performance in landmark detection compared to other baseline models in our crossvalidation study. Figure 3(a) shows the qualitative results of the BoostNet's effectiveness in spinal landmark detection. The BoostNet accurately detects all the spinal landmarks despite the variations in anatomy and image contrast between different patients. The landmarks detected by the BoostNet appear to follow the general spinal curvature more closely compared to a conventional ConvNet. Figure 3(b) demonstrates the effectiveness of our BoostNet in learning more discriminative features compared to an equivalent ConvNet (without BoostLayer and structured output).

Fig. 3. Empirical results of our BoostNet algorithm. (a) BoostNet detections and ConvNet detections: the landmarks detected by our BoostNet conform to the spinal shape more closely compared to the ConvNet detections. (b) The BoostNet converges to a much lower error rate compared to the ConvNet.

Evaluation. We use the Mean Squared Error ($\mathrm{MSE} = E[(f(X) - Y)^2]$) and the Pearson Correlation Coefficient ($\rho = \frac{E[(f(X) - E[f(X)])(Y - E[Y])]}{\sigma_{f(X)} \sigma_{Y}}$) between the predicted landmarks ($f(X)$) and annotated ground truth ($Y$) as the criteria for evaluating the accuracy of the estimations.
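Both evaluation criteria can be computed with a few lines of NumPy. The sketch below uses the standard definitions of the mean squared error and the Pearson correlation coefficient over flattened coordinate vectors; the function names and toy values are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and annotated coordinates."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def pearson(pred, target):
    """Pearson correlation coefficient over all flattened coordinates."""
    pred = np.ravel(pred).astype(float)
    target = np.ravel(target).astype(float)
    pc = pred - pred.mean()          # centred predictions
    tc = target - target.mean()      # centred ground truth
    return float(np.sum(pc * tc) / np.sqrt(np.sum(pc ** 2) * np.sum(tc ** 2)))

# Toy example: predictions shifted by a constant 0.1 from the ground truth.
target = np.array([0.1, 0.2, 0.3, 0.4])
pred = target + 0.1
err = mse(pred, target)
rho = pearson(pred, target)
```

The example shows why both metrics are reported: a constant offset leaves the correlation at 1 while the MSE is 0.01, so the correlation alone would hide a systematic shift.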

Crossvalidation. Our model achieved an average MSE of 0.00068 in landmark detection based on 431 images, which supports its use as a robust method for automatic AIS assessment. In order to validate our model as an effective way for landmark estimation, we applied 5-fold crossvalidation of our model on the Trainset. Table 1(a) summarizes the average crossvalidation performance of our model and several baseline models including ConvNet (our model without BoostLayer and Structured Output Layer), RFR [15], and SVR [12].


Table 1. The BoostNet achieved the lowest error in landmark estimation among the compared baseline models on (a) the Trainset under 5-fold crossvalidation and (b) the held-out Testset. The unit of MSE is a fraction of the original image size (e.g. 0.010 MSE represents an average error of 10 pixels in a 100 × 100 image).

Method      (a) Trainset                         (b) Testset
            MSE                ρ                 MSE       ρ
SVR [12]    0.0051 ± 0.0018    0.95 ± 0.0037     0.006     0.93
RFR [15]    0.0026 ± 0.0025    0.96 ± 0.0045     0.0052    0.94
ConvNet     0.014 ± 0.0077     0.87 ± 0.062      0.018     0.84
BoostNet    0.00068 ± 0.004    0.97 ± 0.0082     0.0046    0.94

Test Performance. Table 1(b) demonstrates the BoostNet's effectiveness in a hypothetical real-world setting. After training each of the models listed in the table on all 431 images from the Trainset, we evaluated each model on the Testset consisting of 50 unseen images. The BoostNet outperforms the other baseline methods based on MSE rate while showing superior qualitative results as seen in Fig. 3(a).

Analysis. The BoostNet achieved the lowest average MSE of 0.0046 and the highest correlation coefficient of 0.94 on the unseen Testset. This is due to the contributions of (1) the BoostLayer, which successfully learned robust discriminative feature embeddings, as is evident in the higher accuracy in images with noticeable variability in Fig. 3(a), and (2) the spinal structured multi-output regression layer, which faithfully captured the structural information of the spinal landmark coordinates. The success of our method is further exemplified by the more than 5-fold reduction in MSE as well as more rapid convergence compared to the conventional ConvNet model (Fig. 3(b)).

4 Conclusion

We have proposed a novel spinal landmark estimation framework that uses our newly designed BoostNet architecture to automatically assess scoliosis. The proposed architecture creatively utilizes the feature extraction capabilities of ConvNets as well as statistical outlier detection methods to accommodate the often noisy and poorly standardized X-ray images. Extensive experimental results have demonstrated that our method is a robust and accurate way of detecting spinal landmarks for AIS assessment. Our framework allows clinicians to measure spinal curvature more accurately and robustly, and enables researchers to develop predictive tools for measuring prospective risks based on imaging biomarkers for preventive treatment.


References

1. Weinstein, S.L., Dolan, L.A., Cheng, J.C., Danielsson, A., Morcuende, J.A.: Adolescent idiopathic scoliosis. Lancet 371(9623), 1527–1537 (2008)

2. Asher, M.A., Burton, D.C.: Adolescent idiopathic scoliosis: natural history and long term treatment effects. Scoliosis 1(1), 2 (2006)

3. Vrtovec, T., Pernus, F., Likar, B.: A review of methods for quantitative evaluation of spinal curvature. Eur. Spine J. 18(5), 593–607 (2009)

4. Anitha, H., Prabhu, G.: Automatic quantification of spinal curvature in scoliotic radiograph using image processing. J. Med. Syst. 36(3), 1943–1951 (2012)

5. Anitha, H., Karunakar, A., Dinesh, K.: Automatic extraction of vertebral endplates from scoliotic radiographs using customized filter. Biomed. Eng. Lett. 4(2), 158–165 (2014)

6. Sardjono, T.A., Wilkinson, M.H., Veldhuizen, A.G., van Ooijen, P.M., Purnama, K.E., Verkerke, G.J.: Automatic cobb angle determination from radiographic images. Spine 38(20), 1256–1262 (2013)

7. Sanchez-Fernandez, M., de Prado-Cumplido, M., Arenas-García, J., Perez-Cruz, F.: SVM multiregression for nonlinear channel estimation in multiple-input multiple-output systems. IEEE Trans. Signal Process. 52(8), 2298–2307 (2004)

8. Zhen, X., Wang, Z., Islam, A., Bhaduri, M., Chan, I., Li, S.: Multi-scale deep networks and regression forests for direct bi-ventricular volume estimation. Med. Image Anal. 30, 120–129 (2016)

9. Kooi, T., Litjens, G., van Ginneken, B., Gubern-Mérida, A., Sánchez, C.I., Mann, R., den Heeten, A., Karssemeijer, N.: Large scale deep learning for computer aided detection of mammographic lesions. Med. Image Anal. 35, 303–312 (2017)

10. Christ, P.F., Elshaer, M.E.A., Ettlinger, F., Tatavarty, S., Bickel, M., Bilic, P., Rempfler, M., Armbruster, M., Hofmann, F., D'Anastasi, M., Sommer, W.H., Ahmadi, S., Menze, B.H.: Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. CoRR abs/1610.02177

11. Acuna, E., Rodriguez, C.: On detection of outliers and their effect in supervised classification (2004)

12. Sun, H., Zhen, X., Bailey, C., Rasoulinejad, P., Yin, Y., Li, S.: Direct estimation of spinal cobb angles by structured multi-output regression. In: Niethammer, M., Styner, M., Aylward, S., Zhu, H., Oguz, I., Yap, P.-T., Shen, D. (eds.) IPMI 2017. LNCS, vol. 10265, pp. 529–540. Springer, Cham (2017). doi:10.1007/978-3-319-59050-9_42

13. Chollet, F.: Keras (2015). https://github.com/fchollet/keras

14. S.D.S. Group: Radiographic Measurement Manual. Medtronic Sofamor Danek, USA (2008)

15. Criminisi, A., Shotton, J., Robertson, D., Konukoglu, E.: Regression forests for efficient anatomy detection and localization in CT studies. In: Menze, B., Langs, G., Tu, Z., Criminisi, A. (eds.) MCV 2010. LNCS, vol. 6533, pp. 106–117. Springer, Heidelberg (2011). doi:10.1007/978-3-642-18421-5_11
