
A computer vision-based system for monitoring Vojta therapy

Muhammad Hassan Khan (a,b,∗), Julien Helsper (a), Muhammad Shahid Farid (b), Marcin Grzegorzek (a,c)

(a) Research Group of Pattern Recognition, University of Siegen, Germany
(b) College of Information Technology, University of the Punjab, Pakistan
(c) Faculty of Informatics and Communication, University of Economics in Katowice, Poland

Abstract

A neurological illness is a disorder of the human nervous system that can result in various diseases, including motor disabilities. Neurological disorders may affect the motor neurons, which are associated with skeletal muscles and control body movement. Consequently, they cause diseases such as cerebral palsy, spinal scoliosis, peripheral paralysis of the arms/legs, hip joint dysplasia and various myopathies. Vojta therapy is considered a useful technique to treat motor disabilities. In Vojta therapy, a specific stimulation is given to the patient’s body to elicit certain reflexive pattern movements which the patient is unable to perform in a normal manner. The repetition of the stimulation ultimately makes the previously blocked connections between the spinal cord and the brain available. After a few therapy sessions, the patient can perform these movements without external stimulation. In this paper, we propose a computer vision-based system to monitor the correct movements of the patient during the therapy using RGB-D data. The proposed framework works in three steps. In the first step, the patient’s body is automatically detected and segmented; two novel techniques are proposed for this purpose. In the second step, a multi-dimensional feature vector is computed to describe the various movements of the patient’s body during the therapy. In the final step, a multi-class support vector machine is used to classify these movements. The experimental evaluation carried out on a large captured dataset shows that the proposed system is highly useful in monitoring the patient’s body movements during Vojta therapy.

Keywords: Vojta therapy, Cerebral palsy, Spinal scoliosis, Musculoskeletal system, Computer vision, Microsoft Kinect

1. Introduction

Vojta therapy (VT) is a technique to effectively treat physical and mental impairments as well as structural disorders of the muscles and joints in humans. Patients suffering from such problems are unable to make some specific reflexive movements of their body parts in the normal manner. A child neurologist, Prof. Dr. Vaclav Vojta, discovered a treatment for such problems and developed the so-called Vojta Principle [1–4]. Vojta therapy is based on the principle of reflex locomotion, i.e., the patient’s central nervous system can be activated by giving the correct stimulations, assuming that it is still partially intact. The patient may be assessed in the first year of his/her life with reference to Vojta’s neurokinesiological diagnostics [5]. At birth, these movements provided by the central nervous system can only be initiated in a very limited way, but in a healthy child they appear within the first year of life. It is observed that such patients respond to certain stimulation when it is given to their body in certain positions. By repeating this stimulation, the previously blocked connections between the spinal cord and the brain become available, enabling the patient to perform these movements without any external stimulation. Vojta therapy activates the patient’s whole body to achieve better posture and more precise movements. It is quite useful in dealing with diseases like cerebral palsy, disturbance in central coordination, peripheral paralysis of the arms/legs, hip joint dysplasia, and various myopathies. A large number of case studies and investigations, e.g. [6–13], have been carried out to assess the effectiveness of the Vojta procedure. The results showed that the Vojta treatment is efficient and helpful in various motor diseases, including cerebral palsy (CP).

∗Corresponding author. Email address: [email protected] (Muhammad Hassan Khan)

Reflex locomotion is a combination of Reflex Creeping in the prone lying position (i.e., lying flat with the chest down and back up) and Reflex Rolling in the supine lying position (i.e., lying flat with the chest up and back down) and side lying positions, as shown in Fig. 1. Reflex locomotion enables the elementary patterns of movement in patients. According to Vojta, while the patient is lying in one of the aforementioned positions, one can observe motor reactions occurring throughout the patient’s body when a specific stimulation is given to him/her. Therefore, the therapists exploit a combination of 10 different zones on the patient’s body by putting light pressure on those areas and applying resistance to the current movement (e.g., the tendency to rotate the head during reflex creeping) to cause the patient’s body to perform certain reflexive movement patterns.

Vojta therapy can be applied to patients of any age group; however, it is extremely effective for young babies of age less than 6 months because most of the developmental changes take place in the early stage of a child’s life. For better results, a therapy session of 5–20 minutes should be performed several times a day or week. The treatment may continue for several weeks depending upon the growth of the patient. Therefore, the therapists may recommend that parents continue the therapy at home. The therapist explains the goals and the methods of the relevant therapy exercises to the parents so that they can continue the therapy at home, and regularly reviews the treatment program and the frequency of therapy sessions according to the child’s progress. Sometimes during therapy, the children start crying, which makes the parents concerned about their child’s well-being. At this age, however, crying is an appropriate means of expression for the young patient, and after a short familiarization period it becomes less and less intense. Nevertheless, parents may assume that the therapy is hurting their child and, as a result, they might stop the therapy sessions at home. Therefore, a computer vision-based monitoring system is required to analyze the movements of the patient and to determine whether the therapy is being correctly performed or not. The aim of the proposed therapy monitoring system is to provide an accurate in-home therapy alternative to in-hospital therapy. An in-home continuation of the therapy is not only helpful for quick recovery of the patient, it is also very useful for those who do not have access to hospitals in their towns. Moreover, frequent visits to the therapist’s clinic add an economic burden.

Preprint submitted to International Journal of Medical Informatics, October 1, 2018

Figure 1: Example of child’s lying positions during the therapy. (a) Prone lying position, (b) supine lying position, (c) side lying position. Photo courtesy: Red Cross Children Hospital, Siegen, Germany (Website: http://www.drk-kinderklinik.de).

In this paper, we propose a computer vision-based framework to analyze and verify the accurate movements of infants during Vojta therapy. The proposed system exploits both color images and their corresponding depth images to detect and verify the patient’s movements during therapy. We propose two novel segmentation techniques to detect and segment a patient’s body region from the RGB-D data. The first technique uses pre-defined human head and torso template matching in depth images. This algorithm is based upon sum of squared differences (SSD) and cross correlation (CC) based matching. In the second technique, a plane equation method is used on depth images to identify the table surface where the child is lying for therapy. We use the geometrical features of the plane (i.e., table surface) and segment the child’s body region from the table surface using the k-means clustering algorithm.

During therapy, when a stimulation is given, a specific movement occurs in the upper and lower limbs of the child. Several features are computed and analyzed to identify these movements using a multi-class support vector machine. The classification results reflect the correctness of the given treatment. The experimental results reveal that the proposed method performs very well and is highly useful for monitoring the accurate movements of the patient during therapy. Some preliminary results of this research are reported in [14, 15]. However, this paper proposes a number of improvements and is different in many respects:

• Two novel patient detection techniques are proposed which significantly increase the detection accuracy.

• In addition to the head template, we also propose to use a torso template in the template-based detection algorithm. It is particularly useful when the head-based detection fails because the head is occluded by the therapist’s arm or hand.

• Numerous features extracted from the shape of the body are discussed in detail and exploited to improve the classification accuracy.

• In contrast to our previous work, search space optimization is performed in the proposed detection techniques. An exhaustive search is performed only in the first frame of the video to find the location of the patient’s body, which is then used to restrict the search area in the succeeding frames of the video. This improves not only the detection accuracy but also the computational efficiency of the proposed system.

• Additional diagnostic experiments are presented along with an in-depth discussion.

The rest of the paper is organized as follows: the related literature is described in Sect. 2. The overview of the proposed system is described in Sect. 3. The patient body detection and segmentation is presented in Sect. 4. Sect. 5 describes the method for feature computation to capture the movements in the upper and lower limbs of the patient during the therapy. The experimental evaluation is presented in Sect. 6 and the conclusions are drawn in Sect. 7.

2. Related Work

Physical rehabilitation aims at restoring the functional ability of patients with physical impairments or disabilities. Computer vision and image processing techniques have been effectively used in physical rehabilitation and other fields of medicine [16–23] to improve the quality of life of the patients. Industrial motion sensors have been utilized in [24, 25] as a physical rehabilitation tool. However, this process requires wearing a number of sensors on the body, which may cause some discomfort for the patients, and it is not a realistic solution for infants and young children. Virtual Reality (VR) and motion-based games have also been used for rehabilitation [24, 26]. Chen et al. [27] proposed a therapy system for upper limb movement using an infrared camera with a hand skateboard training device. Patients participating in the therapy have a binding band attached to the hand skateboard on the table, which guides the patient in moving the hand skateboard along the designated path to train the patient’s upper limbs. Bryanton et al. [28] proposed the use of VR in children with cerebral palsy and concluded that children took greater interest, performed more repetitions of the exercise, and generated more ankle dorsiflexion using the VR game in comparison to standalone exercise. Although the evidence indicates that VR can provide an interactive, engaging and effective environment for physical therapy, it requires expensive hardware and software. Such systems are generally one-off designs to suit a very specific group of patients or therapeutic tasks, and they are not helpful in the case of newborn babies because they cannot interact with such systems.

Microsoft Kinect has been used in several studies as an assistive technology in ambient assisted living and rehabilitation settings. In [29], a system for home-based rehabilitation is proposed using the Dynamic Time Warping (DTW) algorithm and fuzzy logic. The evaluation is performed on the trajectory of joints and on the time taken to complete the designated set of exercises. Wu [30] calculated the 3D coordinate distances between 15 joints of a live motion skeleton using the human skeleton from Kinect, and used them to monitor the rehabilitation progress. In [31], an interface for adults is proposed to monitor the correct description of therapeutic movement. This system is a standalone exercise, limited to a specific position of a patient without any occlusion. It is limited to capturing the shoulder movements in four different angles. Yao-Jen et al. [32] developed a Kinect-based rehabilitation system to assist the therapists in their work to treat children suffering from motor disabilities. They use the motion tracking data to analyze the rehabilitation standards and to allow the therapist to observe the rehabilitation progress. Chang et al. [33] proposed the motion tracking of 6 upper limbs for rehabilitation and used the outputs of OptiTrack as ground truth to compare with the outputs of Kinect. Research in [34] used Kinect’s skeletal tracking to analyze the rehabilitation of an upper limb. Recent surveys on various therapy techniques for rehabilitation using Kinect can be found in [35, 36].

Most of the aforementioned computer-assisted rehabilitation techniques use skeleton information from Kinect; however, this information may not be available in the case of infants and young children. Moreover, these techniques do not perform well if some body parts are occluded. Due to the nature of Vojta therapy, some of the patient’s body parts are always occluded by the therapist’s hand or arm (Fig. 2). In contrast to the existing techniques, the proposed algorithm exploits both the visual and the depth information from Kinect to detect and classify the patient’s movements. To the best of our knowledge, this is the first automatic computer vision-based system for monitoring the movement patterns of infants during Vojta therapy.

3. Overview of the Proposed Method

The proposed framework works in three steps. In the first step, the patient is automatically detected and segmented from the depth image. The depth images obtained from the Microsoft Kinect camera are pre-processed before the segmentation and detection phase to reduce the noise. Two different approaches are used to detect and segment the child from the depth image.

Figure 2: Sample images where patient’s body is partially occluded.

In the first approach, the edge information in the depth image is utilized to detect the candidate position of the human head region. A head template is traversed across the whole edge image to locate the possible regions of the child’s head. The relationship between the sum of squared differences and cross correlation is exploited in the proposed matching algorithm. The detected head region is verified through a 3D head-model fitting technique which utilizes both the edge and the relational depth change information of the depth image. For the case when the head is occluded, we propose another template to detect the torso of a child in a depth image. The temporal body detection information is also exploited. That is, the detected location of the child’s body region in the preceding frames of the video is utilized to track it in the successive frames. After detection, a region growing algorithm [37] is used to segment the child’s body region from the image. In the second approach, a plane equation of the table surface where the patient is lying for treatment is computed. We used the geometrical features of the plane to segment the patient from the table surface using the k-means clustering algorithm. We performed the calibration of depth and color images using the camera’s extrinsic and intrinsic parameters to segment the infant’s body region from both the depth image and the color image.

In the second step, the movement patterns in the patient’s various body parts during therapy are analyzed. Numerous features are designed to capture the movements in different body parts of the patient. The visual information (color image) is utilized to identify the lying position of the patient. In the final step, a multi-class support vector machine is trained and used to classify the accurate movements of the patient during therapy. Fig. 3 shows the block diagram of the proposed system.

4. Patient Body Detection

4.1. Depth data pre-processing

Figure 3: Overview of the proposed method. [Block diagram; block labels: Input (RGB & depth image), Pre-processing, Head Detection, Head Verification, Torso Detection, Computation of Plane Equation, Calibration of RGB image, K-means Clustering, Patient’s Body Segmentation, Feature Extraction, Classification.]

The 3D depth sensor in Microsoft Kinect provides the depth information of the scene as a two-dimensional array, known as the depth image. Each pixel in a depth image represents the distance of the corresponding object from the camera. For visualization and compression, the depth values are normalized to the range 0–255 [38, 39]. The depth sensor in the Kinect camera can capture distance information in the range 0.8–3.5 meters. Pixels outside this range are filled with the offset value 0 and appear as random black dots in the depth image. These missing pixels are considered noise and must be recovered to use the depth data effectively in image processing techniques. Since an image is a continuous space, we recover these missing pixels from the neighboring pixels by applying a nearest neighbor interpolation algorithm.
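The recovery of missing depth pixels can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that missing pixels are marked with 0 and fills each one from the closest valid pixel found by a breadth-first search over 4-connected neighbors, so "nearest" is measured in city-block distance. The function name `fill_missing_depth` is hypothetical.

```python
from collections import deque


def fill_missing_depth(depth):
    """Fill zero-valued (missing) pixels with the value of the nearest valid pixel.

    depth: list of rows of non-negative numbers, where 0 marks a missing pixel.
    Returns a new image; the input is left unchanged.
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]
    # Start the search from every valid pixel at once (multi-source BFS).
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if depth[r][c] != 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and filled[nr][nc] == 0:
                filled[nr][nc] = filled[r][c]  # copy from the closer pixel
                queue.append((nr, nc))
    return filled
```

Because the search expands from all valid pixels simultaneously, each hole receives the value of a valid pixel at minimal city-block distance; ties are broken by scan order.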

We propose two novel techniques to detect the patient’s body region from the depth images, which is then used in activity monitoring. They are explained in the following sections.

4.2. Detection using template matching

In this technique, head and torso templates are used to detect and segment the patient’s body. The edge information extracted from the depth image is used to locate the position of the child’s head. The Canny edge detector [40] is used to find the edges in the depth image. Edges shorter than a pre-defined threshold ($\tau_e$) are considered insignificant and are dropped from further processing to save computation time. The proposed head template is traversed over the entire edge image from top-left to bottom-right and the best match is found. The exhaustive search is performed only in the first frame of each video. In the succeeding frames, the location of the child’s body region detected in the preceding frames is exploited to limit the search area.

The template matching is performed on each edge pixel of the depth image, and the similarity between the template and the source is computed. For the matching, we utilized sum of squared differences (SSD) and cross correlation (CC) based algorithms. Let I be the edge image from the depth data and T be the template image of size m × n. In the template image, we represent a pixel as T(i, j), where i and j represent the x and y positions of the pixel, respectively. Let us assume that T is being matched with a rectangular region in the edge image I, where a pixel in I is represented as I(x, y). The SSD value S at pixel I(x, y) is computed as:

\[ S(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{n} \bigl( I(x+i, y+j) - T(i, j) \bigr)^2 \tag{1} \]

S can also be viewed as the squared Euclidean distance between the image patch of I and the template T. Expanding (1) yields,

\[ S(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{n} I^2(x+i, y+j) + \sum_{i=1}^{m} \sum_{j=1}^{n} T^2(i, j) - 2 \sum_{i=1}^{m} \sum_{j=1}^{n} I(x+i, y+j)\, T(i, j) \tag{2} \]

In (2), the first term is the sum of squared values in the edge image, the second term belongs to the template image, and the third term is twice the correlation between the patch of edge image I and the template T. Note that the term $\sum_{i=1}^{m} \sum_{j=1}^{n} T^2(i, j)$ is constant and the term $\sum_{i=1}^{m} \sum_{j=1}^{n} I^2(x+i, y+j)$ is approximately constant [41]. The remaining term (i.e., the cross correlation between I(x, y) and T) is:

\[ C(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{n} I(x+i, y+j)\, T(i, j) \tag{3} \]
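The relationship between the SSD and the cross correlation can be checked numerically. The sketch below is illustrative only: it evaluates Eqs. (1) and (3) directly on a tiny binary edge image, uses 0-based offsets instead of the 1-based indices of the equations, treats x as the row index, and the helper names (`ssd`, `cross_correlation`, `best_match`) are hypothetical.

```python
def ssd(image, template, x, y):
    """Sum of squared differences between the template and the patch at (x, y), Eq. (1)."""
    return sum((image[x + i][y + j] - template[i][j]) ** 2
               for i in range(len(template))
               for j in range(len(template[0])))


def cross_correlation(image, template, x, y):
    """Cross correlation between the template and the patch at (x, y), Eq. (3)."""
    return sum(image[x + i][y + j] * template[i][j]
               for i in range(len(template))
               for j in range(len(template[0])))


def best_match(image, template):
    """Slide the template over every valid offset and return the one with the smallest SSD."""
    m, n = len(template), len(template[0])
    positions = [(x, y)
                 for x in range(len(image) - m + 1)
                 for y in range(len(image[0]) - n + 1)]
    return min(positions, key=lambda p: ssd(image, template, p[0], p[1]))


image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
template = [[1, 1],
            [1, 0]]

x, y = best_match(image, template)
patch_energy = sum(image[x + i][y + j] ** 2 for i in range(2) for j in range(2))
template_energy = sum(v ** 2 for row in template for v in row)
# Eq. (2): SSD = patch energy + template energy - 2 * correlation
assert ssd(image, template, x, y) == (
    patch_energy + template_energy - 2 * cross_correlation(image, template, x, y))
```

At the best-matching offset the SSD is smallest while the correlation is largest, which is the intuition behind using correlation as the similarity measure.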

It can be observed from (2) that the Euclidean distance between the source patch and the template T decreases as their similarity (i.e., correlation) increases. This also gives an intuition for using correlation as a similarity measure. The cross correlation in the spatial domain can be computed by taking the inner product of the image patch and the template. This calculation is computationally very expensive [42]. For a search area of size M × M and a template of size N × N, it requires $N^2(M - N + 1)^2$ additions and $N^2(M - N + 1)^2$ multiplications [43]. To cope with this computational overhead, the cross correlation is computed in the frequency domain using the fast Fourier transform (FFT) [44–46] and the correlation theorem. The theorem states that multiplying the Fourier transform of one function (i.e., edge image I) by the complex conjugate of the Fourier transform of the other (i.e., the head template T) gives the Fourier transform of their correlation. That is,

\[ C(x, y) = \mathcal{F}^{-1}\bigl( \mathcal{F}(I(x, y))\, \mathcal{F}^{*}(T(i, j)) \bigr) \tag{4} \]

where $\mathcal{F}(I(x, y))$ represents the Fourier transform of the edge image I, $\mathcal{F}^{*}(T(i, j))$ represents the complex conjugate of the Fourier transform of the template image T, and $\mathcal{F}^{-1}$ is the inverse Fourier transform. The computational cost of computing CC in the frequency domain is $18M^2 \log_2 M$ real additions and $12M^2 \log_2 M$ real multiplications [43]. Computing the correlation in the spatial domain or in the frequency domain does not affect the detection results of the proposed algorithm; however, the computation in the frequency domain is 2.5 to 12 times faster than in the spatial domain [47].
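The correlation theorem can be verified on a small example. The following sketch is an illustration under stated assumptions, not the authors' code: a naive discrete Fourier transform stands in for the FFT to keep the example dependency-free, the image is square, the template is zero-padded to the image size, and borders wrap around (circular correlation). The template's transform is the one that is conjugated, as in the statement of the theorem. All function names are hypothetical.

```python
import cmath


def dft2(a, inverse=False):
    """Naive 2D discrete Fourier transform of a square list of lists (O(n^4))."""
    n = len(a)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    angle = sign * 2 * cmath.pi * (u * x + v * y) / n
                    total += a[x][y] * cmath.exp(1j * angle)
            out[u][v] = total / (n * n) if inverse else total
    return out


def correlate_frequency(image, template):
    """Circular cross correlation computed through the correlation theorem."""
    n = len(image)
    # Zero-pad the template to the size of the image.
    padded = [[template[i][j] if i < len(template) and j < len(template[0]) else 0
               for j in range(n)] for i in range(n)]
    fi, ft = dft2(image), dft2(padded)
    product = [[fi[u][v] * ft[u][v].conjugate() for v in range(n)]
               for u in range(n)]
    return [[c.real for c in row] for row in dft2(product, inverse=True)]


def correlate_spatial(image, template):
    """Direct evaluation of the correlation sum with wrap-around indexing."""
    n = len(image)
    return [[sum(image[(x + i) % n][(y + j) % n] * template[i][j]
                 for i in range(len(template))
                 for j in range(len(template[0])))
             for y in range(n)] for x in range(n)]
```

For real-valued inputs both functions return the same correlation surface up to floating-point error, so the choice between them affects only the running time.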

The size of the head varies from child to child and also depends on the distance of the child from the camera during the therapy. A head close to the camera occupies a larger region in the depth image than one farther away. The distance and the orientation angle of the camera with respect to the table surface are pre-defined and fixed to capture the best quality data with minimum occlusion. Moreover, the matching algorithm is applied in a multi-resolution fashion, which makes the algorithm robust to changes in scale. We used three different sizes of templates with a sampling rate of 1/4 for pyramid construction to detect the head location in the edge image. The locations with high correlation are marked as candidates for the child’s head. However, not all detected locations necessarily contain the head. Therefore, the detected locations are verified through a three-dimensional head model fitting technique. A 3D hemisphere model is constructed using the detected head region for the verification of the head. We exploited the 3D head model fitting technique proposed in [48] for the head verification.

Figure 4: Results of the proposed algorithm on a sample depth image. (a) A manipulated color image (i.e., converted into its negative); (b) corresponding depth image of (a); (c) the template used for head detection; (d) head location after verification; (e) segmented body region in depth image; (f) corresponding body region in RGB image.

After the head verification, the entire body of the patient is extracted from the image using a region growing algorithm. The region growing algorithm computes the similarity of a selected pixel to its neighboring pixels and grows by including them in the region if the similarity is more than a pre-defined threshold $\tau_s$. The corresponding region in the RGB image is also extracted using the camera’s extrinsic and intrinsic parameters. Fig. 4 shows the results of the proposed detection algorithm on a sample depth image. Fig. 4a shows a color image, and the corresponding depth image is shown in Fig. 4b. Fig. 4c is the head template used in the experiments. The red dot in the green rectangle in Fig. 4d shows the best matching location. The head is verified and the whole body is extracted: Fig. 4e shows the final result in the depth image, and Fig. 4f shows the result in the color image.
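A region growing step of this kind can be sketched as follows. This is a minimal illustration rather than the algorithm of [37]: it assumes that two neighboring pixels are "similar" when their depth values differ by at most a tolerance (a dissimilarity bound standing in for the similarity threshold), it uses 4-connectivity, and the names `region_grow` and `tolerance` are hypothetical.

```python
from collections import deque


def region_grow(depth, seed, tolerance):
    """Return the set of pixels reachable from `seed` through neighbors whose
    depth differs from the pixel they are reached from by at most `tolerance`."""
    rows, cols = len(depth), len(depth[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(depth[nr][nc] - depth[r][c]) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Seeding the search at the verified head (or torso) location keeps the grown region attached to the patient, provided the depth jump to the table or the therapist exceeds the tolerance.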

The proposed head-template based matching algorithm performs very well when the head of the patient is visible; however, it may perform poorly when the head is partially visible or totally occluded by the therapist. This situation mostly occurs when the treatment is given in the supine lying position, and sometimes in the prone lying position too, when the patient’s head is occluded by the therapist’s hand or arm. In such cases, we use a torso template to detect the body of the child in the depth image. Fig. 5 shows the results of the torso-based matching algorithm on a sample depth image.

Figure 5: Patient body detection using the proposed torso template matching approach. (a) A sample color image; (b) corresponding depth image of (a); (c) the torso template used in detection; (d) detected torso location represented by the red dot; (e) segmentation results in depth image; (f) segmentation results in color image.

In both the head-based and the torso-based matching techniques, an exhaustive search is performed only for the first frame of the video. In the succeeding frames, instead of traversing the template again over the entire image, we exploit the temporal information of the patient’s detected location. Since the camera is fixed and the child is not making rapid movements on the table, the location information from the previous frame is very useful for setting the search space in the succeeding frames. In particular, the bounding box location of the child’s body region in the previous frame is used to track it in the succeeding frames with a relaxation of Ω pixels in all directions. This not only improves the detection accuracy but also reduces the computational cost of the proposed system.
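The restriction of the search space can be sketched in a few lines. The example assumes that a bounding box is stored as inclusive pixel coordinates (left, top, right, bottom) and that `omega` plays the role of the relaxation Ω; the function name and the image size used below are hypothetical.

```python
def search_window(prev_box, omega, width, height):
    """Expand the previous bounding box by `omega` pixels on every side,
    clipped to an image of size width x height.

    prev_box: (left, top, right, bottom), inclusive pixel coordinates.
    """
    left, top, right, bottom = prev_box
    return (max(0, left - omega),
            max(0, top - omega),
            min(width - 1, right + omega),
            min(height - 1, bottom + omega))


# The template is then matched only inside this window in the next frame.
window = search_window((100, 50, 200, 150), omega=10, width=640, height=480)
```

The smaller the window relative to the full frame, the fewer template positions have to be evaluated per frame.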

4.3. Detection using plane equation

In the second approach, the child’s body is segmented from the depth image using the plane equation of the table surface where the child is lying for treatment. Since the camera and the table remain fixed, the table surface remains in the same plane during the therapy. To compute the plane equation, three points on the table surface are needed. These points must neither be on the child’s body region nor on a single line (i.e., collinear). These points need to be marked only once for a therapy setup, as the system setup remains fixed after that.

The corresponding world coordinates of the three selected pixel locations are computed using their depth values. Let us define the subscript d for the depth and w for the world, so $(x_d, y_d)$ is a point in the depth image and the corresponding point in the real world is $(x_w, y_w, z_w)$. Moreover, let H and V represent the number of horizontal and vertical pixels in the field of view, respectively. For a given depth value z at a selected pixel location (i, j), the remaining x and y coordinates are determined using the following equations [49]:

\[ x_w = \frac{H}{2\tan(\theta_h)} \Bigl( j - \frac{H}{2} \Bigr) \times z \tag{5} \]

\[ y_w = \frac{V}{2\tan(\theta_v)} \Bigl( i - \frac{V}{2} \Bigr) \times z \tag{6} \]

where $\theta_h$ and $\theta_v$ represent the horizontal and vertical viewing angles of the Kinect camera (Fig. 6a), which are 57° and 43°, respectively.

Figure 6: (a) Projecting the depth image pixel in a real world coordinate system. (b) Computing the vectors required for the plane equation.

The plane equation of the table surface is computed based on these three correspondences. Let A, B and C be the corresponding real world points. For three points $A(x_1, y_1, z_1)$, $B(x_2, y_2, z_2)$ and $C(x_3, y_3, z_3)$, we form the two vectors $\vec{U}$ (from A to B) and $\vec{V}$ (from A to C), as shown in Fig. 6b, using the following equations:

\[ \vec{U} = \overrightarrow{AB} = B - A \tag{7} \]

\[ \vec{V} = \overrightarrow{AC} = C - A \tag{8} \]

Since $\vec{U}$ and $\vec{V}$ are both located on the same plane and are also parallel to the plane (Fig. 6b), a normal vector ($\vec{N}$) can be computed by taking the cross product of $\vec{U}$ and $\vec{V}$, which will be perpendicular to the plane. That is,

\[ \vec{N} = \vec{U} \times \vec{V} = (B - A) \times (C - A) \tag{9} \]

We know that the dot product of two orthogonal vectors is zero. Since $\vec{N}$ is perpendicular to every vector lying in the plane, for any point $P(x, y, z)$ on the plane the vector from A to P satisfies

\[ \vec{N} \cdot \overrightarrow{AP} = 0 \tag{10} \]

That is:

\[ \vec{N} \cdot (P - A) = 0 \tag{11} \]

Expanding (11) yields:

\[ \langle a, b, c \rangle \cdot \langle x - x_1,\; y - y_1,\; z - z_1 \rangle = 0 \tag{12} \]

where a, b and c are the components of the normal vector $\vec{N}$. Equation (12) is known as the scalar equation of the plane (or the Cartesian form of the plane equation) and can be written as:

\[ ax + by + cz = d \tag{13} \]

where d can be computed as,

\[ d = a x_1 + b y_1 + c z_1 \tag{14} \]
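The computation of the plane coefficients from three points can be sketched as below. The function name `plane_from_points` and the sample coordinates are hypothetical; the code only assumes that A, B and C are non-collinear 3D points given as (x, y, z) tuples.

```python
def plane_from_points(A, B, C):
    """Return (a, b, c, d) of the plane ax + by + cz = d through A, B and C."""
    U = [B[k] - A[k] for k in range(3)]   # vector from A to B, Eq. (7)
    V = [C[k] - A[k] for k in range(3)]   # vector from A to C, Eq. (8)
    # Normal vector N = U x V, Eq. (9)
    a = U[1] * V[2] - U[2] * V[1]
    b = U[2] * V[0] - U[0] * V[2]
    c = U[0] * V[1] - U[1] * V[0]
    if a == 0 and b == 0 and c == 0:
        raise ValueError("the three points are collinear")
    d = a * A[0] + b * A[1] + c * A[2]    # Eq. (14)
    return a, b, c, d


# Three points on a horizontal surface at depth z = 2.
plane = plane_from_points((0, 0, 2), (1, 0, 2), (0, 1, 2))
```

For the sample points the normal is (0, 0, 1) and d = 2, i.e. the plane z = 2, and all three input points satisfy the resulting equation.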

Figure 7: (a) Points marked on the table surface; (b) segmented body region in depth image; (c) segmented body region in the corresponding color image.

Equation (13) is known as the plane equation. Using this equation, one can identify the table surface pixels in the depth image. All pixels that satisfy the plane equation within a given threshold range are extracted and the remaining ones are discarded. Holes may appear in the selected region due to depth inaccuracy; they can be recovered using image inpainting techniques, e.g. [50–54]. The hole filling algorithm described in [51] is applied to estimate the missing pixels. If more than one region is extracted, a connected components algorithm is used to detect the largest region, which represents the table surface. The corresponding region in the color image is then extracted. To segment the child’s body from the extracted table surface, the k-means clustering algorithm is used with skin color as the observation. Fig. 7 shows the results of the proposed algorithm on a sample depth image. Fig. 7a shows the depth image with three points marked on the table surface for plane equation computation. The detected body region in the depth image and in the corresponding color image are shown in Fig. 7b and Fig. 7c, respectively.
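The test of whether a pixel belongs to the table surface can be sketched as follows. The paper does not state how the threshold is applied; this example assumes the perpendicular distance of the world point to the plane is compared with the threshold, and the names `on_plane` and `threshold` are hypothetical.

```python
import math


def on_plane(point, plane, threshold):
    """Return True if `point` lies within `threshold` of the plane ax + by + cz = d.

    point: (x, y, z) world coordinates; plane: (a, b, c, d).
    """
    a, b, c, d = plane
    x, y, z = point
    # Perpendicular distance; dividing by |N| makes the result independent
    # of the length of the normal vector.
    distance = abs(a * x + b * y + c * z - d) / math.sqrt(a * a + b * b + c * c)
    return distance <= threshold


table = (0, 0, 1, 2)                               # the plane z = 2
table_point = on_plane((5, 5, 2.01), table, 0.02)  # within the tolerance
body_point = on_plane((0, 0, 2.5), table, 0.02)    # well off the plane
```

Normalizing by the length of the normal keeps the threshold in the same units as the depth values, whichever three points were used to define the plane.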

5. Feature Extraction for Pose Representation

Vojta therapy is given to a child in prone, supine, and side lying positions. However, in this paper we primarily focus on the treatment in prone and supine lying positions. During the treatment, the movements in different body parts of the child are carefully analyzed. It is noticed that a specific movement can be observed in the upper and lower limbs of the child when a specific stimulation is given to him/her. We designed a set of nine features to capture these movements in the upper and lower limbs and classified them into four classes as described in Tab. 1.

Table 1: Limb movement classes. Label is the class label.

Label   Movements
ω1      Upper limbs movement in prone lying position
ω2      Lower limbs movement in prone lying position
ω3      Upper limbs movement in supine lying position
ω4      Lower limbs movement in supine lying position

The depth image does not provide any visual information, which makes it very difficult to identify the lying position of a child from the segmented depth image. Therefore, we calibrated the color image with the depth image using the camera’s extrinsic and intrinsic parameters to segment the child’s body region from the color image too. The lying position of a child is detected using the face detection algorithm [55] on the segmented color image. For feature computation, the bounding box of the whole segmented region from the depth image is divided into two equally high sub-boxes to describe the movements in the upper and lower regions separately and in a more discriminative way (see Fig. 8).

Figure 8: Example of feature computation. Shape bounding box and equally high sub-boxes (i.e., hu = hl) used for feature extraction. [Figure labels: Au, Al, Lu, Ll, wu, hu, wl, hl, w, h.]

Let the subscript u denote the bounding box in the upper region and l the bounding box in the lower region. Nine features, described in Tab. 2, are computed to form a 9-dimensional feature vector F:

\[ F = \{F_1, F_2, F_3, F_4, F_5, F_6, F_7, F_8, F_9\} \tag{15} \]
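Part of the feature vector can be sketched from a binary segmentation mask. The example below is a hedged illustration, not the authors' implementation: it covers only the shape ratios and areas (F2 to F6 and F9), omitting the lying position (F1) and the contour lengths (F7, F8). It assumes that the area is the number of foreground pixels, that each sub-box is as wide as the shape inside it, and that with an odd bounding-box height the lower sub-box receives the extra row. The function name `shape_features` is hypothetical.

```python
def shape_features(mask):
    """Compute F2-F6 and F9 from a non-empty binary mask (list of rows of 0/1)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    h, w = bottom - top + 1, right - left + 1
    split = top + h // 2  # first row of the lower sub-box

    def sub_box(r0, r1):
        """Height, width and pixel count of the shape in rows r0 .. r1 - 1."""
        cs = [c for r in range(r0, r1) for c, v in enumerate(mask[r]) if v]
        width = max(cs) - min(cs) + 1 if cs else 0
        return r1 - r0, width, len(cs)

    hu, wu, area_u = sub_box(top, split)
    hl, wl, area_l = sub_box(split, bottom + 1)
    return {
        "F2": hu / wu if wu else 0.0,   # height/width ratio, upper sub-box
        "F3": hl / wl if wl else 0.0,   # height/width ratio, lower sub-box
        "F4": h / w,                    # height/width ratio, complete box
        "F5": area_u,                   # area in the upper sub-box
        "F6": area_l,                   # area in the lower sub-box
        "F9": area_u + area_l,          # total area
    }
```

A limb moving away from the trunk widens the corresponding sub-box and changes its ratio and area, which is what makes the two halves informative separately.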

Each feature attribute is scaled to the range [−1, +1]. The main advantage of scaling is to prevent attributes in greater numeric ranges from dominating those in smaller numeric ranges. Another advantage is that it avoids numerical difficulties during the calculation. For example, in classification, kernel values usually depend on the inner products of feature vectors, so large attribute values might cause numerical problems [56].
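The scaling step can be sketched as a per-feature linear mapping. This is a minimal example assuming min-max scaling over the available samples; the function name `scale_features` and the sample values are hypothetical.

```python
def scale_features(samples):
    """Linearly map every feature column of `samples` to the range [-1, +1].

    samples: list of equally long feature vectors.
    Returns the scaled vectors together with the per-feature minima and maxima.
    """
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for sample in samples:
        row = []
        for value, lo, hi in zip(sample, lows, highs):
            # A constant column carries no information; map it to 0.
            row.append(0.0 if hi == lo else 2.0 * (value - lo) / (hi - lo) - 1.0)
        scaled.append(row)
    return scaled, lows, highs


scaled, lows, highs = scale_features([[1, 10], [3, 20], [5, 40]])
```

The minima and maxima found on the training data would normally be stored and reused to scale new samples, so that training and test vectors share the same mapping.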

5.1. Feature classification

The Support Vector Machine (SVM) has been used as a powerful tool for solving classification problems in many applications [57–61]. Due to the multi-dimensionality of our features, we chose the SVM as classifier. Other similarity-based classifiers, e.g., K-Nearest Neighbor (KNN), and probability-based classifiers, e.g., Naive Bayes, do not perform well for high-dimensional features [62]. In recent years, neural networks have demonstrated excellent recognition results; however, the training process is computationally expensive and requires a large amount of training data [63]. The KNN is an instance-based classifier (i.e., it memorizes the complete training set), and therefore it is memory intensive. Decision trees and divide-and-conquer based classification algorithms are highly biased to the training set and may cause over-fitting [64].

Table 2: Description of the 9 features used in classification. F represents the feature and V is how the respective feature is computed.

F     V              Feature description
F1    -              The lying position of a child.
F2    $h_u / w_u$    The ratio of height and width in the upper bounding box.
F3    $h_l / w_l$    The ratio of height and width in the lower bounding box.
F4    $h / w$        The ratio of height and width in the complete bounding box.
F5    $A_u$          The area in the upper bounding box.
F6    $A_l$          The area in the lower bounding box.
F7    $L_u$          The length of the contour in the upper bounding box.
F8    $L_l$          The length of the contour in the lower bounding box.
F9    $A_u + A_l$    The sum of the areas in the upper and lower bounding boxes.

In contrast, the SVM has demonstrated excellent classification results on high-dimensional data with small training samples [65, 66]. Instead of memorizing the entire training set, it computes a very small number of 'support vectors' to construct a hyperplane for discrimination between the positive and the negative samples. Moreover, it can perform very well even when the data are not linearly separable in the feature space, using the kernel trick. The SVM first maps the training samples into a high-dimensional space and then extracts a hyperplane between the different classes of objects using the principle of maximizing the margin. Because of this principle, the generalization error of the SVM is theoretically independent of the number of feature dimensions. For a given training vector $x$, the SVM assigns a label $y = \pm 1$ by taking the sign of the following linear discriminant function

$$y(x) = w^{T}\phi(x) + b.$$

It exploits $y(x) = 0$ to define a hyperplane as a decision boundary between the classes. To get an optimal hyperplane, the following optimization problem is solved,

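$$
\begin{aligned}
\min_{w,\,b,\,\xi_i}\quad & \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & \forall i:\; y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,
\end{aligned}
\tag{16}
$$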

where $(x_i, y_i)$, $i = 1, 2, \ldots, n$, represent the training data, with $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$. Moreover, $\phi(x_i)$ is the mapping of the training sample $x_i$ into a higher-dimensional space, $\xi_i$ represents the loss function, and $C$ is a user-defined regularization parameter which controls the trade-off between maximizing the margin and minimizing the loss function. The values of the weight vector $w$ and the scalar bias $b$ are obtained during learning. Since the constraint in (16) is quite complex due to the possibly high dimensionality of $w$, a direct solution of this formulation is difficult [67]. However, it can be simplified using the theory of Lagrangian duality [68]. This approach leads to the following dual problem:

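$$
\begin{aligned}
\max_{\alpha}\quad & D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \\
\text{s.t.}\quad & \forall i:\; 0 \le \alpha_i \le C,\quad \sum_{i=1}^{n} \alpha_i y_i = 0,
\end{aligned}
\tag{17}
$$

which is computationally easier to solve due to its much simpler constraints [67]. Here, $\alpha_i$ are the Lagrange multipliers and $K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)$ represents the kernel function. The optimal $w$ satisfies $w = \sum_{i=1}^{n} y_i \alpha_i \phi(x_i)$, and the decision function for a sample $x$ would be

$$\operatorname{sign}\left(w^{T}\phi(x) + b\right) = \operatorname{sign}\left(\sum_{i=1}^{n} y_i \alpha_i K(x_i, x) + b\right).$$

Figure 9: Camera setup used in data acquisition.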

We used the LIBSVM library [69] to implement the SVM with the RBF (Radial Basis Function, also known as Gaussian) kernel, where

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0.$$
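To make the kernel form of the decision function concrete, the following minimal sketch evaluates the RBF kernel and the sign of the kernel expansion for a two-class SVM. It does not call LIBSVM; the support vectors, multipliers, bias and $\gamma$ below are hand-picked hypothetical values, not the output of any training run.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2), with gamma > 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, labels, alphas, b, gamma):
    """Kernel expansion: sum_i y_i * alpha_i * K(x_i, x) + b."""
    return sum(
        y * alpha * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    ) + b


def predict(x, support_vectors, labels, alphas, b, gamma):
    """Return +1 or -1 according to the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, b, gamma)
    return 1 if value >= 0 else -1


# Hypothetical two-class model with one support vector per class.
toy_support_vectors = [[0.0, 0.0], [2.0, 0.0]]
toy_labels = [+1, -1]
toy_alphas = [1.0, 1.0]
toy_bias = 0.0
toy_gamma = 1.0

near_first = predict([0.1, 0.0], toy_support_vectors, toy_labels,
                     toy_alphas, toy_bias, toy_gamma)
near_second = predict([1.9, 0.0], toy_support_vectors, toy_labels,
                      toy_alphas, toy_bias, toy_gamma)
```

Only the support vectors enter the sum, which is why the trained model does not need to keep the complete training set.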

The parameter $\gamma$ of the RBF kernel specifies the spread of the kernel. To solve the multi-class problem, LIBSVM uses the one-against-one strategy. Specifically, it trains a two-class SVM for each pair from the set of all considered classes. Thus, for $N$ classes, $N(N-1)/2$ two-class classifiers are constructed in total. It uses a voting strategy for classification, and each testing sample is assigned to the class with the maximum number of votes. We performed a 5-fold cross-validation to select the soft margin parameter $C$ and the kernel parameter $\gamma$ prior to training the actual model on the full training dataset. That is, the training dataset was randomly divided into 5 subsets and training was performed five times, each time leaving one partition out of the training process and using it for testing.
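The one-against-one voting and the 5-fold partitioning described above can be sketched as follows. This is an illustrative sketch, not LIBSVM code: the pairwise classifiers are toy nearest-centre rules on a one-dimensional input, the class names and centres are hypothetical, and breaking ties in favour of the class listed first is an assumption of the example. For the four classes of Tab. 1, $N(N-1)/2 = 6$ pairwise classifiers are needed.

```python
import random
from itertools import combinations


def one_against_one_predict(x, classes, pairwise):
    """Let every pairwise classifier vote; return the class with most votes.

    pairwise maps a class pair (a, b) to a function returning a or b for x.
    Ties go to the class listed first (assumption of this sketch).
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])


def k_fold_splits(n_samples, k=5, seed=0):
    """Randomly partition sample indices into k folds.

    Returns a list of (training_indices, held_out_indices) pairs.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, folds[i]))
    return splits


# Toy stand-ins for trained two-class SVMs (hypothetical values).
classes = ["w1", "w2", "w3", "w4"]
centres = {"w1": 0.0, "w2": 1.0, "w3": 2.0, "w4": 3.0}


def make_pair_classifier(a, b):
    # Picks whichever of the two classes has the nearer centre.
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b


pairwise = {(a, b): make_pair_classifier(a, b) for a, b in combinations(classes, 2)}
predicted = one_against_one_predict(2.2, classes, pairwise)
splits = k_fold_splits(10, k=5)
```

In a parameter search, each candidate pair $(C, \gamma)$ would be scored by its mean accuracy over the held-out folds before the final model is trained on all training samples.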

6. Experiments and Results

6.1. Dataset acquisition

To the best of our knowledge, no public dataset of Vojta therapy (VT) is available to classify the movements of a child's body parts during the treatment. Therefore, we collected a dataset for VT movement classification and monitoring from a local children's hospital. With the consent of the parents, 10 children of both genders, aged between 2 weeks and 6 months, were selected for data collection. For a fair evaluation, the therapy was also performed by the parents according to the therapist's instructions. The Kinect was put on a tripod at a height of 2 meters with an angle of 45° with respect to the table surface. The setting was chosen in accordance with the recommendation for capturing the best data quality with minimum occlusion. Fig. 9 shows the camera setup used to capture the database. Fig. 10 shows the various limb movements performed in Vojta therapy.

Figure 10: Sample images (negatives) showing various movements in Vojta therapy. (a)-(b): limbs movement in supine lying position; (c)-(d): limbs movement in prone lying position.

For each patient, the therapy session usually lasted 15–20 minutes, and both the color and the depth frames were recorded at 640 × 480 resolution. Upon careful analysis, more than 15,000 frames containing useful information on upper and lower limb movement during therapy were selected. The performance of the proposed system is tested for both the patient detection and the accuracy of classification using the proposed features. The results are discussed in the following sections.

6.2. Quantitative performance analysis and comparison

In this section, we evaluate the performance of the proposed algorithms quantitatively and compare the results with the existing technique [14]. We shall refer to our first algorithm, which detects the patient's body using head and torso templates, as Algorithm 1. The second algorithm, which detects the table surface using the plane equation and applies a skin color classifier on the segmented plane to obtain the patient's body, shall be referred to as Algorithm 2.

There are four possible outcomes of a classifier, namely, true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). TP refers to the number of predicted positives that were correct and FP is the number of predicted positives that were incorrect. Similarly, TN and FN refer to the number of predicted negatives that were correct and incorrect, respectively. To objectively quantify the performance of the proposed algorithms, we chose different performance metrics: Sensitivity, Specificity, Positive predictive value (PPV) or Precision, Negative predictive value (NPV), and Accuracy [48, 70–74]. We also computed the confidence intervals (CI) of each measure to estimate the range of values which is likely to contain the population parameter of interest. Confidence intervals are computed at a confidence level of 95%. The performance measures are defined as follows:

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Positive predictive value (PPV) or Precision} &= \frac{TP}{TP + FP} \\
\text{Negative predictive value (NPV)} &= \frac{TN}{TN + FN} \\
\text{Accuracy or Detection Rate} &= \frac{TP + TN}{TP + FN + TN + FP}
\end{aligned}
$$

Table 3: Objective performance analysis of the proposed and the compared methods in terms of Sensitivity, Specificity, PPV, NPV and Accuracy (in %). The values in parentheses represent the 95% confidence interval (CI) for each measure.

Measure        Algorithm [14]        Proposed Algo. 1      Proposed Algo. 2
Sensitivity    86.01 (85.2, 86.8)    94.23 (93.7, 94.8)    98.20 (97.9, 98.5)
Specificity    80.93 (80.0, 81.8)    88.08 (87.4, 88.8)    95.71 (95.2, 96.2)
PPV            79.68 (78.9, 80.4)    87.92 (87.3, 88.5)    96.10 (95.7, 96.5)
NPV            86.93 (86.3, 87.6)    94.31 (93.8, 94.8)    98.02 (97.7, 98.3)
Accuracy       83.29 (82.7, 83.9)    91.03 (90.6, 91.5)    97.00 (96.7, 97.3)
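The five performance measures follow directly from the four outcome counts, as the minimal sketch below shows. The counts used here are hypothetical and are not taken from the experiments. The text does not state how the confidence intervals were obtained, so the normal-approximation (Wald) interval included below is only one common choice and should be read as an assumption.

```python
import math


def detection_metrics(tp, fp, tn, fn):
    """Compute the five measures from the outcome counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def wald_interval(p, n, z=1.96):
    """Normal-approximation interval for a proportion p estimated from n cases.

    z = 1.96 corresponds to a 95% confidence level. This interval type is an
    assumption of the sketch, not a method confirmed by the text.
    """
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only.
toy_metrics = detection_metrics(tp=90, fp=20, tn=80, fn=10)
toy_ci = wald_interval(toy_metrics["sensitivity"], n=90 + 10)
```

Each interval is computed over the denominator of its own measure, e.g. TP + FN for sensitivity, which is why measures derived from more frames have narrower intervals.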

The results of the evaluation are presented in Tab. 3. The values in parentheses represent the 95% confidence interval (CI) for each measure. The results show that the algorithm [14] achieves 86% sensitivity. The proposed algorithms perform better and achieve sensitivities of 94.23% and 98.20%, respectively. Moreover, the proposed algorithms perform better than [14] in terms of all other performance metrics: specificity, PPV and NPV. The accuracy or detection rate of the algorithm [14] is the poorest, 83.29%, and the proposed Algorithm 2 achieves the best rate, 97%. The statistics presented in Tab. 3 show that the proposed Algorithm 2 outperforms the other two approaches by significant margins, scoring more than 95.7% in all performance measures.

To evaluate the classification performance of the proposed algorithm, the dataset is divided into four classes $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ (Tab. 1). Table 4 summarizes the classification performance of the SVM using the computed features for accurate movements in upper and lower limbs during the treatment in prone and supine lying positions. The results show that the proposed features for pose representation are quite discriminative to classify all the respective movements except in $\omega_2$, where the most confusion occurs. The reason for this confusion is that a significant portion of the lower limbs is occluded by the therapist's arm. The lying position of the child is still detected, but the estimation of lower limb movement might become unstable, as shown in Fig. 10c and 10d. Obtaining an average classification rate of more than 78% lets us believe in this solution and encourages us to plan future work in this research area by considering some novel features for movement analysis.

Table 4: Summary of classification results (in %). Rows are reference classes and columns are predicted classes; the recognition results lie on the diagonal.

Reference     Predicted $\omega_1$   Predicted $\omega_2$   Predicted $\omega_3$   Predicted $\omega_4$
$\omega_1$    80.16                  19.50                  0.17                   0.17
$\omega_2$    55.00                  44.84                  0.16                   0
$\omega_3$    0                      0                      91.33                  8.67
$\omega_4$    0                      0                      4.00                   96.00

Average Accuracy: 78.08

Table 5: Execution time comparison. Time represents the average execution time per frame in seconds.

Method                  Time (sec.)
Algorithm in [14]       0.69
Proposed Algorithm 1    0.47
Proposed Algorithm 2    0.55
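The average accuracy reported in Tab. 4 is the mean of the per-class recognition rates on the diagonal of the confusion matrix. The short sketch below reproduces that figure from the tabulated values and also extracts the largest off-diagonal entry; the variable names are chosen for this example only.

```python
# Confusion matrix from Tab. 4 (rows: reference class, columns: predicted class).
confusion = [
    [80.16, 19.50, 0.17, 0.17],
    [55.00, 44.84, 0.16, 0.00],
    [0.00, 0.00, 91.33, 8.67],
    [0.00, 0.00, 4.00, 96.00],
]

# Per-class recognition rates are the diagonal entries.
per_class = [row[i] for i, row in enumerate(confusion)]
average_accuracy = sum(per_class) / len(per_class)

# Largest confusion: (rate, reference index, predicted index), zero-based.
worst_confusion = max(
    (confusion[i][j], i, j)
    for i in range(4)
    for j in range(4)
    if i != j
)
```

The largest off-diagonal entry is the 55.00% of reference $\omega_2$ samples predicted as $\omega_1$, which exceeds the 44.84% of $\omega_2$ samples recognized correctly and is consistent with the occlusion problem discussed in the text.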

6.3. Computational complexity analysis

We also compared the performance of the proposed algorithms with [14] in terms of execution time. All algorithms are implemented in Matlab and are tested on the entire dataset for execution time using a Dell Core i5 notebook with 8 GB RAM. The results are reported in Tab. 5. The execution time includes the file I/O time, pre-processing, patient's body detection, segmentation and computation of features. The results show that the proposed Algorithm 1 and Algorithm 2 achieve speedups of 1.47 and 1.25 over [14], respectively. We observed that the speedup of Algorithm 1 is mainly due to the search area optimization for template search. We recall that the exhaustive search is performed only in the first frame of the video sequence; the matched location is then used to restrict the search space in the succeeding frame. In the case of Algorithm 2, the table surface is computed just once, at the time of system setup, as the camera and table positions remain fixed after that. The plane equation is then used in each frame of the video sequence to extract the table surface and segment the patient's body region.
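The reported speedups are the ratios of the per-frame execution time of [14] to that of each proposed algorithm. The following short sketch recomputes them from the values in Tab. 5; the implied frame rates are derived here for illustration and are not figures stated in the text.

```python
# Average execution time per frame in seconds, from Tab. 5.
times = {
    "Algorithm in [14]": 0.69,
    "Proposed Algorithm 1": 0.47,
    "Proposed Algorithm 2": 0.55,
}

baseline = times["Algorithm in [14]"]

# Speedup over [14]: baseline time divided by the method's time.
speedups = {name: round(baseline / t, 2) for name, t in times.items()}

# Frames processed per second implied by the per-frame time (derived value).
frames_per_second = {name: 1.0 / t for name, t in times.items()}
```

With these timings all three methods process roughly 1.4 to 2.1 frames per second, including file I/O.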

7. Conclusion

This paper proposed a computer vision-based system to monitor the accurate movements of patients with motor disabilities during Vojta therapy, using RGBD data. The proposed method works in three steps. First, the patient's body is detected from the image. We proposed two novel techniques for this purpose. Second, the movements in the patient's body are analyzed and a feature vector is computed to estimate the pose of the segmented body region. Third, the proposed feature vector is classified using the support vector machine to identify the accurate movements during the treatment. The proposed system is tested for performance on a large database. The experimental evaluation results show that the proposed method performs well and can be used for in-home Vojta therapy to estimate the accuracy of the treatment. In future, we plan to extend the proposed solution to a multi-view camera setup and also exploit motion sensors, such as IMUs (Inertial Measurement Units), to detect the movements in the occluded regions.

Conflicts of interest

The authors declare that they have no conflict of interest.

Acknowledgments

We thank the Vojta therapists Mr. Wolfarm Mueller and Ms. Katrin Springmann for helping us in capturing the dataset.

References


[1] V. Vojta, Reflexumdrehen als bahnungsystem in der menschlichen fortbewegung, Z orthop 108 (1970) 446–452.

[2] V. Vojta, Die zerebralen Bewegungsstörungen im Säuglingsalter: Frühdiagnose und Frühtherapie; 48 Tabellen, Georg Thieme Verlag, 2008.

[3] D. Scrutton, Management of the motor disorders of children with cerebral palsy, no. 90, Cambridge University Press, 1984.

[4] V. Vojta, A. Peters, Das vojta-prinzip: muskelspiele in reflexfortbewegung und motorischer ontogenese, Springer-Verlag, 2007.

[5] Vojta therapy, http://www.vojta.com, accessed: 2018-1-15.

[6] S. Imamura, K. Sakuma, T. Takahashi, Follow-up study of children with cerebral coordination disturbance (ccd, vojta), Brain and Development 5 (3) (1983) 311–314.

[7] D. D. Juehring, M. R. Barber, A case study utilizing vojta/dynamic neuromuscular stabilization therapy to control symptoms of a chronic migraine sufferer, Journal of Bodywork and Movement Therapies 15 (4) (2011) 538–541.

[8] L. P. Lopez, A. P. Gorricho, M. Atin, E. Varela, Effect of the therapy vojta in the rehabilitation of walking in two adult patients with brain damage acquired in chronic phase, Fisioterapia 31 (4) (2009) 151–162.

[9] B. Backstrom, L. Dahlgren, Vojta self-training: Experiences of six neurologically impaired people: A qualitative study, Physiotherapy 86 (11) (2000) 567–574.

[10] H.-W. Lim, The effect of vojta therapy on gross motor function measure and selective voluntary motor control in children with spastic diplegia, Journal of the Korean Society of Physical Medicine 7 (2) (2012) 213–221.

[11] H. Bauer, G. Appaji, D. Mundt, Vojta neurophysiologic therapy, Indian journal of pediatrics 59 (1) (1992) 37–51.

[12] S. Brandt, H. Lønstrup, T. Marner, K. Rump, P. Selmar, L. Schack, Prevention of cerebral palsy in motor risk infants by treatment ad modum vojta: a controlled study, Acta Paediatrica 69 (3) (1980) 283–286.

[13] C. Morgan, J. Darrah, A. M. Gordon, R. Harbourne, A. Spittle, R. Johnson, L. Fetters, Effectiveness of motor interventions in infants with cerebral palsy: a systematic review, Developmental Medicine & Child Neurology 58 (9) (2016) 900–909.

[14] M. H. Khan, J. Helsper, C. Yang, M. Grzegorzek, An Automatic Vision-based Monitoring System for Accurate Vojta-Therapy, in: Proc. Int. Conf. Comput. Inf. Sci., IEEE, 2016, pp. 379–384.

[15] M. H. Khan, J. Helsper, Z. Boukhers, M. Grzegorzek, Automatic recognition of movement patterns in the vojta-therapy using rgb-d data, in: Proc. Int. Conf. Image Process. (ICIP), IEEE, 2016, pp. 1235–1239.

[16] A. Mihailidis, B. Carmichael, J. Boger, The use of computer vision in an intelligent environment to support aging-in-place, safety, and independence in the home, IEEE Trans. Inf. Technol. Biomed. 8 (3) (2004) 238–247.

[17] N. Ayache, Medical computer vision, virtual reality and robotics, Image Vis. Comput. 13 (4) (1995) 295–313.

[18] L. Grimm, J. Zhang, M. Mazurowski, Computational approach to radiogenomics of breast cancer: Luminal A and luminal B molecular subtypes are associated with imaging features on routine breast MRI extracted using computer vision algorithms, J. Magn. Reson. Imaging. 42 (4) (2015) 902–907.

[19] H. S. Goldberg, M. D. Paterno, R. W. Grundmeier, B. H. Rocha, J. M. Hoffman, E. Tham, M. Swietlik, M. H. Schaeffer, D. Pabbathi, S. J. Deakyne, et al., Use of a remote clinical decision support service for a multicenter trial to implement prediction rules for children with minor blunt head trauma, Int. J. Med. Informat. 87 (2016) 101–110.

[20] P. Peris-Lopez, A. Orfila, A. Mitrokotsa, J. C. Van der Lubbe, A comprehensive rfid solution to enhance inpatient medication safety, Int. J. Med. Informat. 80 (1) (2011) 13–24.

[21] R. Takahashi, Y. Kajikawa, Computer-aided diagnosis: A survey with bibliometric analysis, Int. J. Med. Inform. 101 (2017) 58–67.

[22] A. Singh, M. K. Dutta, Imperceptible watermarking for security of fundus images in tele-ophthalmology applications and computer-aided diagnosis of retina diseases, Int. J. Med. Inform. 108 (2017) 110–124.

[23] H.-C. Lin, Y.-H. Chiu, Y. J. Chen, Y.-P. Wuang, C.-P. Chen, C.-C. Wang, C.-L. Huang, T.-M. Wu, W.-H. Ho, Continued use of an interactive computer game-based visual perception learning system in children with developmental delay, Int. J. Med. Inform. 107 (2017) 76–87.

[24] S. Arteaga, J. Chevalier, A. Coile, A. Hill, S. Sali, S. Sudhakhrisnan, S. Kurniawan, Low-cost accelerometry-based posture monitoring system for stroke survivors, in: Proc. 10th Int. ACM SIGACCESS Conf. Comput. Access., 2008, pp. 243–244.

[25] C. Shih, M. Chang, C. Shih, A limb action detector enabling people with multiple disabilities to control environmental stimulation through limb action with a nintendo wii remote controller, Res. Dev. Disabil. 31 (5) (2010) 1047–1053.

[26] D. Jack, R. Boian, A. Merians, S. Adamovich, M. Tremaine, M. Recce, G. Burdea, H. Poizner, A virtual reality-based exercise program for stroke rehabilitation, in: Proc. Int. ACM Conf. Assist. Technol., 2000, pp. 56–63.

[27] C.-C. Chen, C.-Y. Liu, S.-H. Ciou, S.-C. Chen, Y.-L. Chen, Digitized hand skateboard based on ir-camera for upper limb rehabilitation, Journal of Medical Systems 41 (2) (2017) 36.

[28] C. Bryanton, J. Bosse, M. Brien, J. Mclean, A. McCormick, H. Sveistrup, Feasibility, motivation, and selective motor control: virtual reality compared to conventional home exercise in children with cerebral palsy, CyberPsychol. Behav. 9 (2) (2006) 123–128.

[29] C. Su, C. Chiang, J. Huang, Kinect-enabled home-based rehabilitation system using Dynamic Time Warping and fuzzy logic, in: Applied Soft Computing, Vol. 177, Springer, 2014, pp. 652–666.

[30] K. Wu, Using human skeleton to recognizing human exercise by kinect's camera, Master, Department of Computer Science and Information Engineering, National Taipei University of Technology.

[31] A. Da Gama, T. Chaves, L. Figueiredo, V. Teichrieb, Guidance and movement correction based on therapeutics movements for motor rehabilitation support systems, in: Proc. IEEE Symp. Virtual Augment. Real. (SVR), 2012, pp. 191–200.

[32] Y. Chang, S. Chen, J. Huang, A kinect-based system for physical rehabilitation: A pilot study for young adults with motor disabilities, Res. Dev. Disabil. 32 (6) (2011) 2566–2570.

[33] C. Chang, et al., Towards pervasive physical rehabilitation using microsoft kinect, in: Proc. IEEE 6th Int. Conf. Pervasive Comput. Technol. Healthc., 2012, pp. 159–162.

[34] T. Exell, C. Freeman, K. Meadmore, M. Kutlu, E. Rogers, A. M. Hughes, E. Hallewell, J. Burridge, Goal orientated stroke rehabilitation utilising electrical stimulation, iterative learning and microsoft kinect, in: Proc. IEEE Int. Conf. Rehabil. Robot. (ICORR), 2013, pp. 1–6.

[35] A. Da Gama, P. Fallavollita, V. Teichrieb, N. Navab, Motor rehabilitation using kinect: A systematic review, Games Health J. 4 (2) (2015) 123–135.

[36] H. M. Hondori, M. Khademi, A review on technical and clinical impact of microsoft kinect on physical therapy and rehabilitation, J. Med. Eng. Technol.

[37] R. Adams, L. Bischof, Seeded region growing, IEEE Trans. Pattern Anal. Mach. Intell. 16 (6) (1994) 641–647.

[38] K. J. Oh, A. Vetro, Y. S. Ho, Depth coding using a boundary reconstruction filter for 3-d video systems, IEEE Trans. Circuits Syst. Video Technol. 21 (3) (2011) 350–359.


[39] M. S. Farid, M. Lucenteforte, M. Grangetto, Panorama view with spatiotemporal occlusion compensation for 3d video coding, IEEE Trans. Image Process. 24 (1) (2015) 205–219.

[40] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.

[41] A. Goshtasby, S. H. Gage, J. F. Bartholic, A two-stage cross correlation approach to template matching, IEEE Trans. Pattern Anal. Mach. Intell. 6 (3) (1984) 374–378.

[42] D. Lyon, The discrete fourier transform, part 6: Cross-correlation, J. Object Technol. 9 (2) (2010) 18–22.

[43] J. P. Lewis, Fast normalized cross-correlation, in: Vision interface, Vol. 10, 1995, pp. 120–123.

[44] E. O. Brigham, E. Brigham, The fast Fourier transform and its applications, Vol. 1, Prentice Hall Englewood Cliffs, NJ, 1988.

[45] M. S. Farid, A. Mahmood, Image morphing in frequency domain, Journal of Mathematical Imaging and Vision 42 (1) (2012) 50–63.

[46] H. J. Nussbaumer, Fast Fourier transform and convolution algorithms, Vol. 2, Springer Science & Business Media, 2012.

[47] M. H. Khan, K. Shirahama, M. S. Farid, M. Grzegorzek, Multiple human detection in depth images, in: Proc. Int. Workshop Multimed. Signal Process. (MMSP), IEEE, Montreal, Canada, 2016, p. In Press.

[48] L. Xia, C. Chen, J. Aggarwal, Human detection using depth information by kinect, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshop (CVPRW), 2011, pp. 15–22.

[49] B. Choo, M. Landau, M. DeVore, P. Beling, Statistical analysis-based error models for the microsoft kinect™ depth sensor, Sensors 14 (9) (2014) 17430–17450.

[50] M. S. Farid, H. Khan, A. Mahmood, Image inpainting based on pyramids, in: IEEE 10th International Conference on Signal Processing (ICSP), 2010, pp. 711–715.

[51] M. Farid, H. Khan, Image inpainting using dynamic weighted kernels, in: Proc. IEEE 3rd Int. Conf. Comput. Sci. Inf. Technol. (ICCSIT), Vol. 8, 2010, pp. 252–255.

[52] J. Mairal, M. Elad, G. Sapiro, Sparse representation for color image restoration, IEEE Trans. Image Process. 17 (1) (2008) 53–69.

[53] A. Criminisi, P. Perez, K. Toyama, Region filling and object removal by exemplar-based image inpainting, IEEE Trans. Image Process. 13 (9) (2004) 1200–1212.

[54] M. S. Farid, A. Mahmood, M. Grangetto, Image de-fencing framework with hybrid inpainting algorithm, Signal, Image and Video Processing 10 (7) (2016) 1193–1201.

[55] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2001, pp. 511–518.

[56] M. H. Khan, M. Grzegorzek, Vojta-Therapy: A Vision-Based Framework to Recognize the Movement Patterns, in: Int. J. Software Innovation. 5 (3) (2017) 18–32.

[57] J. M. Moguerza, A. Munoz, et al., Support vector machines with applications, Stat. Sci. 21 (3) (2006) 322–336.

[58] J.-Y. Wang, Application of support vector machines in bioinformatics, Ph.D. thesis, National Taiwan University (2002).

[59] L. Wang, Support vector machines: theory and applications, Vol. 177, Springer Science & Business Media, 2005.

[60] M. H. Khan, F. Li, M. S. Farid, M. Grzegorzek, Gait Recognition Using Motion Trajectory Analysis, in: Int. Conf. on Computer Recognition Systems, Springer, 2017, pp. 73–82.

[61] M. H. Khan, M. S. Farid, M. Grzegorzek, Person Identification Using Spatiotemporal Motion Characteristics, in: Proc. Int. Conf. Image Process. (ICIP), IEEE, 2017, pp. 166–170.

[62] G. Guo, C. R. Dyer, Learning from examples in the small sample case: face expression recognition, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 35 (3) (2005) 477–488.

[63] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proc. of Int. Conf. on Artificial Intelligence and Statistics, 2010, pp. 249–256.

[64] R. Romero, E. Iglesias, L. Borrajo, A linear-rbf multikernel svm to classify big text corpora, BioMed research international 2015.

[65] B.-C. Kuo, H.-H. Ho, C.-H. Li, C.-C. Hung, J.-S. Taur, A kernel-based feature selection method for svm with rbf kernel for hyperspectral image classification, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (1) (2014) 317–326.

[66] G. Camps-Valls, L. Bruzzone, Kernel-based methods for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 43 (6) (2005) 1351–1362.

[67] L. Bottou, C.-J. Lin, Support vector machine solvers, Large scale kernel machines 3 (1) (2007) 301–320.

[68] D. P. Bertsekas, Nonlinear programming, Athena scientific Belmont, 1999.

[69] C. Chang, C. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27.

[70] T. Fawcett, An Introduction to ROC Analysis, Pattern Recognit. Lett. 27 (8) (2006) 861–874.

[71] M. S. Farid, M. Lucenteforte, M. Grangetto, Dost: a distributed object segmentation tool, Multimed. Tools Appl. (2017) 1–24.

[72] D. M. Powers, Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation, Journal of Machine Learning Technologies 2 (1) (2011) 37–63.

[73] M. S. Farid, M. Lucenteforte, M. H. Khan, M. Grangetto, Semi-automatic segmentation of scattered and distributed objects, in: Int. Conf. on Computer Recognition Systems, Springer, 2017, pp. 110–119.

[74] J. K. Udupa, V. R. LaBlanc, H. Schmidt, C. Imielinska, P. K. Saha, G. J. Grevera, Y. Zhuge, L. Currie, P. Molholt, Y. Jin, Methodology for evaluating image-segmentation algorithms, in: Medical Imaging, 2002, pp. 266–277.
