

Combination of Similarity Measures for Time Series Classification using Genetic Algorithms

Deepti Dohare and V. Susheela Devi

Department of Computer Science and Automation, Indian Institute of Science, India

{deeptidohare, susheela}@csa.iisc.ernet.in

Abstract—Time series classification deals with the classification of data that is multivariate in nature, in the sense that one or more of the attributes is in the form of a sequence. The notion of similarity or distance used for time series data is significant: it affects the accuracy as well as the time and space complexity of the classification algorithm. Numerous similarity measures exist for time series data, but each has its own disadvantages. Instead of relying on a single similarity measure, we aim to find a near-optimal solution to the classification problem by combining different similarity measures. In this work, we use genetic algorithms to combine the similarity measures so as to obtain the best performance. The weights given to the different similarity measures evolve over a number of generations towards the best combination. We test our approach on a number of benchmark time series datasets and present promising results.

I. INTRODUCTION

Time series data are ubiquitous: much real-world data comes in the form of time series, for example stock prices, annual rainfall, and blood pressure. Other forms of data, including text, DNA, video, audio, and images, can also be meaningfully converted to time series [1]. Not surprisingly, there has been strong interest in applying data mining techniques to time series data.

Classification of time series data is an interesting problem in the field of data mining. The need to classify time series data arises in a broad range of real-world applications in medicine, science, finance, entertainment, and industry. In cardiology, ECG signals (an example of time series data) are classified to determine whether the data comes from a healthy person or from a patient suffering from heart disease [2]. In anomaly detection, users’ system access activities on Unix systems are monitored to detect abnormal behavior [3]. In information retrieval, documents are classified into topic categories, a task that has been shown to be similar to time series classification [4]. Another example is the classification of signals coming either from nuclear explosions or from earthquakes, in order to monitor a nuclear test ban treaty [5].

Generally, a time series $t = (t_1, \ldots, t_r)$ is an ordered set of $r$ data points, typically measured at successive points in time spaced at uniform intervals. A time series may also carry a class label. The problem of time series classification is to learn a classifier $C$, a function that maps a time series $t$ to a class label $l$, that is, $C(t) = l$ where $l \in L$, the set of class labels.

Time series classification methods can be divided into three broad categories. The first is distance-based classification, which requires a measure to compute the distance or similarity between pairs of time sequences [6]–[8]. The second is feature-based classification, which transforms each time series into a feature vector and then applies a conventional classification method [9], [10]. The third is model-based classification, where a model such as a Hidden Markov Model (HMM) or another statistical model is used to classify time series data [11], [12].

In this paper, we consider the distance-based classification method, where the choice of the similarity measure affects the accuracy as well as the time and space complexity of classification algorithms [6]. Several similarity measures exist for time series data, but each has its own disadvantages. Well-known similarity measures for time series data include Euclidean distance, Dynamic Time Warping distance (DTW), and Longest Common Subsequence (LCSS). We introduce a similarity-based time series classification algorithm that uses genetic algorithms. The one nearest neighbor (1NN) classifier has often been found to be very difficult to beat for time series classification [7]. Because of the effectiveness and simplicity of the 1NN classifier, we focus on combining different similarity measures into one and using the resulting similarity measure with the 1NN classifier.
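As a concrete reference point, here is a minimal Python sketch of a 1NN classifier with a pluggable distance function. It is our illustration, not the authors' code; `one_nn_predict` and the toy data are hypothetical, and any of the measures discussed in this paper could be passed in place of `euclidean`.

```python
import math
from typing import Callable, Optional, Sequence

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def euclidean(p: Series, q: Series) -> float:
    """Euclidean (L2) distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def one_nn_predict(query: Series, train: Sequence[Series],
                   labels: Sequence[str], dist: Distance = euclidean) -> Optional[str]:
    """Return the label of the training series closest to `query` under `dist`."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for series, label in zip(train, labels):
        d = dist(query, series)
        if d < best_dist:  # strictly closer than anything seen so far
            best_dist, best_label = d, label
    return best_label


# Hypothetical toy training set with three labelled series.
train = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
labels = ["low", "mid", "high"]
print(one_nn_predict([0.9, 1.2, 1.0], train, labels))  # → mid
```

Swapping the `dist` argument is all it takes to change the similarity measure, which is the property the combined measure proposed here relies on.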

The paper is organized as follows. We present a brief survey of the related work in Section II. We formally define our problem in Section III. In Section IV, we describe the proposed genetic approach for time series classification. Section V presents the experimental evaluation. Results are shown in Section VI. Finally, we conclude in Section VII.

II. RELATED WORK AND MOTIVATION

We begin this section with a brief description of the distance-based classification method. The distance-based method requires a similarity measure or distance function, which is used with an existing classification algorithm. The current literature contains over a dozen distance measures for time series data. Although many algorithms have been proposed that provide a new similarity measure as a subroutine of the 1NN classifier, it has been shown that one nearest neighbor with Euclidean distance (1NN-ED) is very difficult to beat [7]. However, Euclidean distance also has disadvantages; for instance, it is sensitive to distortions in the time dimension. Dynamic time warping distance (DTW) [13] was proposed to overcome this problem. It allows a time series to be “stretched” or “compressed” to provide a better match with another time series. DTW has been shown to be more accurate than Euclidean distance on small datasets [8]. However, on large datasets, the accuracy of DTW converges with that of Euclidean distance [6]. Because of its quadratic complexity, DTW is costly on large datasets. Several lower-bounding measures have been introduced to speed up similarity search using DTW [14]–[16]. Ratanamahatana and Keogh [17] proposed a method that dramatically increases the speed of DTW similarity search by using tight lower bounds to prune many of the calculations, and it has been shown that the amortized cost of computing the DTW distance on large datasets is linear. Xi et al. [8] use numerosity reduction to speed up DTW computation.
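LB_Keogh is one well-known lower bound of this kind; we use it here purely as an illustration, since the text above does not say which bounds [14]–[17] use. The Python sketch below is ours and assumes equal-length series and a Sakoe-Chiba warping window of radius `r` (a hypothetical parameter). It computes the bound from the upper and lower envelopes of the query and skips the full DTW computation for any candidate whose bound is already no better than the best distance found so far. Because LB_Keogh never exceeds the window-constrained DTW distance, the pruned search returns the same nearest neighbor as an exhaustive one.

```python
import math
from typing import List, Sequence, Tuple

Series = Sequence[float]


def dtw_window(p: Series, q: Series, r: int) -> float:
    """DTW restricted to the band |i - j| <= r; sqrt of the minimum cumulative squared cost."""
    n, m = len(p), len(q)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (p[i - 1] - q[j - 1]) ** 2
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    return math.sqrt(g[n][m])


def envelope(query: Series, r: int) -> Tuple[List[float], List[float]]:
    """Upper and lower envelopes of the query over a sliding window of radius r."""
    n = len(query)
    upper = [max(query[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(query[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(candidate: Series, upper: Sequence[float], lower: Sequence[float]) -> float:
    """Only candidate points falling outside the query envelope contribute."""
    total = 0.0
    for c, u, l in zip(candidate, upper, lower):
        if c > u:
            total += (c - u) ** 2
        elif c < l:
            total += (c - l) ** 2
    return math.sqrt(total)


def nn_search(query: Series, candidates: Sequence[Series], r: int) -> Tuple[int, float, int]:
    """Return (index of nearest candidate, its DTW distance, number of candidates pruned)."""
    upper, lower = envelope(query, r)  # computed once per query
    best_idx, best_dist, pruned = -1, math.inf, 0
    for idx, cand in enumerate(candidates):
        # LB_Keogh <= DTW, so if the bound cannot beat best_dist, neither can DTW.
        if lb_keogh(cand, upper, lower) >= best_dist:
            pruned += 1
            continue
        d = dtw_window(query, cand, r)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, pruned


query = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
candidates = [
    [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0],  # shifted copy of the query
    [5.0] * 7,                            # far away: pruned by the bound
    [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0],  # exact match
]
print(nn_search(query, candidates, r=1))  # → (2, 0.0, 1)
```

The envelope depends only on the query, so its cost is paid once per query, while each pruned candidate costs O(n) instead of the O(nr) banded DTW computation.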

Another family of similarity measures is based on the concept of edit distance for strings. A well-known measure of this kind is the Longest Common Subsequence (LCSS) distance [18]. The idea behind this measure is to find the longest common subsequence of two sequences; their similarity is then defined by the length of this subsequence. A threshold parameter ε is used such that two points from different time series are considered to match if their distance is less than ε. Another similarity measure is the Edit Distance on Real sequence (EDR) [19], which is also based on edit distance for strings. It also uses a threshold parameter ε, but here the distance between a pair of points is quantized to 0 or 1. EDR assigns penalties to the unmatched segments of two time series based on the length of the segments. The Edit Distance with Real Penalty (ERP) [20] is another similarity measure, which combines the merits of DTW and EDR. ERP computes the distance between gaps of two time series by using a constant reference point: if the distance between the two points is large, ERP uses the distance between the reference point and one of those points instead. Lee et al. [21] point out that the disadvantage of the above distance measures (LCSS, EDR, ERP) is that they capture the global similarity between two sequences, but not their local similarity during a short time interval. Other distance measures include DISSIM [22], the Sequence Weighted Alignment model (Swale) [23], Spatial Assembling Distance (SpADe) [24], and similarity search based on Threshold Queries (TQuEST) [25].
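The three edit-distance-style measures above share one dynamic-programming pattern. The following Python is a minimal sketch for one-dimensional series, not the reference implementations of [18]–[20]. Its assumptions are: absolute differences between points; the threshold `eps`; an ERP reference value `g = 0`; the omission of the optional time-window constraint of LCSS; and the conversion of the LCSS length into a distance as 1 − LCSS/min(|a|, |b|).

```python
from typing import Sequence

Series = Sequence[float]


def lcss_length(a: Series, b: Series, eps: float) -> int:
    """Longest common subsequence, where a_i and b_j match if |a_i - b_j| < eps."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) < eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


def lcss_distance(a: Series, b: Series, eps: float) -> float:
    """Normalized LCSS turned into a distance (assumption of this sketch)."""
    return 1.0 - lcss_length(a, b, eps) / min(len(a), len(b))


def edr(a: Series, b: Series, eps: float) -> int:
    """Edit Distance on Real sequences: substitutions cost 0 or 1, gaps cost 1."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if abs(a[i - 1] - b[j - 1]) <= eps else 1
            D[i][j] = min(D[i - 1][j - 1] + sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D[n][m]


def erp(a: Series, b: Series, g: float = 0.0) -> float:
    """Edit distance with Real Penalty: a gap opposite x costs |x - g|."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + abs(a[i - 1] - g)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + abs(b[j - 1] - g)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),
                          D[i - 1][j] + abs(a[i - 1] - g),
                          D[i][j - 1] + abs(b[j - 1] - g))
    return D[n][m]


a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.1, 10.0, 4.0]  # one outlier at position 3
print(lcss_distance(a, b, eps=0.5), edr(a, b, eps=0.5), erp(a, b))
```

On this pair, the outlier only costs one unmatched point under LCSS and EDR, whereas ERP charges its full magnitude. This illustrates why the threshold-based measures are more robust to noise.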

A. Motivation

Although most of the newly introduced similarity measures have been shown to perform well, each of them has its own disadvantages. Also, the effectiveness of a similarity measure depends critically on the size of the dataset [6]. So, instead of deciding which single similarity measure performs best for the classification task on a dataset, we make use of a number of distance measures and weigh them appropriately according to their performance, with the help of a heuristic. Motivated by these considerations, we combine different existing similarity measures to find near-optimal solutions using a stochastic technique. We make use of Genetic Algorithms [26], [27], which are popular stochastic algorithms for estimating near-optimal solutions. Although there is a vast literature on time series classification and mining, we believe that we are solving the problem in a novel way. The closest work is that of [28], where the authors use an ensemble of multiple kNN classifiers based on different distance functions for text classification, whereas we apply genetic algorithms to combine different similarity measures to achieve better classification accuracy. Another difference is that we do this for time series data, where finding a good similarity measure is nontrivial.

III. PROBLEM DEFINITION

We will now define our problem formally. A time series is

$$t = [t_1, t_2, \ldots, t_r]$$

where $t_1, t_2, \ldots, t_r$ are the data points, measured at uniform time intervals. $T$ is the set of such time series. Let $D_{tr}$ be a training set, represented as a matrix of size $q \times r$,

$$D_{tr} = [tr_1, tr_2, \ldots, tr_q]^T$$

where $tr_i \in T$. In this work, we consider labeled time series, where $L_{tr}$ is the vector of class labels of the training set $D_{tr}$,

$$L_{tr} = [l_1, l_2, \ldots, l_q]^T$$

where $l_i \in L$ and $L$ is the set of class labels. The test set is a matrix of size $p \times r$,

$$D_{tst} = [ts_1, ts_2, \ldots, ts_p]^T$$

where $ts_i \in T$.

Input: A time series dataset partitioned into the training set $D_{tr}$ with class labels $L_{tr}$, and the test set $D_{tst}$.

Output: A classifier $C$ such that $C(t) = l$, where $l \in L$ and $t \in T$.

The problem of time series classification is to learn a classifier $C$, which is a function $C : T \to L$. We do not design a new classifier here; instead, we use the 1NN classifier, which requires a similarity measure. As mentioned in Section II, there are many similarity measures, but we may not know which one is best suited to a given dataset. Our aim in this work is to combine different similarity measures $(s_1, s_2, \ldots, s_n)$ by assigning each a weight based on its performance. A new similarity measure $S_{new}$ is obtained such that

$$S_{new} = \sum_{i=1}^{n} w_i \cdot s_i$$

where $n$ is the number of similarity measures and $w_i$ is the weight assigned to $s_i$. The criterion used to evaluate a solution in our approach is classification accuracy.
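As a small illustration of the formula above, the Python sketch below builds $S_{new}$ as a callable from a list of distance functions and a weight vector. The helper names are hypothetical, and the example uses $n = 2$ (Euclidean and Manhattan distances) with arbitrarily chosen weights.

```python
import math
from typing import Callable, List, Sequence

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def euclidean(p: Series, q: Series) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Series, q: Series) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def combine(measures: List[Distance], weights: Sequence[float]) -> Distance:
    """Return S_new(p, q) = sum_i w_i * s_i(p, q)."""
    if len(measures) != len(weights):
        raise ValueError("need one weight per similarity measure")

    def s_new(p: Series, q: Series) -> float:
        return sum(w * s(p, q) for s, w in zip(measures, weights))

    return s_new


s_new = combine([euclidean, manhattan], [0.5, 0.25])
print(s_new([0.0, 0.0], [3.0, 4.0]))  # 0.5 * 5 + 0.25 * 7 → 4.25
```

The combined measure has the same signature as each individual measure, so it can be passed directly to a 1NN classifier.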


IV. METHODOLOGY

In this section, we give a brief introduction to Genetic Algorithms (GA) and then explain our proposed method based on them. Many stochastic algorithms operate on a single solution of the problem at hand. Genetic algorithms (a class of evolutionary algorithms) instead operate on populations of many solutions from the search space. The idea is to evolve the population through a number of evolutionary steps, which produce new generations of solutions by using genetic operators. Each step is designed to improve the average fitness of the candidate solutions in the population with respect to the problem. Fitness is simply the value of a function that estimates how capable a candidate solution is of solving the problem. For classification, a measure of classification accuracy is an important part of the fitness function. There are three basic steps in the evolutionary process of a genetic algorithm:

• Selection: Some of the fittest solutions survive by having one or more copies of themselves present in the next generation of solutions.

• Crossover: Two fit parent solutions are selected from generation $i$. A new solution is generated for generation $i+1$ by applying a crossover operator to the two parent solutions; the operator generates the new solution by copying some pieces from each parent. Crossover is applied according to the crossover probability.

• Mutation: A small number of new solutions are generated for generation $i+1$ by selecting a fit solution from generation $i$ and applying a mutation operator to it. The mutation operator works by changing some pieces of the selected solution. Mutation is applied according to the mutation probability.

The same evolutionary process is then applied to the new generation of candidate solutions. Carefully designed genetic algorithms guide the search into those areas of the search space that contain good candidate solutions to the problem at hand. The search stops when the evolutionary process has reached a maximum number of generations, or when the fitness of the best solution found so far has reached an appropriate level.

The general idea of the proposed approach is as follows:

1) Run the GA to find $w_1, w_2, \ldots, w_n$:

• Initialize $m$ strings, each a vector $w_1, w_2, \ldots, w_n$ of random values between 0 and 1.

• Repeat for $N_{gen}$ iterations:

– Use $S = w_1 \cdot s_1 + w_2 \cdot s_2 + \cdots + w_n \cdot s_n$ to classify the validation set, and set the fitness of each string to the resulting classification accuracy.

– Use selection, crossover, and mutation to obtain a new population of strings.

• Set $w_1, w_2, \ldots, w_n$ to the values of the string in the population with the best fitness.

2) Set $S_{new} = w_1 \cdot s_1 + w_2 \cdot s_2 + \cdots + w_n \cdot s_n$.

3) Use $S_{new}$ with 1NN to classify the test set and measure the classification accuracy.

Using genetic algorithms (GA), we find the best combination of weights for the available similarity measures on the validation data. The obtained weights are then used to combine the available similarity measures into a new similarity measure, $S_{new}$, which is the sum of the products of the obtained weights and the available measures. This new similarity measure is then used with the one nearest neighbor (1NN) classifier.

A. The Algorithm

The proposed genetic-algorithm-based approach finds a near-optimal solution to the time series classification problem. It can be used to combine different similarity measures when it is not known in advance how well each measure suits the dataset. Table I summarizes the notation used in the algorithm. The pseudocode of the proposed algorithm is given in Fig. 1. It calls the subroutines CLASSIFIER() and NEXTGEN() to compute the next solution from the current population matrix.

TABLE I
SYMBOL TABLE

| Notation | Meaning |
|---|---|
| $N_{gen}$ | Number of iterations (generations) of the GA |
| $NextP_i$ | Next best-fit population matrix for the $i$-th iteration |
| $CA_i$ | Classification accuracy (fitness vector) for the $i$-th iteration |
| $D_t$ | Time series dataset |
| $P$ | Population matrix |
| $m$ | Number of rows of $P$ |
| $n$ | Number of distance functions used |
| $D_{tr}$ | Training set |
| $D_v$ | Validation set |
| $D_{tst}$ | Test set |
| $L$ | Set of class labels |
| $T$ | Set of time series patterns |
| $pc$ | Predicted class |

The algorithm is described below:

• GENETIC APPROACH(): This subroutine returns the weights of the similarity measures (Fig. 1). First, we initialize an $m \times n$ random matrix $P$, where $n$ is the number of similarity measures used and each row represents a weight combination for the similarity measures. We take $m$ such rows. We call this matrix the initial population matrix ($NextP_0$). The CLASSIFIER() function returns the initial fitness vector $CA_0$ of size $m \times 1$, where each entry is the classification accuracy of the weight combination in the corresponding row. Given the initial population matrix and initial fitness, we perform the evolution process in line 4. We provide the current population matrix ($NextP_i$) and current fitness ($CA_i$) to the function NEXTGEN(), which returns the next population matrix ($NextP_{i+1}$) and the next fitness vector ($CA_{i+1}$). The evolution process is run $N_{gen}$ times. At the end of the algorithm, the genetic approach returns the best combination of weights, that is, the one with maximum fitness.

GENETIC APPROACH(Ngen, m, n)

1: Initialize an m × n matrix P where each element is randomly generated between 0 and 1
2: CA_0 ← CLASSIFIER(P)
3: NextP_0 ← P
4: for i ← 1 to Ngen do
5:     CA_i, NextP_i ← NEXTGEN(CA_{i−1}, NextP_{i−1})
6: end for
7: return the row of NextP_Ngen with maximum fitness in CA_Ngen

Fig. 1. Finding the best weight combination of various similarity measures using GA.

• CLASSIFIER(): The accuracy on the validation set is calculated for every row of $P$, as shown in Fig. 2. The subroutine predicts the class label of each validation time series by using the combined similarity measure (line 10). Note that the combined similarity measure in line 8 is obtained by weighting the distances between the validation and training objects, computed with the different distance functions, by the elements of the corresponding row of $P$. Finally, the subroutine returns the classification accuracy for every row of $P$.

• NEXTGEN(): The main aim of this subroutine is to apply the genetic operators (selection, crossover, and mutation) to the current population matrix $NextP_i$, based on the current fitness vector $CA_i$, to yield the next population matrix $NextP_{i+1}$. The CLASSIFIER() subroutine is then called again to obtain the next fitness vector $CA_{i+1}$. The function returns the next fitness vector and the next population matrix.
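The three subroutines can be put together in a compact Python sketch. This is our illustration under stated assumptions, not the authors' code:

- weights are drawn uniformly from [0, 1);
- "deterministic selection" is read as truncation selection, which keeps the fitter half of the population unchanged and refills the rest with children;
- crossover is single-point;
- mutation redraws each child weight with a hypothetical probability `p_mut`.

The function names and the toy data are also hypothetical. Because survivors are kept unchanged, the best row of the last generation is also the best row seen overall.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Distance = Callable[[Series, Series], float]
Weights = List[float]
FitnessFn = Callable[[List[Weights]], List[float]]


def classifier(P: List[Weights], measures: List[Distance],
               train: List[Series], train_labels: List[int],
               val: List[Series], val_labels: List[int]) -> List[float]:
    """CLASSIFIER(P): validation accuracy (%) of 1NN for each weight row of P."""
    # Raw distances do not depend on the weights, so compute them once.
    dists = [[[s(x, y) for s in measures] for x in train] for y in val]
    accuracies = []
    for row in P:
        correct = 0  # reset for every row of P
        for j, actual in enumerate(val_labels):
            best_so_far, pc = math.inf, None
            for k, per_measure in enumerate(dists[j]):
                S = sum(w * d for w, d in zip(row, per_measure))  # combined measure
                if S < best_so_far:
                    best_so_far, pc = S, train_labels[k]
            if pc == actual:
                correct += 1
        accuracies.append(100.0 * correct / len(val))
    return accuracies


def next_gen(CA: List[float], P: List[Weights], fitness_fn: FitnessFn,
             rng: random.Random, p_mut: float = 0.1) -> Tuple[List[float], List[Weights]]:
    """NEXTGEN(CA, P); assumes at least 2 rows and 2 columns."""
    m, n = len(P), len(P[0])
    # Selection: rank rows by fitness and keep the fitter half.
    ranked = [row for _, row in sorted(zip(CA, P), key=lambda t: t[0], reverse=True)]
    survivors = [row[:] for row in ranked[:max(2, m // 2)]]
    children: List[Weights] = []
    while len(survivors) + len(children) < m:
        a, b = rng.sample(survivors, 2)
        cut = rng.randint(1, n - 1)  # single-point crossover
        child = a[:cut] + b[cut:]
        # Mutation: redraw each weight from [0, 1) with probability p_mut.
        child = [rng.random() if rng.random() < p_mut else w for w in child]
        children.append(child)
    new_P = survivors + children
    return fitness_fn(new_P), new_P


def genetic_approach(n_gen: int, m: int, n: int, fitness_fn: FitnessFn,
                     seed: int = 0) -> Tuple[Weights, float]:
    """GENETIC APPROACH(N_gen, m, n): best weight row and its fitness."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(n)] for _ in range(m)]
    CA = fitness_fn(P)
    for _ in range(n_gen):
        CA, P = next_gen(CA, P, fitness_fn, rng)
    best = max(range(len(P)), key=lambda i: CA[i])
    return P[best], CA[best]


def euclidean(p: Series, q: Series) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p: Series, q: Series) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy data: two measures, two classes.
measures = [euclidean, manhattan]
train, train_labels = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]], [0, 1]
val, val_labels = [[1.0, 1.0, 1.0], [9.0, 9.0, 9.0]], [0, 1]


def fitness(P: List[Weights]) -> List[float]:
    return classifier(P, measures, train, train_labels, val, val_labels)


weights, acc = genetic_approach(n_gen=10, m=10, n=len(measures), fitness_fn=fitness)
print(weights, acc)
```

Caching the raw distances in `classifier` means that each generation only re-weights precomputed values. This matters because the pairwise distances, DTW in particular, dominate the cost.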

V. EXPERIMENTAL EVALUATION

We tested our proposed genetic-algorithm-based approach on various benchmark datasets from the UCR classification/clustering archive [29]. Table II shows the statistics of the datasets used in our experiments.

A. Procedure

• We divide the original training set of each benchmark dataset into two sets: a training set and a validation set.

• The training set and the validation set are then provided to the proposed GENETIC APPROACH, which gives the best combination of weights $(w_1, w_2, \ldots, w_n)$ for the $n$ similarity measures.

• The resulting weights are assigned to the different similarity measures, which are combined to yield the new

CLASSIFIER(P)

1: for i ← 1 to m do
2:     correct ← 0; for j ← 1 to size(Dv) do
3:         best_so_far ← ∞
4:         for k ← 1 to size(Dtr) do
5:             x ← Dtr[k] (training pattern)
6:             y ← Dv[j] (validation pattern)
7:             compute the distances s1, s2, ..., sn between x and y
8:             S ← s1 · P[i][1] + s2 · P[i][2] + ... + sn · P[i][n]
9:             if S < best_so_far then
10:                best_so_far ← S; pc ← training class label of x
11:            end if
12:        end for
13:        if the predicted class pc equals the actual class of y then
14:            correct ← correct + 1
15:        end if
16:    end for
17:    CA[i] ← (correct / size(Dv)) × 100
18: end for
19: return CA

Fig. 2. Subroutine CLASSIFIER: computation of CA for one population matrix P of size m × n.

NEXTGEN(CA, P)

1: P′′ ← P
2: fitness ← CA
3: Generate P′ from P′′ by applying selection based on fitness {Selection}
4: Generate P from P′ by applying crossover {Crossover}
5: Randomly select some elements of P and change their values {Mutation}
6: NextP ← P
7: NextCA ← CLASSIFIER(P)
8: return NextCA, NextP

Fig. 3. Subroutine NEXTGEN: applying the genetic operators (selection, crossover, and mutation) to produce the next population matrix NextP.

similarity measure:

$$S_{new} = s_1 \cdot w_1 + s_2 \cdot w_2 + \cdots + s_n \cdot w_n$$

• This new similarity measure $S_{new}$ is then used with 1NN to classify the test data, which gives the final classification accuracy.
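The paper does not state how the original training set is divided. The sketch below therefore assumes a seeded random split without stratification. It uses the Control Chart sizes from Table II as an example: 300 original training series split into 180 for training and 120 for validation. The function name `split_train_validation` and the placeholder data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(series: Sequence[T], labels: Sequence[int], n_val: int,
                           seed: int = 0) -> Tuple[Tuple[List[T], List[int]],
                                                   Tuple[List[T], List[int]]]:
    """Randomly move n_val labelled series from the original training set into a validation set."""
    if not 0 < n_val < len(series):
        raise ValueError("n_val must leave both sets non-empty")
    idx = list(range(len(series)))
    random.Random(seed).shuffle(idx)  # reproducible permutation
    val_idx, tr_idx = idx[:n_val], idx[n_val:]
    train = ([series[i] for i in tr_idx], [labels[i] for i in tr_idx])
    val = ([series[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val


# Control Chart sizes: 300 original training series -> 180 training + 120 validation.
series = [[float(i)] * 60 for i in range(300)]  # placeholder series of length 60
labels = [i % 6 for i in range(300)]            # placeholder labels for 6 classes
(tr_x, tr_y), (val_x, val_y) = split_train_validation(series, labels, n_val=120)
print(len(tr_x), len(val_x))  # → 180 120
```

A stratified split, which keeps the class proportions equal in both parts, would be a reasonable alternative for the small datasets such as Coffee or Beef.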


TABLE II
STATISTICS OF THE DATASETS USED IN OUR EXPERIMENT

| Dataset | Number of classes | Size of training set | Size of validation set | Size of test set | Time series length |
|---|---|---|---|---|---|
| Control Chart | 6 | 180 | 120 | 300 | 60 |
| Coffee | 2 | 18 | 10 | 28 | 286 |
| Beef | 5 | 18 | 12 | 30 | 470 |
| OliveOil | 4 | 18 | 12 | 30 | 570 |
| Lightning-2 | 2 | 40 | 20 | 61 | 637 |
| Lightning-7 | 7 | 43 | 27 | 73 | 319 |
| Trace | 4 | 62 | 38 | 100 | 275 |
| ECG | 2 | 67 | 33 | 100 | 96 |

B. Similarity Measures

In order to test the genetic approach empirically, we implemented the algorithm using eight distance functions. Given two time series

$$p = (p_1, p_2, \ldots, p_n), \quad q = (q_1, q_2, \ldots, q_n),$$

a similarity function $s$ calculates the distance between the two time series, denoted by $s(p, q)$. The eight similarity measures used in the implementation are:

1) Euclidean Distance (L2 norm): For simple time series classification, Euclidean distance is a widely adopted option. The distance from $p$ to $q$ is given by:

$$s_1(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

2) Manhattan Distance (L1 norm): The distance function is given by:

$$s_2(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

3) Maximum Norm (L∞ norm): The infinity-norm distance is also called the Chebyshev distance. The distance function is given by:

$$s_3(p, q) = \max(|p_1 - q_1|, |p_2 - q_2|, \ldots, |p_n - q_n|)$$

4) Mean dissimilarity: Fink and Pratt [30] proposed a similarity measure between two numbers $a$ and $b$:

$$sim(a, b) = 1 - \frac{|a - b|}{|a| + |b|}$$

They define two similarities, mean similarity and root mean square similarity. We use the above similarity measure to define a distance function:

$$disim(a, b) = \frac{|a - b|}{|a| + |b|}$$

and then define the mean dissimilarity as:

$$s_4(p, q) = \frac{1}{n} \sum_{i=1}^{n} disim(p_i, q_i)$$

where

$$disim(p_i, q_i) = \frac{|p_i - q_i|}{|p_i| + |q_i|}$$

5) Root Mean Square Dissimilarity: Using the above similarity measure, we define the root mean square dissimilarity as:

$$s_5(p, q) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} disim(p_i, q_i)^2}$$

6) Peak Dissimilarity: In addition to the above similarity measures, Fink and Pratt [30] also define the peak similarity between two numbers $a$ and $b$ as:

$$psim(a, b) = 1 - \frac{|a - b|}{2 \cdot \max(|a|, |b|)}$$

The peak dissimilarity is then defined as

$$peakdisim(p_i, q_i) = \frac{|p_i - q_i|}{2 \cdot \max(|p_i|, |q_i|)}$$

The peak dissimilarity between two time series $p$ and $q$ is given by:

$$s_6(p, q) = \frac{1}{n} \sum_{i=1}^{n} peakdisim(p_i, q_i)$$

7) Cosine Distance: Cosine similarity measures the similarity between two $n$-dimensional vectors as the cosine of the angle $\theta$ between them. For two time series $p$ and $q$, the cosine similarity is expressed using the dot product and magnitudes as:

$$\cos(\theta) = \frac{p \cdot q}{\|p\| \, \|q\|}$$

and the cosine dissimilarity as:

$$s_7(p, q) = 1 - \cos(\theta)$$

8) Dynamic Time Warping Distance: To calculate $DTW(p, q)$ [17], we create a matrix of size $|p| \times |q|$ where each element is the squared distance $d(p_i, q_j) = (p_i - q_j)^2$ for a pair of points from the two time series. Every possible warping between the two time series is a path $W$ through the matrix. A warping path $W$ is a contiguous set of matrix elements that characterizes a mapping between $p$ and $q$, where the $k$-th element of $W$ is defined as $w_k = (i, j)_k$. We want the path that minimizes the warping cost:

$$s_8(p, q) = DTW(p, q) = \min_{W} \left\{ \frac{\sqrt{\sum_{k=1}^{K} w_k}}{K} \right\}$$

where $\max(|p|, |q|) \le K < |p| + |q| - 1$ and, inside the sum, $w_k$ stands for the matrix value at the $k$-th element of the path. This path can be found using dynamic programming to evaluate the following recurrence, which defines the cumulative distance $\gamma(i, j)$ as the cost of the minimum-cost path reaching cell $(i, j)$:

$$\gamma(i, j) = d(p_i, q_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$
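For reference, the eight measures can be written directly from the formulas above. This Python sketch is ours, not the authors' implementation. It makes three assumptions:

- series are equal-length for $s_1$ to $s_7$;
- the 0/0 case in the Fink and Pratt ratios (both points equal to zero) is treated as a dissimilarity of 0;
- for DTW, the sketch returns the square root of the cumulative cost $\gamma(|p|, |q|)$ without the $1/K$ path-length normalization, a common simplification.

```python
import math
from typing import Sequence

Series = Sequence[float]


def s1_euclidean(p: Series, q: Series) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def s2_manhattan(p: Series, q: Series) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def s3_chebyshev(p: Series, q: Series) -> float:
    return max(abs(a - b) for a, b in zip(p, q))


def _disim(a: float, b: float) -> float:
    denom = abs(a) + abs(b)
    return 0.0 if denom == 0 else abs(a - b) / denom  # assumption: 0/0 -> 0


def s4_mean_disim(p: Series, q: Series) -> float:
    return sum(_disim(a, b) for a, b in zip(p, q)) / len(p)


def s5_rms_disim(p: Series, q: Series) -> float:
    return math.sqrt(sum(_disim(a, b) ** 2 for a, b in zip(p, q)) / len(p))


def _peak_disim(a: float, b: float) -> float:
    denom = 2 * max(abs(a), abs(b))
    return 0.0 if denom == 0 else abs(a - b) / denom  # assumption: 0/0 -> 0


def s6_peak_disim(p: Series, q: Series) -> float:
    return sum(_peak_disim(a, b) for a, b in zip(p, q)) / len(p)


def s7_cosine(p: Series, q: Series) -> float:
    # Undefined (ZeroDivisionError) if either series is all zeros.
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def s8_dtw(p: Series, q: Series) -> float:
    # gamma[i][j]: cost of the cheapest warping path ending at cell (i, j).
    n, m = len(p), len(q)
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (p[i - 1] - q[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return math.sqrt(gamma[n][m])


# DTW aligns a series with a stretched copy of itself at zero cost.
print(s8_dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # → 0.0
```

$s_8$ is the only one of the eight that accepts series of different lengths and tolerates shifts in time. $s_1$ to $s_7$ compare points strictly position by position.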

C. Experiments Conducted

We initialize a 10 × 10 population matrix $P$, where each entry represents a weight with a random value between 0 and 1. The function CLASSIFIER() combines the above similarity measures to yield the resulting measure

$$S_i = s_1 \cdot P(i, 1) + s_2 \cdot P(i, 2) + \cdots + s_n \cdot P(i, n)$$

for the $i$-th row of $P$. We run the evolution process of the genetic algorithm for ten iterations ($N_{gen} = 10$), and the solutions obtained in each generation are used to generate the next generation. We used deterministic selection and single-point crossover in the NEXTGEN() function. After ten iterations, GENETIC APPROACH() returns the best combination of weights in the tenth generation. The weights are then used to combine the similarity measures into a new similarity measure, which is used to classify the test data $D_{tst}$ of the dataset $D_t$. Note that one may also use many other similarity measures as candidate distance measures, for example elastic measures (ERP, EDR, LCSS, etc.), threshold-based measures (e.g., TQuEST), or pattern-based measures (e.g., SpADe), instead of any of the measures that we have used.

VI. RESULTS

In this section, we describe the results in detail. Table III shows the weights obtained after ten iterations for the eight benchmark datasets using the validation set. The table shows that the highest weights tend to be assigned to the most effective similarity measures. Measures that are ineffective relative to the others are effectively discarded, because the weights assigned to them are zero or negligible. For example, for the Control Chart dataset, the weight assigned to DTW ($s_8$) is larger than that of Euclidean distance ($s_1$), whereas for the Lightning-2 dataset, the weight assigned to Euclidean distance is larger than the weight given to DTW. Note that the Maximum Norm (L∞ norm) distance function, which is considered less effective than Euclidean distance and DTW, has the highest weight (tied with the Manhattan distance) for the Coffee dataset. We do not have to pre-specify a distance measure for a particular dataset, as we might not know what kind of similarities exist in the data. Instead of committing to a single similarity measure beforehand, we find a weighted combination of different similarity measures for the classification task. This is the main advantage of combining similarity measures. Thus, genetic algorithms guide us in estimating near-optimal solutions to the time series classification problem.

TABLE III
WEIGHTS ASSIGNED TO EACH SIMILARITY MEASURE AFTER 10 ITERATIONS

| Dataset | $s_1$ | $s_2$ | $s_3$ | $s_4$ | $s_5$ | $s_6$ | $s_7$ | $s_8$ |
|---|---|---|---|---|---|---|---|---|
| Control Chart | 0.72 | 0.29 | 0.33 | 0.18 | 0.12 | 0.61 | 0.31 | 0.82 |
| Coffee | 0.74 | 0.9 | 0.9 | 0.1 | 0.03 | 0.03 | 0.06 | 0.70 |
| Beef | 0.95 | 0.09 | 0 | 0.48 | 0 | 0.62 | 0.58 | 0.73 |
| OliveOil | 0.7 | 0 | 0.79 | 0 | 0 | 0 | 0.58 | 0.67 |
| Lightning-2 | 0.99 | 0.75 | 0.79 | 0.09 | 0.21 | 0.09 | 0.71 | 0.97 |
| Lightning-7 | 0.95 | 0.06 | 0.09 | 0.81 | 0.95 | 0.29 | 0.38 | 0.99 |
| Trace | 0.62 | 0.08 | 0.28 | 0.39 | 0.14 | 0.47 | 0.23 | 0.98 |
| ECG | 0.052 | 0 | 0.21 | 0 | 0 | 0.98 | 0.90 | 0 |

We measure the accuracy of each of the eight similarity measures using the one nearest neighbor classifier on all the datasets. Fig. 4 shows the classification accuracy for each similarity measure. The weighted combination of the eight similarity measures is then used with the one nearest neighbor classifier to classify the test sets of the benchmark datasets.

The results are shown in Table IV, which compares the accuracy obtained with the proposed genetic approach against 1NN-ED, 1NN-DTW, and 1NN classifiers based on the other similarity measures described in Section V. In most cases, the classification accuracy obtained with our approach, using the weighted combination of similarity measures, matches or exceeds the accuracy obtained with any individual similarity measure; the exception is Lightning-7, where 1NN-ED (67.53%) and 1NN-DTW (72.6%) perform better. Although in many cases the accuracy obtained by 1NN-DTW matches that of our approach, in some cases, such as ECG and Coffee, our method gives substantially better results (91% vs. 77% and 89.28% vs. 82.14%, respectively).

Fig. 4. Classification accuracy for different similarity measures on various datasets from the UCR classification/clustering archive [29].

TABLE IV: COMPARISON OF CLASSIFICATION ACCURACY USING OUR SIMILARITY MEASURE AND OTHER SIMILARITY MEASURES.

Dataset        Size  Our approach  1NN-ED  1NN-L1 norm  1NN-L∞ norm  1NN-disim  1NN-rootdisim  1NN-peakdisim  1NN-cosine  1NN-DTW (traditional)

Control Chart  600   99.33%        88%     88%          81.33%       58%        53%            77%            80.67%      99.33%

Coffee         56    89.28%        75%     79.28%       89.28%       75%        75%            75%            53.57%      82.14%

Beef           60    53.34%        53.33%  50%          53.33%       46.67%     50%            46.67%         20%         50%

OliveOil       121   86.67%        86.67%  36.67%       83.33%       63.33%     60%            63.33%         16.67%      86.67%

Lightning-2    121   86.89%        74.2%   52.4%        68.85%       55.75%     50.81%         83.60%         63.93%      85.25%

Lightning-7    143   67.12%        67.53%  24.65%       45.21%       34.24%     28.76%         61.64%         53.42%      72.6%

Trace          200   100%          76%     74%          69%          65%        57%            75%            53%         100%

ECG            200   91%           88%     66%          87%          79%        79%            91%            81%         77%

VII. CONCLUSION

We presented a novel algorithm for time series classification that combines different similarity measures using genetic algorithms. Since different similarity measures are combined, the space of candidate solutions (weight combinations) is large. The advantage of using a genetic algorithm is that an ineffective similarity measure does not degrade the proposed algorithm, because it is simply assigned a zero or negligible weight in subsequent generations. Thus, the genetic-algorithm-based approach proposed here tends to yield results that match or exceed those of the individual similarity measures. Although obtaining the combination of similarity measures may take time, this cost is incurred only once, at design time. Once the combination is obtained, it can easily be used for classifying new patterns.
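To make the weight search concrete, the following is a minimal sketch of a genetic algorithm that evolves one weight in [0, 1] per similarity measure, using 1NN accuracy on a validation set as the fitness. The operators and settings (elitist truncation selection, one-point crossover, clamped Gaussian mutation, a population of 20, and 10 generations to echo the 10 iterations of Table III) are our assumptions and not necessarily the GA used in this paper; `evolve_weights` and the other names are hypothetical. `pop_size` must be at least 4 so that two parents can always be sampled.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def one_nn_accuracy(train, valid, dist):
    """Fraction of validation series whose nearest training series shares its label."""
    correct = 0
    for query, label in valid:
        nearest = min(train, key=lambda item: dist(item[0], query))
        correct += nearest[1] == label
    return correct / len(valid)

def evolve_weights(measures, train, valid, pop_size=20, generations=10,
                   mutation_rate=0.2, sigma=0.1, seed=0):
    """Evolve one weight in [0, 1] per measure; fitness = validation 1NN accuracy."""
    rng = random.Random(seed)
    k = len(measures)

    def fitness(weights):
        def dist(x, y):
            # Zero-weight measures are skipped, i.e. effectively discarded.
            return sum(w * m(x, y) for m, w in zip(measures, weights) if w > 0)
        return one_nn_accuracy(train, valid, dist)

    population = [[rng.random() for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist truncation selection: the fitter half survives unchanged.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Gaussian mutation, clamped to [0, 1].
            child = [min(1.0, max(0.0, w + rng.gauss(0.0, sigma)))
                     if rng.random() < mutation_rate else w for w in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Because the fitter half of each generation is carried over unchanged, the best validation accuracy in this sketch never decreases from one generation to the next. A measure that only hurts accuracy may drift toward zero weight, although on small validation sets many weight vectors can tie.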

Experiments on the eight benchmark datasets show that this approach matches or outperforms the best individual similarity measure on seven of them.

Future work can be extended in the following directions:

• It would be interesting to see whether our approach can be applied to other kinds of datasets, for example streaming datasets, with little or no modification.

• The algorithm can be used with any distance-based classifier; we plan to report results with other classifiers.

• Other similarity measures can also be incorporated into the proposed genetic-algorithm-based approach.

REFERENCES

[1] E. Keogh, “Recent advances in mining time series data,” in Knowledge Discovery in Databases: PKDD 2005, ser. Lecture Notes in Computer Science, A. Jorge, L. Torgo, P. Brazdil, R. Camacho, and J. Gama, Eds. Springer Berlin / Heidelberg, 2005, vol. 3721, p. 6.

[2] L. Wei and E. Keogh, “Semi-supervised time series classification,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’06. New York, NY, USA: ACM, 2006, pp. 748–753.

[3] T. Lane and C. E. Brodley, “Temporal sequence learning and data reduction for anomaly detection,” ACM Trans. Inf. Syst. Secur., vol. 2, pp. 295–331, August 1999.

[4] F. Sebastiani, “Machine learning in automated text categorization,” ACM Comput. Surv., vol. 34, pp. 1–47, March 2002.

[5] Y. Kakizawa, R. H. Shumway, and M. Taniguchi, “Discrimination and clustering for multivariate time series,” Journal of the American Statistical Association, vol. 93, pp. 328–340, 1998.

[6] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: experimental comparison of representations and distance measures,” Proc. VLDB Endow., vol. 1, pp. 1542–1552, August 2008.

[7] E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: A survey and empirical demonstration,” in SIGKDD ’02, 2002, pp. 102–111.

[8] X. Xi, E. Keogh, C. Shelton, L. Wei, and C. A. Ratanamahatana, “Fast time series classification using numerosity reduction,” in ICML ’06, 2006, pp. 1033–1040.

[9] N. Lesh, M. J. Zaki, and M. Ogihara, “Mining features for sequence classification,” in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’99. New York, NY, USA: ACM, 1999, pp. 342–346.


[10] N. A. Chuzhanova, A. J. Jones, and S. Margetts, “Feature selection for genetic sequence classification,” Bioinformatics, vol. 14, no. 2, pp. 139–143, 1998.

[11] O. Yakhnenko, A. Silvescu, and V. Honavar, “Discriminatively trained Markov model for sequence classification,” in Proceedings of the Fifth IEEE International Conference on Data Mining, ser. ICDM ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 498–505.

[12] D. D. Lewis, “Naive (Bayes) at forty: The independence assumption in information retrieval,” in Proceedings of the 10th European Conference on Machine Learning. London, UK: Springer-Verlag, 1998, pp. 4–15.

[13] E. J. Keogh and M. J. Pazzani, “Scaling up dynamic time warping for datamining applications,” in Proceedings of the 6th Int. Conf. on Knowledge Discovery and Data Mining, 2000, pp. 285–289.

[14] E. Keogh, “Exact indexing of dynamic time warping,” in Proceedings of the 28th International Conference on Very Large Data Bases, ser. VLDB ’02. VLDB Endowment, 2002, pp. 406–417.

[15] E. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping,” Knowl. Inf. Syst., vol. 7, pp. 358–386, March 2005.

[16] S.-W. Kim, S. Park, and W. Chu, “An index-based approach for similarity search supporting time warping in large sequence databases,” in Proceedings of the 17th International Conference on Data Engineering, 2001, pp. 607–614.

[17] C. A. Ratanamahatana and E. Keogh, “Making time-series classification more accurate using learned constraints,” in SDM ’04: SIAM International Conference on Data Mining, 2004.

[18] D. Gunopulos, G. Kollios, and M. Vlachos, “Discovering similar multidimensional trajectories,” in 18th International Conference on Data Engineering, 2002, pp. 673–684.

[19] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’05. New York, NY, USA: ACM, 2005, pp. 491–502.

[20] L. Chen and R. Ng, “On the marriage of Lp-norms and edit distance,” in Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, ser. VLDB ’04. VLDB Endowment, 2004, pp. 792–803.

[21] J.-G. Lee, J. Han, and K.-Y. Whang, “Trajectory clustering: a partition-and-group framework,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’07. New York, NY, USA: ACM, 2007, pp. 593–604.

[22] E. Frentzos, K. Gratsias, and Y. Theodoridis, “Index-based most similar trajectory search,” 2006.

[23] M. D. Morse and J. M. Patel, “An efficient and accurate method for evaluating time series similarity,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’07. New York, NY, USA: ACM, 2007, pp. 569–580.

[24] Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. H. Tung, “Spade: On shape-based pattern detection in streaming time series,” in International Conference on Data Engineering, 2007, pp. 786–795.

[25] J. Aßfalg, H.-P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz, “Similarity search on time series based on threshold queries,” in Advances in Database Technology - EDBT 2006, ser. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2006, vol. 3896, pp. 276–294.

[26] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1989.

[27] S. M. Thede, “An introduction to genetic algorithms,” J. Comput. Small Coll., vol. 20, pp. 115–123, October 2004.

[28] T. Yamada, K. Yamashita, N. Ishii, and K. Iwata, “Text classification by combining different distance functions with weights,” in International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and International Workshop on Self-Assembling Wireless Networks, 2006, pp. 85–90.

[29] E. Keogh, X. Xi, L. Wei, and C. A. Ratanamahatana, The UCR Time Series Classification/Clustering Homepage, http://www.cs.ucr.edu/~eamonn/time_series_data/, 2006.

[30] E. Fink and K. B. Pratt, “Indexing of compressed time series,” in Data Mining in Time Series Databases. World Scientific, 2004, pp. 51–78.