fast global alignment kernels copy - new york universitymohri/aml15/thanos_rodrigo_fast...6*24=144...

Fast Global Alignment Kernels

05.05.2015 Rodrigo Frassetto Nogueira Thanos Papadopoulos

Time Series

Agenda• Motivation

• Dynamic Time Warping (DTW)

• Global Alignment (GA) Kernels

• Experiments

• Conclusion


• Time series Introduction

• Problem Formulation



• Experiments

• Conclusion

Example #1• Stock Price Forecasting   Apple Inc (9.2007 - 8.2012)

Example #2• Caltrans Performance Measurement System (PeMS)

• occupancy rate in San Francisco bay area freeways - [0,1]

• 963 sensors (different car lanes)

• data every 10’

• time series per day: dimension - 963, length - 6*24=144

Example #3• Speech signals


• Time series Introduction

• Problem Formulation



• Experiments

• Conclusion

Problem Formulation• Goal: Find adequate kernels for time series

• handle variable length

• be positive definite

• low computational cost

How we start?

Similarity Measure: K(x, y) = h�(x),�(y)i




• Experiments

• Conclusion

Dynamic Time Warping• pairwise comparisons?

• alignment:  associate each element of sequence X to one or more elements of sequence Y and vice-versa

DTW Cost Matrix

Optimal Alignment

• warping functions:

DTW definition⇡1(i),⇡2(j)


• moves:

DTW definition

!, ",%

⇡1(i),⇡2(j)


• moves:

• cost per alignment π:   

• optimal alignment:

DTW definition

!, ",%

⇡1(i),⇡2(j)

DTW (x, y) = min⇡2A(n,m)

D

x,y

(⇡)

D

x,y

(⇡) =

|⇡|X

i=1

'(x⇡1(i), y⇡2(i))

Why not DTW?• not PDS

• high computational cost, O(dnm)



• Global Alignment (GA) Kernels • Definition • Diagonal Dominance • Positive Definiteness • Computational Cost

• Experiments

• Conclusion

GA kernels• soft maximum motivation, :    

• by defining :    

• whole spectrum of costs/alignments

kGA

=X

⇡2A(n,m)

e�Dx,y(⇡) =X

⇡2A(n,m)

e�P|⇡|

i=1 '(x⇡1(i),y⇡2(i))

kGA =X

⇡2A(n,m)

|⇡|Y

i=1

(x⇡1(i), y⇡2(i))

= e�'

log(X

i

e

xi)




• Experiments

• Conclusion

Diagonal Dominance• off-diagonal entries far smaller than trace (Gram

matrix)

• Solution:

= e��'(x⇡1(i),y⇡2(i)), � > 0

Behavior in the limits of λ• Let be a sample of time series    

1. All samples have same length n

When

When

{X1, X2, ..., Xp}

� ! 1, kGA ! Ip

� ! 0, kGA ! D(n, n) · 1p,p

k

GA

(x, y) =X

⇡2A(n,m)

|⇡|Y

i=1

e

��·'(x⇡1(i),y⇡2(i))

Delannoy numbers

D(n,m) = D(n,m� 1) +D(n� 1,m) +D(n� 1,m� 1)

Behavior in the limits of λ• Let be a sample of time series    

2. Samples have different lengths

When

When

{X1, X2, ..., Xp}

� ! 1, kGA ! Ip

k

GA

(x, y) =X

⇡2A(n,m)

|⇡|Y

i=1

e

��·'(x⇡1(i),y⇡2(i))

� ! 0, kGA(Xi, Xj) ! D(|Xi|, |Xj |)

Diagonal Dominance• problem arises when length varies and λ tends to 0

• n sequences with length 1 to n       

• Empirically: for � ⇡ 0 ) 12 n

m 2




• Experiments

• Conclusion

Positive Definiteness• if κ is p.d. what about ?

• mapping kernels:   κ is a local kernel on substructures of x, y  M is a mapping set

kGA

k(x, y) =X

(xi,yi)2M(x,y)

l

(xi

, y

i

)

GA kernels as Mapping kernels

• if κ is p.d. what about ?

• For GA kernels:  

• Theorem 1: is p.d. if and only if M is transitive  

• Lemma 1: is not transitive

kGA

MGA(x, y) = {(x⇡1 , y⇡2)|⇡ = (⇡1,⇡2) 2 A(n,m)}

l(x⇡1 , y⇡2) =

|⇡|Y

i=1

(x⇡1(i), y⇡2(i))

kGA

MGA

(xi, yi) 2 M(x, y), (yi, zi) 2 M(y, z) ) (xi, zi) 2 M(x, z)

we don’t

know

Constraints on κ• Theorem 2: If κ is p.d. and is p.d., is p.d.

• Geometric Divisibility (g.d.):

• mapping

• inverse mapping

• is g.d. if is p.d.

• Lemma 2: If κ is g.d. then is p.d.

⌧

�1(x) =x

1� x : [0, 1) ! R+

⌧(x) =x

1 + x: R+ ! [0, 1)

f ⌧f

1 + kGA

kGA

Constraints on κ• Infinite divisibility (i.d.): is i.d. iff is n.d.

• Lemma 3: For any i.d. kernel s.t. ,   is g.d. and i.d.

• how to construct a local kernel?

• Example: Gaussian kernel, , is i.d., thus is i.d. and g.d.  Also, is n.d. and we set  

�log()

0 < < 1⌧�1

� ⌧�1(�2)

' = �log(⌧�1(�2)) = e��'




• Experiments

• Conclusion

Time Complexity• DTW time complexity: O(dnm)

• What to do?

• Ignore “bad” alignments - keep alignments close to the diagonal:

D

�

x,y

(⇡) =

|⇡|X

i=1

�

⇡1(i),⇡2(i) · '(x⇡1(i), y⇡2(i))

�i,j =

⇢1, |i� j| < T1, |i� j| � T

Mi,j = (xi, yj)(Mi�1,j�1 +Mi,j�1 +Mi�1,j)

(2T � 1) ·min(n,m)� T (T � 1)/2 ) O(2Tmin(n,m))

Triangular GA Kernels• if κ is infinite divisible, and ω is a p.d. kernel in N,

then as a local kernel gives a p.d. GA kernel

• common choice for ω:   

• Example:

⌧�1(! ⌦ )

!(i, j) =

✓1� |i� j|

T

◆

+

⌧

�1(! ⌦ 12�)(i, x; j, y) =

!(i, j)(x, y)

2� !(i, j)(x, y)




• Experiments

• Conclusion

Kernels to compare• DTW:

• DTW SC: 

• DTAK: 

• GA:

• TGA:

kDTW = e� 1tDTW

kSC = e� 1tDTWSC

DTW

SC

(x, y) = min⇡2A(n,m)

D

�

x,y

(⇡)

�kDTAK(x, y)

� 1t

kDTAK(x, y) = max⇡2A(n,m)

|⇡|X

i=1

�(x⇡1(i), y⇡2(i))

= e�log�⌧

�1( 12�)�

= ⌧�1(! ⌦ 12�)

Datasets

Classification error rates

Performance and speed




• Experiments

• Conclusion

Conclusion

AlignmentGlobalFast Kernels

O(T ·min(n,m))

wide spectrum of alignments

DTW based

PDS variable size

fast global alignment kernels copy - new york universitymohri/aml15/thanos_rodrigo_fast...6*24=144...

Documents