fast global alignment kernels copy - new york universitymohri/aml15/thanos_rodrigo_fast...6*24=144...

42
Fast Global Alignment Kernels 05.05.2015 Rodrigo Frassetto Nogueira Thanos Papadopoulos Time Series

Upload: others

Post on 10-Feb-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

  • Fast Global Alignment Kernels

    05.05.2015 Rodrigo Frassetto Nogueira Thanos Papadopoulos

    Time Series

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels

    • Experiments

    • Conclusion

  • Agenda• Motivation

    • Time series Introduction

    • Problem Formulation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels

    • Experiments

    • Conclusion

  • Example #1• Stock Price Forecasting 

Apple Inc (9.2007 - 8.2012)

  • Example #2• Caltrans Performance Measurement System (PeMS)

    • occupancy rate in San Francisco bay area freeways - [0,1]

    • 963 sensors (different car lanes)

    • data every 10’

    • time series per day: dimension - 963, length - 6*24=144

  • Example #3• Speech signals

  • Agenda• Motivation

    • Time series Introduction

    • Problem Formulation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels

    • Experiments

    • Conclusion

  • Problem Formulation• Goal: Find adequate kernels for time series

    • handle variable length

    • be positive definite

    • low computational cost

  • How we start?

    Similarity Measure: K(x, y) = h�(x),�(y)i

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels

    • Experiments

    • Conclusion

  • Dynamic Time Warping• pairwise comparisons?

    • alignment: 
associate each element of sequence X to one or more elements of sequence Y and vice-versa

  • DTW Cost Matrix

  • Optimal Alignment

  • • warping functions:

    DTW definition⇡1(i),⇡2(j)

  • • warping functions:

    • moves:

    DTW definition

    !, ",%

    ⇡1(i),⇡2(j)

  • • warping functions:

    • moves:

    • cost per alignment π: 



    • optimal alignment:

    DTW definition

    !, ",%

    ⇡1(i),⇡2(j)

    DTW (x, y) = min⇡2A(n,m)

    D

    x,y

    (⇡)

    D

    x,y

    (⇡) =

    |⇡|X

    i=1

    '(x⇡1(i), y⇡2(i))

  • Why not DTW?• not PDS

    • high computational cost, O(dnm)

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels • Definition • Diagonal Dominance • Positive Definiteness • Computational Cost

    • Experiments

    • Conclusion

  • GA kernels• soft maximum motivation, : 




    • by defining : 




    • whole spectrum of costs/alignments

    kGA

    =X

    ⇡2A(n,m)

    e�Dx,y(⇡) =X

    ⇡2A(n,m)

    e�P|⇡|

    i=1 '(x⇡1(i),y⇡2(i))

    kGA =X

    ⇡2A(n,m)

    |⇡|Y

    i=1

    (x⇡1(i), y⇡2(i))

    = e�'

    log(X

    i

    e

    xi)

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels • Definition • Diagonal Dominance • Positive Definiteness • Computational Cost

    • Experiments

    • Conclusion

  • Diagonal Dominance• off-diagonal entries far smaller than trace (Gram

    matrix)

    • Solution:

    = e��'(x⇡1(i),y⇡2(i)), � > 0

  • Behavior in the limits of λ• Let be a sample of time series 




    1. All samples have same length n

    When

    When

    {X1, X2, ..., Xp}

    � ! 1, kGA ! Ip

    � ! 0, kGA ! D(n, n) · 1p,p

    k

    GA

    (x, y) =X

    ⇡2A(n,m)

    |⇡|Y

    i=1

    e

    ��·'(x⇡1(i),y⇡2(i))

  • Delannoy numbers

    D(n,m) = D(n,m� 1) +D(n� 1,m) +D(n� 1,m� 1)

  • Behavior in the limits of λ• Let be a sample of time series 




    2. Samples have different lengths

    When

    When

    {X1, X2, ..., Xp}

    � ! 1, kGA ! Ip

    k

    GA

    (x, y) =X

    ⇡2A(n,m)

    |⇡|Y

    i=1

    e

    ��·'(x⇡1(i),y⇡2(i))

    � ! 0, kGA(Xi, Xj) ! D(|Xi|, |Xj |)

  • Diagonal Dominance• problem arises when length varies and λ tends to 0

    • n sequences with length 1 to n 







    • Empirically: for � ⇡ 0 ) 12 n

    m 2

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels • Definition • Diagonal Dominance • Positive Definiteness • Computational Cost

    • Experiments

    • Conclusion

  • Positive Definiteness• if κ is p.d. what about ?

    • mapping kernels:


κ is a local kernel on substructures of x, y 
M is a mapping set

    kGA

    k(x, y) =X

    (xi,yi)2M(x,y)

    l

    (xi

    , y

    i

    )

  • GA kernels as Mapping kernels

    • if κ is p.d. what about ?

    • For GA kernels:



    • Theorem 1: is p.d. if and only if M is transitive



    • Lemma 1: is not transitive

    kGA

    MGA(x, y) = {(x⇡1 , y⇡2)|⇡ = (⇡1,⇡2) 2 A(n,m)}

    l(x⇡1 , y⇡2) =

    |⇡|Y

    i=1

    (x⇡1(i), y⇡2(i))

    kGA

    MGA

    (xi, yi) 2 M(x, y), (yi, zi) 2 M(y, z) ) (xi, zi) 2 M(x, z)

    we don’t

    know

  • Constraints on κ• Theorem 2: If κ is p.d. and is p.d., is p.d.

    • Geometric Divisibility (g.d.):

    • mapping

    • inverse mapping

    • is g.d. if is p.d.

    • Lemma 2: If κ is g.d. then is p.d.

    �1(x) =x

    1� x : [0, 1) ! R+

    ⌧(x) =x

    1 + x: R+ ! [0, 1)

    f ⌧f

    1 + kGA

    kGA

  • Constraints on κ• Infinite divisibility (i.d.): is i.d. iff is n.d.

    • Lemma 3: For any i.d. kernel s.t. , 
 is g.d. and i.d.

    • how to construct a local kernel?

    • Example: Gaussian kernel, , is i.d., thus is i.d. and g.d. 
Also, is n.d. and we set 


    �log()

    0 < < 1⌧�1

    � ⌧�1(�2)

    ' = �log(⌧�1(�2)) = e��'

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels • Definition • Diagonal Dominance • Positive Definiteness • Computational Cost

    • Experiments

    • Conclusion

  • Time Complexity• DTW time complexity: O(dnm)

    • What to do?

    • Ignore “bad” alignments - keep alignments close to the diagonal:

    D

    x,y

    (⇡) =

    |⇡|X

    i=1

    ⇡1(i),⇡2(i) · '(x⇡1(i), y⇡2(i))

    �i,j =

    ⇢1, |i� j| < T1, |i� j| � T

  • Mi,j = (xi, yj)(Mi�1,j�1 +Mi,j�1 +Mi�1,j)

    (2T � 1) ·min(n,m)� T (T � 1)/2 ) O(2Tmin(n,m))

  • Triangular GA Kernels• if κ is infinite divisible, and ω is a p.d. kernel in N,

    then as a local kernel gives a p.d. GA kernel

    • common choice for ω:




    • Example:

    ⌧�1(! ⌦ )

    !(i, j) =

    ✓1� |i� j|

    T

    +

    �1(! ⌦ 12�)(i, x; j, y) =

    !(i, j)(x, y)

    2� !(i, j)(x, y)

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels

    • Experiments

    • Conclusion

  • Kernels to compare• DTW:

    • DTW SC:


    • DTAK:


    • GA:

    • TGA:

    kDTW = e� 1tDTW

    kSC = e� 1tDTWSC

    DTW

    SC

    (x, y) = min⇡2A(n,m)

    D

    x,y

    (⇡)

    �kDTAK(x, y)

    � 1t

    kDTAK(x, y) = max⇡2A(n,m)

    |⇡|X

    i=1

    �(x⇡1(i), y⇡2(i))

    = e�log�⌧

    �1( 12�)�

    = ⌧�1(! ⌦ 12�)

  • Datasets

  • Classification error rates

  • Performance and speed

  • Agenda• Motivation

    • Dynamic Time Warping (DTW)

    • Global Alignment (GA) Kernels

    • Experiments

    • Conclusion

  • Conclusion

    AlignmentGlobalFast Kernels

    O(T ·min(n,m))

    wide spectrum of alignments

    DTW based

    PDS variable size