mosse - hamedkiani.com · mosse kmosse miltrack struck oab(1) semiboost fragtrack our method mean...

1
IEEE 2015 Conference on Computer Vision and Pattern Recognition C ORRELATION F ILTERS WITH L IMITED B OUNDARIES H AMED K IANI ,T ERENCE S IM AND S IMON L UCEY C ORRELATION F ILTERS Boundary Effects: Synthetic training patches * E (h)= 1 2 N X i=1 ||y i - x i ? h|| 2 2 + λ 2 ||h|| 2 2 1- Frequency domain: O (ND log D ) E ( ˆ h)= 1 2 N X i=1 || ˆ y i - diagx i ) > ˆ h|| 2 2 + λ 2 || ˆ h|| 2 2 ( ˆ y denotes the DFT of vector y ) 2- Spatial domain: O (D 3 + ND 2 ) E (h)= 1 2 N X i=1 D X j =1 ||y i (j ) - h > x i τ j ] | {z } circular shift || 2 2 + λ 2 ||h|| 2 2 C ONTRIBUTIONS 1. A new correlation filter objective to drastically reduce the number of synthetic patches. 2. Optimizing the new objective using ADMM with very effi- cient complexity and memory usage. K EY I DEA * T≫D E (h)= 1 2 N X i=1 T X j =1 ||y i (j ) - h > Px i τ j ]|| 2 2 + λ 2 ||h|| 2 2 1. # of training patches: T vs. D (T D ) 2. # of patches affected by circular shift (synthetic): D -1 T vs. D -1 D 3. Complexity: O (D 3 + NDT ) E (h, ˆ g ) = 1 2 N X i=1 || ˆ y i - diagx i ) > ˆ g || 2 2 + λ 2 ||h|| 2 2 s.t. ˆ g = D FP > h (F: D × D discrete Fourier transform matrix) Augmented Lagrangian Lg , h, ˆ ζ ) = 1 2 N X i=1 || ˆ y i - diagx i ) > ˆ g || 2 2 + λ 2 ||h|| 2 2 + ˆ ζ > g - D FP > h)+ μ 2 || ˆ g - D FP > h|| 2 2 Augmented Lagrangian is solved using Alternating Direction Method of Multipliers (ADMM) with a time complexity of O ([N + K ]T log T ) and memory usage of O (T ) . RESULTS (2) MOSSE KMOSSE MILTrack STRUCK OAB(1) SemiBoost FragTrack Our method mean {0.80, 11} {0.84, 12} {0.72, 16} {0.91, 12} {0.53, 31} {0.62, 29} {0.51, 37} {0.97, 8} fps 600 100 25 11 25 25 2 100 R ESULTS (1) Runtime and Convergence Performance 1 100 200 300 400 500 600 0 1 2 3 4 5 6 7 8 x 10 4 Number of training images Time to converge (s) Spatial steepest descent (32x32) Our method (32x32) Spatial steepest descent (64x64) Our method (64x64) 0 500 1000 1500 10 0 10 2 10 4 10 6 10 8 10 10 10 12 # of iteration Objective value Spatial steepest descent Our method Localization Performance MOSSE ASEF UMACE MACE OTF Our method 1 50 100 150 200 250 300 350 400 0 0.5 1 Number of training images Localization rate at d < 0.10 0.05 0.10 0.15 0.20 0 0.5 1 Threshold (fraction of interocular distance) Localization rate Tracking Performance: 100 fps girl clifbar 10 20 30 40 50 0 0.2 0.4 0.6 0.8 1 Threshold Precision MOSSE K_MOSSE MILTrack STRUCK FragTrack IVTs Our method 10 20 30 40 50 0 0.2 0.4 0.6 0.8 1 Threshold Precision MOSSE K_MOSSE MILTrack STRUCK FragTrack IVTs Our method 100 200 300 400 0 50 100 150 200 Frame # Position Error (pixel) MOSSE K_MOSSE STRUCK FragTrack Our method 50 100 150 200 250 300 0 20 40 60 80 100 120 140 Frame # Position Error (pixel) MOSSE K_MOSSE STRUCK FragTrack Our method REFERENCES [1] D. Bolme, J. Beveridge, B. Draper, and Y. Lui. Visual Object Tracking using Adaptive Correlation Filters CVPR’10 [2] Code and Demo: http://www.hamedkiani.com/cfwlb.html

Upload: others

Post on 22-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

  • IEEE 2015 Conference on Computer Vision and Pattern

    Recognition Correlation Filters with Limited Boundaries

    Hamed Kiani, Terence Sim and Simon Lucey PAVIS (IIT), SoC (NUS) and RI (CMU)

    Section 1 (sizes): Posters boards are 48” tall and 96” wide, but we recommend you leave a little border since you may not be able to pin at the vertical edge. Since PowerPoint does not let one define such a large paper size, this template is designed to be printed at 200%, yielding a 46” x94” poster. You can scale it up or down a bit (e.g. 42” is a common paper size at FexEd). Note there is no direct international A0.. A1 equivalent. The poster size is approximately three A0 boards next to each other, i.e., each column in this example is about one A0 board.

    Ideally you want to keep it very readable: this is nots your paper, it is a poster. 32pt here (64 final printing) is good for most text: • Sub-bullets are 28 here (56 final)

    – Don’t use smaller than 24pt in this template (which is 48pt in final printing at 200%)

    – Insert plenty of graphics and any math you need

    When inserting graphics or equations, keep the resolution high (remember this will be printed at 200%). If you can see blocking artifacts at 400% magnification in PowerPoint, consider finding better graphics. This is an example of BAD/LOW RES GRAPHICS

    Leave enough margin for pushpin and remember many big plotters cannot get within .5” of the actual paper edge.

    You are free to use colored backgrounds and such but they generally reduce readability.

    You are free to use what ever fonts you like. • San Serif fonts like Arial are more readable from a distance,

    • Serif fonts like times may look more consistent with your mathematics

    Section 2 (layout): Remember the poster session will be crowded so design the poster to be read in columns so people can read what is in front of them and move left to right to get the whole story.

    The poster should use photos, figures, and tables to tell the story of the study. For clarity, present the information in a sequence that is easy to follow.

    There is often way too much text in a poster - there definitely is in this template! Posters primarily are visual presentations; the text should support the graphics. Look critically at the layout. Some poster 'experts' suggest that if there is about 20-25% text, 40-45% graphics and 30-40% empty space, you are doing well.

    Section 3: Include more figures than are in the paper so you can talk to them. Include things that are not in the paper and then encourage them to read the paper. Don’t try to just put all the paper here.

    If it looks like a cut/paste of the paper, people skip that poster since they can read the papers after the conference. Many people find it better to spend time talking with poster presenters that have more to offer than just redoing the paper content paper in big fonts.

    Remember Poster boards look like this.. This is your canvas. Paint us a picture of your work.

    CORRELATION FILTERS WITH LIMITED BOUNDARIESHAMED KIANI, TERENCE SIM AND SIMON LUCEY

    CORRELATION FILTERSBoundary Effects: Synthetic training patches

    ���� ∆��

    IEEE 2015 Conference on Computer Vision and Pattern

    Recognition

    𝑥𝑖 𝑦𝑖 ℎ

    *

    𝑥𝑖 𝑦𝑖

    *

    T ≫ D

    E(h) =1

    2

    N∑i=1

    ||yi − xi ? h||22 +λ

    2||h||22

    1- Frequency domain: O(ND logD)

    E(ĥ) =1

    2

    N∑i=1

    ||ŷi − diag(x̂i)>ĥ||22 +λ

    2||ĥ||22

    (ŷ denotes the DFT of vector y)

    2- Spatial domain: O(D3 +ND2)

    E(h) =1

    2

    N∑i=1

    D∑j=1

    ||yi(j)− h> xi[∆τj ]︸ ︷︷ ︸circular shift

    ||22 +λ

    2||h||22

    CONTRIBUTIONS1. A new correlation filter objective to drastically reduce the

    number of synthetic patches.2. Optimizing the new objective using ADMM with very effi-

    cient complexity and memory usage.

    KEY IDEA

    IEEE 2015 Conference on Computer Vision and Pattern

    Recognition

    𝑥𝑖 𝑦𝑖 ℎ

    *

    𝑥𝑖 𝑦𝑖

    *

    T ≫ D

    E(h) =1

    2

    N∑i=1

    T∑j=1

    ||yi(j)− h>Pxi[∆τj ]||22 +λ

    2||h||22

    𝐏 𝑥 ∆𝜏𝑗

    1. # of training patches: T vs. D (T � D)2. # of patches affected by circular shift (synthetic): D−1T vs.

    D−1D

    3. Complexity: O(D3 +NDT )

    E(h, ĝ) =1

    2

    N∑i=1

    ||ŷi − diag(x̂i)>ĝ||22 +λ

    2||h||22

    s.t. ĝ =√DFP>h (F: D ×D discrete Fourier transform matrix)

    Augmented Lagrangian

    L(ĝ,h, ζ̂) = 12

    N∑i=1

    ||ŷi − diag(x̂i)>ĝ||22 +λ

    2||h||22 + ζ̂>(ĝ −

    √DFP>h) +

    µ

    2||ĝ −

    √DFP>h||22

    Augmented Lagrangian is solved using Alternating Direction Method of Multipliers (ADMM) with a time complexity ofO([N +K]T log T ) and memory usage of O(T ) .

    RESULTS (2)

    MOSSE KMOSSE MILTrack STRUCK OAB(1) SemiBoost FragTrack Our method

    mean {0.80, 11} {0.84, 12} {0.72, 16} {0.91, 12} {0.53, 31} {0.62, 29} {0.51, 37} {0.97, 8}fps 600 100 25 11 25 25 2 100

    RESULTS (1)Runtime and Convergence Performance

    1 100 200 300 400 500 6000

    1

    2

    3

    4

    5

    6

    7

    8x 10

    4

    Number of training images

    Tim

    e t

    o c

    on

    ve

    rge

    (s)

    Spatial steepest descent (32x32)

    Our method (32x32)

    Spatial steepest descent (64x64)

    Our method (64x64)

    0 500 1000 150010

    0

    102

    104

    106

    108

    1010

    1012

    # of iteration

    Obje

    ctive

    va

    lue

    Spatial steepest descent

    Our method

    Localization Performance

    1

    Lo

    ca

    liza

    tio

    n r

    ate

    at

    d <

    0.1

    0

    MOSSE ASEF UMACE MACE OTF Our method

    1 50 100 150 200 250 300 350 4000

    0.5

    1

    Number of training images

    Localiz

    ation r

    ate

    at d <

    0.1

    0

    0.05 0.10 0.15 0.200

    0.5

    1

    Threshold (fraction of interocular distance)

    Localiz

    ation r

    ate

    Tracking Performance: 100 fpsgirl clifbar

    10 20 30 40 500

    0.2

    0.4

    0.6

    0.8

    1

    Threshold

    Pre

    cis

    ion

    MOSSE

    K_MOSSE

    MILTrack

    STRUCK

    FragTrack

    IVTs

    Our method

    10 20 30 40 500

    0.2

    0.4

    0.6

    0.8

    1

    Threshold

    Pre

    cis

    ion

    MOSSE

    K_MOSSE

    MILTrack

    STRUCK

    FragTrack

    IVTs

    Our method

    100 200 300 4000

    50

    100

    150

    200

    Frame #

    Positio

    n E

    rror

    (pix

    el)

    MOSSE

    K_MOSSE

    STRUCK

    FragTrack

    Our method

    50 100 150 200 250 3000

    20

    40

    60

    80

    100

    120

    140

    Frame #

    Positio

    n E

    rror

    (pix

    el)

    MOSSE

    K_MOSSE

    STRUCK

    FragTrack

    Our method

    REFERENCES

    [1] D. Bolme, J. Beveridge, B. Draper, and Y. Lui. Visual Object Tracking usingAdaptive Correlation Filters CVPR’10

    [2] Code and Demo: http://www.hamedkiani.com/cfwlb.html