
Online Multi-Object Tracking with Dual Matching Attention Networks

Ji Zhu, Hua Yang, Shanghai Jiao Tong University

Nian Liu, Northwestern Polytechnical University

Minyoung Kim, Massachusetts Institute of Technology

Wenjun Zhang, Shanghai Jiao Tong University

Ming-Hsuan Yang, University of California, Merced

ECCV 2018

Pipeline

• STEP 1: Apply a single object tracker to keep tracking each target;

• STEP 2: If the tracking result becomes unreliable, suspend the tracker;

• STEP 3: Do data association between lost targets and detections;

• STEP 4: Update results.

Xu Gao, Peking University




Single Object Tracking

• Baseline Method. “ECO: Efficient Convolution Operators for Tracking.” 2017 CVPR.

• $x = \{(x^1)^\top, \dots, (x^D)^\top\}$ is a feature map with $D$ feature channels extracted from an image patch.

• Aim to learn a multi-channel convolution filter $f = \{f^1, \dots, f^D\}$.

• $E(f) = \sum_{j=1}^{M} \alpha_j \, \| S_f\{x_j\}(t) - y_j(t) \|_{L^2}^2 + \sum_{d=1}^{D} \| w(t) f^d(t) \|_{L^2}^2$.

• Where $S_f\{x_j\}(t) = f * P^\top x_j$, $P$ is a $D \times C$ matrix, $y_j(t)$ is the desired confidence map, and $M$ is the number of training samples.

• $\| g(t) \|_{L^2}^2 = \frac{1}{T} \int_0^T |g(t)|^2 \, dt$.

Figure: Desired Confidence Map; Score Map Predicted by ECO


Cost-Sensitive Tracking Loss

• Drawback of ECO: As shown in the figure, the center of an object next to the target also gets a high confidence score.

• Analysis: Such neighboring objects act as hard negative samples. They should be penalized more heavily to prevent the tracker from drifting.

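• $E(f) = \sum_{j=1}^{M} \alpha_j \, \| q(t) \, ( S_f\{x_j\}(t) - y_j(t) ) \|_{L^2}^2 + \sum_{d=1}^{D} \| w(t) f^d(t) \|_{L^2}^2$.

• Where $q(t) = \left| \frac{S_f\{x_j\}(t) - y_j(t)}{\max_t | S_f\{x_j\}(t) - y_j(t) |} \right|^2$.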
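The effect of the weight q(t) can be checked on a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the score map and the desired confidence map are sampled on the same grid and flattened to 1-D lists, and it only evaluates the weighted data term (it does not optimize the filter). The function names `cost_sensitive_weights` and `data_term` are hypothetical.

```python
def cost_sensitive_weights(score, label):
    """q = (|score - label| / max |score - label|)^2, element-wise on a sampled grid."""
    residual = [abs(s - y) for s, y in zip(score, label)]
    peak = max(residual)
    if peak == 0.0:
        return [0.0] * len(residual)  # perfect prediction: nothing to penalize
    return [(r / peak) ** 2 for r in residual]


def data_term(score, label, weights=None):
    """Discrete stand-in for the squared L2 norm of q * (score - label):
    the mean of the squared, weighted residuals."""
    if weights is None:
        weights = [1.0] * len(score)  # plain ECO-style data term
    total = sum((q * (s - y)) ** 2 for q, s, y in zip(weights, score, label))
    return total / len(score)


# Toy 1-D maps (assumed values): the target sits at index 2,
# a neighboring object produces a spurious peak at index 6.
label = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
score = [0.1, 0.5, 0.9, 0.5, 0.1, 0.2, 0.8, 0.2]

q = cost_sensitive_weights(score, label)
plain = data_term(score, label)
weighted = data_term(score, label, q)
```

In this toy case the largest weight falls on the spurious peak, so the distractor dominates the weighted term while small residuals elsewhere are nearly ignored.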

Figure: Desired Confidence Map; Score Map Predicted by ECO





Preparation for Data Association

• When the tracking process becomes unreliable, suspend the tracker and set the target to a lost state.

• $\text{state} = \begin{cases} \text{tracked}, & \text{if } s > \tau_s \text{ and } o_{\text{mean}} > \tau_o \\ \text{lost}, & \text{otherwise.} \end{cases}$

• $s$ is the tracking score (the highest value in the confidence map).

• $o_{\text{mean}}$ is the mean value of the maximum IoU between the tracked target $t_l$ and the detections $D_l$ at each frame $l$.

• $o_{\text{mean}} > \tau_o$ is used since a false alarm detection is prone to be consistently tracked with high confidence.

• I think the setting of $o_{\text{mean}}$ needs to be reconsidered.
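The state rule above can be sketched as follows. This is an illustration under assumptions, not the paper's code: boxes are `(x1, y1, x2, y2)` tuples, a frame with no detections contributes an overlap of 0, and the threshold values in the example are placeholders rather than values from the slides. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_max_overlap(track_boxes, detections_per_frame):
    """o_mean: average over frames of the best IoU between the tracked box
    and the detections in that frame (0 if the frame has no detections)."""
    best = [
        max((iou(t, d) for d in dets), default=0.0)
        for t, dets in zip(track_boxes, detections_per_frame)
    ]
    return sum(best) / len(best) if best else 0.0


def target_state(score, o_mean, tau_s, tau_o):
    """'tracked' only if both the tracking score and the mean overlap pass."""
    return "tracked" if score > tau_s and o_mean > tau_o else "lost"


# Placeholder thresholds, chosen only for illustration.
TAU_S, TAU_O = 0.5, 0.5

track = [(0, 0, 10, 10), (1, 0, 11, 10)]
supported = mean_max_overlap(track, [[(0, 0, 10, 10)], [(1, 0, 11, 10), (50, 50, 60, 60)]])
unsupported = mean_max_overlap(track, [[], []])  # no detection ever overlaps the track

state_ok = target_state(0.9, supported, TAU_S, TAU_O)
state_false_alarm = target_state(0.9, unsupported, TAU_S, TAU_O)
```

The second case shows why the overlap test is there: a track with a high tracking score but no supporting detections is still marked lost.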





Data Association with DMAN

• Data association between lost trajectories and candidate detections.

• Candidate detections are detections surrounding the predicted location that are not covered by any tracked target.

• The predicted location is obtained from the lost trajectory with a linear motion model.

• Dual Matching Attention Networks (DMAN) with spatial and temporal attention.
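Candidate selection can be sketched as below. The slides do not define "surrounding" or "covered", so this sketch makes two assumptions: a detection is near the prediction if its center lies within `radius` pixels, and it is covered if its IoU with any tracked box exceeds `cover_iou`. The constant-velocity estimate from the first and last observation is likewise one simple reading of "linear motion model". All names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_center(history, frames_ahead):
    """Constant-velocity extrapolation from (frame, cx, cy) observations
    of a lost trajectory, using its first and last observation."""
    (f0, x0, y0), (f1, x1, y1) = history[0], history[-1]
    span = f1 - f0
    vx = (x1 - x0) / span if span else 0.0
    vy = (y1 - y0) / span if span else 0.0
    return x1 + vx * frames_ahead, y1 + vy * frames_ahead


def candidate_detections(history, frames_ahead, detections, tracked_boxes,
                         radius, cover_iou):
    """Detections near the predicted location that no tracked target covers."""
    px, py = predict_center(history, frames_ahead)
    candidates = []
    for det in detections:
        cx, cy = (det[0] + det[2]) / 2, (det[1] + det[3]) / 2
        near = (cx - px) ** 2 + (cy - py) ** 2 <= radius ** 2
        covered = any(iou(det, box) > cover_iou for box in tracked_boxes)
        if near and not covered:
            candidates.append(det)
    return candidates


# Toy example: the lost target moved 2 px per frame to the right.
history = [(0, 10.0, 10.0), (4, 18.0, 10.0)]
detections = [(19, 5, 29, 15), (30, 5, 40, 15), (100, 100, 110, 110)]
tracked_boxes = [(30, 5, 40, 15)]  # already explained by another tracked target

predicted = predict_center(history, frames_ahead=3)
candidates = candidate_detections(history, 3, detections, tracked_boxes,
                                  radius=20, cover_iou=0.5)
```

Only the first detection survives here: the second is claimed by a tracked target and the third is too far from the predicted location.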


Pipeline of DMAN


Spatial Attention Network (SAN)

• Intuition: pay more attention to common local patterns of the two feature maps.

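• Matching Layer: Compute the cosine similarity between each $x_i^\alpha$ and $x_j^\beta$: $S_{ij} = (x_i^\alpha)^\top x_j^\beta$, $x_i \in \mathbb{R}^C$.

• $S = (x^\alpha)^\top x^\beta$, $S \in \mathbb{R}^{N \times N}$, $N = H \times W$.

• Reshape $S \in \mathbb{R}^{N \times N}$ into $X_S^\alpha \in \mathbb{R}^{H \times W \times N}$.

• Reshape $S^\top \in \mathbb{R}^{N \times N}$ into $X_S^\beta \in \mathbb{R}^{H \times W \times N}$.

• Training Loss: identification loss and verification loss.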
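The matching layer can be sketched in plain Python. Two assumptions are made here that the slide does not state: each feature map is given as a row-major flattened list of N = H×W vectors of length C, and the vectors are L2-normalized first, since an inner product equals cosine similarity only for unit-length vectors. The function names are hypothetical, and the learned layers that follow the matching layer are not modeled.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec] if norm > 0 else list(vec)


def matching_layer(feat_a, feat_b, height, width):
    """Return the N x N similarity matrix S and the two H x W x N cubes
    obtained by reshaping S (for image alpha) and S transposed (for image beta)."""
    n = height * width
    assert len(feat_a) == n and len(feat_b) == n
    a = [l2_normalize(v) for v in feat_a]
    b = [l2_normalize(v) for v in feat_b]
    # S[i][j]: cosine similarity between location i of alpha and location j of beta.
    sim = [[sum(p * q for p, q in zip(ai, bj)) for bj in b] for ai in a]
    sim_t = [list(col) for col in zip(*sim)]
    # Row i of S becomes the N-channel vector at spatial position (i // W, i % W).
    cube_a = [sim[r * width:(r + 1) * width] for r in range(height)]
    cube_b = [sim_t[r * width:(r + 1) * width] for r in range(height)]
    return sim, cube_a, cube_b


# Toy feature maps with H = 1, W = 2, C = 2 (assumed values).
feat_alpha = [[1.0, 0.0], [0.0, 2.0]]
feat_beta = [[1.0, 1.0], [4.0, 0.0]]
S, X_S_alpha, X_S_beta = matching_layer(feat_alpha, feat_beta, height=1, width=2)
```

Each spatial position of `X_S_alpha` holds that location's similarities to all N locations of the other image, which is what lets later layers attend to common local patterns.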


Temporal Attention Network (TAN)

• Intuition: The tracklet may contain noisy observations, hence average pooling is unreliable.

• Training Strategy: First train the SAN on randomly generated image pairs, then fix its weights. Next, train the TAN with the extracted features as input.

• Reason for the Strategy: The sequence of each identity is highly redundant as a source of image pairs, hence it is easy to overfit.

• MOT 16 is used for training.
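The intuition about noisy observations can be illustrated by comparing average pooling with attention-weighted pooling. This is only a sketch of the pooling idea: how the TAN actually produces its attention scores is not described on the slide, so the scores below are hand-set inputs and the softmax normalization is an assumption. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(features):
    """Uniform average of per-frame feature vectors."""
    count = len(features)
    return [sum(col) / count for col in zip(*features)]


def attention_pool(features, scores):
    """Weighted average of per-frame features; weights come from the scores."""
    weights = softmax(scores)
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*features)]


# Toy tracklet: three clean observations and one noisy one (e.g. an occlusion).
tracklet = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = [2.0, 2.0, 2.0, -4.0]  # assumed: a low score for the noisy frame

avg = average_pool(tracklet)
att = attention_pool(tracklet, scores)
```

With uniform weights the noisy frame shifts the pooled feature by a quarter, while the attention-weighted result stays close to the clean observations.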





Datasets

• MOT 16: 14 sequences, including 7 for training and 7 for testing.

• MOT 17: Same video sequences as MOT 16 but with 3 sets of detections (DPM, Faster R-CNN, SDP).


Visualization of the Spatial and Temporal Attention

Figure panels: Positive; Negative


More Visualization Results


Experiment


Ablation Study


Conclusion

• Integrate the merits of single object tracking and data association methods in a unified online MOT framework.

• + Combine with single object tracking results.

• + Spatial attention network seems to be useful.

• - Results are not the best.

• - Not too much innovation.

