Upload: hirokatsu-kataoka

Post on 06-Jan-2017

TRANSCRIPT

Page 1: 【BMVC2016】Recognition of Transitional Action for Short-Term Action Prediction using Discriminative Temporal CNN Feature

Recognition of Transitional Action for Short-Term Action Prediction using Discriminative Temporal CNN Feature

Hirokatsu Kataoka, Ph.D. Computer Vision Research Group (CVRG), AIST

http://www.hirokatsukataoka.net/

Yudai Miyashita (TDU), Masaki Hayashi (Liquid Inc., Keio Univ.), Kenji Iwata, Yutaka Satoh (AIST)

Page 2

Related work: Early Action Recognition

•  [Ryoo, ICCV2011]

M. S. Ryoo, “Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos”, International Conference on Computer Vision (ICCV), pp.1036-1043, 2011.

Page 3

Related work: Action Prediction

•  [Kataoka+, VISAPP2016]

[Figure: time-series example of activity prediction. Given: daytime (time zone, $x_{timezone}$), walking (previous activity, $x_{previous}$) and sitting (current activity, $x_{current}$). Not given: the next activity, estimated as $\theta$ = “Using a PC”.]

H. Kataoka, Y. Aoki, K. Iwata, Y. Satoh, “Activity Prediction using a Space-Time CNN and Bayesian Framework”, in VISAPP, 2016.

Page 4

Problem of related works

•  Early action recognition

–  Recognizes an action from its early frames

–  Sufficient cues must already be visible, so it is almost the same as action recognition

•  Action prediction

–  Predicts an action that lies entirely in the future, which makes the prediction unstable

Page 5

Proposal

•  Transitional Action (TA): an action class covering the transition from one action to the next

–  A TA contains cues for prediction, so it is available earlier than in early action recognition

–  Future actions are predicted in a recognition-like manner, which gives more stable prediction

[Applications] Autonomous driving, active safety and robotics

[Figure: timeline t1 to t12 with interval Δt. Walk straight (Action), then Walk straight – Cross (Transitional action), then Cross (Action).]

【Proposal】 Short-term action prediction recognizes “cross” at time t5

【Previous works】 Early action recognition recognizes “cross” at time t9

Page 6

Problem settings

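Framework: Problem

Action Recognition: $f(F^{A}_{1 \ldots t}) \rightarrow A_{t}$

Early Action Recognition: $f(F^{A}_{1 \ldots t-L}) \rightarrow A_{t}$

Action Prediction: $f(F^{A}_{1 \ldots t}) \rightarrow A_{t+L}$

Transitional Action Recognition: $f(F^{TA}_{1 \ldots t}) \rightarrow A_{t+L}$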
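The four problem settings differ only in which frames are observed and which time step is labelled. A minimal sketch of that bookkeeping follows; the function name `problem_setting`, the dictionary keys, and the example values t = 5 and L = 4 are illustrative assumptions and do not come from the slides.

```python
def problem_setting(framework, t, L):
    """Return the observed frame range and the labelled time step.

    "observed" is the superscript on F in the slide notation:
    "A" for ordinary action frames, "TA" for transitional-action frames.
    """
    settings = {
        "action recognition": ("A", t, t),
        "early action recognition": ("A", t - L, t),
        "action prediction": ("A", t, t + L),
        "transitional action recognition": ("TA", t, t + L),
    }
    observed, last_frame, target_time = settings[framework]
    return {"observed": observed, "frames": (1, last_frame), "target": target_time}


# Illustrative values only: observe up to t = 5 and look L = 4 steps ahead.
example = problem_setting("transitional action recognition", t=5, L=4)
```

Under these assumed values, transitional action recognition observes frames 1 to 5 and labels the action at step 9, whereas early action recognition would observe only frames 1 to 1 to label step 5.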

Page 7

Difference

•  Early Action Recognition: $f(F^{A}_{1 \ldots t-L}) \rightarrow A_{t}$

[Figure: timeline t1 to t12. Walk straight (Action), then Cross (Action).]

The objective action is A(cross).

–  Early action recognition responds late

Page 8

Difference

•  Action Prediction: $f(F^{A}_{1 \ldots t}) \rightarrow A_{t+L}$

[Figure: timeline t1 to t12. Walk straight (Action), then Cross (Action).]

The objective action is A(cross).

–  Action prediction is unstable

Page 9

Difference

•  Transitional Action Recognition: $f(F^{TA}_{1 \ldots t}) \rightarrow A_{t+L}$

[Figure: timeline t1 to t12. Walk straight (Action), then Walk straight – Cross (Transitional action), then Cross (Action).]

The objective action is A(cross).

–  Transitional action recognition is reasonable

Page 10

Details of transitional action (TA)

•  Annotation for TA

–  TA and normal action (NA) classes partially overlap

•  Difficulty of TA

–  NA and TA are temporally mixed
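Partially overlapping annotation can be pictured as frame intervals in which more than one label is active at the same time step. The sketch below uses hypothetical interval boundaries on a 12-step timeline; the boundaries, the `ANNOTATIONS` list, and the `labels_at` helper are assumptions for illustration, not annotations from the datasets.

```python
# Hypothetical inclusive frame intervals; the boundaries are illustrative
# assumptions and are not taken from the slides or the datasets.
ANNOTATIONS = [
    ("walk straight", "NA", 1, 8),
    ("walk straight - cross", "TA", 5, 8),
    ("cross", "NA", 9, 12),
]


def labels_at(t, annotations=ANNOTATIONS):
    """Return every (label, kind) whose interval contains time step t."""
    return [
        (label, kind)
        for label, kind, start, end in annotations
        if start <= t <= end
    ]
```

With these assumed intervals, `labels_at(6)` returns both the NA "walk straight" and the TA "walk straight - cross", which is the kind of temporal mixing a classifier has to separate.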

Page 11

Subtle Motion Descriptor (SMD)

•  A discriminative temporal CNN feature

–  To separate the NA and TA classes

Page 12

Subtle Motion Descriptor (SMD)

•  Activation feature from VGG-16

–  Fully-connected layer (N = 4,096)

–  Based on pooled time series (PoT) [Ryoo+, CVPR2015]

Page 13

Subtle Motion Descriptor (SMD)

•  Temporal difference ΔVt is calculated

–  (Frame t) – (Frame t-1)
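The frame-to-frame difference can be sketched in a few lines. Here each frame is assumed to be represented by a list of activation values (for example the 4,096-dimensional fully-connected output mentioned earlier); the function name `temporal_differences` and the toy 2-dimensional vectors are illustrative assumptions.

```python
def temporal_differences(features):
    """Compute delta_V_t = V_t - V_(t-1) for every consecutive frame pair.

    features: list of per-frame activation vectors, in temporal order.
    Returns one difference vector per pair, so len(result) == len(features) - 1.
    """
    return [
        [cur - prev for prev, cur in zip(v_prev, v_cur)]
        for v_prev, v_cur in zip(features, features[1:])
    ]


# Toy 2-dimensional activations for three frames (illustrative values only).
diffs = temporal_differences([[1.0, 2.0], [1.5, 1.0], [1.5, 3.0]])
```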

Page 14

Subtle Motion Descriptor (SMD)

•  Temporal pooling of $\Delta V_t$

–  Positive and negative values are pooled

–  Near-zero values are also pooled (→ this is the contribution of SMD)

–  The threshold TH is fixed experimentally
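The slides do not give the pooling formula, so the following is only one plausible reading of the bullets above, not the authors' definition. It assumes that differences above +TH go into a positive pool, differences below -TH into a negative pool, and the remaining near-zero differences are accumulated by absolute value in a third pool. The function name `pool_with_near_zero` and this exact three-way split are assumptions.

```python
def pool_with_near_zero(diffs, th):
    """Pool temporal differences into positive, negative and near-zero sums.

    diffs: list of difference vectors (one per consecutive frame pair).
    th:    non-negative threshold; |d| <= th counts as "near zero".
    Returns the three per-dimension pools concatenated into one vector.
    NOTE: this three-way split is an assumed reading of the slides.
    """
    dims = len(diffs[0])
    plus = [0.0] * dims
    minus = [0.0] * dims
    near_zero = [0.0] * dims
    for delta in diffs:
        for i, d in enumerate(delta):
            if d > th:
                plus[i] += d
            elif d < -th:
                minus[i] += d
            else:
                near_zero[i] += abs(d)
    return plus + minus + near_zero


# Toy example with 3 dimensions and an arbitrary threshold.
descriptor = pool_with_near_zero(
    [[0.5, -1.0, 0.0625], [0.0, 2.0, -0.03125]], th=0.125
)
```

The output has three times the input dimensionality; in this toy case the third dimension contributes only to the near-zero pool, which is the information a purely positive/negative pooling with a threshold would discard.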

Page 15

Datasets

•  Temporal action datasets

–  NTSEL [Kataoka+, ITSC2015]

•  Walk (NA), cross (NA), bicycle (NA), turn (TA) with human bbox

–  UTKinect-Action [Xia+, CVPRW2012]

•  10 ordered NAs (e.g. walk, throw, sit)

•  8 TAs (excluding push/pull; next page)

•  Without human bbox

–  Watch-n-Patch [Wu+, CVPR2015]

•  10 daily NAs (e.g. read, turn on monitor, leave office)

•  The 10 most frequent TAs (next page)

•  Without human bbox

Page 16

Experimental settings (list of TAs)

•  [Tables: lists of TAs for UTKinect-Action and Watch-n-Patch]

Page 17

Implementation

•  Action recognition approaches

–  Temporal CNN models

•  Pooled Time-series (PoT) [Ryoo+, CVPR2015]

•  CNN accumulation

•  CNN + IDT [Jain+, ECCVW2014]

–  Improved dense trajectories (IDT), alone and with additional features

•  IDT [Wang+, ICCV2013]

•  IDT + cooccurrence-feature [Kataoka+, ACCV2014]

•  All Features in IDT

Page 18

Exploration experiment

•  Parameters

–  Frame accumulation

–  Thresholding value TH

–  Layer fc6 vs fc7

Page 19

Exploration experiment

•  Temporal accumulation [frames]

–  For faster prediction: 3 [frames] (0.1s)

–  For state-of-the-art accuracy: 10 [frames] (0.33s)

–  The baselines should therefore use 3-frame and 10-frame accumulation
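The durations quoted for 3 and 10 frames are consistent with video at 30 frames per second. That frame rate is inferred from the numbers and is not stated on the slide. A small conversion sketch, with the hypothetical helper `accumulation_seconds`:

```python
def accumulation_seconds(frames, fps=30.0):
    """Convert an accumulation length in frames to seconds.

    fps defaults to 30, an assumption inferred from 3 frames = 0.1 s.
    """
    return frames / fps


fast = accumulation_seconds(3)       # about 0.1 s
accurate = accumulation_seconds(10)  # about 0.33 s
```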

Page 20

Exploration experiment

•  Thresholding value

–  The best value depends on the data

Page 21

Exploration experiment

•  Layer fc6 vs fc7

–  Layer fc6 is better

Page 22

Results

•  SMD (ours) is state-of-the-art in transitional action recognition

Page 23

Comparison with PoT

•  Subtle motion is effective for transitional action recognition

–  NTSEL: +2.18%, +8.63%

–  UTKinect: +7.19%, +4.31%

–  Watch-n-Patch: +4.82%, +5.12%

Page 24

Conclusion

•  Two contributions:

1.  Definition of transitional action for short-term action prediction

2.  Subtle Motion Descriptor (SMD) to classify transitional and normal actions