Posted on 14-Dec-2015
Combined classification and channel/basis selection with L1-L2 regularization with application to P300 speller system

Ryota Tomioka & Stefan Haufe
Tokyo Tech / TU Berlin / Fraunhofer FIRST
P300 speller system
Evoked Response (Farwell & Donchin 1988)
P300 speller system

A B C D E F
G H I J K L
M N O P Q R
S T U V W X
Y Z 1 2 3 4
5 6 7 8 9 _

ER detected when one row flashes, and again when one column flashes; the character must be "P" (the intersection of that row and that column).
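The intersection rule can be sketched in a few lines (a minimal illustration of the slide's grid; the function name and 0-based indexing are ours):

```python
# Hypothetical sketch of the P300 speller decoding step: the 6x6
# character grid from the slide, plus the intersection rule that maps
# a detected row flash and column flash to a single character.
GRID = [
    "ABCDEF",
    "GHIJKL",
    "MNOPQR",
    "STUVWX",
    "YZ1234",
    "56789_",
]

def decode(row: int, col: int) -> str:
    """Return the character at the intersection (0-based indices)."""
    return GRID[row][col]

# An evoked response on row 2 and column 3 singles out "P".
print(decode(2, 3))  # -> P
```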
Common approach

EEG signal
→ Feature extraction (e.g., ICA or channel selection)
→ Feature vector
→ P300 detection (e.g., binary SVM classifier)
→ Detector outputs (6 columns & 6 rows)
→ Decoding (e.g., compare the detector outputs)
→ Decoded character (36 classes)

Lots of intermediate goals!
Our approach

EEG signal
→ Feature extraction and P300 detection combined: define a single "detector" f_W(X)
→ Decoding (compare the detector outputs)
→ Decoded character (36 classes)
Our approach
minimize L(W) + lW(W)
Data-fit Regularization
Regularized empirical risk minimization:
Decoding
EEG signal
Decoded character(36 class)
P300 detection
Feature extraction
Detect P300
Extract structure
Learning the decoding model

• Suppose that we have a detector f_W(X) that detects the P300 response in signal X.
• Apply it to the six column flashes (f1, …, f6) and the six row flashes (f7, …, f12).
• Picking the most likely column and the most likely row is nothing but learning a 2 x 6-class classifier.
How we do this

Flashes arrive in a random order, e.g.: 12 2 8 1 3 4 11 9 5 6 10 7 …

Multinomial likelihood for the column and for the row:

    L(W) = Σ_{i=1}^{n} [ -log P_W(col_i | X_i) - log P_W(row_i | X_i) ]

Detector: a linear function of the signal,

    f_W(X) = <W, X>

where the signal X and the weight matrix W are both #channels x #samples matrices.
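As a concrete sketch, the loss above can be written with a softmax over the six column scores and the six row scores, and the detector as a Frobenius inner product (our own illustrative NumPy code, not the authors' implementation):

```python
import numpy as np

def detector(W, X):
    """f_W(X) = <W, X>: Frobenius inner product of two
    (#channels x #samples) matrices."""
    return float((W * X).sum())

def neg_log_softmax(scores, labels):
    """-log P(label | scores) summed over trials, with a softmax over
    the 6 flash scores of each trial (one trial per row of `scores`)."""
    z = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].sum()

def loss(scores_col, scores_row, col_labels, row_labels):
    """L(W) = sum_i [ -log P_W(col_i | X_i) - log P_W(row_i | X_i) ]."""
    return (neg_log_softmax(scores_col, col_labels)
            + neg_log_softmax(scores_row, row_labels))
```

With uniform (all-zero) scores every column and row is equally likely, so each of the 2n terms contributes log 6.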
L1-L2 regularization

[Figure: the weight matrix W (#channels x #samples) under each of the three regularizers]

(1) Channel selection (linear sum of row norms)
(2) Time sample selection (linear sum of column norms)
(3) Component selection (linear sum of component norms)
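Each regularizer is a linear sum of L2 norms over a different grouping of the entries of W; a hedged NumPy sketch (for (3) we read "component norms" as singular values, which is one plausible interpretation, not a claim about the authors' exact definition):

```python
import numpy as np

def channel_selection(W):
    """(1) Sum of the L2 norms of the rows (channels) of W.
    Rows driven to zero norm correspond to discarded channels."""
    return np.linalg.norm(W, axis=1).sum()

def time_sample_selection(W):
    """(2) Sum of the L2 norms of the columns (time samples) of W."""
    return np.linalg.norm(W, axis=0).sum()

def component_selection(W):
    """(3) Sum of component norms; sketched here as the sum of the
    singular values of W (an assumption, see lead-in)."""
    return np.linalg.svd(W, compute_uv=False).sum()
```

Because each group enters through its (non-squared) L2 norm, minimizing L(W) + λΩ(W) tends to zero out whole rows, columns, or components, which is exactly the built-in feature selection the slides describe.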
The method

    minimize_W   L(W) + λ Ω(W)

where L is the 2 x 6-class multinomial loss and Ω is the L1-L2 regularizer: a nonlinear convex optimization problem with second-order cone constraints.
Results - BCI competition III dataset II [Albany]
(1) Channel selection regularizer, λ = 5.46

             15 repetitions   5 repetitions
Subject A:   99% (97%)        72% (72%)
Subject B:   93% (96%)        80% (75%)

(Parentheses: Rakotomamonjy & Guigue)
Results - BCI competition III dataset II [Albany]
(2) Time sample selection regularizer, λ = 5.46

             15 repetitions   5 repetitions
Subject A:   98% (97%)        70% (72%)
Subject B:   94% (96%)        81% (75%)

(Parentheses: Rakotomamonjy & Guigue)
Results - BCI competition III dataset II [Albany]
(3) Component selection regularizer, λ = 100

             15 repetitions   5 repetitions
Subject A:   98% (97%)        70% (72%)
Subject B:   94% (96%)        82% (75%)

(Parentheses: Rakotomamonjy & Guigue)
Filters

(1) Channel selection regularizer
(2) Time sample selection regularizer
(3) Component selection regularizer
Summary

• Unified feature extraction and classifier learning via L1-L2 regularization
• Used a decoding model to learn the classifier: a 2 x 6-class multinomial model
• Solved the problem as a convex regularized empirical risk minimization: a nonlinear second-order cone problem (an efficient subgradient-based optimization routine will be made available soon!)
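One standard way to handle such a group-sparse objective (a generic sketch, not necessarily the authors' forthcoming routine) is proximal gradient descent: the proximal operator of the channel-selection penalty block-soft-thresholds each row of W, zeroing out whole channels.

```python
import numpy as np

def prox_row_group_lasso(W, thresh):
    """Prox of thresh * (sum of row L2 norms): shrink each row w by
    w <- max(0, 1 - thresh/||w||) * w (block soft-thresholding)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return scale * W

def prox_grad_step(W, grad, step, lam):
    """One proximal gradient step on L(W) + lam * Omega(W), given the
    gradient of the smooth loss L at W."""
    return prox_row_group_lasso(W - step * grad, step * lam)
```

Rows whose norm falls below the threshold are set exactly to zero, which is how the sparsity patterns shown in the filter slides arise.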