
Page 1: Principal Sensitivity Analysis

Principal Sensitivity Analysis

Sotetsu Koyamada (Presenter), Masanori Koyama, Ken Nakae, Shin Ishii
Graduate School of Informatics, Kyoto University

@PAKDD2015

May 20, 2015

Ho Chi Minh City, Viet Nam

Page 2: Principal Sensitivity Analysis

Table of contents

1. Motivation
2. Sensitivity analysis and PSA
3. Numerical Experiments
4. Conclusion

Page 3: Principal Sensitivity Analysis

Machine learning is awesome

Prediction and recognition tasks at high accuracy.

Machines can carry out tasks beyond human capability:

- Predicting dream contents from brain activity (Horikawa et al., 2014): you can't do this unless you are psychic!
- Deep learning matches humans in the accuracy of face recognition tasks (Taigman et al., 2014).

Page 4: Principal Sensitivity Analysis

How can machines carry out tasks beyond our capability?

In the process of training, machines must have learned knowledge outside our natural scope.

How can we learn the machine's "secret" knowledge?

Page 5: Principal Sensitivity Analysis

The machine is a black box

Neural networks, nonlinear kernel SVMs, ...

Input → ? → Classification result

Page 6: Principal Sensitivity Analysis

Visualizing the knowledge of a linear model

The knowledge of linear classifiers like logistic regression is expressible in terms of the weight parameters w = (w_1, ..., w_d).

Classifier: $f(x) = \sigma\left(\sum_{i=1}^{d} w_i x_i + b\right)$

Input $x = (x_1, \ldots, x_d)$; classification labels: {0, 1}; $w_i$: weight parameter; $b$: bias parameter; $\sigma$: sigmoid activation function.

Meaning of $w_i$ = importance of the i-th input dimension within the machine's knowledge.
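To see why $w_i$ carries this meaning, a one-line gradient calculation (standard calculus, not spelled out on the slide) gives

$$\frac{\partial f(x)}{\partial x_i} = \sigma'\!\left(\sum_j w_j x_j + b\right) w_i,$$

where the positive factor $\sigma'(\cdot)$ is shared by all dimensions, so the relative influence of dimension i on the output is governed by $w_i$ alone.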

Page 7: Principal Sensitivity Analysis

Visualizing the knowledge of a nonlinear model

It is extremely difficult to make sense of the weight parameters in neural networks (nonlinear compositions of logistic regressions).

Meaning of $w_{ij}^{(k)}$ = ??? ($h$: nonlinear activation function)

Our proposal: we shall directly analyze the behavior of f in the input space!

Page 8: Principal Sensitivity Analysis

Table of contents

1. Motivation
2. Sensitivity analysis and PSA
3. Numerical Experiments
4. Conclusion

Page 9: Principal Sensitivity Analysis

Sensitivity analysis

Sensitivity analysis computes the sensitivity of f with respect to the i-th input dimension (Zurada et al., 1994, 1997; Kjems et al., 2002).

Def. Sensitivity with respect to the i-th input dimension:
$$s_i := \mathbb{E}_q\!\left[\left(\frac{\partial f(x)}{\partial x_i}\right)^2\right],$$
where q is the true distribution of x.

Def. Sensitivity map: $s := (s_1, \ldots, s_d)$.

Note: in the case of a linear model (e.g. logistic regression, with $f(x) = \sum_i w_i x_i + b$), the sensitivity reduces to $s_i = w_i^2$.
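A minimal numpy sketch of this estimator (my illustration, not code from the paper): the expectation over q is replaced by an empirical mean over samples X, and gradients are approximated by central finite differences; an autodiff framework would give exact gradients instead.

```python
import numpy as np

def sensitivity_map(f, X, eps=1e-4):
    """Estimate s_i = E_q[(df/dx_i)^2] for each input dimension i.

    f : vectorized scalar function; f(X) returns one output per row of X.
    X : (n_samples, d) array of samples drawn from q.
    """
    n, d = X.shape
    s = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        # central finite difference along dimension i, per sample
        g = (f(X + e) - f(X - e)) / (2 * eps)
        s[i] = np.mean(g ** 2)  # empirical mean of the squared derivative
    return s
```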

Page 10: Principal Sensitivity Analysis

PSM: Principal Sensitivity Map

Define the directional sensitivity in an arbitrary direction and seek the direction to which the machine is most sensitive.

Def. Directional sensitivity in the direction v:
$$s(v) := \mathbb{E}_q\!\left[\left(\frac{\partial f(x)}{\partial v}\right)^2\right]$$

cf. the sensitivity with respect to the i-th input dimension is recovered as $s_i = s(e_i)$, where $e_i$ is the i-th standard basis vector of $\mathbb{R}^d$.

Def. (1st) Principal Sensitivity Map:
$$v_1 := \operatorname*{argmax}_{\|v\|=1} s(v)$$
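A short expansion (standard algebra, not spelled out on the slide) shows why this maximization becomes an eigenproblem:

$$s(v) = \mathbb{E}_q\!\left[\left(\nabla f(x)^{\top} v\right)^2\right] = v^{\top}\,\mathbb{E}_q\!\left[\nabla f(x)\,\nabla f(x)^{\top}\right] v = v^{\top} K v,$$

so maximizing s(v) over unit vectors is a Rayleigh quotient problem, and the maximizer is the dominant eigenvector of K, as the next slide states.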

Page 11: Principal Sensitivity Analysis

PSA: Principal Sensitivity Analysis

Define the kernel metric K as:
$$K_{ij} := \mathbb{E}_q\!\left[\frac{\partial f(x)}{\partial x_i}\frac{\partial f(x)}{\partial x_j}\right], \quad \text{i.e.} \quad K = \mathbb{E}_q\!\left[\nabla f(x)\,\nabla f(x)^{\top}\right]$$

Recall: $s(v) = v^{\top} K v$.

Def. (1st) Principal Sensitivity Map (PSM): the 1st PSM is the dominant eigenvector of K!

PSA vs PCA: when K is the covariance matrix, the 1st PSM is the same as the 1st PC.

Page 12: Principal Sensitivity Analysis

PSA: Principal Sensitivity Analysis

Define the kernel metric K as:
$$K = \mathbb{E}_q\!\left[\nabla f(x)\,\nabla f(x)^{\top}\right]$$

Def. (1st) Principal Sensitivity Map (PSM): the 1st PSM is the dominant eigenvector of K!

Def. (k-th) Principal Sensitivity Map (PSM): the k-th PSM is the k-th dominant eigenvector of K!

PSA vs PCA (recall): when K is the covariance matrix, the 1st PSM is the same as the 1st PC; the k-th PSM is the analogue of the k-th PC.
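Putting the pieces together, here is a minimal numpy sketch of PSA (my illustration under the same finite-difference assumption as above, not the authors' implementation): estimate K from per-sample gradients, then eigendecompose it.

```python
import numpy as np

def psa(f, X, eps=1e-4):
    """Principal Sensitivity Analysis sketch.

    Estimates K = E_q[grad f(x) grad f(x)^T] from samples X (n, d) using
    central finite differences, then eigendecomposes K.
    Returns (eigenvalues, PSMs), both sorted in descending order;
    the k-th column of the returned matrix is the k-th PSM.
    """
    n, d = X.shape
    grads = np.zeros((n, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        grads[:, i] = (f(X + e) - f(X - e)) / (2 * eps)
    K = grads.T @ grads / n        # K_ij = E[(df/dx_i)(df/dx_j)]
    lam, V = np.linalg.eigh(K)     # eigh: K is symmetric positive semidefinite
    order = np.argsort(lam)[::-1]  # eigh returns ascending order; re-sort
    return lam[order], V[:, order]
```

The directional sensitivity of any unit vector v can then be read off as `v @ K @ v`.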

Page 13: Principal Sensitivity Analysis

Table of contents

1. Motivation
2. Sensitivity analysis and PSA
3. Numerical Experiments
4. Conclusion

Page 14: Principal Sensitivity Analysis

Digit classification

- Artificial data: each pixel has the same meaning.
- Classifier:
  - Neural network (one hidden layer)
  - Error percentage: 0.36%
  - We applied PSA to the log of each output from the NN.

[Figure: (a) Templates (c = 0, ..., 9); (b) Noisy samples (c = 0, ..., 9)]
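As a usage illustration only (the names `nn_prob` and `X_samples` are hypothetical placeholders, not from the slides), applying the `psa` sketch above to the log-output of one class might look like:

```python
# Hypothetical: nn_prob(X, c) returns the network's output for class c,
# one value per row of X. The slides apply PSA to the log of that output.
f_9 = lambda X: np.log(nn_prob(X, c=9))
eigvals, psms = psa(f_9, X_samples)
first_psm = psms[:, 0]  # reshape to the image grid to visualize as a map
```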

Page 15: Principal Sensitivity Analysis

Strength of PSA (relatively signed map)

[Figure: (a) (Conventional) sensitivity maps; (b) 1st PSMs (proposed)]

Page 16: Principal Sensitivity Analysis

Strength of PSA (relatively signed map)

The (conventional) sensitivity map cannot distinguish the set of edges whose presence characterizes class 1 from the set of edges whose absence characterizes class 1. The 1st PSM (proposed), on the other hand, can!

[Figure: (a) (Conventional) sensitivity maps; (b) 1st PSMs (proposed)]

Page 17: Principal Sensitivity Analysis

Strength of the PSA (sub-PSM)

PSMs of f_9 (c = 9); same maps as the previous slide.

What is the meaning of the sub-PSMs?

Page 18: Principal Sensitivity Analysis

Strength of the PSA (sub-PSM)

PSMs (c = 9)

The 1st PSM is, by definition, globally important knowledge. The sub-PSMs are perhaps locally important knowledge?

Page 19: Principal Sensitivity Analysis

Local Sensitivity

Def. Local sensitivity in the region A:
$$s_A(v) := \mathbb{E}_A\!\left[\left(\frac{\partial f_c(x)}{\partial v}\right)^2\right],$$
where $\mathbb{E}_A$ denotes the expectation over the region A.

Page 20: Principal Sensitivity Analysis

Local Sensitivity

Def. Local sensitivity in the region A:
$$s_A(v) := \mathbb{E}_A\!\left[\left(\frac{\partial f_c(x)}{\partial v}\right)^2\right]$$

Def. Local sensitivity in the direction of the k-th PSM:
$$s_A^k := s_A(v_k), \quad v_k: \text{the k-th PSM}$$

This is a measure of the contribution of the k-th PSM to the classification of class c in the subset A.
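A minimal numpy sketch of this quantity (same finite-difference assumption as before; `f_c` and `X_A` follow the earlier hypothetical naming):

```python
import numpy as np

def local_sensitivity(f_c, X_A, v, eps=1e-4):
    """Estimate s_A(v) = E_A[(d f_c / d v)^2].

    f_c : vectorized scalar function (e.g. log-output for class c).
    X_A : (n, d) array of samples restricted to the region A.
    v   : direction of interest, e.g. the k-th PSM.
    """
    v = v / np.linalg.norm(v)
    # central finite difference of f_c along v, per sample in A
    g = (f_c(X_A + eps * v) - f_c(X_A - eps * v)) / (2 * eps)
    return np.mean(g ** 2)

# s_A^k for the k-th PSM:  local_sensitivity(f_c, X_A, psms[:, k])
```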

Page 21: Principal Sensitivity Analysis

Local Sensitivity

Def. Local sensitivity in the region A:
$$s_A(v) := \mathbb{E}_A\!\left[\left(\frac{\partial f_c(x)}{\partial v}\right)^2\right]$$

Def. Local sensitivity in the direction of the k-th PSM:
$$s_A^k := s_A(v_k)$$

This is a measure of the contribution of the k-th PSM to the classification of class c in the subset A.

Example: c = 9, k = 1, A = A(9, 4) := the set of all samples of classes 9 and 4. Then $s_A^k$ is the contribution of the 1st PSM to the classification of 9 in the data containing classes 9 and 4, i.e. the contribution of the 1st PSM in distinguishing 9 from 4.

Page 22: Principal Sensitivity Analysis

Strength of the PSA (sub-PSM)

Let's look at what the knowledge of f_9 is doing in distinguishing pairs of classes (class 9 vs. another class).

[Figure: local sensitivity of the k-th PSM of f_9 on the subdata containing class 9 and class c' (= A(9, c')); c = 9]

Page 23: Principal Sensitivity Analysis

Strength of the PSA (sub-PSM)

Let's look at what the knowledge of f_9 is doing in distinguishing pairs of classes (class 9 vs. another class).

[Figure: local sensitivity of the k-th PSM of f_9 on the subdata containing class 9 and class c'; c = 9]

Example: c = 9, c' = 4, k = 1. Recall that this indicates the contribution of the 1st PSM in distinguishing 9 from 4.

Page 24: Principal Sensitivity Analysis

Strength of the PSA (sub-PSM)

Let's look at what the knowledge of f_9 is doing in distinguishing pairs of classes (class 9 vs. another class).

[Figure: local sensitivity of the k-th PSM of f_9 on the subdata containing class 9 and class c'; c = 9]

The 3rd PSM contributes MUCH more than the 1st PSM in the classification of 9 against 4!

Page 25: Principal Sensitivity Analysis

In fact...!

[Figure: PSM (c = 9, k = 3); 9 vs. 4]

We can visually confirm that the 3rd PSM of f_9 is indeed the knowledge of the machine that helps (MUCH!) in distinguishing 9 from 4!

Page 26: Principal Sensitivity Analysis

When PSMs are difficult to interpret

- PSMs of an NN trained on MNIST data to classify the 10 digits
- Each pixel has a different meaning
- In order to apply PSA, the data should be registered

Page 27: Principal Sensitivity Analysis

Table of contents

1. Motivation
2. Sensitivity analysis and PSA
3. Numerical Experiments
4. Conclusion

Page 28: Principal Sensitivity Analysis

Conclusion

Merits of PSA:

1. PSA is different from the original sensitivity analysis in that it identifies the weighted combinations of input dimensions that are essential in the machine's knowledge.
2. PSA can identify sets of input dimensions that act oppositely in characterizing the classes, made possible by a definition of the PSMs that allows negative elements.

Sub-PSMs provide additional information about the machine (possibly local).

Sotetsu Koyamada [email protected]
Thank you