
Theoretical Analysis of Multi-Instance Learning

Min-Ling Zhang (张敏灵), Zhi-Hua Zhou (周志华) [email protected]

State Key Laboratory for Novel Software Technology, Nanjing University, 2002.10.11

Outline

Introduction

Theoretical analysis
  PAC learning model
  PAC learnability of APR
  Real-valued multi-instance learning

Future work

Introduction Origin

Multi-instance learning originated from the problem of “drug activity prediction”, and was first formalized by T. G. Dietterich et al. in their seminal paper “Solving the multiple-instance problem with axis-parallel rectangles” (1997).

Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of “multi-instance learning” to “multi-part learning”, and pointed out that many previously studied problems are “multi-part” problems rather than “multi-instance” ones.

Introduction-cont’d

Comparisons

Fig.1. The shape of a molecule changes as it rotates its bonds

Fig.2. Classical and multi-instance learning frameworks

Drug activity prediction problem

Introduction-cont’d Experiment data

Dataset   #dim   #bags   #pos bags   #neg bags   #instances   #instances/bag (max / min / ave)
musk1     166    92      47          45          476          40 / 2 / 5.17
musk2     166    102     39          63          6598         1044 / 1 / 64.69
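The figures in the table can be cross-checked with a little arithmetic: the average bag size should equal #instances divided by #bags, and the positive and negative bags should add up to #bags. The sketch below recomputes both from the table; the dictionary layout and the function name are illustrative only.

```python
# Figures copied from the Musk table above; the dict layout is illustrative.
musk = {
    "musk1": {"bags": 92, "pos": 47, "neg": 45, "instances": 476},
    "musk2": {"bags": 102, "pos": 39, "neg": 63, "instances": 6598},
}


def average_bag_size(stats):
    """Average number of instances per bag, rounded to two decimals."""
    return round(stats["instances"] / stats["bags"], 2)


for name, stats in musk.items():
    # Positive and negative bags should account for every bag.
    assert stats["pos"] + stats["neg"] == stats["bags"]
    print(name, average_bag_size(stats))
```

The large gap between the two averages shows how much bigger the bags in musk2 are than those in musk1.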

APR (Axis-Parallel Rectangles) algorithms

Fig.3. APR algorithms

GFS elim-count APR (standard)
GFS elim-kde APR (outside-in)
Iterated discrim APR (inside-out): musk1 92.4%, musk2 89.2%
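Once an axis-parallel rectangle has been found, using it as a multi-instance hypothesis is simple. Under the standard multi-instance assumption, a bag is labeled positive if at least one of its instances falls inside the rectangle. The following is a minimal sketch of that prediction step only; the type aliases and function names are assumptions made for illustration, and the search procedures (GFS elim-count, GFS elim-kde, iterated discrim) are not shown.

```python
from typing import List, Sequence, Tuple

Instance = Sequence[float]
Bag = List[Instance]
# An axis-parallel rectangle: one (low, high) interval per feature dimension.
APR = List[Tuple[float, float]]


def instance_in_apr(x: Instance, apr: APR) -> bool:
    """True if every feature of x lies inside the interval for its dimension."""
    return all(low <= value <= high for value, (low, high) in zip(x, apr))


def classify_bag(bag: Bag, apr: APR) -> bool:
    """A bag is positive if at least one of its instances lies inside the APR."""
    return any(instance_in_apr(x, apr) for x in bag)


# Hypothetical 2-dimensional example (the Musk data sets have 166 dimensions).
unit_square = [(0.0, 1.0), (0.0, 1.0)]
print(classify_bag([(2.0, 2.0), (0.5, 0.5)], unit_square))
print(classify_bag([(2.0, 2.0), (-1.0, 0.5)], unit_square))
```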

Introduction-cont’d Various algorithms

APR (T. G. Dietterich et al. 1997)
MULTINST (P. Auer 1997)
Diverse Density (O. Maron 1998)
Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
Relic (G. Ruffo 2000)
EM-DD (Q. Zhang & S. A. Goldman 2001)
……

Introduction-cont’d Comparison on benchmark data sets

Algorithms             Musk1 (%correct)   Musk2 (%correct)
iterated-discrim APR   92.4               89.2
Citation-kNN           92.4               86.3
Diverse Density        88.9               82.5
RELIC                  83.7               87.3
MULTINST               76.7               84.0
BP                     75.0               67.7
C4.5                   68.5               58.8

Fig.4. A comparison of several multi-instance learning algorithms

Introduction-cont’d Application area

Drug activity prediction (T. G. Dietterich et al. 1997)
Stock prediction (O. Maron 1998)
Learning a simple description of a person from a series of images (O. Maron 1998)
Natural scene classification (O. Maron & A. L. Ratan 1998)
Event prediction (G. M. Weiss & H. Hirsh 1998)
Data mining and computer security (G. Ruffo 2000)
……

Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.

Theoretical analysis

PAC learning model
  Definition and its properties
  VC dimension
PAC learnability of APR
Real-valued multi-instance learning

Theoretical Analysis － PAC model

Computational learning theory
  L. G. Valiant (1984), “A theory of the learnable”

Deductive learning
  Used for constructing a mathematical model of a cognitive process.

Fig.5. Diagram of a framework for learning (labels in the figure: W, P, M, actual example, coded example, 0/1)

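PAC model-cont’d Definition of PAC learning

We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given

  a confidence parameter δ (0 < δ < 1);
  an accuracy parameter ε (0 < ε < 1);

there is a positive integer $m_L = m_L(\delta, \varepsilon)$ such that

  for any target concept t ∈ H, and
  for any probability distribution µ on X,

whenever $m \ge m_L$,

  $\mu^m\{\, s \in S(m,t) \mid \mathrm{er}_\mu(L(s), t) < \varepsilon \,\} > 1 - \delta$

Here S(m,t) is the set of training samples of length m for t, and $\mathrm{er}_\mu(L(s), t)$ is the error of the output hypothesis L(s) with respect to t under µ.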
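The definition can be made concrete on a toy hypothesis space. The sketch below is not from the slides: it assumes X = [0, 1] with µ uniform and takes H to be the "rays" h(x) = 1 iff x ≥ θ, a class chosen here purely for illustration. A consistent learner outputs the smallest positive example as its threshold, its error is then the distance between the learned and true thresholds, and the probability that this error is at least ε is (1 - ε)^m, so m ≥ ln(1/δ)/ε examples suffice. All function names are hypothetical.

```python
import math
import random


def sample_size(delta: float, epsilon: float) -> int:
    """A sufficient m_L for rays under uniform µ: m >= ln(1/delta) / epsilon."""
    return math.ceil(math.log(1.0 / delta) / epsilon)


def learn_ray(sample):
    """Consistent learner: threshold = smallest positive example (1.0 if none)."""
    positives = [x for x, label in sample if label == 1]
    return min(positives) if positives else 1.0


def failure_rate(theta, m, epsilon, trials, seed=0):
    """Fraction of random samples of size m whose hypothesis has error >= epsilon."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(m)]
        sample = [(x, int(x >= theta)) for x in xs]
        learned = learn_ray(sample)
        # Under uniform µ the error is the length of [theta, learned).
        if learned - theta >= epsilon:
            failures += 1
    return failures / trials


m = sample_size(delta=0.05, epsilon=0.1)
print(m, failure_rate(theta=0.3, m=m, epsilon=0.1, trials=2000))
```

The sample size returned by `sample_size` depends only on δ and ε, not on the target threshold or on the particular sample, which mirrors the requirement in the definition.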

PAC model-cont’d Properties of a pac learning algorithm

It is probable that a useful training sample is presented.
One can only expect that the output hypothesis is approximately correct.
mL depends upon δ and ε, but not on t and µ.

If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.

Efficient pac learning algorithm

If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient.
It is usually necessary to require a pac learning algorithm to be efficient.

PAC model-cont’d VC dimension

The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971) and introduced into computational learning theory by Blumer et al. (1986).

The VC dimension of H, denoted by VCdim(H), describes the ‘expressive power’ of H: it is the size of the largest set of instances that H can label in every possible way. Generally, the greater VCdim(H) is, the greater the expressive power of H, and the more difficult H is to learn.
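For axis-parallel rectangles, the notion of ‘expressive power’ can be checked by brute force on small point sets. The sketch below tests whether a set of points is shattered, that is, whether every 0/1 labeling of the points is realized by some rectangle. It relies on the fact that a labeling is realizable exactly when the bounding box of the positive points contains no negative point. The function names are illustrative, and the two point sets are hand-picked examples rather than a proof of the VC dimension.

```python
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def realizable(points: Sequence[Point], labels: Sequence[bool]) -> bool:
    """True if some axis-parallel rectangle contains exactly the positive points."""
    positives = [p for p, positive in zip(points, labels) if positive]
    if not positives:
        return True  # a rectangle placed away from all points labels them all negative
    dims = range(len(positives[0]))
    low = [min(p[i] for p in positives) for i in dims]
    high = [max(p[i] for p in positives) for i in dims]
    # The bounding box is the smallest rectangle covering the positives, so the
    # labeling is realizable iff no negative point falls inside that box.
    return not any(
        all(low[i] <= p[i] <= high[i] for i in dims)
        for p, positive in zip(points, labels)
        if not positive
    )


def shattered(points: Sequence[Point]) -> bool:
    """True if every labeling of the points is realizable by a rectangle."""
    return all(
        realizable(points, labels)
        for labels in product([False, True], repeat=len(points))
    )


diamond = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]
print(shattered(diamond))
print(shattered(diamond + [(0.0, 0.0)]))
```

The four diamond points are shattered, while adding the center point breaks shattering. This is consistent with the known VC dimension of 4 for axis-parallel rectangles in the plane.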

PAC model-cont’d Consistency

If for any target concept t ∈ H and any training sample s = ((x1,b1), (x2,b2), ..., (xm,bm)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(xi) = t(xi) = bi for all i, then we say that L is a consistent algorithm.

VC dimension and pac learnability

If L is a consistent learning algorithm for H and H has finite VC dimension, then H is pac-learnable.
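A standard quantitative form of this implication, due to Blumer et al. and not stated on the slides, is that a consistent learning algorithm for H is a pac learning algorithm once the sample size m satisfies the bound below. It is given here as a commonly cited sufficient condition, with the constants as they usually appear in textbooks.

```latex
m \;\ge\; \max\!\left(
    \frac{4}{\varepsilon}\,\log_2\frac{2}{\delta},\;
    \frac{8\,\mathrm{VCdim}(H)}{\varepsilon}\,\log_2\frac{13}{\varepsilon}
\right)
```

The bound grows linearly in VCdim(H), which is why a finite VC dimension is what makes the sample size finite.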

Theoretical Analysis － PAC learning of APR Early work

While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) analyzed the pac learnability of APR theoretically and showed that if

  each instance in a bag is drawn from a product distribution, and
  all instances in a bag are drawn independently,

then APR is pac learnable under the multi-instance learning framework with sample complexity $O\!\left(\frac{d^2 n^6}{\varepsilon^{10}} \log\frac{nd}{\varepsilon\delta}\right)$ and time complexity $O\!\left(\frac{d^5 n^{12}}{\varepsilon^{20}} \log^2\frac{nd}{\varepsilon\delta}\right)$. Here d is the number of dimensions and n is the number of instances in each bag.

PAC learning of APR-cont’d A hardness result

Via the analysis of VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm than that of Long & Tan, with sample complexity $O\!\left(\frac{d^2 n^2}{\varepsilon^2} \log\frac{d}{\delta}\right)$ and time complexity $O\!\left(\frac{d^3 n^2}{\varepsilon^2}\right)$.

More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, which is an NP-Complete problem.

PAC learning of APR-cont’d A further reduction

A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that

  if H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multi-instance examples;

  via a reduction to the “Statistical Query” model (M. Kearns 1993), APR is pac learnable from multi-instance examples with sample complexity $O\!\left(\frac{d^2 n}{\varepsilon^2}\right)$ and time complexity $O\!\left(\frac{d^3 n^2}{\varepsilon^2}\right)$.
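The link between multi-instance examples and one-sided classification noise can be sketched in a few lines. The code below is a simplified reading of the idea behind the reduction, not the authors' construction: one instance is drawn at random from each bag and given the bag's label. An instance from a negative bag is always truly negative, while an instance from a positive bag may be a negative instance that is now labeled positive, so the labels are wrong in one direction only. The function name and data layout are assumptions, and the noise-tolerant learner that would consume these examples is not shown.

```python
import random
from typing import List, Sequence, Tuple

Instance = Sequence[float]
LabeledBag = Tuple[List[Instance], int]  # (instances, bag label in {0, 1})


def bags_to_noisy_instances(
    bags: List[LabeledBag], rng: random.Random
) -> List[Tuple[Instance, int]]:
    """Turn each bag into one instance-level example carrying the bag's label."""
    examples = []
    for instances, bag_label in bags:
        # Pick a single instance uniformly at random from the bag.
        x = rng.choice(instances)
        # Label 0 is always correct here (negative bags hold only negatives);
        # label 1 may be wrong, which is the one-sided noise.
        examples.append((x, bag_label))
    return examples


# Hypothetical toy bags with 1-dimensional instances.
toy_bags = [([(0.0,), (1.0,)], 1), ([(2.0,)], 0)]
print(bags_to_noisy_instances(toy_bags, random.Random(0)))
```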

PAC learning of APR-cont’d Summary

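P. M. Long et al.
  Sample complexity: $O\!\left(\frac{d^2 n^6}{\varepsilon^{10}} \log\frac{nd}{\varepsilon\delta}\right)$
  Time complexity: $O\!\left(\frac{d^5 n^{12}}{\varepsilon^{20}} \log^2\frac{nd}{\varepsilon\delta}\right)$
  Constraints: product distribution, independent instances
  Theoretical tools: p-concept, VC dimension

P. Auer et al.
  Sample complexity: $O\!\left(\frac{d^2 n^2}{\varepsilon^2} \log\frac{d}{\delta}\right)$
  Time complexity: $O\!\left(\frac{d^3 n^2}{\varepsilon^2}\right)$
  Constraints: independent instances
  Theoretical tools: VC dimension

A. Blum et al.
  Sample complexity: $O\!\left(\frac{d^2 n}{\varepsilon^2}\right)$
  Time complexity: $O\!\left(\frac{d^3 n^2}{\varepsilon^2}\right)$
  Constraints: independent instances
  Theoretical tools: statistical query model, VC dimension

Fig.6. A comparison of three theoretical algorithms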
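To get a feel for how far apart the three sample complexities are, the expressions can be evaluated numerically. The sketch below codes the commonly cited forms of the three bounds, with hidden constants ignored and natural logarithms assumed. Here d is taken to be the number of dimensions and n the number of instances per bag, the parameter values are hypothetical, and only the relative sizes of the results are meaningful.

```python
import math


def long_tan_sample(d, n, eps, delta):
    """O(d^2 n^6 / eps^10 * log(nd / (eps * delta))), constants ignored."""
    return d**2 * n**6 / eps**10 * math.log(n * d / (eps * delta))


def auer_sample(d, n, eps, delta):
    """O(d^2 n^2 / eps^2 * log(d / delta)), constants ignored."""
    return d**2 * n**2 / eps**2 * math.log(d / delta)


def blum_kalai_sample(d, n, eps, delta):
    """O(d^2 n / eps^2), constants ignored; delta does not appear in this form."""
    return d**2 * n / eps**2


# Hypothetical setting: 166 dimensions, 5 instances per bag.
params = dict(d=166, n=5, eps=0.1, delta=0.05)
for bound in (long_tan_sample, auer_sample, blum_kalai_sample):
    print(f"{bound.__name__}: {bound(**params):.3g}")
```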

Theoretical Analysis － Real-valued multi-instance learning Real-valued multi-instance learning

It is worth noting that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.

S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-Complete. Furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.

At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN and Diverse Density algorithms to real-valued multi-instance learning. They also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of the modified algorithms on them.

Future work

Further theoretical analysis of multi-instance learning.
Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
Explore more issues which can be translated into multi-instance learning problems.
Design appropriate bag generating methods.
……

Thanks
