Theoretical analysis of multi-instance learning 张敏灵 周志华...
Outline
Introduction
Theoretical analysis
  PAC learning model
  PAC learnability of APR
  Real-valued multi-instance learning
Future work
Introduction
Origin
Multi-instance learning originated from the problem of “drug activity prediction”, and was first formalized by T. G. Dietterich et al. in their seminal paper “Solving the multiple-instance problem with axis-parallel rectangles” (1997).
Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of “multi-instance learning” to “multi-part learning”, and pointed out that many previously studied problems are “multi-part” problems rather than “multi-instance” ones.
Introduction-cont’d
Comparisons
Fig.1. The shape of a molecule changes as it rotates its bonds
Fig.2. Classical and multi-instance learning frameworks
Drug activity prediction problem
Introduction-cont’d
Experiment data

Dataset  #dim  #bags  #pos bags  #neg bags  #instances  #instances/bag (max / min / ave)
musk1    166   92     47         45         476         40 / 2 / 5.17
musk2    166   102    39         63         6598        1044 / 1 / 64.69
APR (Axis-Parallel Rectangles) algorithms
Fig.3. APR algorithms
GFS elim-count APR (standard)
GFS elim-kde APR (outside-in)
Iterated-discrim APR (inside-out)
Accuracy shown in the figure: musk1 92.4%, musk2 89.2%
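The way an axis-parallel rectangle labels a bag can be sketched in a few lines. This is only an illustration of the standard multi-instance assumption (a bag is positive if at least one of its instances falls inside the rectangle); the names `in_rectangle` and `classify_bag` and the toy data are hypothetical, and none of the three APR search procedures listed above is reproduced here.

```python
from typing import List, Sequence

Instance = Sequence[float]
Bag = List[Instance]


def in_rectangle(x: Instance, lower: Sequence[float], upper: Sequence[float]) -> bool:
    """True if x lies inside the axis-parallel rectangle [lower, upper]."""
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


def classify_bag(bag: Bag, lower: Sequence[float], upper: Sequence[float]) -> int:
    """A bag is positive (1) iff at least one instance falls inside the rectangle."""
    return int(any(in_rectangle(x, lower, upper) for x in bag))


# Toy 2-dimensional example (assumed data, not the musk data sets).
lower, upper = (0.0, 0.0), (1.0, 1.0)
positive_bag = [(2.0, 2.0), (0.5, 0.5)]   # one instance inside the rectangle
negative_bag = [(2.0, 2.0), (-1.0, 0.5)]  # no instance inside the rectangle

print(classify_bag(positive_bag, lower, upper))
print(classify_bag(negative_bag, lower, upper))
```

In the drug activity setting, each bag would be a molecule and each instance one of its low-energy shapes, described by 166 features in the musk data.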
Introduction-cont’d
Various algorithms
APR (T. G. Dietterich et al. 1997)
MULTINST (P. Auer 1997)
Diverse Density (O. Maron 1998)
Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
RELIC (G. Ruffo 2000)
EM-DD (Q. Zhang & S. A. Goldman 2001)
……
Introduction-cont’d
Comparison on benchmark data sets

Algorithm             Musk1 (% correct)  Musk2 (% correct)
iterated-discrim APR  92.4               89.2
Citation-kNN          92.4               86.3
Diverse Density       88.9               82.5
RELIC                 83.7               87.3
MULTINST              76.7               84.0
BP                    75.0               67.7
C4.5                  68.5               58.8
Fig.4. A comparison of several multi-instance learning algorithms
Introduction-cont’d
Application area
Drug activity prediction (T. G. Dietterich et al. 1997)
Stock prediction (O. Maron 1998)
Learn a simple description of a person from a series of images (O. Maron 1998)
Natural scene classification (O. Maron & A. L. Ratan 1998)
Event prediction (G. M. Weiss & H. Hirsh 1998)
Data mining and computer security (G. Ruffo 2000)
……
Multi-instance learning has been regarded as the fourth machine learning framework parallel to supervised learning, unsupervised learning, and reinforcement learning.
Theoretical analysis
PAC learning model
  Definition and its properties
  VC dimension
PAC learnability of APR
Real-valued multi-instance learning
Theoretical Analysis - PAC model
Computational learning theory
L. G. Valiant (1984), “A theory of the learnable”
Deductive learning
Used for constructing a mathematical model of a cognitive process.
Fig.5. Diagram of a framework for learning (labels in the figure: W, P, M, actual example, coded example, 0/1)
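PAC model-cont’d
Definition of PAC learning
We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given
A confidence parameter δ (0 < δ < 1);
An accuracy parameter ε (0 < ε < 1);
then there is a positive integer mL = mL(δ, ε) such that
For any target concept t ∈ H,
For any probability distribution µ on X,
whenever m ≥ mL, $\mu^m\{\, s \in S(m,t) \mid \mathrm{er}_\mu(L(s), t) < \epsilon \,\} > 1 - \delta$.
Here S(m,t) is the set of training samples of length m for t, and $\mathrm{er}_\mu(L(s), t)$ is the error of the output hypothesis L(s) with respect to t under µ.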
PAC model-cont’d
Properties of a pac learning algorithm
It is probable that a useful training sample is presented.
One can only expect that the output hypothesis is approximately correct.
mL depends upon δ and ε, but not on t and µ.
If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.
Efficient pac learning algorithm
If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient.
It is usually necessary to require a pac learning algorithm to be efficient.
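To make the role of mL(δ, ε) concrete, the sketch below evaluates a textbook bound that is not stated on the slides: for a finite hypothesis space H and a consistent learning algorithm, m ≥ (1/ε)(ln|H| + ln(1/δ)) training examples suffice. The function name `sample_bound_finite_h` and the example values are assumptions chosen for illustration.

```python
import math


def sample_bound_finite_h(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    Assumes a finite hypothesis space of size h_size and a consistent learner.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    if h_size < 1:
        raise ValueError("the hypothesis space must contain at least one hypothesis")
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


# The bound depends only on |H|, epsilon and delta, not on the target
# concept t or the distribution mu, as the definition requires.
print(sample_bound_finite_h(2 ** 10, epsilon=0.1, delta=0.05))
print(sample_bound_finite_h(2 ** 10, epsilon=0.01, delta=0.05))
```

The bound grows linearly in 1/ε and only logarithmically in 1/δ, which is the kind of polynomial dependence the efficiency requirement asks for.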
PAC model-cont’d
VC dimension
The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
The VC dimension of a hypothesis space H, denoted by VCdim(H), describes the ‘expressive power’ of H in a sense. Generally, the greater VCdim(H) is, the greater the ‘expressive power’ of H, and the more difficult H is to learn.
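The notion of shattering behind the VC dimension can be checked by brute force for axis-parallel rectangles. The sketch below is an illustration only: `is_shattered_by_rectangles` is a hypothetical helper, the points are assumed to be distinct, and the check relies on the fact that a subset can be cut out by some rectangle exactly when its tightest bounding rectangle contains no other point of the set.

```python
from itertools import combinations


def is_shattered_by_rectangles(points):
    """Check whether axis-parallel rectangles realise every labelling of points."""
    points = list(points)
    # The empty subset is always realisable by a rectangle placed away from all
    # points, so only non-empty subsets need to be checked.
    for r in range(1, len(points) + 1):
        for subset in combinations(points, r):
            dims = range(len(subset[0]))
            lower = [min(p[k] for p in subset) for k in dims]
            upper = [max(p[k] for p in subset) for k in dims]
            # The tightest rectangle around the subset must exclude every other point.
            for q in points:
                if q not in subset and all(lower[k] <= q[k] <= upper[k] for k in dims):
                    return False
    return True


diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(is_shattered_by_rectangles(diamond))
print(is_shattered_by_rectangles(diamond + [(0, 0)]))
```

Four points in a diamond are shattered, while adding the centre point breaks shattering; this matches the known VC dimension of 4 for axis-parallel rectangles in the plane.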
PAC model-cont’d
Consistency
If for any target concept t ∈ H and any training sample s = ((x1,b1), (x2,b2), ..., (xm,bm)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(xi) = t(xi) = bi for 1 ≤ i ≤ m, then we say that L is a consistent algorithm.
VC dimension and pac learnability
If L is a consistent learning algorithm for H and H has finite VC dimension, then H is pac-learnable.
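A consistent algorithm in the sense above can be illustrated in the classical (single-instance) setting with the tightest-fit rectangle learner. This is a sketch under the assumption that the target concept is an axis-parallel rectangle; the function names and the toy sample are hypothetical and it is not one of the multi-instance APR algorithms.

```python
def learn_tightest_rectangle(sample):
    """Return (lower, upper) of the smallest rectangle containing all positive
    examples, or None (the all-negative hypothesis) if there are none."""
    positives = [x for x, b in sample if b == 1]
    if not positives:
        return None
    dims = range(len(positives[0]))
    lower = tuple(min(x[k] for x in positives) for k in dims)
    upper = tuple(max(x[k] for x in positives) for k in dims)
    return lower, upper


def predict(hypothesis, x):
    """Label x with 1 if it lies inside the hypothesis rectangle, else 0."""
    if hypothesis is None:
        return 0
    lower, upper = hypothesis
    return int(all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper)))


def is_consistent(hypothesis, sample):
    """True if the hypothesis agrees with every labelled example in the sample."""
    return all(predict(hypothesis, x) == b for x, b in sample)


# Sample labelled by the (assumed) target rectangle [0, 2] x [0, 2].
sample = [((1.0, 1.0), 1), ((0.5, 1.5), 1), ((3.0, 1.0), 0), ((1.0, -1.0), 0)]
h = learn_tightest_rectangle(sample)
print(h, is_consistent(h, sample))
```

Because the learned rectangle always lies inside the target rectangle, it never covers a negative example, so the output agrees with every sample drawn for such a target.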
Theoretical Analysis - PAC learning of APR
Early work
While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the pac learnability of APR and showed that if
Each instance in a bag is drawn from a product distribution,
All instances in a bag are drawn independently,
then APR is pac learnable under the multi-instance learning framework with sample complexity $O\left(\frac{d^2 n^6}{\epsilon^{10}} \log \frac{nd}{\epsilon\delta}\right)$ and time complexity $O\left(\frac{d^5 n^{12}}{\epsilon^{20}} \log^2 \frac{nd}{\epsilon\delta}\right)$, where d is the dimension of an instance and n is the number of instances in a bag.
PAC learning of APR-cont’d
A hardness result
Via the analysis of VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm than that of Long & Tan, with sample complexity $O\left(\frac{d^2 n^2}{\epsilon^2} \log \frac{d}{\delta}\right)$ and time complexity $O\left(\frac{d^3 n^2}{\epsilon^2}\right)$.
More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, which is an NP-complete problem.
PAC learning of APR-cont’d
A further reduction
A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that
If H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multi-instance examples.
Via a reduction to the “Statistical Query” model (M. Kearns 1993), APR is pac learnable from multi-instance examples with sample complexity $\tilde{O}\left(\frac{d^2 n}{\epsilon^2}\right)$ and time complexity $\tilde{O}\left(\frac{d^3 n^2}{\epsilon^2}\right)$.
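The intuition behind the reduction to 1-sided classification noise can be sketched as follows. This is a simplified illustration, not Blum & Kalai's full construction: it assumes the instances in a bag are drawn independently, takes one random instance per bag, and gives it the bag's label. The helper name `bags_to_noisy_instances` and the toy bags are hypothetical.

```python
import random


def bags_to_noisy_instances(bags, bag_labels, rng=None):
    """Turn labelled bags into single labelled instances with one-sided noise.

    An instance taken from a negative bag is truly negative, so its label is
    correct. An instance taken from a positive bag may itself be negative, so
    only truly negative instances can receive a wrong (positive) label.
    """
    rng = rng or random.Random(0)
    examples = []
    for bag, label in zip(bags, bag_labels):
        x = rng.choice(bag)  # one instance per bag, chosen uniformly at random
        examples.append((x, label))
    return examples


bags = [[(0.2,), (5.0,)], [(7.0,), (9.0,)]]
bag_labels = [1, 0]
examples = bags_to_noisy_instances(bags, bag_labels)
print(examples)
```

A learner that tolerates this kind of one-sided noise on the resulting instance-level examples could then be applied in place of a dedicated multi-instance algorithm.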
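PAC learning of APR-cont’d
Summary

P. M. Long et al.
  Sample complexity: $O\left(\frac{d^2 n^6}{\epsilon^{10}} \log \frac{nd}{\epsilon\delta}\right)$
  Time complexity: $O\left(\frac{d^5 n^{12}}{\epsilon^{20}} \log^2 \frac{nd}{\epsilon\delta}\right)$
  Constraints: product distribution, independent instances
  Theoretical tools: p-concept, VC dimension
P. Auer et al.
  Sample complexity: $O\left(\frac{d^2 n^2}{\epsilon^2} \log \frac{d}{\delta}\right)$
  Time complexity: $O\left(\frac{d^3 n^2}{\epsilon^2}\right)$
  Constraints: independent instances
  Theoretical tools: VC dimension
A. Blum et al.
  Sample complexity: $\tilde{O}\left(\frac{d^2 n}{\epsilon^2}\right)$
  Time complexity: $\tilde{O}\left(\frac{d^3 n^2}{\epsilon^2}\right)$
  Constraints: independent instances
  Theoretical tools: statistical query model, VC dimension

Fig.6. A comparison of three theoretical algorithms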
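To give a feel for how far apart the three sample complexity bounds are, the sketch below evaluates their growth terms numerically. It assumes the bounds read d²n⁶/ε¹⁰ · log(nd/(εδ)) for Long & Tan, d²n²/ε² · log(d/δ) for Auer et al., and d²n/ε² for Blum & Kalai, and it ignores all constants and the logarithmic factors hidden in the last one, so the numbers indicate orders of magnitude only. The function names and parameter values are hypothetical.

```python
import math


def long_tan_samples(d, n, eps, delta):
    """Growth term of the Long & Tan bound (constants ignored)."""
    return d ** 2 * n ** 6 / eps ** 10 * math.log(n * d / (eps * delta))


def auer_samples(d, n, eps, delta):
    """Growth term of the Auer et al. bound (constants ignored)."""
    return d ** 2 * n ** 2 / eps ** 2 * math.log(d / delta)


def blum_kalai_samples(d, n, eps):
    """Growth term of the Blum & Kalai bound (constants and log factors ignored)."""
    return d ** 2 * n / eps ** 2


# d = 166 as in the musk data; n, eps and delta are assumed example values.
d, n, eps, delta = 166, 5, 0.1, 0.05
for name, value in [
    ("Long & Tan", long_tan_samples(d, n, eps, delta)),
    ("Auer et al.", auer_samples(d, n, eps, delta)),
    ("Blum & Kalai", blum_kalai_samples(d, n, eps)),
]:
    print(f"{name:12s} {value:.3e}")
```

Even for these small values the first bound is many orders of magnitude larger than the other two, which is why the later results are described as much more efficient.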
Theoretical Analysis - Real-valued multi-instance learning
Real-valued multi-instance learning
It is worthwhile to note that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete. Furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN and Diverse Density algorithms to real-valued multi-instance learning. They also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of these modified algorithms on them.
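One way a nearest-neighbour method can be carried over to real-valued bag labels is sketched below. It is a minimal illustration and not necessarily the variant studied by Amar et al.: bags are compared with the minimal Hausdorff distance (the smallest distance between any pair of instances), and the prediction is the plain average of the k nearest bags' values. The function names and data are assumptions.

```python
import math


def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: smallest Euclidean distance between any
    instance of bag_a and any instance of bag_b."""
    return min(math.dist(a, b) for a in bag_a for b in bag_b)


def knn_bag_regression(train_bags, train_values, query_bag, k=2):
    """Predict a real value for query_bag as the mean value of its k nearest bags."""
    ranked = sorted(
        zip(train_bags, train_values),
        key=lambda bag_value: min_hausdorff(bag_value[0], query_bag),
    )
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Toy bags with real-valued labels (assumed data, e.g. binding strengths).
train_bags = [[(0.0, 0.0), (5.0, 5.0)], [(1.0, 0.0)], [(10.0, 10.0)]]
train_values = [1.0, 3.0, 9.0]
query_bag = [(0.5, 0.0)]
print(knn_bag_regression(train_bags, train_values, query_bag, k=2))
```

The only change from the classification case is the final step: neighbours' labels are averaged rather than put to a vote.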
Future work
Further theoretical analysis of multi-instance learning.
Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
Explore more issues which can be translated into multi-instance learning problems.
Design appropriate bag generating methods.
……
Thanks