Theoretical Analysis of Multi-Instance Learning
张敏灵 (Min-Ling Zhang), 周志华 (Zhi-Hua Zhou)
National Key Laboratory for Novel Software Technology, Nanjing University
2002.10.11
Outline
- Introduction
- Theoretical analysis
  - PAC learning model
  - PAC learnability of APR
  - Real-valued multi-instance learning
- Future work
Introduction

Origin
- Multi-instance learning originated from the problem of "drug activity prediction", and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple-instance problem with axis-parallel rectangles" (1997).
- Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of "multi-instance learning" to "multi-part learning", and pointed out that many previously studied problems are "multi-part" problems rather than "multi-instance" ones.
Introduction-cont’d
Comparisons
Fig.1. The shape of a molecule changes as it rotates its bonds
Fig.2. Classical and multi-instance learning frameworks
Drug activity prediction problem
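The labeling rule that distinguishes the multi-instance framework from the classical one (Fig.2) can be sketched in a few lines. The 1-D interval concept below is a toy stand-in, not something from the slides:

```python
# A minimal sketch of the multi-instance labeling rule: a bag (e.g. the set
# of low-energy shapes of one molecule) is labeled positive iff at least one
# of its instances satisfies the underlying concept. The interval concept
# here is a hypothetical toy, chosen only for illustration.

def instance_concept(x, low=0.4, high=0.6):
    """Toy instance-level concept: x is positive if it falls in [low, high]."""
    return low <= x <= high

def bag_label(bag):
    """Multi-instance rule: the bag is positive iff ANY instance is positive."""
    return any(instance_concept(x) for x in bag)

# Three "molecules", each represented by several "shapes" (instances):
bags = [
    [0.1, 0.5, 0.9],   # contains 0.5 -> positive bag
    [0.1, 0.2, 0.3],   # no instance in [0.4, 0.6] -> negative bag
    [0.45],            # single qualifying instance -> positive bag
]
print([bag_label(b) for b in bags])  # [True, False, True]
```

Note that the learner only ever sees the bag-level labels; which instance made a bag positive stays hidden, which is exactly what makes the drug activity prediction problem hard.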
Introduction-cont'd

Experiment data

| Dataset | #dim | #bags | #pos bags | #neg bags | #instances | max/bag | min/bag | ave/bag |
|---------|------|-------|-----------|-----------|------------|---------|---------|---------|
| musk1   | 166  | 92    | 47        | 45        | 476        | 40      | 2       | 5.17    |
| musk2   | 166  | 102   | 39        | 63        | 6598       | 1044    | 1       | 64.69   |
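The per-bag averages in the table follow directly from the totals, which makes a quick sanity check possible:

```python
# Sanity check of the "ave/bag" column, computed from the bag and instance
# totals given in the Musk dataset table above.
datasets = {
    "musk1": {"bags": 92,  "instances": 476},
    "musk2": {"bags": 102, "instances": 6598},
}
for name, d in datasets.items():
    ave = d["instances"] / d["bags"]
    print(f"{name}: {ave:.2f} instances/bag")  # 5.17 and 64.69, matching the table
```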
APR (Axis-Parallel Rectangles) algorithms
- GFS elim-count APR (standard)
- GFS elim-kde APR (outside-in)
- Iterated discrim APR (inside-out)

Fig.3. APR algorithms

Best results: musk1: 92.4%, musk2: 89.2%
Introduction-cont'd

Various algorithms
- APR (T. G. Dietterich et al. 1997)
- MULTINST (P. Auer 1997)
- Diverse Density (O. Maron 1998)
- Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
- Relic (G. Ruffo 2000)
- EM-DD (Q. Zhang & S. A. Goldman 2001)
- ……
Introduction-cont'd

Comparison on benchmark data sets

| Algorithm            | Musk1 (%correct) | Musk2 (%correct) |
|----------------------|------------------|------------------|
| iterated-discrim APR | 92.4             | 89.2             |
| Citation-kNN         | 92.4             | 86.3             |
| Diverse Density      | 88.9             | 82.5             |
| RELIC                | 83.7             | 87.3             |
| MULTINST             | 76.7             | 84.0             |
| BP                   | 75.0             | 67.7             |
| C4.5                 | 68.5             | 58.8             |

Fig.4. A comparison of several multi-instance learning algorithms
Introduction-cont'd

Application areas
- Drug activity prediction (T. G. Dietterich et al. 1997)
- Stock prediction (O. Maron 1998)
- Learning a simple description of a person from a series of images (O. Maron 1998)
- Natural scene classification (O. Maron & A. L. Ratan 1998)
- Event prediction (G. M. Weiss & H. Hirsh 1998)
- Data mining and computer security (G. Ruffo 2000)
- ……

Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.
Theoretical analysis
- PAC learning model
  - Definition and its properties
  - VC dimension
- PAC learnability of APR
- Real-valued multi-instance learning
Theoretical Analysis - PAC model

Computational learning theory
- L. G. Valiant (1984), "A theory of the learnable"
- Deductive learning
- Used for constructing a mathematical model of a cognitive process.

Fig.5. Diagram of a framework for learning (examples from the world are passed through a coder P to a learning machine M, which outputs 0/1)
PAC model-cont'd

Definition of PAC learning
We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given
- a confidence parameter δ (0 < δ < 1),
- an accuracy parameter ε (0 < ε < 1),
there is a positive integer mL = mL(δ, ε) such that, for any target concept t ∈ H and any probability distribution µ on X,
whenever m ≥ mL,  µ^m{ s ∈ S(m, t) | er_µ(L(s), t) < ε } > 1 − δ.
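For intuition about how mL scales, a standard special case (not stated on the slides) is a finite hypothesis space with a consistent learner, where m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice. A small sketch under that assumption:

```python
import math

# Illustrative sample-complexity bound, assuming a FINITE hypothesis space H
# and a consistent learner: m >= (1/eps) * (ln|H| + ln(1/delta)) examples
# suffice for PAC learning. This also shows the property below: m_L depends
# on delta and eps, but not on the target t or the distribution mu.

def sample_bound(h_size, eps, delta):
    """Smallest integer m satisfying the finite-class consistent-learner bound."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

print(sample_bound(h_size=2**16, eps=0.1, delta=0.05))  # 141
```

Halving ε doubles the bound, while tightening δ costs only a logarithmic term, which is why efficiency is usually measured in 1/ε and 1/δ.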
PAC model-cont'd

Properties of a pac learning algorithm
- It is probable that a useful training sample is presented.
- One can only expect that the output hypothesis is approximately correct.
- mL depends upon δ and ε, but not on t and µ.

If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.

Efficient pac learning algorithm
- If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient.
- It is usually necessary to require a pac learning algorithm to be efficient.
PAC model-cont'd

VC dimension
- The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
- The VC dimension of H, denoted by VCdim(H), describes the 'expressive power' of H in a sense. Generally, the greater VCdim(H), the greater the 'expressive power' of H, and thus the more difficult H is to learn.
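The definition can be made concrete with a brute-force shattering check for axis-parallel rectangles in the plane, whose VC dimension is the classical example. The point sets below are illustrative choices, not from the slides:

```python
from itertools import product

# Brute-force shattering check for axis-parallel rectangles in the plane.
# A +/- labeling is realizable by a rectangle iff the bounding box of the
# positive points contains no negative point.

def shattered(points):
    """True iff axis-parallel rectangles realize every labeling of points."""
    for labels in product([False, True], repeat=len(points)):
        pos = [p for p, lab in zip(points, labels) if lab]
        neg = [p for p, lab in zip(points, labels) if not lab]
        if not pos:
            continue  # an empty rectangle handles the all-negative labeling
        x0 = min(x for x, _ in pos); x1 = max(x for x, _ in pos)
        y0 = min(y for _, y in pos); y1 = max(y for _, y in pos)
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in neg):
            return False
    return True

# Four points in a "diamond" are shattered, so VCdim >= 4; adding a centre
# point breaks shattering (the centre can never be excluded alone):
print(shattered([(0, 1), (1, 0), (2, 1), (1, 2)]))          # True
print(shattered([(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]))  # False
```

This matches the known fact that axis-parallel rectangles in the plane have VC dimension exactly 4 (no set of 5 points is shattered).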
PAC model-cont'd

Consistency
If for any target concept t ∈ H and any training sample s = ((x1,b1), (x2,b2), ..., (xm,bm)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(xi) = t(xi) = bi, then we say that L is a consistent algorithm.

VC dimension and pac learnability
If L is a consistent learning algorithm for H, and H has finite VC dimension, then H is pac-learnable.
Theoretical Analysis - PAC learning of APR

Early work
While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the pac learnability of APR and showed that if
- each instance in a bag is drawn from a product distribution, and
- all instances in a bag are drawn independently,
then APR is pac-learnable under the multi-instance learning framework with sample complexity O((d^2 n^6 / ε^10) log(nd/δ)) and time complexity O((d^5 n^12 / ε^20) log(nd/δ)).
PAC learning of APR-cont'd

A hardness result
- Via an analysis of VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm than Long & Tan's, with sample complexity O((d^2 n^2 / ε^2) log(d/δ)) and time complexity O(d^3 n^2 / ε^2).
- More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, which is an NP-complete problem.
PAC learning of APR-cont'd

A further reduction
A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that
- if H is pac-learnable from 1-sided (or 2-sided) random classification noise, then H is pac-learnable from multi-instance examples;
- via a reduction to the "Statistical Query" model (M. Kearns 1993), APR is pac-learnable from multi-instance examples with sample complexity O(d^2 n / ε^2) and time complexity O(d^3 n^2 / ε^2).
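The noise connection can be made tangible with a toy simulation (the concept, bag size, and counts below are illustrative, not from the paper): labeling every instance with its bag's label yields single-instance examples whose errors are one-sided.

```python
import random

# Toy simulation of the intuition behind the reduction: give each instance
# its bag's label. Instances from negative bags are then always labeled
# correctly, while non-qualifying instances inside positive bags receive a
# wrong positive label -- i.e. the noise is ONE-SIDED.

random.seed(0)

def concept(x):
    """Toy instance-level concept (hypothetical)."""
    return x > 0.8

def make_bag(n=5):
    instances = [random.random() for _ in range(n)]
    return instances, any(concept(x) for x in instances)

noisy_examples = []
for _ in range(1000):
    instances, label = make_bag()
    noisy_examples.extend((x, label) for x in instances)

# Errors relative to the true concept, split by the assigned label:
wrong_neg = sum(1 for x, y in noisy_examples if not y and concept(x))
wrong_pos = sum(1 for x, y in noisy_examples if y and not concept(x))
print(wrong_neg == 0, wrong_pos > 0)  # True True: noise only on the positive side
```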
PAC learning of APR-cont'd

Summary

| | Sample complexity | Time complexity | Constraints | Theoretical tools |
|---|---|---|---|---|
| P. M. Long et al. | O((d^2 n^6 / ε^10) log(nd/δ)) | O((d^5 n^12 / ε^20) log(nd/δ)) | product distribution, independent instances | p-concept, VC dimension |
| P. Auer et al. | O((d^2 n^2 / ε^2) log(d/δ)) | O(d^3 n^2 / ε^2) | independent instances | VC dimension |
| A. Blum et al. | O(d^2 n / ε^2) | O(d^3 n^2 / ε^2) | independent instances | statistical query model, VC dimension |

Fig.6. A comparison of three theoretical algorithms
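The growth rates in Fig.6 can be compared numerically, with all hidden constants dropped (so only relative orders of magnitude are meaningful; the Musk-like values d = 166, n = 5 are an illustrative choice):

```python
import math

# Sample-complexity bounds from Fig.6 with hidden constants ignored.
# d = instance dimension, n = instances per bag.

def long_tan(d, n, eps, delta):
    return (d**2 * n**6 / eps**10) * math.log(n * d / delta)

def auer(d, n, eps, delta):
    return (d**2 * n**2 / eps**2) * math.log(d / delta)

def blum_kalai(d, n, eps, delta):
    # delta enters only through constants we are ignoring here
    return d**2 * n / eps**2

d, n, eps, delta = 166, 5, 0.1, 0.05
for name, bound in [("Long & Tan", long_tan), ("Auer et al.", auer),
                    ("Blum & Kalai", blum_kalai)]:
    print(f"{name:>13}: {bound(d, n, eps, delta):.2e}")
```

For these values the three bounds differ by many orders of magnitude, which is the point of the progression: each later result removes a polynomial factor from its predecessor.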
Theoretical Analysis - Real-valued multi-instance learning

Real-valued multi-instance learning
- It is worth noting that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
- S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
- At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN, and Diverse Density algorithms to real-valued multi-instance learning. They also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of the modified algorithms on them.
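A common convention in this real-valued setting (used here as an illustrative sketch, not as the specific formulation of any of the papers above) takes the bag's real-valued label to be the best response among its instances, e.g. the strongest binding affinity over a molecule's shapes:

```python
# Minimal sketch of real-valued multi-instance labeling: the bag label is
# the maximum instance response. The affinity function is a hypothetical
# toy peaking at x = 0.5.

def instance_affinity(x):
    """Toy real-valued response of a single instance."""
    return max(0.0, 1.0 - abs(x - 0.5))

def bag_affinity(bag):
    """Real-valued bag label: the maximum response over its instances."""
    return max(instance_affinity(x) for x in bag)

print(round(bag_affinity([0.1, 0.45, 0.9]), 2))  # 0.95
```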
Future work
- Further theoretical analysis of multi-instance learning.
- Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
- Explore more issues which can be translated into multi-instance learning problems.
- Design appropriate bag generating methods.
- ……
Thanks