
Imprecise and uncertain labelling: a solution based on mixture models and belief functions

Etienne Côme, [email protected]

Institut National de Recherche sur les Transports et leur Sécurité (INRETS), Supervisor: Patrice Aknin, Advisor: Latifa Oukhellou

Université de Technologie de Compiègne, Supervisor: Thierry Denœux

19 June 2008


Introduction

Observation

The supervised and unsupervised frameworks do not correspond to the situations encountered in applications. Intermediate situations arise, for example:

- few labelled samples and many unlabelled samples
- difficult labelling (experts cannot assign samples to a unique class)
- label noise, errors in the labels
- ...

Aim of the study

Build a method to deal with imprecise and uncertain labels, based on mixture models.


Soft labels examples

Set of classes: Y = {c_1, ..., c_K};  m_i: bba,  pl_i: plausibility

Unsupervised (vacuous labels):            m_i(Y) = 1        pl_ik = 1, ∀k
Supervised (perfect labels):              m_i({c_k}) = 1    pl_ik = 1; pl_ik' = 0, ∀k' ≠ k
Partially supervised (imprecise labels):  m_i(C) = 1        pl_ik = 1 if c_k ∈ C; pl_ik = 0 if c_k ∉ C
Soft supervision (imprecise, uncertain):  m_i: any bba      pl_ik ∈ [0, 1]
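To make the correspondence concrete, here is a minimal sketch (Python/NumPy; the helper names and the value of K are illustrative, not part of the original slides) encoding each setting as a vector of singleton plausibilities:

```python
import numpy as np

K = 4  # number of classes (illustrative)

def vacuous_label(K):
    """Unsupervised: every class is fully plausible."""
    return np.ones(K)

def perfect_label(k, K):
    """Supervised: only the true class c_k is plausible."""
    pl = np.zeros(K)
    pl[k] = 1.0
    return pl

def imprecise_label(C, K):
    """Partially supervised: exactly the classes in the set C are plausible."""
    pl = np.zeros(K)
    pl[list(C)] = 1.0
    return pl

def soft_label(values):
    """Soft supervision: any vector of plausibilities in [0, 1]."""
    return np.clip(np.asarray(values, dtype=float), 0.0, 1.0)

print(vacuous_label(K))                    # [1. 1. 1. 1.]
print(perfect_label(2, K))                 # [0. 0. 1. 0.]
print(imprecise_label({0, 1}, K))          # [1. 1. 0. 0.]
print(soft_label([1.0, 0.3, 0.1, 0.0]))    # [1.  0.3 0.1 0. ]
```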


Plan

1 Introduction

2 Mixture Model

3 The different settings and their classical solutions
  - Supervised and unsupervised learning
  - Semi-supervised learning
  - Partially supervised learning, imprecise labelling

4 Learning with soft supervision, imprecise and uncertain labels
  - Available solutions?
  - Generative setting in the belief function framework
  - The criterion
  - Optimisation

5 Experiments
  - Experiment (1): from unsupervised to supervised learning
  - Experiment (2): label noise

6 Conclusion


Mixture model

Graphical model:

Label sampling:

p(Y = c_k) = π_k, ∀k ∈ {1, . . . , K}.    (1)


Knowing the class, the observable variables are drawn:

p(x | Y = c_k) = f(x; θ_k), ∀k ∈ {1, . . . , K}.    (2)
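A minimal sketch (Python/NumPy; the parameter values are illustrative, not taken from the slides) of this two-stage generative process, drawing a label from the proportions π and then an observation from the corresponding Gaussian component:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D Gaussian mixture: proportions and component parameters.
pi = np.array([0.5, 0.3, 0.2])        # pi_k (sums to 1)
means = np.array([-2.0, 0.0, 2.0])    # component means
stds = np.array([0.5, 1.0, 0.3])      # component standard deviations

def sample_mixture(n):
    """Two-stage sampling: y_i ~ pi (eq. 1), then x_i ~ f(.; theta_{y_i}) (eq. 2)."""
    y = rng.choice(len(pi), size=n, p=pi)     # label sampling
    x = rng.normal(means[y], stds[y])         # conditional draw given the class
    return x, y

x, y = sample_mixture(5)
print(y, x)
```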


Example: Gaussian mixture (clustering)

Fig.: Example of the probability density function of a Gaussian mixture (x axis: x; y axis: f(x)).


Example: Weibull mixture

Fig.: Example of the cumulative distribution function of a Weibull mixture (x axis: x; y axis: F(x)).


Supervised learning with mixture models

Data

learning set: X_s = {(x_i, y_i)}_{i=1}^{N}

- measurements x_i ∈ X,
- labels y_i ∈ Y = {c_1, ..., c_K}

Generative setting

Parameters Ψ = (π, θ_1, . . . , θ_K); log-likelihood:

L(Ψ; X_s) = Σ_{i=1}^{N} Σ_{k=1}^{K} z_ik log(π_k f(x_i; θ_k)),    (3)

with z_i ∈ {0, 1}^K binary variables encoding the class membership: z_ik = 1 if y_i = c_k, and z_ik = 0 otherwise. Analytical solutions exist if f is the density of a classical law.
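When f is a Gaussian density, maximising (3) has the usual closed form; a minimal sketch (Python/NumPy, illustrative names, not the author's code):

```python
import numpy as np

def supervised_mle(X, y, K):
    """Closed-form maximiser of the supervised log-likelihood (3) for Gaussian
    components. X: (N, d) measurements; y: (N,) class indices in {0, ..., K-1}."""
    N, d = X.shape
    pi = np.zeros(K)
    means = np.zeros((K, d))
    covs = np.zeros((K, d, d))
    for k in range(K):
        Xk = X[y == k]                       # samples with z_ik = 1
        pi[k] = len(Xk) / N                  # proportion pi_k
        means[k] = Xk.mean(axis=0)           # mean of component k
        diff = Xk - means[k]
        covs[k] = diff.T @ diff / len(Xk)    # covariance of component k
    return pi, means, covs
```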


Unsupervised learning, clustering

Data

learning set: X_ns = {x_i}_{i=1}^{N}

- measurements x_i ∈ X

Goal: clustering

Find a good model of p(x), with a latent discrete variable y encoding the class membership:

p(x) = Σ_y p(y) p(x|y)

Using Bayes:

p(y|x) = p(y) p(x|y) / Σ_y p(y) p(x|y)


Unsupervised learning with mixture models

Mixture model

p(x) = Σ_{k=1}^{K} π_k f_k(x; θ_k)    (4)

L(Ψ; X_ns) = Σ_{i=1}^{N} ln( Σ_{k=1}^{K} π_k f_k(x_i; θ_k) )    (5)

f_k(·; θ_k): parametric density over X. Parameters: Ψ = (π, θ).

Optimisation

Warning: the criterion is non-convex. Classical solution: the EM algorithm.


EM algorithm

Log-likelihood:

Latent variable z: p(x) = p(x, z) / p(z|x)

Log-likelihood decomposition:

L(Ψ; X_ns) = Q(Ψ, Ψ^(q)) − H(Ψ, Ψ^(q)),    (6)

with
Q(Ψ, Ψ^(q)) = Σ_{i=1}^{N} Σ_{k=1}^{K} t_ik^(q) ln(π_k f(x_i; θ_k)),
H(Ψ, Ψ^(q)) = Σ_{i=1}^{N} Σ_{k=1}^{K} t_ik^(q) ln(t_ik),

and

t_ik^(q) = E_{Ψ^(q)}[z_ik | x_i] = π_k^(q) f(x_i; θ_k^(q)) / Σ_{k'=1}^{K} π_k'^(q) f(x_i; θ_k'^(q)).    (7)

Property of H: H(Ψ^(q), Ψ^(q)) − H(Ψ^(q+1), Ψ^(q)) ≥ 0.


⇒ Maximising Q(Ψ, Ψ^(q)) is sufficient to increase the likelihood.

EM algorithm

Initialisation: Ψ^(0)

Repeat until convergence:

- Expectation: computation of the t_ik^(q)
- Maximisation: Ψ^(q+1) = arg max_Ψ Q(Ψ, Ψ^(q))

Analytical solution for the proportions: π_k^(q+1) = (1/N) Σ_{i=1}^{N} t_ik^(q).
Analytical solution for θ_k for classical laws (the sum is now outside the logarithm).
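A minimal sketch of this loop for a one-dimensional Gaussian mixture (Python with NumPy/SciPy; the initialisation and stopping rule are illustrative choices, not the author's code):

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, K, n_iter=200, tol=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture: the E step computes t_ik (eq. 7),
    the M step maximises Q with closed-form updates. x: (N,) array."""
    rng = np.random.default_rng(seed)
    N = len(x)
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)
    sigma = np.full(K, x.std())
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: responsibilities t_ik proportional to pi_k f(x_i; theta_k)
        dens = pi * norm.pdf(x[:, None], mu, sigma)        # shape (N, K)
        t = dens / dens.sum(axis=1, keepdims=True)
        # M step: closed-form maximisation of Q
        Nk = t.sum(axis=0)
        pi = Nk / N
        mu = (t * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((t * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
        # The log-likelihood (5) can only increase; stop when it stabilises
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, sigma, t
```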


Semi-supervised learning

Data

A mix of labelled and unlabelled samples

X_ss = {(x_1, y_1), . . . , (x_M, y_M), x_{M+1}, . . . , x_N}

For the unlabelled samples to be useful, hypotheses must be made:

- model of p(x, y) → generative methods (mixture models [Nigam 00])
- decision boundary in low-density areas → discriminative methods (data-dependent regularization [Bengio 05])


Semi-supervised learning with mixture models

Generative setting

Log-likelihood:

L(Ψ, X_ss) = Σ_{i=1}^{M} Σ_{k=1}^{K} z_ik ln(π_k f(x_i; θ_k)) + Σ_{i=M+1}^{N} ln( Σ_{k=1}^{K} π_k f(x_i; θ_k) ),    (8)

Optimisation

Modification of the EM algorithm:

- During the E step, the posterior probabilities are computed only for the unlabelled samples.
- During the M step, the known labels are used instead of the posterior probabilities for the labelled samples.


Partially supervised learning

Data

Each label is a set of possible classes:

X_ps = {(x_i, C_i)}_{i=1}^{N}

where C_i ⊆ Y is the set of possible classes for sample i (it can be Y).

⇒ More general: semi-supervised learning is a special case of partially supervised learning.

Solutions in the literature:

- self-consistent logistic regression [Grandvalet 02]
- mixture models [Ambroise 00]


Partially supervised learning with mixture models

Generative setting

Log-likelihood:

L(Ψ, X_ps) = Σ_{i=1}^{N} ln( Σ_{k=1}^{K} l_ik π_k f(x_i; θ_k) ),    (9)

with l_i ∈ {0, 1}^K binary variables encoding the possible classes: l_ik = 1 if c_k ∈ C_i, and l_ik = 0 otherwise.

Optimisation

Modification of the EM algorithm: during the E step, the posterior probabilities are computed using

t_ik^(q) = l_ik π_k^(q) f(x_i; θ_k^(q)) / Σ_{k'=1}^{K} l_ik' π_k'^(q) f(x_i; θ_k'^(q)).    (10)


Learning with soft supervision

Data

Labels = bba m_i over Y:

X_sd = {(x_i, m_i)}_{i=1}^{N}

⇒ All the other settings are special cases of this one, depending on the kind of bba used.

Existing solutions in the belief function framework:

- k-NN [Denœux 95]
- belief decision trees [Elouedi 01]
- ...


Learning with soft supervision: mixture models and the generative setting

[Vannoorenberghe 05]

Previous work: an attempt to deal with soft labels in mixture models by modifying an EM algorithm.

Problems:

- no criterion;
- convergence?
- relationship with the probabilistic formulation unclear.


The criterion

Starting point:

Relation between likelihood, possibility and plausibility. Natural criterion: the parameters Ψ with the largest plausibility given the observations:

Ψ̂ = arg max_Ψ pl_Ψ(Ψ | X_sd)

pl_Ψ(Ψ | X_sd) = pl_{X_1×...×X_N}(x_1, . . . , x_N | Ψ)    (GBT)
              = Π_{i=1}^{N} pl_{X_i}(x_i | Ψ)              (conditional independence)

Development of pl_{X_i}(x_i | Ψ):

pl_{X_i}(x_i | Ψ) = Σ_{C⊆Y} m^{Y_i}(C | Ψ) pl_{X_i|Y_i}(x_i | C, Ψ)    (total plausibility theorem)

We need to compute m^{Y_i}(C | Ψ) and pl_{X_i|Y_i}(x_i | C, Ψ).


m^{Y_i}(C | Ψ): bba over the classes

Two belief functions over Y_i: π and m_i.

Assumptions: the labels are independent of the parameters ⇒ conjunctive combination. π is a Bayesian bba ⇒ after combination, the focal sets are singletons:

m^{Y_i}({c_k} | Ψ) = (m_i ∩ π)({c_k}), ∀k ∈ {1, ..., K}
                   = Σ_{C ∩ {c_k} ≠ ∅} m_i(C) · π({c_k})
                   = pl_ik · π_k

bba over Y:

m^{Y_i}({c_k} | Ψ) = pl_ik · π_k    (11)


pl_{X_i|Y_i}(· | c_k, Ψ): plausibility of the observations given the class

From the model, pl_{X_i|Y_i}(· | c_k, Ψ) is the plausibility measure associated with f(x_i; θ_k).

Problem

⇒ The plausibility of a point observed with infinite precision is zero.

Solution

Point observations? There is always a finite precision → consider a small region around the point. Approximation: product of the density with an infinitesimal volume dx_i1 . . . dx_ip:

pl_{X_i|Y_i}(x_i | c_k, Ψ) = f(x_i; θ_k) dx_i1 . . . dx_ip.    (12)


Combining these results:

pl_{X_i}(x_i | Ψ) = ( Σ_{k=1}^{K} pl_ik π_k f(x_i; θ_k) ) dx_i1 . . . dx_ip.    (13)

Final criterion: the generalized log-likelihood

L(Ψ, X_sd) = Σ_{i=1}^{N} ln( Σ_{k=1}^{K} pl_ik π_k f(x_i; θ_k) ) + Cst.    (14)

Remarks:

- natural expression;
- depends only on the plausibilities of the singletons;
- extends all the probabilistic criteria (semi-supervised, partially supervised, ...).
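A minimal sketch of criterion (14) for one-dimensional Gaussian components (Python with NumPy/SciPy; the function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def generalized_log_likelihood(x, pl, pi, mu, sigma):
    """Generalized log-likelihood (14), constant term dropped.
    x: (N,) observations; pl: (N, K) singleton plausibilities;
    pi, mu, sigma: (K,) parameters of 1-D Gaussian components."""
    dens = norm.pdf(x[:, None], mu, sigma)    # f(x_i; theta_k), shape (N, K)
    inner = (pl * pi * dens).sum(axis=1)      # sum_k pl_ik pi_k f(x_i; theta_k)
    return np.log(inner).sum()
```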


Optimisation

Information available on the sample classes

At iteration q, the information on the class label of each sample comes from three sources:

1. the label m^{Y_i};
2. the proportions π^(q) (Bayesian bba over Y);
3. the observation x_i and the current parameter estimates θ^(q), which lead to a bba over Y by the GBT: pl_{Y_i|X_i}({c_k} | x_i, Ψ) = f(x_i; θ_k^(q)) dx_i1 . . . dx_ip.

t_ik: conjunctive combination of the three sources:

t_ik^(q) = pl_ik π_k^(q) f(x_i; θ_k^(q)) / Σ_{k'=1}^{K} pl_ik' π_k'^(q) f(x_i; θ_k'^(q))


Log-likelihood decomposition:

Classical, as in EM:

L(Ψ; X_sd) = Q(Ψ, Ψ^(q)) − H(Ψ, Ψ^(q)),

with
Q(Ψ, Ψ^(q)) = Σ_{i=1}^{N} Σ_{k=1}^{K} t_ik^(q) ln(pl_ik π_k f(x_i; θ_k)),
H(Ψ, Ψ^(q)) = Σ_{i=1}^{N} Σ_{k=1}^{K} t_ik^(q) ln(t_ik),

and t_ik^(q) = pl_ik π_k^(q) f(x_i; θ_k^(q)) / Σ_{k'=1}^{K} pl_ik' π_k'^(q) f(x_i; θ_k'^(q)).

Increasing the likelihood

H has the same form as in classical EM. ⇒ Optimising Q is sufficient. ⇒ Same algorithm, except for the E step where the t_ik are computed using the soft labels.


An EM algorithm

Initialisation: Ψ^(0)

Repeat until convergence:

- Expectation: t_ik^(q) = pl_ik π_k^(q) f(x_i; θ_k^(q)) / Σ_{k'=1}^{K} pl_ik' π_k'^(q) f(x_i; θ_k'^(q)),
- Maximisation: Ψ^(q+1) = arg max_Ψ Q(Ψ, Ψ^(q))

Analytical solution for the proportions: π_k^(q+1) = (1/N) Σ_{i=1}^{N} t_ik^(q).
Analytical solution for the θ_k for classical laws.

Convergence: as in classical EM, each iteration increases the criterion (14).
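Relative to the classical EM sketch shown earlier, only the E step changes; a minimal sketch of that step (same illustrative Python/NumPy conventions, not the author's implementation):

```python
import numpy as np
from scipy.stats import norm

def e_step_soft_labels(x, pl, pi, mu, sigma):
    """E step with soft labels: t_ik proportional to pl_ik * pi_k * f(x_i; theta_k).
    x: (N,) observations; pl: (N, K) singleton plausibilities of the labels.
    With pl all ones this reduces to the classical E step (eq. 7); with 0/1 rows
    it gives the partially supervised E step (eq. 10)."""
    num = pl * pi * norm.pdf(x[:, None], mu, sigma)   # shape (N, K)
    return num / num.sum(axis=1, keepdims=True)
```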


Experiment (1): from unsupervised to supervised learning

Simulation of different sets of labels with varying label precision. Do more precise labels give:

- better results?
- a simpler optimisation problem?


Information measure: non-specificity

Average non-specificity µ_s of the learning set labels:

NS(m_i) = Σ_{C⊆Y} m_i(C) log(card(C))

ranging between:

0                    → ln(card(Y))
perfect labels       → vacuous labels
supervised learning  → unsupervised learning
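A minimal sketch of this measure, for a bba stored as a mapping from focal sets to masses (Python; the dictionary representation is an illustrative choice):

```python
import math

def non_specificity(bba):
    """Non-specificity NS(m) = sum over focal sets C of m(C) * log(card(C)).
    bba: dict mapping frozensets of class indices to masses (masses sum to 1)."""
    return sum(mass * math.log(len(C)) for C, mass in bba.items())

# Perfect label: NS = 0; vacuous label over 3 classes: NS = ln(3)
print(non_specificity({frozenset({0}): 1.0}))          # 0.0
print(non_specificity({frozenset({0, 1, 2}): 1.0}))    # 1.0986...
```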


Label simulation:

- for each sample i, a non-specificity NS(m_i) is drawn from a uniform law on [µ_s − 0.05, µ_s + 0.05];
- a plausibility pl_i with NS(m_i) as non-specificity is built;
- each plausibility value is assigned to a class with the following rules:
  - the true class has the largest plausibility,
  - random assignment for the other classes.

⇒ No error in the labelling (this assumption is removed in the second experiment).

Data simulation

Mixture of two Gaussians: Σ_1 = Σ_2 = I, π_1 = π_2 = 0.5, dimension 10; distance between the two centres δ ∈ {1, 2, 4}; learning set size N = 1000.


Influence of labelling on results quality, δ = 1

Fig.: Classification error on the test set as a function of the mean label non-specificity, δ = 1, N = 1000 (x axis: mean label non-specificity; y axis: empirical classification error, %).


Influence of labelling on results quality, δ = 2

Fig.: Classification error on the test set as a function of the mean label non-specificity, δ = 2, N = 1000 (x axis: mean label non-specificity; y axis: empirical classification error, %).


Influence of labelling on results quality, δ = 4

Fig.: Classification error on the test set as a function of the mean label non-specificity, δ = 4, N = 1000 (x axis: mean label non-specificity; y axis: empirical classification error, %).


Influence of label precision on the optimisation (δ = 2, N = 1000)

Fig.: (a) Number of local maxima and (b) number of iterations before convergence, as a function of the mean label non-specificity.


Conclusions on experiment (1)

More precise labels give:

- better classification performance when the classes overlap;
- a simpler optimisation problem:
  - fewer local maxima,
  - faster EM convergence.


Experiment (2): label noise

Context:

- Labelling is done by an expert.
- Some mistakes are made during the labelling process.
- But a value p_i ∈ [0, 1], the expert's doubt, is supplied.

How can such additional information be used?

Solution with “soft” labels

Discounting the hard label c_k by p_i:

m_i({c_k}) = 1 − p_i
m_i(Y) = p_i
⇔  pl_ik = 1;  pl_ik' = p_i, ∀k' ≠ k    (15)
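A minimal sketch of this discounting (Python/NumPy, illustrative names), producing the singleton plausibilities that the soft-label EM above consumes:

```python
import numpy as np

def discounted_label(k, doubt, K):
    """Soft label of eq. (15): the observed class k keeps plausibility 1,
    every other class receives plausibility equal to the expert's doubt p_i."""
    pl = np.full(K, doubt, dtype=float)
    pl[k] = 1.0
    return pl

print(discounted_label(1, 0.3, 3))   # [0.3 1.  0.3]
```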


Simulations:

- Data = benchmark classification problems (UCI),
- the expert's doubt p_i is simulated with a Beta law,
- with probability p_i, the true label is flipped.

Remark: the expectation of p_i is the asymptotic labelling error rate.

Comparison: supervised learning, unsupervised learning, adaptive semi-supervised learning, label noise model.
Performance criterion: classification error.

Parameters:

- expectation of the Beta law: 0.1 → 0.4 (= fraction of false labels),
- variance of the Beta law kept constant at 0.2.

Results are averaged over 100 simulated learning sets and performance is assessed by 10-fold cross-validation.


Results Iris

Fig.: Classification error vs. the expert's labelling error rate on Iris (x axis: asymptotic label error rate, %; y axis: empirical classification error, %), comparing soft labels, supervised, unsupervised, semi-supervised and the label noise model.


Results Wine

Fig.: Classification error vs. the expert's labelling error rate on Wine (x axis: asymptotic label error rate, %; y axis: empirical classification error, %), comparing soft labels, supervised, unsupervised, semi-supervised and the label noise model.


Results Crabs

Fig.: Classification error vs. the expert's labelling error rate on Crabs (x axis: asymptotic label error rate, %; y axis: empirical classification error, %), comparing soft labels, supervised, unsupervised, semi-supervised and the label noise model.


Results Breast Cancer Wisconsin

Fig.: Classification error vs. the expert's labelling error rate on Breast Cancer Wisconsin (x axis: asymptotic label error rate, %; y axis: empirical classification error, %), comparing soft labels, supervised, unsupervised, semi-supervised and the label noise model.


Conclusions on experiment (2)

- Information on label reliability is useful in the presence of label noise.
- Our solution, based on mixture models with soft labels, can exploit such additional information efficiently.


Conclusion

Results:

- Definition of a generalized likelihood criterion.
- Proof of EM convergence when the likelihood is bounded and smooth.
- Relations with the probabilistic criteria.
- Validation with experiments.

Advantages of the method

- Can benefit from all the developments of the probabilistic framework (parsimonious models, model selection criteria such as BIC, ...).
- Can deal with any type of information on the labels.

Disadvantages of the method

- Generative setting: the distributional assumptions must hold for the method to work.


Future work

Performance assessment:

- Validation of classification results (how to extend the misclassification rate to soft labels).

Extension to other latent variable models:

- Latent trait analysis: continuous latent variables, discrete observations;
- Latent profile analysis: discrete latent variables, discrete observations;
- Independent factor analysis: mixed continuous and discrete latent variables, continuous observations.


References

C. Ambroise & G. Govaert. EM algorithm for partially known labels. In Proceedings of the 7th International Conference, IFCS, pages 161–166. Springer, 2000.

Y. Bengio & Y. Grandvalet. Semi-supervised learning by entropy minimization. Advances in Neural Information Processing Systems, vol. 17, 2005.

T. Denœux. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 5, pages 804–813, May 1995.

Z. Elouedi, K. Mellouli & P. Smets. Belief decision trees: theoretical foundations. International Journal of Approximate Reasoning, vol. 28, 2001.

Y. Grandvalet. Logistic regression for partial labels. In 9th International Conference IPMU'02, volume III, pages 1935–1941, 2002.

K. Nigam, A. McCallum, S. Thrun & T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, vol. 39, no. 2-3, pages 103–134, 2000.

P. Smets. Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem. International Journal of Approximate Reasoning, vol. 9, no. 1, pages 1–35, August 1993.

P. Smets. Belief functions on real numbers. International Journal of Approximate Reasoning, vol. 40, no. 3, pages 181–223, 2005.

P. Vannoorenberghe & P. Smets. Partially supervised learning by a credal EM approach. Springer, 2005.
