
Page 1: Sequential Data Modeling

Sequential Data Modeling

Tomoki Toda², Graham Neubig¹, Sakriani Sakti¹

¹ Augmented Human Communication Laboratory, Graduate School of Information Science, NAIST
² Information Technology Center / Graduate School of Information Science, Nagoya University

Page 2: Sequential Data Modeling

Review: Evaluation/Alignment/Training

Evaluation: given the model $\lambda$ and sequential data $\mathbf{x}$, the forward/backward algorithms compute the likelihood
$p(\mathbf{x} \mid \lambda) = \sum_{\text{all } \mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \lambda)$

Alignment (decoding): given the model $\lambda$ and sequential data $\mathbf{x}$, the Viterbi algorithm finds the state sequence
$\hat{\mathbf{z}} = \arg\max_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \lambda)$

Training: given sequential data $\mathbf{x}$, the Baum-Welch (i.e., EM) algorithm estimates the model parameter set
$\hat{\lambda} = \arg\max_{\lambda} p(\mathbf{x} \mid \lambda)$

Review: 1
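As a small illustration of the evaluation step, here is a Python sketch of the forward algorithm for a discrete-output HMM; the parameter values and the 0/1 output coding below are made up for this transcript and are not the example used later in the slides.

```python
import numpy as np

# Hypothetical 2-state HMM with two output symbols (0 and 1).
pi = np.array([0.6, 0.4])              # initial state probabilities
A = np.array([[0.7, 0.3],              # A[i, j] = P(state j at n | state i at n-1)
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],              # B[i, o] = P(output o | state i)
              [0.3, 0.7]])

def forward_likelihood(obs):
    """Return p(x | lambda) = sum over all state sequences, via the forward algorithm."""
    alpha = pi * B[:, obs[0]]          # alpha_1(s) = pi_s * B_s(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_n(s) = sum_s' alpha_{n-1}(s') A_{s',s} B_s(o_n)
    return alpha.sum()

print(forward_likelihood([0, 1, 1]))   # likelihood of a 3-sample sequence
```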

Page 3: Sequential Data Modeling

Example of Parameter Estimation (起: wake up, 寝: sleep)

Training data samples with state sequences:
• /s/ ⇒ state 1: 起 ⇒ state 1: 起 ⇒ state 2: 寝
• /s/ ⇒ state 2: 寝 ⇒ state 2: 起 ⇒ state 1: 起 ⇒ state 1: 寝
• /s/ ⇒ state 2: 寝 ⇒ state 2: 起

(States 1 and 2 can also be a final state.)

Number of observed samples:
• Initial state: state 1 once, state 2 twice
• Outputs: state 1 emits 起 3 times and 寝 once; state 2 emits 起 twice and 寝 3 times
• Transitions: 1→1 twice, 1→2 once, 2→1 once, 2→2 twice

Maximum likelihood estimates:
• Initial state probabilities: $\pi_1 = 1/(1+2)$, $\pi_2 = 2/(1+2)$
• Output probabilities: $B_1(起) = 3/(3+1)$, $B_1(寝) = 1/(3+1)$, $B_2(起) = 2/(2+3)$, $B_2(寝) = 3/(2+3)$
• Transition probabilities: $A_{1,1} = 2/(2+1)$, $A_{1,2} = 1/(2+1)$, $A_{2,1} = 1/(2+1)$, $A_{2,2} = 2/(2+1)$

Review: 2
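The counting in this example is easy to reproduce in code. The sketch below (added for this transcript; the variable names are not from the slides) accumulates initial-state, transition, and output counts from the three labeled sequences and normalizes them, giving the same fractions as above.

```python
from collections import Counter

# The three labeled training sequences from the slide, as (state, output) pairs.
# 起 = "wake up", 寝 = "sleep".
sequences = [
    [(1, "起"), (1, "起"), (2, "寝")],
    [(2, "寝"), (2, "起"), (1, "起"), (1, "寝")],
    [(2, "寝"), (2, "起")],
]

init, trans, emit = Counter(), Counter(), Counter()
for seq in sequences:
    init[seq[0][0]] += 1
    for (s, o) in seq:
        emit[(s, o)] += 1
    for (s_prev, _), (s, _) in zip(seq, seq[1:]):
        trans[(s_prev, s)] += 1

# ML estimates = relative frequencies (transitions normalized over states 1 and 2,
# as on the slide, i.e., not counting the move into the final state).
states, outputs = (1, 2), ("起", "寝")
pi = {s: init[s] / sum(init.values()) for s in states}
A = {(i, j): trans[(i, j)] / sum(trans[(i, k)] for k in states)
     for i in states for j in states}
B = {(s, o): emit[(s, o)] / sum(emit[(s, p)] for p in outputs)
     for s in states for o in outputs}
print(pi)  # {1: 1/3, 2: 2/3}
print(A)   # A[1,1]=2/3, A[1,2]=1/3, A[2,1]=1/3, A[2,2]=2/3
print(B)   # B[1,起]=3/4, B[1,寝]=1/4, B[2,起]=2/5, B[2,寝]=3/5
```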

Page 4: Sequential Data Modeling

Review: Lower Bound of HMM Likelihood

Log-scaled likelihood function for $U$ samples of sequential data:

$\sum_{u=1}^{U} \ln p(\mathbf{x}^{(u)} \mid \lambda) = \sum_{u=1}^{U} \ln \sum_{\text{all } \mathbf{z}^{(u)}} p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda) \geq \sum_{u=1}^{U} \sum_{\text{all } \mathbf{z}^{(u)}} q(\mathbf{z}^{(u)}) \ln \frac{p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda)}{q(\mathbf{z}^{(u)})} = \mathcal{L}(q, \lambda)$ (lower bound)

E-step: calculate the posterior probabilities of the latent variables (i.e., state sequences)

$\hat{q}(\mathbf{z}^{(u)}) = p(\mathbf{z}^{(u)} \mid \mathbf{x}^{(u)}, \lambda^{\mathrm{old}}) = \dfrac{p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda^{\mathrm{old}})}{p(\mathbf{x}^{(u)} \mid \lambda^{\mathrm{old}})}$

M-step: maximize the auxiliary function with respect to the model parameters

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = \sum_{u=1}^{U} \sum_{\text{all } \mathbf{z}^{(u)}} \hat{q}(\mathbf{z}^{(u)}) \ln p(\mathbf{x}^{(u)}, \mathbf{z}^{(u)} \mid \lambda^{\mathrm{new}})$

Review: 3

Page 5: Sequential Data Modeling

Review: E-Step

• Calculate the posterior probabilities of the latent variables.

$\gamma_n^{(u)}(s) = \hat{q}(z_n^{(u)} = s)$: expected # of samples observed in state $s$ at time $n$ in sample $u$

$\gamma_n^{(u)}(s) = p(z_n^{(u)} = s \mid \mathbf{x}^{(u)}, \lambda^{\mathrm{old}}) = \dfrac{p(\mathbf{x}^{(u)}, z_n^{(u)} = s \mid \lambda^{\mathrm{old}})}{p(\mathbf{x}^{(u)} \mid \lambda^{\mathrm{old}})}$

$\xi_n^{(u)}(s', s) = \hat{q}(z_{n-1}^{(u)} = s', z_n^{(u)} = s)$: expected # of samples moving from state $s'$ at time $n-1$ to state $s$ at time $n$ in sample $u$

$\xi_n^{(u)}(s', s) = p(z_{n-1}^{(u)} = s', z_n^{(u)} = s \mid \mathbf{x}^{(u)}, \lambda^{\mathrm{old}}) = \dfrac{p(\mathbf{x}^{(u)}, z_{n-1}^{(u)} = s', z_n^{(u)} = s \mid \lambda^{\mathrm{old}})}{p(\mathbf{x}^{(u)} \mid \lambda^{\mathrm{old}})}$, where the numerator is computed as $\alpha_{n-1}(s')\, A_{s',s}\, B_s(\mathbf{x}_n^{(u)})\, \beta_n(s)$.

[Trellis diagrams over the states at each time step illustrate which paths contribute to $\gamma_n^{(u)}(s)$ and $\xi_n^{(u)}(s', s)$.]

Review: 4
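A compact Python sketch of these E-step quantities, assuming a discrete-output HMM and the standard forward/backward recursions; the parameters at the bottom are placeholders rather than the slide example's values.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Posterior state probabilities gamma[n, s] and transition posteriors
    xi[n, s_prev, s] for one observation sequence (E-step quantities)."""
    N, S = len(obs), len(pi)
    alpha = np.zeros((N, S))
    beta = np.zeros((N, S))
    alpha[0] = pi * B[:, obs[0]]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, obs[n]]
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1])
    likelihood = alpha[-1].sum()                         # p(x | lambda)
    gamma = alpha * beta / likelihood                    # gamma[n, s]
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood  # xi[n, s', s]
    return gamma, xi, likelihood

# Hypothetical parameters for a 2-state, 2-symbol HMM (for illustration only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
gamma, xi, lik = forward_backward([0, 1, 1], pi, A, B)
```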

Page 6: Sequential Data Modeling

Review: M-Step

Auxiliary function:

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = \sum_{s=1}^{S} \gamma_s(n{=}1) \ln \pi_s + \sum_{s'=1}^{S} \sum_{s=1}^{S} \xi_{s',s} \ln A_{s',s} + \sum_{s=1}^{S} \sum_{\text{all "o"}} \gamma_s(\text{"o"}) \ln B_s(\text{"o"})$

where $\gamma_s(n{=}1)$, $\xi_{s',s}$, and $\gamma_s(\text{"o"})$ are the sufficient statistics accumulated from the E-step posteriors.

For each state, setting the derivatives to zero under the sum-to-one constraints gives the ML estimates:

Initial state probability: $\hat{\pi}_s = \dfrac{\gamma_s(n{=}1)}{\sum_{s'=1}^{S} \gamma_{s'}(n{=}1)}$

Transition probability: $\hat{A}_{s',s} = \dfrac{\xi_{s',s}}{\sum_{s=1}^{S} \xi_{s',s}}$

Output probability: $\hat{B}_s(\text{"o"}) = \dfrac{\gamma_s(\text{"o"})}{\sum_{\text{all "o"}} \gamma_s(\text{"o"})}$

Review: 5
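A matching M-step sketch, assuming per-sequence gamma/xi arrays such as those returned by the forward-backward sketch above (function and variable names are illustrative, not from the slides). Running an E-step and this M-step in a loop is one round of Baum-Welch.

```python
import numpy as np

def m_step(gammas, xis, obs_seqs, n_states, n_symbols):
    """Re-estimate (pi, A, B) from the E-step posteriors of several sequences.
    gammas[u][n, s] and xis[u][n, s_prev, s] are the per-sequence posteriors."""
    pi_stat = np.zeros(n_states)
    A_stat = np.zeros((n_states, n_states))
    B_stat = np.zeros((n_states, n_symbols))
    for gamma, xi, obs in zip(gammas, xis, obs_seqs):
        pi_stat += gamma[0]                       # gamma at time 1
        A_stat += xi.sum(axis=0)                  # sum over time of xi(s', s)
        for n, o in enumerate(obs):
            B_stat[:, o] += gamma[n]              # sum over frames emitting "o"
    pi_new = pi_stat / pi_stat.sum()
    A_new = A_stat / A_stat.sum(axis=1, keepdims=True)
    B_new = B_stat / B_stat.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new
```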

Page 7: Sequential Data Modeling

Review: Example of E-Step

Forward and backward probabilities for the observation sequence 起, 寝, 寝 (time n = 1, 2, 3), computed on a two-state trellis with an initial state and a pseudo final state.

[Figure: the trellis annotated with the forward probabilities, the backward probabilities, and their products at each node; at every time step the products sum over the two states to the sequence likelihood $p(\mathbf{x} \mid \lambda) = 0.12$.]

Review: 6

Page 8: Sequential Data Modeling

Review: Example of Posterior Probabilities

For the observation sequence 起, 寝, 寝 (time n = 1, 2, 3):

State occupancy posteriors (state s at time n):
• State 1: 0.05824/0.12 (n=1), 0.0192/0.12 (n=2), 0.012/0.12 (n=3)
• State 2: 0.06176/0.12 (n=1), 0.1008/0.12 (n=2), 0.108/0.12 (n=3)

Transition posteriors (state s' → state s):
• 1→1: 0.01792/0.12 (n=1→2), 0.0084/0.12 (n=2→3)
• 1→2: 0.04032/0.12 (n=1→2), 0.0108/0.12 (n=2→3)
• 2→1: 0.00128/0.12 (n=1→2), 0.0036/0.12 (n=2→3)
• 2→2: 0.06048/0.12 (n=1→2), 0.0972/0.12 (n=2→3)

Calculate these posterior probabilities (= expected # of samples) sequence by sequence.

Review: 7

Page 9: Sequential Data Modeling

Review: Example of Sufficient Statistics

The posterior probabilities (= expected # of samples) of the previous slide are accumulated into sufficient statistics (= expected # of samples for each parameter):

• Initial state: $\gamma_1(n{=}1) = 0.05824/0.12$, $\gamma_2(n{=}1) = 0.06176/0.12$
• Transitions: $\xi_{1,1} = 0.02632/0.12$, $\xi_{1,2} = 0.05112/0.12$, $\xi_{2,1} = 0.00488/0.12$, $\xi_{2,2} = 0.15768/0.12$
• Outputs: $\gamma_1(起) = 0.05824/0.12$, $\gamma_1(寝) = 0.0312/0.12$, $\gamma_2(起) = 0.06176/0.12$, $\gamma_2(寝) = 0.2088/0.12$

Review: 8

Page 10: Sequential Data Modeling

Review: Example of ML Estimates

Sufficient statistics (expected # of samples for each parameter):
• Initial state: $\gamma_1(n{=}1) = 0.05824/0.12$, $\gamma_2(n{=}1) = 0.06176/0.12$
• Transitions: $\xi_{1,1} = 0.02632/0.12$, $\xi_{1,2} = 0.05112/0.12$, $\xi_{2,1} = 0.00488/0.12$, $\xi_{2,2} = 0.15768/0.12$
• Outputs: $\gamma_1(起) = 0.05824/0.12$, $\gamma_1(寝) = 0.0312/0.12$, $\gamma_2(起) = 0.06176/0.12$, $\gamma_2(寝) = 0.2088/0.12$

ML estimates:

$\hat{\pi}_1 = \dfrac{\gamma_1(n{=}1)}{\gamma_1(n{=}1) + \gamma_2(n{=}1)} \approx 0.49$, $\hat{\pi}_2 = \dfrac{\gamma_2(n{=}1)}{\gamma_1(n{=}1) + \gamma_2(n{=}1)} \approx 0.51$

$\hat{A}_{1,1} = \dfrac{\xi_{1,1}}{\xi_{1,1} + \xi_{1,2}} \approx 0.34$, $\hat{A}_{1,2} = \dfrac{\xi_{1,2}}{\xi_{1,1} + \xi_{1,2}} \approx 0.66$, $\hat{A}_{2,1} = \dfrac{\xi_{2,1}}{\xi_{2,1} + \xi_{2,2}} \approx 0.03$, $\hat{A}_{2,2} = \dfrac{\xi_{2,2}}{\xi_{2,1} + \xi_{2,2}} \approx 0.97$

$\hat{B}_1(起) = \dfrac{\gamma_1(起)}{\gamma_1(起) + \gamma_1(寝)} \approx 0.65$, $\hat{B}_1(寝) = \dfrac{\gamma_1(寝)}{\gamma_1(起) + \gamma_1(寝)} \approx 0.35$, $\hat{B}_2(起) = \dfrac{\gamma_2(起)}{\gamma_2(起) + \gamma_2(寝)} \approx 0.23$, $\hat{B}_2(寝) = \dfrac{\gamma_2(寝)}{\gamma_2(起) + \gamma_2(寝)} \approx 0.77$

Review: 9
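These estimates can be checked numerically. The short script below (added for this transcript) recomputes them from the sufficient statistics listed above; the common denominator 0.12 cancels in every ratio.

```python
# Sufficient statistics from the slide (all share the denominator p(x|lambda) = 0.12,
# which cancels in every ratio below).
gamma_init = {1: 0.05824, 2: 0.06176}
xi = {(1, 1): 0.02632, (1, 2): 0.05112, (2, 1): 0.00488, (2, 2): 0.15768}
gamma_out = {(1, "起"): 0.05824, (1, "寝"): 0.0312,
             (2, "起"): 0.06176, (2, "寝"): 0.2088}

pi = {s: gamma_init[s] / sum(gamma_init.values()) for s in (1, 2)}
A = {(i, j): xi[(i, j)] / (xi[(i, 1)] + xi[(i, 2)]) for i in (1, 2) for j in (1, 2)}
B = {(s, o): gamma_out[(s, o)] / (gamma_out[(s, "起")] + gamma_out[(s, "寝")])
     for s in (1, 2) for o in ("起", "寝")}

print(pi)  # pi_1 ~ 0.49, pi_2 ~ 0.51
print(A)   # A_11 ~ 0.34, A_12 ~ 0.66, A_21 ~ 0.03, A_22 ~ 0.97
print(B)   # B_1(起) ~ 0.65, B_1(寝) ~ 0.35, B_2(起) ~ 0.23, B_2(寝) ~ 0.77
```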

Page 11: Sequential Data Modeling

Sequential Data Modeling
5th class: "Continuous Latent Variable Model 1"

Tomoki Toda
Information Technology Center / Graduate School of Information Science, Nagoya University

Page 12: Sequential Data Modeling

Basic Techniques

• Discrete latent variables: the mixture model (e.g., GMM); adding a Markov dependency between the latent variables gives the hidden Markov model (HMM).
• Continuous latent variables: factor analysis (FA); adding a Markov dependency gives linear dynamical systems (LDS).

[Graphical models: in each case, latent variables $\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3$ generate observations $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$; in the Markov models the latent variables additionally form a chain $\mathbf{z}_1 \rightarrow \mathbf{z}_2 \rightarrow \mathbf{z}_3$.]

Page 13: Sequential Data Modeling

Continuous Latent Variables (from PCA to FA)

Page 14: Sequential Data Modeling

Example of High-Dimensional Data

• Example: hand-written digits
• Each image has 100 x 100 = 10,000 pixels, i.e., it is represented as a 10,000-dimensional vector.
• Each image is represented as one point in the 10,000-dimensional space.

However, the number of degrees of freedom of the variability would be limited (e.g., only vertical and horizontal translations and rotations: 3 degrees).

Can we find a lower-dimensional subspace on which the data points live?

Page 15: Sequential Data Modeling

Extraction of Synthetic Variables

• Synthesize new variables by linearly combining observable variables, e.g., from 2-dimensional observation data to one-dimensional data:

$y_n = 0.5\, x_{n,1} + x_{n,2}$

represented by an inner product: $y_n = \mathbf{w}^\top \mathbf{x}_n$, where $\mathbf{w} = [0.5,\ 1]^\top$ and $\mathbf{x}_n = [x_{n,1},\ x_{n,2}]^\top$

[Figure: a scatter plot of 2-dimensional data (1st and 2nd dimensions), with each point projected onto the direction $\mathbf{w}$.]

Page 16: Sequential Data Modeling

Principal Component Analysis (PCA)

• How can we extract a synthetic variable that most effectively represents the observable variables?
• Determine a unit vector $\mathbf{u}$ by maximizing the variance of the synthetic variables.

Synthetic variable: $y_n = \mathbf{u}^\top (\mathbf{x}_n - \boldsymbol{\mu})$, where $\boldsymbol{\mu}$ is the mean vector, so the mean value of $y_n$ is 0.

Its variance: $\frac{1}{N} \sum_{n=1}^{N} y_n^2 = \mathbf{u}^\top \mathbf{S} \mathbf{u}$

Constraint $\mathbf{u}^\top \mathbf{u} = 1$: i.e., $\mathbf{u}$ is a unit vector (length = 1).

Page 17: Sequential Data Modeling

Eigenvalue Problem

• Maximization of the variance of the synthetic variable, with

$\boldsymbol{\mu} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n, \qquad \mathbf{S} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu})(\mathbf{x}_n - \boldsymbol{\mu})^\top$

Maximize the following objective function with respect to $\mathbf{u}$ (variance plus constraint, with Lagrange multiplier $\lambda$):

$F = \mathbf{u}^\top \mathbf{S} \mathbf{u} + \lambda \left(1 - \mathbf{u}^\top \mathbf{u}\right)$

Setting $\frac{\partial F}{\partial \mathbf{u}} = \mathbf{0}$ gives the eigenvalue problem: $\mathbf{S}\mathbf{u} = \lambda\mathbf{u}$ (eigenvector $\mathbf{u}$, eigenvalue $\lambda$)
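A small numpy sketch of this eigenvalue problem on made-up 2-D data; it also checks the statement on the next slide that the variance of the first principal component equals the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([3.0, 5.0], [[4.0, 1.5], [1.5, 1.0]], size=500)  # toy 2-D data

mu = X.mean(axis=0)
S = np.cov(X - mu, rowvar=False, bias=True)      # S = (1/N) sum (x - mu)(x - mu)^T

# Solving S u = lambda u; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]                              # eigenvector of the largest eigenvalue

y = (X - mu) @ u1                                # 1st principal component y_n = u1^T (x_n - mu)
print(np.isclose(y.var(), eigvals[-1]))          # its variance equals the largest eigenvalue
```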

Page 18: Sequential Data Modeling

Eigenvector and Eigenvalue

• Eigenvalue problem $\mathbf{S}\mathbf{u} = \lambda\mathbf{u}$: the eigenvector gives the direction, and the variance of the corresponding synthetic variable equals the eigenvalue.
• The eigenvector $\mathbf{u}_1$ for the largest eigenvalue represents the direction that maximizes the variance of a synthetic variable.
• Synthetic variable with the eigenvector (= principal component): $y_{n,1} = \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$; its mean = 0 and its variance = eigenvalue $\lambda_1$.

Page 19: Sequential Data Modeling

Projection onto Low-Dimensional Space

• Extraction of multiple eigenvectors, e.g., $\mathbf{u}_1, \mathbf{u}_2$: orthonormal vectors with constraints $\mathbf{u}_1^\top\mathbf{u}_1 = \mathbf{u}_2^\top\mathbf{u}_2 = 1$ and $\mathbf{u}_1^\top\mathbf{u}_2 = 0$.
• Represent high-dimensional data $\mathbf{x}_n$ with low-dimensional data $\mathbf{y}_n$ (i.e., principal components):

1st principal component: $y_{n,1} = \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$, the synthetic variable with the eigenvector for the largest eigenvalue

2nd principal component: $y_{n,2} = \mathbf{u}_2^\top (\mathbf{x}_n - \boldsymbol{\mu})$, the synthetic variable with the eigenvector for the 2nd largest eigenvalue

Mean vector of $\mathbf{y}_n$: $\mathbf{0}$; covariance matrix: $\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$

Page 20: Sequential Data Modeling

Whitening Transformation

$\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$

$\mathbf{y}_n = \mathbf{U}^\top (\mathbf{x}_n - \boldsymbol{\mu})$: mean vector $\mathbf{0}$, covariance $\boldsymbol{\Lambda}$

Whitening: $\mathbf{z}_n = \boldsymbol{\Lambda}^{-1/2} \mathbf{y}_n = \boldsymbol{\Lambda}^{-1/2} \mathbf{U}^\top (\mathbf{x}_n - \boldsymbol{\mu})$: mean vector $\mathbf{0}$, covariance $\mathbf{I}$

Linear transform for whitening: $\boldsymbol{\Lambda}^{-1/2} \mathbf{U}^\top$
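A numpy sketch of the whitening transform, assuming made-up 3-D Gaussian data; after the transform the sample mean is (approximately) the zero vector and the sample covariance is (approximately) the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([1.0, -2.0, 0.5], np.diag([3.0, 1.0, 0.5]) + 0.4, size=2000)

mu = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)
lam, U = np.linalg.eigh(S)                       # S = U diag(lam) U^T

Z = (X - mu) @ U / np.sqrt(lam)                  # z_n = Lambda^{-1/2} U^T (x_n - mu)
print(np.round(Z.mean(axis=0), 3))               # ~ zero mean vector
print(np.round(np.cov(Z, rowvar=False, bias=True), 3))  # ~ identity covariance
```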

Page 21: Sequential Data Modeling

Continuous Latent Variables (from PCA to FA)

Page 22: Sequential Data Modeling

Whitening Process with PCA

Linear transformation for whitening, from the high-dimensional space to the low-dimensional space:

$z_n = \lambda_1^{-1/2}\, \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$

Observation data $\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$. Low-dimensional data $z_n$: mean 0, variance 1.

1. Dimension reduction
2. Processing of the low-dimensional data, e.g., probability density modeling

Regarding the low-dimensional data as the observation data ignores the errors caused by the linear transformation, i.e., we are unable to model the probability density of the original observation data.

Page 23: Sequential Data Modeling

Basic Idea of Factor Analysis (FA)

A linear transformation from the low-dimensional space to the high-dimensional space:

1. Low-dimensional data generation: $z_n$ (mean 0, variance 1)
2. Projection onto the subspace: $\hat{\mathbf{x}}_n = \lambda_1^{1/2}\, \mathbf{u}_1 z_n + \boldsymbol{\mu}$ (projected data $\hat{\mathbf{x}}_n$)
3. Random noise addition: $\mathbf{x}_n = \hat{\mathbf{x}}_n + \mathbf{e}_n$ (observation data $\mathbf{x}_n$)

Page 24: Sequential Data Modeling

Comparison between PCA and FA

• FA is capable of defining the p.d.f. of the observation data based on an inversion process of the whitening transformation.

Whitening with PCA: $z_n = \lambda_1^{-1/2}\, \mathbf{u}_1^\top (\mathbf{x}_n - \boldsymbol{\mu})$ (observation data $\mathbf{x}_n$: mean vector $\boldsymbol{\mu}$, covariance $\mathbf{S}$; low-dimensional data $z_n$: mean 0, variance 1)

Factor analysis (FA): $\hat{\mathbf{x}}_n = \lambda_1^{1/2}\, \mathbf{u}_1 z_n + \boldsymbol{\mu}$, where the low-dimensional data $z_n$ and the error $\mathbf{x}_n - \hat{\mathbf{x}}_n$ are both modeled as random variables.

Page 25: Sequential Data Modeling

Representation of Observation Data w/ FA

• Representation of the observation data: $\mathbf{x}_n = \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu} + \mathbf{e}_n$, with loading matrix $\mathbf{W}$, factors (low-dimensional data) $\mathbf{z}_n$, and observation noise $\mathbf{e}_n$.

Observation model given the factors: $p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda) = \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$

Observation noise: $p(\mathbf{e}_n \mid \lambda) = \mathcal{N}(\mathbf{e}_n;\, \mathbf{0},\, \boldsymbol{\Sigma})$

Factors: $p(\mathbf{z}_n \mid \lambda) = \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})$
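The generative process described by these densities can be written down directly. The sketch below samples from a hypothetical FA model (the loading matrix, mean, and noise covariance are invented for illustration); the same toy parameters are reused in the later sketches.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical FA parameters: D = 3 observed dims, M = 2 factors.
W = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.5]])              # loading matrix
mu = np.array([2.0, -1.0, 0.5])         # mean vector
Sigma = np.diag([0.3, 0.2, 0.4])        # diagonal noise covariance

N = 5000
Z = rng.standard_normal((N, 2))                          # z_n ~ N(0, I)
E = rng.multivariate_normal(np.zeros(3), Sigma, size=N)  # e_n ~ N(0, Sigma)
X = Z @ W.T + mu + E                                     # x_n = W z_n + mu + e_n
# X now holds N samples drawn from the FA generative model.
```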

Page 26: Sequential Data Modeling

Marginalization over Latent Variables

If one sample $z^{(1)} \sim p(z \mid \lambda)$ is generated: $p(\mathbf{x}_n \mid z^{(1)}, \lambda) = \mathcal{N}(\mathbf{x}_n;\, \mathbf{w}z^{(1)} + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$

If $K$ samples $z^{(1)}, \dots, z^{(K)} \sim p(z \mid \lambda)$ are generated: $p(\mathbf{x}_n \mid \lambda) \approx \frac{1}{K} \sum_{k=1}^{K} \mathcal{N}(\mathbf{x}_n;\, \mathbf{w}z^{(k)} + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$

If an infinite # of samples are generated: $p(\mathbf{x}_n \mid \lambda) = \int p(\mathbf{x}_n \mid z_n, \lambda)\, p(z_n \mid \lambda)\, \mathrm{d}z_n$

Page 27: Sequential Data Modeling

Derivation of p.d.f. of Observation Data

• Derived by marginalizing the joint p.d.f. over the factors, which are regarded as a latent variable:

$p(\mathbf{x}_n \mid \lambda) = \int p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda)\, p(\mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n = \int \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I})\, \mathrm{d}\mathbf{z}_n = \mathcal{N}(\mathbf{x}_n;\, \boldsymbol{\mu},\, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$

Expectations:

$\langle \mathbf{x}_n \rangle = \int (\mathbf{W}\mathbf{z} + \boldsymbol{\mu})\, \mathcal{N}(\mathbf{z};\, \mathbf{0},\, \mathbf{I})\, \mathrm{d}\mathbf{z} = \boldsymbol{\mu}$ (= mean vector)

$\langle \mathbf{x}_n \mathbf{x}_n^\top \rangle = \int \left( \boldsymbol{\Sigma} + (\mathbf{W}\mathbf{z} + \boldsymbol{\mu})(\mathbf{W}\mathbf{z} + \boldsymbol{\mu})^\top \right) \mathcal{N}(\mathbf{z};\, \mathbf{0},\, \mathbf{I})\, \mathrm{d}\mathbf{z} = \boldsymbol{\Sigma} + \mathbf{W}\mathbf{W}^\top + \boldsymbol{\mu}\boldsymbol{\mu}^\top$ (= covariance matrix + squared mean vector)
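A quick Monte Carlo check of this marginal, using the same hypothetical FA parameters as above: the empirical mean and covariance of samples drawn from the generative model should approach $\boldsymbol{\mu}$ and $\mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical FA parameters (same shapes as before: 3 observed dims, 2 factors).
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.5]])
mu = np.array([2.0, -1.0, 0.5])
Sigma = np.diag([0.3, 0.2, 0.4])

# Sample from the generative model and compare the empirical moments with
# the analytical marginal N(x; mu, W W^T + Sigma).
N = 200000
Z = rng.standard_normal((N, 2))
E = rng.multivariate_normal(np.zeros(3), Sigma, size=N)
X = Z @ W.T + mu + E

print(np.round(X.mean(axis=0), 2))            # ~ mu
print(np.round(np.cov(X, rowvar=False), 2))   # ~ W @ W.T + Sigma
print(np.round(W @ W.T + Sigma, 2))
```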

Page 28: Sequential Data Modeling

Comparison between GMM and FA

GMM (discrete latent variables): $p(\mathbf{x}_n \mid \lambda) = \sum_{m=1}^{M} p(z_n = m \mid \lambda)\, p(\mathbf{x}_n \mid z_n = m, \lambda)$ (the prior is a discrete distribution)

FA (continuous latent variables): $p(\mathbf{x}_n \mid \lambda) = \int p(\mathbf{z}_n \mid \lambda)\, p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda)\, \mathrm{d}\mathbf{z}_n$ (the prior is a Gaussian distribution)

Page 29: Sequential Data Modeling

Model Training (Parameter Optimization)

Page 30: Sequential Data Modeling

Maximum Likelihood (ML) Estimation

• Log-scaled likelihood function:

$\ln p(\mathbf{X} \mid \lambda) = \sum_{n=1}^{N} \ln \int p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{x}_n;\, \boldsymbol{\mu},\, \mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma})$

• ML estimates of the model parameters:

Mean vector: $\left.\frac{\partial \ln p(\mathbf{X} \mid \lambda)}{\partial \boldsymbol{\mu}}\right|_{\lambda = \hat{\lambda}} = \mathbf{0}$ (a linear equation!) gives $\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n$

Loading matrix: $\left.\frac{\partial \ln p(\mathbf{X} \mid \lambda)}{\partial \mathbf{W}}\right|_{\lambda = \hat{\lambda}} = \mathbf{0}$ (nonlinear equations…) $\hat{\mathbf{W}} = \,?$

Covariance matrix: $\left.\frac{\partial \ln p(\mathbf{X} \mid \lambda)}{\partial \boldsymbol{\Sigma}}\right|_{\lambda = \hat{\lambda}} = \mathbf{0}$ (nonlinear equations…) $\hat{\boldsymbol{\Sigma}} = \,?$

How to determine the ML estimates of these parameters?

Page 31: Sequential Data Modeling

Lower Bound of Likelihood Function

• Derivation of a lower bound of the log-scaled likelihood function, using a probability density function $q(\mathbf{z}_n)$ of the latent variables and Jensen's inequality:

$\ln p(\mathbf{X} \mid \lambda) = \sum_{n=1}^{N} \ln \int p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \ln \int q(\mathbf{z}_n) \frac{p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n \geq \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln \frac{p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n = \mathcal{L}(q, \lambda)$

Lower bound: $\mathcal{L}(q, \lambda) = \sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln \frac{p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)}{q(\mathbf{z}_n)}\, \mathrm{d}\mathbf{z}_n$

Page 32: Sequential Data Modeling

EM Algorithm

• Maximization of the lower bound (a functional of $q$ and a function of $\lambda$):

$\mathcal{L}(q, \lambda) = \ln p(\mathbf{X} \mid \lambda) - \sum_{n=1}^{N} \mathrm{KL}\left[\, q(\mathbf{z}_n) \,\|\, p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda) \,\right]$

Maximize the lower bound with respect to $q$: drive the KL divergence to 0, so that the bound touches the log-scaled likelihood $\ln p(\mathbf{X} \mid \lambda)$.

Maximize the lower bound with respect to $\lambda$ (= {$\mathbf{W}$, $\boldsymbol{\Sigma}$}): only the auxiliary function

$\sum_{n=1}^{N} \int q(\mathbf{z}_n) \ln p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda)\, \mathrm{d}\mathbf{z}_n$

depends on $\lambda$.

Page 33: Sequential Data Modeling

Review: Schematic Image of EM Algorithm

[Figure: the log-scaled likelihood function $\ln p(\mathbf{X} \mid \lambda)$ plotted over $\lambda$, with the lower bounds $\mathcal{L}(q^{(i)}, \lambda)$ and $\mathcal{L}(q^{(i+1)}, \lambda)$ touching it at $\lambda^{(i)}$ and $\lambda^{(i+1)}$.]

0. Current model parameter set
1. E-step: determine the lower bound based on the current model parameters
2. M-step: update the model parameters based on the lower bound
3. E-step: repeat

Page 34: Sequential Data Modeling

E-Step: Update q

• Set the KL divergence to 0 under the fixed model parameters $\lambda^{\mathrm{old}}$:

$\hat{q}(\mathbf{z}_n) = p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}})$, so that $\sum_{n=1}^{N} \mathrm{KL}\left[\, \hat{q}(\mathbf{z}_n) \,\|\, p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}}) \,\right] = 0$ and $\mathcal{L}(\hat{q}, \lambda^{\mathrm{old}}) = \ln p(\mathbf{X} \mid \lambda^{\mathrm{old}})$.

Calculate the posterior probability density of the latent variables for each sample:

$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}}) = \dfrac{p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda^{\mathrm{old}})\, p(\mathbf{z}_n \mid \lambda^{\mathrm{old}})}{\int p(\mathbf{x}_n \mid \mathbf{z}_n, \lambda^{\mathrm{old}})\, p(\mathbf{z}_n \mid \lambda^{\mathrm{old}})\, \mathrm{d}\mathbf{z}_n} = \frac{1}{Z}\, \mathcal{N}(\mathbf{x}_n;\, \mathbf{W}\mathbf{z}_n + \boldsymbol{\mu},\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I}) = \mathcal{N}(\mathbf{z}_n;\, ?,\, ?)$

Page 35: Sequential Data Modeling

Posterior Probability Density Function

$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}}) \propto \exp\left\{ -\frac{1}{2}\left( \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \mathbf{z}_n - 2\, \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \boldsymbol{\mu}_n^{(z|x)} \right) + \mathrm{const} \right\} \propto \exp\left\{ -\frac{1}{2} \left( \mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)} \right)^\top \boldsymbol{\Sigma}^{(z|x)-1} \left( \mathbf{z}_n - \boldsymbol{\mu}_n^{(z|x)} \right) \right\}$

(* See Appendix 1 for the derivation.)

Posterior probability density function of the latent variables:

$p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}}) = \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$

Covariance matrix: $\boldsymbol{\Sigma}^{(z|x)} = \left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I} \right)^{-1}$ (sample-independent; a full matrix)

Mean vector: $\boldsymbol{\mu}_n^{(z|x)} = \boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n = \mathbf{A}\hat{\mathbf{x}}_n$, where $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$ (sample-dependent; a linear transformation of $\hat{\mathbf{x}}_n$)
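A numpy sketch of this E-step for a whole data set at once, assuming the hypothetical FA parameters used earlier; `fa_posterior` is an illustrative name, not from the slides.

```python
import numpy as np

def fa_posterior(X, W, mu, Sigma):
    """E-step of factor analysis: posterior N(z_n; mean_post[n], cov_post)
    for every observation, following the formulas on this slide."""
    Sigma_inv = np.linalg.inv(Sigma)
    M = W.shape[1]
    cov_post = np.linalg.inv(W.T @ Sigma_inv @ W + np.eye(M))   # sample-independent
    A = cov_post @ W.T @ Sigma_inv                              # linear transform
    mean_post = (X - mu) @ A.T                                  # row n = posterior mean of z_n
    return mean_post, cov_post

# Illustrative parameters (same hypothetical model as in the earlier sketches).
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.5]])
mu = np.array([2.0, -1.0, 0.5])
Sigma = np.diag([0.3, 0.2, 0.4])
X = np.array([[2.5, -0.5, 1.0], [1.0, -2.0, 0.0]])
mean_post, cov_post = fa_posterior(X, W, mu, Sigma)
```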

Page 36: Sequential Data Modeling

Schematic Image of E-Step

Posterior p.d.f. of the latent variables: $p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}}) = \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$ with covariance matrix $\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I} \right)^{-1}$ and mean vector $\boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n - \boldsymbol{\mu})$.

It is computed for every observation data sample:
• 1st data sample $\mathbf{x}_1$: $p(\mathbf{z}_1 \mid \mathbf{x}_1, \lambda^{\mathrm{old}}) = \mathcal{N}(\mathbf{z}_1;\, \boldsymbol{\mu}_1^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$
• 2nd data sample $\mathbf{x}_2$: $p(\mathbf{z}_2 \mid \mathbf{x}_2, \lambda^{\mathrm{old}}) = \mathcal{N}(\mathbf{z}_2;\, \boldsymbol{\mu}_2^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$
• …

Page 37: Sequential Data Modeling

M-Step: Update λ

• Maximize the auxiliary function with respect to the model parameters $\lambda^{\mathrm{new}}$.

With $\hat{q}(\mathbf{z}_n) = p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}})$ fixed, maximizing the lower bound $\mathcal{L}(\hat{q}, \lambda^{\mathrm{new}})$ amounts to maximizing the auxiliary function

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = \sum_{n=1}^{N} \int p(\mathbf{z}_n \mid \mathbf{x}_n, \lambda^{\mathrm{old}}) \ln p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda^{\mathrm{new}})\, \mathrm{d}\mathbf{z}_n = \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \ln \mathcal{N}(\hat{\mathbf{x}}_n;\, \mathbf{W}\mathbf{z}_n,\, \boldsymbol{\Sigma})\, \mathrm{d}\mathbf{z}_n + \mathrm{const} = \,?$

Page 38: Sequential Data Modeling

Expansion of Auxiliary Function

(* See Appendix 2 for more details.)

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \ln \mathcal{N}(\hat{\mathbf{x}}_n;\, \mathbf{W}\mathbf{z}_n,\, \boldsymbol{\Sigma})\, \mathrm{d}\mathbf{z}_n$

$= \sum_{n=1}^{N} \left[ -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \langle \mathbf{z}_n \rangle - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) \right] + \mathrm{const}$

where $\langle \mathbf{z}_n \rangle$ and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$ are the expectations of $\mathbf{z}_n$ and $\mathbf{z}_n \mathbf{z}_n^\top$ under $\mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$

$= -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_{n=1}^{N} \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top \right) - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) + \mathrm{const}$

i.e., it depends on the data only through the # of samples $N$ and the summations of $\hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top$, $\langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top$, and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$.

Page 39: Sequential Data Modeling

Sufficient Statistics

Analytical calculation of the expectations:

$\langle \mathbf{z}_n \rangle = \int \mathbf{z}_n\, \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\mu}_n^{(z|x)}$

$\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle = \int \mathbf{z}_n \mathbf{z}_n^\top\, \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\Sigma}^{(z|x)} + \boldsymbol{\mu}_n^{(z|x)} \boldsymbol{\mu}_n^{(z|x)\top}$

Sufficient statistics:
• # of samples: $N$
• Sum of squared samples: $\sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top$ (only its diagonal is needed for a diagonal $\boldsymbol{\Sigma}$)
• Sum of expectations of squared latent variables: $\sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$
• Sum of cross terms: $\sum_{n=1}^{N} \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top$

Page 40: Sequential Data Modeling

ML Estimates

Auxiliary function:

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top \right) - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) + \mathrm{const}$

ML estimate of the loading matrix (setting $\partial \mathcal{Q} / \partial \mathbf{W} = \mathbf{0}$, a linear equation!):

$\boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top - \boldsymbol{\Sigma}^{-1} \hat{\mathbf{W}} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle = \mathbf{0} \quad\Rightarrow\quad \hat{\mathbf{W}} = \left( \sum_n \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top \right) \left( \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right)^{-1}$

ML estimate of the covariance matrix (setting $\partial \mathcal{Q} / \partial \boldsymbol{\Sigma}^{-1} = \mathbf{0}$):

$\hat{\boldsymbol{\Sigma}} = \frac{1}{N}\, \mathrm{diag}\left\{ \sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top - \hat{\mathbf{W}} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top \right\}$
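Putting the E-step and M-step formulas together gives the following EM sketch for factor analysis with a diagonal noise covariance (written for this transcript, with invented names and toy data, not the original implementation). The final print compares the learned marginal covariance $\mathbf{W}\mathbf{W}^\top + \boldsymbol{\Sigma}$ with the one used to generate the data, since $\mathbf{W}$ itself is only identifiable up to a rotation of the factors.

```python
import numpy as np

def fa_em(X, n_factors, n_iter=100, seed=0):
    """EM for factor analysis with a diagonal noise covariance, following the
    E-step/M-step formulas in these slides (a sketch, not the original code)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)                      # closed-form ML estimate of the mean
    Xc = X - mu                              # x_hat_n = x_n - mu
    W = rng.standard_normal((D, n_factors)) * 0.1
    Sigma = np.diag(Xc.var(axis=0))
    for _ in range(n_iter):
        # E-step: posterior moments <z_n> and sum_n <z_n z_n^T>
        Sigma_inv = np.linalg.inv(Sigma)
        cov_post = np.linalg.inv(W.T @ Sigma_inv @ W + np.eye(n_factors))
        Ez = Xc @ (cov_post @ W.T @ Sigma_inv).T          # row n = <z_n>
        sum_zz = N * cov_post + Ez.T @ Ez                 # sum_n <z_n z_n^T>
        sum_xz = Xc.T @ Ez                                # sum_n x_hat_n <z_n>^T
        # M-step: update W and the diagonal Sigma
        W = sum_xz @ np.linalg.inv(sum_zz)
        Sigma = np.diag(np.diag(Xc.T @ Xc - W @ sum_xz.T)) / N
    return mu, W, Sigma

# Toy usage on data generated from the hypothetical FA model used earlier.
rng = np.random.default_rng(4)
W_true = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.5]])
X = rng.standard_normal((2000, 2)) @ W_true.T + np.array([2.0, -1.0, 0.5]) \
    + rng.standard_normal((2000, 3)) * np.sqrt([0.3, 0.2, 0.4])
mu, W, Sigma = fa_em(X, n_factors=2)
print(np.round(W @ W.T + Sigma, 2))   # ~ W_true @ W_true.T + diag([0.3, 0.2, 0.4])
```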

Page 41: Sequential Data Modeling

App. 1: Derivation of Posterior p.d.f.

$p(\mathbf{x}_n, \mathbf{z}_n \mid \lambda^{\mathrm{old}}) = \mathcal{N}(\hat{\mathbf{x}}_n;\, \mathbf{W}\mathbf{z}_n,\, \boldsymbol{\Sigma})\, \mathcal{N}(\mathbf{z}_n;\, \mathbf{0},\, \mathbf{I}) \propto \exp\left\{ -\frac{1}{2} (\hat{\mathbf{x}}_n - \mathbf{W}\mathbf{z}_n)^\top \boldsymbol{\Sigma}^{-1} (\hat{\mathbf{x}}_n - \mathbf{W}\mathbf{z}_n) \right\} \exp\left\{ -\frac{1}{2} \mathbf{z}_n^\top \mathbf{z}_n \right\}$, where $\hat{\mathbf{x}}_n = \mathbf{x}_n - \boldsymbol{\mu}$.

Inside of exp( ), collecting the terms in $\mathbf{z}_n$:

$-\frac{1}{2}\left( \mathbf{z}_n^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \mathbf{z}_n + \mathbf{z}_n^\top \mathbf{z}_n - 2\, \mathbf{z}_n^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n \right) + \mathrm{const}$

$= -\frac{1}{2}\left( \mathbf{z}_n^\top \left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I} \right) \mathbf{z}_n - 2\, \mathbf{z}_n^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n \right) + \mathrm{const}$  [$\boldsymbol{\Sigma}^{(z|x)-1} = \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} + \mathbf{I}$]

$= -\frac{1}{2}\left( \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \mathbf{z}_n - 2\, \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n \right) + \mathrm{const}$  [$\boldsymbol{\mu}_n^{(z|x)} = \boldsymbol{\Sigma}^{(z|x)} \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n$]

$= -\frac{1}{2}\left( \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \mathbf{z}_n - 2\, \mathbf{z}_n^\top \boldsymbol{\Sigma}^{(z|x)-1} \boldsymbol{\mu}_n^{(z|x)} \right) + \mathrm{const}$

Appendix: 1

Page 42: Sequential Data Modeling

App. 2: Expansion of Auxiliary Function (1)

Auxiliary function:

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \ln \mathcal{N}(\hat{\mathbf{x}}_n;\, \mathbf{W}\mathbf{z}_n,\, \boldsymbol{\Sigma})\, \mathrm{d}\mathbf{z}_n$

$= \sum_{n=1}^{N} \int \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)}) \left[ -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \mathbf{z}_n^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W}\, \mathbf{z}_n \mathbf{z}_n^\top \right) \right] \mathrm{d}\mathbf{z}_n + \mathrm{const}$ (a quadratic form in $\mathbf{z}_n$)

$= \sum_{n=1}^{N} \left[ -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \langle \mathbf{z}_n \rangle^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) \right] + \mathrm{const}$

where $\langle \mathbf{z}_n \rangle$ is the expectation of $\mathbf{z}_n$ and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$ is the expectation of $\mathbf{z}_n \mathbf{z}_n^\top$ under $\mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})$.

Appendix: 2

Page 43: Sequential Data Modeling

App. 2: Expansion of Auxiliary Function (2)

Analytical calculation of the expectations:

$\langle \mathbf{z}_n \rangle = \int \mathbf{z}_n\, \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\mu}_n^{(z|x)}, \qquad \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle = \int \mathbf{z}_n \mathbf{z}_n^\top\, \mathcal{N}(\mathbf{z}_n;\, \boldsymbol{\mu}_n^{(z|x)},\, \boldsymbol{\Sigma}^{(z|x)})\, \mathrm{d}\mathbf{z}_n = \boldsymbol{\Sigma}^{(z|x)} + \boldsymbol{\mu}_n^{(z|x)} \boldsymbol{\mu}_n^{(z|x)\top}$

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = \sum_{n=1}^{N} \left[ -\frac{1}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\hat{\mathbf{x}}_n^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n + \langle \mathbf{z}_n \rangle^\top \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \hat{\mathbf{x}}_n - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) \right] + \mathrm{const}$ (a quadratic form in $\hat{\mathbf{x}}_n$)

$= -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_{n=1}^{N} \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top \right) - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) + \mathrm{const}$

i.e., expressed through the # of samples $N$ and the summations of $\hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top$, $\langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top$, and $\langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$.

Appendix: 3

Page 44: Sequential Data Modeling

App. 2: Expansion of Auxiliary Function (3)

Sufficient statistics:
• # of samples: $N$
• Sum of squared samples: $\sum_{n=1}^{N} \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top$ (only its diagonal is needed for a diagonal $\boldsymbol{\Sigma}$)
• Sum of expectations of squared latent variables: $\sum_{n=1}^{N} \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle$
• Sum of cross terms: $\sum_{n=1}^{N} \hat{\mathbf{x}}_n \langle \mathbf{z}_n \rangle^\top$

Auxiliary function:

$\mathcal{Q}(\lambda^{\mathrm{new}}, \lambda^{\mathrm{old}}) = -\frac{N}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \sum_n \hat{\mathbf{x}}_n \hat{\mathbf{x}}_n^\top \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \rangle \hat{\mathbf{x}}_n^\top \right) - \frac{1}{2}\mathrm{tr}\left( \mathbf{W}^\top \boldsymbol{\Sigma}^{-1} \mathbf{W} \sum_n \langle \mathbf{z}_n \mathbf{z}_n^\top \rangle \right) + \mathrm{const}$

Appendix: 4