
Page 1: CS626: NLP, Speech and the Web

CS626: NLP, Speech and the Web

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lectures 30, 31, 32, 33: Recurrent NN, Language Modeling

13th October onwards, 2014

(Guiding paper: "Application of Deep Belief Networks for Natural Language Understanding", IEEE Transactions on Audio, Speech and Language Processing)

13 Oct, 2014 Pushpak Bhattacharyya: recurrent NN 1

Page 2:

Harris's distributional hypothesis

"We group A and B into a substitution set whenever A and B have the same (or partially same) environments X." (Harris, 1981, p. 17)

Page 3:

"The basic concept of word"

Example sentence: The basic concept of word is hard to express

Word-word co-occurrence matrix (rows and columns ordered as: The, basic, concept, of, word, is, hard, to, express):

The      0 1 1 1 1 0 0 0 0
basic    1 0 1 1 1 0 0 0 0
concept  …..
of       …..
word     …..
is       …..
hard     …..
to       …..
express  0 0 0 0 0 0 0 0 0

Page 4:

Backpropagation algorithm

Fully connected feed-forward network; pure FF network (no jumping of connections over layers)

[Figure: input layer (n i/p neurons), hidden layers, output layer (m o/p neurons); weight wji connects neuron i to neuron j in the next layer]

Page 5:

General Backpropagation Rule

General weight updating rule:

Δwji = η δj oi

where

δj = (tj − oj) oj (1 − oj)   for the outermost layer

δj = (Σk∈next layer δk wkj) oj (1 − oj)   for hidden layers
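The update rule above can be sketched in code. The following is a minimal illustration, not from the lecture: a tiny two-layer sigmoid network trained on a single made-up input/target pair; the layer sizes, data, and learning rate η = 0.5 are all assumptions.

```python
import math, random

# Sketch of the general backpropagation rule from the slide:
#   delta_w[j][i] = eta * delta[j] * o[i]
#   delta[j] = (t[j] - o[j]) * o[j] * (1 - o[j])                (output layer)
#   delta[j] = (sum_k delta[k] * w[k][j]) * o[j] * (1 - o[j])   (hidden layer)
# Sizes, data and eta are illustrative assumptions.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, W2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    o = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in W2]
    return h, o

def train_step(x, t, W1, W2, eta=0.5):
    h, o = forward(x, W1, W2)
    # delta for the outermost layer
    d_out = [(tj - oj) * oj * (1 - oj) for tj, oj in zip(t, o)]
    # delta for the hidden layer (uses the next layer's deltas and weights)
    d_hid = [sum(dk * W2[k][j] for k, dk in enumerate(d_out)) * h[j] * (1 - h[j])
             for j in range(len(h))]
    # weight updates: delta_w_ji = eta * delta_j * o_i
    for j in range(len(W2)):
        for i in range(len(h)):
            W2[j][i] += eta * d_out[j] * h[i]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += eta * d_hid[j] * x[i]

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)]
x, t = [1.0, 0.0], [1.0]
err_before = (t[0] - forward(x, W1, W2)[1][0]) ** 2
for _ in range(100):
    train_step(x, t, W1, W2)
err_after = (t[0] - forward(x, W1, W2)[1][0]) ** 2
```

Repeated application of the rule drives the squared error on the training pair down, which is exactly the "minimizes total sum square error" role attributed to BP later in the comparative slides.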

Page 6:

Recurrent NN

Page 7:

Hopfield net

Inspired by associative memory, which means memory retrieval is not by address, but by part of the data.

Consists of N neurons fully connected with symmetric weight strength wij = wji.

No self connection, so the weight matrix is 0-diagonal and symmetric.

Each computing element or neuron is a linear threshold element with threshold = 0.

Page 8:

Connection matrix of the network: 0-diagonal and symmetric

[Figure: k x k weight matrix with rows and columns n1, n2, n3, …, nk; entry wij connects neuron i and neuron j; the diagonal entries are 0]

Page 9:

Example

w12 = w21 = 5, w13 = w31 = 3, w23 = w32 = 2

At time t = 0: s1(0) = 1, s2(0) = -1, s3(0) = 1

This is an unstable state: neuron 1 will flip, since its net input w12 s2 + w13 s3 = -5 + 3 = -2 is negative while s1 = 1.

A stable pattern is called an attractor for the net.
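The flip just described can be traced with a short sketch of asynchronous Hopfield dynamics on this 3-neuron net. The update schedule (cyclic sweeps) and the sgn(0) = +1 tie-breaking convention are assumptions, not from the slides.

```python
# Asynchronous Hopfield dynamics for the example net:
# w12 = w21 = 5, w13 = w31 = 3, w23 = w32 = 2, starting from s(0) = <1, -1, 1>.

W = [[0, 5, 3],
     [5, 0, 2],
     [3, 2, 0]]

def sgn(x):
    # tie-breaking convention sgn(0) = +1 is an assumption
    return 1 if x >= 0 else -1

def run_async(s, W, max_sweeps=10):
    """Update neurons one at a time until no neuron changes state."""
    s = list(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            net = sum(W[i][j] * s[j] for j in range(len(s)) if j != i)
            new = sgn(net)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

s0 = [1, -1, 1]
attractor = run_async(s0, W)   # neuron 1 flips first, and the net settles
```

Starting from <1, -1, 1> the net settles into a fixed point: an attractor in the sense of the slide.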

Page 10:

Concept of Energy

Energy at state s is given by the equation:

E(s) = -[ w12 x1 x2 + w13 x1 x3 + … + w1n x1 xn
        + w23 x2 x3 + … + w2n x2 xn
        + …
        + w(n-1)n x(n-1) xn ]

Page 11:

Relation between weight matrix W and state vector X

For example, in Fig. 1, at time t = 0 the state of the neural network is s(0) = <1, -1, 1>, and the corresponding weight matrix W and transpose of the state vector XT are:

W = | 0  5  3 |        XT = |  1 |
    | 5  0  2 |             | -1 |
    | 3  2  0 |             |  1 |

[Fig. 1: the 3-neuron net with weights 5, 3, 2 and states 1, -1, 1]

Page 12:

W.XT gives the inputs to the neurons at the next time instant

W.XT = | 0  5  3 | |  1 |   | -2 |
       | 5  0  2 | | -1 | = |  7 |
       | 3  2  0 | |  1 |   |  1 |

sgn(W.XT) = <-1, 1, 1>, which differs from the current state <1, -1, 1>.

This shows that the n/w will change state.

Page 13:

Theorem

In the asynchronous mode of operation, the energy of the Hopfield net always decreases.

Proof:

E(t1) = -[ w12 x1(t1) x2(t1) + w13 x1(t1) x3(t1) + … + w1n x1(t1) xn(t1)
         + w23 x2(t1) x3(t1) + … + w2n x2(t1) xn(t1)
         + …
         + w(n-1)n x(n-1)(t1) xn(t1) ]

Page 14:

Proof (contd.)

Let neuron 1 change state by summing and comparing. We get the following equation for the energy at t2:

E(t2) = -[ w12 x1(t2) x2(t2) + w13 x1(t2) x3(t2) + … + w1n x1(t2) xn(t2)
         + w23 x2(t2) x3(t2) + … + w2n x2(t2) xn(t2)
         + …
         + w(n-1)n x(n-1)(t2) xn(t2) ]

Page 15:

Proof: note that only neuron 1 changes state

Since only neuron 1 changes state, xj(t1) = xj(t2) for j = 2, 3, …, n. In E(t2) - E(t1) every term that does not involve x1 cancels, leaving only the w1j terms, and hence

ΔE = E(t2) - E(t1) = -[ x1(t2) - x1(t1) ] Σj≥2 w1j xj(t1)

Page 16:

Proof (continued)

ΔE = -[ x1(t2) - x1(t1) ] Σj≥2 w1j xj(t1)
          (D)                 (S)

Observations:

When the state changes from -1 to 1, (S) has to be +ve and (D) is -ve; so ΔE becomes negative.

When the state changes from 1 to -1, (S) has to be -ve and (D) is +ve; so ΔE becomes negative.

Therefore, the energy for any state change always decreases.

Page 17:

The Hopfield net has to "converge" in the asynchronous mode of operation

As the energy E goes on decreasing, it has to hit the bottom, since the weights and the state vector have finite values.

That is, the Hopfield net has to converge to an energy minimum.

Hence the Hopfield net reaches stability.
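The convergence argument can be checked numerically. A small sketch, using the 3-neuron example net from the earlier slides and the energy E(s) = -Σi Σj>i wij xi xj; the sweep schedule and sgn(0) = +1 convention are assumptions.

```python
# Track the energy E(s) = -sum_{i<j} w_ij x_i x_j along an asynchronous
# trajectory of the example net and observe that it never increases.

W = [[0, 5, 3],
     [5, 0, 2],
     [3, 2, 0]]

def energy(s, W):
    n = len(s)
    return -sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

def sgn(x):
    return 1 if x >= 0 else -1

s = [1, -1, 1]
energies = [energy(s, W)]
for _ in range(3):                 # a few asynchronous sweeps
    for i in range(3):
        net = sum(W[i][j] * s[j] for j in range(3) if j != i)
        s[i] = sgn(net)
        energies.append(energy(s, W))
```

Each neuron flip strictly lowers the energy (here from 4 down to the minimum -10), after which the state no longer changes: the attractor is the bottom of the energy landscape.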

Page 18:

Training of Hopfield Net

Early training rule proposed by Hopfield; the rule is inspired by the concept of electron spin, and by Hebb's rule of learning:

If two neurons i and j have activations xi and xj respectively, then the weight wij between the two neurons is directly proportional to the product xi . xj, i.e.

wij ∝ xi xj

Page 19:

Hopfield Rule

Training by the Hopfield Rule:

Train the Hopfield net for a specific memory behavior

Store memory elements

How to store patterns?

Page 20:

Hopfield Rule

To store a pattern <xn, xn-1, …, x3, x2, x1>, make

wij = (1/(n-1)) xi xj

Storing a pattern is equivalent to 'making that pattern the stable state'.

Page 21:

Training of Hopfield Net

Establish that <xn, xn-1, …, x3, x2, x1> is a stable state of the net.

To show the stability of <xn, xn-1, …, x3, x2, x1>, impress it on the net at t = 0:

<xn(0), xn-1(0), …, x3(0), x2(0), x1(0)>

Page 22:

Training of Hopfield Net

Consider neuron i at t = 1:

neti(0) = Σj≠i wij xj(0)

xi(1) = sgn(neti(0))

Page 23:

Establishing stability

With wij = (1/(n-1)) xi(0) xj(0) from the stored pattern:

xi(1) = sgn( Σj≠i wij xj(0) )

      = sgn( Σj≠i (1/(n-1)) xi(0) xj(0) xj(0) )

      = sgn( (1/(n-1)) xi(0) Σj≠i [xj(0)]^2 )

      = sgn( (1/(n-1)) xi(0) (n-1) )

      = sgn( xi(0) ) = xi(0)

Thus xi(1) = xi(0), i.e. the stored pattern is stable.

Page 24:

Example

We want <1, -1, 1> as stored memory.

Calculate all the wij values:

wAB = 1/(3-1) * 1 * -1 = -1/2

Similarly wBC = -1/2 and wCA = 1/2

Is <1, -1, 1> stable?

[Fig.: triangle of neurons A, B, C; initially states <1, -1, 1>; after calculating weight values, edge weights wAB = -0.5, wBC = -0.5, wCA = 0.5]
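The example can be verified with a short sketch of the storage rule wij = (1/(n-1)) xi xj; zero self-connections and the sgn(0) = +1 convention are assumptions.

```python
# Store <1, -1, 1> with the Hopfield rule and check it is a fixed point.

def store(pattern):
    n = len(pattern)
    return [[0.0 if i == j else pattern[i] * pattern[j] / (n - 1)
             for j in range(n)] for i in range(n)]

def sgn(x):
    return 1 if x >= 0 else -1

def update_all(s, W):
    n = len(s)
    return [sgn(sum(W[i][j] * s[j] for j in range(n) if j != i))
            for i in range(n)]

p = [1, -1, 1]        # states of neurons A, B, C
W = store(p)          # wAB = -0.5, wBC = -0.5, wCA = 0.5, as on the slide
stable = update_all(p, W) == p
```

Every neuron's net input has the same sign as its current state, so <1, -1, 1> is indeed stable under the trained weights.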

Page 25:

Observations

How much deviation can the net tolerate?

What if more than one pattern is to be stored?

Page 26:

Storing k patterns

Let the patterns be:

P1 : <xn, xn-1, …, x3, x2, x1>1
P2 : <xn, xn-1, …, x3, x2, x1>2
.
.
.
Pk : <xn, xn-1, …, x3, x2, x1>k

Generalized Hopfield Rule is:

wij = (1/(n-1)) Σp=1..k xi|p xj|p      (xi|p : the ith bit of the pth pattern)
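A sketch of the generalized rule; the two 6-bit patterns are made-up examples chosen so that k << n, as the capacity condition derived below requires.

```python
# Generalized Hopfield rule: w_ij = (1/(n-1)) * sum_p x_i|p * x_j|p.

def store_all(patterns):
    n = len(patterns[0])
    return [[0.0 if i == j else
             sum(p[i] * p[j] for p in patterns) / (n - 1)
             for j in range(n)] for i in range(n)]

def sgn(x):
    return 1 if x >= 0 else -1

def update_all(s, W):
    n = len(s)
    return [sgn(sum(W[i][j] * s[j] for j in range(n) if j != i))
            for i in range(n)]

# k = 2 patterns on n = 6 neurons (k << n)
P1 = [1, 1, 1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1]
W = store_all([P1, P2])
```

With only two nearly-orthogonal patterns on six neurons, the crosstalk term stays small and both stored patterns remain fixed points.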

Page 27:

Storing k patterns

Study the stability of <xn, xn-1, …, x3, x2, x1>.

Impress the vector at t = 0 and observe the network dynamics.

Looking at neuron i at t = 1, we have the following.

Page 28:

Examining stability of the qth pattern

neti(1)|q = Σj≠i wij xj(0)|q

          = (1/(n-1)) Σj≠i [ Σp=1..k xi|p xj|p ] xj(0)|q

          = (1/(n-1)) Σj≠i [ xi|q xj|q xj(0)|q + Σp≠q xi|p xj|p xj(0)|q ]

          = (1/(n-1)) xi|q Σj≠i [xj(0)|q]^2 + (1/(n-1)) Σj≠i Σp≠q xi|p xj|p xj(0)|q

          = xi(0)|q + Q/(n-1)

where Q = Σj≠i Σp≠q xi|p xj|p xj(0)|q, and xi(1)|q = sgn(neti(1)|q).

Page 29:

Examining stability of the qth pattern (contd.)

Thus

xi(1)|q = sgn[ xi(0)|q + Q/(n-1) ]

        = sgn[ xi(0)|q + (1/(n-1)) Σj≠i Σp≠q xi|p xj|p xj(0)|q ]

The second term Q/(n-1) is small when k << n, so xi(1)|q = xi(0)|q.

Page 30:

Storing k patterns

The condition for patterns to be stable on a Hopfield net with n neurons is: k << n.

The storage capacity of the Hopfield net is very small; hence it is not a practical memory element.

Page 31:

Boltzmann M/C

Page 32:

Boltzmann Machine

A Hopfield net with probabilistic neurons

Energy expression = -Σi Σj>i wij xi xj, where xi = activation of the ith neuron

Used for optimization

Central concern is to ensure the global minimum

Based on simulated annealing

Page 33:

Comparative Remarks

Feed forward n/w with BP | Hopfield net | Boltzmann m/c
Mapping device (i/p pattern -> o/p pattern), i.e. classification | Associative memory + optimization device | Constraint satisfaction (mapping + optimization device)
Minimizes total sum square error | Energy | Entropy (Kullback-Leibler divergence)

Page 34:

Comparative Remarks (contd.)

Feed forward n/w with BP | Hopfield net | Boltzmann m/c
Deterministic neurons | Deterministic neurons | Probabilistic neurons
Learns to associate i/p with o/p, i.e. equivalent to a function | Pattern | Probability distribution

Page 35:

Comparative Remarks (contd.)

Feed forward n/w with BP | Hopfield net | Boltzmann m/c
Can get stuck in local minimum (greedy approach) | Local minimum possible | Can come out of local minimum
Credit/blame assignment (consistent with Hebbian rule) | Activation product (consistent with Hebbian rule) | Probability and activation product (consistent with Hebbian rule)

Page 36:

Theory of Boltzmann m/c

For the m/c, computation means the following: at any time instant, make the state of the kth neuron (sk) equal to 1 with probability

P(sk = 1) = 1 / (1 + exp(-ΔEk / T))

ΔEk = change in energy of the m/c when the kth neuron changes state

T = temperature, which is a parameter of the m/c

Page 37:

Theory of Boltzmann m/c (contd.)

P(sk = 1) = 1 / (1 + exp(-ΔEk / T))

[Figure: P(sk = 1) plotted against ΔEk, ranging from 0 to 1; sigmoid curves passing through P = 0.5 at ΔEk = 0 and flattening with increasing T; at a fixed ΔEk = α, P moves toward 0.5 as T increases]

Page 38:

Theory of Boltzmann m/c (contd.)

ΔEk = Ek(final) - Ek(initial) = (sk(initial) - sk(final)) Σj≠k wkj sj

We observe:
1. The higher the temperature, the lower is P(sk=1) (for ΔEk > 0, P falls toward 0.5)
2. At T = infinity, P(sk=1) = P(sk=0) = 0.5: equal chance of being in state 0 or 1, i.e. completely random behavior
3. If T -> 0, then P(sk=1) -> 1 (for ΔEk > 0)
4. The derivative is proportional to P(sk=1) * (1 - P(sk=1))
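Observations 1–4 can be sketched numerically; the ΔEk and temperature values below are illustrative assumptions, not from the lecture.

```python
import math, random

# Boltzmann update: set s_k = 1 with probability 1 / (1 + exp(-dE_k / T)).

def p_on(dE, T):
    return 1.0 / (1.0 + math.exp(-dE / T))

def sample_state(dE, T, rng):
    """One stochastic update of a single neuron."""
    return 1 if rng.random() < p_on(dE, T) else 0

rng = random.Random(0)

# At very high temperature the behaviour is almost random (P near 0.5) ...
p_hot = p_on(2.0, 1000.0)
# ... while at low temperature a positive dE drives the unit to state 1.
p_cold = p_on(2.0, 0.01)
s = sample_state(2.0, 0.01, rng)   # almost surely 1
```

The derivative property (observation 4) follows from the sigmoid form: dP/dΔEk = P(1 - P)/T, which can be confirmed by finite differences.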

Page 39:

Consequence of the form of P(sk = 1)

P(Sα) is proportional to exp(-E(Sα) / T), where Sα is a state of the whole network, i.e. an N-bit vector such as <1, -1, 1, -1, …>.

This probability distribution is called the Boltzmann Distribution.

Local "sigmoid" probabilistic behavior leads to global Boltzmann Distribution behaviour of the n/w.

Page 40:

P(Sα) ∝ exp(-E(Sα) / T)

[Figure: P plotted against E; the distribution concentrates on low-energy states and flattens as T increases]

Page 41:

Ratio of state probabilities

Normalizing,

P(Sα) = exp(-E(Sα)/T) / Σβ∈all states exp(-E(Sβ)/T)

P(Sα) / P(Sβ) = exp(-(E(Sα) - E(Sβ)) / T)

Page 42:

Learning a probability distribution

Digression: estimation of a probability distribution Q by another distribution P.

D = deviation = Σsample space Q ln(Q/P)

D >= 0, which is a required property (just like sum square error >= 0)
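A small sketch of the deviation D = Σ Q ln(Q/P) (the Kullback–Leibler divergence named in the comparative slides); the two distributions over a 3-outcome sample space are made-up examples.

```python
import math

# D = sum over the sample space of Q * ln(Q / P); terms with Q = 0 contribute 0.

def deviation(Q, P):
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

Q = [0.5, 0.3, 0.2]
P = [0.4, 0.4, 0.2]
d = deviation(Q, P)        # positive, since P != Q
d_self = deviation(Q, Q)   # zero: Q estimates itself perfectly
```

As required, D >= 0, with equality exactly when the estimate matches the target distribution.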

Page 43:

Recurrent n/w and optimization

Problem representation

Page 44:

What is common between:

Sentence Generation

Sorting

Travelling Salesman Problem

Page 45:

Sentence Generation

Given a set of words, place them at appropriate positions in the sentence.

[Matrix: rows = words (wi), i = 1..M; columns = positions (pj), j = 1..M]

xij = 1 iff the ith word is in the jth position

Page 46:

Sorting

Given some numbers, place them at appropriate positions in the ordered list.

[Matrix: rows = numbers (ni), i = 1..M; columns = positions (pj), j = 1..M]

xij = 1 iff the ith number is in the jth position

Page 47:

TSP

Given the cities a traveller must visit, place the cities in the "tour" so that the total distance travelled is minimized.

[Matrix: rows = cities (ci), i = 1..M; columns = positions (pj), j = 1..M]

xij = 1 iff the ith city is in the jth position

Page 48:

Hopfield Net for Optimization

An optimization problem maximizes or minimizes a quantity.

Hopfield net used for optimization:

Hopfield net and the Traveling Salesman Problem

Hopfield net and the Job Scheduling Problem

Page 49:

The essential idea of the correspondence

In optimization problems, we have to minimize a quantity.

The Hopfield net minimizes its energy. THIS IS THE CORRESPONDENCE.

Page 50:

Hopfield net and the Traveling Salesman Problem

We consider the problem for n = 4 cities. In the given figure, nodes represent cities and edges represent the paths between the cities with the associated distances.

[Figure: complete graph on cities A, B, C, D with edge distances dAB, dBC, dCD, dDA, dAC, dBD]

Page 51:

Traveling Salesman Problem

Goal: come back to city A, visiting cities j = 2 to n (n is the number of cities) exactly once, and minimize the total distance.

To solve it by a Hopfield net we need to decide the architecture:

How many neurons?

What are the weights?

Page 52:

Constraints decide the parameters

1. For n cities and n positions, establish city-to-position correspondence, i.e. number of neurons = n cities * n positions

2. Each position can take one and only one city

3. Each city can be in exactly one position

4. Total distance should be minimum

Page 53:

Architecture

n * n matrix where rows denote cities and columns denote positions.

cell(i, α) = 1 if and only if the ith city is in the αth position.

Each cell is a neuron: n^2 neurons, O(n^4) connections.

[Figure: matrix with rows city(i) and columns pos(α)]

Page 54:

Expressions corresponding to constraints

1. Each city in one and only one position, i.e. a row has a single 1:

E1 = (A/2) Σi Σα Σβ≠α xiα xiβ

The above equation partially ensures each row has a single 1 (it penalizes any row containing more than one 1).

xiα is the 1/0 output of the neuron at cell(i, α).

Page 55:

Expressions corresponding to constraints (contd.)

2. Each position has a single city, i.e. each column has at most a single 1:

E2 = (B/2) Σα Σi Σj≠i xiα xjα

Page 56:

Expressions corresponding to constraints (contd.)

3. All cities MUST be visited once and only once:

E3 = (C/2) [ (Σi Σα xiα) - n ]^2

Page 57:

Expressions corresponding to constraints (contd.)

E1, E2, E3 ensure that each row has exactly one 1 and each column has exactly one 1.

Minimizing E1 + E2 + E3 thus ensures a Hamiltonian circuit on the city graph, which is an NP-complete problem.

Page 58:

Constraint of distance

4. The distance traversed should be minimum:

E4 = (1/2) Σi Σj Σα dij xiα ( xj,α+1 + xj,α-1 )

dij = distance between city i and city j (position subscripts are taken modulo n)

Page 59:

Expressions corresponding to constraints (contd.)

We equate the constraint energy:

Eproblem = Enetwork   (*)

where Eproblem = E1 + E2 + E3 + E4, and Enetwork is the well-known energy expression for the Hopfield net.

Find the weights from (*).

Page 60:

Finding weights for the Hopfield Net applied to TSP

An alternate and more convenient Eproblem:

EP = E1 + E2

where E1 is the equation for n cities, each city in one position and each position with one city, and E2 is the equation for distance.

Page 61:

Expressions for E1 and E2

E1 = (A/2) [ Σi (Σα xiα - 1)^2 + Σα (Σi xiα - 1)^2 ]

E2 = (1/2) Σi Σj Σα dij xiα ( xj,α+1 + xj,α-1 )

Page 62:

Explanatory example

[Fig. 1: triangle of cities 1, 2, 3; the tour can take place in two possible directions]

pos ->     1    2    3
city 1    x11  x12  x13
city 2    x21  x22  x23
city 3    x31  x32  x33

For the matrix alongside, xiα = 1 if and only if the ith city is in position α.

Page 63:

Kinds of weights

Row weights:
w11,12  w11,13  w12,13
w21,22  w21,23  w22,23
w31,32  w31,33  w32,33

Column weights:
w11,21  w11,31  w21,31
w12,22  w12,32  w22,32
w13,23  w13,33  w23,33

Page 64:

Cross weights

w11,22  w11,23  w11,32  w11,33
w12,21  w12,23  w12,31  w12,33
w13,21  w13,22  w13,31  w13,32
w21,32  w21,33  w22,31  w22,33
w23,31  w23,32

Page 65:

Expressions

Eproblem = E1 + E2

E1 = (A/2) [ (x11 + x12 + x13 - 1)^2
           + (x21 + x22 + x23 - 1)^2
           + (x31 + x32 + x33 - 1)^2
           + (x11 + x21 + x31 - 1)^2
           + (x12 + x22 + x32 - 1)^2
           + (x13 + x23 + x33 - 1)^2 ]

Page 66:

Expressions (contd.)

E2 = (1/2) [ d12 x11 (x22 + x23)
           + d12 x12 (x21 + x23)
           + d12 x13 (x21 + x22)
           + d13 x11 (x32 + x33)
           + d13 x12 (x31 + x33)
           + d13 x13 (x31 + x32) + … ]

Page 67:

Enetwork

Enetwork = -[ w11,12 x11 x12 + w11,13 x11 x13 + w12,13 x12 x13
            + w11,21 x11 x21 + w11,22 x11 x22 + w11,23 x11 x23
            + w11,31 x11 x31 + w11,32 x11 x32 + w11,33 x11 x33 + … ]

Page 68:

Find row weight

Matching coefficients of x11 x12 in Eproblem and Enetwork gives w11,12 = -(coefficient of x11 x12 in Eproblem).

Search for x11 x12 in Eproblem:

w11,12 = -A   …from E1; E2 cannot contribute.

Page 69:

Find column weight

To find w11,21 = -(coefficient of x11 x21), search for the coefficient of x11 x21 in Eproblem:

w11,21 = -A   …from E1; E2 cannot contribute.

Page 70:

Find cross weights

To find w11,22 = -(coefficient of x11 x22), search for x11 x22 in Eproblem. E1 cannot contribute.

The coefficient of x11 x22 in E2 is (d12 + d21) / 2.

Therefore, w11,22 = -( (d12 + d21) / 2 )

Page 71:

Find cross weights (contd.)

To find w11,33 = -(coefficient of x11 x33), search for x11 x33 in Eproblem:

w11,33 = -( (d13 + d31) / 2 )

Page 72:

Summary

Row weights = -A

Column weights = -A

Cross weights = -( (dij + dji) / 2 ), for positions j = i ± 1 (modulo n)
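The summary can be turned into a small construction sketch for the 3-city example. The distance matrix and the constant A below are made-up values, and flattening cell (i, α) to a single neuron index is an implementation choice, not from the slides.

```python
# Build the Hopfield weight matrix for the 3-city TSP encoding:
# row and column weights are -A; the cross weight between cell (i, a) and
# cell (j, a +/- 1 mod n) is -(d_ij + d_ji)/2; other weights are 0.

n = 3
A = 2.0
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]          # symmetric city distances (illustrative)

def idx(i, a):
    """Flatten cell (city i, position a) to a neuron index."""
    return i * n + a

W = [[0.0] * (n * n) for _ in range(n * n)]
for i in range(n):
    for a in range(n):
        for j in range(n):
            for b in range(n):
                if (i, a) == (j, b):
                    continue                      # no self connection
                if i == j or a == b:              # same row or same column
                    W[idx(i, a)][idx(j, b)] = -A
                elif b == (a + 1) % n or b == (a - 1) % n:
                    W[idx(i, a)][idx(j, b)] = -(d[i][j] + d[j][i]) / 2
```

The resulting matrix is 0-diagonal and symmetric, as a Hopfield net requires; for example w11,12 = w11,21 = -A and w11,22 = -(d12 + d21)/2.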

Page 73:

Restricted Boltzmann Machines (RBM)

Lecture 39, 6th Nov, 2014

Page 74:

Restricted Boltzmann Machine (Binary)

[Figure: bipartite graph; VISIBLE units v1, v2, v3, …, vm fully connected to HIDDEN units h1, h2, h3, …, hn; no visible-visible or hidden-hidden connections]

Page 75:

Weights (bidirectional and symmetric):

W = | w11 w12 … w1n |
    | w21 w22 … w2n |
    |  .   .  …  .  |
    | wm1 wm2 … wmn |

Bias of visible units:    B = <b1, b2, …, bm>

Bias of hidden units:     C = <c1, c2, …, cn>

Visible unit activations: V = <v1, v2, …, vm>

Hidden unit activations:  H = <h1, h2, …, hn>

Page 76:

Energy at a state

Energy E(H, V) at the state <H, V>:

E(H, V) = -VT W H - BT V - CT H

        = -Σi=1..m Σj=1..n wij vi hj - Σi=1..m bi vi - Σj=1..n cj hj
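The energy expression can be sketched for a tiny binary RBM; the sizes, weights, and biases below are made-up examples.

```python
# RBM energy: E(H,V) = -V^T W H - B^T V - C^T H.

def energy(v, h, W, b, c):
    m, n = len(v), len(h)
    e = -sum(W[i][j] * v[i] * h[j] for i in range(m) for j in range(n))
    e -= sum(b[i] * v[i] for i in range(m))
    e -= sum(c[j] * h[j] for j in range(n))
    return e

W = [[1.0, -0.5],
     [0.5,  2.0],
     [-1.0, 0.0]]        # m = 3 visible units, n = 2 hidden units
b = [0.1, -0.2, 0.3]     # visible biases
c = [0.0, 0.5]           # hidden biases

v, h = [1, 0, 1], [1, 1]
E = energy(v, h, W, b, c)
```

Only the wij whose visible and hidden endpoints are both active contribute, which is why low energy corresponds to well-matched <H, V> configurations.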

Page 77:

Problem: Name Identification

Let a sentence be denoted by S = w0 w1 w2 … wn-1 wn

Input: sentence POS-tagged with Noun (= 1) or not Noun (= 0); output is '1' for a named entity (NE), 0 otherwise:

TajMahal is the most visited site in India
    1     1  0   0     0     0   0   0

Page 78:

How do the neurons become 0 and 1?

P(hj = 1 | V) = 1 / (1 + exp(-netj))

where

netj = Σi=1..m wji vi + cj

Similarly for the visible neurons.
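The sampling step can be sketched as follows; the weights, biases, and visible vector are the same made-up values used in the energy example, and are illustrative assumptions.

```python
import math, random

# Sample hidden units given the visible vector, using
#   P(h_j = 1 | V) = 1 / (1 + exp(-net_j)),  net_j = sum_i w_ji v_i + c_j.

def p_hidden_on(j, v, W, c):
    net = sum(W[i][j] * v[i] for i in range(len(v))) + c[j]
    return 1.0 / (1.0 + math.exp(-net))

def sample_hidden(v, W, c, rng):
    return [1 if rng.random() < p_hidden_on(j, v, W, c) else 0
            for j in range(len(c))]

W = [[1.0, -0.5],
     [0.5,  2.0],
     [-1.0, 0.0]]
c = [0.0, 0.5]
v = [1, 0, 1]

rng = random.Random(0)
h = sample_hidden(v, W, c, rng)   # one stochastic 0/1 hidden vector
```

Sampling the visible layer from a hidden vector is the mirror-image step (using wij, the visible biases bi, and the same sigmoid), which is what alternating Gibbs steps in an RBM consist of.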

Page 79:

Input: Delhi is in India -> Output: 1 0 0 1

P(H, V) = exp(-E(H, V)) / Z

where E = energy and Z = the partition function:

Z = ΣV' ΣH' exp(-E(H', V'))
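For a tiny RBM the partition function can be computed by brute-force enumeration of all binary states, which makes the normalization concrete; the 2-visible, 2-hidden sizes and parameter values are made-up examples.

```python
import itertools, math

# P(H, V) = exp(-E(H,V)) / Z, with Z summed over every <H', V'> combination.

def energy(v, h, W, b, c):
    e = -sum(W[i][j] * v[i] * h[j]
             for i in range(len(v)) for j in range(len(h)))
    e -= sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(cj * hj for cj, hj in zip(c, h))
    return e

W = [[0.5, -1.0],
     [1.0,  0.2]]
b = [0.1, -0.1]
c = [0.3,  0.0]

states = list(itertools.product([0, 1], repeat=2))   # all 2-bit vectors
Z = sum(math.exp(-energy(v, h, W, b, c)) for v in states for h in states)

def prob(v, h):
    return math.exp(-energy(v, h, W, b, c)) / Z

total = sum(prob(v, h) for v in states for h in states)   # should be 1
```

The exponential growth of this enumeration with the number of units is exactly why Z is intractable for realistic RBMs and training resorts to sampling.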

Page 80:

The probability of a state of the network is given by its energy.

The probability of the state of a neuron is given by a sigmoid.

The weights and biases should be adjusted such that the desired <H, V> combination is stabilized.