

Effective Composition of Dense Features in Natural Language Processing

Xipeng Qiu
[email protected]
http://nlp.fudan.edu.cn/~xpqiu

Fudan University

August 25, 2015, CCIR 2015


Outline

1. Basic Concepts
   - Machine Learning & Deep Learning

2. Neural Models for NLP
   - General Architecture
   - Various Models
   - Feature Composition Problems

3. Feature Compositions
   - Cube Activation Function
   - Neural Tensor Model
     - Multi-relational Data Embeddings
     - Neural Tensor Model
     - Training
     - Applications
   - Gated Recursive Neural Network
   - Attention Model


Basic Concepts of Machine Learning

Input Data: $(x_i, y_i), \; 1 \le i \le m$

Model:
- Linear Model: $y = f(x) = w^T x + b$
- Generalized Linear Model: $y = f(x) = w^T \phi(x) + b$
- Non-linear Model: Neural Network

Criterion:
- Loss Function: $L(y, f(x))$ → Optimization
- Empirical Risk: $Q(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y_i, f(x_i, \theta))$ → Minimization
- Regularization: $\|\theta\|^2$
- Objective Function: $Q(\theta) + \lambda \|\theta\|^2$
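To make the criterion concrete, here is a minimal NumPy sketch of the objective for a linear model with squared loss; the toy data and λ value are illustrative, not from the slides:

```python
import numpy as np

def objective(theta, X, y, lam):
    """Empirical risk (squared loss) plus L2 regularization."""
    w, b = theta[:-1], theta[-1]
    preds = X @ w + b                   # linear model f(x) = w^T x + b
    risk = np.mean((y - preds) ** 2)    # Q(theta) = 1/m * sum of losses
    return risk + lam * np.sum(w ** 2)  # Q(theta) + lambda * ||theta||^2

# toy data: m = 4 samples, 2 features each
X = np.array([[0., 1.], [1., 0.], [1., 1.], [2., 1.]])
y = np.array([1., 1., 2., 3.])
theta = np.zeros(3)                     # [w1, w2, b]
print(objective(theta, X, y, lam=0.1))
```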


Basic Concepts of Deep Learning

Model: Artificial Neural Network (ANN)

Function: non-linear function $y = \sigma\left(\sum_i w_i x_i + b\right)$

[Figure: a feed-forward network with an input layer (inputs x1 to x4), two hidden layers, and an output layer producing y.]
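As a minimal sketch of the single unit above, assuming the sigmoid for σ and arbitrary illustrative weights:

```python
import numpy as np

def neuron(x, w, b):
    """Single unit: y = sigma(sum_i w_i * x_i + b), with sigma = sigmoid."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.5, -1.0, 0.25, 2.0])   # inputs x1..x4
w = np.array([0.1, 0.4, -0.3, 0.2])    # one weight per input
print(neuron(x, w, b=0.05))
```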


Traditional ML methods for NLP

Structured Learning

Structured learning is the task of assigning a group of labels $y = y_1, \ldots, y_n$ to a group of inputs $x = x_1, \ldots, x_n$. Given a sample $x$, we define the feature $\Phi(x, y)$. Thus, we can label $x$ with a score function,

$\hat{y} = \arg\max_y F(w, \Phi(x, y)), \quad (1)$

where $w$ is the parameter of the function $F(\cdot)$.
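A minimal sketch of Equation (1), assuming a linear score $F(w, \Phi) = w^T \Phi$ and brute-force enumeration of candidate label sequences; the feature map phi here is a hypothetical toy, and real decoders replace the enumeration with dynamic programming:

```python
import itertools
import numpy as np

LABELS = ["N", "V"]

def phi(x, y):
    """Toy feature map Phi(x, y): counts of (word-initial letter, label) pairs."""
    feats = np.zeros(len(LABELS) * 26)
    for word, label in zip(x, y):
        feats[LABELS.index(label) * 26 + ord(word[0].lower()) - ord("a")] += 1
    return feats

def predict(w, x):
    """y_hat = argmax_y w^T Phi(x, y), enumerating all label sequences."""
    candidates = itertools.product(LABELS, repeat=len(x))
    return max(candidates, key=lambda y: w @ phi(x, y))

w = np.random.default_rng(0).normal(size=len(LABELS) * 26)
print(predict(w, ["dogs", "bark"]))
```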


General Neural Architectures for NLP

How to use neural networks for NLP tasks?

Distributed Representation


General Neural Architectures for NLP

from [Collobert et al., 2011]

1. represent the words/features with dense vectors (embeddings) via a lookup table;

2. concatenate the vectors;

3. classify/match/rank with multi-layer neural networks.
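A minimal sketch of this three-step pipeline (lookup, concatenate, multi-layer network), with a made-up vocabulary and toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}
emb = rng.normal(size=(len(vocab), 4))        # step 1: lookup table, d = 4

def classify(window):
    x = np.concatenate([emb[vocab[w]] for w in window])  # step 2: concatenate
    h = np.tanh(W1 @ x + b1)                  # step 3: multi-layer network
    scores = W2 @ h + b2
    return np.exp(scores) / np.exp(scores).sum()         # softmax over classes

W1, b1 = rng.normal(size=(8, 12)), np.zeros(8)  # 3 words x 4 dims = 12 inputs
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
print(classify(["the", "cat", "sat"]))
```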


Difference with the traditional methods

           | Traditional methods          | Neural methods
-----------|------------------------------|--------------------------------
Features   | Discrete vector              | Dense vector
           | (one-hot representation)     | (distributed representation)
           | High-dimensional             | Low-dimensional
Classifier | Linear                       | Non-linear


Beyond word/feature embeddings

Can we encode a phrase, sentence, paragraph, or even a whole document into a distributed representation?


General Neural Architectures for NLP

Word Level
- NNLM
- C&W
- CBOW & Skip-Gram

Sentence Level
- NBOW
- Sequence Models: Recurrent NN, LSTM, Paragraph Vector
- Topological Models: Recursive NN
- Convolutional Models: DCNN

Document Level
- NBOW
- Hierarchical Models: two-level CNN
- Sequence Models: LSTM, Paragraph Vector


Skip-Gram Model

from [Mikolov et al., 2013]


Skip-Gram Model

Given a pair of words $(w, c)$, the probability that the word $c$ is observed in the context of the target word $w$ is given by

$\Pr(D = 1 \mid w, c) = \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{c})},$

where $\mathbf{w}$ and $\mathbf{c}$ are the embedding vectors of $w$ and $c$ respectively. The probability of not observing word $c$ in the context of $w$ is given by

$\Pr(D = 0 \mid w, c) = 1 - \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{c})}.$
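In code, both probabilities reduce to a sigmoid of the dot product and its complement; a minimal sketch with illustrative vectors:

```python
import numpy as np

def p_observed(w_vec, c_vec):
    """Pr(D = 1 | w, c) = sigmoid(w^T c)."""
    return 1.0 / (1.0 + np.exp(-np.dot(w_vec, c_vec)))

w_vec = np.array([0.2, -0.1, 0.4])      # embedding of the target word w
c_vec = np.array([0.3, 0.0, 0.5])       # embedding of the context word c
print(p_observed(w_vec, c_vec))         # Pr(D = 1 | w, c)
print(1.0 - p_observed(w_vec, c_vec))   # Pr(D = 0 | w, c)
```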


Skip-Gram Model with Negative Sampling

Given a training set $D$, the word embeddings are learned by maximizing the following objective function:

$J(\theta) = \sum_{(w, c) \in D} \Pr(D = 1 \mid w, c) + \sum_{(w, c) \in D'} \Pr(D = 0 \mid w, c),$

where the set $D'$ contains randomly sampled negative examples, assumed to all be incorrect.
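A minimal sketch of evaluating $J(\theta)$ for fixed embeddings, with $D$ and $D'$ given as index-pair lists (toy data; in practice this objective is maximized with stochastic gradient methods):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(W, C, pos_pairs, neg_pairs):
    """Sum of Pr(D=1|w,c) over D plus sum of Pr(D=0|w,c) over D'."""
    pos = sum(sigmoid(W[w] @ C[c]) for w, c in pos_pairs)
    neg = sum(1.0 - sigmoid(W[w] @ C[c]) for w, c in neg_pairs)
    return pos + neg

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))             # word embeddings, vocab of 5, d = 3
C = rng.normal(size=(5, 3))             # context embeddings
print(J(W, C, pos_pairs=[(0, 1), (2, 3)], neg_pairs=[(0, 4), (2, 1)]))
```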


Recurrent Neural Network (RNN) [Elman, 1990]

The RNN has recurrent hidden states whose output at each time step depends on that of the previous step. More formally, given a sequence $x^{(1:n)} = (x^{(1)}, \ldots, x^{(t)}, \ldots, x^{(n)})$, the RNN updates its recurrent hidden state $h^{(t)}$ by

$h^{(t)} = g(U h^{(t-1)} + W x^{(t)} + b).$
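A minimal sketch of unrolling this update over a sequence, taking $g = \tanh$ and random toy parameters:

```python
import numpy as np

def rnn(xs, U, W, b):
    """h(t) = g(U h(t-1) + W x(t) + b), unrolled over the sequence xs."""
    h = np.zeros(U.shape[0])
    for x in xs:
        h = np.tanh(U @ h + W @ x + b)
    return h  # final hidden state

rng = np.random.default_rng(0)
U, W, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
xs = rng.normal(size=(5, 3))  # sequence of 5 inputs, each 3-dimensional
print(rnn(xs, U, W, b))
```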


Long Short-Term Memory Neural Network (LSTM) [Hochreiter and Schmidhuber, 1997]

[Figure: LSTM memory unit. The input x(t) and previous state h(t-1) feed the input gate i, forget gate f, and output gate o, which control the cell state c(t).]

The core of the LSTM model is a memory cell c encoding, at every time step, what inputs have been observed up to that step. The behavior of the cell is controlled by three "gates":

input gate i

output gate o

forget gate f
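The slides name the gates without giving the update equations, so the sketch below uses the standard LSTM formulation (each gate a sigmoid of a linear map over [h(t-1); x(t)]); treat the exact form as an assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step: gates i, f, o control what enters, stays in, and leaves c."""
    z = np.concatenate([h_prev, x])
    i = sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    f = sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    o = sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    c = f * c_prev + i * np.tanh(params["Wc"] @ z + params["bc"])
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d, dx = 4, 3                                       # hidden and input sizes
params = {f"W{g}": rng.normal(size=(d, d + dx)) for g in "ifoc"}
params.update({f"b{g}": np.zeros(d) for g in "ifoc"})
h, c = lstm_step(rng.normal(size=dx), np.zeros(d), np.zeros(d), params)
print(h)
```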


Unfolded LSTM for Text Classification

[Figure: unfolded LSTM; inputs x1 ... xT produce hidden states h1 ... hT, and a softmax over the final state yields the label y.]

Drawback: long-term dependencies need to be transmitted one-by-one along the sequence.


Multi-Timescale LSTM

(a) Fast-to-Slow Strategy (b) Slow-to-Fast Strategy

Figure: Two feedback strategies of our model. The dashed line shows the feedback connection, and the solid link shows the connection at the current time.

from [Liu et al., 2015a]


Unfolded Multi-Timescale LSTM with Fast-to-Slow Feedback Strategy

[Figure: unfolded multi-timescale LSTM with fast-to-slow feedback; three layers g^1, g^2, g^3 run at different timescales over inputs x1 ... xT, and a softmax over the final states produces y.]

from [Liu et al., 2015a]


LSTM for Sentiment Analysis

[Figure: sentence-level sentiment-score curves comparing LSTM and MT-LSTM on three examples: "Is this progress?", "He'd create a movie better than this.", and "It's not exactly a gourmet meal but the fare is fair, even coming from the drive."]


Recursive Neural Network (RecNN) [Socher et al., 2013]

[Figure: binary parse tree for "a red bike": (red/JJ + bike/NN → red bike/NP), then (a/Det + red bike/NP → a red bike/NP).]

Topological models compose the sentence representation following a given topological structure over the words.

Given a labeled binary parse tree, $((p_2 \to a\,p_1), (p_1 \to b\,c))$, the node representations are computed by

$p_1 = f\left(W \begin{bmatrix} b \\ c \end{bmatrix}\right), \quad p_2 = f\left(W \begin{bmatrix} a \\ p_1 \end{bmatrix}\right).$
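A minimal sketch of the two compositions, stacking the child vectors and applying the shared matrix W (taking f = tanh and random toy values):

```python
import numpy as np

def compose(left, right, W):
    """p = f(W [left; right]), with f = tanh."""
    return np.tanh(W @ np.concatenate([left, right]))

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, 2 * d))          # shared composition matrix
a, b, c = (rng.normal(size=d) for _ in range(3))
p1 = compose(b, c, W)                    # p1 = f(W [b; c])   ("red bike")
p2 = compose(a, p1, W)                   # p2 = f(W [a; p1])  ("a red bike")
print(p2)
```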


A variant of RecNN for Dependency Parse Trees [Zhu et al., 2015]

Recursive neural networks can only process binary combinations and are not suitable for dependency parsing.

[Figure: a Recursive Convolutional Neural Network composing the dependency tree of "a red bike": convolution and pooling over the head word and its children.]

Recursive Convolutional Neural Network:

- introducing convolution and pooling layers;

- modeling the complicated interactions of the head word and its children.


Convolutional Neural Network (CNN)

Key steps

Convolution

(optional) Folding

Pooling

Various models

DCNN (k-max pooling) [Kalchbrenner et al., 2014]

CNN (binary pooling) [Hu et al., 2014]

· · ·


Overview of state-of-the-art neural models in NLP

Not “Really” Deep Learning in NLP

Most neural models in NLP are very shallow.

The major benefit is the introduction of dense representations.

The feature composition is also quite simple:
- Concatenation
- Sum/Average
- Bilinear model


Quite Simple Feature Composition

Given two embeddings a and b,

1. how to calculate their similarity/relevance/relation?
   1. Concatenation: $a \oplus b \to \text{ANN} \to \text{output}$
   2. Bilinear: $a^T M b \to \text{output}$

2. how to use them in a classification task?
   1. Concatenation: $a \oplus b \to \text{ANN} \to \text{output}$
   2. Sum/Average: $a + b \to \text{ANN} \to \text{output}$


Problem

How to enhance the neural model without increasing the network depth?


Solution 1: Cube Activation Function [Chen and Manning, 2014]

How to enhance the neural model without increasing the network depth?

A simple solution: the cube activation function. Suppose $x = a \oplus b$,

$f(wx + b) = (w_1 x_1 + \cdots + w_m x_m + b)^3 = \sum_{i,j,k} (w_i w_j w_k)\, x_i x_j x_k + \sum_{i,j} b\,(w_i w_j)\, x_i x_j + \cdots$
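A minimal sketch: cubing the pre-activation of $x = a \oplus b$ implicitly sums over all degree-three products of the concatenated features, so pairwise and triple feature interactions enter without extra layers:

```python
import numpy as np

def cube_unit(x, w, b):
    """f(w x + b) = (w^T x + b)^3 -- the cube activation."""
    return (np.dot(w, x) + b) ** 3

rng = np.random.default_rng(0)
a, b_vec = rng.normal(size=3), rng.normal(size=3)
x = np.concatenate([a, b_vec])           # x = a concatenated with b
w = rng.normal(size=6)
print(cube_unit(x, w, b=0.1))
```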


Solution 2: Neural Tensor Model [Chen et al., 2013]

How to enhance the neural model without increasing the network depth?

Another intuitive solution: the Neural Tensor Model

$s(a, b) = u^T f\left(a^T W^{[1:k]} b + V^T (a \oplus b) + c\right)$


Neural Tensor Model

The original Neural Tensor Model was proposed to model multi-relational data embeddings [Chen et al., 2013].


Multi-relational Data Embeddings

Multi-relational data $(e_1, e_2, r)$: a pair of entities $(e_1, e_2)$ and their relation mention $r$. The basic idea is that the relationship between two entities corresponds to a translation between the entity embeddings; that is, $\mathbf{e}_1 + \mathbf{r} \approx \mathbf{e}_2$ when $(e_1, e_2, r)$ holds.

- the embedding $\mathbf{r}$ of the relation mention $r$

- the entity embeddings $\mathbf{e}_1, \mathbf{e}_2$ of the entity pair $(e_1, e_2)$

Each is obtained from its own lookup table.


Neural Tensor Model [Chen et al., 2013]

The Neural Tensor Model gives a score function $s(e_1, e_2, r)$ to model the triplet $(e_1, e_2, r)$ as follows:

$s(e_1, e_2, r) = u^T f\left(\mathbf{e}_1^T W_r^{[1:k]} \mathbf{e}_2 + V_r^T (\mathbf{e}_1 \oplus \mathbf{e}_2) + b_r\right),$

where $W_r^{[1:k]} \in \mathbb{R}^{d \times d \times k}$ is a tensor, and the bilinear tensor product takes the two vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ as input and generates a $k$-dimensional vector $z$ as output:

$z = \mathbf{e}_1^T W_r^{[1:k]} \mathbf{e}_2$
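A minimal sketch of this score function, computing the bilinear tensor product slice by slice (taking f = tanh; all parameters are random toy values):

```python
import numpy as np

def ntm_score(e1, e2, W, V, b, u):
    """s = u^T f(e1^T W[1:k] e2 + V^T (e1 concat e2) + b)."""
    z = np.array([e1 @ W[i] @ e2 for i in range(W.shape[0])])  # bilinear part
    return u @ np.tanh(z + V.T @ np.concatenate([e1, e2]) + b)

rng = np.random.default_rng(0)
d, k = 4, 3
W = rng.normal(size=(k, d, d))           # tensor: k slices of size d x d
V = rng.normal(size=(2 * d, k))
b, u = np.zeros(k), rng.normal(size=k)
e1, e2 = rng.normal(size=d), rng.normal(size=d)
print(ntm_score(e1, e2, W, V, b, u))
```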


Neural Tensor Model

from [Chen et al., 2013]


Training

The objective function is based on the idea that the similarity scores of observed triplets in the training set should be larger than those of any other triplets.

$L = \sum_{(e_1, e_2, r) \in D} \; \sum_{(e'_1, e'_2, r) \in D'} \left[\gamma - s(e_1, e_2, r) + s(e'_1, e'_2, r)\right]_+$

where $[x]_+$ denotes the positive part of $x$; $\gamma > 0$ is a margin hyper-parameter; $D$ is the set of all triplets extracted from the plain text; and $D'$ denotes the corrupted triplets, composed of training triplets with either $e_1$ or $e_2$ replaced by a random entity (but not both at the same time).
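A minimal sketch of the loss, sampling one corrupted triplet per observed one as is common in practice; the toy score function s is a hypothetical stand-in for the neural tensor score above:

```python
import numpy as np

def s(e1, e2, r):
    """Toy stand-in score; a real model would use e.g. the NTM score above."""
    return (len(e1) + len(e2) + len(r)) / 10.0

def ranking_loss(triplets, entities, gamma, rng):
    """Sum of [gamma - s(e1, e2, r) + s(e1', e2', r)]_+ over corruptions,
    replacing either e1 or e2 with a random entity, never both."""
    loss = 0.0
    for e1, e2, r in triplets:
        corrupt = str(rng.choice(entities))
        e1n, e2n = (corrupt, e2) if rng.random() < 0.5 else (e1, corrupt)
        loss += max(0.0, gamma - s(e1, e2, r) + s(e1n, e2n, r))
    return loss

rng = np.random.default_rng(0)
print(ranking_loss([("cat", "mammal", "is_a")],
                   ["cat", "dog", "mammal"], gamma=1.0, rng=rng))
```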


Tensor Factorization

The tensor operation has complexity $O(d^2 k)$. To remedy this, we use a tensor factorization approach that factorizes each tensor slice into the product of two low-rank matrices. Formally, each tensor slice $M^{[i]} \in \mathbb{R}^{d \times d}$ is factorized into two low-rank matrices $P^{[i]} \in \mathbb{R}^{d \times r}$ and $Q^{[i]} \in \mathbb{R}^{r \times d}$:

$M^{[i]} = P^{[i]} Q^{[i]}, \quad 1 \le i \le k,$

where $r \ll d$ is the number of factors.

$g(w, c, t) = u^T f\left(\mathbf{w}^T P^{[1:k]} Q^{[1:k]} \mathbf{t} + V_c^T (\mathbf{w} \oplus \mathbf{t}) + b_c\right).$

The complexity of the tensor operation is now $O(rdk)$.
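A minimal sketch of where the saving comes from: each slice product $w^T (P^{[i]} Q^{[i]}) t$ is computed as $(w^T P^{[i]})(Q^{[i]} t)$ without ever forming the $d \times d$ slice, so the whole operation is $O(rdk)$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 8, 2, 3
P = rng.normal(size=(k, d, r))           # low-rank factors, r << d
Q = rng.normal(size=(k, r, d))
w, t = rng.normal(size=d), rng.normal(size=d)

# z[i] = w^T M[i] t with M[i] = P[i] Q[i], computed without forming M[i]
z = np.array([(w @ P[i]) @ (Q[i] @ t) for i in range(k)])  # O(r d) per slice
print(z)
```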


Applications

Parsing [Socher et al., 2012] [Socher et al., 2013]

Chinese Word Segmentation [Pei et al., 2014]

Semantic Composition [Zhao et al., 2015b]

Sentence Matching [Qiu and Huang, 2015]

Context-Sensitive Word Embeddings [Liu et al., 2015b]

· · ·


Convolutional Neural Tensor Network for CQA [Qiu and Huang, 2015]

[Figure: architecture of the Convolutional Neural Tensor Network. Question and answer sentences are encoded and combined to produce a matching score.]


Solution 3: Gated Recursive Neural Network [Chen et al., 2015a]

[Figure: DAG-based recursive composition over the Chinese characters 下 雨 天 地 面 积 水 ("rainy day", "ground", "accumulated water"), with segmentation tags B/M/E/S.]

- DAG-based Recursive Neural Network

- Gating mechanism

- A relatively complicated solution

GRNN models the complicated combinations of the features, selecting and preserving the useful combinations via reset and update gates.

A similar model: AdaSent [Zhao et al., 2015a]


Gated Recursive Neural Network

[Figure: GRNN over input characters $c_{i-2} \ldots c_{i+2}$ with tags E/M/B/S; consecutive nodes in each layer are pairwise combined up a DAG, the layer outputs are concatenated into $x_i$, and a linear layer $y_i = W_s x_i + b_s$ produces the tag scores.]

The RecNN needs a topological structure to model a sequence, such as a syntactic tree. GRNN instead uses a directed acyclic graph (DAG) to model the combinations of the input characters, in which two consecutive nodes in the lower layer are combined into a single node in the upper layer.


GRNN Unit

[Figure: GRNN unit. Gates $z$, $r_L$, and $r_R$ combine the children $h_{j-1}^{(l-1)}$ and $h_j^{(l-1)}$ into the new node $h_j^{(l)}$.]

Two Gates

reset gate

update gate

The new activation $\hat{h}_j^{l}$ is computed as:

$\hat{h}_j^{l} = \tanh\left(W_h \begin{bmatrix} r_L \odot h_{j-1}^{l-1} \\ r_R \odot h_j^{l-1} \end{bmatrix}\right).$
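A minimal sketch of the gated combination; the reset-gate values are supplied directly here, whereas in the full model they are themselves computed from the two children:

```python
import numpy as np

def grnn_candidate(h_left, h_right, r_left, r_right, Wh):
    """hat h_j^l = tanh(Wh [rL * h_{j-1}^{l-1}; rR * h_j^{l-1}])."""
    z = np.concatenate([r_left * h_left, r_right * h_right])
    return np.tanh(Wh @ z)

rng = np.random.default_rng(0)
d = 4
Wh = rng.normal(size=(d, 2 * d))
h_left, h_right = rng.normal(size=d), rng.normal(size=d)
r_left, r_right = rng.random(d), rng.random(d)   # reset gates in [0, 1]
print(grnn_candidate(h_left, h_right, r_left, r_right, Wh))
```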


Applications

Chinese Word Segmentation [Chen et al., 2015a]

Dependency Parsing [Chen et al., 2015c]

Sentence Modeling [Chen et al., 2015b]

· · ·


Attention Model

from [Bahdanau et al., 2014]

Similar to the gating mechanism!

1. The context vector $c_i$ is computed as a weighted sum of the $h_j$:

   $c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$

2. The weight $\alpha_{ij}$ is computed by

   $\alpha_{ij} = \mathrm{softmax}(a(s_{i-1}, h_j))$
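A minimal sketch of the two steps, using a small MLP for the alignment function $a(s_{i-1}, h_j)$ in the style of Bahdanau et al.; the dimensions and parameters are toy values:

```python
import numpy as np

def attend(s_prev, H, Wa, Ua, va):
    """c_i = sum_j alpha_ij h_j, with alpha = softmax over a(s_{i-1}, h_j)."""
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # softmax over positions j
    return alpha @ H                             # weighted sum of the h_j

rng = np.random.default_rng(0)
d, T = 4, 6
H = rng.normal(size=(T, d))                      # encoder states h_1..h_T
s_prev = rng.normal(size=d)                      # previous decoder state
Wa, Ua = rng.normal(size=(d, d)), rng.normal(size=(d, d))
va = rng.normal(size=d)
print(attend(s_prev, H, Wa, Ua, va))
```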


Applications

Machine Translation [Bahdanau et al., 2014, Meng et al., 2015]

Speech Recognition [Chan et al., 2015]

· · ·


Conclusion

Four intuitive ways to model the combination of distributed features (dense vectors):

Cube Activation Function

Neural Tensor Model

Gated Recursive Neural Network

Attention Model

Better models?


References I

D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ArXiv e-prints, September 2014.

W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals. Listen, attend and spell. ArXiv e-prints, August 2015.

Danqi Chen and Christopher D. Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, 2014.

Danqi Chen, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. arXiv preprint arXiv:1301.3618, 2013.

Xinchi Chen, Xipeng Qiu, Chenxi Zhu, and Xuanjing Huang. Gated recursive neural network for Chinese word segmentation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015a.


References II

Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, and Xuanjing Huang. Sentence modeling with gated recursive neural network. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015b.

Xinchi Chen, Yaqian Zhou, Chenxi Zhu, Xipeng Qiu, and Xuanjing Huang. Transition-based dependency parsing using two heterogeneous gated recursive neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015c.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537, 2011.

Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.


References III

Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. Convolutional neural network architectures for matching natural language sentences. In Advances in Neural Information Processing Systems, 2014.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. A convolutional neural network for modelling sentences. In Proceedings of ACL, 2014.

PengFei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, and Xuanjing Huang. Multi-timescale long short-term memory neural network for modelling sentences and documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015a.

PengFei Liu, Xipeng Qiu, and Xuanjing Huang. Learning context-sensitive word embeddings with neural tensor skip-gram model. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015b.

F. Meng, Z. Lu, Z. Tu, H. Li, and Q. Liu. Neural Transformation Machine: a new architecture for sequence-to-sequence learning. ArXiv e-prints, June 2015.


References IV

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

Wenzhe Pei, Tao Ge, and Baobao Chang. Max-margin tensor neural network for Chinese word segmentation. In Proceedings of ACL, 2014.

Xipeng Qiu and Xuanjing Huang. Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the International Joint Conference on Artificial Intelligence, 2015.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1201–1211, 2012.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. Parsing with compositional vector grammars. In Proceedings of the ACL Conference. Citeseer, 2013.


References V

Han Zhao, Zhengdong Lu, and Pascal Poupart. Self-adaptive hierarchical sentence model. arXiv preprint arXiv:1504.05070, 2015a.

Yu Zhao, Zhiyuan Liu, and Maosong Sun. Phrase type sensitive tensor indexing model for semantic composition. In AAAI, 2015b.

Chenxi Zhu, Xipeng Qiu, Xinchi Chen, and Xuanjing Huang. A re-ranking model for dependency parser with recursive convolutional neural network. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015.
