TRANSCRIPT
(slides: speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture...)
Do machines know the meaning of a word?
Hung-yi Lee
1
Language Technology
• Spam detection (http://spam-filter-review.toptenreviews.com/)
• Part-of-speech tagging: "John saw the saw." → PN V D N
• Named entity recognition, e.g. names of people: 這 位 是 李 宏 毅 (This is Hung-yi Lee)
• Sentiment analysis: 這部電影太糟了 (This movie is terrible) → Negative (負雷)
• Summarization: document → summary, e.g. 大家好…… (Hello everyone ……)
• Translation: "Machine learning ……" → "機器學習 ……"
• Speech recognition
• Syntactic analysis
• Retrieval
2
Do machines really understand human language?
http://cse3521.artifice.cc/chinese-room.html
3
Meaning Representation
(2-D plot of word vectors: dog, cat, rabbit cluster together as "the animals with four legs"; jump and run form another cluster; flower and tree form a third)
Do machines know the meaning of a word or word sequence?
4
Meaning of Word
5
Fill in the Blank
哈密瓜 有 一種 _____ 味 (Cantaloupe has a kind of _____ flavor)
Neural Network
Input: the previous words wi-1, wi-2, wi-3
Output: the most probable next word wi
Each word should be represented as a feature vector.
6
Fill in the Blank
Each dimension corresponds to a word in the lexicon.
The dimension for the word is 1, and the others are 0.
lexicon = {apple, bag, cat, dog, elephant}
apple = [ 1 0 0 0 0]
bag = [ 0 1 0 0 0]
cat = [ 0 0 1 0 0]
dog = [ 0 0 0 1 0]
elephant = [ 0 0 0 0 1]
The vector's length is the lexicon size.
1-of-N Encoding
7
Fill in the Blank: …… wi-2 wi-1 ___
wi-2 and wi-1 are each 1-of-N vectors of lexicon size (e.g. [1 0 0 ……] and [0 0 1 ……]).
Neural Network
Output layer: Softmax, also of lexicon size
The probability for each word as the next word wi:
P(wi = "apple"), P(wi = "bag"), P(wi = "cat"), P(wi = "dog"), P(wi = "elephant")
8
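The Softmax output layer can be sketched as follows; the logits are made-up scores standing in for the network's raw outputs:

```python
import math

lexicon = ["apple", "bag", "cat", "dog", "elephant"]

def softmax(logits):
    """Normalize raw scores into a probability distribution."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical network scores for the next word w_i:
logits = [2.0, 0.5, 1.0, -1.0, 0.0]
probs = softmax(logits)
print({w: round(p, 3) for w, p in zip(lexicon, probs)})
```

The probabilities sum to one, so the output can be read directly as P(wi = each lexicon word).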
Fill in the Blank
• Collect data: 哈密瓜 有 一種 哈 味、不爽 不要 買、公道價 八萬 一、………
• Training: the Neural Network predicts each word from the previous ones, e.g.
  哈密瓜, 有 → 一種;  有, 一種 → 哈;  一種, 哈 → 味
  minimizing the cross entropy between prediction and target.
9
Word Vector
Input: the 1-of-N encoding of the word wi-1; output: the probability for each word as the next word wi.
Take out the inputs z1, z2, …… of the neurons in the first layer and use them to represent a word w.
Word vector, word embedding feature: V(w)
(2-D plot of word vectors: dog, cat, rabbit cluster; jump, run cluster; flower, tree cluster)
10
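Reading off a word vector can be sketched like this; the 2 × 5 matrix W is made up for illustration (|Z| = 2, lexicon size 5):

```python
# With a 1-of-N input x, the first-layer values z = W x are just the
# column of W at the word's index -- that column is the word vector V(w).
W = [[0.1, 0.4, -0.2, 0.3, 0.0],   # weights feeding z1
     [0.7, -0.1, 0.5, 0.2, 0.9]]   # weights feeding z2

def word_vector(word_index, W):
    """V(w): multiply W by the one-hot vector, i.e. pick column word_index."""
    return [row[word_index] for row in W]

print(word_vector(2, W))  # [-0.2, 0.5]
```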
Word Vector
Training text:
…… 李逍遙 御劍飛行 …… (wi-1 = 李逍遙, wi = 御劍飛行)
…… 酒劍仙 御劍飛行 …… (wi-1 = 酒劍仙, wi = 御劍飛行)
With 李逍遙 or 酒劍仙 as input wi-1, "御劍飛行" should have large probability as the next word wi, so the network is pushed to give the two names similar first-layer values (z1, z2) — similar word vectors.
"You shall know a word by the company it keeps."
11
Word Vector – Sharing Parameters
The 1-of-N encodings of wi-2 and wi-1 both feed the hidden layer (z1, z2, ……); the output is the probability for each word as the next word wi.
The weights with the same color (those from each input word position) should be the same.
Otherwise, one word would have two word vectors.
12
Word Vector – Sharing Parameters
The lengths of xi-2 and xi-1 (the 1-of-N encodings of wi-2 and wi-1) are both |V|; the length of z is |Z|.
The weight matrices W1 and W2 are both |Z| × |V| matrices:
z = W1 xi-2 + W2 xi-1
With the shared parameters W1 = W2 = W:
z = W ( xi-2 + xi-1 )
13
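The identity above — two shared matrices collapsing into one product against the summed inputs — can be checked numerically; W and the one-hot inputs below are toy values (|Z| = 2, |V| = 3):

```python
# With W1 = W2 = W, z = W1 x_{i-2} + W2 x_{i-1} = W (x_{i-2} + x_{i-1}).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

x_prev2 = [1, 0, 0]   # one-hot for w_{i-2}
x_prev1 = [0, 0, 1]   # one-hot for w_{i-1}

# Computing the two matrix-vector products separately...
z_separate = vadd(matvec(W, x_prev2), matvec(W, x_prev1))
# ...equals one product against the summed inputs.
z_shared = matvec(W, vadd(x_prev2, x_prev1))
print(z_shared)  # [4.0, 10.0]
```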
Word Vector – Sharing Parameters
How to make two tied weights wi and wj equal? Give wi and wj the same initialization, then update each with both gradients:
wi ← wi − η ∂C/∂wi − η ∂C/∂wj
wj ← wj − η ∂C/∂wj − η ∂C/∂wi
14
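This tied update can be sketched for a single step; the learning rate and gradients are made-up numbers:

```python
# To keep two tied weights equal, give them the same initialization and
# subtract BOTH partial derivatives in each update (as on the slide):
#   wi <- wi - eta * dC/dwi - eta * dC/dwj
#   wj <- wj - eta * dC/dwj - eta * dC/dwi
eta = 0.5
w_i = w_j = 0.5                # same initialization
grad_i, grad_j = 0.25, -0.125  # made-up gradients for one step

w_i = w_i - eta * grad_i - eta * grad_j
w_j = w_j - eta * grad_j - eta * grad_i
print(w_i, w_j)  # 0.4375 0.4375 -- they stay equal after every update
```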
Word Vector – Various Architectures
• Continuous bag of words (CBOW) model: …… wi-1 ____ wi+1 …… — predicting the word wi given its context wi-1, wi+1
• Skip-gram: …… ____ wi ____ …… — predicting the context wi-1, wi+1 given the word wi
15
Beyond 1-of-N Encoding
• Dimension for "Other": keep a fixed lexicon {apple, bag, cat, dog, elephant} plus an "other" dimension; out-of-lexicon words such as w = "Sauron" or w = "Gandalf" map to [0 0 0 0 0 1].
• Word hashing: represent w = "apple" by its letter trigrams (…a-p-p…, p-p-l, p-l-e…); the vector has 26 × 26 × 26 dimensions, with a 1 for each trigram that appears.
16
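The trigram part of word hashing can be sketched as follows (this simple version omits the word-boundary markers that some implementations add):

```python
def letter_trigrams(word):
    """Represent a word by the set of its letter trigrams."""
    return {word[i:i + 3] for i in range(len(word) - 2)}

print(sorted(letter_trigrams("apple")))  # ['app', 'ple', 'ppl']
```

Each trigram would then set one of the 26 × 26 × 26 dimensions to 1, so even unseen words like "Sauron" get a non-trivial encoding.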
Word Vector
Source: http://www.slideshare.net/hustwj/cikm-keynotenov2014
17
Word Vector
Fu, Ruiji, et al. "Learning semantic hierarchies via word embeddings." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: Long Papers. Vol. 1. 2014.
18
Word Vector
• Characteristics
• Solving analogies
V(hotter) − V(hot) ≈ V(bigger) − V(big)
V(Rome) − V(Italy) ≈ V(Berlin) − V(Germany)
V(king) − V(queen) ≈ V(uncle) − V(aunt)
Rome : Italy = Berlin : ?
V(Germany) ≈ V(Berlin) − V(Rome) + V(Italy)
Compute V(Berlin) − V(Rome) + V(Italy), then find the word w with the closest V(w).
19
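The analogy procedure — compute V(Berlin) − V(Rome) + V(Italy), then find the closest V(w) — can be sketched with made-up 2-D vectors placed so the analogy holds:

```python
import math

# Toy 2-D word vectors (made up for illustration; real embeddings are
# learned and have hundreds of dimensions).
V = {
    "Rome":    [1.0, 3.0],
    "Italy":   [1.0, 1.0],
    "Berlin":  [4.0, 3.2],
    "Germany": [4.0, 1.2],
    "apple":   [-2.0, 0.0],
}

def closest(target, V, exclude):
    """Find the word whose vector is nearest to the target point."""
    best, best_d = None, float("inf")
    for w, v in V.items():
        if w in exclude:
            continue
        d = math.dist(v, target)
        if d < best_d:
            best, best_d = w, d
    return best

# Rome : Italy = Berlin : ?   ->   V(Berlin) - V(Rome) + V(Italy)
target = [b - r + i for b, r, i in zip(V["Berlin"], V["Rome"], V["Italy"])]
print(closest(target, V, exclude={"Berlin", "Rome", "Italy"}))  # Germany
```

Excluding the three query words is a common convention, since one of them is often the nearest vector to the target.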
Demo
• The model used in the demo is provided by 陳仰德
• Part of a project done by 陳仰德 and 林資偉
• TA: 劉元銘
• Training data is from PTT (collected by 葉青峰)
20
Meaning of Word Sequence
21
Meaning of Word Sequence
• Word sequences with different lengths → vectors of the same length
• The vector represents the meaning of the word sequence
• A word sequence can be a document or a paragraph
22
Outline
• Application: Information Retrieval (IR)
Deep Structured Semantic Model
(DSSM)
• Application: Sentiment Analysis, Sentence Relatedness
Recursive Neural Network
• Paragraph Vector
• Sequence-to-sequence auto-encoder
Unsupervised
23
Information Retrieval (IR)
The documents are vectors in the space.
The query is also a vector.
Vector Space Model
How can we use a vector to represent a word sequence?
24
Information Retrieval (IR)
word string s1: "This is an apple"
word string s2: "This is a pen"
Bag-of-word:
        this  is  a  an  apple  pen  …
  s1:    1    1   0   1    1     0
  s2:    1    1   1   0    0     1
Weighted by IDF
25
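The bag-of-word vectors with IDF weighting can be sketched for the two strings on the slide (the plain log(N/df) variant of IDF is an assumption; several weighting schemes exist):

```python
import math

docs = ["this is an apple", "this is a pen"]
vocab = ["this", "is", "a", "an", "apple", "pen"]

def bow(text):
    """Bag-of-words count vector over the fixed vocabulary."""
    words = text.split()
    return [words.count(w) for w in vocab]

def idf(term):
    """log(N / df): terms appearing in fewer documents get larger weights."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df)

s1 = [c * idf(w) for c, w in zip(bow(docs[0]), vocab)]
# "this"/"is" appear in both docs -> IDF = log(2/2) = 0, so they vanish;
# "an"/"apple" appear in one doc -> IDF = log(2/1) > 0.
print([round(x, 3) for x in s1])
```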
Information Retrieval (IR)
Query q Document d1 Document d2
Bag-of-word
All documents in the database
Retrieved
All the words are treated as discrete tokens.
Never considered: Different words can have the same meaning, and the same word can have different meanings.
Vector Space Model + Bag-of-word
26
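In the vector space model, retrieval typically ranks documents by cosine similarity to the query vector; a sketch with made-up bag-of-word vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up bag-of-word vectors for a query and two documents.
q  = [1, 1, 0, 0]
d1 = [2, 1, 0, 0]   # shares terms with q
d2 = [0, 0, 3, 1]   # no term overlap with q

scores = {"d1": cosine(q, d1), "d2": cosine(q, d2)}
print(max(scores, key=scores.get))  # d1 is retrieved
```

Note how d2, which shares no terms with q, scores exactly zero — the weakness the slide describes: discrete tokens with no notion of shared meaning.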
IR - Semantic Embedding
A network maps the bag-of-word vector of a word string (document or query) into a low-dimensional semantic space.
How to achieve that? (No target ……)
Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507
27
DSSM
Click-through data: q1: d1 +, d2 −; q2: d3 −, d4 +; ……
Training: the vectors of query q1 and its clicked document d1 (+) should be close; the vectors of q1 and the unclicked document d2 (−) should be far apart (likewise for q2 with d4 and d3).
28
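The training idea — clicked pairs close, unclicked pairs far apart — can be sketched with made-up semantic vectors; the softmax-over-cosines objective below is one common formulation, not necessarily the exact one on the slide:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up semantic vectors output by the query and document networks.
q       = [1.0, 0.5]
d_click = [0.9, 0.6]    # "+" : clicked document, should end up close to q
d_skip  = [-1.0, 0.4]   # "-" : unclicked document, should end up far apart

# One common objective: maximize the clicked document's share of
# similarity via a softmax over the cosine scores.
scores = [cosine(q, d_click), cosine(q, d_skip)]
exps = [math.exp(s) for s in scores]
p_click = exps[0] / sum(exps)
print(round(p_click, 3))   # training pushes this probability toward 1
```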
DSSM v.s. Typical DNN
Typical DNN: the input is document d and query q together, and the output is compared against a + / − reference.
DSSM: document d and query q each pass through their own network, and the similarity of the two output vectors is measured.
29
• How to do retrieval?
New Query q’ Document d1 Document d2
Retrieved
Click-through data: q1 d1 d2: + : -
q2 d3 d4: - : +……
30
Reference
• Huang, Po-Sen, et al. "Learning deep structured semantic models for web search using clickthrough data." ACM, 2013.
• Shen, Yelong, et al. "A latent semantic model with convolutional-pooling structure for information retrieval." ACM, 2014.
31
Outline
• Application: Information Retrieval (IR)
Deep Structured Semantic Model (DSSM)
• Application: Sentiment Analysis, Sentence Relatedness
Recursive Neural Network
• Paragraph Vector
• Sequence-to-sequence auto-encoder
Unsupervised
32
Recursive Deep Model
• To understand the meaning of a word sequence, the order of the words cannot be ignored.
"white blood cells destroying an infection" → positive
"an infection destroying white blood cells" → negative
Exactly the same bag-of-word, but different meaning.
33
Recursive Deep Model
word sequence: not very good
syntactic structure: not (very good)
How to obtain the syntactic structure is beyond the scope of this lecture.
34
Recursive Deep Model
Following the syntactic structure of "not very good", an NN composes V("very") and V("good") into V("very good"), the meaning of "very good". By composing the two meanings, what should the result be?
Dimension of word vector = |Z|
NN input: 2 × |Z|, output: |Z|
35
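The composition network can be sketched as a single tanh layer with made-up weights; note that it maps the 2 × |Z| concatenation back to |Z| outputs:

```python
import math

# A minimal composition network (made-up weights): it maps the
# concatenation of two |Z|-dim child vectors to one |Z|-dim parent.
W = [[0.5, -0.3, 0.8, 0.1],    # |Z| x 2|Z| weight matrix, |Z| = 2
     [0.2, 0.9, -0.4, 0.6]]

def compose(a, b):
    """V(parent) = tanh(W [a ; b]): one tanh layer over the children."""
    x = a + b                  # concatenation, length 2|Z|
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

V_very, V_good = [0.1, 0.7], [0.9, -0.2]
V_very_good = compose(V_very, V_good)   # the meaning of "very good"
print(len(V_very_good))                 # still |Z| = 2
```

Because the children are concatenated rather than summed, swapping them changes the output — the word order matters, unlike a bag-of-word.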
Recursive Deep Model
not very good
V(“not”) V(“very”) V(“good”)
NN
not very good
syntactic structure
V(wA wB) ≠ V(wA) + V(wB)
"good": positive
"not": neutral
"not good": negative
Meaning of "very good": V("very good")
36
Recursive Deep Model
V(wA wB) ≠ V(wA) + V(wB)
"棒" (great): positive
"好棒" (so great): positive
"好棒棒" (sarcastic "so very great"): negative
not very good
V(“not”) V(“very”) V(“good”)
NN
not very good
syntactic structure
Meaning of "very good": V("very good")
37
Recursive Deep Model
not very good
V(“not”) V(“very”) V(“good”)
NN
not very good
syntactic structure
NN("not", "good") → "not good"
NN("not", "bad") → "not bad"
"not" "reverses" the other input.
Meaning of "very good": V("very good")
38
Recursive Deep Model
not very good
V(“not”) V(“very”) V(“good”)
NN
not very good
syntactic structure
NN("very", "good") → "very good"
NN("very", "bad") → "very bad"
"very" "emphasizes" the other input.
Meaning of "very good": V("very good")
39
not very good
V(“not”) V(“very”) V(“good”)
NN
NN
V(“not very good”)
How to train the NN? What is the target?
The word order is considered: the representation of the sequence changes if the order of the words is changed.
40
Need a Training Target ……
5-class sentiment classification ( -- , - , 0 , + , ++ )
41
At each node, an output network maps the node's vector to 5 classes ( -- , - , 0 , + , ++ ); for "not very good" the reference output is "-".
Train both the composition NN and the output NN together.
42
Sentiment Analysis
Socher, Richard, et al. "Recursive deep models for semantic compositionality
over a sentiment treebank." Proceedings of the conference on empirical
methods in natural language processing (EMNLP). Vol. 1631. 2013.
43
Need a Training Target ……
• Sentence relatedness: each sentence is encoded by a Recursive Neural Network; an NN on top of the two sentence vectors predicts their relatedness.
Tai, Kai Sheng, Richard Socher, and Christopher D. Manning. "Improved semantic representations from tree-structured long short-term memory networks." arXiv preprint arXiv:1503.00075 (2015).
44
Outline
• Application: Information Retrieval (IR)
Deep Structured Semantic Model (DSSM)
• Application: Sentiment Analysis, Sentence Relatedness
Recursive Neural Network
• Paragraph Vector
• Sequence-to-sequence auto-encoder
Unsupervised
45
Paragraph d1 (from "The Lord of the Rings"): …… 名叫(wi-2) 魔君(wi-1) 索倫 (Sauron)(wi) ……
Paragraph d2 (from "仙五"): …… 名叫(wi-2) 魔君(wi-1) 姜世離(wi) ……
(名叫 "named", 魔君 "dark lord")
Original word vector: z = W ( xi-2 + xi-1 )
The same context words give the same z, so the network produces the same output for the next word wi in both paragraphs — even though the correct next words (索倫 vs. 姜世離) differ.
46
Paragraph Vector
In addition to the 1-of-N encodings xi-2, xi-1 of the words wi-2, wi-1 (through shared weights W), the network takes a 1-of-N encoding of the paragraph d through weights W':
1-of-N encoding of paragraph d: d1 = [1 0 0], d2 = [0 1 0], d3 = [0 0 1], …
Original word vector: z = W ( xi-2 + xi-1 )
Paragraph vector: z = W ( xi-2 + xi-1 ) + W' d
The output is the probability for each word as the next word wi.
Le, Quoc, and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML, 2014.
47
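The difference between the two formulas can be checked numerically; W, W', and the encodings below are toy values (|Z| = 2):

```python
# Paragraph vector sketch: the hidden vector gains a paragraph term,
# z = W (x_{i-2} + x_{i-1}) + W' d, with d a 1-of-N paragraph encoding.
W  = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]
Wp = [[0.5, -0.5],          # W': one column per paragraph
      [0.3, 0.7]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

x2, x1 = [1, 0, 0], [0, 1, 0]   # identical one-hot word contexts
d1, d2 = [1, 0], [0, 1]          # 1-of-N encodings of two paragraphs

z1 = vadd(matvec(W, vadd(x2, x1)), matvec(Wp, d1))
z2 = vadd(matvec(W, vadd(x2, x1)), matvec(Wp, d2))
print(z1, z2)   # same context words, different paragraphs -> different z

# V(d) = W' d is the paragraph vector: a column of W'.
V_d1 = matvec(Wp, d1)
```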
Paragraph Vector
Paragraph d1 (related to "The Lord of the Rings"): …… 名叫(wi-2) 魔君(wi-1) 索倫 (Sauron)(wi) ……
Paragraph d2 (related to "仙五"): …… 名叫(wi-2) 魔君(wi-1) 姜世離(wi) ……
Original word vector: z = W ( xi-2 + xi-1 ) — the same for both paragraphs
Paragraph vector: z = W ( xi-2 + xi-1 ) + W' d1 vs. z = W ( xi-2 + xi-1 ) + W' d2 — different
The error of the prediction can then be explained by the meaning of the paragraphs.
Paragraph vector of d: V(d) = W' d — the meaning of the paragraph
Le, Quoc, and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML, 2014.
48
Sequence-to-sequence Auto-encoder
• Original Auto-encoder: input layer → hidden layers → bottleneck layer (representing the input object) → hidden layers → output layer, trained so that the output is as close as possible to the input.
Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
49
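The auto-encoder's shape (wide → bottleneck → wide) can be sketched with made-up weights and no training; for this hand-picked input the reconstruction happens to be exact:

```python
# A minimal linear auto-encoder sketch: the bottleneck layer holds a
# low-dimensional code representing the input object.
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W_enc = [[0.5, 0.5, 0.0, 0.0],     # 4-dim input -> 2-dim bottleneck
         [0.0, 0.0, 0.5, 0.5]]
W_dec = [[1.0, 0.0],               # 2-dim bottleneck -> 4-dim output
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]

x = [3.0, 3.0, 8.0, 8.0]
code = matvec(W_enc, x)            # the representation of the input
x_hat = matvec(W_dec, code)        # reconstruction, trained to match x
print(code, x_hat)                 # [3.0, 8.0] [3.0, 3.0, 8.0, 8.0]
```

In practice the weights are learned by minimizing the reconstruction error; the sequence-to-sequence version on the next slide replaces the feed-forward encoder and decoder with recurrent networks.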
Sequence-to-sequence Auto-encoder
Li, Jiwei, Minh-Thang Luong, and Dan Jurafsky. "A hierarchical neural autoencoder for paragraphs and documents." arXiv preprint arXiv:1506.01057 (2015).
50
Summary
• Application: Information Retrieval (IR)
Deep Structured Semantic Model (DSSM)
• Application: Sentiment Analysis, Sentence Relatedness
Recursive Neural Network
• Paragraph Vector
• Sequence-to-sequence auto-encoder
Unsupervised
51