deep learning intro. - kangwoncs.kangwon.ac.kr/.../12_deeplearning_intro.pdf · 2016-06-17
TRANSCRIPT
Deep Learning intro.
2016.01.02.
Outline
Natural Language Processing (NLP)
Representation and Processing
Deep Learning Models
Natural Language Processing
Natural Language Processing (NLP)
Language understanding / language generation / applications:
• Answering (답변)
• Search (검색)
• Reasoning (추론)
• Dialogue (대화)
• Intelligent robots (지능형 로봇)
• Information retrieval (정보 검색)
• Machine translation (기계 번역)
• Document summarization (문서 요약)
• Questions (질문)
• Word understanding (단어 이해)
• Semantic understanding (의미 이해)
• Intent recognition (의도 파악)
Representation and Processing
Representation in mathematics
<0.156, 0.421, 0.954, โฆ>
<0.096, 0.510, 0.991, โฆ>
<0.496, 0.951, 0.321, โฆ>
<0.196, 0.851, 0.119, โฆ>
<โฆ, 0.486, 0.854, โฆ>
<โฆ, 0.751, 0.912, โฆ>
<โฆ, 0.123, 2.554, 5.124, โฆ>
<โฆ, 7.451, 21.45, 8.999>
<โฆ, 1.109, 11.854, 0.456>
Real World → Vector Space
https://www.google.com/imghp?hl=ko
Duck vs. rabbit (오리 vs. 토끼)
Neural Network for Human
https://uncyclopedia.kr/wiki/%EB%87%8C
• Neural network: pattern recognition
• Multi-layer (human: 10 layers)
• "I see lion"
Neural Network
Vector representation
Pattern of layers
+ Learning
Pattern of layers
Deep learning: automatic pattern combination
Why do we say "deep"?
[Figure: m layers with n units each; every unit in one layer links to every unit in the next]
Connection links: (n × n) × (m − 1)
Automatic combination
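The connection count above can be checked with a short Python snippet (the layer and unit sizes below are illustrative, not from the slides):

```python
def num_connections(n_units: int, n_layers: int) -> int:
    # Between each of the (n_layers - 1) adjacent layer pairs,
    # every unit connects to every unit: n_units * n_units links per pair.
    return (n_units * n_units) * (n_layers - 1)

# e.g. 10 layers of 100 units each
print(num_connections(100, 10))  # 90000
```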
How to use layers?
• Input: vector
• Output: real number or class (vector)
• Vector representation: "one-hot"
Vector representation
• Symbol: Lion [text representation]
• One-hot representation: <0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …>
• Symbol representation: <1.45, 75.12, 0.425, 0.953, …>
Jung, DEEP LEARNING FOR KOREAN NLP
How to map a symbol to a one-hot vector
[Symbolic words] → [One-hot]
Lion → <0, 0, 1, 0, 0>
Big cat → <0, 1, 0, 0, 1>
With an AND operation, the two words share no common element, so they never match.
∴ We need a symbolic vector representation.
How to map a symbol to a one-hot vector (cont.)
Words: Lion, Big cat, Tiger, Dog, Wolf, Mouse
[One-hot]
<0, 0, 1, 0, 0>
<0, 1, 0, 0, 1>
∴ [Symbolic representation] (symbolic vectors from an NNLM):
<1.45, 75.12, 0.425, 0.953, …>
<1.78, 61.11, 0.611, 2.011, …>
Use cosine similarity to compare them.
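A minimal NumPy sketch of the contrast: one-hot vectors of distinct words are always orthogonal (cosine 0), while dense symbolic vectors can be gradually similar. The vocabulary is made up, and the lion/tiger values repeat the slide's illustrative numbers, not a trained model's output:

```python
import numpy as np

vocab = ["dog", "wolf", "lion", "big", "cat"]   # illustrative 5-word vocabulary

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Distinct one-hot vectors are orthogonal: the comparison always gives 0.
print(cosine(one_hot("lion"), one_hot("wolf")))   # 0.0

# Dense symbolic vectors (values from the slide's example) compare gradually.
lion  = np.array([1.45, 75.12, 0.425, 0.953])
tiger = np.array([1.78, 61.11, 0.611, 2.011])
print(cosine(lion, tiger))                        # close to 1.0
```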
Neural Network Language Model
• Feed-forward NN: a parametric estimator
• Overall parameter set θ = (C, ω)
• One-hot representation: [0 1 0 0 0 0 0 0 0 0]
• Lookup table: word embedding
• Non-linear projection: activation function
• Normalize weights: softmax (length: |V|)
Neural Network Language Model
Objective: maximize the log-likelihood

L = max_θ (1/T) Σ_t log f(w_t, w_{t−1}, …, w_{t−n+1}; θ)

Parameters:
• h: the number of hidden units
• m: the number of features associated with each word
• b: the output biases
• d: the hidden layer biases
• U: the hidden-to-output weights
• W: the input-to-output weights
• H: the input-to-hidden weights
• C: word features (lookup table)
• θ = (b, d, W, U, H, C)
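A forward pass with these parameters can be sketched in NumPy following the standard NNLM formulation y = b + Wx + U tanh(d + Hx) with a softmax on top. The weights below are random and untrained, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, h, n = 10, 4, 8, 3                 # vocab size |V|, features per word, hidden units, n-gram order
C = rng.normal(size=(V, m))              # C: word features (lookup table)
H = rng.normal(size=(h, (n - 1) * m))    # H: input-to-hidden weights
d = np.zeros(h)                          # d: hidden layer biases
U = rng.normal(size=(V, h))              # U: hidden-to-output weights
W = rng.normal(size=(V, (n - 1) * m))    # W: input-to-output weights
b = np.zeros(V)                          # b: output biases

def nnlm_probs(context_ids):
    """P(w_t | w_{t-1}, ..., w_{t-n+1}): lookup, non-linear projection, softmax."""
    x = C[context_ids].reshape(-1)           # concatenated word features
    y = b + W @ x + U @ np.tanh(d + H @ x)   # unnormalized score per vocabulary word
    e = np.exp(y - y.max())                  # numerically stable softmax
    return e / e.sum()

p = nnlm_probs([1, 5])                       # n - 1 = 2 context word ids
print(p.shape, round(float(p.sum()), 6))     # (10,) 1.0
```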
NNLM for Korean
Leeck, Korean Dependency Parsing Using Deep Learning (딥 러닝을 이용한 한국어 의존 구문 분석)
Deep Learning Models
Deep learning Models
“강남 주변의 스타벅스 위치가 어디야?” (“Where is the Starbucks near Gangnam?”)
• Morphemes: 강남/NNG 주변/NNG 의/JX 스타벅스/NNG …
Feed-forward Neural Network (FFNN)
[Figure: a window of morphemes 강남/NNG, 주변/NNG, 의/JX feeds the FFNN, which outputs a tag y_t]
FFNN variants: 1-FFNN, 2-FFNN, 3-FFNN
Deep learning Models
“강남 주변의 스타벅스 위치가 어디야?”
• x_text: [강남 주변 의 스타벅스 위치], [어디]
• x_tag: [B I I I I], [B]
Recurrent Neural Network (RNN)
[Figure: the RNN unfolded over 강남/B, 주변/I, 의/I, 스타벅스/I, 위치/I, emitting one tag y_t per step]
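The unfolding can be sketched with a minimal Elman-style RNN tagger in NumPy. All weights are random and untrained, so the emitted tags are meaningless; the point is only the recurrence, where each step's hidden state carries the left context forward:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H, K = 4, 5, 3                         # word-vector dim, hidden dim, number of tags (e.g. B/I/O)
Wxh = rng.normal(scale=0.1, size=(H, D))  # input-to-hidden weights
Whh = rng.normal(scale=0.1, size=(H, H))  # hidden-to-hidden (recurrent) weights
Why = rng.normal(scale=0.1, size=(K, H))  # hidden-to-tag weights

def rnn_tag(word_vectors):
    """Unfold the RNN over the sequence, emitting one tag index per word."""
    h = np.zeros(H)
    tags = []
    for x in word_vectors:
        h = np.tanh(Wxh @ x + Whh @ h)    # hidden state = f(current word, previous state)
        tags.append(int(np.argmax(Why @ h)))
    return tags

sentence = rng.normal(size=(5, D))        # stand-ins for 5 word vectors
print(rnn_tag(sentence))                  # 5 tag indices, one per word
```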
Deep learning Models
“강남 주변의 스타벅스 위치가 어디야?”
• x_text: [강남 주변 의 스타벅스 위치], [어디]
• x_tag: [B I I I I], [B]
Long Short-Term Memory RNN (LSTM-RNN)
• Uses gate matrices (LSTM or GRU)
[Figure: the LSTM-RNN unfolded over 강남/B, 주변/I, 의/I, 스타벅스/I, 위치/I]
Deep learning Models
“강남 주변의 스타벅스 위치가 어디야?”
• x_text: [강남 주변 의 스타벅스 위치], [어디]
• x_tag: [B I I I I], [B]
LSTM-RNN CRF
• Uses gate matrices (LSTM or GRU)
• Decoding: Viterbi or beam search
[Figure: the LSTM-RNN CRF unfolded over 강남/B, 주변/I, 의/I, 스타벅스/I, 위치/I]
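The Viterbi decoding step can be sketched as follows: the network supplies per-word emission scores, the CRF layer supplies tag-to-tag transition scores, and dynamic programming recovers the best tag path. The toy scores below are made up for illustration (tags: 0 = B, 1 = I):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best-scoring tag path for emission scores (T x K) and transition scores (K x K)."""
    T, K = emissions.shape
    score = emissions[0].copy()              # best score ending in each tag at step 0
    back = np.zeros((T, K), dtype=int)       # backpointers
    for t in range(1, T):
        # cand[i, j] = best path ending in tag i, then transitioning to tag j
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]             # best final tag, then follow backpointers
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

emissions = np.array([[2.0, 0.0],            # step 1 strongly prefers B
                      [0.0, 2.0],            # steps 2-3 prefer I
                      [0.0, 2.0]])
print(viterbi(emissions, np.zeros((2, 2))))  # [0, 1, 1], i.e. B I I
```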
Deep learning Models
“강남 주변의 스타벅스 위치가 어디야?”
• x_text: [강남 주변 의 스타벅스 위치], [어디]
• x_tag: [B I I I I], [B]
Bidirectional LSTM-RNN CRF (Bi-LSTM-RNN CRF)
• Uses gate matrices (LSTM or GRU)
• Decoding: Viterbi or beam search
[Figure: forward and backward passes over 강남/B, 주변/I, 의/I, 스타벅스/I, 위치/I]
Deep learning Models
Sequence-to-sequence model
• Two different LSTMs: one for the input sentence, one for the output sentence
• Uses a shallow LSTM
• Reverses the input sentence
• Training: decoding & rescoring
Deep learning Models
Encoder-Decoder Architecture
Deep learning Models
Pointer Networks
• A deep learning model based on the seq2seq attention mechanism
• Outputs positions (indices) in the input sequence
• X = {A:0, B:1, C:2, D:3, <EOS>:4}
• Y = {3, 2, 0, 4}
Encoding: A B C D <EOS> → Decoding: D C A <EOS>
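One decoding step of a pointer network can be sketched with additive attention: the decoder state is compared against every encoder state, and the attention distribution itself is the output, pointing at an input index. All weights and states below are random stand-ins, not trained values:

```python
import numpy as np

rng = np.random.default_rng(2)
T, H, A = 5, 6, 4                     # input length, state dim, attention dim
enc = rng.normal(size=(T, H))         # encoder states, one per input symbol
dec = rng.normal(size=(H,))           # current decoder state
W1 = rng.normal(size=(A, H))          # attention projection of encoder states
W2 = rng.normal(size=(A, H))          # attention projection of the decoder state
v = rng.normal(size=(A,))             # attention scoring vector

def pointer_step(enc_states, dec_state):
    """Score every input position, then softmax: the distribution IS the output."""
    scores = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v   # one score per position
    e = np.exp(scores - scores.max())
    attn = e / e.sum()
    return int(attn.argmax()), attn   # pointed-to input index + its distribution

idx, attn = pointer_step(enc, dec)
print(idx, round(float(attn.sum()), 6))   # some index in [0, T), and 1.0
```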
Deep learning Models
Siamese Neural Network
References
• Jung, Deep Learning for Korean NLP
• Lee, Korean Dependency Parsing Using Deep Learning (딥 러닝을 이용한 한국어 의존 구문 분석)
• Park, Pointer Networks for Coreference Resolution
• Park, Bi-LSTM-RNN CRF for Mention Detection
QA
감사합니다. (Thank you.)
[Co-presenter names in Korean, unrecoverable from the extraction]
Kangwon National University (강원대학교)
Email: [email protected]