
Page 1:

Machine Learning
2015.06.27.

Neural Network

Page 2:

Neural Network

• Human Neuron
• Perceptron
• Artificial Neural Network
• Feed-forward Neural Nets.
• Gradient
• Least Square Error
• Cross Entropy
• Back-propagation
• Conclusion

Pages 3-6:

Issues

• Inceptionism of Google (image slides)

Page 7:

Human Neuron

Page 8:

Human Neuron

Input → Weighted Sum → Activation Function → Output

• Input: defined vectors
• The weighted sum is calculated over the input vector
• The input vector is transformed into an output signal via an activation function
• The output signal is binary (0 or 1) or a real value (between 0 and 1)

Page 9:

Perceptron

Raw data → Input vector → Weight → Activation Function → Output

Page 10:

Perceptron

• Inputs are features

• Each feature has a weight

• Sum is the activation
  • Positive: 1
  • Negative: 0

z = \sum_i w_i \cdot x_i, \quad y = f(z)

• The activation is a step function, sigmoid function, or Gaussian function
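A minimal Python sketch of this computation; the input values and weights below are illustrative, not from the slide:

```python
# Perceptron forward pass: z = sum_i(w_i * x_i), then y = f(z).
# Here f is the step activation: positive -> 1, otherwise -> 0.

def step(z):
    return 1 if z > 0 else 0

def perceptron(x, w):
    z = sum(wi * xi for wi, xi in zip(w, x))  # weighted sum of the inputs
    return step(z)                            # activation function

# Illustrative input vector and weights:
print(perceptron([1.0, 0.5], [0.6, -0.4]))    # z = 0.4 > 0, so y = 1
```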

Page 11:

Perceptron & Logistic Regression

[Diagram: a perceptron over inputs x_i with weight vector w, shown side by side with logistic regression]

Both pose a parametric problem

Page 12:

Perceptron learning rule

• On-line, error (mistake) driven learning

• Rosenblatt (1959, a psychologist)
  • suggested that when a target output value is provided for a single neuron with fixed input, it can incrementally change weights and learn to produce the output using the Perceptron learning rule

• Perceptron == Linear Threshold Unit

z = \sum_i w_i x_i = w^T x
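A sketch of the incremental weight change this rule performs, assuming the standard error-driven form w_i ← w_i + η(t − y)x_i; the learning rate, epoch count, and training set (OR, which is linearly separable) are illustrative:

```python
# Perceptron learning rule: on each mistake, w_i <- w_i + eta * (t - y) * x_i.
def train_perceptron(data, eta=0.1, epochs=20):
    w = [0.0, 0.0, 0.0]                # weights for [bias input, x1, x2]
    for _ in range(epochs):
        for x, t in data:
            x = [1.0] + x              # prepend the constant bias input
            z = sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if z > 0 else 0      # linear threshold unit
            if y != t:                 # error (mistake) driven: update only on mistakes
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
    return w

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # OR
print(train_perceptron(data))
```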

Page 13:

Perceptron learning rule

Pages 14-17:

Geometric View (figures)

Page 18:

Deriving the delta rule

Page 19:

Perceptron Example

[Diagram: raw data → input vector (x0 = -1, x1, x2) → unknown weights (?) → step activation with threshold 0 → output]

For AND:

X1 | X2 | Output
 0 |  0 | 0
 0 |  1 | 0
 1 |  0 | 0
 1 |  1 | 1

Page 20:

Perceptron Example

For AND, with weights w0 = 0.5, w1 = 0.4, w2 = 0.4:

X0 | X1 | X2 | Summation                            | Output
-1 |  0 |  0 | (-1*0.5) + (0*0.4) + (0*0.4) = -0.5  | 0
-1 |  0 |  1 | (-1*0.5) + (0*0.4) + (1*0.4) = -0.1  | 0
-1 |  1 |  0 | (-1*0.5) + (1*0.4) + (0*0.4) = -0.1  | 0
-1 |  1 |  1 | (-1*0.5) + (1*0.4) + (1*0.4) =  0.3  | 1
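The summation column can be checked mechanically; a small script reproducing the table under the slide's weights (0.5 on the bias input x0 = -1, 0.4 on x1 and x2):

```python
# Reproduce the AND table: bias input x0 = -1, weights (0.5, 0.4, 0.4).
w = (0.5, 0.4, 0.4)
for x1 in (0, 1):
    for x2 in (0, 1):
        s = -1 * w[0] + x1 * w[1] + x2 * w[2]  # weighted sum
        out = 1 if s > 0 else 0                # step activation at threshold 0
        print(f"x1={x1} x2={x2}  sum={s:+.1f}  out={out}")
```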

Page 21:

Limitation of a Perceptron: linear separability

Page 22:

Decision surface of a perceptron

• A perceptron is able to represent some useful functions
  • AND(x1, x2): choose weights w0 = -1.5, w1 = 1, w2 = 1

• But functions that are not linearly separable (e.g. XOR) are not representable

Page 23:

Perceptrons...

• Perceptron: Mistake Bound Theorem

• Dual Perceptron

• Voted-Perceptron

• Regularization: Average Perceptron

• Passive-Aggressive Algorithm

• Unrealizable Case

Page 24:

We need non-linearly separable decision regions

Structure    | Types of Decision Regions                      | Exclusive-OR Problem | Classes with Meshed Regions | Most General Region Shapes
-------------|------------------------------------------------|----------------------|-----------------------------|---------------------------
Single-Layer | Half plane bounded by hyperplane               | [A/B figure]         | [A/B figure]                | [figure]
Two-Layer    | Convex open or closed regions                  | [A/B figure]         | [A/B figure]                | [figure]
Three-Layer  | Arbitrary (complexity limited by no. of nodes) | [A/B figure]         | [A/B figure]                | [figure]

Page 25:

Artificial Neural Network

Raw data → Input vector → Weight → Activation Function → Output

Add units!! → a layer

Page 26:

Artificial Neural Network

Raw data → Input layer → Weight → Activation Function → Hidden layer → Weight → Activation Function → Output layer

Page 27:

Artificial Neural Network

z_k = y_1 AND (NOT y_2) = (x_1 OR x_2) AND NOT(x_1 AND x_2) = x_1 XOR x_2

Figure source: Pattern Classification

Solve XOR!!
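A sketch of this decomposition with three threshold units; the weights are one illustrative choice that realizes OR, AND, and the final y1 AND (NOT y2):

```python
# XOR from a two-layer network of threshold units:
#   y1 = x1 OR x2, y2 = x1 AND x2, z = y1 AND (NOT y2) = x1 XOR x2.
def unit(x, w, bias):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

def xor(x1, x2):
    y1 = unit([x1, x2], [1, 1], -0.5)     # OR:  fires when x1 + x2 > 0.5
    y2 = unit([x1, x2], [1, 1], -1.5)     # AND: fires when x1 + x2 > 1.5
    return unit([y1, y2], [1, -1], -0.5)  # y1 AND (NOT y2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))      # 0, 1, 1, 0
```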

Page 28:

Artificial Neural Network

• Input value
• Emission value
• Weight
• Activation function

Figure source: Pattern Classification

A combination of the unit states

Page 29:

Feed-forward Neural Nets.

• Net activation (scalar, hidden unit 'j')
• input-to-hidden

1) net_j = \sum_{i=1}^{d} x_i w_{ij} + w_{j0} = \sum_{i=0}^{d} x_i w_{ij} \equiv w_j^T x

• i: input layer, j: hidden layer, w_{ij}: the weight from i to j
• x: units (= nodes), w: weights
• x_0 = 1, w_0 = 0~1 (bias value)

Page 30:

Feed-forward Neural Nets.

• Activation function (non-linear function)

2) y_j = f(net_j)

• f is written with sgn, the signum function (φ)

3) f(net) = sgn(net) \equiv \begin{cases} 1 & net \geq 0 \\ -1 & net < 0 \end{cases} : activation function

Page 31:

Feed-forward Neural Nets.

• Activation functions

logistic sigmoid:

f(net) = \frac{1}{1 + \exp(-net)}, \quad \frac{\partial f(net)}{\partial net} = f(net)(1 - f(net))

tanh:

f(net) = \tanh(net) = \frac{e^{net} - e^{-net}}{e^{net} + e^{-net}}, \quad \tanh'(net) = 1 - \tanh^2(net)

hard tanh:

f(net) = \mathrm{HardTanh}(net) = \begin{cases} -1 & net < -1 \\ net & -1 \leq net \leq 1 \\ 1 & net > 1 \end{cases}

Figure source: Torch7 Documentation

Page 32:

Feed-forward Neural Nets.

• Activation functions

SoftSign:

f(net) = \mathrm{SoftSign}(net) = \frac{net}{1 + |net|}

SoftMax:

f(net)_i = \mathrm{SoftMax}(net)_i = \frac{\exp(net_i - shift)}{\sum_j \exp(net_j - shift)}, \quad shift = \max_i(net_i)

Rectifier:

f(net) = \mathrm{rect}(net) = \max(0, net) = \begin{cases} net & net > 0 \\ 0 & otherwise \end{cases}

(the leaky variant replaces the 0 with 0.01 \cdot net for net \leq 0)

Figure source: Wikipedia
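The activations from these two slides as NumPy sketches, following the corrected definitions above (the leaky variant is shown as a separate function):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))   # derivative: f(net) * (1 - f(net))

def tanh(net):
    return np.tanh(net)                 # derivative: 1 - tanh(net)**2

def hard_tanh(net):
    return np.clip(net, -1.0, 1.0)      # -1 below -1, linear inside, 1 above

def soft_sign(net):
    return net / (1.0 + np.abs(net))

def softmax(net):
    e = np.exp(net - np.max(net))       # shift = max_i(net_i), for numerical stability
    return e / e.sum()

def rect(net):
    return np.maximum(0.0, net)         # ReLU: max(0, net)

def leaky_rect(net):
    return np.where(net > 0, net, 0.01 * net)  # 0.01 * net for net <= 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(softmax(x), rect(x))
```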

Page 33:

Feed-forward Neural Nets.

• Output layer (output unit 'k')
• hidden-to-output

4) net_k = \sum_{j=1}^{n_H} y_j w_{jk} + w_{k0} = \sum_{j=0}^{n_H} y_j w_{jk} \equiv w_k^T y

• k: output layer, n_H: the number of hidden units
• y_0 = 1 (bias value in the hidden layer)

• Output unit
  • sgn(.) is applied here as well

5) z_k = f(net_k)
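Putting equations (1), (2), (4), and (5) together, a minimal one-hidden-layer forward pass; the sigmoid is an assumed choice of f here, and the sizes and random weights are illustrative:

```python
import numpy as np

def forward(x, W_ih, W_ho):
    """One feed-forward pass.
    W_ih: (d+1, n_H) input-to-hidden weights, row 0 holding the biases w_j0.
    W_ho: (n_H+1, c) hidden-to-output weights, row 0 holding the biases w_k0.
    """
    f = lambda net: 1.0 / (1.0 + np.exp(-net))  # assumed activation function
    x = np.concatenate(([1.0], x))              # x_0 = 1 (bias)
    y = f(x @ W_ih)                             # eqs. (1)-(2): y_j = f(w_j^T x)
    y = np.concatenate(([1.0], y))              # y_0 = 1 (bias)
    return f(y @ W_ho)                          # eqs. (4)-(5): z_k = f(w_k^T y)

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7])                        # d = 2 inputs
print(forward(x, rng.normal(size=(3, 4)), rng.normal(size=(5, 1))))
```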

Page 34:

Gradient

• The vector of first-order partial derivatives with respect to each variable
  • the vector points in the direction in which the value of f(.) increases most steeply
  • the magnitude of the vector gives the slope of that increase

• For a multivariate function f(x_1, x_2, ..., x_n), the gradient of f is

\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)

• Linear approximation of the multivariate scalar function f near a point a_k using the gradient (via Taylor expansion):

f(a) = f(a_k) + \nabla f(a_k)(a - a_k) + o(\|a - a_k\|)

Page 35:

Gradient Descent

• Formula

a_{k+1} = a_k - \eta_k \nabla f(a_k), \quad k \geq 0

\eta_k: learning rate

• Algorithm

begin  init a, threshold θ, learning rate η
  do    k ← k + 1
        a ← a - η∇f(a)
  until |η∇f(a)| < θ
  return a
end

Source: Wikipedia
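A sketch of this loop on a toy objective; the function, η, and θ are illustrative:

```python
# Gradient descent: a <- a - eta * grad_f(a), stop once the step is below theta.
def gradient_descent(grad_f, a, eta=0.1, theta=1e-8, max_iter=10_000):
    for _ in range(max_iter):
        step = eta * grad_f(a)
        a = a - step
        if abs(step) < theta:   # |eta * grad f(a)| < theta
            break
    return a

# Illustrative objective f(a) = (a - 3)^2 with grad f(a) = 2(a - 3); minimum at a = 3.
print(gradient_descent(lambda a: 2.0 * (a - 3.0), a=0.0))  # ~3.0
```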

Page 36:

Least Square Error

• When estimating the parameters of a model, make the sum of the squared residuals between the sample data and the training data as small as possible

[Figure: the true model vs. the estimated model, with residuals r_1 ... r_5 between the true data points and the estimated line]

Residual: r (= ε)

\min \sum_i (y_i - \hat{y}_i)^2

Page 37:

Least Square Error

• For an estimated model f(x) = ax + b
• The residual is

residual_i = y_i - f(x_i)

• That is, estimating the parameters by LSE means finding \min(residual^2)
• Expressed as a formula:

\sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2

• For the model above, i.e. a straight line:

\sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - (a x_i + b))^2

• Therefore, determine the parameters a and b that minimize \sum r^2
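A sketch that minimizes Σ r² for the straight-line model via the normal equations; the data points are made up for illustration:

```python
import numpy as np

# Fit f(x) = a*x + b by minimizing sum_i (y_i - (a*x_i + b))^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 5.0, 7.2, 8.9])   # roughly y = 2x + 1 with noise

# Design matrix X = [x, 1]; least squares solves X [a, b]^T ~ y.
X = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - (a * x + b)
print(a, b, (residuals ** 2).sum())        # a ~ 2, b ~ 1, small squared error
```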

Page 38:

Back-propagation

• A method based on the Delta Rule
  • based on LSE, it minimizes the squared error between the target (t) and the output (z)

• Credit assignment problem
  • in the hidden layers of a NN there is no way to check the correct answer
  • therefore the weights are updated using Back Prop.

output(z) vs. target(t): compare → a difference arises: error (= scalar function)

∴ the weights are adjusted to reduce this error value; the weights are learned per pattern

Page 39:

Back-propagation

• Training error on an arbitrary pattern

9) J(w) \equiv \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \|t - z\|^2

• t_k: target, z_k: net output (training result)
• t, z: the target and net output 'vectors' of length c
• w: all weights of the net

• Back prop. training rule
  • based on gradient descent (init: random weights)

10) \Delta w = -\eta \frac{\partial J}{\partial w}, \quad or \quad 11) \Delta w_{pq} = -\eta \frac{\partial J}{\partial w_{pq}}

• \eta: learning rate; it sets the relative size of the weight change

• At iteration m, move so as to lower the gradient descent criterion function J(w):

12) w^{m+1} = w^m + \Delta w^m

Page 40:

Back-propagation

• Back Prop. for Hidden-to-Output
  • the training error must be optimized with respect to "w_{jk}" (∴ optimize J(w) over w)
  • J does not depend on w_{jk} explicitly
  • i.e., J depends on net: (9) \frac{1}{2}\|t - z\|^2, (5) z_k = f(net_k)
  • and net depends on w: (4) net_k = w_k^T y
  • therefore the chain rule can be applied

I → J → K

Compute the training error gradient for the hidden-to-output weights "w_{jk}"

Page 41:

Back-propagation

• Chain rule through \partial net_k for optimizing w_{jk}:

13) \frac{\partial J}{\partial w_{jk}} = \frac{\partial J}{\partial net_k} \frac{\partial net_k}{\partial w_{jk}}

• The 'δ_k' of unit k: the Delta rule [(t_k - z_k)]
  • it describes how the overall error changes with the unit's net activation (LSE, error)

14) delta: -\delta_k = \frac{\partial J}{\partial net_k}

• Assume the activation function f(.) is differentiable; based on (5) z_k = f(net_k) and (9) J = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2, the \delta_k for an output unit is

15) \delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k} \frac{\partial z_k}{\partial net_k} = (t_k - z_k) f'(net_k)

Page 42:

Back-propagation

• Chain rule through \partial net_k for optimizing w_{jk}:

13) \frac{\partial J}{\partial w_{jk}} = \frac{\partial J}{\partial net_k} \frac{\partial net_k}{\partial w_{jk}}

• The last derivative on the right-hand side uses (4) net_k = w_k^T y:

16) \frac{\partial net_k}{\partial w_{jk}} = y_j

• Learning rule for the hidden-to-output weights:

17) \Delta w_{jk} = \eta (t_k - z_k) f'(net_k) y_j = \eta \delta_k y_j

∴ when the output unit is linear
  • i.e., f(net_k) = net_k and f'(net_k) = 1
  • \Delta w_{jk} = \eta (t_k - z_k) y_j

• Equation (17) is the same as LSE
  • LSE: a_{k+1} = a_k + \eta_k (b_k - f(a_k)) y_k, \quad f(a_k) = a_k^T y_k
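Equation (17) in code for a single sigmoid output unit; the hidden outputs, weights, target, and η are illustrative numbers:

```python
import numpy as np

eta = 0.5
y = np.array([1.0, 0.6, 0.2])       # hidden outputs, y_0 = 1 is the bias
w_k = np.array([0.1, -0.4, 0.3])    # weights w_jk into output unit k

net_k = w_k @ y                     # eq. (4)
z_k = 1.0 / (1.0 + np.exp(-net_k))  # sigmoid output z_k = f(net_k)
f_prime = z_k * (1.0 - z_k)         # f'(net_k) for the sigmoid
t_k = 1.0                           # target

delta_k = (t_k - z_k) * f_prime     # eq. (15)
w_k = w_k + eta * delta_k * y       # eq. (17): dw_jk = eta * delta_k * y_j
print(delta_k, w_k)
```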

Page 43:

Back-propagation

• Back Prop. for Input-to-Hidden
  • the training error must be optimized with respect to "w_{ij}" (∴ optimize J(w) over w)

I → J → K

Compute the training error gradient for the input-to-hidden weights "w_{ij}"

Page 44:

Back-propagation

• Back Prop. for Input-to-Hidden
  • use (11) \Delta w_{pq} = -\eta \frac{\partial J}{\partial w_{pq}} and the chain rule

18) \frac{\partial J}{\partial w_{ij}} = \frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}}

• The first term on the right-hand side involves all of the w_{jk}:

19) \frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j} \left[ \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right]
   = -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial y_j}
   = -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial net_k} \frac{\partial net_k}{\partial y_j}
   = -\sum_{k=1}^{c} (t_k - z_k) f'(net_k) w_{jk} = -\sum_{k=1}^{c} w_{jk} \delta_k

using: 9) J(w) \equiv \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \|t - z\|^2, \quad z_k = f(net_k), \quad net_k = \sum_j y_j w_{jk}, \quad \delta_k = (t_k - z_k) f'(net_k)

Page 45:

Back-propagation

• The 'δ_j' of hidden unit j (from eq. (19) and the second factor of eq. (18)):

20) \delta_j \equiv f'(net_j) \sum_{k=1}^{c} w_{jk} \delta_k, \quad where \quad f'(net_j) = \frac{\partial y_j}{\partial net_j} = \frac{\partial f(net_j)}{\partial net_j}

• Weight learning for input-to-hidden:

21) \Delta w_{ij} = \eta x_i \delta_j = \eta \left[ \sum_{k=1}^{c} w_{jk} \delta_k \right] f'(net_j) x_i

• x_i is the last factor of (18): \frac{\partial net_j}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_i x_i w_{ij} = x_i
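A compact sketch combining equations (15), (17), (20), and (21) into per-pattern training on XOR; the learning rate, epoch count, network size, and initialization are illustrative, and depending on the seed the descent can land in a poor local minimum:

```python
import numpy as np

def f(net):  return 1.0 / (1.0 + np.exp(-net))  # sigmoid
def fp(out): return out * (1.0 - out)           # f'(net), written in terms of f(net)

rng = np.random.default_rng(1)
W_ih = rng.normal(scale=0.5, size=(3, 2))  # input (+bias) -> 2 hidden units
W_ho = rng.normal(scale=0.5, size=(3, 1))  # hidden (+bias) -> 1 output unit
eta = 0.5

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 0]                            # XOR targets

for _ in range(10_000):                     # on-line (per-pattern) updates
    for (x1, x2), t in zip(X, T):
        x = np.array([1.0, x1, x2])         # x_0 = 1 (bias)
        y = f(x @ W_ih)                     # hidden outputs y_j
        yb = np.concatenate(([1.0], y))     # y_0 = 1 (bias)
        z = f(yb @ W_ho)                    # output z_k

        delta_k = (t - z) * fp(z)                  # eq. (15)
        delta_j = fp(y) * (W_ho[1:, :] @ delta_k)  # eq. (20)
        W_ho += eta * np.outer(yb, delta_k)        # eq. (17)
        W_ih += eta * np.outer(x, delta_j)         # eq. (21)

for (a, b) in X:
    yb = np.concatenate(([1.0], f(np.array([1.0, a, b]) @ W_ih)))
    print(a, b, "->", float(f(yb @ W_ho)[0]))      # near 0, 1, 1, 0 on a successful run
```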

Page 46:

Conclusion

• Back propagation is gradient descent in which the derivative of the objective function, computed via the chain rule, is applied to a multi-layer model

• Like all gradient descent, the behavior of Back Prop. depends on the starting point
  • the start, i.e. the weight init, should avoid 0 whenever possible (because of the multiplications)

• Looking at eq. (17), the weight update at unit k must be proportional to (t_k - z_k)
  • if t_k = z_k, i.e. the output equals the target, the weight does not change

• The sigmoid f'(net) is always positive (between 0 and 1)
  • if (t_k - z_k) and y_j are both positive, the output is too small and the weight must increase

Page 47:

Conclusion

• The weight update must be proportional to the input value
  • if y_i = 0, hidden unit "j" does not affect the output or the error; changing w_{ji} has no effect on the error for that pattern

• Generalizations of Back prop. follow from generalizations of feed forward
  • the input units include a bias unit
  • input units can be connected directly not only to hidden units but also to output units (see figure)
  • each layer can have a different non-linearity
    • NN [i-to-h: sigmoid, h-to-o: ReLU]
  • each unit can have its own non-linearity
  • each unit can have its own learning rate (Δw)

Page 48:

References

• https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliattX4OUCj_8EP65_cTVnBmS1jnYgsGQAieQUc1VQWdgQ?key=aVBxWjhwSzg2RjJWLWRuVFBBZEN1d205bUdEMnhB

• http://cs.kangwon.ac.kr/~leeck/Advanced_algorithm/4_Perceptron.pdf

• Pattern Classification, Richard O. Duda et al.

Page 49:

QA

Thank you.

박천음, 박찬민, 최재혁, 박세빈, 이수정

Sigma Alpha, Kangwon National University

Email: [email protected]

Page 50:

[Diagram: x_i (Input layer) → Weight → Activation Function → y_j (Hidden layer) → Weight → Activation Function → z_k (Output layer)]