2.supervised learning(epoch#2)-3

48
Introduction to Machine Learning with Python 2. Supervised Learning(3) Honedae Machine Learning Study Epoch #2 1

Upload: haesun-park

Post on 21-Jan-2018

129 views

Category:

Software


0 download

TRANSCRIPT

Page 1: 2.supervised learning(epoch#2)-3

Introduction to

Machine Learning with Python

2. Supervised Learning(3)

Honedae Machine Learning Study Epoch #2

1

Page 2: 2.supervised learning(epoch#2)-3

Contacts

Haesun Park

Email : [email protected]

Meetup: https://www.meetup.com/Hongdae-Machine-Learning-Study/

Facebook : https://facebook.com/haesunrpark

Blog : https://tensorflow.blog

2

Page 3: 2.supervised learning(epoch#2)-3

Book

ํŒŒ์ด์ฌ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผํ™œ์šฉํ•œ๋จธ์‹ ๋Ÿฌ๋‹, ๋ฐ•ํ•ด์„ .

(Introduction to Machine Learning with Python, Andreas

Muller & Sarah Guido์˜๋ฒˆ์—ญ์„œ์ž…๋‹ˆ๋‹ค.)

๋ฒˆ์—ญ์„œ์˜ 1์žฅ๊ณผ 2์žฅ์€๋ธ”๋กœ๊ทธ์—์„œ๋ฌด๋ฃŒ๋กœ์ฝ์„์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

์›์„œ์—๋Œ€ํ•œํ”„๋ฆฌ๋ทฐ๋ฅผ์˜จ๋ผ์ธ์—์„œ๋ณผ์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

Github:

https://github.com/rickiepark/introduction_to_ml_with_python/

3

Page 4: 2.supervised learning(epoch#2)-3

์ปค๋„์„œํฌํŠธ๋ฒกํ„ฐ๋จธ์‹ 

4

Page 5: 2.supervised learning(epoch#2)-3

Linear SVM

5

๊ฒฐ์ •ํ‰๋ฉด์˜๊ธฐ์šธ๊ธฐ๋ฅผ์ค„์—ฌ๋งˆ์ง„์„์ตœ๋Œ€ํ™”ํ•˜๊ธฐ์œ„ํ•ด w๋ฅผ์ตœ์†Œํ™”

๋งˆ์ง„์œ„๋ฐฐ๋ฅผํ—ˆ์šฉํ•˜๋Š”์Šฌ๋ž™๋ณ€์ˆ˜๋ฅผ์ตœ์†Œํ™”

๊ฐ™์€ epsilon ๋†’์ด์—์„œ๋งˆ์ง„์„์ตœ๋Œ€ํ™”ํ•˜๊ธฐ์œ„ํ•ด w๋ฅผ์ตœ์†Œํ™”

Page 6: 2.supervised learning(epoch#2)-3

๋น„์„ ํ˜•ํŠน์„ฑ

6

Page 7: 2.supervised learning(epoch#2)-3

์ปค๋„๊ธฐ๋ฒ•kernel trick

์–ด๋–คํŠน์„ฑ์„์ถ”๊ฐ€ํ•ด์•ผํ• ์ง€๋ถˆ๋ถ„๋ช…ํ•˜๊ณ ๋งŽ์€ํŠน์„ฑ์„์ถ”๊ฐ€ํ•˜๋ฉด์—ฐ์‚ฐ๋น„์šฉ์ปค์ง

์ปค๋„ํ•จ์ˆ˜๋ผ๋ถ€๋ฅด๋Š”ํŠน๋ณ„ํ•œํ•จ์ˆ˜๋ฅผ์‚ฌ์šฉํ•˜์—ฌ๋ฐ์ดํ„ฐํฌ์ธํŠธ๊ฐ„์˜๊ฑฐ๋ฆฌ๋ฅผ๊ณ„์‚ฐํ•˜์—ฌ

๋ฐ์ดํ„ฐํฌ์ธํŠธ์˜ํŠน์„ฑ์ด๊ณ ์ฐจ์›์—๋งคํ•‘๋œ๊ฒƒ๊ฐ™์€ํšจ๊ณผ๋ฅผ์–ป์Œ

๋‹คํ•ญ์ปค๋„

xa = ๐‘Ž1, ๐‘Ž2 , ๐‘ฅ๐‘ = ๐‘1, ๐‘2 ์ผ๋•Œ (๐‘ฅ๐‘Žโ‹… ๐‘ฅ๐‘)2 = (๐‘Ž1๐‘1 + ๐‘Ž2๐‘2)

2 =

๐‘Ž12๐‘1

2 + 2๐‘Ž1๐‘1๐‘Ž2๐‘2 + ๐‘Ž22๐‘2

2 = ๐‘Ž12, 2๐‘Ž1๐‘Ž2, ๐‘Ž2

2 โ‹… (๐‘12, 2๐‘1๐‘2, ๐‘2

2)

RBFradial basis function ์ปค๋„

exp(โˆ’๐›พ ๐‘ฅ๐‘Ž โˆ’ ๐‘ฅ๐‘2), ๐‘’๐‘ฅ์˜ํ…Œ์ผ๋Ÿฌ๊ธ‰์ˆ˜์ „๊ฐœ: ๐ถ ๐‘›=0

โˆž ๐‘ฅ๐‘Žโ‹…๐‘ฅ๐‘๐‘›

๐‘›!

๋ฌดํ•œํ•œํŠน์„ฑ๊ณต๊ฐ„์œผ๋กœ๋งคํ•‘ํ•˜๋Š”ํšจ๊ณผ, ๊ณ ์ฐจํ•ญ์ผ์ˆ˜๋กํŠน์„ฑ์ค‘์š”์„ฑ์€๋–จ์–ด์ง

7

Page 8: 2.supervised learning(epoch#2)-3

SVM ์ดํ•ดํ•˜๊ธฐ

์ปค๋„๊ธฐ๋ฒ•์„์ ์šฉํ•œ SVM์„ โ€˜์ปค๋„ SVMโ€™ ํ˜น์€๊ทธ๋ƒฅ โ€˜SVMโ€™์œผ๋กœ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

Scikit-Learn์€ SVC, SVR ํด๋ž˜์Šค๋ฅผ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

ํด๋ž˜์Šค๊ฒฝ๊ณ„์—์œ„์น˜ํ•œ๋ฐ์ดํ„ฐํฌ์ธํŠธ๋ฅผ

์„œํฌํŠธ๋ฒกํ„ฐ๋ผ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

RBF ์ปค๋„์„๊ฐ€์šฐ์‹œ์•ˆGaussian ์ปค๋„์ด๋ผ๊ณ ๋„๋ถ€๋ฆ…๋‹ˆ๋‹ค.

8์œ ํด๋ฆฌ๋””์•ˆ๊ฑฐ๋ฆฌ

์ปค๋„์˜ํญ(gamma), ๐‘’0~๐‘’โˆ’โˆž = 1~0

๐›พ๊ฐ€์ž‘์„์ˆ˜๋ก์ƒ˜ํ”Œ์˜์˜ํ–ฅ๋ฒ”์œ„์ปค์ง

๐›พ =1

2๐œŽ2์ผ๋•Œ ๐œŽ๋ฅผ์ปค๋„์˜ํญ์ด๋ผ๋ถ€๋ฆ„

exp(โˆ’๐›พ ๐‘ฅ๐‘Ž โˆ’ ๐‘ฅ๐‘2)

Page 9: 2.supervised learning(epoch#2)-3

SVC + forge dataset

9

์„œํฌํŠธ๋ฒกํ„ฐ

Page 10: 2.supervised learning(epoch#2)-3

๋งค๊ฐœ๋ณ€์ˆ˜ํŠœ๋‹

10

small gamma

less complex

large gamma

more complex

small C

ํฐ์ œ์•ฝ, ๊ณผ์†Œ์ ํ•ฉ

large C

์ž‘์€์ œ์•ฝ, ๊ณผ๋Œ€์ ํ•ฉ

Page 11: 2.supervised learning(epoch#2)-3

SVC + cancer dataset

11

C=1, gamma=1/n_features

๊ณผ๋Œ€์ ํ•ฉ

cancer ๋ฐ์ดํ„ฐ์…‹์˜๋ฐ์ดํ„ฐ์Šค์ผ€์ผ

SVM์€ํŠน์„ฑ์˜๋ฒ”์œ„์—ํฌ๊ฒŒ๋ฏผ๊ฐํ•จ

Page 12: 2.supervised learning(epoch#2)-3

๋ฐ์ดํ„ฐ์ „์ฒ˜๋ฆฌ

MinMaxScaler : ๐‘‹ โˆ’min(๐‘‹)

max ๐‘‹ โˆ’min(๐‘‹), 0 ~ 1 ์‚ฌ์ด๋กœ์กฐ์ •

12

Page 13: 2.supervised learning(epoch#2)-3

SVC + ์ „์ฒ˜๋ฆฌ๋ฐ์ดํ„ฐ

13

์ „์ฒ˜๋ฆฌ๋œํ›„์—๋Š”์˜คํžˆ๋ ค๊ณผ์†Œ์ ํ•ฉ๋ฉ๋‹ˆ๋‹ค.

์ œ์•ฝ์™„ํ™”

Page 14: 2.supervised learning(epoch#2)-3

์žฅ๋‹จ์ ๊ณผ๋งค๊ฐœ๋ณ€์ˆ˜

์žฅ์ 

๊ฐ•๋ ฅํ•˜๋ฉฐ์—ฌ๋Ÿฌ์ข…๋ฅ˜์˜๋ฐ์ดํ„ฐ์…‹์—์ ์šฉ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

ํŠน์„ฑ์ด์ ์„๋•Œ์—๋„๋ณต์žกํ•œ๊ฒฐ์ •๊ฒฝ๊ณ„๋งŒ๋“ฆ๋‹ˆ๋‹ค(์ปค๋„ ํŠธ๋ฆญ).

ํŠน์„ฑ์ด๋งŽ์„๋•Œ์—๋„์ž˜์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.

SVC/SVR(libsvm), LinearSVC/LinearSVR(liblinear)

๋‹จ์ 

์ƒ˜ํ”Œ์ด๋งŽ์„๊ฒฝ์šฐํ›ˆ๋ จ์†๋„๊ฐ€๋Š๋ฆฌ๊ณ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ๋งŽ์ด์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค(>100,000).

๋ฐ์ดํ„ฐ์ „์ฒ˜๋ฆฌ์™€๋งค๊ฐœ๋ณ€์ˆ˜์— ๋ฏผ๊ฐํ•ฉ๋‹ˆ๋‹ค.(๋žœ๋ค ํฌ๋ ˆ์ŠคํŠธ, ๊ทธ๋ž˜๋””์–ธํŠธ๋ถ€์ŠคํŒ…)

๋ถ„์„ํ•˜๊ธฐ์–ด๋ ต๊ณ ๋น„์ „๋ฌธ๊ฐ€์—๊ฒŒ ์„ค๋ช…ํ•˜๊ธฐ์–ด๋ ต์Šต๋‹ˆ๋‹ค.

๋งค๊ฐœ๋ณ€์ˆ˜

C, gamma, coef0(๋‹คํ•ญ, ์‹œ๊ทธ๋ชจ์ด๋“œ), degree(๋‹คํ•ญ)

๐‘™๐‘–๐‘›๐‘’๐‘Ž๐‘Ÿ: ๐‘ฅ1 โ‹… ๐‘ฅ2, ๐‘๐‘œ๐‘™๐‘ฆ: ๐›พ ๐‘ฅ1 โ‹… ๐‘ฅ2 + ๐‘ ๐‘‘ , ๐‘ ๐‘–๐‘”๐‘š๐‘œ๐‘–๐‘‘: tanh(๐›พ ๐‘ฅ1 โ‹… ๐‘ฅ2 + ๐‘) 14

Page 15: 2.supervised learning(epoch#2)-3

์‹ ๊ฒฝ๋งneural network

15

Page 16: 2.supervised learning(epoch#2)-3

ํผ์…‰ํŠธ๋ก perceptron

1957๋…„์—ํ”„๋ž‘ํฌ๋กœ์  ๋ธ”๋ผํŠธ๊ฐ€๋ฐœํ‘œํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ข…์ข…๋‰ด๋Ÿด๋„คํŠธ์›Œํฌ๋ฅผ๋ฉ€ํ‹ฐ๋ ˆ์ด์–ดํผ์…‰ํŠธ๋ก multilayer perceptron์œผ๋กœ๋„๋ถ€๋ฆ…๋‹ˆ๋‹ค.

์‚ฌ์ดํ‚ท๋Ÿฐ์€ sklearn.linear_model.Perceptron ํด๋ž˜์Šค๋ฅผ์ œ๊ณตํ•˜๋‹ค๊ฐ€ 0.18 ๋ฒ„์ „์—์„œ

MLPClassifier, MLPRegressor ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

16์ž…๋ ฅ๋ ˆ์ด์–ด

์ถœ๋ ฅ๋ ˆ์ด์–ด

Page 17: 2.supervised learning(epoch#2)-3

์„ ํ˜•๋ชจ๋ธ์˜์ผ๋ฐ˜ํ™”

๐‘ฆ = ๐‘ค 0 ร— ๐‘ฅ 0 + ๐‘ค 1 ร— ๐‘ฅ 1 + โ‹ฏ+ ๐‘ค ๐‘ ร— ๐‘ฅ ๐‘ + ๐‘

17

๊ฐ€์ค‘์น˜

์œ ๋‹›

์‹ ๊ฒฝ๋ง๊ทธ๋ฆผ์—์„œํŽธํ–ฅ์ดํ‘œํ˜„๋˜์ง€์•Š๋Š”๊ฒฝ์šฐ๊ฐ€์ข…์ข…์žˆ์Šต๋‹ˆ๋‹ค.

Page 18: 2.supervised learning(epoch#2)-3

๋‹ค์ธตํผ์…‰ํŠธ๋ก 

h 0 = ๐‘ค 0,0 ร— ๐‘ฅ 0 + ๐‘ค 1,0 ร— ๐‘ฅ 1 + ๐‘ค 2,0 ร— ๐‘ฅ 2 + ๐‘ค 3,0 ร— ๐‘ฅ 3 + ๐‘ 0

h 0 = ๐‘ค 0,1 ร— ๐‘ฅ 0 + ๐‘ค 1,1 ร— ๐‘ฅ 1 + ๐‘ค 2,1 ร— ๐‘ฅ 2 + ๐‘ค 3,1 ร— ๐‘ฅ 3 + ๐‘ 1

h 0 = ๐‘ค 0,2 ร— ๐‘ฅ 0 + ๐‘ค 1,2 ร— ๐‘ฅ 1 + ๐‘ค 2,2 ร— ๐‘ฅ 2 + ๐‘ค 3,2 ร— ๐‘ฅ 3 + ๐‘ 2

๐‘ฆ = ๐‘ฃ 0 ร— โ„Ž[0] + ๐‘ฃ 1 ร— โ„Ž 1 + ๐‘ฃ 2 ร— โ„Ž 2 + ๐‘

18

Page 19: 2.supervised learning(epoch#2)-3

Naming

์™„์ „์—ฐ๊ฒฐ๋‰ด๋Ÿด๋„คํŠธ์›Œํฌ

(Fully Connected Neural Network)

๋ด์Šค๋„คํŠธ์›Œํฌ

(Dense Network)

๋ฉ€ํ‹ฐ๋ ˆ์ด์–ดํผ์…‰ํŠธ๋ก 

(Multi-Layer Perceptron)

ํ”ผ๋“œํฌ์›Œ๋“œ๋‰ด๋Ÿด๋„คํŠธ์›Œํฌ

(Feed Forward Neural Network)

๋”ฅ๋Ÿฌ๋‹

(Deep Learning)

19

๋‰ด๋Ÿฐํ˜น์€์œ ๋‹›unit์œผ๋กœ๋ถˆ๋ฆฝ๋‹ˆ๋‹ค.

(๋™๋ฌผ์˜๋‡Œ์™€์ „ํ˜€๊ด€๋ จ์ด์—†์Šต๋‹ˆ๋‹ค)

Page 20: 2.supervised learning(epoch#2)-3

๋‹ค์ธตํผ์…‰ํŠธ๋ก 

h 0 = tanh(๐‘ค 0,0 ร— ๐‘ฅ 0 + ๐‘ค 1,0 ร— ๐‘ฅ 1 + ๐‘ค 2,0 ร— ๐‘ฅ 2 + ๐‘ค 3,0 ร— ๐‘ฅ 3 + ๐‘ 0 )

h 0 = tanh(๐‘ค 0,1 ร— ๐‘ฅ 0 + ๐‘ค 1,1 ร— ๐‘ฅ 1 + ๐‘ค 2,1 ร— ๐‘ฅ 2 + ๐‘ค 3,1 ร— ๐‘ฅ 3 + ๐‘ 1 )

h 0 = tanh(๐‘ค 0,2 ร— ๐‘ฅ 0 + ๐‘ค 1,2 ร— ๐‘ฅ 1 + ๐‘ค 2,2 ร— ๐‘ฅ 2 + ๐‘ค 3,2 ร— ๐‘ฅ 3 + ๐‘ 2 )

๐‘ฆ = ๐‘ฃ 0 ร— โ„Ž[0] + ๐‘ฃ 1 ร— โ„Ž 1 + ๐‘ฃ 2 ร— โ„Ž 2 + ๐‘

20

์ด์ง„๋ถ„๋ฅ˜์ผ๊ฒฝ์šฐ์‹œ๊ทธ๋ชจ์ด๋“œ๋‹ค์ค‘๋ถ„๋ฅ˜์ผ๊ฒฝ์šฐ์†Œํ”„ํŠธ๋งฅ์Šคํ•จ์ˆ˜์‚ฌ์šฉ

๋น„์„ ํ˜•ํ™œ์„ฑํ™”ํ•จ์ˆ˜

Page 21: 2.supervised learning(epoch#2)-3

๋น„์šฉํ•จ์ˆ˜

๋กœ์ง€์Šคํ‹ฑํšŒ๊ท€์™€๋งˆ์ฐฌ๊ฐ€์ง€๋กœ๋กœ์ง€์Šคํ‹ฑ๋น„์šฉํ•จ์ˆ˜(์ด์ง„๋ถ„๋ฅ˜)๋‚˜ํฌ๋กœ์Šค์—”ํŠธ๋กœํ”ผ

๋น„์šฉํ•จ์ˆ˜(๋‹ค์ค‘๋ถ„๋ฅ˜)๋ฅผ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธํ•จ์ˆ˜๊ฐ€๋น„์„ ํ˜•์ด๋ฏ€๋กœ์ตœ์ ๊ฐ’์„ํ•ด์„์ ์œผ๋กœ๊ณ„์‚ฐํ•˜์ง€๋ชปํ•ด๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•gradient

descent ๊ณ„์—ด์˜์•Œ๊ณ ๋ฆฌ์ฆ˜์„์ฆ๊ฒจ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

๊ฐ๊ฐ€์ค‘์น˜ w์—๋Œ€ํ•ด๋น„์šฉํ•จ์ˆ˜๋ฅผ๋ฏธ๋ถ„ํ•˜์—ฌ๊ธฐ์šธ๊ธฐ์•„๋ž˜์ชฝ์œผ๋กœ์กฐ๊ธˆ์”ฉ(learning_rate)

์ด๋™ํ•ฉ๋‹ˆ๋‹ค.

๋น„์šฉํ•จ์ˆ˜๋ฅผ์ง์ ‘ w์—๋Œ€ํ•ด๋ฏธ๋ถ„ํ•˜๋Š”๋Œ€์‹ ์ฒด์ธ๋ฃฐ์„์ด์šฉํ•ด๋ฏธ๋ถ„๊ฐ’(๊ทธ๋ž˜๋””์–ธํŠธ)์„

๋ˆ„์ ํ•˜์—ฌ๊ณฑํ•ด๊ฐ‘๋‹ˆ๋‹ค(์—ญ์ „ํŒŒbackpropagation)

21

โˆ’

๐‘–=1

๐‘›

๐‘ฆ๐‘™๐‘œ๐‘” ๐‘ฆ

๐‘‘ ๐‘ฆ

๐‘‘๐‘ค=

๐‘‘ ๐‘ฆ

๐‘‘๐‘ขโ‹…๐‘‘๐‘ข

๐‘‘๐‘ค

Page 22: 2.supervised learning(epoch#2)-3

ํ™œ์„ฑํ™”ํ•จ์ˆ˜activation function

์€๋‹‰์œ ๋‹›์˜๊ฐ€์ค‘์น˜ํ•ฉ์˜๊ฒฐ๊ณผ์—๋น„์„ ํ˜•์„ฑ์„์ฃผ์ž…

๋ ๋ฃจReLU, ํ•˜์ดํผ๋ณผ๋ฆญํƒ„์  ํŠธtanh, ์‹œ๊ทธ๋ชจ์ด๋“œsigmoid

22

๊ธฐ์šธ๊ธฐ๊ฐ€ 0์—๊ฐ€๊นŒ์›Œ์ง‘๋‹ˆ๋‹ค.

x>0 ๊ทธ๋ž˜๋””์–ธํŠธ๋Š” 1

x=0 (sub)๊ทธ๋ž˜๋””์–ธํŠธ๋Š” 0

x<0 ๊ทธ๋ž˜๋””์–ธํŠธ๋Š” 0

(์—ฌ๋Ÿฌ๋ณ€์ข…๋“ฑ์žฅ)

๐‘‘๐‘ฆ

๐‘‘๐‘ค=

๐‘‘๐‘ฆ

๐‘‘๐‘ขโ‹…๐‘‘๐‘ข

๐‘‘๐‘ค

Page 23: 2.supervised learning(epoch#2)-3

MLPClassifier + two_moons

23

hidden_layer_sizes=[100],

activation=โ€˜reluโ€™

Limited BFGS

์˜์‚ฌ๋‰ดํ„ด๋ฐฉ๋ฒ•

Page 24: 2.supervised learning(epoch#2)-3

์€๋‹‰์œ ๋‹›๊ฐœ์ˆ˜ = 10

24

Page 25: 2.supervised learning(epoch#2)-3

๋ ˆ์ด์–ด์ถ”๊ฐ€, tanh ํ™œ์„ฑํ™”ํ•จ์ˆ˜

25

Page 26: 2.supervised learning(epoch#2)-3

L2 ๊ทœ์ œ๋งค๊ฐœ๋ณ€์ˆ˜ alpha

26

๊ธฐ๋ณธ๊ฐ’

๋‚ฎ์€๊ทœ์ œ ๋†’์€๊ทœ์ œ

Page 27: 2.supervised learning(epoch#2)-3

์‹ ๊ฒฝ๋ง์˜๋ณต์žก๋„

์€๋‹‰์ธต์˜์ˆ˜๊ฐ€๋งŽ์„์ˆ˜๋ก,

์€๋‹‰์ธต์˜์œ ๋‹›๊ฐœ์ˆ˜๊ฐ€๋งŽ์„์ˆ˜๋ก,

๊ทœ์ œ๊ฐ€๋‚ฎ์„์ˆ˜๋ก๋ณต์žก๋„๊ฐ€์ฆ๊ฐ€๋จ

๋ชจ๋ธํ›ˆ๋ จ์‹œํ›ˆ๋ จ์ƒ˜ํ”Œ๋งˆ๋‹ค์€๋‹‰์ธต์˜์œ ๋‹›์˜์ผ๋ถ€๋ฅผ๋žœ๋คํ•˜๊ฒŒ์ž‘๋™์‹œํ‚ค์ง€์•Š์•„

๋งˆ์น˜์—ฌ๋Ÿฌ๊ฐœ์˜์‹ ๊ฒฝ๋ง์„์•™์ƒ๋ธ”ํ•˜๋Š”๊ฒƒ๊ฐ™์€๋“œ๋กญ์•„์›ƒdropout์ด์‹ ๊ฒฝ๋ง์—์„œ๊ณผ๋Œ€์ ํ•ฉ

๋ฐฉ์ง€ํ•˜๋Š”๋Œ€ํ‘œ์ ์ธ๋ฐฉ๋ฒ• scikit-learn์—์ถ”๊ฐ€๋ ์˜ˆ์ •

27

Page 28: 2.supervised learning(epoch#2)-3

๋žœ๋ค์ดˆ๊ธฐํ™”

๋ชจ๋ธ์„ํ›ˆ๋ จํ• ๋•Œ๊ฐ€์ค‘์น˜๋ฅผ๋ฌด์ž‘์œ„๋กœ์ดˆ๊ธฐํ™”

๋ชจ๋ธ์˜ํฌ๊ธฐ์™€๋ณต์žก๋„๊ฐ€๋‚ฎ์œผ๋ฉด์˜ํ–ฅ์„๋ฏธ์น ์ˆ˜์žˆ์Œ

28

Page 29: 2.supervised learning(epoch#2)-3

๊ฐ€์ค‘์น˜์ดˆ๊ธฐํ™”๋ฐฉ๋ฒ•

Scikit-Learn์—์„œ๋Š” Glorot ์ดˆ๊ธฐํ™”๋ฐฉ์‹์„์‚ฌ์šฉํ•˜์—ฌ๊ฐ€์ค‘์น˜์™€ํŽธํ–ฅ์„๋ชจ๋‘

์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค.

์‹œ๊ทธ๋ชจ์ด๋“œํ•จ์ˆ˜: โˆ’2

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ ~ +

2

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ ์˜๊ท ๋“ฑ๋ถ„ํฌ

tanh, relu ํ•จ์ˆ˜: โˆ’6

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ ~ +

6

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ ์˜๊ท ๋“ฑ๋ถ„ํฌ

* Xavier ์ดˆ๊ธฐํ™”

์‹œ๊ทธ๋ชจ์ด๋“œ: ยฑ6

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ tanh: ยฑ4

6

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ relu: ยฑ 2

6

๐‘›๐‘–๐‘›๐‘๐‘ข๐‘ก๐‘ +๐‘›๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก๐‘ 

29

Page 30: 2.supervised learning(epoch#2)-3

MLPClassifier + cancer dataset

์‹ ๊ฒฝ๋ง์—์„œ๋„๋ฐ์ดํ„ฐ์˜๋ฒ”์ฃผ์—๋ฏผ๊ฐํ•จ

30

ํ‰๊ท ์ด 0์ด๊ณ , ๋ถ„์‚ฐ์ด 1์ธํ‘œ์ค€์ •๊ทœ๋ถ„ํฌํ‘œ์ค€์ ์ˆ˜๋˜๋Š” z ์ ์ˆ˜StandardScaler

Page 31: 2.supervised learning(epoch#2)-3

MLPClassifier + adam

31

์ตœ๋Œ€๋ฐ˜๋ณตํšŸ์ˆ˜๋„๋‹ฌ(๊ธฐ๋ณธ๊ฐ’ 200)

solver=โ€˜adamโ€™ (Adaptive Moment Estimation)

Page 32: 2.supervised learning(epoch#2)-3

์‹ ๊ฒฝ๋ง์˜๊ฐ€์ค‘์น˜๊ฒ€์‚ฌ

32

๋ฐ์„์ˆ˜๋กํฐ๊ฐ’์ž…๋ ฅ์ธต๊ณผ์€๋‹‰์ธต์‚ฌ์ด์˜๊ฐ€์ค‘์น˜

์€๋‹‰์ธต๊ณผ์ถœ๋ ฅ์ธต์‚ฌ์ด์˜๊ฐ€์ค‘์น˜๋Š”ํ•ด์„ํ•˜๊ธฐ๋”์–ด๋ ต์Šต๋‹ˆ๋‹ค.

๋œ์ค‘์š”ํ•˜๋‹ค๊ณ ํŒ๋‹จ๋จ

Page 33: 2.supervised learning(epoch#2)-3

DL framework landscape

33

+

PyTorch

Caffe2(Python)

lasagna

Pythonโ€™17 Mar. KDnuggets

Wrapper

Page 34: 2.supervised learning(epoch#2)-3

์žฅ๋‹จ์ ๊ณผ๋งค๊ฐœ๋ณ€์ˆ˜

์žฅ์ 

์ถฉ๋ถ„ํ•œ์‹œ๊ฐ„๊ณผ๋ฐ์ดํ„ฐ๊ฐ€์žˆ์œผ๋ฉด๋งค์šฐ๋ณต์žกํ•œ๋ชจ๋ธ์„๋งŒ๋“ค์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

์ข…์ข…๋‹ค๋ฅธ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์••๋„ํ•˜๋Š”์„ฑ๋Šฅ์„๋ฐœํœ˜ํ•ฉ๋‹ˆ๋‹ค(์Œ์„ฑ์ธ์‹, ์ด๋ฏธ์ง€๋ถ„๋ฅ˜, ๋ฒˆ์—ญ๋“ฑ)

๋‹จ์ 

๋ฐ์ดํ„ฐ์ „์ฒ˜๋ฆฌ์—๋ฏผ๊ฐํ•ฉ๋‹ˆ๋‹ค(๋ฐฐ์น˜ ์ •๊ทœํ™”).

์ด์ข…์˜๋ฐ์ดํ„ฐํƒ€์ž…์ผ๊ฒฝ์šฐํŠธ๋ฆฌ๋ชจ๋ธ์ด๋”๋‚˜์„์ˆ˜์žˆ์Šต๋‹ˆ๋‹ค.

๋งค๊ฐœ๋ณ€์ˆ˜ํŠœ๋‹์ด๋งค์šฐ์–ด๋ ต์Šต๋‹ˆ๋‹ค(ํ•™์Šต๋œ ๋ชจ๋ธ์žฌ์‚ฌ์šฉ).

Scikit-Learn์€์ฝ˜๋ณผ๋ฃจ์…˜convolution์ด๋‚˜๋ฆฌ์ปค๋ŸฐํŠธrecurrent ์‹ ๊ฒฝ๋ง์„์ œ๊ณตํ•˜์ง€์•Š์Šต๋‹ˆ๋‹ค.

๋งค๊ฐœ๋ณ€์ˆ˜

solver=[โ€˜adamโ€™, โ€˜sgdโ€™, โ€˜lbfgsโ€™],

sgd์ผ๊ฒฝ์šฐ momentum + nesterovs_momentum

alpha(L2 ๊ทœ์ œ)

34

๐›ผ โ†‘ โ†’ ๊ทœ์ œ โ†‘ ๐‘ค โ†“๐›ผ โ†“ โ†’ ๊ทœ์ œ โ†“ ๐‘ค โ†‘

Page 35: 2.supervised learning(epoch#2)-3

์‹ ๊ฒฝ๋ง์˜๋ณต์žก๋„์™€๊ถŒ์žฅ์„ค์ •

100๊ฐœ์˜ํŠน์„ฑ๊ณผ 100๊ฐœ์˜์€๋‹‰์œ ๋‹›, 1๊ฐœ์˜์ถœ๋ ฅ์œ ๋‹› = 10,100๊ฐœ์˜๊ฐ€์ค‘์น˜

+ 100๊ฐœ์˜์œ ๋‹›์„๊ฐ€์ง„์€๋‹‰์ธต์ถ”๊ฐ€ 10,000๊ฐœ์˜๊ฐ€์ค‘์น˜์ฆ๊ฐ€

๋งŒ์•ฝ 1000๊ฐœ์œ ๋‹›์„๊ฐ€์ง„์€๋‹‰์ธต์ด๋ผ๋ฉด 101,000 + 1,000,000 = 1,101,000๊ฐœ

solver์˜๊ธฐ๋ณธ๊ฐ’์€ โ€˜adamโ€™์œผ๋กœ๋ฐ์ดํ„ฐ์Šค์ผ€์ผ์—์กฐ๊ธˆ๋ฏผ๊ฐํ•ฉ๋‹ˆ๋‹ค.("The Marginal Value of Adaptive Gradient Methods in Machine Learning," A. C. Wilson et al. (2017), ์—์„œ

Adam, RMSProp ๋“ฑ์ด๋ณด์—ฌ์ค€ํšŒ์˜์ ์ธ๊ฒฐ๊ณผ๋กœ๋ชจ๋ฉ˜ํ…€๋ฐฉ์‹์„ํ•จ๊ป˜๊ณ ๋ คํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.)

โ€˜lbfgsโ€™๋Š”์•ˆ์ •์ ์ด์ง€๋งŒ๊ทœ๋ชจ๊ฐ€ํฐ๋ชจ๋ธ์ด๋‚˜๋Œ€๋Ÿ‰์˜๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š”์‹œ๊ฐ„์ด์˜ค๋ž˜

๊ฑธ๋ฆฝ๋‹ˆ๋‹ค.

โ€˜sgdโ€™๋Š” โ€™momentumโ€™๊ณผ โ€˜nesterov_momentumโ€™ ์˜ต์…˜๊ณผํ•จ๊ป˜์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.35

Page 36: 2.supervised learning(epoch#2)-3

๋ชจ๋ฉ˜ํ…€

๋ชจ๋ฉ˜ํ…€์•Œ๊ณ ๋ฆฌ์ฆ˜์€์ด์ „๊ทธ๋ž˜๋””์–ธํŠธ๊ฐ’์„๊ฐ€์†๋„์™€๊ฐ™์ด์ƒ๊ฐํ•˜์—ฌ์ „์—ญ์ตœ์ ๊ฐ’์˜

๋ฐฉํ–ฅ์œผ๋กœ๋”๋น ๋ฅด๊ฒŒ์ˆ˜๋ ดํ•˜๊ฒŒ๋งŒ๋“ญ๋‹ˆ๋‹ค.

momentum ๋งค๊ฐœ๋ณ€์ˆ˜๋Š”์ผ์ข…์˜๋Œํผ์™€๊ฐ™์€์—ญํ• ์„ํ•ฉ๋‹ˆ๋‹ค.

๐‘ฃ๐‘ก+1 = ๐œ‡๐‘ฃ๐‘ก โˆ’ ๐œ–๐‘” ๐œƒ๐‘ก

๐œƒ๐‘ก+1 = ๐œƒ๐‘ก + ๐‘ฃ๐‘ก+1

36

learning_ratemomentum

Page 37: 2.supervised learning(epoch#2)-3

๋„ค์Šคํ…Œ๋กœํ”„๋ชจ๋ฉ˜ํ…€

๋„ค์Šคํ…Œ๋กœํ”„๋ชจ๋ฉ˜ํ…€์€๋ชจ๋ฉ˜ํ…€๋ฐฉ์‹์œผ๋กœ์ง„ํ–‰ํ•œํ›„์˜๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ๊ณ„์‚ฐํ•˜์—ฌํ˜„์žฌ์—

์ ์šฉํ•ฉ๋‹ˆ๋‹ค.

๐‘ฃ๐‘ก+1 = ๐œ‡๐‘ฃ๐‘ก โˆ’ ๐œ–๐‘” ๐œƒ๐‘ก + ๐œ‡๐‘ฃ๐‘ก

๐œƒ๐‘ก+1 = ๐œƒ๐‘ก + ๐‘ฃ๐‘ก+1

์‹ค์ œ๊ตฌํ˜„์—์„œ๋Š”์•ž์„ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ๊ณ„์‚ฐํ•˜๋Š”๋Œ€์‹ ๋ชจ๋ฉ˜ํ…€๋ฐฉ์‹์„๋‘๋ฒˆ์ ์šฉํ•˜์—ฌ

๋„ค์Šคํ…Œ๋กœํ”„์˜๊ทผ์‚ฌ๊ฐ’์„๊ตฌํ•ฉ๋‹ˆ๋‹ค.(https://tensorflow.blog/2017/03/22/momentum-nesterov-momentum/ ์ฐธ์กฐ)

37

Page 38: 2.supervised learning(epoch#2)-3

๋ถ„๋ฅ˜์˜๋ถˆํ™•์‹ค์„ฑ์ถ”์ •

38

Page 39: 2.supervised learning(epoch#2)-3

๋ถ„๋ฅ˜์˜๋ถˆํ™•์‹ค์„ฑ

์–ด๋–คํ…Œ์ŠคํŠธํฌ์ธํŠธ์—๋Œ€ํ•ด์˜ˆ์ธกํด๋ž˜์Šค๋ฟ๋งŒ์•„๋‹ˆ๋ผ์–ผ๋งˆ๋‚˜๊ทธํด๋ž˜์Šค์ž„์„

ํ™•์‹ ํ•˜๋Š”์ง€๊ฐ€์ค‘์š”ํ• ๋•Œ๊ฐ€์žˆ์Œ

๋Œ€๋ถ€๋ถ„ decision_function ๊ณผ predict_proba ๋ฉ”์„œ๋“œ๋‘˜์ค‘ํ•˜๋‚˜๋Š”์ œ๊ณตํ•จ

39

Page 40: 2.supervised learning(epoch#2)-3

๊ฒฐ์ •ํ•จ์ˆ˜

40

์ด์ง„๋ถ„๋ฅ˜์—์„œ decision_function() ๋ฐ˜ํ™˜๊ฐ’์˜ํฌ๊ธฐ (n_samples,)

๐‘ฆ = ๐‘ค 0 ร— ๐‘ฅ 0 + ๐‘ค 1 ร— ๐‘ฅ 1 + โ‹ฏ+ ๐‘ค ๐‘ ร— ๐‘ฅ ๐‘ + ๐‘

Page 41: 2.supervised learning(epoch#2)-3

๊ฒฐ์ •๊ฒฝ๊ณ„ + decision_function

41

Page 42: 2.supervised learning(epoch#2)-3

์˜ˆ์ธกํ™•๋ฅ 

predict_proba() ๋ฐ˜ํ™˜๊ฐ’์˜ํฌ๊ธฐ (n_samples, n_classes)

๊ฐ€์žฅํฐํ™•๋ฅ ๊ฐ’์„๊ฐ€์ง„ํด๋ž˜์Šค๋ฅผ์˜ˆ์ธกํด๋ž˜์Šค๋กœ์„ ํƒํ•จ

42

Page 43: 2.supervised learning(epoch#2)-3

๊ฒฐ์ •๊ฒฝ๊ณ„ + predict_proba

43

Page 44: 2.supervised learning(epoch#2)-3

๋‹ค์ค‘๋ถ„๋ฅ˜ + decision_function

๋ฐ˜ํ™˜๊ฐ’์˜ํฌ๊ธฐ๋Š” (n_samples, n_classes)

๊ฐ€์žฅํฐ๊ฐ’์ด์˜ˆ์ธกํด๋ž˜์Šค๊ฐ€๋จ

44

Page 45: 2.supervised learning(epoch#2)-3

๋‹ค์ค‘๋ถ„๋ฅ˜ + predict_proba

๋ฐ˜ํ™˜๊ฐ’์˜ํฌ๊ธฐ๋Š” (n_samples, n_classes)

45

Page 46: 2.supervised learning(epoch#2)-3

์š”์•ฝ๋ฐ์ •๋ฆฌ

46

Page 47: 2.supervised learning(epoch#2)-3

์•Œ๊ณ ๋ฆฌ์ฆ˜์ •๋ฆฌ

์ตœ๊ทผ์ ‘์ด์›ƒ

์ž‘์€๋ฐ์ดํ„ฐ์…‹์ผ๊ฒฝ์šฐ, ๊ธฐ๋ณธ๋ชจ๋ธ๋กœ์„œ์ข‹๊ณ ์„ค๋ช…ํ•˜๊ธฐ์‰ฌ์›€.

์„ ํ˜•๋ชจ๋ธ

์ฒซ๋ฒˆ์งธ๋กœ์‹œ๋„ํ• ์•Œ๊ณ ๋ฆฌ์ฆ˜. ๋Œ€์šฉ๋Ÿ‰๋ฐ์ดํ„ฐ์…‹๊ฐ€๋Šฅ. ๊ณ ์ฐจ์›๋ฐ์ดํ„ฐ์—๊ฐ€๋Šฅ.

๋‚˜์ด๋ธŒ๋ฒ ์ด์ฆˆ

๋ถ„๋ฅ˜๋งŒ๊ฐ€๋Šฅ. ์„ ํ˜•๋ชจ๋ธ๋ณด๋‹คํ›จ์”ฌ๋น ๋ฆ„. ๋Œ€์šฉ๋Ÿ‰๋ฐ์ดํ„ฐ์…‹๊ณผ๊ณ ์ฐจ์›๋ฐ์ดํ„ฐ์—๊ฐ€๋Šฅ.

์„ ํ˜•๋ชจ๋ธ๋ณด๋‹ค๋œ์ •ํ™•ํ•จ.

๊ฒฐ์ •ํŠธ๋ฆฌ

๋งค์šฐ๋น ๋ฆ„. ๋ฐ์ดํ„ฐ์Šค์ผ€์ผ์กฐ์ •์ดํ•„์š”์—†์Œ. ์‹œ๊ฐํ™”ํ•˜๊ธฐ์ข‹๊ณ ์„ค๋ช…ํ•˜๊ธฐ์‰ฌ์›€.

47

Page 48: 2.supervised learning(epoch#2)-3

๋žœ๋คํฌ๋ ˆ์ŠคํŠธ

๊ฒฐ์ •ํŠธ๋ฆฌํ•˜๋‚˜๋ณด๋‹ค๊ฑฐ์˜ํ•ญ์ƒ์ข‹์€์„ฑ๋Šฅ์„๋ƒ„. ๋งค์šฐ์•ˆ์ •์ ์ด๊ณ ๊ฐ•๋ ฅํ•จ.

๋ฐ์ดํ„ฐ์Šค์ผ€์ผ์กฐ์ •ํ•„์š”์—†์Œ. ๊ณ ์ฐจ์›ํฌ์†Œ๋ฐ์ดํ„ฐ์—๋Š”์ž˜์•ˆ๋งž์Œ.

๊ทธ๋ž˜๋””์–ธํŠธ๋ถ€์ŠคํŒ…

๋žœ๋คํฌ๋ ˆ์ŠคํŠธ๋ณด๋‹ค์กฐ๊ธˆ๋”์„ฑ๋Šฅ์ด์ข‹์Œ. ๋žœ๋คํฌ๋ ˆ์ŠคํŠธ๋ณด๋‹ค ํ•™์Šต์€๋Š๋ฆฌ๋‚˜์˜ˆ์ธก์€

๋น ๋ฅด๊ณ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ์กฐ๊ธˆ์‚ฌ์šฉ. ๋žœ๋คํฌ๋ ˆ์ŠคํŠธ๋ณด๋‹ค ๋งค๊ฐœ๋ณ€์ˆ˜ํŠœ๋‹์ด๋งŽ์ดํ•„์š”ํ•จ.

์„œํฌํŠธ๋ฒกํ„ฐ๋จธ์‹ 

๋น„์Šทํ•œ์˜๋ฏธ์˜ํŠน์„ฑ์œผ๋กœ์ด๋ค„์ง„์ค‘๊ฐ„๊ทœ๋ชจ๋ฐ์ดํ„ฐ์…‹์—์ž˜๋งž์Œ.

๋ฐ์ดํ„ฐ์Šค์ผ€์ผ์กฐ์ •ํ•„์š”. ๋งค๊ฐœ๋ณ€์ˆ˜์—๋ฏผ๊ฐ.

์‹ ๊ฒฝ๋ง

ํŠน๋ณ„ํžˆ๋Œ€์šฉ๋Ÿ‰๋ฐ์ดํ„ฐ์…‹์—์„œ ๋งค์šฐ๋ณต์žกํ•œ๋ชจ๋ธ์„๋งŒ๋“ค์ˆ˜์žˆ์Œ.

๋งค๊ฐœ๋ณ€์ˆ˜์„ ํƒ๊ณผ๋ฐ์ดํ„ฐ์Šค์ผ€์ผ์—๋ฏผ๊ฐ. ํฐ๋ชจ๋ธ์€ํ•™์Šต์ด์˜ค๋ž˜๊ฑธ๋ฆผ.

48