Training Examples
| Day | Outlook | Temp. | Humidity | Wind | Play Golf |
|-----|----------|------|----------|--------|-----------|
| D1  | Sunny    | Hot  | High     | Weak   | No  |
| D2  | Sunny    | Hot  | High     | Strong | No  |
| D3  | Overcast | Hot  | High     | Weak   | Yes |
| D4  | Rain     | Mild | High     | Weak   | Yes |
| D5  | Rain     | Cool | Normal   | Weak   | Yes |
| D6  | Rain     | Cool | Normal   | Strong | No  |
| D7  | Overcast | Cool | Normal   | Weak   | Yes |
| D8  | Sunny    | Mild | High     | Weak   | No  |
| D9  | Sunny    | Cool | Normal   | Weak   | Yes |
| D10 | Rain     | Mild | Normal   | Weak   | Yes |
| D11 | Sunny    | Mild | Normal   | Strong | Yes |
| D12 | Overcast | Mild | High     | Strong | Yes |
| D13 | Overcast | Hot  | Normal   | Weak   | Yes |
| D14 | Rain     | Mild | High     | Strong | No  |
Entropy and Information Gain
• Information answers questions.
• The more clueless I am about the answer initially, the more information is contained in the final answer.
• Scale:
  – 1 bit = completely clueless – the answer to a Boolean question with prior <0.5, 0.5>
  – 0 bits = complete knowledge – the answer to a Boolean question with prior <1.0, 0.0>
  – ? = the answer to a Boolean question with prior <0.75, 0.25>
  – This is the concept of Entropy
Entropy
• S is a sample of training examples
• p+ is the proportion of positive examples
• p- is the proportion of negative examples
• Entropy measures the impurity of S
Entropy(S) = -p+ log2 p+ - p- log2 p-
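As a quick check of this formula, here is a small Python sketch (the function name and the sample counts are illustrative choices, not part of the slides):

```python
import math

def entropy(pos, neg):
    """Entropy of a Boolean-labelled sample with `pos` positive and
    `neg` negative examples; 0 * log2(0) is taken as 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 3))  # the full Play Golf sample [9+, 5-]
print(round(entropy(7, 7), 3))  # completely clueless: 1 bit
print(round(entropy(4, 0), 3))  # complete knowledge: 0 bits
print(round(entropy(3, 1), 3))  # the <0.75, 0.25> question from the scale above
```

The last line answers the "?" on the scale: a <0.75, 0.25> prior carries about 0.811 bits of entropy.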
Information Gain

• Gain(S, A): the expected reduction in entropy due to sorting S on attribute A

Gain(S, A) = Entropy(S) - Σ_{v ∈ values(A)} (|Sv| / |S|) · Entropy(Sv)
Selecting the First Attribute
Splitting S = [9+, 5-], E = 0.940 on Humidity:

• High: [3+, 4-], E = 0.985
• Normal: [6+, 1-], E = 0.592

Gain(S, Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151

Splitting S = [9+, 5-], E = 0.940 on Wind:

• Weak: [6+, 2-], E = 0.811
• Strong: [3+, 3-], E = 1.0

Gain(S, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048

Humidity provides greater information gain than Wind, w.r.t. the target classification.
Selecting the First Attribute
Splitting S = [9+, 5-], E = 0.940 on Outlook:

• Sunny: [2+, 3-], E = 0.971
• Overcast: [4+, 0-], E = 0.0
• Rain: [3+, 2-], E = 0.971

Gain(S, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0.0 - (5/14)·0.971 = 0.247
Selecting the First Attribute

The information gain values for the four attributes are:

• Gain(S, Outlook) = 0.247
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029

where S denotes the collection of training examples.
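These four values can be verified directly from the per-value [+, -] counts. The sketch below is illustrative: `H`, `gain`, and the hard-coded count pairs (read off the training table) are my own encoding, not part of the slides.

```python
import math

def H(pos, neg):
    """Binary entropy of a [pos+, neg-] sample (0 * log2(0) taken as 0)."""
    total = pos + neg
    return -sum(n / total * math.log2(n / total) for n in (pos, neg) if n)

def gain(splits):
    """Information gain, given one (pos, neg) count pair per attribute value."""
    pos = sum(p for p, _ in splits)
    neg = sum(n for _, n in splits)
    total = pos + neg
    weighted = sum((p + n) / total * H(p, n) for p, n in splits)
    return H(pos, neg) - weighted

print(f"Outlook:     {gain([(2, 3), (4, 0), (3, 2)]):.3f}")  # Sunny, Overcast, Rain
print(f"Humidity:    {gain([(3, 4), (6, 1)]):.3f}")          # High, Normal
print(f"Wind:        {gain([(6, 2), (3, 3)]):.3f}")          # Weak, Strong
print(f"Temperature: {gain([(2, 2), (4, 2), (3, 1)]):.3f}")  # Hot, Mild, Cool
```

Humidity prints 0.152 rather than 0.151 because the slide rounds the branch entropies (0.985, 0.592) to three digits before combining them; the other three values match.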
Selecting the Next Attribute

Splitting S = [D1, D2, …, D14] = [9+, 5-] on Outlook:

• Sunny: Ssunny = [D1, D2, D8, D9, D11] = [2+, 3-], E = 0.970 → ?
• Overcast: [D3, D7, D12, D13] = [4+, 0-] → Yes
• Rain: [D4, D5, D6, D10, D14] = [3+, 2-] → ?

Gain(Ssunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(Ssunny, Temp.) = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019
ID3 Algorithm

The resulting decision tree:

• Outlook = Sunny → test Humidity
  – Humidity = High → No [D1, D2, D8]
  – Humidity = Normal → Yes [D9, D11]
• Outlook = Overcast → Yes [D3, D7, D12, D13]
• Outlook = Rain → test Wind
  – Wind = Strong → No [D6, D14]
  – Wind = Weak → Yes [D4, D5, D10]
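This tree can be reproduced with a short recursive sketch of the ID3 idea. Names like `id3` and the `(attribute, {value: subtree})` tuple encoding are my own choices for illustration, not a standard API.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
ROWS = [dict(zip(ATTRS + ["Play"], vals)) for vals in [
    ("Sunny", "Hot", "High", "Weak", "No"),        # D1
    ("Sunny", "Hot", "High", "Strong", "No"),      # D2
    ("Overcast", "Hot", "High", "Weak", "Yes"),    # D3
    ("Rain", "Mild", "High", "Weak", "Yes"),       # D4
    ("Rain", "Cool", "Normal", "Weak", "Yes"),     # D5
    ("Rain", "Cool", "Normal", "Strong", "No"),    # D6
    ("Overcast", "Cool", "Normal", "Weak", "Yes"), # D7
    ("Sunny", "Mild", "High", "Weak", "No"),       # D8
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),    # D9
    ("Rain", "Mild", "Normal", "Weak", "Yes"),     # D10
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),  # D11
    ("Overcast", "Mild", "High", "Strong", "Yes"), # D12
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),  # D13
    ("Rain", "Mild", "High", "Strong", "No"),      # D14
]]

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())

def gain(rows, attr):
    """Entropy(S) minus the weighted entropy of the subsets under attr."""
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        sub = [r["Play"] for r in rows if r[attr] == v]
        remainder += len(sub) / len(rows) * entropy(sub)
    return entropy([r["Play"] for r in rows]) - remainder

def id3(rows, attrs):
    labels = [r["Play"] for r in rows]
    if len(set(labels)) == 1:      # pure node -> leaf
        return labels[0]
    if not attrs:                  # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest)
                   for v in {r[best] for r in rows}})

tree = id3(ROWS, ATTRS)
print(tree)
```

Running this yields Outlook at the root, Humidity under Sunny, a Yes leaf under Overcast, and Wind under Rain, matching the tree above.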
Which attribute should we start with?
| ID# | Texture | Temp | Size | Classification |
|-----|---------|------|--------|-----|
| 1   | Smooth  | Cold | Large  | Yes |
| 2   | Smooth  | Cold | Small  | No  |
| 3   | Smooth  | Cool | Large  | Yes |
| 4   | Smooth  | Cool | Small  | Yes |
| 5   | Smooth  | Hot  | Small  | Yes |
| 6   | Wavy    | Cold | Medium | No  |
| 7   | Wavy    | Hot  | Large  | Yes |
| 8   | Rough   | Cold | Large  | No  |
| 9   | Rough   | Cool | Large  | Yes |
| 10  | Rough   | Hot  | Small  | No  |
| 11  | Rough   | Warm | Medium | Yes |
Which node is the best?
• Texture (Smooth, Wavy, Rough):

Expected entropy = 5/11 · (-4/5·log2(4/5) - 1/5·log2(1/5))
                 + 2/11 · (-1/2·log2(1/2) - 1/2·log2(1/2))
                 + 4/11 · (-2/4·log2(2/4) - 2/4·log2(2/4))
                 = 5/11·(0.722) + 2/11·(1) + 4/11·(1)
                 = 0.874
Which node is the best?
• Temperature (Cold, Cool, Hot, Warm):

Expected entropy = 4/11 · (-1/4·log2(1/4) - 3/4·log2(3/4))
                 + 3/11 · (-3/3·log2(3/3) - 0/3·log2(0/3))
                 + 3/11 · (-2/3·log2(2/3) - 1/3·log2(1/3))
                 + 1/11 · (-1/1·log2(1/1) - 0/1·log2(0/1))
                 = 4/11·(0.811) + 0 + 3/11·(0.918) + 0
                 = 0.545

(Here 0·log2(0) is taken as 0.)
Which node is the best?
• Size (Large, Medium, Small):

Expected entropy = 5/11 · (-4/5·log2(4/5) - 1/5·log2(1/5))
                 + 2/11 · (-1/2·log2(1/2) - 1/2·log2(1/2))
                 + 4/11 · (-2/4·log2(2/4) - 2/4·log2(2/4))
                 = 5/11·(0.722) + 2/11·(1) + 4/11·(1)
                 = 0.874

(Size happens to produce the same subset sizes and class counts as Texture – 4+/1- for Large, 1+/1- for Medium, 2+/2- for Small – so the value coincides.)
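The three expected-entropy values can be checked against the table directly. The sketch below is illustrative; `ROWS` and `expected_entropy` are my own names.

```python
import math
from collections import Counter

# The 11 examples from the table: (Texture, Temp, Size, Classification)
ROWS = [
    ("Smooth", "Cold", "Large",  "Yes"),
    ("Smooth", "Cold", "Small",  "No"),
    ("Smooth", "Cool", "Large",  "Yes"),
    ("Smooth", "Cool", "Small",  "Yes"),
    ("Smooth", "Hot",  "Small",  "Yes"),
    ("Wavy",   "Cold", "Medium", "No"),
    ("Wavy",   "Hot",  "Large",  "Yes"),
    ("Rough",  "Cold", "Large",  "No"),
    ("Rough",  "Cool", "Large",  "Yes"),
    ("Rough",  "Hot",  "Small",  "No"),
    ("Rough",  "Warm", "Medium", "Yes"),
]

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())

def expected_entropy(col):
    """Weighted entropy of the class labels after splitting on column col."""
    total = len(ROWS)
    weighted = 0.0
    for value in {r[col] for r in ROWS}:
        subset = [r[-1] for r in ROWS if r[col] == value]
        weighted += len(subset) / total * entropy(subset)
    return weighted

for col, name in enumerate(["Texture", "Temp", "Size"]):
    print(f"{name}: {expected_entropy(col):.3f}")
```

Since the starting entropy is the same for every attribute, the lowest expected entropy – Temperature, at 0.545 – corresponds to the highest information gain, so Temperature is the best first split.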
Learning over time
• How do you evolve knowledge over time when you learn a little bit at a time?
  – Abstract version – the "Frinkle"
The Question
• The Question
  – How can we build this kind of representation over time?
• The Answer
  – Rely on the concepts of false positives and false negatives
The idea
• False Positive
  – An example that is predicted to be positive but whose known outcome is negative.
  – The problem: our hypothesis is too general.
  – The solution: add another condition to our hypothesis.
• False Negative
  – An example that is predicted to be negative but whose known outcome is positive.
  – The problem: our hypothesis is too restrictive.
  – The solution: remove a condition from our hypothesis [or add a disjunction].
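A minimal sketch of the generalize-on-false-negative half of this idea, assuming a hypothesis is a conjunction of attribute = value conditions (the function names and the dict encoding are my own; re-specializing after a false positive is only stubbed out):

```python
def predicts_positive(hypothesis, example):
    """A conjunctive hypothesis predicts Yes iff every condition holds."""
    return all(example.get(attr) == val for attr, val in hypothesis.items())

def learn(examples):
    """Current-best-hypothesis sketch: seed from the first positive
    example, then generalize whenever a false negative appears."""
    hypothesis = None
    for example, label in examples:
        if hypothesis is None:
            if label == "Yes":
                hypothesis = dict(example)   # most specific start
            continue
        predicted = predicts_positive(hypothesis, example)
        if label == "Yes" and not predicted:
            # False negative: too restrictive -> drop violated conditions.
            hypothesis = {a: v for a, v in hypothesis.items()
                          if example.get(a) == v}
        elif label == "No" and predicted:
            # False positive: too general -> a real learner would add a
            # condition (or a disjunct) that rules this example out.
            pass
    return hypothesis

# The first three cases from the table below, one at a time:
cases = [
    ({"Texture": "Smooth", "Temp": "Cold", "Size": "Large"}, "Yes"),
    ({"Texture": "Smooth", "Temp": "Cold", "Size": "Small"}, "No"),
    ({"Texture": "Smooth", "Temp": "Cool", "Size": "Large"}, "Yes"),
]
print(learn(cases))
```

Case 3 is a false negative for the initial hypothesis (Temp = Cold fails), so the Temp condition is dropped, leaving Texture = Smooth and Size = Large.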
Creating a model one “case” at a time
| ID# | Texture | Temp | Size | Classification |
|-----|---------|------|--------|-----|
| 1   | Smooth  | Cold | Large  | Yes |
| 2   | Smooth  | Cold | Small  | No  |
| 3   | Smooth  | Cool | Large  | Yes |
| 4   | Smooth  | Cool | Small  | Yes |
| 5   | Smooth  | Hot  | Small  | Yes |
| 6   | Wavy    | Cold | Medium | No  |
| 7   | Wavy    | Hot  | Large  | Yes |
| 8   | Rough   | Cold | Large  | No  |
| 9   | Rough   | Cool | Large  | Yes |
| 10  | Rough   | Hot  | Small  | No  |
| 11  | Rough   | Warm | Medium | Yes |