Case Based Reasoning Lecture 3: CBR Case-Base Indexing


TRANSCRIPT

Page 1: Lecture 3 Cbr Indexing


Case Based Reasoning

Lecture 3: CBR Case-Base Indexing

Page 2: Lecture 3 Cbr Indexing


Outline

Indexing CBR case knowledge

Why might we want an index?

Decision tree indexes

C4.5 algorithm

Summary

Page 3: Lecture 3 Cbr Indexing


Why might we want an index?

Efficiency
- Similarity matching is computationally expensive for large case-bases
- Similarity matching can be computationally expensive for complex case representations

Relevancy of cases for similarity matching
- Some features of the new problem may make certain cases irrelevant, despite their being very similar

Cases are pre-selected from the case-base, and similarity matching is applied only to that subset of cases.

Page 4: Lecture 3 Cbr Indexing


What to index?

Example case:

Client Ref #: 64
Client Name: John Smith
Address: 39 Union Street
Tel: 01224 665544
Photo: …
Age: 37
Occupation: IT Analyst
Income: £20,000
…

Case features are either:
- Indexed (on the slide: Age, Occupation, Income)
- Unindexed (on the slide: Client Ref #, Client Name, Address, Tel, Photo)

Page 5: Lecture 3 Cbr Indexing


Indexed vs Unindexed Features

Indexed features:
- are used for retrieval
- are predictive of the case's solution

Unindexed features:
- are not used for retrieval
- are not predictive of the case's solution
- provide valuable contextual information and lessons learned
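One way to make the split concrete (a minimal sketch; the field names mirror the client example on the previous slide, and treating Age/Occupation/Income as the indexed features is an assumption):

```python
from dataclasses import dataclass

@dataclass
class Case:
    # indexed features: used for retrieval, predictive of the solution
    age: int
    occupation: str
    income: float
    # unindexed features: context and lessons learned, ignored by retrieval
    client_ref: int = 0
    name: str = ""
    address: str = ""
    tel: str = ""

    def indexed(self):
        """The feature vector seen by similarity matching."""
        return (self.age, self.occupation, self.income)

case = Case(age=37, occupation="IT Analyst", income=20000,
            client_ref=64, name="John Smith", address="39 Union Street")
print(case.indexed())  # (37, 'IT Analyst', 20000)
```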

Page 6: Lecture 3 Cbr Indexing


Playing Tennis Example (case-base)

Outlook  Temperature  Humidity  Windy  Play
Sunny    Hot          High      False  No
Sunny    Hot          High      True   No
Cloudy   Hot          High      False  Yes
Rainy    Mild         High      False  Yes
Rainy    Cool         Normal    False  Yes
Rainy    Cool         Normal    True   No
Cloudy   Cool         Normal    True   Yes
Sunny    Mild         High      False  No
Sunny    Cool         Normal    False  Yes
Rainy    Mild         Normal    False  Yes
Sunny    Mild         Normal    True   Yes
Cloudy   Mild         High      True   Yes
Cloudy   Hot          Normal    False  Yes
Rainy    Mild         High      True   No

Page 7: Lecture 3 Cbr Indexing


Decision Tree (Index) for Playing Tennis

outlook
├─ sunny  → humidity
│            ├─ high   → No
│            └─ normal → Yes
├─ cloudy → Yes
└─ rainy  → windy
             ├─ true  → No
             └─ false → Yes

Page 8: Lecture 3 Cbr Indexing


Choosing the Root Attribute

Candidate splits, with the class distribution of Play in each branch:

outlook:      sunny → 2 Yes, 3 No    cloudy → 4 Yes, 0 No    rainy → 3 Yes, 2 No
temperature:  hot → 2 Yes, 2 No      mild → 4 Yes, 2 No      cool → 3 Yes, 1 No
humidity:     high → 3 Yes, 4 No     normal → 6 Yes, 1 No
windy:        true → 3 Yes, 3 No     false → 6 Yes, 2 No

Which attribute is best for the root of the tree?
- the one that gives the best information gain
- in this case outlook (as we are going to see)

Page 9: Lecture 3 Cbr Indexing


Building Decision Trees: the C4.5 Algorithm

Based on information theory (Shannon, 1948)

Divide-and-conquer strategy:
- Choose an attribute for the root node
- Create a branch for each value of that attribute
- Split the cases according to the branches
- Repeat the process for each branch until all cases in the branch have the same class

Assumption: the simplest tree that classifies the cases is the best
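A minimal runnable sketch of this divide-and-conquer loop, splitting on information gain as the slides describe (full C4.5 also adds refinements such as gain ratio, numeric attributes and pruning; all names here are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity of a set of class labels: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(cases, labels, attr):
    """Expected reduction in entropy from splitting on attr."""
    expectation = 0.0
    for value in set(c[attr] for c in cases):
        subset = [l for c, l in zip(cases, labels) if c[attr] == value]
        expectation += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - expectation

def build_tree(cases, labels, attrs):
    """Divide and conquer: pick the best attribute, branch on its values,
    recurse until every case in a branch has the same class."""
    if len(set(labels)) == 1:        # branch is pure: make a leaf
        return labels[0]
    if not attrs:                    # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(cases, labels, a))
    tree = {}
    for value in set(c[best] for c in cases):
        branch = [(c, l) for c, l in zip(cases, labels) if c[best] == value]
        b_cases, b_labels = zip(*branch)
        tree[(best, value)] = build_tree(list(b_cases), list(b_labels),
                                         [a for a in attrs if a != best])
    return tree

# tiny demo with dict-based cases (attribute -> value)
cases = [{"outlook": "sunny"}, {"outlook": "cloudy"}]
print(build_tree(cases, ["No", "Yes"], ["outlook"]))
# e.g. {('outlook', 'sunny'): 'No', ('outlook', 'cloudy'): 'Yes'}
```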

Page 10: Lecture 3 Cbr Indexing


Entropy of a set of cases

Playing Tennis Example:

S is the set of 14 cases.
We want to classify the cases according to the values of "Play", i.e. Yes and No in this example:
- the proportion of "Yes" cases is 9 out of 14: 9/14 = 0.64
- the proportion of "No" cases is 5 out of 14: 5/14 = 0.36

Entropy measures the impurity of S:

Entropy(S) = -0.64 * log2(0.64) - 0.36 * log2(0.36)
           = -0.64 * (-0.644) - 0.36 * (-1.474)
           = 0.41 + 0.53
           = 0.94

(The slide shows the 14-case table with the "Yes" and "No" cases highlighted.)
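The same arithmetic checked in a few lines of Python:

```python
import math

# the worked example: 9 "Yes" and 5 "No" cases out of 14
p_yes, p_no = 9 / 14, 5 / 14
entropy_S = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(entropy_S, 2))  # 0.94
```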

Page 11: Lecture 3 Cbr Indexing


Entropy of a set of cases

S is a set of cases
A is a feature (Play in the example)
{S1, ..., Si, ..., Sn} are the partitions of S according to the values of A (Yes and No in the example)
{p1, ..., pi, ..., pn} are the proportions of {S1, ..., Si, ..., Sn} in S

Entropy(S) = - Σ_{i=1..n} p_i * log2(p_i)
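A direct transcription of the formula (a sketch; the proportions are passed in ready-made):

```python
import math

def entropy(proportions):
    """Entropy(S) = -(sum over i of p_i * log2(p_i)); p = 0 terms contribute 0."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)

print(round(entropy([9 / 14, 5 / 14]), 2))  # 0.94: the 14 tennis cases
print(entropy([1.0]))                       # 0.0: a pure set has no impurity
```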

Page 12: Lecture 3 Cbr Indexing


Gain of an attribute

Calculate Gain(S, A) for each attribute A: the expected reduction in entropy due to sorting on A. Choose the attribute with the highest gain as the root of the tree.

Gain(S, A) = Entropy(S) - Expectation(A)
           = Entropy(S) - Σ_{i=1..n} (|Si| / |S|) * Entropy(Si)

where:
{S1, ..., Si, ..., Sn} = partitions of S according to the values of attribute A
n = number of values of attribute A
|Si| = number of cases in partition Si
|S| = total number of cases in S
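The same definition as code, working from [Yes, No] counts per partition (a sketch; the counts come from the case table on Page 6):

```python
import math

def entropy(counts):
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c)

def gain(class_counts, partitions):
    """Gain(S, A) = Entropy(S) - sum(|Si|/|S| * Entropy(Si))."""
    n = sum(class_counts)
    expectation = sum(sum(p) / n * entropy(p) for p in partitions)
    return entropy(class_counts) - expectation

# Outlook partitions the 14 cases into sunny (2 Yes, 3 No),
# cloudy (4 Yes, 0 No) and rainy (3 Yes, 2 No):
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.247
```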

Page 13: Lecture 3 Cbr Indexing


Which attribute is root?

If Outlook is made the root of the tree, there are 3 partitions of the cases:
S1 for Sunny, S2 for Cloudy, S3 for Rainy

S1 (Sunny) = {cases 1, 2, 8, 9, 11}, so |S1| = 5
In these 5 cases the values for Play are 3 No and 2 Yes

Entropy(S1) = -2/5 * log2(2/5) - 3/5 * log2(3/5) = 0.97

Similarly:
Entropy(S2) = 0
Entropy(S3) = 0.97

#   Outlook  Temperature  Humidity  Windy  Play
1   Sunny    Hot          High      False  No
2   Sunny    Hot          High      True   No
3   Cloudy   Hot          High      False  Yes
4   Rainy    Mild         High      False  Yes
5   Rainy    Cool         Normal    False  Yes
6   Rainy    Cool         Normal    True   No
7   Cloudy   Cool         Normal    True   Yes
8   Sunny    Mild         High      False  No
9   Sunny    Cool         Normal    False  Yes
10  Rainy    Mild         Normal    False  Yes
11  Sunny    Mild         Normal    True   Yes
12  Cloudy   Mild         High      True   Yes
13  Cloudy   Hot          Normal    False  Yes
14  Rainy    Mild         High      True   No
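Checking the three partition entropies numerically:

```python
import math

def entropy(counts):
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c)

print(round(entropy([2, 3]), 2))  # S1, Sunny:  0.97
print(round(entropy([4, 0]), 2))  # S2, Cloudy: 0.0 (pure partition)
print(round(entropy([3, 2]), 2))  # S3, Rainy:  0.97
```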

Page 14: Lecture 3 Cbr Indexing


Choosing the Root Attribute

(The candidate splits from Page 8, repeated. The best root attribute is the one that gives the best information gain; in this case outlook, as the next slide shows.)

Page 15: Lecture 3 Cbr Indexing


Which attribute is root?

Gain(S, Outlook) = Entropy(S) - Expectation(Outlook)

where Expectation(Outlook)
  = (|S1|/|S|) * Entropy(S1) + (|S2|/|S|) * Entropy(S2) + (|S3|/|S|) * Entropy(S3)

Gain(S, Outlook) = 0.94 - [5/14 * 0.97 + 4/14 * 0 + 5/14 * 0.97] = 0.247

Similarly:
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.152
Gain(S, Windy) = 0.048

Gain(S, Outlook) is the highest gain, so Outlook should be the root of the decision tree (index).
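All four gains can be recomputed directly from the case base (a sketch; the tuples transcribe the table on Page 6):

```python
import math
from collections import Counter

CASES = [  # (Outlook, Temperature, Humidity, Windy, Play)
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Cloudy", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Cloudy", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Cloudy", "Mild", "High", True, "Yes"),
    ("Cloudy", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(cases, idx):
    labels = [c[-1] for c in cases]
    expectation = 0.0
    for value in set(c[idx] for c in cases):
        subset = [c[-1] for c in cases if c[idx] == value]
        expectation += len(subset) / len(cases) * entropy(subset)
    return entropy(labels) - expectation

for name, idx in [("Outlook", 0), ("Temperature", 1),
                  ("Humidity", 2), ("Windy", 3)]:
    print(f"Gain(S, {name}) = {gain(CASES, idx):.3f}")
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```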

Page 16: Lecture 3 Cbr Indexing


Repeat for Sunny Node

The sunny branch is not yet pure (2 Yes, 3 No), so the process is repeated on the 5 sunny cases for each remaining attribute (class distribution of Play in each branch):

temperature:  hot → No, No           mild → Yes, No    cool → Yes
windy:        false → Yes, No, No    true → Yes, No
humidity:     high → No, No, No      normal → Yes, Yes

humidity splits the sunny cases perfectly, so it becomes the test at the sunny node.
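The same gain computation restricted to the five sunny cases confirms the choice (a sketch):

```python
import math
from collections import Counter

SUNNY = [  # (Temperature, Humidity, Windy, Play) for the five sunny cases
    ("Hot", "High", False, "No"),   ("Hot", "High", True, "No"),
    ("Mild", "High", False, "No"),  ("Cool", "Normal", False, "Yes"),
    ("Mild", "Normal", True, "Yes"),
]

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(cases, idx):
    labels = [c[-1] for c in cases]
    expectation = 0.0
    for value in set(c[idx] for c in cases):
        subset = [c[-1] for c in cases if c[idx] == value]
        expectation += len(subset) / len(cases) * entropy(subset)
    return entropy(labels) - expectation

for name, idx in [("Temperature", 0), ("Humidity", 1), ("Windy", 2)]:
    print(f"Gain(sunny, {name}) = {gain(SUNNY, idx):.3f}")
# Temperature 0.571, Humidity 0.971, Windy 0.020:
# humidity wins and splits the sunny cases perfectly.
```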

Page 17: Lecture 3 Cbr Indexing


Repeat for Rainy Node

(Figure: the partial tree so far: outlook at the root; sunny → humidity (high: No, normal: Yes); cloudy: Yes; rainy: still unresolved.)

The five rainy cases are:

Temperature  Humidity  Windy  Play
Mild         High      False  Yes
Cool         Normal    False  Yes
Cool         Normal    True   No
Mild         Normal    False  Yes
Mild         High      True   No

windy splits the rainy cases perfectly (false → Yes, true → No), so it becomes the test at the rainy node.

Page 18: Lecture 3 Cbr Indexing


Decision Tree (Index) for Playing Tennis

The finished index:

outlook
├─ sunny  → humidity
│            ├─ high   → No
│            └─ normal → Yes
├─ cloudy → Yes
└─ rainy  → windy
             ├─ true  → No
             └─ false → Yes

Page 19: Lecture 3 Cbr Indexing


Case Retrieval via DTree Index

Typical implementation:
- the DTree is created from the cases, giving automated indexing of the case-base
- the case-base is indexed using the decision tree
- cases are "stored" in the index leaves
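A minimal sketch of this retrieval pattern, assuming a hypothetical dict-based index layout and a toy feature-overlap similarity in place of a real similarity measure:

```python
# Assumed node layout: internal nodes are {"attr": ..., "branches": {...}}
# dicts; leaves are lists of cases (the cases "stored" in the index).
def retrieve(node, query, k=3):
    """Pre-select cases by walking the index, then run k-NN on the leaf."""
    while isinstance(node, dict):                     # descend to a leaf
        node = node["branches"][query[node["attr"]]]
    def similarity(case):                             # overlap with the query
        return sum(case.get(f) == v for f, v in query.items())
    return sorted(node, key=similarity, reverse=True)[:k]

index = {"attr": "Outlook", "branches": {
    "Cloudy": [{"Outlook": "Cloudy", "Play": "Yes"}],
    "Sunny": {"attr": "Humidity", "branches": {
        "High":   [{"Outlook": "Sunny", "Humidity": "High", "Play": "No"}],
        "Normal": [{"Outlook": "Sunny", "Humidity": "Normal", "Play": "Yes"}],
    }},
}}
print(retrieve(index, {"Outlook": "Sunny", "Humidity": "High"}, k=1))
# [{'Outlook': 'Sunny', 'Humidity': 'High', 'Play': 'No'}]
```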

Page 20: Lecture 3 Cbr Indexing


Summary

A decision tree is built from the cases.
Decision trees are often used for problem-solving; in CBR, the decision tree is used to partition the cases.
Similarity matching is applied to the cases in a leaf node.
Indexing pre-selects relevant cases for k-NN retrieval.

BRING A CALCULATOR on MONDAY