comp 578 fuzzy sets in data mining keith c.c. chan department of computing the hong kong polytechnic...


Post on 21-Dec-2015


COMP 578
Fuzzy Sets in Data Mining

Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University

Fuzzy Data and Associations

• Fuzzy associations.
  • People who buy large watermelons also buy many oranges.
• Fuzzy data in databases.
  • E.g. large watermelon: definition of “large” = [5kg, 10kg]?
  • E.g. many oranges: definition of “many” = [10, 20]?

Fuzziness in the Real World

• Humans reason approximately about the behavior of very complex systems.
• Closed-form mathematical expressions provide precise descriptions of systems with little complexity and uncertainty.
• Fuzzy logic and reasoning are suited to complex systems:
  • When no numerical data exist.
  • When only ambiguous or imprecise information is available.
  • When behavior can only be described and understood by relating observed input and output approximately rather than exactly.

Example of a closed-form expression:

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}X^2}$$

Uncertainty and Imprecision

• Probability theory models uncertainty arising from randomness (a matter of chance).
• Fuzzy set theory models uncertainty associated with vagueness and imprecision (lack of information).
• Human communication with a computer requires extreme precision (e.g. instructions in a software program).
• Natural language is vague and imprecise but powerful.
  • Two individuals communicating in natural language do not require an identical definition of “tall” to communicate effectively, but a computer would require a specific height.
• Fuzzy set theory uses linguistic variables, rather than quantitative variables, to represent imprecise concepts.

Applications of Fuzzy Logic

• Sanyo fuzzy logic camcorders.
  • Fuzzy focusing and image stabilization.
• Mitsubishi fuzzy air conditioner.
  • Controls temperature changes according to human comfort indexes.
• Matsushita fuzzy washing machine.
  • Sensors detect color, kind of clothes, and the quantity of grit.
  • Selects combinations of water temperature, detergent amount, and wash and spin cycle time.
• Sendai's 16-station subway system.
  • Fuzzy controller makes 70% fewer judgment errors in acceleration and braking than human operators.
• Nissan fuzzy auto-transmission and anti-skid braking.
• Tokyo's stock market.
  • At least one stock-trading portfolio based on fuzzy logic has outperformed the Nikkei Exchange average.
• Fuzzy golf diagnostic systems, fuzzy toasters, fuzzy rice cookers, fuzzy vacuum cleaners, etc.

Classical Sets

• X = universe of discourse = the set of all objects with the same characteristics.
• Let $n_x$ = cardinality = total number of elements in X.
• For crisp sets A and B in X, we define:
  • $x \in A \Rightarrow$ x belongs to A.
  • $x \notin A \Rightarrow$ x does not belong to A.
• For sets A and B on X:
  • $A \subset B \Rightarrow$ if $x \in A$, then $x \in B$.
  • $A \subseteq B \Rightarrow$ A is fully contained in B.
  • $A = B \Rightarrow A \subseteq B$ and $B \subseteq A$.
• The null set, $\emptyset$, contains no elements.

Operations on Classical Sets

• Union: $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$.
• Intersection: $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$.
• Complement: $A^c = \{x \mid x \notin A,\ x \in X\}$.
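The three crisp operations map directly onto Python's built-in set type. The sketch below is only an illustration: the universe X and the sets A and B are hypothetical melon identifiers, not data from the slides.

```python
# Hypothetical universe of discourse X (melon IDs); not taken from the slides.
X = {"m1", "m2", "m3", "m4", "m5"}
A = {"m1", "m2", "m3"}
B = {"m3", "m4"}

union = A | B          # {x | x in A or x in B}
intersection = A & B   # {x | x in A and x in B}
complement_A = X - A   # {x | x not in A, x in X}
n_x = len(X)           # cardinality of the universe

print(sorted(union))
print(sorted(intersection))
print(sorted(complement_A))
print(n_x)
```

Note that the complement is only defined relative to the universe X, which is why it is written as a set difference.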

Classical Sets in Association Mining

• How do you define the set of large watermelons?
  • Large Watermelons = {x | 5kg < weight(x) < 10kg}.
• How do you define the set of very large watermelons?
  • Very Large Watermelons = {x | weight(x) > 10kg}.
• What about a watermelon that is exactly 9.9kg?
• What about a watermelon that is exactly 10.1kg?
• The difference of 0.2kg makes one large and the other very large!
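The sharp boundary can be seen by coding the two crisp definitions as predicates. This is a minimal sketch; the function names `is_large` and `is_very_large` are illustrative choices, and the thresholds are the ones given on the slide.

```python
def is_large(weight_kg: float) -> bool:
    """Crisp set: 5kg < weight(x) < 10kg."""
    return 5.0 < weight_kg < 10.0


def is_very_large(weight_kg: float) -> bool:
    """Crisp set: weight(x) > 10kg."""
    return weight_kg > 10.0


for w in (9.9, 10.0, 10.1):
    print(w, "large:", is_large(w), "very large:", is_very_large(w))
```

With strict inequalities on both sides, a melon of exactly 10kg falls into neither set, which is another symptom of the crisp boundary.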

Fuzzy Sets

• The transition between membership and non-membership can be gradual.
• A fuzzy set contains elements which have varying degrees of membership.
• The degree of membership is measured by a function.
• The function maps elements to a real value on the interval 0 to 1, $\mu_A \in [0, 1]$.
• Elements in a fuzzy set can also be members of other fuzzy sets on the same universe.

A Fuzzy Set Example

• Example: a watermelon of exactly 9.9kg can belong to:
  • The set “large watermelon” with a degree of 0.1, and to
  • The set “very large watermelon” with a degree of 0.9.
• But how do we determine the degree of membership?
  • It can be found from a fuzzy membership function.

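A Membership Function

[Figure: membership functions for “Large watermelon” and “Very large watermelon”. Vertical axis: degree of membership from 0.0 to 1.0 (tick at 0.5). Horizontal axis: weight, with ticks at 3kg, 5kg, 8kg, 9kg and 10kg.]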

Representing Degree of Membership

• For a fuzzy set A, its membership function is represented as $\mu_A$.
• $\mu_A(x_i)$ is the degree of membership of $x_i$ with respect to A.
• For example,
  • Let A = Large watermelon.
  • Let $x_i$ be a watermelon of 9.9kg.
  • From the membership function on the previous slide, $\mu_A(x_i) = 0.1$.

Representing Fuzzy Sets

• A notation convention for fuzzy sets:

  $$A = \frac{\mu_A(x_1)}{x_1} + \frac{\mu_A(x_2)}{x_2} + \cdots$$

  • The numerator is the membership value, the horizontal bar is a delimiter, and the plus sign denotes a function-theoretic union.
• Alternatively,

  $$A = \{(x_1, \mu_A(x_1)),\ (x_2, \mu_A(x_2)),\ \ldots,\ (x_n, \mu_A(x_n))\}$$

• In general, e.g.

  $$A = \left\{ (x, \mu_A(x)) \;\middle|\; x \in [a, b],\ \mu_A(x) = \exp\left(-\frac{(x - m_A)^2}{2\sigma_A^2}\right) \right\}$$

Example of a Fuzzy Set Representation

• A definition of the fuzzy set LW = “Large Watermelon”:

  $$LW = \frac{0.25}{6\,\text{kg}} + \frac{0.75}{7\,\text{kg}} + \frac{1.0}{8\,\text{kg}} + \frac{0.1}{9.9\,\text{kg}} + \cdots$$

• Alternatively, LW = {(6kg, 0.25), (7kg, 0.75), (8kg, 1.0), (9.9kg, 0.1), …}
• In general, e.g.

  $$LW = \left\{ (x, \mu_{LW}(x)) \;\middle|\; x \in [0\,\text{kg}, 20\,\text{kg}],\ \mu_{LW}(x) = \begin{cases} 0 & wt(x) \le 5\,\text{kg} \\ \dfrac{wt(x) - 5}{3} & 5\,\text{kg} \le wt(x) \le 8\,\text{kg} \\ 1 & 8\,\text{kg} \le wt(x) \le 9\,\text{kg} \\ 10 - wt(x) & 9\,\text{kg} \le wt(x) \le 10\,\text{kg} \\ 0 & wt(x) \ge 10\,\text{kg} \end{cases} \right\}$$
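A piecewise membership function of this kind can be written as an ordinary function. The sketch below assumes a trapezoid with breakpoints at 5kg, 8kg, 9kg and 10kg and linear edges; those breakpoints are read off the slide and should be treated as assumptions, and the name `mu_lw` is an illustrative choice.

```python
def mu_lw(weight_kg: float) -> float:
    """Membership in 'Large Watermelon' (assumed trapezoid: 5, 8, 9, 10 kg)."""
    if weight_kg <= 5.0 or weight_kg >= 10.0:
        return 0.0                       # outside the support
    if weight_kg < 8.0:
        return (weight_kg - 5.0) / 3.0   # rising edge from 5kg to 8kg
    if weight_kg <= 9.0:
        return 1.0                       # plateau: fully "large"
    return 10.0 - weight_kg              # falling edge from 9kg to 10kg


for w in (4.0, 6.5, 8.0, 9.9, 10.1):
    print(f"{w} kg -> {mu_lw(w):.2f}")
```

Under these assumptions the 9.9kg melon gets a degree of 0.1, matching the slide. The linear rising edge gives 1/3 and 2/3 at 6kg and 7kg rather than the 0.25 and 0.75 listed in the discrete form, so the slide's listed values may come from a differently shaped edge.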

Fuzzy Set Operations

• Union: $\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x))$.
• Intersection: $\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x))$.
• Complement: $\mu_{\bar{A}}(x) = 1 - \mu_A(x)$.
• Containment: if $A \subseteq X \Rightarrow \mu_A(x) \le \mu_X(x)$.
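On a discrete universe, these operations reduce to element-wise max, min and 1 − μ. The sketch below stores each fuzzy set as a dictionary from weight (kg) to membership degree. The LW values are the ones listed for “Large Watermelon”, and the 0.9 for “very large” at 9.9kg comes from the earlier example; the remaining VLW degrees are assumed for illustration.

```python
# Fuzzy sets as {element: membership degree} over the same discrete universe.
LW = {6: 0.25, 7: 0.75, 8: 1.0, 9.9: 0.1}   # "Large Watermelon" (slide values)
VLW = {6: 0.0, 7: 0.0, 8: 0.0, 9.9: 0.9}    # "Very Large"; zeros are assumed


def fuzzy_union(a, b):
    return {x: max(a[x], b[x]) for x in a}


def fuzzy_intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}


def fuzzy_complement(a):
    return {x: 1.0 - a[x] for x in a}


def is_contained(a, b):
    """True if mu_a(x) <= mu_b(x) for every x."""
    return all(a[x] <= b[x] for x in a)


print(fuzzy_union(LW, VLW))
print(fuzzy_intersection(LW, VLW))
print(fuzzy_complement(LW))
print(is_contained(fuzzy_intersection(LW, VLW), fuzzy_union(LW, VLW)))
```

Unlike the crisp case, the intersection of a fuzzy set with its own complement is generally not empty: at 7kg, min(0.75, 0.25) = 0.25.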

Fuzzy Logic

• A fuzzy logic proposition, P, involves some concept without clearly defined boundaries.
• Most natural language is fuzzy and involves vague and imprecise terms.
• The truth value assigned to P can be any value on the interval [0, 1].
• The degree of truth for P: $x \in A$ is equal to the membership grade of x in A.
• Negation, disjunction, conjunction, and implication are also defined for fuzzy logic.
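The slide does not spell out the connectives, so the sketch below assumes one common choice: negation as 1 − T(P), disjunction as max, conjunction as min, and implication as max(1 − T(P), T(Q)), which mirrors the classical identity P → Q ≡ ¬P ∨ Q. Other definitions of fuzzy implication exist. The truth values 0.1 and 0.9 reuse the 9.9kg watermelon example.

```python
def f_not(p: float) -> float:
    return 1.0 - p


def f_or(p: float, q: float) -> float:
    return max(p, q)


def f_and(p: float, q: float) -> float:
    return min(p, q)


def f_implies(p: float, q: float) -> float:
    # Assumed definition: truth of (not P) or Q.
    return max(1.0 - p, q)


p = 0.1  # truth of "the 9.9kg melon is large"
q = 0.9  # truth of "the 9.9kg melon is very large"

print(f_not(p), f_or(p, q), f_and(p, q), f_implies(p, q))
```

When the truth values are restricted to 0 and 1, these functions reduce to the ordinary Boolean connectives.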

Fuzzy Sets for Data Mining

• How could fuzzy data be considered for association rule mining?
• How could the concept of a fuzzy set be used for classification involving fuzzy classes?
  • E.g. Risk classification = {High, Medium, Low}
• With fuzzy sets, how could clustering be performed so as to:
  • Take the overlapping of clusters into consideration, and
  • Allow a record to belong to different clusters to different degrees?

Fuzzy Association

• The interestingness measures for $A \rightarrow B$:
  • Lift Ratio: Pr(B|A)/Pr(B).
  • Support and Confidence: Pr(A,B) and Pr(B|A).
• How much do you count?

Eggs      Cheese    Watermelon
2 boxes   Low Fat   {(Small, 0.35), (Medium, 0.65)}
1 box     Hi Cal    {(Small, 0.5), (Medium, 0.5)}
3 boxes   Regular   {(Medium, 0.75), (High, 0.25)}
1 box     Low Fat   {(Medium, 0.3), (High, 0.7)}
3 boxes   Hi Cal    {(Medium, 0.4), (High, 0.6)}
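One possible answer to “how much do you count?” is to let each record contribute its membership degree instead of a full count of 1. The sketch below uses the five records from the table and evaluates the rule “Cheese = Low Fat → Watermelon = Medium”. The min-based combination and the function names are assumptions for illustration, not necessarily the method the lecture goes on to use.

```python
# Records from the table; cheese is crisp, watermelon size is fuzzy.
transactions = [
    {"cheese": "Low Fat", "melon": {"Small": 0.35, "Medium": 0.65}},
    {"cheese": "Hi Cal",  "melon": {"Small": 0.5,  "Medium": 0.5}},
    {"cheese": "Regular", "melon": {"Medium": 0.75, "High": 0.25}},
    {"cheese": "Low Fat", "melon": {"Medium": 0.3,  "High": 0.7}},
    {"cheese": "Hi Cal",  "melon": {"Medium": 0.4,  "High": 0.6}},
]


def fuzzy_measures(cheese_value, melon_label):
    """Support, confidence and lift using summed membership degrees."""
    n = len(transactions)
    count_a = count_b = count_ab = 0.0
    for t in transactions:
        deg_a = 1.0 if t["cheese"] == cheese_value else 0.0
        deg_b = t["melon"].get(melon_label, 0.0)
        count_a += deg_a
        count_b += deg_b
        count_ab += min(deg_a, deg_b)   # degree to which both hold
    if count_a == 0 or count_b == 0:
        return 0.0, 0.0, 0.0
    support = count_ab / n                   # Pr(A, B)
    confidence = count_ab / count_a          # Pr(B | A)
    lift = confidence / (count_b / n)        # Pr(B | A) / Pr(B)
    return support, confidence, lift


support, confidence, lift = fuzzy_measures("Low Fat", "Medium")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

For this rule the two Low Fat records contribute 0.65 and 0.3, giving a support of 0.95/5 = 0.19 and a confidence of 0.95/2 = 0.475.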

Fuzzy Classification

• Information Gain

  $$M(C) = -p_B \log_2 p_B - p_S \log_2 p_S$$

  $$M(C) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.954 \text{ bits}$$

• How, again, do you count if a customer belongs partially to both a “high risk” and a “low risk” group?
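The entropy figure on this slide can be checked numerically, and the same function accepts fractional counts. In the sketch below, the crisp counts 3 and 5 come from the slide; the four customers with partial “high risk” and “low risk” memberships are hypothetical, and summing memberships per class is one assumed way of counting partial membership.

```python
import math


def entropy(counts):
    """Entropy in bits of a class distribution given (possibly fractional) counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Crisp case from the slide: 3 records in one class, 5 in the other.
crisp = entropy([3, 5])
print(round(crisp, 3))

# Hypothetical customers: (membership in "high risk", membership in "low risk").
customers = [(0.9, 0.1), (0.8, 0.2), (0.6, 0.4), (0.2, 0.8)]
high = sum(h for h, _ in customers)   # fuzzy count for "high risk"
low = sum(l for _, l in customers)    # fuzzy count for "low risk"
fuzzy = entropy([high, low])
print(round(fuzzy, 3))
```

The fuzzy counts here are 2.5 and 1.5, the same 5:3 proportion as the crisp example, so both calls give about 0.954 bits.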

Fuzzy Clustering

• The mean height value for cluster 2 (short) is 5'3" and for cluster 3 (medium) is 5'7".
• You are just over 5'5" and are classified "medium".
• Fuzzy k-means is an extension of k-means.
• A membership value of each observation to each cluster is determined.
• The user specifies a fuzzy membership function (MF).
• A height of 5'5" may give you a membership value of 0.4 to cluster 1, 0.4 to cluster 2 and 0.1 to cluster 3.
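To make the idea of a per-cluster membership concrete, the sketch below uses the standard fuzzy c-means membership formula, u_i = 1 / Σ_k (d_i / d_k)^(2/(m−1)), where d_i is the distance to cluster center i and m > 1 is the fuzzifier. This is a swapped-in, commonly used formula rather than the user-specified MF the slide mentions. Only the two cluster means given on the slide (5'3" = 63in and 5'7" = 67in) are used, so the numbers will not reproduce the slide's three-cluster example.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster (1-D heights)."""
    dists = [abs(x - c) for c in centers]
    # If x sits exactly on one or more centers, split full membership among them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


centers = [63.0, 67.0]  # inches: "short" (5'3") and "medium" (5'7")

print(fcm_memberships(65.0, centers))  # 5'5": equidistant from both centers
print(fcm_memberships(66.0, centers))  # 5'6": closer to "medium"
```

With this formula the memberships of a point always sum to 1, and a height of exactly 5'5" belongs to both clusters to degree 0.5 instead of being forced into one.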

Part II
Fuzzy Rule Inferences

Approximate Reasoning

• Reasoning about imprecise propositions is referred to as approximate reasoning.
• Given a fuzzy rule: (1) If x is A, then y is B.
• Introduce a new antecedent, say A', and find B' by fuzzy composition:
  • $B' = A' \circ R$
• The idea of an inverse relationship between fuzzy antecedents and fuzzy consequences arises from the composition operation.
• The inference represents an approximate linguistic characteristic of the relation between two universes of discourse, X and Y.
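The composition B' = A' ∘ R can be sketched with max-min composition on discrete universes. The slide does not say how the relation R is built from the rule, so the code assumes the min (Mamdani-style) construction R(x, y) = min(μ_A(x), μ_B(y)). The membership vectors are hypothetical.

```python
def rule_relation(a, b):
    """Relation R for 'If x is A then y is B', assuming R(x, y) = min(A(x), B(y))."""
    return [[min(a_i, b_j) for b_j in b] for a_i in a]


def compose(a_prime, r):
    """Max-min composition: B'(y) = max over x of min(A'(x), R(x, y))."""
    n_cols = len(r[0])
    return [
        max(min(ap, row[j]) for ap, row in zip(a_prime, r))
        for j in range(n_cols)
    ]


# Hypothetical membership vectors over X = {x1, x2, x3} and Y = {y1, y2}.
A = [0.2, 1.0, 0.5]
B = [0.3, 1.0]
R = rule_relation(A, B)

print(compose(A, R))                # feeding A itself back in
print(compose([1.0, 0.4, 0.0], R))  # a different antecedent A'
```

Because A reaches a degree of 1.0 somewhere, feeding A back in returns B unchanged; an antecedent that only partly overlaps A yields a weaker conclusion B'.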

Graphical Techniques of Inference

• Procedures (matrix operations) to conduct inference of IF-THEN rules have been illustrated.
• Use graphical techniques to conduct the inference computation manually with a few rules to verify the inference operations.
• The graphical procedures can be easily extended and will hold for fuzzy expert systems (ESs) with any number of antecedents (inputs) and consequents (outputs).

An Example

• Conditions of two rules, R1 and R2, are both matched.