comp 578 fuzzy sets in data mining keith c.c. chan department of computing the hong kong polytechnic...
Post on 21-Dec-2015
TRANSCRIPT
COMP 578: Fuzzy Sets in Data Mining
Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University
2
Fuzzy Data and Associations
Fuzzy associations.
  People who buy large water melons also buy many oranges.
Fuzzy data in databases.
  E.g. Large water melon
    Definition of “large” = [5kg, 10kg]?
  E.g. Many oranges
    Definition of “many” = [10, 20]?
3
Fuzziness in the Real World
Humans reason approximately about the behavior of very complex systems.
Closed-form mathematical expressions, e.g.,
\[ f(X) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} X^2} \]
provide precise descriptions of systems with little complexity and uncertainty.
Fuzzy logic and reasoning are suited to complex systems:
  When no numerical data exist.
  When only ambiguous or imprecise information is available.
  When behavior can only be described and understood by relating observed input and output approximately rather than exactly.
4
Uncertainty and Imprecision
Probability theory models uncertainty arising from randomness (a matter of chance).
Fuzzy set theory models uncertainty associated with vagueness and imprecision (lack of information).
Human communication with a computer requires extreme precision (e.g. instructions in a software program).
Natural language is vague and imprecise but powerful.
  Two individuals can communicate in natural language even though it is vague and imprecise.
  They do not require an identical definition of “tall” to communicate effectively, but a computer would require a specific height.
Fuzzy set theory uses linguistic variables, rather than quantitative variables, to represent imprecise concepts.
5
Applications of Fuzzy Logic
Sanyo fuzzy logic camcorders.
  Fuzzy focusing and image stabilization.
Mitsubishi fuzzy air conditioner.
  Controls temperature changes according to human comfort indexes.
Matsushita fuzzy washing machine.
  Sensors detect the color and kind of clothes and the quantity of grit.
  Selects combinations of water temperature, detergent amount, and wash and spin cycle time.
Sendai's 16-station subway system.
  Fuzzy controller makes 70% fewer judgment errors in acceleration and braking than human operators.
Nissan fuzzy auto-transmission and anti-skid braking.
Tokyo's stock market.
  At least one stock-trading portfolio based on fuzzy logic has outperformed the Nikkei Exchange average.
Fuzzy golf diagnostic systems, fuzzy toasters, fuzzy rice cookers, fuzzy vacuum cleaners, etc.
6
Classical Sets
X = universe of discourse = the set of all objects with the same characteristics.
Let $n_x$ = cardinality = total number of elements in X.
For crisp sets A and B in X, we define:
  $x \in A$: x belongs to A.
  $x \notin A$: x does not belong to A.
For sets A and B on X:
  $A \subseteq B \Leftrightarrow (x \in A \Rightarrow x \in B)$.
  $A \subset B$: A is fully contained in B.
  $A = B \Leftrightarrow A \subseteq B$ and $B \subseteq A$.
The null set, $\emptyset$, contains no elements.
7
Operations on Classical Sets
Union:
  $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$.
Intersection:
  $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$.
Complement:
  $A^c = \{x \mid x \notin A,\ x \in X\}$.
8
Classical Sets in Association Mining
How do you define the set of large water melons?
  Large Water Melons = {x | 5kg < weight(x) < 10kg}.
How do you define the set of very large water melons?
  Very Large Water Melons = {x | weight(x) > 10kg}.
What about a water melon that is exactly 9.9kg?
What about a water melon that is exactly 10.1kg?
The difference of 0.2kg makes one large and the other very large!
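The sharp boundary can be seen by coding the two crisp definitions directly. This is a minimal sketch; the function name `crisp_label` and the "neither" label are illustrative assumptions, not part of the slides.

```python
def crisp_label(weight_kg: float) -> str:
    """Classify a water melon with the crisp interval definitions above."""
    if 5 < weight_kg < 10:
        return "large"
    if weight_kg > 10:
        return "very large"
    # Anything else, including exactly 10 kg, is covered by neither crisp set.
    return "neither"


for w in (9.9, 10.1):
    print(w, "->", crisp_label(w))
```

A 0.2kg change flips the label outright, and a melon of exactly 10kg falls into neither set as the intervals are written.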
9
Fuzzy Sets
The transition between membership and non-membership can be gradual.
A fuzzy set contains elements which have varying degrees of membership.
The degree of membership is measured by a function.
The function maps elements to a real value on the interval 0 to 1, $\mu_A \in [0,1]$.
Elements in a fuzzy set can also be members of other fuzzy sets on the same universe.
10
A Fuzzy Set Example
Example:
  A water melon of exactly 9.9kg can belong to:
    The set “large water melon” with a degree of 0.1, and to
    The set of “very large water melon” with a degree of 0.9.
But how do we determine the degree of membership?
  It can be found from a fuzzy membership function.
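11
A Membership Function
[Figure: membership functions for “Large watermelon” and “Very Large watermelon”. Horizontal axis: weight, with ticks at 3kg, 5kg, 8kg, 9kg and 10kg. Vertical axis: degree of membership, with ticks at 0.0, 0.5 and 1.0.]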
12
Representing Degree of Membership
For a fuzzy set A, its membership function is represented as $\mu_A$.
$\mu_A(x_i)$ is the degree of membership of $x_i$ with respect to A.
For example,
  Let A = Large water melon.
  Let $x_i$ be a water melon of 9.9kg.
  From the membership function in the last slide, $\mu_A(x_i) = 0.1$.
13
Representing Fuzzy Sets
A notation convention for fuzzy sets:
\[ A = \frac{\mu_A(x_1)}{x_1} + \frac{\mu_A(x_2)}{x_2} + \cdots \]
  Numerator is membership value, horizontal bar is delimiter, plus sign denotes a function-theoretic union.
Alternatively,
\[ A = \{(x_1, \mu_A(x_1)),\ (x_2, \mu_A(x_2)),\ \ldots,\ (x_n, \mu_A(x_n))\} \]
In general, e.g.
\[ A = \left\{ (x, \mu_A(x)) \;\middle|\; x \in [a, b],\ \mu_A(x) = \exp\!\left( -\frac{(x - c)^2}{2\sigma_A^2} \right) \right\} \]
14
Example of a Fuzzy Set Representation
A definition of the fuzzy set LW = “Large Water Melon”:
\[ \mathrm{LW} = \frac{0.25}{6\,\mathrm{kg}} + \frac{0.75}{7\,\mathrm{kg}} + \frac{1.0}{8\,\mathrm{kg}} + \frac{0.1}{9.9\,\mathrm{kg}} + \cdots \]
Alternatively,
  LW = {(6kg, 0.25), (7kg, 0.75), (8kg, 1.0), (9.9kg, 0.1), …}
In general, e.g.
\[ \mathrm{LW} = \left\{ (x, \mu_{LW}(x)) \;\middle|\; x \in [0\,\mathrm{kg}, 20\,\mathrm{kg}],\ \mu_{LW}(x) =
\begin{cases}
0, & wt(x) < 5\,\mathrm{kg} \\
\frac{1}{3}\,(wt(x) - 5), & 5\,\mathrm{kg} \le wt(x) < 8\,\mathrm{kg} \\
1, & 8\,\mathrm{kg} \le wt(x) \le 9\,\mathrm{kg} \\
10 - wt(x), & 9\,\mathrm{kg} < wt(x) \le 10\,\mathrm{kg} \\
0, & wt(x) > 10\,\mathrm{kg}
\end{cases}
\right\} \]
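The piecewise definition of “Large Water Melon” can be sketched as a small function. The breakpoints 5, 8, 9 and 10kg and the ramp slope of 1/3 are read from the slide's piecewise form; the function name `mu_large` and the treatment of the boundary points (strict versus non-strict inequalities) are assumptions.

```python
def mu_large(weight_kg: float) -> float:
    """Degree of membership of a melon in the fuzzy set 'Large Water Melon'.

    Trapezoid assumed from the slide: rises from 5 kg to 8 kg,
    stays at 1 between 8 kg and 9 kg, falls to 0 at 10 kg.
    """
    if weight_kg < 5 or weight_kg > 10:
        return 0.0
    if weight_kg < 8:
        return (weight_kg - 5) / 3   # rising edge
    if weight_kg <= 9:
        return 1.0                   # plateau
    return 10 - weight_kg            # falling edge


for w in (4.0, 6.5, 8.0, 9.9):
    print(w, round(mu_large(w), 3))
```

With these breakpoints, a 9.9kg melon gets a degree of about 0.1, matching the value used for the 9.9kg example.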
15
Fuzzy Set Operations
Union:
  $\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x))$.
Intersection:
  $\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x))$.
Complement:
  $\mu_{\bar{A}}(x) = 1 - \mu_A(x)$.
Containment:
  If $A \subseteq X \Rightarrow \mu_A(x) \le \mu_X(x)$.
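The max, min and one-minus definitions can be tried on discrete fuzzy sets stored as dictionaries. This is a sketch under assumptions: the dictionary representation and the helper names are illustrative, the “large” values come from the LW example, and the “very large” set holds only the single 9.9kg value (0.9) given in the earlier example.

```python
def fuzzy_union(a: dict, b: dict) -> dict:
    """Membership of the union is the max of the two memberships."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def fuzzy_intersection(a: dict, b: dict) -> dict:
    """Membership of the intersection is the min of the two memberships."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def fuzzy_complement(a: dict, universe) -> dict:
    """Membership of the complement is 1 minus the membership."""
    return {x: 1.0 - a.get(x, 0.0) for x in universe}


# Keys are weights in kg; values are degrees of membership.
large = {6.0: 0.25, 7.0: 0.75, 8.0: 1.0, 9.9: 0.1}
very_large = {9.9: 0.9}  # only the 9.9 kg value is given in the slides

print(fuzzy_union(large, very_large)[9.9])         # max(0.1, 0.9)
print(fuzzy_intersection(large, very_large)[9.9])  # min(0.1, 0.9)
print(fuzzy_complement(large, large.keys())[8.0])  # 1 - 1.0
```

Elements missing from a dictionary are treated as having membership 0, which is why the intersection is 0 at weights that appear in only one set.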
16
Fuzzy Logic
A fuzzy logic proposition, P, involves some concept without clearly defined boundaries.
Most natural language is fuzzy and involves vague and imprecise terms.
The truth value assigned to P can be any value on the interval [0, 1].
The degree of truth for P: $x \in A$ is equal to the membership grade of x in A.
Negation, disjunction, conjunction, and implication are also defined for a fuzzy logic.
17
Fuzzy Sets for Data Mining
How could fuzzy data be considered for association rule mining?
How could the concept of a fuzzy set be used for classification involving fuzzy classes?
  E.g. Risk classification = {High, Medium, Low}
With fuzzy sets, how could clustering be performed to:
  Take overlapping clusters into consideration, and
  Allow a record to belong to different clusters to different degrees?
18
Fuzzy Association
The interestingness measures for $A \Rightarrow B$:
  Lift ratio: Pr(B|A)/Pr(B).
  Support and confidence: Pr(A,B) and Pr(B|A).
How much do you count?

Eggs      Cheese    Water Melon
2 boxes   Low Fat   {(Small, 0.35), (Medium, 0.65)}
1 box     Hi Cal    {(Small, 0.5), (Medium, 0.5)}
3 boxes   Regular   {(Medium, 0.75), (High, 0.25)}
1 box     Low Fat   {(Medium, 0.3), (High, 0.7)}
3 boxes   Hi Cal    {(Medium, 0.4), (High, 0.6)}
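One possible answer to "how much do you count?" is to let each record contribute its degree of membership instead of a full count of 1. The sketch below applies that convention to the five records in the table; the slides do not say which counting scheme is intended, so the summed-membership rule and the function names are assumptions.

```python
# The five transactions from the table; melon size is a fuzzy value.
transactions = [
    {"eggs": 2, "cheese": "Low Fat", "melon": {"Small": 0.35, "Medium": 0.65}},
    {"eggs": 1, "cheese": "Hi Cal",  "melon": {"Small": 0.5,  "Medium": 0.5}},
    {"eggs": 3, "cheese": "Regular", "melon": {"Medium": 0.75, "High": 0.25}},
    {"eggs": 1, "cheese": "Low Fat", "melon": {"Medium": 0.3,  "High": 0.7}},
    {"eggs": 3, "cheese": "Hi Cal",  "melon": {"Medium": 0.4,  "High": 0.6}},
]


def fuzzy_item_support(size: str) -> float:
    """Pr(melon is `size`): summed memberships over all records."""
    return sum(t["melon"].get(size, 0.0) for t in transactions) / len(transactions)


def fuzzy_support(cheese: str, size: str) -> float:
    """Pr(A, B): records with this cheese contribute their membership in `size`."""
    hits = (t["melon"].get(size, 0.0) for t in transactions if t["cheese"] == cheese)
    return sum(hits) / len(transactions)


def fuzzy_confidence(cheese: str, size: str) -> float:
    """Pr(B | A): summed membership in `size` among records with this cheese."""
    matching = [t for t in transactions if t["cheese"] == cheese]
    if not matching:
        return 0.0
    return sum(t["melon"].get(size, 0.0) for t in matching) / len(matching)


print(round(fuzzy_item_support("Medium"), 3))           # (0.65+0.5+0.75+0.3+0.4)/5
print(round(fuzzy_support("Low Fat", "Medium"), 3))     # (0.65+0.3)/5
print(round(fuzzy_confidence("Low Fat", "Medium"), 3))  # (0.65+0.3)/2
```

Under this convention the rule "Low Fat cheese ⇒ Medium water melon" has support 0.19 and confidence 0.475, and the lift ratio follows by dividing the confidence by the item support of "Medium".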
19
Fuzzy Classification
Information Gain
\[ M(C) = -p_B \log_2 p_B - p_S \log_2 p_S \]
\[ M(C) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.954 \text{ bits} \]
How, again, do you count if a customer belongs partially to both a “high risk” and a “low risk” group?
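The entropy figure of 0.954 bits can be checked numerically, and the same function accepts fractional class counts. Treating summed memberships as fractional counts is one possible convention, not something the slides prescribe; the function name and the three example customers are hypothetical.

```python
import math


def class_entropy(counts) -> float:
    """Entropy in bits of a class distribution given (possibly fractional) counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Crisp case from the slide: 3 records in one class, 5 in the other.
print(round(class_entropy([3, 5]), 3))

# Hypothetical fuzzy case: each customer has (high risk, low risk) memberships,
# and each class count is the sum of the memberships in that class.
customers = [(0.8, 0.2), (0.3, 0.7), (0.5, 0.5)]
high = sum(h for h, _ in customers)
low = sum(l for _, l in customers)
print(round(class_entropy([high, low]), 3))
```

With crisp labels this reduces to the usual entropy, so the fuzzy version is a strict generalization under the stated convention.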
20
Fuzzy Clustering
The mean height value for cluster 2 (short) is 5'3" and for cluster 3 (medium) is 5'7".
You are just over 5'5" and are classified "medium".
Fuzzy k-means is an extension of k-means.
  A membership value of each observation to each cluster is determined.
  User specifies a fuzzy MF.
  A height of 5'5" may give you a membership value of 0.4 to cluster 1, 0.4 to cluster 2 and 0.1 to cluster 3.
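To make "a membership value of each observation to each cluster" concrete, the sketch below uses the standard fuzzy c-means membership formula rather than a user-specified membership function. The fuzzifier m = 2, the function name, and the restriction to the two cluster means given on the slide (5'3" = 63in and 5'7" = 67in) are assumptions; the resulting numbers are not the 0.4/0.4/0.1 values quoted on the slide.

```python
def fcm_memberships(value: float, centers: list, m: float = 2.0) -> list:
    """Fuzzy c-means membership of one value in each cluster (1-D case).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i = |value - center_i|.
    """
    dists = [abs(value - c) for c in centers]
    if any(d == 0 for d in dists):
        # The value sits exactly on a centre: share full membership among such centres.
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


centers_in = [63.0, 67.0]              # cluster 2 (short) and cluster 3 (medium), in inches
u = fcm_memberships(65.5, centers_in)  # "just over 5'5''" taken as 65.5 in (assumed)
print([round(x, 3) for x in u])
```

The memberships sum to 1 and lean toward the "medium" cluster, yet the observation keeps a non-zero degree in "short", which is the overlap that crisp k-means discards.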
22
Approximate Reasoning
Reasoning about imprecise propositions is referred to as approximate reasoning.
Given fuzzy rules: (1) If x is A Then y is B.
Introduce a new antecedent, say A', and find B' by fuzzy composition:
  $B' = A' \circ R$
The idea of an inverse relationship between fuzzy antecedents and fuzzy consequences arises from the composition operation.
The inference represents an approximate linguistic characteristic of the relation between two universes of discourse, X and Y.
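The composition B' = A' ∘ R can be sketched on small discrete universes. The slides do not state how R is built or which composition is used, so this example assumes a min-based implication relation, R(x, y) = min(μ_A(x), μ_B(y)), and max-min composition; all membership values below are hypothetical.

```python
def implication_relation(a: list, b: list) -> list:
    """Relation R for 'If x is A Then y is B', assumed as R[i][j] = min(a[i], b[j])."""
    return [[min(ai, bj) for bj in b] for ai in a]


def max_min_compose(a_prime: list, r: list) -> list:
    """Max-min composition: B'[j] = max over i of min(A'[i], R[i][j])."""
    n_cols = len(r[0])
    return [
        max(min(a_prime[i], r[i][j]) for i in range(len(a_prime)))
        for j in range(n_cols)
    ]


# Hypothetical membership vectors over small discrete universes X and Y.
A = [0.2, 1.0, 0.5]
B = [0.3, 0.9]
R = implication_relation(A, B)

A_new = [0.6, 0.4, 0.0]           # the new antecedent A'
print(max_min_compose(A_new, R))  # the inferred B'
print(max_min_compose(A, R))      # composing with A itself gives back B here
```

Because A reaches full membership somewhere on X, composing the original antecedent with R reproduces B; a different antecedent A' yields a weakened consequent B'.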
23
Graphical Techniques of Inference
Procedures (matrix operations) to conduct inference of IF-THEN rules have been illustrated.
Use graphical techniques to conduct the inference computation manually with a few rules to verify the inference operations.
The graphical procedures can be easily extended and will hold for fuzzy expert systems (ESs) with any number of antecedents (inputs) and consequents (outputs).