On Applications of Rough Sets Theory to Knowledge Discovery
Frida Coaquira
UNIVERSITY OF PUERTO RICO, MAYAGÜEZ CAMPUS
Introduction
One goal of Knowledge Discovery is to extract meaningful knowledge. Rough Sets theory was introduced by Z. Pawlak (1982) as a mathematical tool for data analysis.
Rough sets have many applications in the field of Knowledge Discovery: feature selection, the discretization process, data imputation, and the creation of decision rules.
Rough sets were introduced as a tool to deal with uncertain knowledge in Artificial Intelligence applications.
Equivalence Relation
Let X be a set and let x, y, and z be elements of X. An equivalence relation R on X is a relation on X such that:
Reflexive Property: xRx for all x in X.
Symmetric Property: if xRy, then yRx.
Transitive Property: if xRy and yRz, then xRz.
Rough Sets Theory
Let T = (U, A, C, D) be a decision system,
where: U is a non-empty, finite set called the universe;
A is a non-empty, finite set of attributes; and C, D ⊆ A are the condition and decision attribute subsets, respectively.
Each attribute a ∈ A is a function a : U → V_a, where V_a is called the value set of a.
The elements of U are objects, cases, states, or observations. The attributes are interpreted as features, variables, characteristics, conditions, etc.
Indiscernibility Relation
Let P ⊆ A. The indiscernibility relation IND(P) is defined as follows:
IND(P) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ P}.
The indiscernibility relation IND(P) is an equivalence relation.
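The definition above can be sketched in Python. A minimal, hedged example (the function and variable names are my own, not from the slides): objects with identical values on every attribute of P fall into the same equivalence class, so grouping by the value tuple on P yields U/IND(P).

```python
def ind_partition(table, P):
    # group objects that take identical values on every attribute in P
    classes = {}
    for obj, values in table.items():
        key = tuple(values[a] for a in P)   # signature of obj on P
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

# Hypothetical decision table: x1 and x2 agree on a1, so IND({a1})
# puts them into the same elementary set.
table = {"x1": {"a1": 1, "a2": 0},
         "x2": {"a1": 1, "a2": 2},
         "x3": {"a1": 2, "a2": 0}}
print(sorted(sorted(c) for c in ind_partition(table, ["a1"])))
# [['x1', 'x2'], ['x3']]
```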
Indiscernibility Relation
The indiscernibility relation defines a partition of U.
For P ⊆ A, U/IND(P) denotes the family of all equivalence classes of the relation IND(P), called elementary sets.
Two other families of equivalence classes, U/IND(C) and U/IND(D), called the condition and decision equivalence classes respectively, can also be defined.
R-lower approximation
Let X ⊆ U and R ⊆ C, where R is a subset of condition features. The R-lower approximation of X,
R_*(X) = ∪{ Y ∈ U/R : Y ⊆ X },
is the set of all elements of U which can be classified with certainty as elements of X.
The R-lower approximation of X is a subset of X.
R-upper approximation
The R-upper approximation of X is the set of all elements of U whose equivalence class intersects X:
R^*(X) = ∪{ Y ∈ U/R : Y ∩ X ≠ ∅ }.
X is a subset of its R-upper approximation, and R^*(X) contains all objects which can possibly be classified as belonging to X.
The R-boundary set of X is defined as:
BN_R(X) = R^*(X) − R_*(X).
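The two approximations and the boundary set can be computed directly from the partition U/R. A short sketch (helper names are my own); the accuracy ratio Card(lower)/Card(upper) mentioned on the next slide falls out for free:

```python
def lower_approx(partition, X):
    # union of equivalence classes wholly contained in X
    out = set()
    for Y in partition:
        if Y <= X:
            out |= Y
    return out

def upper_approx(partition, X):
    # union of equivalence classes that intersect X
    out = set()
    for Y in partition:
        if Y & X:
            out |= Y
    return out

def boundary(partition, X):
    return upper_approx(partition, X) - lower_approx(partition, X)

# Hypothetical partition U/R and target set X
partition = [{1, 2}, {3, 4}, {5}]
X = {1, 2, 3}
print(sorted(lower_approx(partition, X)))   # [1, 2]
print(sorted(upper_approx(partition, X)))   # [1, 2, 3, 4]
# accuracy = Card(lower) / Card(upper)
print(len(lower_approx(partition, X)) / len(upper_approx(partition, X)))  # 0.5
```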
Representation of the approximation sets
If R_*(X) = R^*(X), then X is R-definable (the boundary set is empty).
If R_*(X) ≠ R^*(X), then X is rough with respect to R.
ACCURACY := Card(R_*(X)) / Card(R^*(X))
Decision Class
The decision d determines the partition
CLASS_T(d) = {X_1, …, X_{r(d)}}
of the universe U, where X_k = {x ∈ U : d(x) = k} for 1 ≤ k ≤ r(d).
CLASS_T(d) will be called the classification of objects in T determined by the decision d.
The set X_k is called the k-th decision class of T.
Decision Class
This decision system has 3 classes. We represent the partition: lower approximation, upper approximation, and boundary set.
Rough Sets Theory
Let us consider U = {x1, x2, x3, x4, x5, x6, x7, x8} and the equivalence relation R with the equivalence classes X1 = {x1, x3, x5}, X2 = {x2, x4}, and X3 = {x6, x7, x8}, which form a partition.
Let the classification C = {Y1, Y2, Y3} be such that
Y1 = {x1, x2, x4}, Y2 = {x3, x5, x8}, Y3 = {x6, x7}.
Only Y1 has a non-empty lower approximation, i.e., R_*(Y1) = X2.
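The claim above can be checked mechanically. A small sketch (the function name `lower` is my own): only Y1 contains a whole equivalence class, so only Y1 has a non-empty lower approximation.

```python
# Data copied from the example above.
partition = [{"x1", "x3", "x5"}, {"x2", "x4"}, {"x6", "x7", "x8"}]
Y1 = {"x1", "x2", "x4"}
Y2 = {"x3", "x5", "x8"}
Y3 = {"x6", "x7"}

def lower(partition, X):
    # union of equivalence classes wholly contained in X
    out = set()
    for cls in partition:
        if cls <= X:
            out |= cls
    return out

for name, Y in [("Y1", Y1), ("Y2", Y2), ("Y3", Y3)]:
    print(name, sorted(lower(partition, Y)))
# Y1 ['x2', 'x4']
# Y2 []
# Y3 []
```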
Positive region and Reduct
Positive region
POS_R(d), the positive region of the classification CLASS_T(d), is equal to the union of the lower approximations of all decision classes.
Reducts are defined as minimal subsets of condition attributes which preserve the positive region defined by the set of all condition attributes, i.e.,
a subset R ⊆ C is a relative reduct iff
1) POS_R(D) = POS_C(D),
2) for every proper subset R' ⊂ R, condition 1) is not true.
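The reduct test can be sketched as a brute-force search: compute the positive region of the full condition set, then look for minimal attribute subsets that preserve it. This is only an illustrative sketch under my own naming (real reduct algorithms avoid the exponential enumeration):

```python
from itertools import combinations

def partition_of(table, attrs):
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(classes.values())

def positive_region(table, attrs, decision):
    # union of elementary sets contained in a single decision class
    decision_classes = partition_of(table, [decision])
    pos = set()
    for Y in partition_of(table, attrs):
        if any(Y <= X for X in decision_classes):
            pos |= Y
    return pos

def reducts(table, C, decision):
    full = positive_region(table, C, decision)
    found = []
    for k in range(1, len(C) + 1):          # smallest subsets first
        for R in combinations(C, k):
            if positive_region(table, list(R), decision) == full and \
               not any(set(f) <= set(R) for f in found):   # keep minimal only
                found.append(R)
    return found

# Hypothetical table: attribute b alone already determines d.
table = {"x1": {"a": 0, "b": 0, "d": 0},
         "x2": {"a": 0, "b": 1, "d": 1},
         "x3": {"a": 1, "b": 1, "d": 1}}
print(reducts(table, ["a", "b"], "d"))   # [('b',)]
```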
Dependency coefficient
The dependency coefficient is a measure of association. The dependency coefficient between the condition attributes A and a decision attribute d is defined by the formula:
γ(A, d) = Card(POS_A(d)) / Card(U),
where Card represents the cardinality of a set.
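The formula translates to a few lines of code. A self-contained sketch (names are my own assumptions):

```python
def gamma(table, attrs, decision):
    # gamma(A, d) = |POS_A(d)| / |U|
    classes, dec = {}, {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
        dec.setdefault(row[decision], set()).add(obj)
    pos = set()
    for Y in classes.values():
        if any(Y <= X for X in dec.values()):   # Y fits one decision class
            pos |= Y
    return len(pos) / len(table)

# Hypothetical table: a alone classifies only x3 correctly; b classifies all.
table = {"x1": {"a": 0, "b": 0, "d": 0},
         "x2": {"a": 0, "b": 1, "d": 1},
         "x3": {"a": 1, "b": 1, "d": 1}}
print(gamma(table, ["b"], "d"))   # 1.0
```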
Discernibility matrix
Let U = {x1, x2, x3, …, xn} be the universe of the decision system. The discernibility matrix is defined by:
m_ij = {a ∈ C : a(x_i) ≠ a(x_j) and (∃ d ∈ D) d(x_i) ≠ d(x_j)}, for i, j = 1, 2, 3, …, n,
where m_ij is the set of all attributes that classify the objects x_i and x_j into different decision classes in the U/D partition.
CORE(C) = {a ∈ C : m_ij = {a} for some i, j}.
Dispensable feature
Let R be a family of equivalence relations and let P ∈ R.
P is dispensable in R if IND(R) = IND(R − {P});
otherwise P is indispensable in R.
CORE
The set of all indispensable relations in C will be called the core of C.
CORE(C) = ∩RED(C), where RED(C) is the family of all reducts of C.
Small Example
Let U = {x1, x2, x3, x4, x5, x6, x7} be the universe set,
C = {a1, a2, a3, a4} the condition features set, and
D = {d} the decision features set.

      a1  a2  a3  a4   d
x1     1   0   2   1   1
x2     1   0   2   0   1
x3     1   2   0   0   2
x4     1   2   2   1   0
x5     2   1   0   0   2
x6     2   1   1   0   2
x7     2   1   2   1   1
Discernibility Matrix

       x1              x2            x3              x4              x5        x6
x2     -
x3     {a2,a3,a4}      {a2,a3}
x4     {a2}            {a2,a4}       {a3,a4}
x5     {a1,a2,a3,a4}   {a1,a2,a3}    -               {a1,a2,a3,a4}
x6     {a1,a2,a3,a4}   {a1,a2,a3}    -               {a1,a2,a3,a4}   -
x7     -               -             {a1,a2,a3,a4}   {a1,a2}         {a3,a4}   {a3,a4}
Example
Then, CORE(C) = {a2}.
The partition produced by the core is
U/{a2} = {{x1, x2}, {x5, x6, x7}, {x3, x4}},
and the partition produced by the decision feature d is
U/{d} = {{x4}, {x1, x2, x7}, {x3, x5, x6}}.
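The matrix and core above can be reproduced programmatically. A sketch with the table data copied from the slides (helper names are my own): an entry is non-empty only when the decisions differ, and the core collects all singleton entries.

```python
# Decision table from the small example.
rows = {
    "x1": {"a1": 1, "a2": 0, "a3": 2, "a4": 1, "d": 1},
    "x2": {"a1": 1, "a2": 0, "a3": 2, "a4": 0, "d": 1},
    "x3": {"a1": 1, "a2": 2, "a3": 0, "a4": 0, "d": 2},
    "x4": {"a1": 1, "a2": 2, "a3": 2, "a4": 1, "d": 0},
    "x5": {"a1": 2, "a2": 1, "a3": 0, "a4": 0, "d": 2},
    "x6": {"a1": 2, "a2": 1, "a3": 1, "a4": 0, "d": 2},
    "x7": {"a1": 2, "a2": 1, "a3": 2, "a4": 1, "d": 1},
}
C = ["a1", "a2", "a3", "a4"]

def discernibility_entry(xi, xj):
    # attributes discerning xi from xj, counted only when decisions differ
    if rows[xi]["d"] == rows[xj]["d"]:
        return set()
    return {a for a in C if rows[xi][a] != rows[xj][a]}

objs = sorted(rows)
entries = [discernibility_entry(x, y)
           for i, x in enumerate(objs) for y in objs[i + 1:]]
core = {a for e in entries if len(e) == 1 for a in e}
print(core)   # {'a2'}
```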
Similarity relation
The similarity class of an object x is
SIM_T(x) = {y ∈ U : (y, x) ∈ SIM_T};
it contains all objects similar to x.
Lower approximation
For X ⊆ U,
SIM_T_*(X) = {x ∈ X : SIM_T(x) ⊆ X}
is the set of all elements of U which can be classified with certainty as elements of X.
Upper approximation
SIM_T^*(X) = ∪_{x ∈ X} SIM_T(x).
SIM-positive region of a partition
Let X_i = {x ∈ U : d(x) = i} for i = 1, …, r(d). Then
POS_SIM_T({d}) = ∪_{i=1}^{r(d)} SIM_T_*(X_i).
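A minimal sketch of these similarity-based approximations. The slides keep SIM_T abstract; here I assume, purely for illustration, a numeric similarity with a threshold t, so similarity classes may overlap instead of forming a partition:

```python
def sim_class(x, universe, values, t=0.5):
    # all objects whose value lies within t of x's value
    return {y for y in universe if abs(values[y] - values[x]) <= t}

def sim_lower(X, universe, values, t=0.5):
    # objects of X whose whole similarity class stays inside X
    return {x for x in X if sim_class(x, universe, values, t) <= X}

def sim_upper(X, universe, values, t=0.5):
    # union of the similarity classes of the objects of X
    out = set()
    for x in X:
        out |= sim_class(x, universe, values, t)
    return out

# Hypothetical data: x1 and x2 are similar, x3 is far from both.
values = {"x1": 0.0, "x2": 0.4, "x3": 2.0}
U = set(values)
X = {"x2", "x3"}
print(sorted(sim_lower(X, U, values)))   # ['x3']
print(sorted(sim_upper(X, U, values)))   # ['x1', 'x2', 'x3']
```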
Similarity measures
For a numerical attribute a:
S_a(v_i, v_j) = 1 − |v_i − v_j| / (a_max − a_min).
For a nominal attribute a:
S_a(v_i, v_j) = 1 if v_i = v_j, and 0 otherwise.
A similarity measure for a can also be built from the conditional distribution of the decision d:
S_a(v_i, v_j) = (1/r(d)) Σ_{k=1}^{r(d)} P(d = k | a = v_i) · P(d = k | a = v_j).
In general such measures have parameters estimated from the data and need not be symmetric.
Quality of approximation of classification
This is the ratio of all correctly classified objects to all objects:
γ = Card(POS_SIM_T({d})) / Card(U).
Relative Reduct
R ⊆ A is a relative reduct for SIM_A{d} iff
1) POS_SIM_R({d}) = POS_SIM_A({d}),
2) for every proper subset of R, condition 1) is not true.
Attribute Reduction
The purpose is to select a subset of attributes from the original set of attributes to use in the rest of the process.
Selection criterion: the reduct concept description.
A reduct is the essential part of the knowledge, which defines all basic concepts.
Other methods are:
• Discernibility matrix (n×n)
• Generate all combinations of attributes and then evaluate their classification power or dependency coefficient (complete search).
Discretization Methods
The purpose is to develop an algorithm that finds a consistent set of cut points which minimizes the number of regions.
Discretization methods based on Rough Set theory try to find these cut points.
Given a set S of points P1, …, Pn in the plane R², partitioned into two disjoint categories S1, S2, and a natural number T: is there a consistent set of lines such that the partition of the plane into regions defined by them consists of at most T regions?
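For the one-dimensional case the consistency test is easy to sketch: a set of cuts is consistent when no two objects with different decisions land in the same interval. A hedged illustration (function names and data are my own):

```python
import bisect

def discretize(value, cuts):
    # index of the interval the value falls into, given sorted cuts
    return bisect.bisect_left(sorted(cuts), value)

def consistent(samples, cuts):
    # samples: list of (value, decision) pairs; consistent iff every
    # discretized cell contains a single decision
    seen = {}
    for v, d in samples:
        cell = discretize(v, cuts)
        if seen.setdefault(cell, d) != d:
            return False
    return True

samples = [(0.2, "L"), (0.4, "L"), (1.5, "H"), (2.0, "H")]
print(consistent(samples, [1.0]))   # True: one cut separates L from H
print(consistent(samples, []))      # False: all samples share one cell
```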
Consistent
Def. A set of cuts P is consistent with A (or A-consistent) iff ∂_A = ∂_{A_P}, where ∂_A and ∂_{A_P} are the generalized decisions of A and A_P respectively.
Def. A set of cuts P_irr is A-irreducible iff P_irr is A-consistent and no proper subfamily P' (P' ⊂ P_irr) is A-consistent.
Level of Inconsistency
Let B be a subset of A and let X_1, …, X_n be a classification of U, where X_i ∩ X_j = ∅ for i ≠ j and X_i ⊆ U, i = 1, 2, …, n.
L_c = Σ_i Card(B_*(X_i)) / Card(U)
represents the percentage of instances which can be correctly classified into class X_i with respect to the subset B.
Imputation of Data
The rules of the system should be maximal in terms of consistency.
The set of relevant attributes for x is defined by
rel_R(x) = {a ∈ R : a(x) is defined},
and the relation Rc by
x Rc y ⇔ a(x) = a(y) for all a ∈ rel_R(x) ∩ rel_R(y).
x and y are consistent if x Rc y.
Example
Let x = (1, 3, ?, 4), y = (2, ?, 5, 4) and z = (1, ?, 5, 4).
x and z are consistent (x Rc z),
x and y are not consistent.
Decision rules
      F1  F2  F3  F4   D   Rules
O3     0   0   0   1   L   R1
O5     0   0   1   3   L   R1
O1     0   1   0   2   L   R2
O4     0   1   1   0   M   R3
O2     1   1   0   2   H   R4

Rule 1: if (F2 = 0) then (D = L)
Rule 2: if (F1 = 0) then (D = L)
Rule 3: if (F4 = 0) then (D = M)
Rule 4: if (F1 = 1) then (D = H)
The algorithm should minimize the number of features included in decision rules.
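Such single-attribute rules can be represented and evaluated with very little code. A hedged sketch (the rule encoding is my own; Rule 4 is read off O2, the only H row, as F1 = 1):

```python
# Each rule is a (condition-dict, decision) pair; a rule fires on an
# object when every condition attribute matches.
rules = [
    ({"F2": 0}, "L"),   # Rule 1
    ({"F1": 0}, "L"),   # Rule 2
    ({"F4": 0}, "M"),   # Rule 3
    ({"F1": 1}, "H"),   # Rule 4
]

def fire(rule, obj):
    cond, decision = rule
    return decision if all(obj.get(a) == v for a, v in cond.items()) else None

# Object O3 from the table above.
o3 = {"F1": 0, "F2": 0, "F3": 0, "F4": 1}
print([fire(r, o3) for r in rules])   # ['L', 'L', None, None]
```

Keeping each condition to a single feature, as here, is exactly the minimization goal stated above.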
References
[1] Gediga, G. and Düntsch, I. (2002) Maximum Consistency of Incomplete Data via Non-invasive Imputation. Artificial Intelligence.
[2] Grzymala, J. and Siddhave, S. (2004) Rough Set Approach to Rule Induction from Incomplete Data. Proceedings of IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.
[3] Pawlak, Z. (1995) Rough Sets. Proceedings of the 1995 ACM 23rd Annual Conference on Computer Science.
[4] Tay, F. and Shen, L. (2002) A Modified Chi2 Algorithm for Discretization. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, May/June.
[5] Zhong, N. (2001) Using Rough Sets with Heuristics for Feature Selection. Journal of Intelligent Information Systems, 16, 199–214, Kluwer Academic Publishers.