Source: decsai.ugr.es/~mdruiz/2015_ifsa-eusflat-pres.pdf
Fuzzy Meta-Association Rules
M.D. Ruiz, J. Gomez-Romero, M.J. Martin-Bautista, D. Sanchez, M.A. Vila, M. Delgado
3rd July 2015
Motivation
• Datasets are often distributed and processed separately: several mining processes are carried out over data with similar meaning coming from different sources.
• Raw data may not always be available. For instance, stream data is only temporarily available for processing, or only summarized knowledge is preserved.
• Sometimes organizations are not allowed to disclose their primary data (due to privacy and legal restrictions), but they can share some results or summaries obtained from them.
Solution: Perform pattern analysis instead of data analysis.
Motivation
• A new paradigm arises: Higher Order Mining (HOM), concerned with applying mining techniques over patterns/models derived from one or more large and/or complex datasets.
• New information is obtained by combining several Data Mining techniques (association discovery, clustering, classification, trend analysis, etc.).
• Our proposal combines association rule mining techniques in both the primary and the processed data → meta-association rules.
• Meta-association rules are rules about rules, i.e. they can contain rules in the antecedent and/or in the consequent.
• We can use meta-association rules when the provided information is in the form of rules, or to obtain new information about distributed databases.
Overview
Example in Crime Data Analysis
• We want to study the relation between crime incidents that happened in the city of Chicago and the educational system, by district.
• Each district of Chicago has its own dataset: D1, D2, . . . , Dk, some of them sharing some of their attributes.
• Association rule mining algorithms are executed separately in each district, obtaining different sets of rules: R1, R2, . . . , Rk.
• There are several attributes describing some aspects of the districts: at1, at2, . . . , atm.
Proposal:
Use Meta-Association Rules to obtain interesting information
Proposal
[Diagram: rule sets R1, R2, …, Rk-1, Rk → Meta database (rules r1, r2, …, rn | additional attributes at1, …, atm) → Meta-association rules]
Association Rules
• Data is usually stored in datasets D composed of transactions ti (rows) and attributes (columns).
• We call an item a pair 〈attribute, value〉 or 〈attribute, interval〉.

D     i1    i2    ...   ij    ij+1  ...   im
t1    1     0     ...   0     1     ...   0
t2    0     1     ...   1     1     ...   1
...   ...   ...   ...   ...   ...   ...   ...
tn    1     1     ...   0     1     ...   1

• Association rules are expressions of the form A → B, where A and B are non-empty sets of items with no intersection.
• An association rule represents a relation between the joint co-occurrence of A and B.
Association Rules
• The support of an itemset A is defined as the probability that a transaction contains the itemset:

$$\mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|}$$

• To assess the validity of association rules, the most common measures are support (joint probability P(A ∪ B)) and confidence (conditional probability P(B|A)):

$$\mathrm{Supp}(A \to B) = \mathrm{supp}(A \cup B) = \frac{|\{t \in D : A \cup B \subseteq t\}|}{|D|}; \qquad \mathrm{Conf}(A \to B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}$$

These must be ≥ minsupp and ≥ minconf respectively (thresholds imposed by the user); a rule meeting both thresholds is frequent and confident.
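The two measures can be sketched in a few lines of Python. This is only an illustration: the function names and the toy dataset D are assumptions made for this example and do not come from the presentation.

```python
from typing import FrozenSet, Iterable, List


def supp(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """supp(A): fraction of transactions t in D with A a subset of t."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_support(antecedent, consequent, transactions) -> float:
    """Supp(A -> B) = supp(A union B)."""
    return supp(frozenset(antecedent) | frozenset(consequent), transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """Conf(A -> B) = supp(A union B) / supp(A); 0.0 if A never occurs."""
    supp_a = supp(antecedent, transactions)
    if supp_a == 0:
        return 0.0
    return rule_support(antecedent, consequent, transactions) / supp_a


# Hypothetical toy dataset D with four transactions.
D = [
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b"}),
]

print(supp({"a"}, D))               # a occurs in 3 of 4 transactions
print(rule_support({"a"}, {"b"}, D))  # a and b co-occur in 2 of 4
print(confidence({"a"}, {"b"}, D))    # 2 of the 3 transactions with a also have b
```

A rule would then be kept when both values reach the user-defined minsupp and minconf thresholds.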
Association Rules
• An alternative framework is to measure accuracy by means of the certainty factor CF(A → B):

$$CF(A \to B) = \begin{cases} \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) > \mathrm{supp}(B) \\[2ex] \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{\mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) < \mathrm{supp}(B) \\[2ex] 0 & \text{otherwise.} \end{cases}$$

• CF measures how our belief that B is in a transaction changes when we are told that A is in that transaction.
• The certainty factor has better properties than confidence and other quality measures; in particular, it helps to reduce the number of rules obtained by filtering out those rules corresponding to statistical independence or negative dependence.
• When CF(A → B) ≥ minCF, the rule is called certain.
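The piecewise definition translates directly into code. The sketch below is illustrative; the function name and the sample values are assumptions, not part of the presentation.

```python
def certainty_factor(conf: float, supp_b: float) -> float:
    """CF(A -> B) computed from Conf(A -> B) and supp(B)."""
    if conf > supp_b:
        # Positive dependence: belief in B increases when A is known.
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        # Negative dependence: belief in B decreases when A is known.
        return (conf - supp_b) / supp_b
    # Statistical independence.
    return 0.0


print(certainty_factor(0.9, 0.5))   # positive dependence
print(certainty_factor(0.25, 0.5))  # negative dependence
print(certainty_factor(0.5, 0.5))   # independence
```

Neither division can hit zero: Conf > supp(B) implies supp(B) < 1, and Conf < supp(B) implies supp(B) > 0. Rules with independence or negative dependence get CF ≤ 0, so any positive minCF filters them out.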
Fuzzy Association Rules
• I: a finite set of items.
• Fuzzy transaction: a non-empty fuzzy subset τ ⊆ I.
• An item i ∈ I belongs to τ with degree τ(i) ∈ [0, 1].
• An itemset A ⊂ I belongs to τ with degree

$$\tau(A) = \min_{i \in A} \tau(i)$$

• A fuzzy association rule A → B is satisfied in D ⇔

$$\tau(A) \le \tau(B) \quad \forall\, \tau \in D$$

• This definition preserves the usual meaning of crisp association rules.
Fuzzy Association Rules
Example:
• Set of items I = {i1, i2, i3, i4}
• Set of fuzzy transactions:

      i1    i2    i3    i4
τ1    0     0.6   0.7   0.9
τ2    0     1     0     1
τ3    1     0.5   0.75  1
τ4    1     0     0.1   1
τ5    0.5   1     0     1
τ6    1     0     0.75  1

• τ2 is a crisp transaction.
• Some inclusion degrees are: τ1({i3, i4}) = 0.7, τ1({i2, i3, i4}) = 0.6 and τ4({i1, i4}) = 1.
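The inclusion degrees in this example can be checked with a short Python sketch. The dictionary layout and the function names are assumptions chosen for illustration; the membership values are those of the example table.

```python
from typing import Dict, Iterable

FuzzyTransaction = Dict[str, float]

# Fuzzy transactions from the example: item -> membership degree.
D: Dict[str, FuzzyTransaction] = {
    "tau1": {"i1": 0.0, "i2": 0.6, "i3": 0.7, "i4": 0.9},
    "tau2": {"i1": 0.0, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    "tau3": {"i1": 1.0, "i2": 0.5, "i3": 0.75, "i4": 1.0},
    "tau4": {"i1": 1.0, "i2": 0.0, "i3": 0.1, "i4": 1.0},
    "tau5": {"i1": 0.5, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    "tau6": {"i1": 1.0, "i2": 0.0, "i3": 0.75, "i4": 1.0},
}


def degree(tau: FuzzyTransaction, itemset: Iterable[str]) -> float:
    """tau(A): minimum membership degree over the items of A."""
    return min(tau[i] for i in itemset)


def is_crisp(tau: FuzzyTransaction) -> bool:
    """A transaction is crisp when every degree is 0 or 1."""
    return all(d in (0.0, 1.0) for d in tau.values())


def is_satisfied(antecedent, consequent, transactions) -> bool:
    """A -> B is satisfied when tau(A) <= tau(B) for every transaction."""
    return all(
        degree(tau, antecedent) <= degree(tau, consequent)
        for tau in transactions.values()
    )


print(degree(D["tau1"], {"i3", "i4"}))        # 0.7
print(degree(D["tau1"], {"i2", "i3", "i4"}))  # 0.6
print(degree(D["tau4"], {"i1", "i4"}))        # 1.0
print(is_crisp(D["tau2"]))                    # True
print(is_satisfied({"i1"}, {"i4"}, D))        # True
```

In this table, {i1} → {i4} is satisfied in every transaction, while the reverse rule {i4} → {i1} already fails in τ1 (0.9 > 0).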
Fuzzy Association Rules
To assess fuzzy association rules, we employ a proposal based on quantified sentence evaluation, using the fuzzy quantifier Q_M(x) = x to represent the quantifier “most”, in the following way:
• The support of an itemset A is the evaluation of the quantified sentence “most of the transactions in D are A”, where A is the fuzzy set defined as μ_A(τ) = τ(A).
• The support of a fuzzy rule A → B, denoted FSupp(A → B), is the evaluation of the quantified sentence “most of the transactions in D are A ∩ B”.
• FConf(A → B) is the evaluation of the quantified sentence “most of the transactions of A are B”.
• FCF(A → B) is computed using the fuzzy versions of support and confidence.
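The slide does not spell out the evaluation method. The sketch below assumes the GD method of [Delgado et al. 2000], which evaluates "Q of F are G" as a weighted sum over α-cuts, and it further assumes that F is normalised (some element has degree 1). All function names are hypothetical; the data are the fuzzy transactions of the earlier example.

```python
from typing import Dict, Iterable, List

# Fuzzy transactions tau1..tau6 of the example (item -> degree).
D: List[Dict[str, float]] = [
    {"i1": 0.0, "i2": 0.6, "i3": 0.7, "i4": 0.9},
    {"i1": 0.0, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    {"i1": 1.0, "i2": 0.5, "i3": 0.75, "i4": 1.0},
    {"i1": 1.0, "i2": 0.0, "i3": 0.1, "i4": 1.0},
    {"i1": 0.5, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    {"i1": 1.0, "i2": 0.0, "i3": 0.75, "i4": 1.0},
]


def degrees(itemset: Iterable[str]) -> List[float]:
    """tau(A) for every transaction in D."""
    items = list(itemset)
    return [min(t[i] for i in items) for t in D]


def gd_most(gf: List[float], f: List[float]) -> float:
    """GD evaluation of 'most of F are G' with Q_M(x) = x (assumed method).

    gf[k] and f[k] are the degrees of element k in G ∩ F and in F.
    """
    levels = sorted({a for a in gf + f if a > 0.0}, reverse=True)
    if not levels or levels[0] != 1.0:
        raise ValueError("this sketch assumes F is normalised (max degree 1)")
    total = 0.0
    for alpha, next_alpha in zip(levels, levels[1:] + [0.0]):
        f_cut = sum(1 for x in f if x >= alpha)    # |F_alpha|
        gf_cut = sum(1 for x in gf if x >= alpha)  # |(G ∩ F)_alpha|
        total += (alpha - next_alpha) * gf_cut / f_cut
    return total


def fsupp(itemset) -> float:
    """'Most of the transactions in D are A' (D is crisp: all degrees 1)."""
    return gd_most(degrees(itemset), [1.0] * len(D))


def fconf(antecedent, consequent) -> float:
    """'Most of the transactions of A are B'."""
    both = set(antecedent) | set(consequent)
    return gd_most(degrees(both), degrees(antecedent))


def fcf(antecedent, consequent) -> float:
    """Certainty factor computed from the fuzzy confidence and support."""
    conf, supp_b = fconf(antecedent, consequent), fsupp(consequent)
    if conf > supp_b:
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


print(fsupp({"i1", "i4"}), fconf({"i1"}, {"i4"}), fcf({"i1"}, {"i4"}))
```

With Q_M(x) = x and a crisp reference set D, this evaluation of the support reduces to the average of the degrees τ(A) over all transactions, which is a convenient sanity check.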
Meta-Association Rules
Meta-association rules are association rules where the antecedent or the consequent can contain regular rules that have been previously extracted with a high reliability in a high percentage of the source databases.
Crisp Meta-Association Rules
1. From each database, a set of rules Ri is obtained.
2. We compile these rules in a new database D, together with the attributes at1, . . . , atm.

D     r1    r2    ...   rn    at1   ...   atm
D1    1     1     ...   0     1     ...   1
D2    0     1     ...   0     0     ...   1
...   ...   ...   ...   ...   ...   ...   ...
Dk    1     0     ...   1     1     ...   0

3. We search for meta-association rules involving the previously extracted rules r1, . . . , rn and the added attributes at1, . . . , atm.
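The three steps above can be sketched as follows. The rule identifiers, attribute names and helper functions are hypothetical; a real pipeline would take the rule sets from an association rule miner run on each source database.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Step 1 (assumed already done): rules mined from each source database.
rules_per_db: Dict[str, Set[str]] = {
    "D1": {"r1", "r2"},
    "D2": {"r2"},
    "D3": {"r1", "r3"},
}
# Hypothetical crisp attributes that hold for each database.
attrs_per_db: Dict[str, Set[str]] = {
    "D1": {"at1", "at2"},
    "D2": {"at2"},
    "D3": {"at1"},
}


def build_meta_database(rules, attrs) -> Dict[str, FrozenSet[str]]:
    """Step 2: one meta-transaction per database (its rules plus attributes)."""
    return {
        db: frozenset(db_rules) | frozenset(attrs.get(db, ()))
        for db, db_rules in rules.items()
    }


def as_binary_rows(meta_db) -> Tuple[List[str], Dict[str, List[int]]]:
    """Binary table view: 1 if the rule/attribute holds in that database."""
    columns = sorted(set().union(*meta_db.values()))
    rows = {db: [int(c in items) for c in columns] for db, items in meta_db.items()}
    return columns, rows


def supp(itemset, meta_db) -> float:
    """Fraction of meta-transactions containing every element of itemset."""
    items = frozenset(itemset)
    return sum(1 for t in meta_db.values() if items <= t) / len(meta_db)


meta_db = build_meta_database(rules_per_db, attrs_per_db)
columns, rows = as_binary_rows(meta_db)
print(columns)
for db, row in rows.items():
    print(db, row)

# Step 3: assess a candidate meta-rule, here r1 -> at1.
conf = supp({"r1", "at1"}, meta_db) / supp({"r1"}, meta_db)
print(conf)
```

Any standard association rule miner can then be run on the meta-transactions; the manual confidence computation at the end only illustrates how a single candidate meta-rule is assessed.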
15
Why use fuzzy meta-rules?
• Crisp meta-rules are discovered by taking into account only whether or not an association has been previously mined from the original dataset.
• This means that, when building meta-rules, a rule mined with a confidence of 0.9 has the same importance as another rule with a confidence of 0.5.
⇒ It is convenient to consider the available measures, such as the confidence, to quantify the importance of the rule in the HOM process.
Solution:
Fuzzy Meta-Association Rules
Fuzzy Meta-Association Rules
1. From each database we have the set of rules Ri and their assessment values ∈ [0, 1] (Conf or CF).
2. We compile these rules in a fuzzy database D, together with the attributes at1, . . . , atm, which can be fuzzy.

D     r1    r2    ...   rn    at1   ...   atm
D1    0.8   1     ...   0     0.6   ...   1
D2    0     0.3   ...   0     0     ...   0.7
...   ...   ...   ...   ...   ...   ...   ...
Dk    0.9   0     ...   0.5   1     ...   0

3. Mining association rules in D will discover fuzzy meta-association rules.
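A minimal sketch of step 2, using only the cells shown in the table above (rows D1, D2 and Dk; columns r1, r2, rn, at1, atm). The data layout and function names are assumptions made for illustration; rules not mined from a database get degree 0.

```python
from typing import Dict, Iterable

# Assessment value (Conf or CF) of each rule mined from each database.
rule_assessments: Dict[str, Dict[str, float]] = {
    "D1": {"r1": 0.8, "r2": 1.0},
    "D2": {"r2": 0.3},
    "Dk": {"r1": 0.9, "rn": 0.5},
}
# Fuzzy degrees of the additional attributes for each database.
fuzzy_attributes: Dict[str, Dict[str, float]] = {
    "D1": {"at1": 0.6, "atm": 1.0},
    "D2": {"atm": 0.7},
    "Dk": {"at1": 1.0},
}


def build_fuzzy_meta_database(rules, attrs) -> Dict[str, Dict[str, float]]:
    """One fuzzy meta-transaction per database; absent items get degree 0."""
    columns = sorted(
        {item for row in rules.values() for item in row}
        | {item for row in attrs.values() for item in row}
    )
    meta = {}
    for db in rules:
        row = dict.fromkeys(columns, 0.0)
        row.update(rules[db])
        row.update(attrs.get(db, {}))
        meta[db] = row
    return meta


def degree(meta_transaction: Dict[str, float], itemset: Iterable[str]) -> float:
    """Degree of a conjunction of rules/attributes: minimum of the degrees."""
    return min(meta_transaction[i] for i in itemset)


meta = build_fuzzy_meta_database(rule_assessments, fuzzy_attributes)
for db, row in meta.items():
    print(db, row)
print(degree(meta["Dk"], {"r1", "at1"}))  # min(0.9, 1.0)
```

The resulting rows are fuzzy transactions, so the fuzzy support, confidence and certainty factor described earlier apply to them unchanged.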
Crisp and Fuzzy Meta-Association Rules
Formally, we can obtain three types of meta-association rules:
• ri → rj, where ri, rj can be rules or conjunctions of rules. For example: ri = ri1 ∧ · · · ∧ ris.
• ati → atj, where ati, atj can be attributes or conjunctions of attributes.
• ri → atj or atj → ri, where ri and atj can be a conjunction of rules and a conjunction of attributes respectively, and they can be mixed. For example: r1 ∧ at2 → r3.
Experimental Evaluation: DataSet Description
• 22 databases about crime related to the districts of the city of Chicago.
• Number of transactions: min = 5694 and max = 22493.
• 6 types of attributes (around 300 items) in each database:
  • Quarter of the year in which the incident happened.
  • Day period: morning, afternoon, evening, night.
  • Crime description according to police standard protocols.
  • Location description: street, residence, etc.
  • Arrest, if there is an arrest associated with the crime.
  • Domestic, if the crime happened in a domestic environment.
• Additional attributes about the districts:
  • Number of students in the district: low, medium, high, very high.
  • Number of misconducts notified in the district: very low, low, medium, high, very high.
  • Perceived safety index, obtained by means of surveys: low, medium, high.
Experimental Evaluation: Some Results
Example of an obtained meta-association rule:
“IF Number of misconducts = Very high
THEN (Crime-Desc. = POSS:CANNABIS ≤ 30GMS → Domestic = f)
AND Safety-Index = Low”
with FSupp = 0.136 and FCF = 0.658.
This means that, when the number of misconducts is very high, it is frequent and reliable to find both a low perception of safety and an association between crimes of possession of a small amount of cannabis and their occurrence in a non-domestic environment.
Experimental Evaluation: Some Results
[Two plots for the Chicago database, each with series D and DCF. Left: #rules (logarithmic y-axis) vs minCF/minFCF (x-axis, 0.2 to 0.9). Right: time in sec (y-axis) vs minCF/minFCF (x-axis, 0.2 to 0.9).]
Figure: Left: Number of crisp/fuzzy meta-association rules (y-axis in logarithmic scale) vs minCF/minFCF (x-axis) when minSupp = 0.05. Right: Time in sec. for mining crisp/fuzzy meta-association rules (y-axis) vs minCF/minFCF (x-axis) when minSupp = 0.05.
Experimental Evaluation: Some Results
Antecedent | Consequent | (F)Supp | (F)CF | D | DCF
Number-of-Misconducts=Very high | (Crime-Desc.=POSS:CANNABIS ≤30GMS → Domestic=f) AND Safety-Index=Low | 0.136 | 0.658 | x | X
(Crime-Desc. = ≤ 500$ → Domestic=f) AND (Crime-Desc.=TO VEHICLE → Arrest=f) | (Location-Description=STREET → Domestic=f) | 0.455 | 0.778 | X | x
(Arrest=t → Domestic=f) AND Safety-Index=Low | Number-of-Students=Medium | 0.064 | 0.7 | X | x
Number-of-Students=Low AND Number-of-Misconducts=Very high | Safety-Index=Low | 0.091 | 1 | x | X

Table: Examples of meta-association rules found in the City of Chicago dataset for minSupp = minFSupp = 0.05 and minCF = minFCF = 0.5.
Conclusions and Future Research
• We have proposed different types of meta-association rules: crisp and fuzzy meta-rules.
• Fuzzy meta-rules take advantage of the assessment measures provided when mining rules from the original datasets.
• The databases considered have the same structure (i.e. the same set of attributes).
  • It would be convenient to address the problem of datasets with similar attributes that have not been related beforehand.
  • The information should be semantically integrated, making it possible to link attributes with similar semantics.
Future: use a knowledge repository to assist the algorithm in matching similar items.
References
[Delgado et al. 2000] - M. Delgado, D. Sanchez, and M.A. Vila. Fuzzy cardinality based evaluation of quantified sentences. Int. J. of Approximate Reasoning, 23: 23-66, 2000.
[Delgado et al. 2003] - M. Delgado, N. Marín, D. Sanchez, and M.A. Vila. Fuzzy association rules: General model and applications. IEEE Transactions on Fuzzy Systems, 11 (2), pp. 214-225, 2003.
[Ruiz et al. 2014] - M.D. Ruiz, J. Gomez-Romero, M.J. Martin-Bautista, D. Sanchez. Meta-association rules for fusing regular association rules from different databases. In Proc. of the 17th Int. Conf. on Information Fusion, 2014.
Thank you. Any questions?