Source: decsai.ugr.es/~mdruiz/2015_IFSA-EUSFLAT-pres.pdf


Fuzzy Meta-Association Rules

M.D. Ruiz, J. Gómez-Romero, M.J. Martin-Bautista, D. Sánchez, M.A. Vila, M. Delgado

3rd July 2015


Motivation

• Datasets are often distributed and processed separately (several mining processes are carried out over data with similar meaning coming from different sources).

• Raw data may not always be available. For instance, stream data is only temporarily available for processing, or only summarized knowledge is preserved.

• Sometimes organizations are not allowed to disclose their primary data (due to privacy and legal restrictions), but they can share some results or summaries obtained from it.

Solution: Perform pattern analysis instead of data analysis.


Motivation

• A new paradigm arises: Higher Order Mining (HOM), concerned with applying mining techniques over patterns/models derived from one or more large and/or complex datasets.

• New information is obtained by combining several data mining techniques (association discovery, clustering, classification, trend analysis, etc.).

• Our proposal combines association rule mining techniques in both the primary and the processed data → meta-association rules.

• Meta-association rules are rules about rules, i.e. they can contain rules in the antecedent and/or in the consequent.

• We can use meta-association rules when the provided information is in the form of rules, or to obtain new information about distributed databases.


Example in Crime Data Analysis

• We want to study the relation between crime incidents that happened in the city of Chicago and the educational system, by district.

• Each district of Chicago has its own dataset: D1, D2, ..., Dk, some of them sharing some of their attributes.

• Association rule mining algorithms are executed separately in each district, obtaining different sets of rules: R1, R2, ..., Rk.

• There are several attributes describing some aspects of the districts: at1, at2, ..., atm.

Proposal:

Use Meta-Association Rules to obtain interesting information


Proposal

[Diagram: the rule sets R1, R2, ..., Rk-1, Rk are gathered into a meta-database (rules r1, r2, ..., rn | additional attributes at1, ..., atm), from which meta-association rules are mined.]


Association Rules

• Data is usually stored in datasets D composed of transactions ti (rows) and attributes (columns).

• We call an item a pair ⟨attribute, value⟩ or ⟨attribute, interval⟩.

  D     i1    i2    ...   ij    ij+1  ...   im
  t1    1     0     ...   0     1     ...   0
  t2    0     1     ...   1     1     ...   1
  ...   ...   ...   ...   ...   ...   ...   ...
  tn    1     1     ...   0     1     ...   1

• Association rules are expressions of the form A → B, where A and B are non-empty sets of items with no intersection.

• An association rule represents a relation between the joint co-occurrence of A and B.


Association Rules

• The support of an itemset A is defined as the probability that a transaction contains the itemset:

$$\mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|}$$

• For assessing the validity of association rules, the most common measures are support (joint probability P(A ∪ B)) and confidence (conditional probability P(B|A)):

$$\mathrm{Supp}(A \to B) = \mathrm{supp}(A \cup B) = \frac{|\{t \in D : A \cup B \subseteq t\}|}{|D|}, \qquad \mathrm{Conf}(A \to B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}$$

These must be ≥ minsupp and ≥ minconf respectively (thresholds imposed by the user); that is, the rule must be frequent and confident.
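The two measures can be sketched in a few lines of Python. This is only an illustration of the definitions above: the function names and the toy dataset (items i1, i2, i3) are assumptions and do not come from the slides.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def supp(itemset: Iterable[str], dataset: List[Transaction]) -> float:
    """supp(A): fraction of transactions t in D such that A is a subset of t."""
    items = frozenset(itemset)
    return sum(1 for t in dataset if items <= t) / len(dataset)


def rule_support(a: Iterable[str], b: Iterable[str],
                 dataset: List[Transaction]) -> float:
    """Supp(A -> B) = supp(A ∪ B)."""
    return supp(frozenset(a) | frozenset(b), dataset)


def confidence(a: Iterable[str], b: Iterable[str],
               dataset: List[Transaction]) -> float:
    """Conf(A -> B) = supp(A ∪ B) / supp(A); 0.0 if A never occurs."""
    a = frozenset(a)
    supp_a = supp(a, dataset)
    return rule_support(a, b, dataset) / supp_a if supp_a > 0 else 0.0


# Hypothetical toy dataset (not taken from the slides).
D = [
    frozenset({"i1", "i2"}),
    frozenset({"i1"}),
    frozenset({"i2", "i3"}),
    frozenset({"i1", "i2", "i3"}),
]

print(supp({"i1"}, D))                  # 0.75
print(rule_support({"i1"}, {"i2"}, D))  # 0.5
print(confidence({"i1"}, {"i2"}, D))    # about 0.667
```

With minsupp = 0.4 and minconf = 0.6, the rule {i1} → {i2} in this toy dataset would be both frequent and confident.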


Association Rules

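• An alternative framework is to measure the accuracy by means of the certainty factor:

$$CF(A \to B) = \begin{cases} \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) > \mathrm{supp}(B) \\[2ex] \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{\mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) < \mathrm{supp}(B) \\[2ex] 0 & \text{otherwise.} \end{cases}$$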

• CF measures how our belief that B is in a transaction changes when we are told that A is in that transaction.

• The certainty factor has better properties than confidence and other quality measures; in particular, it helps to reduce the number of rules obtained by filtering out those rules corresponding to statistical independence or negative dependence.

• When CF(A → B) ≥ minCF the rule is called certain.
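As a small sketch of the piecewise definition, the certainty factor can be computed from Conf(A → B) and supp(B) alone. The function names and the sample values below are assumptions chosen for illustration.

```python
def certainty_factor(conf: float, supp_b: float) -> float:
    """CF(A -> B) from Conf(A -> B) and supp(B), following the three cases."""
    if conf > supp_b:
        # Positive dependence: gain in belief, scaled by the room left above supp(B).
        # conf <= 1 guarantees supp_b < 1 here, so the denominator is non-zero.
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        # Negative dependence: loss in belief, scaled by supp(B) (which is > 0 here).
        return (conf - supp_b) / supp_b
    # Statistical independence.
    return 0.0


def is_certain(conf: float, supp_b: float, min_cf: float) -> bool:
    """A rule is called certain when CF(A -> B) >= minCF."""
    return certainty_factor(conf, supp_b) >= min_cf


# Illustrative values only (not taken from the slides).
print(certainty_factor(0.9, 0.5))  # positive dependence, about 0.8
print(certainty_factor(0.5, 0.5))  # independence, 0.0
print(certainty_factor(0.2, 0.5))  # negative dependence, about -0.6
```

The second and third calls show why CF filters more rules than confidence: a rule with confidence 0.5 is discarded when B already appears in half of the transactions.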


Fuzzy Association Rules

• Let I be a finite set of items.

• Fuzzy transaction: a non-empty fuzzy subset τ ⊆ I.

• An item i ∈ I belongs to τ with degree τ(i) ∈ [0, 1].

• An itemset A ⊂ I belongs to τ with degree

$$\tau(A) = \min_{i \in A} \tau(i)$$

• A fuzzy association rule A → B is satisfied in D ⇔

$$\tau(A) \le \tau(B) \quad \forall\, \tau \in D$$

• This definition preserves the usual meaning of crisp association rules.


Fuzzy Association Rules

Example:

• Set of items I = {i1, i2, i3, i4}

• Set of fuzzy transactions:

        i1    i2    i3    i4
  τ1    0     0.6   0.7   0.9
  τ2    0     1     0     1
  τ3    1     0.5   0.75  1
  τ4    1     0     0.1   1
  τ5    0.5   1     0     1
  τ6    1     0     0.75  1

• τ2 is a crisp transaction.

• Some inclusion degrees are: τ1({i3, i4}) = 0.7, τ1({i2, i3, i4}) = 0.6 and τ4({i1, i4}) = 1.
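The inclusion degrees of this example can be reproduced with a short Python sketch. The dictionary layout and the names `TRANSACTIONS`, `inclusion_degree` and `is_crisp` are assumptions made for illustration; only the membership values come from the table.

```python
from typing import Dict, Iterable

FuzzyTransaction = Dict[str, float]

# Membership degrees copied from the example table (tau1 ... tau6).
TRANSACTIONS: Dict[str, FuzzyTransaction] = {
    "tau1": {"i1": 0.0, "i2": 0.6, "i3": 0.7, "i4": 0.9},
    "tau2": {"i1": 0.0, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    "tau3": {"i1": 1.0, "i2": 0.5, "i3": 0.75, "i4": 1.0},
    "tau4": {"i1": 1.0, "i2": 0.0, "i3": 0.1, "i4": 1.0},
    "tau5": {"i1": 0.5, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    "tau6": {"i1": 1.0, "i2": 0.0, "i3": 0.75, "i4": 1.0},
}


def inclusion_degree(itemset: Iterable[str], transaction: FuzzyTransaction) -> float:
    """tau(A): minimum membership degree over the items of a non-empty itemset A."""
    return min(transaction.get(item, 0.0) for item in itemset)


def is_crisp(transaction: FuzzyTransaction) -> bool:
    """A transaction is crisp when every degree is exactly 0 or 1."""
    return all(degree in (0.0, 1.0) for degree in transaction.values())


print(inclusion_degree({"i3", "i4"}, TRANSACTIONS["tau1"]))        # 0.7
print(inclusion_degree({"i2", "i3", "i4"}, TRANSACTIONS["tau1"]))  # 0.6
print(inclusion_degree({"i1", "i4"}, TRANSACTIONS["tau4"]))        # 1.0
print(is_crisp(TRANSACTIONS["tau2"]))                              # True
```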


Fuzzy Association Rules

To assess fuzzy association rules, we employ a proposal based on quantified sentence evaluation, using the fuzzy quantifier Q_M(x) = x to represent the quantifier “most”, in the following way:

• The support of an itemset A is the evaluation of the quantified sentence “most of the transactions in D are A”, where A here denotes the fuzzy set of transactions defined by µ_A(τ) = τ(A).

• The support of a fuzzy rule A → B, denoted FSupp(A → B), is the evaluation of the quantified sentence “most of the transactions in D are A ∩ B”.

• FConf(A → B) is the evaluation of the quantified sentence “most of the transactions of A are B”.

• FCF(A → B) is computed using the fuzzy versions of support and confidence.
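A minimal sketch of the fuzzy support, under a stated assumption: the slides do not spell out the evaluation method, so the code assumes that, for Q_M(x) = x and a crisp set of transactions D, the quantified sentence evaluates to the average inclusion degree (which is what the GD evaluation method gives in that case). Fuzzy confidence needs the full evaluation method and is not covered here. All names are illustrative; the membership values are those of the six-transaction example.

```python
from typing import Dict, Iterable, List

FuzzyTransaction = Dict[str, float]

# The six fuzzy transactions of the example (tau1 ... tau6).
D: List[FuzzyTransaction] = [
    {"i1": 0.0, "i2": 0.6, "i3": 0.7, "i4": 0.9},
    {"i1": 0.0, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    {"i1": 1.0, "i2": 0.5, "i3": 0.75, "i4": 1.0},
    {"i1": 1.0, "i2": 0.0, "i3": 0.1, "i4": 1.0},
    {"i1": 0.5, "i2": 1.0, "i3": 0.0, "i4": 1.0},
    {"i1": 1.0, "i2": 0.0, "i3": 0.75, "i4": 1.0},
]


def inclusion_degree(itemset: Iterable[str], transaction: FuzzyTransaction) -> float:
    """tau(A) = min over the items of A of tau(i)."""
    return min(transaction.get(item, 0.0) for item in itemset)


def fuzzy_support(itemset: Iterable[str], dataset: List[FuzzyTransaction]) -> float:
    """Assumed evaluation of "most transactions in D are A": mean of tau(A)."""
    items = frozenset(itemset)
    return sum(inclusion_degree(items, t) for t in dataset) / len(dataset)


def fuzzy_rule_support(a: Iterable[str], b: Iterable[str],
                       dataset: List[FuzzyTransaction]) -> float:
    """FSupp(A -> B): min(tau(A), tau(B)) equals tau(A ∪ B), so reuse fuzzy_support."""
    return fuzzy_support(frozenset(a) | frozenset(b), dataset)


print(fuzzy_support({"i4"}, D))               # 5.9 / 6, about 0.983
print(fuzzy_rule_support({"i3"}, {"i4"}, D))  # 2.3 / 6, about 0.383
```

On a crisp dataset every degree is 0 or 1, and the mean reduces to the usual fraction of transactions containing the itemset.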


Meta-Association Rules

Meta-association rules are association rules where the antecedent or the consequent can contain regular rules that have been previously extracted with a high reliability in a high percentage of the source databases.


Crisp Meta-Association Rules

1. From each database a set of rules Ri is obtained.

2. We compile these rules in a new database D together with the attributes at1, ..., atm.

  D     r1    r2    ...   rn    at1   ...   atm
  D1    1     1     ...   0     1     ...   1
  D2    0     1     ...   0     0     ...   1
  ...   ...   ...   ...   ...   ...   ...   ...
  Dk    1     0     ...   1     1     ...   0

3. We search for meta-association rules that involve the previously extracted rules r1, ..., rn and the added attributes at1, ..., atm.
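Step 2 can be sketched as follows. This is an illustrative construction only: the function name, the representation of rules as plain string identifiers, and the three hypothetical districts are assumptions, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple


def build_crisp_meta_database(
    rule_sets: Dict[str, Set[str]],
    attributes: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """One row per source database: 1 if the rule was mined there
    (or the attribute holds for it), 0 otherwise."""
    rule_cols = sorted(set().union(*rule_sets.values()))
    attr_cols = sorted(set().union(*attributes.values()))
    columns = rule_cols + attr_cols
    rows = {}
    for db, rules in rule_sets.items():
        present = rules | attributes.get(db, set())
        rows[db] = [1 if col in present else 0 for col in columns]
    return columns, rows


# Hypothetical rule sets R1..R3 and district attributes (not from the slides).
rule_sets = {"D1": {"r1", "r2"}, "D2": {"r2"}, "D3": {"r1", "r3"}}
attributes = {"D1": {"at1"}, "D2": {"at2"}, "D3": {"at1", "at2"}}

columns, rows = build_crisp_meta_database(rule_sets, attributes)
print(columns)     # ['r1', 'r2', 'r3', 'at1', 'at2']
print(rows["D1"])  # [1, 1, 0, 1, 0]
print(rows["D2"])  # [0, 1, 0, 0, 1]
```

Each row of the result is an ordinary crisp transaction, so step 3 can be run with any standard association rule miner over the columns r1, ..., rn, at1, ..., atm.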


Why use fuzzy meta-rules?

• Crisp meta-rules are discovered by taking into account only whether or not an association has been previously mined from the original dataset.

• This means that, to build meta-rules, a rule mined with a confidence of 0.9 has the same importance as another rule with a confidence of 0.5.

⇒ It is convenient to consider the available measures, such as the confidence, to quantify the importance of the rule in the HOM process.

Solution:

Fuzzy Meta-Association Rules


Fuzzy Meta-Association Rules

1. From each database we have the set of rules Ri and their assessment values ∈ [0, 1] (Conf or CF).

2. We compile these rules in a fuzzy database together with the attributes at1, ..., atm, which can be fuzzy.

  D     r1    r2    ...   rn    at1   ...   atm
  D1    0.8   1     ...   0     0.6   ...   1
  D2    0     0.3   ...   0     0     ...   0.7
  ...   ...   ...   ...   ...   ...   ...   ...
  Dk    0.9   0     ...   0.5   1     ...   0

3. Mining association rules in D will discover fuzzy meta-association rules.
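The fuzzy variant of the compilation step can be sketched in the same way, storing the assessment value of each rule instead of a 0/1 flag. The function name, the dictionary layout and the two sample districts are assumptions for illustration; a rule that was not mined in a database gets degree 0.

```python
from typing import Dict, List, Tuple

Degrees = Dict[str, float]


def build_fuzzy_meta_database(
    rule_degrees: Dict[str, Degrees],
    attr_degrees: Dict[str, Degrees],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """One fuzzy transaction per source database: each rule column holds the
    rule's assessment value (Conf or CF) there, each attribute column its degree."""
    rule_cols = sorted({r for degrees in rule_degrees.values() for r in degrees})
    attr_cols = sorted({a for degrees in attr_degrees.values() for a in degrees})
    rows = {}
    for db, rules in rule_degrees.items():
        attrs = attr_degrees.get(db, {})
        row = [rules.get(r, 0.0) for r in rule_cols]
        row += [attrs.get(a, 0.0) for a in attr_cols]
        if any(not 0.0 <= value <= 1.0 for value in row):
            raise ValueError(f"degrees for {db} must lie in [0, 1]")
        rows[db] = row
    return rule_cols + attr_cols, rows


# Hypothetical assessment values and attribute degrees (illustrative only).
rule_degrees = {"D1": {"r1": 0.8, "r2": 1.0}, "D2": {"r2": 0.3}}
attr_degrees = {"D1": {"at1": 0.6, "at2": 1.0}, "D2": {"at2": 0.7}}

columns, rows = build_fuzzy_meta_database(rule_degrees, attr_degrees)
print(columns)     # ['r1', 'r2', 'at1', 'at2']
print(rows["D1"])  # [0.8, 1.0, 0.6, 1.0]
print(rows["D2"])  # [0.0, 0.3, 0.0, 0.7]
```

The range check reflects the requirement that assessment values lie in [0, 1], so that each row can be read as a fuzzy transaction.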


Crisp and Fuzzy Meta-Association Rules

Formally, we can obtain three types of meta-association rules:

• ri → rj, where ri, rj can be rules or conjunctions of rules. For example: ri = ri1 ∧ ... ∧ ris.

• ati → atj, where ati, atj can be attributes or conjunctions of attributes.

• ri → atj or atj → ri, where ri, atj can be a conjunction of rules and a conjunction of attributes respectively, and they can be mixed. For example: r1 ∧ at2 → r3.


Experimental Evaluation: DataSet Description

• 22 databases about crime, related to the districts of the city of Chicago.

• Number of transactions: min = 5694 and max = 22493.

• 6 types of attributes (around 300 items) in each database:

    • Quarter of the year in which the incident happened.
    • Day period: morning, afternoon, evening, night.
    • Crime description according to police standard protocols.
    • Location description: street, residence, etc.
    • Arrest, if there is an arrest associated with the crime.
    • Domestic, if the crime happened in a domestic environment.

• Additional attributes about the districts:

    • Number of students in the district: low, medium, high, very high.
    • Number of misconducts notified in the district: very low, low, medium, high, very high.
    • Perceived safety index, obtained by means of surveys: low, medium, high.


Experimental Evaluation: Some Results

Example of obtained meta-association rule:

“IF Number-of-Misconducts = Very high
 THEN (Crime-Desc. = POSS: CANNABIS ≤ 30GMS → Domestic = f) AND Safety-Index = Low”

with FSupp = 0.136 and FCF = 0.658.

That is, when the number of misconducts in a district is very high, it is frequent and reliable to find both a low perception of safety and an association between crimes of possession of small amounts of cannabis and their occurrence in a non-domestic environment.


Experimental Evaluation: Some Results

[Two plots for the Chicago database, both with minCF/minFCF (0.2 to 0.9) on the x-axis. Left: number of rules (logarithmic y-axis, 100 to 1e+006). Right: time in seconds (y-axis, 0 to 16000).]

Figure: Left: Number of crisp/fuzzy meta-association rules (y-axis in logarithmic scale) vs. minCF/minFCF (x-axis) when minSupp = 0.05. Right: Time in sec. for mining crisp/fuzzy meta-association rules (y-axis) vs. minCF/minFCF (x-axis) when minSupp = 0.05.


Experimental Evaluation: Some Results

Antecedent | Consequent | (F)Supp | (F)CF | D | DCF
Number-of-Misconducts=Very high | (Crime-Desc.=POSS: CANNABIS ≤30GMS → Domestic=f) AND Safety-Index=Low | 0.136 | 0.658 | x | X
(Crime-Desc.=≤ 500$ → Domestic=f) AND (Crime-Desc.=TO VEHICLE → Arrest=f) | (Location-Description=STREET → Domestic=f) | 0.455 | 0.778 | X | x
(Arrest=t → Domestic=f) AND Safety-Index=Low | Number-of-Students=Medium | 0.064 | 0.7 | X | x
Number-of-Students=Low AND Number-of-Misconducts=Very high | Safety-Index=Low | 0.091 | 1 | x | X

Table: Examples of meta-association rules found in the City of Chicago dataset for minSupp = minFSupp = 0.05 and minCF = minFCF = 0.5.


Conclusions and Future Research

• We have proposed different types of meta-association rules: crisp and fuzzy meta-rules.

• Fuzzy meta-rules take advantage of the assessment measures provided when mining rules from the original datasets.

• The databases considered have the same structure (i.e. the same set of attributes).

    • It would be convenient to address the problem of having datasets with similar attributes that are not related beforehand.
    • The information should be semantically integrated, so that attributes with similar semantics can be linked.

Future: Using a knowledge repository to assist the algorithm in matching similar items.




Thank you. Any questions?
