Transcript

Zhou Zhao, Da Yan and Wilfred NgThe Hong Kong University of Science and Technology

Mining Probabilistically Frequent Sequential Patterns in Uncertain

Databases

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

BackgroundUncertain data are inherent in many real

world applicationsSensor networkRFID tracking

Sensor 2: AB

Sensor 1: BC

Prob. = 0.9

Prob. = 0.1

C B A

Readings:

BackgroundUncertain data are inherent in many real

world applicationsSensor networkRFID tracking

Reader BReader C

Reader A

t1: (A, 0.95)

t2: (B, 0.95), (C, 0.05)

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

Problem Definition

Pruning rules for p-FSP

Early ValidatingSuppose that pattern α is p-frequent on D’

⊆ D, then α is also p-frequent on D

D

D1 D2

D11 D12 D21 D22

… … …… … …

If α is p-FSP in D11, then α is p-FSP in D.

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

Sequence-level probabilistic model

Sequence ID

Instances

Probability

s1 s11= ABC 1

s2 s21 = ABs22 = BC

0.90.05

DB: Possible World Space:

Prefix-projection of PrefixSpan

SID Sequence

s1 ABCBC

s2 BABC

s3 AB

s4 BC

SID Sequence

s1 _BCBC

s2 _BC

s3 _B

SID Sequence

s1 _CBC

s2 _C

s3 _

D

D|A D|AB

A B

SeqU-PrefixSpan AlgorithmSeqU-PrefixSpan recursively performs

pattern-growth from the previous pattern α to the current β = αe, by appending an p-frequent element e ∈ D |α

We can stop growing a pattern α for examination, once we find that α is p-infrequent

Sequence ProjectionSeq-Instances

Prob.

si1 = ABCBC 0.3

si2 = BABC 0.2

si3 = AB 0.4

si4 = BC 0.1

Seq-Instances

Prob.

si1 = _BCBC 0.3

si2 = _BC 0.2

si3 = _B 0.4

ASeq-Instances

Prob.

si1 = _CBC 0.3

si2 = _BC 0.2

si3 = _ 0.4

B

si

si|A si|B

Seq-Instances

Prob.

si1 = _BCBC 0.3

si2 = _BC 0.2

si3 = _B 0.4

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

Element-level probabilistic model

Sequence ID

Probabilistic Elements

s1 s1[1]={(A,0.95)}s1[2]={(B,0.95),(C,0.05)}

s2 s2[1]={(A,1)},s2[2] = {(B,1)}

DB: Possible World Space:

Possible world explosionProbabilistic

Elements

si[1] = {(A,0.7), (B,0.3)}

si[2] = {(B,0.2),(C,0.8)}

si[3] = {(C,0.4),(A,0.6)}

si[4] = {(B,0.1), (A,0.9)}Seq-

InstanceProb. Seq-

InstanceProb.

pw1(si)=ABCBpw2(si)=ABCApw3(si)=ABABpw4(si)=ABAApw5(si)=ACCBpw6(si)=ACCApw7(si)=ACABpw8(si)=ACAA

0.00560.05040.00840.07560.02240.20160.03360.3024

pw9(si)=BBCBpw10(si)=BBCApw11(si)=BBABpw12(si)=BBAApw13(si)=BCCBpw14(si)=BCCApw15(si)=BCABpw16(si)=BCAA

0.00240.02160.00360.03240.00960.08640.01440.1296

# of possible instances is

exponential to sequence length

ElemU-PrefixSpan Algorithm

Sequence Projection

pos suffix Pr.

0 _si[1]si[2]si[3]si[4]

1 B

Probabilistic Elements

si[1] = {(A,0.7), (B,0.3)}

si[2] = {(B,0.2),(C,0.8)}

si[3] = {(C,0.4),(A,0.6)}

si[4] = {(B,0.1), (A,0.9)}

Sequence Projection

Probabilistic Elements

si[1] = {(A,0.7), (B,0.3)}

si[2] = {(B,0.2),(C,0.8)}

si[3] = {(C,0.4),(A,0.6)}

si[4] = {(B,0.1), (A,0.9)}

Sequence Projection

A

Probabilistic Elements

si[1] = {(A,0.7), (B,0.3)}

si[2] = {(B,0.2),(C,0.8)}

si[3] = {(C,0.4),(A,0.6)}

si[4] = {(B,0.1), (A,0.9)}

Sequence Projection

A

Probabilistic Elements

si[1] = {(A,0.7), (B,0.3)}

si[2] = {(B,0.2),(C,0.8)}

si[3] = {(C,0.4),(A,0.6)}

si[4] = {(B,0.1), (A,0.9)}

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

Efficiency of SeqU-PrefixSpanEfficiency on the effects of

size of databasenumber of seq-instances length of sequence

Efficiency of ElemU-PrefixSpanEfficiency on the effects of

size of databasenumber of element-instances length of sequence

ElemU-PrefixSpan v.s. Full ExpansionEfficiency on the effects of

size of databasenumber of element-instances length of sequence

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

OutlineBackgroundProblem DefinitionSequential-Level U-PrefixSpanElement-Level U-PrefixSpanExperimentsConclusion

ConclusionWe formulate the problem of mining p-SFP

in uncertain databases.

We propose two new U-PrefixSpan algorithms to mine p-FSPs from data that conform to our probabilistic models.

Experiments show that our algorithms effectively avoid the problem of “possible world explosion”.

Thank you!


Top Related